Prometheus connector
Zipstack Cloud features a powerful SQL querying engine on top of many types of connectors, including those from Trino, some custom connectors and connectors from the open source Airbyte project. The underlying native connectors are Trino's connectors. Additionally, some parts of the documentation for these connectors have been adapted from the connector documentation found in Trino's open source project.
The Prometheus connector allows reading Prometheus metrics as tables in Trino.
The mechanism for querying Prometheus is to use the Prometheus HTTP API.
Specifically, all queries are resolved to Prometheus Instant queries
with a form like:
http://localhost:9090/api/v1/query?query=up%5B21d%5D&time=1568229904.000.
In this case, the up metric is taken from the Trino query table name, and
21d is the duration of the query. The Prometheus time value
corresponds to the timestamp field. Trino translates a query's
conditions on the timestamp field into a duration and a time value as
needed. Trino splits are generated by dividing the query range into
approximately equal chunks.
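The URL form above can be reproduced with a short Python sketch. It is an illustration only, not the connector's code: the helper name `instant_query_url` and its parameters are hypothetical, and it assumes the time value is formatted with millisecond precision as in the example URL.

```python
from urllib.parse import urlencode


def instant_query_url(base_uri, metric, duration, end_time):
    """Build an instant-query URL for a range selector such as up[21d].

    Hypothetical helper for illustration; not part of Trino or Prometheus.
    """
    params = {
        # The table name becomes the metric, the duration the range selector.
        "query": f"{metric}[{duration}]",
        # Unix time in seconds, written with three decimal places.
        "time": f"{end_time:.3f}",
    }
    # urlencode percent-encodes the brackets as %5B and %5D.
    return f"{base_uri}/api/v1/query?{urlencode(params)}"


url = instant_query_url("http://localhost:9090", "up", "21d", 1568229904)
print(url)
```

With these inputs the sketch yields the same URL as the example above, which shows that `%5B21d%5D` is simply the encoded range selector `[21d]`.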
Requirements
To query Prometheus, you need:
Network access from Zipstack Cloud to the Prometheus server. The default port is 9090.
Prometheus version 2.15.1 or later.
Configuration
Create a data source with the following minimum configuration. Replace the properties as appropriate:
prometheus.uri=http://localhost:9090
prometheus.query.chunk.size.duration=1d
prometheus.max.query.range.duration=21d
prometheus.cache.ttl=30s
prometheus.bearer.token.file=/path/to/bearer/token/file
prometheus.read-timeout=10s
Configuration properties
The following configuration properties are available:
| Property name | Description |
|---|---|
| prometheus.uri | Where to find the Prometheus coordinator host |
| prometheus.query.chunk.size.duration | The duration of each query to Prometheus |
| prometheus.max.query.range.duration | Width of the overall query to Prometheus, which is divided into queries of prometheus.query.chunk.size.duration each |
| prometheus.cache.ttl | How long values from this config file are cached |
| prometheus.auth.user | Username for basic authentication |
| prometheus.auth.password | Password for basic authentication |
| prometheus.bearer.token.file | File holding bearer token if needed for access to Prometheus |
| prometheus.read-timeout | How much time a query to Prometheus has before timing out |
| prometheus.case-insensitive-name-matching | Match Prometheus metric names case insensitively. Defaults to false |
Not exhausting your Trino available heap
The prometheus.query.chunk.size.duration and
prometheus.max.query.range.duration properties protect Trino from
too much data coming back from Prometheus. The
prometheus.max.query.range.duration property is of particular
interest.
On a Prometheus instance that has been running for a while, and depending
on data retention settings, 21d might be far too much. A value of 1h
might be a more reasonable setting. With 1h, it might then be
useful to set prometheus.query.chunk.size.duration to 10m, dividing
the query window into 6 queries, each of which can be handled in a Trino
split.
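The arithmetic behind the 1h and 10m example can be checked with a small Python sketch. It is not the connector's splitting logic: `split_query_range` is a hypothetical helper, and the handling of a window that is not an exact multiple of the chunk size (a shorter final chunk) is an assumption.

```python
from datetime import timedelta


def split_query_range(max_range, chunk_size):
    """Divide a query window into chunk durations.

    Hypothetical helper for illustration. If the window is not an exact
    multiple of chunk_size, the final chunk is shorter (an assumption).
    """
    if chunk_size <= timedelta(0):
        raise ValueError("chunk_size must be positive")
    chunks = []
    remaining = max_range
    while remaining > timedelta(0):
        step = min(chunk_size, remaining)
        chunks.append(step)
        remaining -= step
    return chunks


# prometheus.max.query.range.duration=1h, prometheus.query.chunk.size.duration=10m
chunks = split_query_range(timedelta(hours=1), timedelta(minutes=10))
print(len(chunks))  # 6
```

Each returned duration corresponds to one query against Prometheus, so a smaller chunk size means more, smaller queries for the same overall window.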
Primarily, query issuers can limit the amount of data returned by
Prometheus by adding WHERE clause limits on timestamp,
setting an upper bound and a lower bound that define a relatively small
window. For example, the following query limits the window to the last 10 seconds:
SELECT * FROM example.default.up WHERE timestamp > (NOW() - INTERVAL '10' second);
If the query does not include a WHERE clause limit, these config settings are meant to protect against an unlimited query.
Bearer token authentication
Prometheus can be set up to require an Authorization header with every
query. The value in prometheus.bearer.token.file allows for a bearer
token to be read from the configured file. This file is optional unless
your Prometheus setup requires it.
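As a rough sketch of what bearer token authentication amounts to on the wire, the following Python snippet reads a token from a file and builds the Authorization header. The helper `bearer_headers` is hypothetical, and stripping surrounding whitespace from the file contents is an assumption rather than documented connector behavior.

```python
import tempfile
from pathlib import Path


def bearer_headers(token_file=None):
    """Return HTTP headers carrying a bearer token read from token_file.

    Hypothetical helper for illustration. With no token file, no
    Authorization header is added.
    """
    if token_file is None:
        return {}
    # Assumption: surrounding whitespace and the trailing newline are ignored.
    token = Path(token_file).read_text(encoding="utf-8").strip()
    return {"Authorization": f"Bearer {token}"}


# Demonstrate with a throwaway token file.
with tempfile.TemporaryDirectory() as tmp:
    token_path = Path(tmp) / "token"
    token_path.write_text("example-token\n", encoding="utf-8")
    headers = bearer_headers(token_path)

print(headers)
```

A header of this form can be used to test access to a protected Prometheus server by hand before configuring the data source.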
Type mapping
Because Trino and Prometheus each support types that the other does not,
this connector modifies some types when
reading data.
The connector returns fixed columns that have a defined mapping to Trino types according to the following table:
| Prometheus column | Trino type |
|---|---|
| labels | MAP(VARCHAR,VARCHAR) |
| timestamp | TIMESTAMP(3) WITH TIME ZONE |
| value | DOUBLE |
No other types are supported.
The following example query result shows how the Prometheus up metric
is represented in Trino:
SELECT * FROM example.default.up;
labels | timestamp | value
--------------------------------------------------------+--------------------------------+-------
{instance=localhost:9090, job=prometheus, __name__=up} | 2022-09-01 06:18:54.481 +09:00 | 1.0
{instance=localhost:9090, job=prometheus, __name__=up} | 2022-09-01 06:19:09.446 +09:00 | 1.0
(2 rows)
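To make the three-column mapping concrete, here is a Python sketch that flattens a Prometheus HTTP API response of result type `matrix` into (labels, timestamp, value) rows. It is not the connector's implementation: `matrix_to_rows` is a hypothetical helper, and the `sample` payload is hand-written to mirror the example result above rather than captured from a real server.

```python
from datetime import datetime, timezone


def matrix_to_rows(response):
    """Flatten a Prometheus matrix result into (labels, timestamp, value) rows.

    Hypothetical helper for illustration only.
    """
    rows = []
    for series in response["data"]["result"]:
        labels = dict(series["metric"])  # maps to MAP(VARCHAR,VARCHAR)
        for unix_seconds, raw_value in series["values"]:
            # Prometheus sends sample values as strings and times as Unix seconds.
            timestamp = datetime.fromtimestamp(float(unix_seconds), tz=timezone.utc)
            rows.append((labels, timestamp, float(raw_value)))
    return rows


# Hand-written sample payload, shaped like a response to query=up[...].
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
                "values": [[1661980734.481, "1"], [1661980749.446, "1"]],
            }
        ],
    },
}

rows = matrix_to_rows(sample)
for labels, timestamp, value in rows:
    print(labels, timestamp.isoformat(), value)
```

Every sample of every series becomes one row, and the series labels are repeated on each row, which matches the shape of the query result shown above.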
SQL support
The connector provides globally available and
read operation statements to access data and
metadata in Prometheus.