Query Language
Akumuli query language reference.
Last updated
Akumuli query language reference.
Last updated
Akumuli uses REST interface for data retrieval. You should use endpoint for data queries and endpoint for metadata queries.
All data is split between the different metrics. You can think about the metric as a namespace for series names. Each series name starts with metric name:
Here, cpu.user is a metric name, and host=pg-balancer is a tag-value pair. Akumuli uses for series name searching. For every query you need to specify a metric name and optionally, a set of tag-value pairs (using the field). Series that have this metric name and contains this tag-value pairs will be added to result set. If the query doesn't have a where field, it will return all time-series with the same metric name.
Every series name is linked to the list of data points (the actual time-series data). Every data-point contains a timestamp (64-bit, nanosecond precision) and a value (double precision floating point). You need to specify a search range for that data inside the query.
To retrieve the data you should create a query object that describes what data you need and what shape it should have. The query should be JSON-encoded and sent to .
Query parsing errors are reported using the . The only line in the response will be started with '-' followed by the error message.
Some errors can be reported using the HTTP error codes (e.g. when the wrong API endpoint is used). The query parsing errors are reported using the error messages and the query processing errors usually reported using the HTTP error codes.
Query object can be of one of the following types:
This is a simplest possible query type. It can return raw time-series data without aggregation. Select query can have a post-processing steps (e.g. rate or sliding window computation).
Select query can return more than one series but they should have the same metric name.
Field
Required
Commentary
Yes
Metric name
Yes
Time range
No
Tag filter
No
Merge series by tag (depricated)
No
Merge series by tag
No
Merge series by tag
No
Order of the data-points in the result set
No
Value based filtering
No
Limit on output size
No
Offset of the query output
This query returns all events in specified time-range. This query can be used only with events.
Field
Required
Commentary
Yes
Event name
Yes
Time range
No
Tag filter
No
Merge series by tag (depricated)
No
Merge series by tag
No
Merge series by tag
No
Order of the data-points in the result set
No
Limit on output size
No
Offset of the query output
This query can be used to calculate aggregates over time-series. The query returns only one result for every time-series.
Field
Required
Commentary
Yes
Metric name and aggregation function
Yes
Time range
No
Merge series by tag
No
Merge series by tag
No
Tag filter
No
Set output format
Field
Required
Commentary
Yes
Query specific parameters
Yes
Time range
No
Tag filter
No
Merge series by tag
No
Merge series by tag
No
Order of the data-points in the result set
No
Value filter
No
Limit on output size
No
Offset of the query output
Field
Required
Commentary
Yes
List of metrics to join
Yes
Time range
No
Tag filter
No
Merge sereis by tag
No
Merge series by tag
No
Ordering of the data-points in the result set
No
Value filter
No
Limit on output size
No
Offset of the query output
The query object is JSON encoded. It can contain the following set of fields. Some of this fields are mandatory and other are optional. The 'select', 'aggregate', 'group-aggregate', and 'join' fields define query type. Query object should have one of them.
Both "range.from" and "range.to" fields are mandatory for most queries. They can be omitted in 'aggregate' query. In this case the 'aggregate' query will compute aggregate for the whole time-series.
If "range.from" is less than "range.to" the time-series data points will be returned in ascending order (from old to new). If "range.from" is greater than "range.to" then the time-series data points will be returned in descending order (from recent to old).
Range field denotes the time interval that query should fetch.
Field
Format
Description
"range"
{ "from": "20180530T123000", "to": "20180530T130000" }
Field should contain a dictionary with two keys, "from" and "to". Both values are timestamps.
Both timestamps should be encoded using the basic ISO8601 format. The same format is used for data ingestion with RESP protocol. Alternatively, nanosecond precision UNIX timestamps can be used as values.
Select field is used to tell Akumuli what metric should be fetched.
Field
Format
Description
"select"
"metric.name"
Metric name
Select events field is used in 'select-events' query to specify event name.
Field
Format
Descritpion
"select-events"
"!event.name"
Event name
At least one metric-name and aggregation-function pair should be provided. The available aggregation functions are these:
Name
Description
count
Number of elements in series
max
Largest element in the series
min
Smallest element in the series
mean
Average value
sum
Sum of all values in the series
min_timestamp
The timestamp of the smallest element
max_timestamp
The timestamp of the largest element
first
The first value in the range
last
The last value in the range
Field
Format
Description
group-aggregate.metric
String
group-aggregate.metric
List
List of metric names
group-aggregate.step
String
Aggregation step (10s, 1h, 5m)
group-aggregate.func
String
Aggregation function
group-aggregate.func
List
List of aggregation functions
If more than one metric name is used the values with different metric names will be interleaved in the output. All function will be applied to every metric name. For instance if the query has two metrics - 'cpu' and 'mem', and two functions - 'min' and 'max' the output will have the following format:
So the output format is defined only by group-aggregate functions and not by the number of metric names.
If only one aggregation function is used in group-aggregate
field the output will have the following format:
The series name of the original series changes. The tags stays the same but the metric name gets the :<function-name> suffix. In the example above, original series name was 'cpu host=host1' but the resulting series name is 'cpu:min host=host1'.
If more than one aggregation function was used in group-aggregate field the output will have the following format:
This field was replaced with `pivot-by-tag` field.
Where field is used to limit number of series returned by the query.
Field
Format
Description
"where"
{ "tag-name": "tag-value" }
Include only series names which has tag "tag-name" set to "tag-value".
"where"
{ "tag-name": [ "value1", "value2" ] }
Include only series names which has tag "tag-name" set to one of the values "value1" or "value2".
Note that the timestamps and values are the same. Only series names are different.
In a nutshell, pivot-by-tag
tells query processor to remove all tags from series name except the ones that was listed. After that all series that have matching tags are considered equal and merged together.
For instance, if pivot-by-tag
field was used to specify a single tag name, all series with this tag with the same value will collapse into one. All data points from that series will be joined together. The resulting time-series will contain all data-points from the original series. The series name will contain only the specified tag. It's also possible to specify more than one tag.
The group-by-tag
tells query processor to remove listed tags from series name. After that all series that have matching tags are considered equal and merged together.
Field
Format
Description
group-by-tag
["tag1", ..., "tagN"]
List of tag to remove from series
group-by-tag
"tag-name"
Tag to remove from series name
The group-by-tag
is the opposite of pivot-by-tag
. Given the example in previous section query with "group-by-tag": ["valve_num"]
would produce the following output:
Field
Format
Description
pivot-by-tag
[ "tag1", "tag2", ..., "tagN" ]
The list of tags that resulting series name should have.
pivot-by-tag
"tag-name"
The single tag the resulting series should have.
Suppose that you need to store the valve pressure measurements. Pressure in each valve is measured by two separate sensors so you're end up with this schema: pressure_kPa valve_num=XXX sensor_num=YYY
. Here we have pressure_kPa
metric with two tags: valve_num
and sensor_num
. If you query this series you will get the following results (\r\n omitted):
Each combination of sensor and valve produces its own time-series. If you want to group data only by valve you can use "pivot-by-tag" field. If you add a "pivot-by-tag": [ "valve_num" ]
field to the query the result will look like this:
This field can be used to control the order of the data-points in the query output.
Field
Format
Description
"order-by"
"series"
Sort output by series name
"order-by"
"time"
Sort output by timestamp
This field takes single string. It can be "series" or "time". If order-by
is "series" the results will be ordered by series name first and then by timestamp. If order-by
is "time" then data points will be ordered by timestamp first and then by series name.
This field can be used to control format of the output.
Field
Format
Description
"output"
{ "format": "csv", "timestamp": "raw" }
Set output format to "csv" and timestamp format to "raw".
"output"
{ "format": "resp", "timestamp": "iso" }
Set output format to "resp" and timestamp format to "iso".
The second field is output.timestamp
. It controls formatting of the timestamps in the output of the query. If it's set to "raw" Akumuli will format timestamps as 64-bit integers.
If it's set to "iso" timestamps will be formatted according to ISO8601 standard.
Filter field can be used to filter data-points by value.
Field
Format
Description
"format"
{ "gt": 10, "lt": 100 }
Filter out all values less or equal to 10 and greater or equal to 100.
"format"
{ "ge": 0, "le": 1 }
Filter out all negative values and all values greater then one.
This field should contain a dictionary with the predicates. The possible predicates are "gt" (greater than), "ge" (greater or equal), "lt" (less than), and "le" (less or equal). It is possible to combine two predicates if you want to read values that fit some range, for instance "filter: {"gt": 0, "lt": 10 }
will select all values between 0 and 10, but not 0 and 10. You can use only predicate if needed.
The use of filter field can speed up query execution if the number of returned values is small. In this case the query engine won't read all the data from disk but only those pages that have the data the query needs.
Filter field can be used with the join query. If this is the case, you have to specify the metric to which the filter should be applied.
In this case filter >200 will be applied to "cpu" metric and the filter <100 will be applied to "mem" metric.
You can use limit
and offset
query fields to limit the number of returned tuples and to skip some tuples at the beginning of the query output. This fields works the same as LIMIT and OFFSET clauses in SQL.
Don't use this fields if you need to read all the data in chunks. Akumuli executes queries lazily. To read data in chunks, you can issue a normal query (without limit and offset) and read the first chunk (without disconnecting from the server afterwards). When you done with the first chunk you can read the next one, and so on. The query will be executed as far as you read data through the TCP connection. When you'll stop reading to process the data the query execution on the server will pause. It will resume when you'll continue reading.
This query is used to downsample time-series data. It divides all data-points into a series of equally sized bins and computes a single value for every bin if the bin is not empty. If the bin is empty it doesn't produce any value. The same aggregation functions that can be used with can be used with group-aggregate. The difference between the aggregate and group-aggregate queries is that the aggregate produces only one value for every series but the group-aggregate can produce a time-series with fixed step. Also, more than one aggregation function can be used with group-aggregate query.
Join query can be used to align several metrics together. The query will group together series that has the same tags but different metric names. The resulting output will be in . Series names of the individual series will be joined together using the .
This field defines the query type. If this field is used the query will be a . You can provide only one "select" field. This field can have only one metric name. The query will fetch all series with this metric name. This series can be further filtered by using "where" field.
This field defines query type. If this field is used in the query, the query will be a query.
Aggregate field is required to create an . The type of the field is a dictionary that has the following format:
The aggregate query object computes aggregate only for values inside the specified . If there is no values inside the range, the query will return an error.
Group-aggregate field is required to make a . The field is a dictionary with the following format:
Metric name (same as in )
Metric name is changed as described above plus, the is used. The query will return a series for every aggregation function in the list. This series will have the same timestamps but different values (since different functions were used to produce them). Then, these series will be joined together and the is used to return them.
Join field is used to make a . This field's type is list. The list should contain valid metric names. Example:
Here cpu
, mem
, and iops
is different metric names. Query processor will find series names with the same set of tags with this metrics and join them. E.g. if we have three series - "cpu host=host1", "mem host=host1", and "iops host=host1" - all three series will be joined together producing single series "cpu|mem|iops host=host1". The output will contain records in .
You can specify many tags in one where field. This data in conjunction with metric name (or names) will form be used to inside the index.
The field is a dictionary with two possible values. The first one is output.format
. It can be set to "resp" or "csv". The first value is used by default. The output will be formatted using format. The same that is used to send data to Akumuli. The second value changes the output format to CSV. This is how the output of the query will look with output.format
set to "csv":