Query Language
Akumuli query language reference.
Akumuli uses REST interface for data retrieval. You should use api/query endpoint for data queries and api/search endpoint for metadata queries.
Data Model
All data is split between the different metrics. You can think about the metric as a namespace for series names. Each series name starts with metric name:
Here, cpu.user is a metric name, and host=pg-balancer is a tag-value pair. Akumuli uses Boolean model (BIR) for series name searching. For every query you need to specify a metric name and optionally, a set of tag-value pairs (using the where field). Series that have this metric name and contains this tag-value pairs will be added to result set. If the query doesn't have a where field, it will return all time-series with the same metric name.
Every series name is linked to the list of data points (the actual time-series data). Every data-point contains a timestamp (64-bit, nanosecond precision) and a value (double precision floating point). You need to specify a search range for that data inside the query.
Query Object
To retrieve the data you should create a query object that describes what data you need and what shape it should have. The query should be JSON-encoded and sent to 'query' endpoint.
Error Handling
Query parsing errors are reported using the RESP protocol. The only line in the response will be started with '-' followed by the error message.
Some errors can be reported using the HTTP error codes (e.g. when the wrong API endpoint is used). The query parsing errors are reported using the error messages and the query processing errors usually reported using the HTTP error codes.
Query Types
Query object can be of one of the following types:
Select Query
This is a simplest possible query type. It can return raw time-series data without aggregation. Select query can have a post-processing steps (e.g. rate or sliding window computation).
Select query can return more than one series but they should have the same metric name.
Select Events Query
This query returns all events in specified time-range. This query can be used only with events.
Aggregate Query
This query can be used to calculate aggregates over time-series. The query returns only one result for every time-series.
Group-aggregate Query
This query is used to downsample time-series data. It divides all data-points into a series of equally sized bins and computes a single value for every bin if the bin is not empty. If the bin is empty it doesn't produce any value. The same aggregation functions that can be used with aggregate query can be used with group-aggregate. The difference between the aggregate and group-aggregate queries is that the aggregate produces only one value for every series but the group-aggregate can produce a time-series with fixed step. Also, more than one aggregation function can be used with group-aggregate query.
Join Query
Join query can be used to align several metrics together. The query will group together series that has the same tags but different metric names. The resulting output will be in bulk load format. Series names of the individual series will be joined together using the compound series name format.
Query Fields
The query object is JSON encoded. It can contain the following set of fields. Some of this fields are mandatory and other are optional. The 'select', 'aggregate', 'group-aggregate', and 'join' fields define query type. Query object should have one of them.
Range Field
Both "range.from" and "range.to" fields are mandatory for most queries. They can be omitted in 'aggregate' query. In this case the 'aggregate' query will compute aggregate for the whole time-series.
If "range.from" is less than "range.to" the time-series data points will be returned in ascending order (from old to new). If "range.from" is greater than "range.to" then the time-series data points will be returned in descending order (from recent to old).
Range field denotes the time interval that query should fetch.
Both timestamps should be encoded using the basic ISO8601 format. The same format is used for data ingestion with RESP protocol. Alternatively, nanosecond precision UNIX timestamps can be used as values.
Select Field
Select field is used to tell Akumuli what metric should be fetched.
This field defines the query type. If this field is used the query will be a select query. You can provide only one "select" field. This field can have only one metric name. The query will fetch all series with this metric name. This series can be further filtered by using "where" field.
Select Events Field
Select events field is used in 'select-events' query to specify event name.
This field defines query type. If this field is used in the query, the query will be a select-events query.
Aggregate Field
Aggregate field is required to create an aggregate query. The type of the field is a dictionary that has the following format:
At least one metric-name and aggregation-function pair should be provided. The available aggregation functions are these:
The aggregate query object computes aggregate only for values inside the specified time-range. If there is no values inside the range, the query will return an error.
Group-aggregate Field
Group-aggregate field is required to make a group-aggregate query. The field is a dictionary with the following format:
Using one function
Using list of metrics
If more than one metric name is used the values with different metric names will be interleaved in the output. All function will be applied to every metric name. For instance if the query has two metrics - 'cpu' and 'mem', and two functions - 'min' and 'max' the output will have the following format:
So the output format is defined only by group-aggregate functions and not by the number of metric names.
If only one aggregation function is used in group-aggregate
field the output will have the following format:
The series name of the original series changes. The tags stays the same but the metric name gets the :<function-name> suffix. In the example above, original series name was 'cpu host=host1' but the resulting series name is 'cpu:min host=host1'.
Using list of functions
If more than one aggregation function was used in group-aggregate field the output will have the following format:
Metric name is changed as described above plus, the compound series name format is used. The query will return a series for every aggregation function in the list. This series will have the same timestamps but different values (since different functions were used to produce them). Then, these series will be joined together and the bulk format is used to return them.
Join Field
Join field is used to make a join query. This field's type is list. The list should contain valid metric names. Example:
Here cpu
, mem
, and iops
is different metric names. Query processor will find series names with the same set of tags with this metrics and join them. E.g. if we have three series - "cpu host=host1", "mem host=host1", and "iops host=host1" - all three series will be joined together producing single series "cpu|mem|iops host=host1". The output will contain records in bulk format.
This field was replaced with `pivot-by-tag` field.
Where Field
Where field is used to limit number of series returned by the query.
You can specify many tags in one where field. This data in conjunction with metric name (or names) will form be used to search series inside the index.
Note that the timestamps and values are the same. Only series names are different.
Pivot-by-tag Field
In a nutshell, pivot-by-tag
tells query processor to remove all tags from series name except the ones that was listed. After that all series that have matching tags are considered equal and merged together.
For instance, if pivot-by-tag
field was used to specify a single tag name, all series with this tag with the same value will collapse into one. All data points from that series will be joined together. The resulting time-series will contain all data-points from the original series. The series name will contain only the specified tag. It's also possible to specify more than one tag.
Group-by-tag
The group-by-tag
tells query processor to remove listed tags from series name. After that all series that have matching tags are considered equal and merged together.
The group-by-tag
is the opposite of pivot-by-tag
. Given the example in previous section query with "group-by-tag": ["valve_num"]
would produce the following output:
Group-by Field (DEPRICATED)
Suppose that you need to store the valve pressure measurements. Pressure in each valve is measured by two separate sensors so you're end up with this schema: pressure_kPa valve_num=XXX sensor_num=YYY
. Here we have pressure_kPa
metric with two tags: valve_num
and sensor_num
. If you query this series you will get the following results (\r\n omitted):
Each combination of sensor and valve produces its own time-series. If you want to group data only by valve you can use "pivot-by-tag" field. If you add a "pivot-by-tag": [ "valve_num" ]
field to the query the result will look like this:
Order-by Field
This field can be used to control the order of the data-points in the query output.
This field takes single string. It can be "series" or "time". If order-by
is "series" the results will be ordered by series name first and then by timestamp. If order-by
is "time" then data points will be ordered by timestamp first and then by series name.
Output Field
This field can be used to control format of the output.
The field is a dictionary with two possible values. The first one is output.format
. It can be set to "resp" or "csv". The first value is used by default. The output will be formatted using RESP serialization format. The same that is used to send data to Akumuli. The second value changes the output format to CSV. This is how the output of the query will look with output.format
set to "csv":
The second field is output.timestamp
. It controls formatting of the timestamps in the output of the query. If it's set to "raw" Akumuli will format timestamps as 64-bit integers.
If it's set to "iso" timestamps will be formatted according to ISO8601 standard.
Filter Field
Filter field can be used to filter data-points by value.
This field should contain a dictionary with the predicates. The possible predicates are "gt" (greater than), "ge" (greater or equal), "lt" (less than), and "le" (less or equal). It is possible to combine two predicates if you want to read values that fit some range, for instance "filter: {"gt": 0, "lt": 10 }
will select all values between 0 and 10, but not 0 and 10. You can use only predicate if needed.
The use of filter field can speed up query execution if the number of returned values is small. In this case the query engine won't read all the data from disk but only those pages that have the data the query needs.
Multi-dimensional filter
Filter field can be used with the join query. If this is the case, you have to specify the metric to which the filter should be applied.
In this case filter >200 will be applied to "cpu" metric and the filter <100 will be applied to "mem" metric.
Limit and Offset Fields
You can use limit
and offset
query fields to limit the number of returned tuples and to skip some tuples at the beginning of the query output. This fields works the same as LIMIT and OFFSET clauses in SQL.
Don't use this fields if you need to read all the data in chunks. Akumuli executes queries lazily. To read data in chunks, you can issue a normal query (without limit and offset) and read the first chunk (without disconnecting from the server afterwards). When you done with the first chunk you can read the next one, and so on. The query will be executed as far as you read data through the TCP connection. When you'll stop reading to process the data the query execution on the server will pause. It will resume when you'll continue reading.
Last updated