Home » Blog » Understanding Time Series Data Processing Functions in CnosDB

Understanding Time Series Data Processing Functions in CnosDB

CnosDB is a database specialized in time series data processing, designed to address storage and analysis challenges related to time series data. It provides efficient management and querying of time series data for users. To achieve this goal, CnosDB implements a series of dedicated functions, including the following categories:

stats_agg

Usage

The stats_agg function is suitable for scenarios that require statistical analysis of time series data, such as calculating correlation coefficients and covariances. It can also calculate common statistics for each dimension, such as means and standard deviations.

Function

stats_agg(y, x)

Input:

  • y: double
  • x: double

Return:

{ 
  n: bigint,   -- count 
  sx: double,  -- sum(x)- sum(x)
  sx2: double, -- sum((x-sx/n)^2) (sum of squares)
  sx3: double, -- sum((x-sx/n)^3)
  sx4: double, -- sum((x-sx/n)^4)
  sy: double,  -- sum(y)
  sy2: double, -- sum((y-sy/n)^2) (sum of squares)
  sy3: double, -- sum((y-sy/n)^3)
  sy4: double, -- sum((y-sy/n)^4)
  sxy: double, -- sum((x-sx/n)*(y-sy/n)) (sum of products) 
}

Example

Imagine you have a table that records the execution status of a certain test project, where ‘x’ represents time, and ‘y’ represents the corresponding test data. You want to understand the distribution patterns of these test data, and stats_agg is your powerful assistant for this task.

create table if not exists test_stats(x bigint, y bigint);
insert into test_stats(time, x, y) values
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 1, 4),
(5, 1, 5),
(6, 2, 1),
(7, 2, 2),
(8, 2, 3),
(9, 2, 4),
(10, 2, 5);
select stats_agg(y, x) from test_stats;

In this example, the stats_agg function is used to analyze the distribution of test data. It provides information such as the count (n), sum of x (sx), sum of x squares (sx2), sum of y (sy), and sum of y squares (sy2).

+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stats_agg(test_stats.y,test_stats.x)                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {n: 10, sx: 15.0, sx2: 2.5, sx3: -2.7755575615628914e-16, sx4: 0.6249999999999999, sy: 30.0, sy2: 20.0, sy3: -1.7763568394002505e-15, sy4: 68.0, sxy: 0.0} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+

gauge_agg

Usage

Gauge_agg is suitable for monitoring systems, IoT applications, and the financial sector. It provides snapshots of data within a time period.

Function

gauge_agg(time, value)

Input:

  • time: Timestamp
  • value: DOUBLE

Return:

{ 
  first: { ts: Timestamp, value: Double },
  second: { ts: Timestamp, value: Double },
  penultimate: { ts: Timestamp, value: Double },
  last: { ts: Timestamp, value: Double },
  num_elements: Bigint Unsingned
}

Example

The gauge_agg function is used to aggregate air quality data by month and provide snapshots of pressure data for each month.SELECT gauge_agg(time, pressure) FROM air GROUP BY date_trunc(‘month', time);

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gauge_agg(air.time,air.pressure)                                                                                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {first: {ts: 2023-03-01T00:00:00, val: 54.0}, second: {ts: 2023-03-01T00:00:00, val: 59.0}, penultimate: {ts: 2023-03-14T16:00:00, val: 55.0}, last: {ts: 2023-03-14T16:00:00, val: 80.0}, num_elements: 13122} |
| {first: {ts: 2023-01-14T16:00:00, val: 63.0}, second: {ts: 2023-01-14T16:00:00, val: 68.0}, penultimate: {ts: 2023-01-31T23:57:00, val: 77.0}, last: {ts: 2023-01-31T23:57:00, val: 54.0}, num_elements: 16640} |
| {first: {ts: 2023-02-01T00:00:00, val: 54.0}, second: {ts: 2023-02-01T00:00:00, val: 60.0}, penultimate: {ts: 2023-02-28T23:57:00, val: 74.0}, last: {ts: 2023-02-28T23:57:00, val: 59.0}, num_elements: 26880} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The data above represents the aggregation results of data for each month. “first” to “last” respectively represent the timestamps and pressure values of the first, second, second-to-last, and last data points within each month. “num_elements” represents the number of data points in that month.

compact_state_agg/state_agg

Usage

Used for tracking the duration of states, suitable for device health monitoring, event statistics, etc.

Function

compact_state_agg(time, state)
state_agg(time, state)

Input:

  • time: Timestamp
  • state: any date type

Return:

{ 
  state_duration: [
    { state: Any, interval: Duration },
    ...
  ],
  state_periods: [
    {
      state: Any,
      periods: [
        { start_time: Timestamp, end_time: Timestamp },
        ...
      ] 
    },
    ...
  ]
}

Example

Insert data containing timestamps and state information into a table named “states”. Then, call the compact_state_agg and state_agg functions using two different SQL queries.

CREATE TABLE IF NOT EXISTS states(state STRING);
INSERT INTO states VALUES
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');

compact_state_agg Example

SELECT compact_state_agg(time, state) FROM states;

The result is a JSON-formatted object that contains the total duration of different states (state_duration). In this example, it shows the total duration of each state but does not provide specific transition periods for each state.

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| compact_state_agg(states.time,states.state)                                                                                                                                                                                                                                                                                                                                          |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}, {state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}], state_periods: []} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

state_agg Example

SELECT state_agg(time, state) FROM states;

The result is also a JSON-formatted object that includes both the total duration of each state (state_duration) and the transition periods for each state (state_periods). In this example, the state_periods list shows the specific transition periods for each state, including start and end times.

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| state_agg(states.time,states.state)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}], state_periods: [{state: starting, periods: [{start_time: 2020-01-01T10:00:00, end_time: 2020-01-01T10:30:00}, {start_time: 2020-01-03T18:30:00, end_time: 2020-01-03T19:30:00}]}, {state: error, periods: [{start_time: 2020-01-03T16:00:00, end_time: 2020-01-03T18:30:00}]}, {state: running, periods: [{start_time: 2020-01-01T10:30:00, end_time: 2020-01-03T16:00:00}, {start_time: 2020-01-03T19:30:00, end_time: 2020-01-05T12:00:00}]}]} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Note

  • state_agg records each duration segment for each state, and it may use more memory when there are a large number of states or when the duration segments are widely distributed. Use it with caution.
  • compact_state_agg does not include the state_periods field in the returned data. It aggregates all duration segments for each state into total time, consuming relatively less memory. You can choose between these two functions based on your specific needs.

So, the difference between these two functions is that compact_state_agg focuses more on the overall duration, while state_agg provides more detailed information about state transition periods.

candlestick_agg

Usage

Applicable in the financial field to generate data for opening price, closing price, highest price, and lowest price.

Function

candlestick_agg(time, price, volume)

Input:

  • time: Timestamp
  • price: Double
  • volume: Double

Return:

{ 
  open: { ts: Timestamp, val: Double },
  close: { ts: Timestamp, val: Double },
  high: { ts: Timestamp, val: Double },
  low: { ts: Timestamp, val: Double },
  volume: { vol: Double, vwap: Double },
}

Example

Now we create a sample dataset.

CREATE TABLE IF NOT EXISTS tick(price bigint ,volume bigint);
INSERT tick(time, price, volume)
VALUES
    ('1999-12-31 00:00:00.000', 111, 444),
    ('1999-12-31 00:00:00.005', 222, 444),
    ('1999-12-31 00:00:00.010', 333, 222),
    ('1999-12-31 00:00:10.015', 444, 111),
    ('1999-12-31 00:00:10.020', 222, 555),
    ('1999-12-31 00:10:00.025', 333, 555),
    ('1999-12-31 00:10:00.030', 444, 333),
    ('1999-12-31 01:00:00.035', 555, 222);

Then, query statistical information in candlestick chart format for time series data, including open price, close price, and low price.

SELECT candlestick_agg(time, price, volume) FROM tick;

Result return:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| candlestick_agg(tick.time,tick.price,tick.volume)                                                                                                                                                                                   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {open: {ts: 1999-12-31T00:00:00, val: 111.0}, close: {ts: 1999-12-31T01:00:00.035, val: 555.0}, low: {ts: 1999-12-31T00:00:00, val: 111.0}, high: {ts: 1999-12-31T01:00:00.035, val: 555.0}, volume: {vol: 2886.0, vwap: 850149.0}} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

asap_smooth

Usage

ASAP_SMOOTH is a downsampling function suitable for visualization, data exploration, forecasting, and model training scenarios.

Function

asap_smooth(time, value, resolution order by time)

Input:

  • time: Timestamp 类型
  • value: Double 类型
  • resolution: Bigint 类型

Return:

{ 
  time: [ Timestamp, ... ],
  value: [ Double, ... ],
  resolution: Int Unsigned,
}

Example

Use the asap_smooth function to aggregate air quality data by month and apply ASAP (As Soon As Possible) smoothing on pressure data to obtain smoothed time series data.

SELECT asap_smooth(time, pressure, 10) FROM air GROUP BY date_trunc('month', time);

Result return:

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| asap_smooth(air.time,air.pressure,Int64(10))                                                                                                                                                                                                                                                                                                                                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10}        |
| {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10}                           |
| {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Note

  • This function is sensitive to the order of time and should be used with “ORDER BY time” when used.
  • The “time” and “value” fields in the result are arrays.
  • This function processes all data at once, so try to avoid reading large amounts of data to prevent excessive memory consumption.