esper.codehaus.org and espertech.comDocumentation
This chapter outlines the views that are built into Esper. All views can be arbitrarily combined as many of the examples below show.
The section on Chapter 3, Processing Model provides additional information on the relationship of views, filtering and aggregation. Please also see Section 5.4.3, “Specifying Views” for the use of views in the from
clause with streams, patterns and named windows.
Esper organizes built-in views in namespaces and names. Views that provide sliding or tumbling data windows are in the win
namespace. Other most commonly used views are in the std
namespace. The ext
namespace are views that order events. The stat
namespace is used for views that derive statistical data.
Esper distinguishes between data window views and derived-value views. Data windows, or data window views, are views that retain incoming events until an expiry policy indicates to release events. Derived-value views derive a new value from event streams and post the result as events of a new type.
Two or more data window views can be combined. This allows a sets of events retained by one data window to be placed into a union or an intersection with the set of events retained by one or more other data windows. Please see Section 5.4.4, “Multiple Data Window Views” for more detail.
The keep-all data window counts as a data window but has no expiry policy: it retains all events received. The grouped-window declaration allocates a new data window per grouping criteria and thereby counts as a data window, but cannot appear alone.
The next table summarizes data window views:
Table 12.1. Built-in Data Window Views
View | Syntax | Description |
---|---|---|
Length Window | win:length(size) | Sliding length window extending the specified number of elements into the past. |
Length Batch Window | win:length_batch(size) | Tumbling window that batches events and releases them when a given minimum number of events has been collected. |
Time Window | win:time(time period) | Sliding time window extending the specified time interval into the past. |
Externally-timed Window | win:ext_timed(timestamp expression, time period) | Sliding time window, based on the millisecond time value supplied by an expression. |
Time Batch Window | win:time_batch(time period[,optional reference point] [, flow control]) | Tumbling window that batches events and releases them every specified time interval, with flow control options. |
Externally-timed Batch Window | win:ext_timed_batch(timestamp expression, time period[,optional reference point]) | Tumbling window that batches events and releases them every specified time interval based on the millisecond value supplied by an expression. |
Time-Length Combination Batch Window | win:time_length_batch(time period, size [, flow control]) | Tumbling multi-policy time and length batch window with flow control options. |
Time-Accumulating Window | win:time_accum(time period) | Sliding time window accumulates events until no more events arrive within a given time interval. |
Keep-All Window | win:keepall() | The keep-all data window view simply retains all events. |
Sorted Window | ext:sort(size, sort criteria) | Sorts by values returned by sort criteria expressions and keeps only the top events up to the given size. |
Ranked Window | ext:rank(unique criteria(s), size, sort criteria(s)) | Retains only the most recent among events having the same value for the criteria expression(s) sorted by sort criteria expressions and keeps only the top events up to the given size. |
Time-Order Window | ext:time_order(timestamp expression, time period) | Orders events that arrive out-of-order, using an expression providing timestamps to be ordered. |
Unique Window | std:unique(unique criteria(s)) | Retains only the most recent among events having the same value for the criteria expression(s). Acts as a length window of size 1 for each distinct expression value. |
Grouped Data Window | std:groupwin(grouping criteria(s)) | Groups events into sub-views by the value of the specified expression(s), generally used to provide a separate data window per group. |
Last Event Window | std:lastevent() | Retains the last event, acts as a length window of size 1. |
First Event Window | std:firstevent() | Retains the very first arriving event, disregarding all subsequent events. |
First Unique Window | std:firstunique(unique criteria(s)) | Retains only the very first among events having the same value for the criteria expression(s), disregarding all subsequent events for same value(s). |
First Length Window | win:firstlength(size) | Retains the first size events, disregarding all subsequent events. |
First Time Window | win:firsttime(time period) | Retains the events arriving until the time interval has passed, disregarding all subsequent events. |
Expiry Expression Window | win:expr(expiry expression) | Expire events based on the result of an expiry expression passed as a parameter. |
Expiry Expression Batch Window | win:expr_batch(expiry expression) | Tumbling window that batches events and releases them based on the result of an expiry expression passed as a parameter. |
The table below summarizes views that derive information from received events and present the derived information as an insert and remove stream of events that are typed specifically to carry the result of the computations:
Table 12.2. Built-in Derived-Value Views
View | Syntax | Description |
---|---|---|
Size | std:size([expression, ...]) | Derives a count of the number of events in a data window, or in an insert stream if used without a data window, and optionally provides additional event properties as listed in parameters. |
Univariate statistics | stat:uni(value expression [,expression, ...]) | Calculates univariate statistics on the values returned by the expression. |
Regression | stat:linest(value expression, value expression [,expression, ...]) | Calculates regression on the values returned by two expressions. |
Correlation | stat:correl(value expression, value expression [,expression, ...]) | Calculates the correlation value on the values returned by two expressions. |
Weighted average | stat:weighted_avg(value expression, value expression [,expression, ...]) | Calculates weighted average given a weight expression and an expression to compute the average for. |
The syntax for view specifications starts with the namespace name and the name and is followed by optional view parameter expressions in parenthesis:
namespace:name(view_parameters)
This example specifies a time window of 5 seconds:
select * from StockTickEvent.win:time(5 sec)
All expressions are allowed as parameters to views, including expressions that contain variables or substitution parameters for prepared statements. Subqueries, the special prior
and prev
functions and aggregations (with the exception of the expression window and expression batch window) are not allowed as view parameters.
For example, assuming a variable by name VAR_WINDOW_SIZE
is defined:
select * from StockTickEvent.win:time(VAR_WINDOW_SIZE)
Expression parameters for views are evaluated at the time the view is first created with the exception of the expression window (win:expr
) and expression batch window (win:expr_batch
).
Also consider multiple data windows in intersection or union (keywords retain-intersection
and retain-union
). Consider writing a custom plug-in view if your application requires behavior that is not yet provided by any of the built-in views.
If a view takes no parameters, use empty parenthesis ()
.
All the views explained below are data window views, as are std:unique
, std:firstunique
, std:lastevent
and std:firstevent
.
This view is a moving (sliding) length window extending the specified number of elements into the past. The view takes a single expression as a parameter providing a numeric size value that defines the window size:
win:length(size_expression)
The below example sums the price for the last 5 stock ticks for symbol GE.
select sum(price) from StockTickEvent(symbol='GE').win:length(5)
The next example keeps a length window of 10 events of stock trade events, with a separate window for each symbol. The sum of price is calculated only for the last 10 events for each symbol and aggregates per symbol:
select sum(price) from StockTickEvent.std:groupwin(symbol).win:length(10) group by symbol
A length window of 1 is equivalent to the last event window std:lastevent
. The std:lastevent
data window is the preferred notation:
select * from StockTickEvent.std:lastevent() // Prefer this // ... equivalent to ... select * from StockTickEvent.win:length(1)
This window view buffers events (tumbling window) and releases them when a given minimum number of events has been collected. Provide an expression defining the number of events to batch as a parameter:
win:length_batch(size_expression)
The next statement buffers events until a minimum of 10 events have collected. Listeners to updates posted by this view receive updated information only when 10 or more events have collected.
select * from StockTickEvent.win:length_batch(10)
This view is a moving (sliding) time window extending the specified time interval into the past based on the system time. Provide a time period (see Section 5.2.1, “Specifying Time Periods”) or an expression defining the number of seconds as a parameter:
win:time(time period)
win:time(seconds_interval_expression)
For the GE stock tick events in the last 1 second, calculate a sum of price.
select sum(price) from StockTickEvent(symbol='GE').win:time(1 sec)
The following time windows are equivalent specifications:
win:time(2 minutes 5 seconds) win:time(125 sec) win:time(125) win:time(MYINTERVAL) // MYINTERVAL defined as a variable
Similar to the time window, this view is a moving (sliding) time window extending the specified time interval into the past, but based on the millisecond time value supplied by a timestamp expression. The view takes two parameters: the expression to return long-typed timestamp values, and a time period or expression that provides a number of seconds:
win:ext_timed(timestamp_expression, time_period)
win:ext_timed(timestamp_expression, seconds_interval_expression)
The key difference comparing the externally-timed window to the regular time window is that the window slides not based on the engine time, but strictly based on the result of the timestamp expression when evaluated against the events entering the window.
The algorithm underlying the view compares the timestamp value returned by the expression when the oldest event arrived with the timestamp value returned by the expression for the newest arriving event on event arrival. If the time interval between the timestamp values is larger then the timer period parameter, then the algorithm removes all oldest events tail-first until the difference between the oldest and newest event is within the time interval. The window therefore slides only when events arrive and only considers each event's timestamp property (or other expression value returned) and not engine time.
This view holds stock tick events of the last 10 seconds based on the timestamp property in StockTickEvent
.
select * from StockTickEvent.win:ext_timed(timestamp, 10 seconds)
The externally-timed data window expects strict ordering of the timestamp values returned by the timestamp expression. The view is not useful for ordering events in time order, please use the time-order view instead.
On a related subject, engine time itself can be entirely under control of the application as described in Section 14.8, “Controlling Time-Keeping”, allowing control over all time-based aspects of processing in one place.
This window view buffers events (tumbling window) and releases them every specified time interval in one update. The view takes a time period or an expression providing a number of seconds as a parameter, plus optional parameters described next.
win:time_batch(time_period [,optional_reference_point] [,flow_control])
win:time_batch(seconds_interval_expression [,optional_reference_point] [,flow_control])
The time batch window takes a second, optional parameter that serves as a reference point to batch flush times. If not specified, the arrival of the first event into the batch window sets the reference point. Therefore if the reference point is not specified and the first event arrives at time t1, then the batch flushes at time t1 plus time_period and every time_period thereafter.
Note that using this view means that the engine keeps events in memory until the time is up: Consider your event arrival rate and determine if this is the behavior you want. Use context declaration or output rate limiting such as output snapshot
as an alternative.
The below example batches events into a 5 second window releasing new batches every 5 seconds. Listeners to updates posted by this view receive updated information only every 5 seconds.
select * from StockTickEvent.win:time_batch(5 sec)
By default, if there are no events arriving in the current interval (insert stream), and no events remain from the prior batch (remove stream), then the view does not post results to listeners. The view allows overriding this default behavior via flow control keywords.
The synopsis with flow control parameters is:
win:time_batch(time_period or seconds_interval_expr [,optional_reference_point] [, "flow-control-keyword [, keyword...]"] )
The FORCE_UPDATE flow control keyword instructs the view to post an empty result set to listeners if there is no
data to post for an interval. When using this keyword the irstream
keyword should be used in the select
clause to ensure the remove stream is also output.
Note that FORCE_UPDATE is for use with listeners to the same statement and not for use with named windows. Consider output rate limiting instead.
The START_EAGER flow control keyword instructs the view to post empty result sets even before the first event arrives, starting a time interval at statement creation time. As when using FORCE_UPDATE, the view also posts an empty result set to listeners if there is no data to post for an interval, however it starts doing so at time of statement creation rather then at the time of arrival of the first event.
Taking the two flow control keywords in one sample statement, this example presents a view that waits for 10 seconds. It posts empty result sets after one interval after the statement is created, and keeps posting an empty result set as no events arrive during intervals:
select * from MyEvent.win:time_batch(10 sec, "FORCE_UPDATE, START_EAGER")
The optional reference point is provided as a long-value of milliseconds relative to January 1, 1970 and time 00:00:00.
The following example statement sets the reference point to 5 seconds and the batch size to 1 hour, so that each batch output is 5 seconds after each hour:
select * from OrderSummaryEvent.win:time_batch(1 hour, 5000L)
Similar to the time batch window, this view buffers events (tumbling) and releases them every specified time interval in one update, but based on the millisecond time value supplied by a timestamp expression. The view has two required parameters taking an expression that returns long-typed timestamp values and a time period or constant-value expression that provides a number of seconds:
win:ext_timed_batch(timestamp_expression, time_period [,optional_reference_point])
win:ext_timed_batch(timestamp_expression, seconds_interval_expression [,optional_reference_point])
The externally-timed batch window takes a third, optional parameter that serves as a reference point to batch flush times. If not specified, the arrival of the first event into the batch window sets the reference point. Therefore if the reference point is not specified and the first event arrives at time t1, then the batch flushes at time t1 plus time_period and every time_period thereafter.
The key difference comparing the externally-timed batch window to the regular time batch window is that the window tumbles not based on the engine time, but strictly based on the result of the timestamp expression when evaluated against the events entering the window.
The algorithm underlying the view compares the timestamp value returned by the expression when the oldest event arrived with the timestamp value returned by the expression for the newest arriving event on event arrival. If the time interval between the timestamp values is larger then the timer period parameter, then the algorithm posts the current batch of events. The window therefore posts batches only when events arrive and only considers each event's timestamp property (or other expression value returned) and not engine time.
Note that using this view means that the engine keeps events in memory until the time is up: Consider your event arrival rate and determine if this is the behavior you want. Use context declaration or output rate limiting such as output snapshot
as an alternative.
The below example batches events into a 5 second window releasing new batches every 5 seconds. Listeners to updates posted by this view receive updated information only when event arrive with timestamps that indicate the start of a new batch:
select * from StockTickEvent.win:ext_timed_batch(timestamp, 5 sec)
The optional reference point is provided as a long-value of milliseconds relative to January 1, 1970 and time 00:00:00.
The following example statement sets the reference point to 5 seconds and the batch size to 1 hour, so that each batch output is 5 seconds after each hour:
select * from OrderSummaryEvent.win:ext_timed_batch(timestamp, 1 hour, 5000L)
The externally-timed data window expects strict ordering of the timestamp values returned by the timestamp expression. The view is not useful for ordering events in time order, please use the timeorder view instead.
On a related subject, engine time itself can be entirely under control of the application as described in Section 14.8, “Controlling Time-Keeping”, allowing control over all time-based aspects of processing in one place.
This data window view is a combination of time and length batch (tumbling) windows. Similar to the time and length batch windows, this view batches events and releases the batched events when either one of the following conditions occurs, whichever occurs first: the data window has collected a given number of events, or a given time interval has passed.
The view parameters take 2 forms. The first form accepts a time period or an expression providing a number of seconds, and an expression for the number of events:
win:time_length_batch(time_period, number_of_events_expression)
win:time_length_batch(seconds_interval_expression, number_of_events_expression)
The next example shows a time-length combination batch window that batches up to 100 events or all events arriving within a 1-second time interval, whichever condition occurs first:
select * from MyEvent.win:time_length_batch(1 sec, 100)
In this example, if 100 events arrive into the window before a 1-second time interval passes, the view posts the batch of 100 events. If less then 100 events arrive within a 1-second interval, the view posts all events that arrived within the 1-second interval at the end of the interval.
By default, if there are no events arriving in the current interval (insert stream), and no events remain from the prior batch (remove stream), then the view does not post results to listeners. This view allows overriding this default behavior via flow control keywords.
The synopsis of the view with flow control parameters is:
win:time_length_batch(time_period or seconds_interval_expression, number_of_events_expression, "flow control keyword [, keyword...]")
The FORCE_UPDATE
flow control keyword instructs the view to post an empty result set to listeners if there is no data to post for an interval. The view begins posting no later then after one time interval passed after the first event arrives. When using this keyword the irstream
keyword should be used in the select
clause to ensure the remove stream is also output.
The START_EAGER
flow control keyword instructs the view to post empty result sets even before the first event arrives, starting a time interval at statement creation time. As when using FORCE_UPDATE
, the view also posts an empty result set to listeners if there is no data to post for an interval, however it starts doing so at time of statement creation rather then at the time of arrival of the first event.
Taking the two flow control keywords in one sample statement, this example presents a view that waits for 10 seconds or reacts when the 5th event arrives, whichever comes first. It posts empty result sets after one interval after the statement is created, and keeps posting an empty result set as no events arrive during intervals:
select * from MyEvent.win:time_length_batch(10 sec, 5, "FORCE_UPDATE, START_EAGER")
This data window view is a specialized moving (sliding) time window that differs from the regular time window in that it accumulates events until no more events arrive within a given time interval, and only then releases the accumulated events as a remove stream.
The view accepts a single parameter: the time period or seconds-expression specifying the length of the time interval during which no events must arrive until the view releases accumulated events. The synopsis is as follows:
win:time_accum(time_period)
win:time_accum(seconds_interval_expression)
The next example shows a time-accumulating window that accumulates events, and then releases events if within the time interval no more events arrive:
select * from MyEvent.win:time_accum(10 sec)
This example accumulates events, until when for a period of 10 seconds no more MyEvent events arrive, at which time it posts all accumulated MyEvent events.
Your application may only be interested in the batches of events as events leave the data window. This can be done simply by selecting the remove stream of this data window, populated by the engine as accumulated events leave the data window all-at-once when no events arrive during the time interval following the time the last event arrived:
select rstream * from MyEvent.win:time_accum(10 sec)
If there are no events arriving, then the view does not post results to listeners.
This keep-all data window view simply retains all events. The view does not remove events from the data window, unless used with a named window and the on delete
clause.
The view accepts no parameters. The synopsis is as follows:
win:keepall()
The next example shows a keep-all window that accumulates all events received into the window:
select * from MyEvent.win:keepall()
Note that since the view does not release events, care must be taken to prevent retained events from using all available memory.
The firstlength
view retains the very first size_expression events.
The synopsis is:
win:firstlength(size_expression)
If used within a named window and an on-delete
clause deletes events, the view accepts further arriving events until the number of retained events reaches the size of size_expression.
The below example creates a view that retains only the first 10 events:
select * from MyEvent.win:firstlength(10)
The firsttime
view retains all events arriving within a given time interval after statement start.
The synopsis is:
win:firsttime(time_period)
win:firsttime(seconds_interval_expression)
The below example creates a view that retains only those events arriving within 1 minute and 10 seconds of statement start:
select * from MyEvent.win:firsttime(1 minute 10 seconds)
The expr
view applies an expiry expression and removes events from the data window when the expression returns false.
Use this view to implement rolling and dynamically shrinking or expanding time, length or other windows. Rolling can, for example, be controlled based on event properties of arriving events, based on aggregation values or based on the return result of user-defined functions. Use this view to accumulate events until a value changes or other condition occurs based on arriving events or change of a variable value.
The synopsis is:
win:expr(expiry_expression)
The expiry expression can be any expression including expressions on event properties, variables, aggregation functions or user-defined functions. The view applies this expression to the oldest event(s) currently in the view, as described next.
When a new event arrives or when a variable value referenced by the expiry expression changes then the view applies the expiry expression starting from the oldest event in the data window. If the expiry expression returns false for the oldest event, the view removes the event from the data window. The view then applies the expression to the next oldest event. If the expiry expression returns true for the oldest event, no further evaluation takes place and the view indicates any new and expired events through insert and remove stream.
By using variables in the expiry expression it is possible to change the behavior of the view dynamically at runtime. When one or more variables used in the expression are updated the view evaluates the expiry expression starting from the oldest event.
Aggregation functions, if present in the expiry expression, are continuously updated as events enter and leave the data window. Use the grouped data window with this window to compute aggregations per group.
The engine makes the following built-in properties available to the expiry expression:
Table 12.3. Built-in Properties of the Expiry Expression Data Window View
Name | Type | Description |
---|---|---|
current_count | int | The number of events in the data window including the currently-arriving event. |
expired_count | int | The number of events expired during this evaluation. |
newest_event | (same event type as arriving events) | The last-arriving event itself. |
newest_timestamp | long | The engine timestamp associated with the last-arriving event. |
oldest_event | (same event type as arriving events) | The currently-evaluated event itself. |
oldest_timestamp | long | The engine timestamp associated with the currently-evaluated event. |
view_reference | Object | The object handle to this view. |
This EPL declares an expiry expression that retains the last 2 events:
select * from MyEvent.win:expr(current_count <= 2)
The following example implements a dynamically-sized length window by means of a SIZE variable. As the SIZE variable value changes the view retains the number of events according to the current value of SIZE:
create variable int SIZE = 1000
select * from MyEvent.win:expr(current_count <= SIZE)
The next EPL retains the last 2 seconds of events:
select * from MyEvent.win:expr(oldest_timestamp > newest_timestamp - 2000)
The following example implements a dynamically-sized time window. As the SIZE millisecond variable value changes the view retains a time interval accordingly:
create variable long SIZE = 1000
select * from MyEvent.win:expr(newest_timestamp - oldest_timestamp < SIZE)
The following example declares a KEEP variable and flushes all events from the data window when the variable turns false:
create variable boolean KEEP = true
select * from MyEvent.win:expr(KEEP)
The next example specifies a rolling window that removes the oldest events from the window until the total price of all events in the window is less then 1000:
select * from MyEvent.win:expr(sum(price) < 1000)
This example retains all events that have the same value of the flag
event property. When the flag
value changes,
the data window expires all events with the old flag
value and retains only the most recent event of the new flag
value:
select * from MyEvent.win:expr(newest_event.flag = oldest_event.flag)
You may not use subqueries or the prev
and prior
functions as part of the expiry expression. Consider using a named window and on-delete
or on-merge
instead.
When using variables in the expiry expression, the thread that updates the variable does not evaluate the view. The thread that updates the variable instead schedules a reevaluation and view evaluates by timer execution.
The expr_batch
view buffers events (tumbling window) and releases them when a given expiry expression returns true.
Use this view to implement dynamic or custom batching behavior, such as for dynamically shrinking or growing time, length or other batches, for batching based on event properties of arriving events, aggregation values or for batching based on a user-defined function.
The synopsis is:
win:expr_batch(expiry_expression, [include_triggering_event])
The expiry expression can be any expression including expressions on event properties, variables, aggregation functions or user-defined functions. The view applies this expression to arriving event(s), as described next.
The optional second parameter include_triggering_event defines whether to include the event that triggers the batch in the current batch (true
, the default) or in the next batch (false
).
When a new event arrives or when a variable value referenced by the expiry expression changes or when events get removed from the data window then the view applies the expiry expression. If the expiry expression returns true the data window posts the collected events as the insert stream and the last batch of events as remove stream.
By using variables in the expiry expression it is possible to change the behavior of the view dynamically at runtime. When one or more variables used in the expression are updated the view evaluates the expiry expression as well.
Aggregation functions, if present in the expiry expression, are continuously updated as events enter the data window and reset when the engine posts a batch of events. Use the grouped data window with this window to compute aggregations per group.
The engine makes the following built-in properties available to the expiry expression:
Table 12.4. Built-in Properties of the Expiry Expression Data Window View
Name | Type | Description |
---|---|---|
current_count | int | The number of events in the data window including the currently-arriving event. |
newest_event | (same event type as arriving events) | The last-arriving event itself. |
newest_timestamp | long | The engine timestamp associated with the last-arriving event. |
oldest_event | (same event type as arriving events) | The currently-evaluated event itself. |
oldest_timestamp | long | The engine timestamp associated with the currently-evaluated event. |
view_reference | Object | The object handle to this view. |
This EPL declares an expiry expression that posts event batches consisting of 2 events:
select * from MyEvent.win:expr_batch(current_count >= 2)
The following example implements a dynamically-sized length batch window by means of a SIZE variable. As the SIZE variable value changes the view accomulates and posts the number of events according to the current value of SIZE:
create variable int SIZE = 1000
select * from MyEvent.win:expr_batch(current_count >= SIZE)
The following example accumulates events until an event arrives that has a value of postme
for property myvalue
:
select * from MyEvent.win:expr_batch(myvalue = 'postme')
The following example declares a POST variable and posts a batch of events when the variable turns true:
create variable boolean POST = false
select * from MyEvent.win:expr_batch(POST)
The next example specifies a tumbling window that posts a batch of events when the total price of all events in the window is greater then 1000:
select * from MyEvent.win:expr_batch(sum(price) > 1000)
Specify the second parameter as false
when you want the triggering event not included in the current batch.
This example batches all events that have the same value of the flag
event property. When the flag
value changes,
the data window releases the batch of events collected for the old flag
value. The data window collects the most recent event and the future arriving events of the same new flag
value:
select * from MyEvent.win:expr_batch(newest_event.flag != oldest_event.flag, false)
You may not use subqueries or the prev
and prior
functions as part of the expiry expression. Consider using a named window and on-delete
or on-merge
instead.
When using variables in the expiry expression, the thread that updates the variable does not evaluate the view. The thread that updates the variable instead schedules a reevaluation and view evaluates by timer execution.
The unique
view is a view that includes only the most recent among events having the same value(s) for the result of the specified expression or list of expressions.
The synopsis is:
std:unique(unique_expression [, unique_expression ...])
The view acts as a length window of size 1 for each distinct value returned by an expression, or combination of values returned by multiple expressions. It thus posts as old events the prior event of the same value(s), if any.
An expression may return a null
value. The engine treats a null
value as any other value. An expression can also return a custom application object, whereby the application class should implement the hashCode
and equals
methods.
The below example creates a view that retains only the last event per symbol.
select * from StockTickEvent.std:unique(symbol)
The next example creates a view that retains the last event per symbol and feed.
select * from StockTickEvent.std:unique(symbol, feed)
When using unique
the engine plans queries applying an implicit unique index, where applicable.
Specify @Hint('DISABLE_UNIQUE_IMPLICIT_IDX')
to force the engine to plan queries using a non-unique index.
This view groups events into sub-views by the value returned by the specified expression or the combination of values returned by a list of expressions. The view takes a single expression to supply the group criteria values, or a list of expressions as parameters, as the synopsis shows:
std:groupwin(grouping_expression [, grouping_expression ...])
The grouping_expression expression(s) return one or more group keys, by which the view creates sub-views for each distinct group key. Note that the expression should not return an unlimited number of values: the grouping expression should not return a time value or otherwise unlimited key.
An expression may return a null
value. The engine treats a null
value as any other value. An expression can also return a custom application object, whereby the application class should implement the hashCode
and equals
methods.
Use group by
instead of the grouped data window to control how aggregations are grouped.
A grouped data window with a length window of 1 is equivalent to the unique data window std:unique
. The std:unique
data window is the preferred notation:
select * from StockTickEvent.std:unique(symbol) // Prefer this // ... equivalent to ... select * from StockTickEvent.std:groupwin(symbol).win:length(1)
This example computes the total price for the last 5 events considering the last 5 events per each symbol, aggregating the price across all symbols (since no group by
clause is specified the aggregation is across all symbols):
select symbol, sum(price) from StockTickEvent.std:groupwin(symbol).win:length(5)
The @Hint("reclaim_group_aged=
age_in_seconds") hint instructs the engine to discard grouped data window state that has not been updated for age_in_seconds seconds.
The optional @Hint("reclaim_group_freq=
sweep_frequency_in_seconds") can be specified in addition to control the frequency at which the engine sweeps data window state.
If the hint is not specified, the frequency defaults to the same value as age_in_seconds. Use the hints when your group criteria returns a changing or unlimited number of values. By default and without hints the view does not reclaim or remove data windows for group criteria values.
The updated sample statement with both hints:
// Remove data window views for symbols not updated for 10 seconds or more and sweep every 30 seconds @Hint('reclaim_group_aged=10,reclaim_group_freq=30') select symbol, sum(price) from StockTickEvent.std:groupwin(symbol).win:length(5)
Reclaim executes when an event arrives and not in the timer thread. In the example above reclaim can occur up to 40 seconds of engine time after the newest event arrives. Reclaim may affect iteration order for the statement and iteration order becomes indeterministic with reclaim.
To compute the total price for the last 5 events considering the last 5 events per each symbol and outputting a price per symbol, add the group by
clause:
select symbol, sum(price) from StockTickEvent.std:groupwin(symbol).win:length(5) group by symbol
The std:groupwin
grouped-window view can also take multiple expressions that provide values to group by. This example computes the total price for each symbol and feed for the last 10 events per symbol and feed combination:
select sum(price) from StockTickEvent.std:groupwin(symbol, feed).win:length(10)
The order in which the std:groupwin
grouped-window view appears within sub-views of a stream controls the data the engine derives from events for each group. The next 2 statements demonstrate this using a length window.
Without the std:groupwin
declaration query the same query returns the total price per symbol for only the last 10 events across all symbols. Here the engine allocates only one length window for all events:
select sum(price) from StockTickEvent.win:length(10)
We have learned that by placing the std:groupwin
grouped-window view before other views, these other views become part of the grouped set of views. The engine dynamically allocates a new view instance for each subview, every time it encounters a new group key such as a new value for symbol. Therefore, in std:groupwin(symbol).win:length(10)
the engine allocates a new length window for each distinct symbol. However in win:length(10)
alone the engine maintains a single length window.
The std:groupwin
can be used with multiple data window views to achieve a grouped intersection or union policy.
The next query retains the last 4 events per symbol and only those events that are also not older then 10 seconds:
select * from StockTickEvent.std:groupwin(symbol).win:length(4).win:time(10)
Last, we consider a grouped data window for two group criteria. Here, the query results are total price per symbol and feed for the last 100 events per symbol and feed.
select sum(price) from StockTickEvent.std:groupwin(symbol, feed).win:length(100)
A note on grouped time windows: When using grouped-window with time windows, note that whether the engine retains 5 minutes of events or retains 5 minutes of events per group, the result is the same from the perspective of retaining events as both policies retain, considering all groups, the same set of events. Therefore please specify the time window alone (ungrouped).
For example:
// Use this: select sum(price) from StockTickEvent.win:time(1 minute) // is equivalent to (don't use this): // select sum(price) from StockTickEvent.std:groupwin(symbol).win:time(1 minute) // Use the group-by clause for grouping aggregation by symbol.
For advanced users: There is an optional view that can control how the std:groupwin
grouped-window view gets evaluated and that view is the std:merge
view.
The merge view can only occur after a std:groupwin
grouped-window view in a view chain and controls
at what point in the view chain the merge of the data stream occurs from view-instance-per-criteria to single view.
Compare the following statements:
select * from Market.std:groupwin(ticker).win:length(1000000) .stat:weighted_avg(price, volume).std:merge(ticker) // ... and ... select * from Market.std:groupwin(ticker).win:length(1000000).std:merge(ticker) .stat:weighted_avg(price, volume)
If your statement does not specify the optional std:merge
view, the semantics are the same as the first statement.
The first statement, in which the merge-view is added to the end (same as no merge view), computes weighted average per ticker, considering, per-ticker, the last 1M Market events for each ticker. The second statement, in which the merge view is added to the middle, computes weighted average considering, per-ticker, the last 1M Market events, computing the weighted average for all such events using a single view rather then multiple view instances with one view per ticker.
This view posts the number of events received from a stream or view plus any additional event properties or expression values listed as parameters. The synopsis is:
std:size([expression, ...] [ * ])
The view posts a single long-typed property named size
. The view posts the prior size as old data, and the current size as new data to update listeners of the view. Via the iterator
method of the statement the size value can also be polled (read). The view only posts output events when the size
count changes and does not stay the same.
As optional parameters the view takes a list of expressions that the view evaluates against the last arriving event and provides along the size
field. You may also provide the *
wildcard selector to have the view output all event properties.
An alternative to receiving a data window event count is the prevcount
function. Compared to the std:size
view the prevcount
function requires a data window while the std:size
view does not. The related count(...)
aggregation function provides a count per group when used with group by
.
When combined with a data window view, the size view reports the current number of events in the data window in the insert stream and the prior number of events in the data window as the remove stream. This example reports the number of tick events within the last 1 minute:
select size from StockTickEvent.win:time(1 min).std:size()
To select additional event properties you may add each event property to output as a parameter to the view.
The next example selects the symbol and feed event properties in addition to the size
property:
select size, symbol, feed from StockTickEvent.win:time(1 min).std:size(symbol, feed)
This example selects all event properties in addition to the size
property:
select * from StockTickEvent.win:time(1 min).std:size(*)
The size view is also useful in conjunction with a std:groupwin
grouped-window view to count the number of events per group. The EPL below returns the number of events per symbol.
select size from StockTickEvent.std:groupwin(symbol).std:size()
When used without a data window, the view simply counts the number of events:
select size from StockTickEvent.std:size()
All views can be used with pattern statements as well. The next EPL snippet shows a pattern where we look for tick events followed by trade events for the same symbol. The size view counts the number of occurrences of the pattern.
select size from pattern[every s=StockTickEvent -> TradeEvent(symbol=s.symbol)].std:size()
This view exposes the last element of its parent view:
std:lastevent()
The view acts as a length window of size 1. It thus posts as old events the prior event in the stream, if any.
This example statement retains the last stock tick event for the symbol GE.
select * from StockTickEvent(symbol='GE').std:lastevent()
If you want to output the last event within a sliding window, please see Section 9.1.9, “The Previous Function”. That function accepts a relative (count) or absolute index and returns event properties or an event in the context of the specified data window.
This view retains only the first arriving event:
std:firstevent()
All events arriving after the first event are discarded.
If used within a named window and an on-delete
clause deletes the first event, the view resets and will retain the next arriving event.
An example of a statement that retains the first ReferenceData
event arriving is:
select * from ReferenceData.std:firstevent()
If you want to output the first event within a sliding window, please see Section 9.1.9, “The Previous Function”. That function accepts a relative (count) or absolute index and returns event properties or an event in the context of the specified data window.
The firstunique
view retains only the very first among events having the same value for the specified expression or list of expressions.
The synopsis is:
std:firstunique(unique_expression [, unique_expression ...])
If used within a named window and an on-delete
clause deletes events, the view resets and will retain the next arriving event for the expression result value(s) of the deleted events.
The below example creates a view that retains only the first event per category:
select * from ReferenceData.std:firstunique(category)
When using firstunique
the engine plans queries applying an implicit unique index, where applicable.
Specify @Hint('DISABLE_UNIQUE_IMPLICIT_IDX')
to force the engine to plan queries using a non-unique index.
The statistics views can be used combined with data window views or alone. Very similar to aggregation functions, these views aggregate or derive information from an event stream. As compared to aggregation functions,
statistics views can post multiple derived fields including properties from the last event that was received. The derived fields and event properties are available for querying in the where
-clause and are often compared to prior values using the prior
function.
Statistics views accept one or more primary value expressions and any number of optional additional expressions that return values based on the last event received.
This view calculates univariate statistics on a numeric expression. The view takes a single value expression as a parameter plus any number of optional additional expressions to return properties of the last event. The value expression must return a numeric value:
stat:uni(value_expression [,expression, ...] [ * ])
After the value expression you may optionally list additional expressions or event properties to evaluate for the stream and return their value based on the last arriving event. You may also provide the *
wildcard selector to have the view output all event properties.
Table 12.5. Univariate statistics derived properties
Property Name | Description |
---|---|
datapoints | Number of values, equivalent to count(*) for the stream |
total | Sum of values |
average | Average of values |
variance | Variance |
stddev | Sample standard deviation (square root of variance) |
stddevpa | Population standard deviation |
The below example selects the standard deviation on price for stock tick events for the last 10 events.
select stddev from StockTickEvent.win:length(10).stat:uni(price)
To add properties from the event stream you may simply add all additional properties as parameters to the view.
This example selects all of the derived values, based on the price property, plus the values of the symbol and feed event properties:
select * from StockTickEvent.win:length(10).stat:uni(price, symbol, feed)
The following example selects all of the derived values plus all event properties:
select * from StockTickEvent.win:length(10).stat:uni(price, symbol, *)
This view calculates regression and related intermediate results on the values returned by two expressions. The view takes two value expressions as parameters plus any number of optional additional expressions to return properties of the last event. The value expressions must return a numeric value:
stat:linest(value_expression, value_expression [,expression, ...] [ * ])
After the two value expressions you may optionally list additional expressions or event properties to evaluate for the stream and return their value based on the last arriving event. You may also provide the *
wildcard selector to have the view output all event properties.
Table 12.6. Regression derived properties
Property Name | Description |
---|---|
slope | Slope. |
YIntercept | Y intercept. |
XAverage | X average. |
XStandardDeviationPop | X standard deviation population. |
XStandardDeviationSample | X standard deviation sample. |
XSum | X sum. |
XVariance | X variance. |
YAverage | X average. |
YStandardDeviationPop | Y standard deviation population. |
YStandardDeviationSample | Y standard deviation sample. |
YSum | Y sum. |
YVariance | Y variance. |
dataPoints | Number of data points. |
n | Number of data points. |
sumX | Sum of X (same as X Sum). |
sumXSq | Sum of X squared. |
sumXY | Sum of X times Y. |
sumY | Sum of Y (same as Y Sum). |
sumYSq | Sum of Y squared. |
The next example calculates regression and returns the slope and y-intercept on price and offer for all events in the last 10 seconds.
select slope, YIntercept from StockTickEvent.win:time(10 seconds).stat:linest(price, offer)
To add properties from the event stream you may simply add all additional properties as parameters to the view.
This example selects all of the derived values, based on the price and offer properties, plus the values of the symbol and feed event properties:
select * from StockTickEvent.win:time(10 seconds).stat:linest(price, offer, symbol, feed)
The following example selects all of the derived values plus all event properties:
select * from StockTickEvent.win:time(10 seconds).stat:linest(price, offer, *)
This view calculates the correlation value on the value returned by two expressions. The view takes two value expressions as parameters plus any number of optional additional expressions to return properties of the last event. The value expressions must be return a numeric value:
stat:correl(value_expression, value_expression [,expression, ...] [ * ])
After the two value expressions you may optionally list additional expressions or event properties to evaluate for the stream and return their value based on the last arriving event. You may also provide the *
wildcard selector to have the view output all event properties.
Table 12.7. Correlation derived properties
Property Name | Description |
---|---|
correlation | Correlation between two event properties |
The next example calculates correlation on price and offer over all stock tick events for GE:
select correlation from StockTickEvent(symbol='GE').stat:correl(price, offer)
To add properties from the event stream you may simply add all additional properties as parameters to the view.
This example selects all of the derived values, based on the price and offer property, plus the values of the feed event property:
select * from StockTickEvent(symbol='GE').stat:correl(price, offer, feed)
The next example selects all of the derived values plus all event properties:
select * from StockTickEvent(symbol='GE').stat:correl(price, offer, *)
This view returns the weighted average given an expression returning values to compute the average for and an expression returning weight. The view takes two value expressions as parameters plus any number of optional additional expressions to return properties of the last event. The value expressions must return numeric values:
stat:weighted_avg(value_expression_field, value_expression_weight [,expression, ...] [ * ])
After the value expression you may optionally list additional expressions or event properties to evaluate for the stream and return their value based on the last arriving event. You may also provide the *
wildcard selector to have the view output all event properties.
A statement that derives the volume-weighted average price for the last 3 seconds for a given symbol is shown below:
select average from StockTickEvent(symbol='GE').win:time(3 seconds).stat:weighted_avg(price, volume)
To add properties from the event stream you may simply add all additional properties as parameters to the view.
This example selects all of the derived values, based on the price and volume properties, plus the values of the symbol and feed event properties:
select * from StockTickEvent.win:time(3 seconds).stat:weighted_avg(price, volume, symbol, feed)
The next example selects all of the derived values plus the values of all event properties:
select * from StockTickEvent.win:time(3 seconds).stat:weighted_avg(price, volume, *)
Aggregation functions could instead be used to compute the weighted average as well. The next example also posts weighted average per symbol considering the last 3 seconds of stock tick data:
select symbol, sum(price*volume)/sum(volume) from StockTickEvent.win:time(3 seconds) group by symbol
The following example computes weighted average keeping a separate data window per symbol considering the last 5 events of each symbol:
select symbol, average from StockTickEvent.std:groupwin(symbol).win:length(5).stat:weighted_avg(price, volume)
The views in this set are data windows that order events according to a criteria.
This view sorts by values returned by the specified expression or list of expressions and keeps only the top (or bottom) events up to the given size.
This view retains all events in the stream that fall into the sort range. Use the ranked window as described next to retain events per unique key(s) and sorted.
The syntax is as follows:
ext:sort(size_expression, sort_criteria_expression [asc/desc][, sort_criteria_expression [asc/desc]...])
An expression may be followed by the optional asc
or desc
keywords to indicate that the values returned by that expression are sorted in ascending or descending sort order.
The view below retains only those events that have the highest 10 prices considering all events (and not only the last event per symbol, see rank below) and reports a total price:
select sum(price) from StockTickEvent.ext:sort(10, price desc)
The following example sorts events first by price in descending order, and then by symbol name in ascending (alphabetical) order, keeping only the 10 events with the highest price (with ties resolved by alphabetical order of symbol).
select * from StockTickEvent.ext:sort(10, price desc, symbol asc)
The sorted window is often used with the prev
, prevwindow
or prevtail
single-row functions to output properties of events at a certain position or to output the complete data window according to sort order.
Use the grouped window to retain a separate sort window for each group. For example, the views std:groupwin(market).ext:sort(10, price desc)
instruct the engine to retain, per market, the highest 10 prices.
This view retains only the most recent among events having the same value for the criteria expression(s), sorted by sort criteria expressions and keeps only the top events up to the given size.
This view is similar to the sorted window in that it keeps only the top (or bottom) events up to the given size, however the view also retains only the most recent among events having the same value(s) for the specified uniqueness expression(s).
The syntax is as follows:
ext:rank(unique_expression [, unique_expression ...], size_expression, sort_criteria_expression [asc/desc][, sort_criteria_expression [asc/desc]...])
Specify the expressions returning unique key values first. Then specify a constant value that is the size of the ranked window. Then specify the expressions returning sort criteria values. The sort criteria expressions may be followed by the optional asc
or desc
keywords to indicate that the values returned by that expression are sorted in ascending or descending sort order.
The view below retains only those events that have the highest 10 prices considering only the last event per symbol and reports a total price:
select sum(price) from StockTickEvent.ext:rank(symbol, 10, price desc)
The following example retains, for the last event per market and symbol, those events that sort by price and quantity ascending into the first 10 ranks:
select * from StockTickEvent.ext:rank(market, symbol, 10, price, quantity)
The ranked window is often used with the prev
, prevwindow
or prevtail
single-row functions to output properties of events at a certain position or to output the complete data window according to sort order.
This example outputs every 5 seconds the top 10 events according to price descending and considering only the last event per symbol:
select prevwindow(*) from StockTickEvent.ext:rank(symbol, 10, price desc) output snapshot every 5 seconds limit 1 // need only 1 row
Use the grouped window to retain a separate rank for each group. For example, the views std:groupwin(market).ext:rank(symbol, 10, price desc)
instruct the engine to retain, per market, the highest 10 prices considering the last event per symbol.
This view orders events that arrive out-of-order, using timestamp-values provided by an expression, and by comparing that timestamp value to engine system time.
The syntax for this view is as follows.
ext:time_order(timestamp_expression, time_period)
ext:time_order(timestamp_expression, seconds_interval_expression)
The first parameter to the view is the expression that supplies timestamp values. The timestamp is expected to be a long-typed millisecond value that denotes an event's time of consideration by the view (or other expression). This is typically the time of arrival. The second parameter is a number-of-seconds expression or the time period specifying the time interval that an arriving event should maximally be held, in order to consider older events arriving at a later time.
Since the view compares timestamp values to engine time, the view requires that the timestamp values and current engine time are both following the same clock. Therefore, to the extend that the clocks that originated both timestamps differ, the view may produce inaccurate results.
As an example, the next statement uses the arrival_time
property of MyTimestampedEvent
events to order and release events by arrival time:
insert rstream into ArrivalTimeOrderedStream select rstream * from MyTimestampedEvent.ext:time_order(arrival_time, 10 sec)
In the example above, the arrival_time
property holds a long-typed timestamp value in milliseconds. On arrival of an event, the engine compares the timestamp value of each event to the tail-time of the window. The tail-time of the window is, in this example, 10 seconds before engine time (continuously sliding). If the timestamp value indicates that the event is older then the tail-time of the time window, the event is released immediately in the remove stream. If the timestamp value indicates that the event is newer then the tail-time of the window, the view retains the event until engine time moves such that the event timestamp is older then tail-time.
The examples thus holds each arriving event in memory anywhere from zero seconds to 10 seconds, to allow for older events (considering arrival time timestamp) to arrive. In other words, the view holds an event with an arrival time equal to engine time for 10 seconds. The view holds an event with an arrival time that is 2 seconds older then engine time for 8 seconds. The view holds an event with an arrival time that is 10 or more seconds older then engine time for zero seconds, and releases such (old) events immediately into the remove stream.
The insert stream of this sliding window consists of all arriving events. The remove stream of the view is ordered by timestamp value: The event that has the oldest timestamp value is released first, followed by the next newer events. Note the statement above uses the rstream
keyword in both the insert into
clause and the select
clause to select ordered events only. It uses the insert into
clause to makes such ordered stream available for subsequent statements to use.
It is up to your application to populate the timestamp property into your events or use a sensible expression that returns timestamp values for consideration by the view. The view also works well if you use externally-provided time via timer events.