This section compares pattern detection via match recognize and via the EPL pattern language.
Table 8.1. Comparison Match Recognize to EPL Patterns
Category | EPL Patterns | Match Recognize |
---|---|---|
Purpose | Pattern detection in sequences of events. | Same. |
Standards | Not standardized, similar to Rapide pattern language. | Proposal for incorporation into the SQL standard. |
Real-time Processing | Yes. | Yes. |
On-Demand query via Iterator | No. | Yes. |
Language | Nestable expressions consisting of boolean AND, OR, NOT and time or arrival-based constructs such as -> (followed-by), timer:within and timer:interval. | Regular expression consisting of variables each representing conditions on events. |
Event Types | An EPL pattern may react to multiple different types of events. | The input is a single type of event (unless used with variant streams). |
Data Window Interaction | Disconnected, i.e. an event leaving a data window does not change pattern state. | Connected, i.e. an event leaving a data window removes the event from match selection. |
Semantic Evaluation | Truth-value based: An EPL pattern such as (A and B) can fire when a single event arrives that satisfies both A and B conditions. | Sequence-based: A regular expression (A B) requires at least two events to match. |
Time Relationship Between Events | The timer:within, timer:interval and NOT operators can expressively search for absence of events or other more complex timing relationships. | Some support for detecting absence of events using the interval clause. |
Extensibility | Custom pattern objects, user-defined functions. | User-defined functions, custom aggregation functions. |
Memory Use | Likely between 500 bytes and 2k per open sequence, depending on the pattern. | Likely between 100 bytes and 1k per open sequence, depending on the pattern. |
match_recognize (
  [ partition by partition_expression [, partition_expression] [,...] ]
  measures measure_expression as col_name [, measure_expression as col_name ] [,...]
  [ all matches ]
  [ after match skip (past last row | to next row | to current row) ]
  pattern ( variable_regular_expr [, variable_regular_expr] [,...] )
  [ interval time_period [or terminated] ]
  [ define variable as variable_condition [, variable as variable_condition] [,...] ]
)
Partition by is optional and may be used to specify that events are to be partitioned by one or more event properties or expressions. If there is no Partition by then all rows of the table constitute a single partition. The regular expression applies to events in the same partition and not across partitions.
Further detail on key expressions can be found at Section 5.2.13, “Composite Keys and Array Values as Keys”.
The measures clause defines columns that contain expressions over the pattern variables. The expressions can reference partition columns, singleton variables, aggregates as well as indexed properties on the group variables. Each measure_expression expression must be followed by the as keyword and a col_name column name.
The all matches keywords are optional and instruct the runtime to find all possible matches. By default matches are ranked and the runtime returns a single match following an algorithm to eliminate duplicate matches, as described below. When specifying all matches, matches may overlap and may start at the same row.
The after match skip keywords are optional and serve to determine the resumption point of pattern matching after a match has been found. By default the behavior is after match skip past last row. This means that after eliminating duplicate matches, the logic skips to resume pattern matching at the next event after the last event of the current match.
The pattern component is used to specify a regular expression. The regular expression is built from variable names, and may use quantifiers such as *, +, ?, *?, +?, ??, {repetition} and | alternation (concatenation is indicated by the absence of any operator sign between two successive items in a pattern).
With the optional interval keyword, time period and or terminated you can control how long the runtime should wait for further events to arrive that may be part of a matching event sequence, before indicating a match (or matches) (not applicable to on-demand pattern matching).
Define is optional and is used to specify the boolean condition(s) that define some or all variable names that are declared in the pattern. A variable name does not require a definition and if there is no definition, the default is a predicate that is always true. Such a variable name can be used to match any row.
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id, A.temp as a_temp, B.temp as b_temp pattern (A B) define B as Math.abs(B.temp - A.temp) >= 10 )
The table below shows an example sequence of events and the output of the pattern:
At time 4000, when the event with id E4 (or event E4, or just E4 for short) arrives, the pattern matches and produces an output event. Matching then skips past the last event of the current match (E4) and begins at event E5 (the default skip clause is past last row). Therefore events E4 and E5 do not constitute a match.
At time 3000, events E1 and E3 do not constitute a match as E3 does not immediately follow E1, since there is E2 in between.
At time 7000, event E7 does not constitute a match as it is from device 2 and thereby not in the same partition as prior events.
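The behavior described above can be approximated outside the runtime. The following is a minimal plain-Python sketch, not Esper code: the `Reading` class, the `match_a_b` function and the temperature values are hypothetical, chosen only so that the sequence is consistent with the description (E3 and E4 match, E4 and E5 do not, E7 belongs to device 2).

```python
from dataclasses import dataclass


@dataclass
class Reading:
    id: str
    device: int
    temp: float


def match_a_b(events, min_delta=10):
    """Simulate pattern (A B) per device with the default skip past last row."""
    previous = {}  # device -> candidate A event waiting for a B event
    matches = []
    for event in events:
        a = previous.get(event.device)
        if a is not None and abs(event.temp - a.temp) >= min_delta:
            matches.append((a.id, event.id))
            # Skip past last row: the B event cannot start the next match.
            previous[event.device] = None
        else:
            # A has no definition, so any event can be the next candidate A.
            previous[event.device] = event
    return matches


# Hypothetical readings, one per second of arrival time.
events = [
    Reading("E1", 1, 50), Reading("E2", 1, 55), Reading("E3", 1, 60),
    Reading("E4", 1, 70), Reading("E5", 1, 85), Reading("E6", 1, 85),
    Reading("E7", 2, 100),
]
print(match_a_b(events))  # [('E3', 'E4')]
```

Although E5 differs from E4 by 15 degrees, E4 was already consumed as the B event of the first match, so the sketch reports a single match.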
The operators at the top of this table take precedence over operators lower on the table.
Table 8.3. Match Recognize Regular Expression Operator Precedence
Precedence | Operator | Description | Example |
---|---|---|---|
1 | Grouping | () | (A B) |
2 | Quantifiers | * + ? {repetition} | A* B+ C? |
3 | Concatenation | (no operator) | A B |
4 | Alternation | \| | A \| B |
If you are not sure about the precedence, please consider placing parentheses () around your groups. Parentheses can also help make expressions easier to read and understand.
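As an analogy only, the same precedence ordering can be observed with Python's `re` module, which follows the usual regular-expression convention of grouping first, then quantifiers, then concatenation, then alternation. The sketch below assumes that single characters stand in for pattern variables; it does not run any EPL, and the `matches` helper is hypothetical.

```python
import re


def matches(pattern, sequence):
    """Return True when the whole sequence matches the pattern."""
    return re.fullmatch(pattern, sequence) is not None


# Quantifiers bind tighter than concatenation: in "AB+" the + applies to B only.
quantifier_before_concat = matches("AB+", "ABAB")  # False
grouped_repeat = matches("(AB)+", "ABAB")          # True

# Concatenation binds tighter than alternation: "AB|C" is read as (AB)|C.
concat_before_alt = matches("AB|C", "AC")          # False
grouped_alt = matches("A(B|C)", "AC")              # True

print(quantifier_before_concat, grouped_repeat, concat_before_alt, grouped_alt)
```

Adding parentheses changes the outcome in both pairs, which is why explicit grouping is the safer choice when in doubt.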
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id, A.temp as a_temp, B.temp as b_temp pattern (A B) define B as Math.abs(B.temp - A.temp) >= 10 )
Please see Section 8.3.1, “Syntax Example” for a sample event sequence.
The alternation operator is a vertical bar ( | ).
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id, C.id as c_id pattern (A (B | C)) define A as A.temp >= 50, B as B.temp <= 45, C as Math.abs(C.temp - A.temp) >= 10)
The table below shows an example sequence of events and the output of the pattern:
Quantifiers are postfix operators with the following choices:
Quantifiers that control the number of repetitions are:
Table 8.6. Quantifiers
Quantifier | Meaning |
---|---|
{n} | Exactly n matches. |
{n, } | n or more matches. |
{n, m} | Between n and m matches (inclusive). |
{ ,m} | Between zero and m matches (inclusive). |
Repetition quantifiers can be combined with other quantifiers and grouping. For example A?{2} or (A B){2} are valid.
The following table outlines sample equivalent permutations.
This sample pattern looks for either an event with temperature less than 100 followed by an event with temperature greater than or equal to 100, or an event with temperature greater than or equal to 100 followed by an event with temperature less than 100.
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id pattern (match_recognize_permute(A, B)) define A as A.temp < 100, B as B.temp >= 100)
An example sequence of events that matches the pattern above is:
Table 8.8. Example
Arrival Time | Tuple | Output Event (if any) |
---|---|---|
1000 | id=E1, device=1, temp=99 | |
2000 | id=E2, device=1, temp=100 | a_id = E1, b_id = E2 |
3000 | id=E3, device=1, temp=100 | |
4000 | id=E4, device=1, temp=99 | a_id = E4, b_id = E3 |
5000 | id=E5, device=1, temp=98 | |
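The description of the permute example implies that match_recognize_permute(A, B) behaves like the alternation (A B) | (B A). A minimal Python sketch of that expansion follows; the `expand_permute` helper is hypothetical and only builds the equivalent pattern text, it is not part of the EPL or runtime API.

```python
from itertools import permutations


def expand_permute(*variables):
    """Build the alternation of all orderings of the given variable names."""
    alternatives = ("(" + " ".join(order) + ")" for order in permutations(variables))
    return " | ".join(alternatives)


print(expand_permute("A", "B"))       # (A B) | (B A)
print(expand_permute("A", "B", "C"))  # six alternatives
```

The number of alternatives grows factorially with the number of variables, which is the main reason to write the permute form rather than spelling out each ordering by hand.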
variableName.propertyName
variableName[index].propertyName
last(variableName.propertyName)
Enumeration methods can also be applied to group variables. An example is provided in Section 11.4.11, “Match-Recognize Group Variable”.
Please find examples of singleton and group variables and example measures and define clauses below.
After ranking matches by preferment, matches are chosen as follows:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id pattern (A?? B?) define A as A.temp >= 100, B as B.temp >= 105)
A sample sequence of events and pattern matches:
As the ? quantifier on condition B is greedy and the ?? quantifier on condition A is reluctant, event E2 matches the pattern and is indicated as a B event by the measures clause (and not as an A event, therefore a_id is null).
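The difference between a greedy and a reluctant quantifier can be illustrated by analogy with Python's `re` module. This sketch is an assumption-laden stand-in, not EPL: each reading is encoded as one character (`h` satisfies both conditions, `m` satisfies only the A condition), and the `encode` and `assign` helpers are hypothetical.

```python
import re


def encode(temp):
    """Map a temperature to a symbol: h meets A and B, m meets A only."""
    if temp >= 105:
        return "h"
    if temp >= 100:
        return "m"
    return "l"


RELUCTANT_A = re.compile(r"(?P<a>[mh])??(?P<b>h)?")  # analogous to (A?? B?)
GREEDY_A = re.compile(r"(?P<a>[mh])?(?P<b>h)?")      # analogous to (A? B?)


def assign(pattern, temp):
    """Report which variable the single event was assigned to, if any."""
    m = pattern.fullmatch(encode(temp))
    if m is None:
        return None
    return {"a": m.group("a") is not None, "b": m.group("b") is not None}


print(assign(RELUCTANT_A, 105))  # {'a': False, 'b': True}
print(assign(GREEDY_A, 105))     # {'a': True, 'b': False}
```

With the reluctant form the event that satisfies both conditions is left for B, which mirrors a_id being null in the statement above; with the greedy form the same event is claimed by A.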
select * from TemperatureSensorEvent match_recognize ( partition by device measures first(A.id) as first_a, last(A.id) as last_a, B[0].id as b0_id, B[1].id as b1_id pattern (A+ B+) define A as A.temp >= 100, B as B.temp > A.temp)
An example sequence of events that matches the pattern above is:
Note that for statements there is no match that includes event E5 since after the pattern matches for E4 the pattern skips to start fresh at E5 (by default skip clause). When performing on-demand matching via iterator, event E5 gets included in the match and the output is first_a = E2, last_a = E3, b0_id = E4, b1_id = E5.
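To make the measures of this example concrete, the following Python sketch treats each group variable as the list of event ids it matched. The `measures` helper is hypothetical, and it assumes that an index beyond the number of matched events evaluates to null (shown here as `None`), which the source does not state explicitly.

```python
def measures(a_ids, b_ids):
    """Evaluate first(A.id), last(A.id), B[0].id and B[1].id over matched ids."""

    def at(ids, index):
        # Assumption: an index past the end yields null rather than an error.
        return ids[index] if index < len(ids) else None

    return {
        "first_a": a_ids[0],
        "last_a": a_ids[-1],
        "b0_id": at(b_ids, 0),
        "b1_id": at(b_ids, 1),
    }


# Statement: the match is indicated when E4 arrives, so E5 is not part of it.
statement_match = measures(["E2", "E3"], ["E4"])
# On-demand iteration: E5 is included as a second B event.
iterator_match = measures(["E2", "E3"], ["E4", "E5"])

print(statement_match)
print(iterator_match)
```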
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, count(B.id) as count_b, C.id as c_id pattern (A B* C) define A as A.temp < 50, B as B.temp between 50 and 60, C as C.temp > 60)
An example sequence of events that matches the pattern above is:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, B.id as b_id, C.id as c_id, D.id as d_id pattern (A B? C? D) define A as A.temp < 50, B as B.temp > 50, C as C.temp < 50, D as D.temp > 55)
An example sequence of events that matches the pattern above is:
This sample pattern looks for two consecutive events in which the temperature is 100 or higher:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A[0].id as a0_id, A[1].id as a1_id pattern (A{2}) define A as A.temp >= 100)
An example sequence of events that matches the pattern above is:
The next sample applies the quantifier to a group. This sample pattern looks for four events in which the temperature is, in sequence, 100, 101, 100 and 101:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A[0].id as a0_id, A[1].id as a1_id pattern ((A B){2}) define A as A.temp = 100, B as B.temp = 101)
select * from TemperatureSensorEvent match_recognize ( partition by device measures A[0].id as a0_id, A[1].id as a1_id, A[2].id as a2_id, B.id as b_id pattern (A{2,} B) define A as A.temp >= 100, B as B.temp >= 102)
An example sequence of events that matches the pattern above is:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A[0].id as a0_id, A[1].id as a1_id, A[2].id as a2_id, B.id as b_id pattern (A{2,3} B) define A as A.temp >= 100, B as B.temp >= 102)
An example sequence of events that matches the pattern above is:
select * from TemperatureSensorEvent match_recognize ( partition by device measures A[0].id as a0_id, A[1].id as a1_id, B.id as b_id pattern (A{,2} B) define A as A.temp >= 100, B as B.temp >= 102)
An example sequence of events that matches the pattern above is:
This function can access variables currently defined, for example:
Y as Y.price < prev(Y.price, 2)
It is not legal to use prev with a variable other than the one being defined:
// not allowed
Y as Y.price < prev(X.price, 2)
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id pattern (A) define A as A.temp > 100 and prev(A.temp, 2) > 100)
An example sequence of events that matches the pattern above is:
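As a plain-Python sketch of this define clause, the function below checks each reading against the one that arrived two events earlier. The readings are hypothetical values for a single device, the `match_with_prev` helper is not part of any API, and the sketch assumes that the condition is false while fewer than two earlier events exist.

```python
def match_with_prev(readings, threshold=100, offset=2):
    """Return ids where temp > threshold and the reading `offset` events back was too."""
    matched = []
    for i, (event_id, temp) in enumerate(readings):
        # Assumption: no earlier event means the prev value is null, so no match.
        earlier = readings[i - offset][1] if i >= offset else None
        if temp > threshold and earlier is not None and earlier > threshold:
            matched.append(event_id)
    return matched


# Hypothetical readings for one device, in arrival order.
readings = [("E1", 101), ("E2", 99), ("E3", 102), ("E4", 103), ("E5", 98), ("E6", 104)]
print(match_with_prev(readings))  # ['E3', 'E6']
```

E4 exceeds 100 but does not match, because the reading two events earlier (E2) is below the threshold.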
Expressions in the measures clause must use the as keyword to assign a column name.
If a variable is a group variable and used in an aggregate, then the aggregate is performed over all rows that have matched the variable. If a group variable is not in an aggregate function, its variable name must be post-fixed with an index. See Section 8.4.6, “Variables Can Be Singleton or Group” for more information.
select * from TemperatureSensorEvent#time(10 sec) match_recognize ( partition by device measures A.id as a_id pattern (A B C D) define B as B.temp > A.temp, C as C.temp > B.temp, D as D.temp > C.temp)
An example sequence of events that matches the pattern above is:
Note that E8, E9, E10 and E11 do not constitute a match since E8 leaves the data window at 25000.
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, count(B.id) as count_b, first(B.id) as first_b, last(B.id) as last_b pattern (A B*) interval 5 seconds define A as A.temp > 100, B as B.temp > 100)
An example sequence of events that matches the pattern above is:
Notice that the runtime waits 5 seconds (5000 milliseconds) after the arrival time of the first event E2 of the match at 2000, to indicate the match at 7000.
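The timing can be sketched in plain Python. This is a simplified model under stated assumptions, not the runtime's algorithm: it handles only the first match of pattern (A B*) in a single partition, the `indicate_with_interval` helper is hypothetical, and the readings are made-up values in which E2 is the first event above 100 and arrives at 2000.

```python
def indicate_with_interval(events, interval_ms=5000, threshold=100):
    """Model pattern (A B*) with an interval: indicate at first arrival + interval."""
    first = None  # (arrival_ms, id) of the event matched as A
    b_ids = []
    for arrival_ms, event_id, temp in events:
        if first is None:
            if temp > threshold:
                first = (arrival_ms, event_id)
        elif arrival_ms < first[0] + interval_ms and temp > threshold:
            b_ids.append(event_id)
        else:
            break  # outside the interval, or not a B event
    if first is None:
        return None
    return {
        "indicated_at": first[0] + interval_ms,
        "a_id": first[1],
        "count_b": len(b_ids),
        "first_b": b_ids[0] if b_ids else None,
        "last_b": b_ids[-1] if b_ids else None,
    }


# Hypothetical (arrival time in ms, id, temp) tuples.
events = [(1000, "E1", 98), (2000, "E2", 101), (3000, "E3", 102),
          (4000, "E4", 104), (8000, "E5", 105)]
print(indicate_with_interval(events))
```

In this model E5 arrives after 7000 and is therefore not counted as a B event of the match that starts with E2.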
select * from TemperatureSensorEvent match_recognize ( partition by device measures A.id as a_id, count(B.id) as count_b, first(B.id) as first_b, last(B.id) as last_b pattern (A B*) interval 5 seconds or terminated define A as A.temp > 100, B as B.temp > 100)
An example sequence of events that matches the pattern above is:
Interval and Interval with or terminated make most sense for open-ended patterns such as, for example, pattern (A B*) or pattern (A B C+).
For patterns that terminate when a given event arrives, for example, pattern (A B), an Interval in combination with or terminated should not be specified and, if specified, has no effect on matching.
You may match different types of events using match-recognize by following any of these strategies:
A short example that demonstrates variant streams and match-recognize is listed below:
// Declare one sample type
create schema S0 as (col string)
// Declare second sample type
create schema S1 as (col string)
// Declare variant stream holding either type
create variant schema MyVariantStream as S0, S1
// Populate variant stream
insert into MyVariantStream select * from S0
// Populate variant stream
insert into MyVariantStream select * from S1
// Simple pattern to match S0 S1 pairs
select * from MyVariantStream#time(1 min) match_recognize ( measures A.id? as a, B.id? as b pattern (A B) define A as typeof(A) = 'S0', B as typeof(B) = 'S1' )
If your application uses match-recognize in multiple statements and all such match-recognize constructs should count towards a total state count, you may consider setting a maximum number of states, runtime-wide, via the configuration described in Section 17.6.5.1, “Maximum State Count”.
When the limit is reached the match-recognize runtime issues a notification object to any condition handlers registered with the runtime as described in Section 16.11, “Condition Handling”. Depending on your configuration the runtime can prevent the allocation of a new state instance, until states are discarded or statements are undeployed or context partitions are terminated.
The notification object issued to condition handlers is an instance of com.espertech.esper.common.client.hook.condition.ConditionMatchRecognizeStatesMax. The notification object contains information about which statement triggered the limit and the state counts per statement for all statements.
For information on configuration please consult Section 17.6.5.1, “Maximum State Count”.