Chapter 6. EPL Reference: Match Recognize

6.1. Overview

Using match recognize patterns are defined in the familiar syntax of regular expressions.

The match recognize syntax presents an alternative way to specify pattern detection as compared to the EPL pattern language described in the previous chapter. A comparison of match recognize and EPL patterns is below.

The match recognize syntax is a proposal for incorporation into the SQL standard. It is thus subject to change as the standard evolves and finalizes (it has not finalized yet). Please consult row-pattern-recogniton-11-public for further information.

You may be familiar with regular expressions in the context of finding text of interest in a string, such as particular characters, words, or patterns of characters. Instead of matching characters, match recognize matches sequences of events of interest.

Esper can apply match-recognize patterns in real-time upon arrival of new events in a stream of events (also termed incrementally, streaming or continuous). Esper can also match patterns on-demand via the iterator pull-API, if specifying a named window or data window on a stream.

6.2. Comparison of Match Recognize and EPL Patterns

This section compares pattern detection via match recognize and via the EPL pattern language.

Table 6.1. Comparison Match Recognize to EPL Patterns

CategoryEPL PatternsMatch Recognize
PurposePattern detection in sequences of events.Same.
StandardsNot standardized, similar to Rapide pattern language.Proposal for incorporation into the SQL standard.
Real-time ProcessingYes.Yes.
On-Demand query via IteratorNo.Yes.
LanguageNestable expressions consisting of boolean AND, OR, NOT and time or arrival-based constructs such as -> (followed-by), timer:within and timer:interval.Regular expression consisting of variables each representing conditions on events.
Event TypesAn EPL pattern may react to multiple different types of events.The input is a single type of event (unless used with variant streams).
Data Window InteractionDisconnected, i.e. an event leaving a data window does not change pattern state.Connected, i.e. an event leaving a data window removes the event from match selection.
Semantic EvaluationTruth-value based: A EPL pattern such as (A and B) can fire when a single event arrives that satisfies both A and B conditions.Sequence-based: A regular expression (A B) requires at least two events to match.
Time Relationship Between EventsThe timer:within, timer:interval and NOT operator can expressively search for absence of events or other more complex timing relationships.Some support for detecting absence of events using the interval clause.
ExtensibilityCustom pattern objects, user-defined functions.User-defined functions, custom aggregation functions.
Memory UseLikely between 500 bytes to 2k per open sequence, depends on pattern.Likely between 100 bytes to 1k per open sequence, depends on pattern.

6.3. Syntax

The synopsis is as follows:

match_recognize (
  [ partition by partition_expression [, partition_expression] [,...]  ]
  measures  measure_expression as col_name [, measure_expression as col_name ] [,...]
  [ all matches ]
  [ after match skip (past last row | to next row | to current row) ]
  pattern ( variable_regular_expr [, variable_regular_expr] [,...] )
  [ interval time_period ]
  define  variable as variable_condition [, variable as variable_condition]  [,...] 
)

The match_recognize keyword starts the match recognize definition and occurs right after the from clause in an EPL select statement. It is followed by parenthesis that surround the match recognize definition.

Partition by is optional and may be used to specify that events are to be partitioned by one or more event properties or expressions. If there is no Partition by then all rows of the table constitute a single partition. The regular expression applies to events in the same partition and not across partitions.

The measures clause defines columns that contain expressions over the pattern variables. The expressions can reference partition columns, singleton variables, aggregates as well as indexed properties on the group variables. Each measure_expression expression must be followed by the as keyword and a col_name column name.

The all matches keywords are optional and instructs the engine to find all possible matches. By default matches are ranked and the engine returns a single match following an algorithm to eliminate duplicate matches, as described below. When specifying all matches, matches may overlap and may start at the same row.

The after match skip keywords are optional and serve to determine the resumption point of pattern matching after a match has been found. By default the behavior is after match skip past last row. This means that after eliminating duplicate matches, the logic skips to resume pattern matching at the next event after the last event of the current match.

The pattern component is used to specify a regular expression. The regular expression is built from variable names, and may use the operators such as *, +, ?, *?, +?, ?? quantifiers and | alteration (concatenation is indicated by the absence of any operator sign between two successive items in a pattern).

With the optional interval keyword and time period you can control how long the engine should wait for further events to arrive that may be part of a matching event sequence, before indicating a match (or matches) (not applicable to on-demand pattern matching).

Define is a mandatory component, used to specify the boolean condition that defines a variable name that is declared in the pattern. A variable name does not require a definition and if there is no definition, the default is a predicate that is always true. Such a variable name can be used to match any row.

6.3.1. Syntax Example

For illustration, the examples in this chapter use the TemperatureSensorEvent event. Each event has 3 properties: the id property is a unique event id, the device is a sensor device number and the temp property is a temperature reading. An event described as "id=E1, device=1, temp=100" is a TemperatureSensorEvent event with id "E1" for device 1 with a reading of 100.

This example statement looks for two TemperatureSensorEvent events from the same device, directly following each other, that indicate a jump in temperature of 10 or more between the two events:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, B.id as b_id, A.temp as a_temp, B.temp as b_temp
  pattern (A B)
  define 
    B as Math.abs(B.temp - A.temp) >= 10
)

The partition by ensures that the regular expression applies to sequences of events from the same device.

The measures clause provides a list of properties or expressions to be selected from matching events. Each property name must be prefixed by the variable name.

In the pattern component the statement declares two variables: A and B. As a matter of convention, variable names are uppercase characters.

The define clause specifies no condition for variable A. This means that A defaults to true and any event matches A. Therefore, the pattern can start at any event.

The pattern A B indicates to look for a pattern in which an event that fulfills the condition for variable A is immediately followed by an event that fulfills the condition for variable B. A pattern thus defines the sequence (or sequences) of conditions that must be met for the pattern to fire.

Below table is an example sequence of events and output of the pattern:

Table 6.2. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=50 
2000id=E2, device=1, temp=55 
3000id=E3, device=1, temp=60 
4000id=E4, device=1, temp=70a_id = E3, b_id = E4, a_temp = 60, b_temp = 70
5000id=E5, device=1, temp=85 
6000id=E6, device=1, temp=85 
7000id=E7, device=2, temp=100 

At time 4000 when event with id E4 (or event E4 or just E4 for short) arrives the pattern matches and produces an output event. Matching then skips past the last event of the current match (E4) and begins at event E5 (the default skip clause is past last row). Therefore events E4 and E5 do not constitute a match.

At time 3000, events E1 and E3 do not constitute a match as E3 does not immediately follow E, since there is E2 in between.

At time 7000, event E7 does not constitute a match as it is from device 2 and thereby not in the same partition as prior events.

6.4. Pattern and Pattern Operators

The pattern specifies the pattern to be recognized in the ordered sequence of events in a partition using regular expression syntax. Each variable name in a pattern corresponds to a boolean condition, which is specified later using the define component of the syntax. Thus the pattern can be regarded as implicitly declaring one or more variable names; the definition of those variable names appears later in the syntax. If a variable is not defined the variable defaults to true.

It is permitted for a variable name to occur more than once in a pattern, for example pattern (A B A).

6.4.1. Operator Precedence

The operators at the top of this table take precedence over operators lower on the table.

Table 6.3. Match Recognize Regular Expression Operator Precedence

PrecedenceOperatorDescriptionExample
1Grouping()
(A B)
2Single-character duplication* + ?
A* B+ C?
3Concatenation(no operator)
A B
4Alternation|
A | B

If you are not sure about the precedence, please consider placing parenthesis () around your groups. Parenthesis can also help make expressions easier to read and understand.

6.4.2. Concatenation

The concatenation is indicated by the absence of any operator sign between two successive items in a pattern.

In below pattern the two items A and B have no operator between them. The pattern matches for any event immediately followed by an event from the same device that indicates a jump in temperature over 10:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, B.id as b_id, A.temp as a_temp, B.temp as b_temp
  pattern (A B)
  define 
    B as Math.abs(B.temp - A.temp) >= 10
)

Please see the Section 6.3.1, “Syntax Example” for a sample event sequence.

6.4.3. Alternation

The alternation operator is a vertical bar ( | ).

The alternation operator has the lowest precedence of all operators. It tells the engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you will need to use round brackets for grouping.

This example pattern looks for a sequence of an event with a temperature over 50 followed immediately by either an event with a temperature less then 45 or an event that indicates the temperature jumped by 10 (all for the same device):

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, B.id as b_id, C.id as c.id
  pattern (A (B | C))
  define 
    A as A.temp >= 50,
    B as B.temp <= 45,
    C as Math.abs(B.temp - A.temp) >= 10)

Below table is an example sequence of events and output of the pattern:

Table 6.4. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=50 
2000id=E2, device=1, temp=45a_id=E1, b_id=E2, c_id=null
3000id=E3, device=1, temp=46 
4000id=E4, device=1, temp=48 
5000id=E5, device=1, temp=50 
6000id=E6, device=1, temp=60a_id = E5, b_id = null, c_id=E6

6.4.4. Quantifiers Overview

Quantifiers are postfix operators with the following choices:

Table 6.5. Quantifiers

QuantifierMeaning
*Zero or more matches (greedy).
+One or more matches (greedy).
?Zero or one match (greedy).
*?Zero or more matches (reluctant).
+?One or more matches (reluctant).
??Zero or one match (reluctant).

6.4.5. Variables Can be Singleton or Group

A singleton variable is a variable in a pattern that does not have a quantifier or has a zero-or-one quantifier (? or ??) and occurs only once in the pattern (except with alteration). In the measures clause a singleton variable can be selected as:

variableName.propertyName

Variables with a zero-or-more or one-or-more quantifier, or variables that occur multiple places in a pattern (except when using alteration), may match multiple events and are group variables. In the measures clause a group variable must be selected either by providing an index or via any of the aggregation functions, such as first, last, count and sum:

variableName[index].propertyName
last(variableName.propertyName)

Please find examples of singleton and group variables and example measures clauses below.

6.4.5.1. Additional Aggregation Functions

For group variables all existing aggregation functions can be used and in addition the following aggregation functions may be used:

Table 6.6. Syntax and results of aggregate functions

Aggregate FunctionResult
first([all|distinct] expression)

Returns the first value.

last([all|distinct] expression)

Returns the last value.

6.4.6. Eliminating Duplicate Matches

The execution of match recognize is continuous and real-time by default. This means that every arriving event, or batch of events if using batching, evaluates against the pattern and matches are immediately indicated. Elimination of duplicate matches occurs between all matches of the arriving events (or batch of events) at a given time.

As an alternative, and if your application does not require continuous pattern evaluation, you may use the iterator API to perform on-demand matching of the pattern. For the purpose of indicating to the engine to not generate continuous results, specify the @Hint('iterate_only') hint.

When using one-or-more, zero-or-more or zero-or-one quantifiers (?, +, *, ??, +?, *?), the output of the real-time continuous query can differ from the output of the on-demand iterator execution: The continuous query will output a match (or multiple matches) as soon as matches are detected at a given time upon arrival of events (not knowing what further events may arrive). The on-demand execution, since it knows all possible events in advance, can determine the longest match(es). Thus elimination of duplicate matches can lead to different results between real-time and on-demand use.

If the all matches keywords are specified, then all matches are returned as the result and no elimination of duplicate matches as below occurs.

Otherwise matches to a pattern in a partition are ordered by preferment. Preferment is given to matches based on the following priorities:

  1. A match that begins at an earlier row is preferred over a match that begins at a later row.

  2. Of two matches matching a greedy quantifier, the longer match is preferred.

  3. Of two matches matching a reluctant quantifier, the shorter match is preferred.

After ranking matches by preferment, matches are chosen as follows:

  1. The first match by preferment is taken.

  2. The pool of matches is reduced as follows based on the SKIP TO clause: If SKIP PAST LAST ROW is specified, all matches that overlap the first match are discarded from the pool. If SKIP TO NEXT ROW is specified, then all matches that overlap the first row of the first match are discarded. If SKIP TO CURRENT ROW is specified, then no matches are discarded.

  3. The first match by preferment of the ones remaining is taken.

  4. Step 2 is repeated to remove more matches from the pool.

  5. Steps 3 and 4 are repeated until there are no remaining matches in the pool.

6.4.7. Greedy Or Reluctant

Reluctant quantifiers are indicated by an additional question mark (*?, +?, ??,). Reluctant quantifiers try to match as few rows as possible, whereas non-reluctant quantifiers are greedy and try to match as many rows as possible.

Greedy and reluctant come into play only for match selection among multiple possible matches. When specifying all matches there is no difference between greedy and reluctant quantifiers.

Consider the below example. The conditions may overlap: an event with a temperature reading of 105 and over matches both A and B conditions:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, B.id as b_id
  pattern (A?? B?)
  define 
    A as A.temp >= 100
    B as B.temp >= 105)

A sample sequence of events and pattern matches:

Table 6.7. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=99 
2000id=E2, device=2, temp=106a_id=null, b_id=E2
3000id=E3, device=1, temp=100a_id=E3, b_id=null

As the ? qualifier on condition B is greedy, event E2 matches the pattern and is indicated as a B event by the measure clause (and not as an A event therefore a_id is null).

6.4.8. Quantifier - One Or More (+ and +?)

The one-or-more quantifier (+) must be matched one or more times by events. The operator is greedy and the reluctant version is +?.

In the below example with pattern (A+ B+) the pattern consists of two variable names, A and B, each of which is quantified with +, indicating that they must be matched one or more times.

The pattern looks for one or more events in which the temperature is over 100 followed by one or more events indicating a higher temperature:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures first(A.id) as first_a, last(A.id) as last_a, B[0].id as b0_id, B[1].id as b1_id
  pattern (A+ B+)
  define 
	A as A.temp >= 100,
	B as B.temp > A.temp)

An example sequence of events that matches the pattern above is:

Table 6.8. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=99 
2000id=E2, device=1, temp=100 
3000id=E3, device=1, temp=100 
4000id=E4, device=1, temp=101first_a = E2, last_a = E3, b0_id = E4, b1_id = null
5000id=E5, device=1, temp=102 

Note that for continuous queries, there is no match that includes event E5 since after the pattern matches for E4 the pattern skips to start fresh at E5 (by default skip clause). When performing on-demand matching via iterator, event E5 gets included in the match and the output is first_a = E2, last_a = E3, b0_id = E4, b1_id = E5.

6.4.9. Quantifier - Zero Or More (* and *?)

The zero-or-more quantifier (*) must be matched zero or more times by events. The operator is greedy and the reluctant version is *?.

The pattern looks for a sequence of events in which the temperature starts out below 50 and then stays between 50 and 60 and finally comes over 60:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, count(B.id) as count_b, C.id as c_id
  pattern (A B* C)
  define 
	A as A.temp < 50,
	B as B.temp between 50 and 60,
	C as C.temp > 60)

An example sequence of events that matches the pattern above is:

Table 6.9. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=55 
2000id=E2, device=1, temp=52 
3000id=E3, device=1, temp=49 
4000id=E4, device=1, temp=51 
5000id=E5, device=1, temp=55 
6000id=E5, device=1, temp=61a_id=E3, count_b=2, c_id=E6

6.4.10. Quantifier - Zero Or One (? and ??)

The zero-or-one quantifier (?) must be matched zero or one time by events. The operator is greedy and the reluctant version is ??.

The pattern looks for a sequence of events in which the temperature is below 50 and then dips to over 50 and then to under 50 before indicating a value over 55:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id, B.id as b_id, C.id as c_id, D.id as d_id
  pattern (A B? C? D)
  define 
	A as A.temp < 50,
	B as B.temp > 50,
	C as C.temp < 50,
	D as D.temp > 55)

An example sequence of events that matches the pattern above is:

Table 6.10. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=44 
2000id=E2, device=1, temp=49 
3000id=E3, device=1, temp=51 
4000id=E4, device=1, temp=49 
5000id=E5, device=1, temp=56a_id=E2, b_id=E3, c_id=E4, d_id=E5
6000id=E5, device=1, temp=61 

6.5. Define Clause

Within define are listed the boolean conditions that defines a variable name that is declared in the pattern.

A variable name does not require a definition and if there is no definition, the default is a predicate that is always true. Such a variable name can be used to match any row.

The definitions of variable names may reference the same or other variable names as prior examples have shown.

If a variable in your condition expression is a singleton variable, then only individual columns may be referenced. If the variable is not matched by an event, a null value is returned.

If a variable in your condition expression is a group variable, then only indexed columns may be referenced. If the variable is not matched by an event, a null value is returned.

Aggregation functions are not allowed within expressions of the define clause.

6.5.1. The Prev Operator

The prev function may be used in a define expression to access columns of the previous row of a variable name. If there is no previous row, the null value is returned.

The prev function can accept an optional non-negative integer argument indicating the offset to the previous rows. That argument must be a constant. In this case, the engine returns the property from the N-th row preceding the current row, and if the row doesn’t exist, it returns null.

This function can access variables currently defined, for example:

Y as Y.price < prev(Y.price, 2)

It is not legal to use prev with another variable then the one being defined:

// not allowed
Y as Y.price < prev(X.price, 2)

The prev function returns properties of events in the same partition. Also, it returns properties of events according to event order-of-arrival. When using data windows or deleting events from a named window, the remove stream does not remove events from the prev function.

The pattern looks for an event in which the temperature is greater or equal 100 and that, relative to that event, has an event preceding it by 2 events that also had a temperature greater or equal 100:

select * from TemperatureSensorEvent
match_recognize (
  partition by device
  measures A.id as a_id
  pattern (A)
  define 
	A as A.temp > 100 and prev(A.temp, 2) > 100)

An example sequence of events that matches the pattern above is:

Table 6.11. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=98 
2000id=E2, device=1, temp=101 
3000id=E3, device=1, temp=101 
4000id=E4, device=1, temp=99 
5000id=E5, device=1, temp=101a_id=E5

6.6. Measure Clause

The measures clause defines exported columns that contain expressions over the pattern variables. The expressions can reference partition columns, singleton variables and any aggregation functions including last and first on the group variables.

Expressions in the measures clause must use the as keyword to assign a column name.

If a variable is a singleton variable then only individual columns may be referenced, not aggregates. If the variable is not matched by an event, a null value is returned.

If a variable is a group variable and used in an aggregate, then the aggregate is performed over all rows that have matched the variable. If a group variable is not in an aggregate function, its variable name must be post-fixed with an index. See Section 6.4.5, “Variables Can be Singleton or Group” for more information.

6.7. Datawindow-Bound

When using match recognize with a named window or stream bound by a data window, all events removed from the named window or data window also removed the match-in-progress that includes the event(s) removed.

The next example looks for four sensor events from the same device immediately following each other and indicating a rising temperature, but only events that arrived in the last 10 seconds are considered:

select * from TemperatureSensorEvent.win:time(10 sec)
match_recognize (
partition by device
measures A.id as a_id
pattern (A B C D)
define 
B as B.temp > A.temp,
C as C.temp > B.temp,
D as D.temp > C.temp)

An example sequence of events that matches the pattern above is:

Table 6.12. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=80 
2000id=E2, device=1, temp=81 
3000id=E3, device=1, temp=82 
4000id=E4, device=1, temp=81 
7000id=E5, device=1, temp=82 
9000id=E6, device=1, temp=83 
13000id=E7, device=1, temp=84a_id=E4, a_id=E5, a_id=E6, a_id=E7
15000id=E8, device=1, temp=84 
20000id=E9, device=1, temp=85 
21000id=E10, device=1, temp=86 
26000id=E11, device=1, temp=87 

Note that E8, E9, E10 and E11 doe not constitute a match since E8 leaves the data window at 25000.

6.8. Interval

With the optional interval keyword and time period you can control how long the engine should wait for further events to arrive that may be part of a matching event sequence, before indicating a match (or matches). This is not applicable to on-demand pattern matching.

The interval timer starts are the arrival of the first event matching a sequence for a partition. When the time interval passes and an event sequence matches, duplicate matches are eliminated and output occurs.

The next example looks for sensor events indicating a temperature of over 100 waiting for any number of additional events with a temperature of over 100 for 10 seconds before indicating a match:

select * from TemperatureSensorEvent
match_recognize (
partition by device
measures A.id as a_id, count(B.id) as count_b, first(B.id) as first_b, last(B.id) as last_b
pattern (A B*)
interval 5 seconds
define 
  A as A.temp > 100,
  B as B.temp > 100)

An example sequence of events that matches the pattern above is:

Table 6.13. Example

Arrival TimeTupleOutput Event (if any)
1000id=E1, device=1, temp=98 
2000id=E2, device=1, temp=101 
3000id=E3, device=1, temp=102 
4000id=E4, device=1, temp=104 
5000id=E5, device=1, temp=104 
7000 a_id=E2, count_b=3, first_b=E3, last_b=E5

Notice that the engine waits 5 seconds (5000 milliseconds) after the arrival time of the first event E2 of the match at 2000, to indicate the match at 7000.

6.9. Limitations

Please note the following limitations:

  1. Subqueries are not allowed in expressions within match_recognize.

  2. Joins and outer joins are not allowed in the same statement as match_recognize.

  3. match_recognize may not be used within on-select or on-insert statements.

  4. When using match_recognize on unbound streams (no data window provided) the iterator pull API returns no rows.

  5. A Statement Object Model API for match_recognize is not yet available.


© 2011 EsperTech Inc. All Rights Reserved