esper.codehaus.org and espertech.com Documentation

Chapter 10. EPL Reference: Enumeration Methods

10.1. Overview
10.2. Example Events
10.3. How to Use
10.3.1. Syntax
10.3.2. Introductory Examples
10.3.3. Input, Output and Limitations
10.4. Inputs
10.4.1. Subquery Results
10.4.2. Named Window
10.4.3. Event Property
10.4.4. Event Aggregation Function
10.4.5. prev, prevwindow and prevtail Single-Row Functions as Input
10.4.6. Single-Row Function, User-Defined Function and Enum Types
10.4.7. Declared Expression
10.4.8. Variables
10.4.9. Match-Recognize Group Variable
10.4.10. Pattern Repeat and Repeat-Until Operators
10.5. Example
10.6. Reference
10.6.1. Aggregate
10.6.2. AllOf
10.6.3. AnyOf
10.6.4. Average
10.6.5. CountOf
10.6.6. DistinctOf
10.6.7. Except
10.6.8. FirstOf
10.6.9. GroupBy
10.6.10. Intersect
10.6.11. LastOf
10.6.12. LeastFrequent
10.6.13. Max
10.6.14. MaxBy
10.6.15. Min
10.6.16. MinBy
10.6.17. MostFrequent
10.6.18. OrderBy and OrderByDesc
10.6.19. Reverse
10.6.20. SelectFrom
10.6.21. SequenceEqual
10.6.22. SumOf
10.6.23. Take
10.6.24. TakeLast
10.6.25. TakeWhile
10.6.26. TakeWhileLast
10.6.27. ToMap
10.6.28. Union
10.6.29. Where

10.1. Overview

EPL provides enumeration methods that work with lambda expressions to perform common tasks on subquery results, named windows, event properties or inputs that are or can be projected to a collection of events, scalar values or objects.

A lambda expression is an anonymous expression. Lambda expressions are useful for encapsulating user-defined expressions that are applied to each element in a collection. This section discusses built-in enumeration methods and their lambda expression parameters.

Lambda expressions use the lambda operator =>, which is read as "goes to". The left side of the lambda operator specifies the lambda expression input parameter(s) (if any) and the right side holds the expression. The lambda expression x => x * x is read "x goes to x times x". Lambda expressions are also used for expression declaration as discussed in Section 5.2.8, “Expression Declaration”.

When writing lambdas, you do not have to specify a type for the input parameter(s) or output result(s), because the engine infers all types from the input and the expression body. For example, if you are querying RFIDEvent events, the input variable is inferred to be an RFIDEvent, which means you have access to its properties and methods.
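As a rough analogy only (this is Java, not EPL), the EPL lambda x => x * x corresponds to the Java lambda x -> x * x, and in both cases the parameter type is inferred rather than declared. The class LambdaAnalogy and its members are hypothetical names used purely for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class: a Java analogy for the EPL lambda x => x * x.
class LambdaAnalogy {
    // Java infers the type of x from the target type Function<Integer, Integer>,
    // much as the engine infers EPL lambda parameter types from the input.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Applies the lambda to each element, returning a new list in input order.
    static List<Integer> squareAll(List<Integer> values) {
        return values.stream().map(SQUARE).collect(Collectors.toList());
    }
}
```

Applying squareAll to the list 1, 2, 3 yields 1, 4, 9, one result per element, much like an enumeration method applying its lambda to each element of a collection.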

With respect to enumeration methods, the term element means a single event, scalar value or object in a collection that is the input to an enumeration method. The term collection means a sequence or group of elements.

The table below summarizes the available built-in enumeration methods:

Table 10.1. Enumeration Methods

Method | Result
aggregate(seed, accumulator lambda)

Aggregate elements by using seed as an initial accumulator value and applying an accumulator expression.

Section 10.6.1, “Aggregate”.

allof(predicate lambda)

Return true when all elements satisfy a condition.

Section 10.6.2, “AllOf”.

anyof(predicate lambda)

Return true when any element satisfies a condition.

Section 10.6.3, “AnyOf”.

average()

Computes the average of values obtained from numeric elements.

Section 10.6.4, “Average”.

average(projection lambda)

Computes the average of values obtained from elements by invoking a projection expression on each element.

Section 10.6.4, “Average”.

countof()

Returns the number of elements.

Section 10.6.5, “CountOf”.

countof(predicate lambda)

Returns the number of elements that satisfy a condition.

Section 10.6.5, “CountOf”.

distinctOf()

Returns distinct elements according to default hash and equals semantics.

Section 10.6.6, “DistinctOf”.

distinctOf(key-selector lambda)

Returns distinct elements using the key function provided.

Section 10.6.6, “DistinctOf”.

except(source)

Produces the set difference of the two collections.

Section 10.6.7, “Except”.

firstof()

Returns the first element.

Section 10.6.8, “FirstOf”.

firstof(predicate lambda)

Returns the first element that satisfies a condition.

Section 10.6.8, “FirstOf”.

groupby(key-selector lambda)

Groups the elements according to a specified key-selector expression.

Section 10.6.9, “GroupBy”.

groupby(key-selector lambda, value-selector lambda)

Groups the elements according to a key-selector expression, mapping each element to a value according to a value-selector expression.

Section 10.6.9, “GroupBy”.

intersect(source)

Produces the set intersection of the two collections.

Section 10.6.10, “Intersect”.

lastof()

Returns the last element.

Section 10.6.11, “LastOf”.

lastof(predicate lambda)

Returns the last element that satisfies a condition.

Section 10.6.11, “LastOf”.

leastFrequent()

Returns the least frequent value among a collection of values.

Section 10.6.12, “LeastFrequent”.

leastFrequent(transform lambda)

Returns the least frequent value returned by the transform expression when applied to each element.

Section 10.6.12, “LeastFrequent”.

max()

Returns the maximum value among a collection of elements.

Section 10.6.13, “Max”.

max(value-selector lambda)

Returns the maximum value returned by the value-selector expression when applied to each element.

Section 10.6.13, “Max”.

maxby(value-selector lambda)

Returns the element that provides the maximum value returned by the value-selector expression when applied to each element.

Section 10.6.14, “MaxBy”.

min()

Returns the minimum value among a collection of elements.

Section 10.6.15, “Min”.

min(value-selector lambda)

Returns the minimum value returned by the value-selector expression when applied to each element.

Section 10.6.15, “Min”.

minby(value-selector lambda)

Returns the element that provides the minimum value returned by the value-selector expression when applied to each element.

Section 10.6.16, “MinBy”.

mostFrequent()

Returns the most frequent value among a collection of values.

Section 10.6.17, “MostFrequent”.

mostFrequent(transform lambda)

Returns the most frequent value returned by the transform expression when applied to each element.

Section 10.6.17, “MostFrequent”.

orderBy()

Sorts the elements in ascending order.

Section 10.6.18, “OrderBy and OrderByDesc”.

orderBy(key-selector lambda)

Sorts the elements in ascending order according to a key.

Section 10.6.18, “OrderBy and OrderByDesc”.

orderByDesc()

Sorts the elements in descending order.

Section 10.6.18, “OrderBy and OrderByDesc”.

orderByDesc(key-selector lambda)

Sorts the elements in descending order according to a key.

Section 10.6.18, “OrderBy and OrderByDesc”.

reverse()

Reverses the order of elements.

Section 10.6.19, “Reverse”.

selectFrom(transform lambda)

Transforms each element resulting in a collection of transformed elements.

Section 10.6.20, “SelectFrom”.

sequenceEqual(second)

Determines whether two collections are equal by comparing each element (equals semantics apply).

Section 10.6.21, “SequenceEqual”.

sumOf()

Computes the sum from a collection of numeric elements.

Section 10.6.22, “SumOf”.

sumOf(projection lambda)

Computes the sum by invoking a projection expression on each element.

Section 10.6.22, “SumOf”.

take(numElements)

Returns a specified number of contiguous elements from the start.

Section 10.6.23, “Take”.

takeLast(numElements)

Returns a specified number of contiguous elements from the end.

Section 10.6.24, “TakeLast”.

takeWhile(predicate lambda)

Returns elements from the start as long as a specified condition is true.

Section 10.6.25, “TakeWhile”.

takeWhile( (predicate, index) lambda)

Returns elements from the start as long as a specified condition is true, allowing each element's index to be used in the logic of the predicate expression.

Section 10.6.25, “TakeWhile”.

takeWhileLast(predicate lambda)

Returns elements from the end as long as a specified condition is true.

Section 10.6.26, “TakeWhileLast”.

takeWhileLast( (predicate, index) lambda)

Returns elements from the end as long as a specified condition is true, allowing each element's index to be used in the logic of the predicate expression.

Section 10.6.26, “TakeWhileLast”.

toMap(key-selector lambda, value-selector lambda)

Returns a Map according to the specified key-selector and value-selector expressions.

Section 10.6.27, “ToMap”.

union(source)

Forms a union of the input elements with source elements.

Section 10.6.28, “Union”.

where(predicate lambda)

Filters elements based on a predicate.

Section 10.6.29, “Where”.

where( (predicate, index) lambda)

Filters elements based on a predicate, allowing each element's index to be used in the logic of the predicate expression.

Section 10.6.29, “Where”.


10.2. Example Events

The examples in this section come from the domain of location report processing (RFID, asset tracking and similar):

The examples use two single-row functions that EPL itself does not provide: distance, which computes the distance between two points, and inrect, which determines whether a location falls within a rectangle. These example user-defined functions are not enumeration methods; they appear in the EPL statements only to make the examples realistic.

The Item event contains an asset id (assetId), an (x,y) location, a luggage flag that indicates whether the item is a piece of luggage (true) or a passenger (false), and assetIdPassenger, which holds the asset id of the associated passenger when the item is a piece of luggage.

The Item event is defined as follows (accessor methods not shown for brevity):

public class Item {
  String assetId;             // passenger or luggage asset id
  Location location;          // (x,y) location
  boolean luggage;            // true if this item is a luggage piece
  String assetIdPassenger;    // if the item is luggage, contains passenger associated
...

The LocationReport event contains the list of Item objects (items) for which it reports events.

The LocationReport event is defined as follows:

public class LocationReport {
  List<Item> items;
...

The Zone event contains a zone name and an (x1, y1, x2, y2) rectangle.

The Zone event is defined as follows:

public class Zone {
  String name;
  Rectangle rectangle;
...

The Location object is nested within Item and provides the current (x,y) location:

public class Location {
  int x;
  int y;
...

The Rectangle object is nested within Zone and provides the zone rectangle (x1,y1,x2,y2):

public class Rectangle {
  int x1;
  int y1;
  int x2;
  int y2;
...

It is not necessary to use Java classes for event representation. The examples apply equally to Object-array, Map or XML underlying events.
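Because distance and inrect are not part of EPL, their exact behavior is up to the application. The following is a minimal Java sketch of how they might be written as static methods. It assumes Euclidean distance and an axis-aligned rectangle test that includes the edges and expects x1 <= x2 and y1 <= y2. The class name LRUtil matches the LRUtil class referenced in the example statement later in this chapter. The method bodies, and the use of records for Location and Rectangle, are assumptions.

```java
// Minimal stand-ins for the Location and Rectangle classes shown above
// (records are used here only to keep the sketch short).
record Location(int x, int y) {}

record Rectangle(int x1, int y1, int x2, int y2) {}

// Hypothetical implementation of the example single-row functions.
final class LRUtil {
    private LRUtil() {}

    // Euclidean distance between (x1, y1) and (x2, y2).
    public static double distance(int x1, int y1, int x2, int y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    // True when the location lies inside the rectangle, edges included.
    // Assumes x1 <= x2 and y1 <= y2.
    public static boolean inrect(Rectangle r, Location loc) {
        return loc.x() >= r.x1() && loc.x() <= r.x2()
            && loc.y() >= r.y1() && loc.y() <= r.y2();
    }
}
```

Whether such methods are registered as plug-in single-row functions or invoked through an imported class is a configuration choice.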

10.3.3. Input, Output and Limitations

For most enumeration methods, the input can be any collection of events, scalar values or objects. Some enumeration methods have limitations, which are documented below. For example, the sumOf enumeration method requires a collection of numeric scalar values if used without parameters. If the input to sumOf is a collection of events or other non-numeric values, the enumeration method requires a lambda expression as a parameter that yields the numeric value to use for computing the sum.
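The difference between the two forms of sumOf can be shown with a Java analogy (this is not the engine's implementation). Numeric elements can be summed directly, whereas events must first be projected to a number. The Order record and the SumOfAnalogy class are hypothetical.

```java
import java.util.List;

// Hypothetical analogy for sumOf() versus sumOf(projection lambda).
class SumOfAnalogy {
    // Stand-in event type with a numeric property.
    record Order(String id, double price) {}

    // Like sumOf(): the elements themselves are numeric.
    static double sumOf(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).sum();
    }

    // Like sumOf(o => o.price): a projection yields the number to add.
    static double sumOfPrices(List<Order> orders) {
        return orders.stream().mapToDouble(Order::price).sum();
    }
}
```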

Many examples in this section operate on the collection returned by the event property items of the LocationReport event class. Many other inputs yield collections, as listed below. Most examples here use an event property as input simply because the example can then be brief and does not need to refer to a subquery, named window or other concept.

For enumeration methods that return a collection, for example where and orderBy, the engine outputs an implementation of the Collection interface that contains the selected value(s). The returned collection must be considered read-only. Because Java iterators cannot be reset, the engine returns a Collection rather than an iterator, which gives more flexibility for querying the size and navigating among elements. We recommend against down-casting a collection returned by the engine to a more specific subtype of the Collection interface.

For enumeration methods that return an element, for example firstOf, lastOf, minBy and maxBy, the engine outputs the scalar value or, when operating on events, the underlying event. You may add an event property name after the enumeration method to return a property value.
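The following Java analogy uses a hypothetical Item record reduced to an asset id and an x coordinate. It shows the difference between comparing elements by a value and returning the element itself. The maxBy-like method returns the whole element, and reading assetId from that element corresponds to appending a property name after the enumeration method. Returning null for an empty input is a choice made in this sketch, not a statement about the engine.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical analogy: maxBy yields an element, and a property can then be read from it.
class MaxByAnalogy {
    // Stand-in for an event with an asset id and an x coordinate.
    record Item(String assetId, int x) {}

    // Like items.maxBy(i => i.x): the whole element with the largest x.
    static Optional<Item> maxByX(List<Item> items) {
        return items.stream().max(Comparator.comparingInt(Item::x));
    }

    // Like items.maxBy(i => i.x).assetId: a property of that element.
    // This sketch returns null for an empty input.
    static String assetIdOfMaxX(List<Item> items) {
        return maxByX(items).map(Item::assetId).orElse(null);
    }
}
```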

Enumeration methods generally retain the order of elements provided by the collection.

The following restrictions apply to enumeration methods:

10.4. Inputs

The input to built-in enumeration methods is a collection of scalar values, events or other objects. Input can originate from any of the following:

10.4.1. Subquery Results

Subqueries can return the rows of another stream's data window or rows from a named window. By providing a where-clause the rows returned by a subquery can be correlated to data provided by stream(s) in the from-clause. See Section 5.11, “Subqueries”.

A subquery that selects the * wildcard provides a collection of events as input. A subquery that selects a single value expression provides a collection of scalar values as input. Subqueries that select multiple value expressions are not allowed as input to enumeration methods.

The following example uses a subquery to retrieve all zones for each location report item where the location falls within the rectangle of the zone. See the description of the example events and functions above.

select assetId,
  (select * from Zone.std:unique(name)).where(z => inrect(z.rectangle, location)) as zones
from Item

You may place the subquery in an expression declaration to reuse the subquery in multiple places of the same EPL statement.

This sample EPL declares the same query as above in an expression declaration:

expression myquery {itm =>
  (select * from Zone.std:unique(name)).where(z => inrect(z.rectangle, itm.location))
}
select assetId, myquery(item) as subq, 
    myquery(item).where(z => z.name = 'Z01') as assetItem
from Item as item

The query above also demonstrates how an enumeration method (here, where) can be applied to the results returned by a subquery in an expression declaration.

Place a single column in the subquery select-clause to provide a collection of scalar values as input.

The next example selects the names of all zones and returns an ordered collection of zone names every 30 seconds:

select (select name from Zone.std:unique(name)).orderBy() as orderedZones
from pattern[every timer:interval(30)]

The next example uses a subquery that counts zone events per name and finds those names that have a count greater than 1:

select (select name, count(*) as cnt from Zone.win:keepall() group by name)
  .where(v => cnt > 1) from LocationReport
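For readers who think in Java collections, the subquery and where-filter above compute roughly what the following sketch does. It counts occurrences per zone name and then keeps the names with a count greater than 1. This is an analogy that assumes the input is simply the list of zone names seen so far. The class ZoneCountAnalogy is hypothetical, and the result is sorted only to make it deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical analogy: count zone events per name, keep names seen more than once.
class ZoneCountAnalogy {
    static List<String> namesWithCountAboveOne(List<String> zoneNames) {
        // Like "select name, count(*) as cnt ... group by name".
        Map<String, Long> counts = zoneNames.stream()
            .collect(Collectors.groupingBy(n -> n, Collectors.counting()));
        // Like ".where(v => cnt > 1)".
        return counts.entrySet().stream()
            .filter(e -> e.getValue() > 1)
            .map(Map.Entry::getKey)
            .sorted()   // sorted only to make the result deterministic
            .collect(Collectors.toList());
    }
}
```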

When the subquery selects a single column that is itself an event, the result of the subquery is a collection of events of that type and can provide input to enumeration methods.

For example:

create schema SettlementEvent (symbol string);
create schema PriceEvent (symbol string, price double);
create schema OrderEvent (orderId string, pricedata PriceEvent);
select (select pricedata from OrderEvent.std:unique(orderId))
  .anyOf(v => v.symbol = 'GE') as has_ge from SettlementEvent(symbol = 'GE')
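The anyOf predicate above resembles Java's Stream.anyMatch. The following is a hypothetical Java analogy that uses a stand-in PriceEvent record with the same properties as the schema. The class name AnyOfAnalogy is illustrative only.

```java
import java.util.List;

// Hypothetical analogy for pricedata.anyOf(v => v.symbol = 'GE').
class AnyOfAnalogy {
    // Stand-in for the PriceEvent schema (symbol string, price double).
    record PriceEvent(String symbol, double price) {}

    // True when at least one element satisfies the predicate;
    // Stream.anyMatch returns false for an empty list.
    static boolean hasGe(List<PriceEvent> prices) {
        return prices.stream().anyMatch(v -> "GE".equals(v.symbol()));
    }
}
```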

Note that the engine can cache intermediate results and is therefore not forced to re-evaluate the subquery for each occurrence in the select-clause.

10.4.2. Named Window

Named windows are globally-visible data windows. See Section 5.15, “Creating and Using Named Windows”.

You may specify the named window name as input for an enumeration method and can optionally provide a correlation where-clause. The syntax is equivalent to a subquery against a named window but much shorter.

Synopsis:

named-window-name[(correlation-expression)].enum-method-name(...)

When selecting all events in a named window you do not need the correlation-expression. To select a subset of data in the named window, specify a correlation-expression. For best runtime performance, prefer a correlation expression that reduces the number of rows returned.

The following example first declares a named window to hold the last zone event per zone name:

create window ZoneWindow.std:unique(name) as Zone

Next, create a statement that inserts arriving zone events into the named window:

insert into ZoneWindow select * from Zone

Finally this statement queries the named window to retrieve all zones for each location report item where the location falls within the rectangle of the zone:

select ZoneWindow.where(z => inrect(z.rectangle, location)) as zones from Item

If you have a filter or correlation expression, append the expression to the named window name and place it in parentheses.

The following query modifies the example above by adding a filter expression so that only zones named Z1, Z2 or Z3 are considered:

select ZoneWindow(name in ('Z1', 'Z2', 'Z3')).where(z => inrect(z.rectangle, location)) as zones 
from Item

You may prefix property names provided by the named window with the named window name to disambiguate property names.

This sample query prefixes the name property and returns the count of matching zones:

select ZoneWindow(ZoneWindow.name in ('Z1', 'Z2', 'Z3')).countof() as zoneCount
from Item

The engine internally interprets the shortcut syntax and creates a subquery from it. Thus all indexing and query planning for subqueries against named windows apply here as well.

10.4.4. Event Aggregation Function

Event aggregation functions return an event or multiple events. As aggregation functions, they are sensitive to the presence of group by. See Section 9.2.2, “Event Aggregation Functions”.

You can use the window, first or last event aggregation functions as input to an enumeration method. Specify the * wildcard as the parameter to the event aggregation function to provide a collection of events as input, or specify a property name as the parameter to the event aggregation function to provide a collection of scalar values as input.

You can use the sorted, maxby, minby, maxbyever or minbyever event aggregation functions as input to an enumeration method. Specify one or more criteria expressions that provide the sort order as parameters to the event aggregation function.

In this example query, the window(*) aggregation function returns the last 10 seconds of item location reports for the same asset id as the incoming event. For each arriving Item event, the enumeration method returns, among those events for the same asset id, the item location reports whose distance to the center is less than 20.

Sample query:

select window(*).where(p => distance(0, 0, p.location.x, p.location.y) < 20) as centeritems
from Item(type='P').win:time(10) group by assetId

The next sample query instead selects the asset id property of all events and returns an ordered collection:

select window(assetId).orderBy() as orderedAssetIds
from Item.win:time(10) group by assetId

The following example outputs the 5 highest prices per symbol among the last 10 seconds of stock ticks:

select sorted(price desc).take(5) as highest5PricesPerSymbol
from StockTick.win:time(10) group by symbol
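The sorted(price desc).take(5) combination corresponds roughly to sorting in descending order and keeping at most the first five elements. The following Java analogy covers the prices of a single symbol only, so grouping by symbol and the 10-second window are left out. TopPricesAnalogy is a hypothetical name.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical analogy for sorted(price desc).take(n) over one symbol's prices.
class TopPricesAnalogy {
    static List<Double> highest(List<Double> prices, int n) {
        return prices.stream()
            .sorted(Comparator.reverseOrder())  // like sorted(price desc)
            .limit(n)                           // like take(n): at most n elements
            .collect(Collectors.toList());
    }
}
```

With fewer than n input prices, this sketch simply returns all of them in descending order.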

10.4.5. prev, prevwindow and prevtail Single-Row Functions as Input

The prev, prevwindow and prevtail single-row functions allow access into a stream's data window. However, they are not aggregation functions and as such are not sensitive to the presence of group by. See Section 9.1.11, “The Previous-Window Function”.

When using any of the prev single-row functions as input to a built-in enumeration method, you can specify either the stream name or an event property as a parameter to the function. The input to the enumeration method is a collection of events if you specify the stream name, or a collection of scalar values if you specify an event property.

In this example query, the prevwindow(stream) single-row function returns the item location reports of the last 10 seconds, considering passenger-type Item events only (see filter type = 'P'). For each arriving Item event, the enumeration method then selects those item location reports whose distance to the center is less than 20.

Sample query:

select prevwindow(items)
    .where(p => distance(0, 0, p.location.x, p.location.y) < 20) as centeritems
from Item(type='P').win:time(10) as items

This sample query demonstrates the use of the prevwindow function to return a collection of scalar values (a collection of asset ids) as input to orderBy:

select prevwindow(assetId).orderBy() as orderedAssetIds
from Item.win:time(10) as items

10.4.6. Single-Row Function, User-Defined Function and Enum Types

Your single-row or user-defined function can return either an array or any collection that implements the Collection or Iterable interface. For arrays, the array component type, and for collections and iterables, the generic type parameter, should be the class that provides the event properties.

As an example, assume a ZoneFactory class exists whose static method getZones() returns a list of zones against which to filter items:

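import java.util.ArrayList;
import java.util.List;

public class ZoneFactory {
  // Returns the zones to test item locations against (a fixed list in this example).
  public static Iterable<Zone> getZones() {
    List<Zone> zones = new ArrayList<Zone>();
    zones.add(new Zone("Z1", new Rectangle(0, 0, 20, 20)));
    return zones;
  }
}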

Import the class through runtime or static configuration, or add the method above as a plug-in single-row function.

For each Item event, the following query returns all zones that the item falls within:

select ZoneFactory.getZones().where(z => inrect(z.rectangle, item.location)) as zones
from Item as item

If the class and method were registered as a plug-in single-row function, you can leave the class name off, for example:

select getZones().where(z => inrect(z.rectangle, item.location)) as zones
from Item as item

Your single-row or user-defined function can also return an array, collection or iterable of scalar values.

For example, the static method getZoneNames() returns a list of zone names:

public static String[] getZoneNames() { 
  return new String[] { "Z1", "Z2"};
}

The following query returns zone names every 30 seconds and excludes zone Z1:

select getZoneNames().where(z => z != "Z1")
from pattern[every timer:interval(30)]

An enum type can also be a useful source for enumerable values.

The following Java sample declares an enum type EnumOfZones:

public enum EnumOfZones {
  ZONES_OUTSIDE(new String[] {"z1", "z2"}),
  ZONES_INSIDE(new String[] {"z3", "z4"});

  private final String[] zones;

  private EnumOfZones(String[] zones) {
    this.zones = zones;
  }

  // Returns the zone names associated with this constant.
  public String[] getZones() {
    return zones;
  }
}

A sample statement that utilizes the enum type is shown next:

select EnumOfZones.ZONES_OUTSIDE.getZones().anyOf(v => v = zone) from Item

10.5. Example

Following the RFID asset tracking example as introduced earlier, this section introduces two use cases solved by enumeration methods.

The first use case requires us to find any luggage that is more than 20 units away from the passenger that the luggage belongs to. The declared expression lostLuggage answers this question.

The second question is: for each piece of lost luggage, which single other passenger is nearest to it? The declared expression nearestOwner, which uses lostLuggage, answers this question.

Below is the complete EPL, which forms a single statement:

// expression to return a collection of lost luggage
expression lostLuggage {
  lr => lr.items.where(l => l.type='L' and
    lr.items.anyOf(p => p.type='P' and p.assetId=l.assetIdPassenger
      and LRUtil.distance(l.location.x, l.location.y, p.location.x, p.location.y) > 20))
}

// expression to return all passengers
expression passengers {
  lr => lr.items.where(l => l.type='P')
}

// expression to find the nearest owner
expression nearestOwner {
  lr => lostLuggage(lr).toMap(key => key.assetId,
    value => passengers(lr).minBy(
        p => LRUtil.distance(value.location.x, value.location.y, p.location.x, p.location.y))
    )
}

select lostLuggage(lr) as val1, nearestOwner(lr) as val2 from LocationReport lr
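The following Java restatement uses streams and may make the logic of the three declared expressions easier to follow. It is a sketch under stated assumptions. Item is flattened into a hypothetical record with inline x and y coordinates, the luggage flag of the Item class stands in for type='L' and type='P', and distance is assumed to be Euclidean. LostLuggageSketch is a hypothetical name.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical Java restatement of lostLuggage, passengers and nearestOwner.
// Assumption: the luggage flag stands in for type='L' (true) / type='P' (false).
class LostLuggageSketch {
    // Flattened stand-in for Item: the location is inlined as x and y.
    record Item(String assetId, int x, int y, boolean luggage, String assetIdPassenger) {}

    // Euclidean distance, standing in for LRUtil.distance.
    static double distance(int x1, int y1, int x2, int y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    // Luggage whose own passenger is more than 20 units away (where + anyOf).
    static List<Item> lostLuggage(List<Item> items) {
        return items.stream()
            .filter(l -> l.luggage() && items.stream().anyMatch(p ->
                !p.luggage()
                    && p.assetId().equals(l.assetIdPassenger())
                    && distance(l.x(), l.y(), p.x(), p.y()) > 20))
            .collect(Collectors.toList());
    }

    // All passenger items (where l.type='P').
    static List<Item> passengers(List<Item> items) {
        return items.stream().filter(i -> !i.luggage()).collect(Collectors.toList());
    }

    // Lost luggage asset id mapped to the nearest passenger (toMap + minBy).
    // Like the EPL, every passenger is considered; with no passengers at all,
    // this sketch leaves the entry out.
    static Map<String, Item> nearestOwner(List<Item> items) {
        List<Item> passengers = passengers(items);
        Map<String, Item> result = new LinkedHashMap<>();
        for (Item l : lostLuggage(items)) {
            passengers.stream()
                .min(Comparator.comparingDouble((Item p) -> distance(l.x(), l.y(), p.x(), p.y())))
                .ifPresent(p -> result.put(l.assetId(), p));
        }
        return result;
    }
}
```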

10.6. Reference

10.6.1. Aggregate

The aggregate enumeration method takes an expression providing the initialization value (seed) and an accumulator lambda expression. The return value is the final accumulator value.

Via the aggregate method you may perform a calculation over elements. The method initializes the aggregated value by evaluating the expression provided in the first parameter. The method then calls the lambda expression of the second parameter once for each element in the input. The lambda expression receives the last aggregated value and the element from the input. The result of the expression replaces the previous aggregated value. After all elements are processed, the method returns the final aggregated value.

An expression example with scalar values:

{1, 2, 3}.aggregate(0, (result, value) => result + value)  // Returns 6

The example below aggregates the price of each OrderEvent in the last 10 seconds, computing a total price:

// Initialization value is zero.
// Aggregate by adding up the price.
select window(*).aggregate(0, (result, order) => result + order.price) as totalPrice
from OrderEvent.win:time(10)

In the query above, the initialization value is zero, result holds the last aggregated value, and order denotes the element whose price property the expression adds.

This example aggregation builds a comma-separated list of all asset ids of all items:

select items.aggregate('', 
  (result, item) => result || (case when result='' then '' else ',' end) || item.assetId) as assets
from LocationReport

In the query above, the empty string '' represents the initialization value. The name result is used for the last aggregated value and the name item is used to denote the element.

The type of value returned by the initialization expression must match the type of value returned by the accumulator lambda expression.

If the input is null the method returns null. If the input is empty the method returns the initialization value.
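The aggregate semantics described in this section (seed, accumulator, null input and empty input) can be summarized in a small generic Java helper. This is a sketch for illustration, not the engine's implementation, and AggregateSketch is a hypothetical name. Java generics also make the type-matching rule explicit, because the seed and the accumulator result share the type parameter R.

```java
import java.util.Collection;
import java.util.function.BiFunction;

// Hypothetical helper mirroring aggregate(seed, (result, value) => ...).
class AggregateSketch {
    // Returns null for null input and the seed for empty input.
    static <T, R> R aggregate(Collection<T> input, R seed, BiFunction<R, T, R> accumulator) {
        if (input == null) {
            return null;
        }
        R result = seed;
        for (T value : input) {
            // The accumulator's result replaces the previous aggregated value.
            result = accumulator.apply(result, value);
        }
        return result;
    }
}
```

For example, aggregate(List.of(1, 2, 3), 0, (r, v) -> r + v) yields 6, matching the scalar example above. The accumulator (r, v) -> r + (r.isEmpty() ? "" : ",") + v with seed "" builds a comma-separated list in the same way as the asset id query.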

10.6.20. SelectFrom

The selectFrom enumeration method transforms each element resulting in a collection of transformed elements.

The enumeration method applies a transformation lambda expression to each element and returns the result of each transformation as a collection. Use the new operator to yield multiple values for each element, see Section 8.13, “The 'new' Keyword”.

The next EPL query returns a collection of asset ids:

select items.selectFrom(i => assetId) as itemAssetIds from LocationReport

This sample EPL query evaluates each item and returns the asset id as well as the distance from center for each item:

select items.selectFrom(i => 
  new {
    assetId, 
    distanceCenter = distance(i.location.x, i.location.y, 0, 0)
  } ) as itemInfo from LocationReport

If the input is null the method returns null. If the input is empty the method returns an empty collection.
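A comparable Java sketch of selectFrom maps each element with a transform function and follows the null and empty-input rules stated above. SelectFromSketch is a hypothetical name, and this is an analogy rather than the engine's implementation.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper mirroring selectFrom(transform lambda).
class SelectFromSketch {
    // Null input yields null; empty input yields an empty list; order is kept.
    static <T, R> List<R> selectFrom(Collection<T> input, Function<T, R> transform) {
        if (input == null) {
            return null;
        }
        return input.stream().map(transform).collect(Collectors.toList());
    }
}
```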