www.espertech.comDocumentation

Chapter 5. EPL Reference: Clauses

5.1. EPL Introduction
5.2. EPL Syntax
5.2.1. Specifying Time Periods
5.2.2. Using Comments
5.2.3. Reserved Keywords
5.2.4. Escaping Strings
5.2.5. Data Types
5.2.6. Using Constants and Enum Types
5.2.7. Annotation
5.2.8. Expression Alias
5.2.9. Expression Declaration
5.2.10. Script Declaration
5.2.11. Referring to a Context
5.2.12. Composite Keys and Array Values as Keys
5.3. Choosing Event Properties and Events: The Select Clause
5.3.1. Choosing the Event Itself: Select *
5.3.2. Choosing Specific Event Properties
5.3.3. Expressions
5.3.4. Renaming Event Properties
5.3.5. Choosing Event Properties and Events in a Join
5.3.6. Choosing Event Properties and Events From a Pattern
5.3.7. Selecting Insert and Remove Stream Events
5.3.8. Select Distinct
5.3.9. Transposing an Expression Result to a Stream
5.3.10. Selecting EventBean Instead of Underlying Event
5.4. Specifying Event Streams: The From Clause
5.4.1. Filter-Based Event Streams
5.4.2. Pattern-Based Event Streams
5.4.3. Specifying Data Windows
5.4.4. Multiple Data Windows
5.4.5. Using the Stream Name
5.5. Specifying Search Conditions: The Where Clause
5.6. Aggregates and Grouping: The Group-By Clause and the Having Clause
5.6.1. Using Aggregate Functions
5.6.2. Organizing Statement Results into Groups: The Group-by Clause
5.6.3. Using Group-By with Rollup, Cube and Grouping Sets
5.6.4. Specifying Grouping for Each Aggregation Function
5.6.5. Specifying a Filter Expression for Each Aggregation Function
5.6.6. Selecting Groups of Events: The Having Clause
5.6.7. How the Stream Filter, Where, Group By and Having-Clauses Interact
5.6.8. Comparing Keyed Segmented Context, the Group By Clause and #groupwin for Data Windows
5.7. Stabilizing and Controlling Output: The Output Clause
5.7.1. Output Clause Options
5.7.2. Aggregation, Group By, Having and Output Clause Interaction
5.8. Sorting Output: the Order By Clause
5.9. Limiting Row Count: the Limit Clause
5.10. Merging Streams and Continuous Insertion: The Insert Into Clause
5.10.1. Transposing a Property to a Stream
5.10.2. Merging Streams by Event Type
5.10.3. Merging Disparate Types of Events: Variant Streams
5.10.4. Decorated Events
5.10.5. Event as a Property
5.10.6. Instantiating and Populating an Underlying Event Object
5.10.7. Transposing an Expression Result
5.10.8. Select-Clause Expression and Inserted-Into Column Event Type
5.10.9. Insert Into for Event Types Without Properties
5.11. Subqueries
5.11.1. The 'Exists' Keyword
5.11.2. The 'In' and 'Not In' Keywords
5.11.3. The 'Any' and 'Some' Keywords
5.11.4. The 'All' Keyword
5.11.5. Subquery With Group By Clause
5.11.6. Multi-Column Selection
5.11.7. Multi-Row Selection
5.11.8. Hints Related to Subqueries
5.12. Joining Event Streams
5.12.1. Introducing Joins
5.12.2. Inner (Default) Joins
5.12.3. Outer, Left and Right Joins
5.12.4. Unidirectional Joins
5.12.5. Unidirectional Full Outer Joins
5.12.6. Hints Related to Joins
5.13. Accessing Relational Data via SQL
5.13.1. Joining SQL Query Results
5.13.2. SQL Query and the EPL Where Clause
5.13.3. Outer Joins With SQL Queries
5.13.4. Using Patterns to Request (Poll) Data
5.13.5. Polling SQL Queries via Iterator
5.13.6. JDBC Implementation Overview
5.13.7. Oracle Drivers and No-Metadata Workaround
5.13.8. SQL Input Parameter and Column Output Conversion
5.13.9. SQL Row POJO Conversion
5.14. Accessing Non-Relational Data via Method, Script or UDF Invocation
5.14.1. Joining Method, Script or UDF Invocation Results
5.14.2. Polling Invocation Results via Iterator
5.14.3. Providing the Method
5.14.4. Using a Map Return Type
5.14.5. Using a Object Array Return Type
5.14.6. Using an EventBean Return Type
5.14.7. Providing the Script
5.14.8. Providing the UDF
5.15. Declaring an Event Type: Create Schema
5.15.1. Declare an Event Type by Providing Names and Types
5.15.2. Declare an Event Type by Providing a Class Name
5.15.3. Declare a Variant Stream
5.16. Splitting and Duplicating Streams
5.16.1. Generating Marker Events for Contained Events
5.17. Variables and Constants
5.17.1. Creating Variables: The Create Variable Clause
5.17.2. Setting Variable Values: The On Set Clause
5.17.3. Using Variables
5.17.4. Object-Type Variables
5.17.5. Class and Event-Type Variables
5.18. Declaring Global Expressions, Aliases and Scripts: Create Expression
5.18.1. Global Expression Aliases
5.18.2. Global Expression Declarations
5.18.3. Global Scripts
5.19. Contained-Event Selection
5.19.1. Select-Clause in a Contained-Event Selection
5.19.2. Where Clause in a Contained-Event Selection
5.19.3. Contained-Event Selection and Joins
5.19.4. Sentence and Word Example
5.19.5. More Examples
5.19.6. Contained Expression Returning an Array of Property Values
5.19.7. Contained Expression Returning an Array of EventBean
5.19.8. Generating Marker Events Such as a Begin and End Event
5.19.9. Contained-Event Limitations
5.20. Updating an Insert Stream: The Update IStream Clause
5.20.1. Immutability and Updates
5.21. Controlling Event Delivery : The For Clause

The Event Processing Language (EPL) is a SQL-standard language with extensions, offering SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY clauses. Streams replace tables as the source of data with events replacing rows as the basic unit of data. Since events are composed of data, the SQL concepts of correlation through joins, subqueries and aggregation through grouping can be effectively leveraged.

The INSERT INTO clause is recast as a means of forwarding events to other streams for further processing. External data may be queried and joined with the stream data. Additional clauses such as the PATTERN and OUTPUT clauses are available to provide the missing SQL language constructs specific to event processing.

Statements can specify data windows. Similar to tables in a SQL statement, data windows define the subset of events to be analyzed. Data windows can be combined to an intersection or union of sets of events. Some of the often-used data windows are #length, #time, #unique, #lastevent, #firstevent and #keepall.

EPL provides the concept of named window. Named windows are data windows that can be used by multiple statements. The name of a named window can occur in a statement's FROM clause to query the named window or to include the named window in a join or subquery.

EPL provides the concept of table. Tables are globally-visible data structures that typically have primary key columns and that can hold aggregation state. An overview of named windows and tables, and a comparison between them, can be found at Section 6.1, “Overview”.

EPL allows execution of fire-and-forget (on-demand, non-continuous, triggered by API) queries against named windows and tables through the runtime API. The statement compiler automatically indexes named window data for fast access by ON SELECT/MERGE/UPDATE/INSERT/DELETE without the need to create an index explicitly, or can access explicit (secondary) table indexes for operations on tables. For fast fire-and-forget query execution via runtime API use the CREATE INDEX syntax to create an explicit index for the named window or table in question.

Use CREATE SCHEMA to declare an event type.

Variables can come in handy to parameterize statements and change parameters on-the-fly and in response to events. Variables can be used in an expression anywhere in a statement as well as in the output clause for dynamic control of output rates.

The compiler and runtime can be extended by plugging-in custom developed data windows, aggregation functions, and more.

Statement are compiled and deployed into the runtime, and publish results to listeners as events are received by the runtime or time advances that match the criteria specified in the statement. Events can also be obtained from polling statement via the safeIterator and iterator methods that provide a pull-data API.

The select clause in a statement specifies the event properties or events to retrieve. The from clause in a statement specifies the event stream definitions and stream names to use. The where clause in n statement specifies search conditions that specify which event or event combination to search for. For example, the following statement returns the average price for IBM stock ticks in the last 30 seconds.

select avg(price) from StockTick#time(30 sec) where symbol='IBM'

Statements follow the below syntax. Statements can be simple queries or more complex queries. A simple select contains only a select clause and a single stream definition. Complex statements can be build that feature a more elaborate select list utilizing expressions, may join multiple streams, may contain a where clause with search conditions and so on.

[annotations]
[expression_declarations]
[context context_name]
[into table table_name]
[insert into insert_into_def]
select select_list
from stream_def [as name] [, stream_def [as name]] [,...]
[where search_conditions]
[group by grouping_expression_list]
[having grouping_search_conditions]
[output output_specification]
[order by order_by_expression_list]
[limit num_rows]

Certain words such as select, delete or set are reserved and may not be used as identifiers. Please consult Appendix C, Reserved Keywords for the list of reserved keywords and permitted keywords.

Names of built-in functions and certain auxiliary keywords are permitted as event property names and in the rename syntax of the select clause. For example, count is acceptable.

Consider the example below, which assumes that 'last' is an event property of MyEvent:

// valid
select last, count(*) as count from MyEvent

This example shows an incorrect use of a reserved keyword:

// invalid
select insert from MyEvent

EPL offers an escape syntax for reserved keywords: Event properties as well as event or stream names may be escaped via the backwards apostrophe ` (ASCII 96) character.

The next example queries an event type by name Order (a reserved keyword) that provides a property by name insert (a reserved keyword):

// valid
select `insert` from `Order`

You may surround string values by either double-quotes (") or single-quotes ('). When your string constant in a statement itself contains double quotes or single quotes, you must escape the quotes.

Double and single quotes may be escaped by the backslash (\) character or by unicode notation. Unicode 0027 is a single quote (') and 0022 is a double quote (").

Escaping event property names is described in Section 3.2.1, “Escape Characters”.

The sample EPL below escapes the single quote in the string constant John's, and filters out order events where the name value matches:

select * from OrderEvent(name='John\'s')
// ...equivalent to...
select * from OrderEvent(name='John\u0027s')

The next EPL escapes the string constant Quote "Hello":

select * from OrderEvent(description like "Quote \"Hello\"")
// is equivalent to
select * from OrderEvent(description like "Quote \u0022Hello\u0022")

When building an escape string via the API, escape the backslash, as shown in below code snippet:

compiler.compile("select * from OrderEvent(name='John\\'s')", ...);
// ... and for double quotes...
compiler.compile("select * from OrderEvent(description like \"Quote \\\"Hello\\\"\")", ...);

For NEsper .NET also see Section J.12, “.NET EPL Syntax - Data Types”.

EPL honors all Java built-in primitive and boxed types, including java.math.BigInteger and java.math.BigDecimal.

EPL also follows Java standards in terms of widening, performing widening automatically in cases where widening type conversion is allowed without loss of precision, for both boxed and primitive types and including BigInteger and BigDecimal:

  1. byte to short, int, long, float, double, BigInteger or BigDecimal

  2. short to int, long, float, or double, BigInteger or BigDecimal

  3. char to int, long, float, or double, BigInteger or BigDecimal

  4. int to long, float, or double, BigInteger or BigDecimal

  5. long to float or double, BigInteger or BigDecimal

  6. float to double or BigDecimal

  7. double to BigDecimal

In cases where loss of precision is possible because of narrowing requirements, EPL compilation outputs a compilation error.

EPL supports casting via the cast function.

EPL returns double-type values for division regardless of operand type. EPL can also be configured to follow Java rules for integer arithmetic instead as described in Section 17.5.6, “Compiler Settings Related to Expression Evaluation”.

Division by zero returns positive or negative infinity. Division by zero can be configured to return null instead.

This chapter is about Java language constants and enum types and their use in EPL expressions.

Java language constants are public static final fields in Java that may participate in expressions of all kinds, as this example shows:

select * from MyEvent where property = MyConstantClass.FIELD_VALUE

Event properties that are enumeration values can be compared by their enum type value:

select * from MyEvent where enumProp = EnumClass.ENUM_VALUE_1

Event properties can also be passed to enum type functions or compared to an enum type method result:

select * from MyEvent where somevalue = EnumClass.ENUM_VALUE_1.getSomeValue()
  or EnumClass.ENUM_VALUE_2.analyze(someothervalue)

Enum types have a valueOf method that returns the enum type value:

select * from MyEvent where enumProp = EnumClass.valueOf('ENUM_VALUE_1')

If your application does not import, through configuration, the package that contains the enumeration class, then it must also specify the package name of the class. Enum types that are inner classes must be qualified with $ following Java conventions.

For example, the Color enum type as an inner class to MyEvent in package org.myorg can be referenced as shown:

select * from MyEvent(enumProp=org.myorg.MyEvent$Color.GREEN)#firstevent

Instance methods may also be invoked on event instances by specifying a stream name, as shown below:

select myevent.computeSomething() as result from MyEvent as myevent

Chaining instance methods is supported as this example shows:

select myevent.getComputerFor('books', 'movies').calculate() as result 
from MyEvent as myevent

An annotation is an addition made to information in a statement. EPL provides certain built-in annotations for defining statement name, adding a statement description or for tagging statements such as for managing statements or directing statement output. Other than the built-in annotations, applications can provide their own annotation classes that the EPL compiler can populate.

An annotation is part of the statement text and precedes the statement. Annotations are therefore part of the EPL grammar. The syntax for annotations follows the host language (Java, .NET) annotation syntax:

@annotation_name [(annotation_parameters)]

An annotation consists of the annotation name and optional annotation parameters. The annotation_name is the simple class name or fully-qualified class name of the annotation class. The optional annotation_parameters are a list of key-value pairs following the syntax:

@annotation_name (attribute_name = attribute_value, [name=value, ...])

The attribute_name is an identifier that must match the attributes defined by the annotation class. An attribute_value is a constant of any of the primitive types or string, an array, an enum type value or another (nested) annotation. Null values are not allowed as annotation attribute values. Enumeration values are supported in statements and not support in statements created via the createPattern method.

Use the getAnnotations method of EPStatement to obtain annotations.

Your application may provide its own annotation classes. The compiler detects and populates annotation instances for application annotation classes.

The name of application-provided annotations is case-sensitive.

To enable the compiler to recognize application annotation classes, your annotation name must include the package name (i.e. be fully-qualified) or your compiler configuration must import the annotation class or package via the configuration API.

For example, assume that your application defines an annotation in its application code as follows:

public @interface ProcessMonitor {
  String processName();
  boolean isLongRunning default false;
  int[] subProcessIds;
}

Shown next is a statement that utilizes the annotation class defined earlier:

@ProcessMonitor(processName='CreditApproval',
  isLongRunning=true, subProcessIds = {1, 2, 3} )
  
select count(*) from ProcessEvent(processId in (1, 2, 3)#time(30)

Above example assumes the ProcessMonitor annotation class is imported via configuration XML or API.

If ProcessMonitor should only be visible for use in annotations, use addAnnotationImport (or the auto-import-annotations XML tag). If ProcessMonitor should be visible in all of EPL including annotations, use addImport (or the auto-import XML tag).

Here is an example API call to import for annotation-only all classes in package com.mycompany.app.myannotations:

Configuration configuration = new Configuration();
configuration.getCommon().addAnnotationImport("com.mycompany.app.myannotations.*");

The next example imports the ProcessMonitor class only and only for annotation use:

Configuration configuration = new Configuration();
configuration.getCommon().addAnnotationImport("com.mycompany.myannotations.ProcessMonitor");

The name of built-in annotations is not case-sensitive, allowing both @NAME or @name, for example.

The list of built-in statement-level annotations is:

Table 5.2. Built-In Statement Annotations

NamePurpose and AttributesExample
Name

Provides a statement name. Attributes are:

value : Statement name.

@Name("MyStatementName")
Description

Provides a statement description. Attributes are:

value : Statement description.

@Description("Place statement
description here.")
Tag

For tagging a statement with additional information. Attributes are:

name : Tag name.

value : Tag value.

@Tag(name="MyTagName", 
 value="MyTagValue")
Priority

Applicable when an event (or schedule) matches filter criteria for multiple statements: Defines the order of statement processing (requires an runtime-level setting).

Attributes are:

value : priority value.

@Priority(10)
Drop

Applicable when an event (or schedule) matches filter criteria for multiple statements, drops the event after processing the statement (requires a runtime-level setting).

No attributes.

@Drop
Hint

For providing one or more hints towards how the runtime should execute a statement. Attributes are:

value : A comma-separated list of one or more case-insensitive keywords.

@Hint('iterate_only')
Hook

Use this annotation to register one or more statement-specific hooks providing a hook type for each individual hook, such as for SQL parameter, column or row conversion.

Attributes are the hook type and the hook itself (usually a import or class name):

@Hook(type=HookType.SQLCOL,
  hook='MyDBTypeConvertor')
Audit

Causes the runtime to output detailed processing information for a statement.

optional value : A comma-separated list of one or more case-insensitive keywords.

@Audit
EventRepresentation

Causes the compiler to use a specific event representation for output and internal event types.

For Object-Array:

@EventRepresentation(objectarray)

For JSON:

@EventRepresentation(json)

For Map:

@EventRepresentation(map)

For Avro:

@EventRepresentation(avro)
IterableUnbound

For use when iterating statements with unbound streams, instructs the compiler to retain the last event for iterating.

@IterableUnbound

The following example statement specifies some of the built-in annotations in combination:

@Name("RevenuePerCustomer")
@Description("Outputs revenue per customer considering all events encountered so far.")
@Tag(name="grouping", value="customer")

select customerId, sum(revenue) from CustomerRevenueEvent

This annotation only takes effect if the runtime-level setting for prioritized execution is set via configuration, as described in Section 17.6.10, “Runtime Settings Related to Execution of Statements”.

Use the @Priority EPL annotation to tag statements with a priority value. The default priority value is zero (0) for all statements. When an event (or single timer execution) requires processing the event for multiple statements, processing begins with the highest priority statement and ends with the lowest-priority statement.

Example:

@Priority(10) select * from SecurityFilter(ip="127.0.0.1")

This annotation only takes effect if the runtime-level setting for prioritized execution is set via configuration, as described in Section 17.6.10, “Runtime Settings Related to Execution of Statements”.

Use the @Drop EPL annotation to tag statements that preempt all other same or lower-priority statements. When an event (or single timer execution) requires processing the event for multiple statements, processing begins with the highest priority statement and ends with the first statement marked with @Drop, which becomes the last statement to process that event.

Unless a different priority is specified, the statement with the @Drop EPL annotation executes at priority 1. Thereby @Drop alone is an effective means to remove events from a stream.

Example:

@Drop select * from SecurityFilter(ip="127.0.0.1")

An expression alias simply assigns a name to an expression. The alias name can be used in other expressions to refer to that expression, without the need to duplicate the expression.

The expression alias obtains its scope from where it is used. Parameters cannot be provided. A second means to sharing expressions is the expression declaration as described next, which allows passing parameters and is more tightly scoped.

A statement can contain and refer to any number of expression aliases. For expressions aliases that are visible across multiple statements please consult Section 5.18.1, “Global Expression Aliases” that explains the create expression clause.

The syntax for an expression alias is:

expression expression_name alias for { expression }

An expression alias consists of the expression name and an expression in curly braces. The return type of the expression is determined by the compiler and need not be specified. The scope is automatic and determined by where the alias name is used therefore parameters cannot be specified.

This example declares an expression alias twoPI that substitutes Math.PI * 2:

expression twoPI alias for { Math.PI * 2 }
select twoPI from SampleEvent

The next example specifies an alias countPeople and uses the alias in the select-clause and the having-clause:

expression countPeople alias for { count(*) }
select countPeople from EnterRoomEvent#time(10 seconds) having countPeople > 10

When using the expression alias in an expression, empty parentheses can optionally be specified. In the above example, countPeople() can be used instead and equivalently.

The following scope rules apply for expression aliases:

  1. Expression aliases do not remove implicit limitations: For example, aggregation functions cannot be used in a filter expression even if assigned an alias.

A statement can contain expression declarations. Expressions that are common to multiple places in the same statement can be moved to a named expression declaration and reused within the same statement without duplicating the expression itself.

For declaring expressions that are visible across multiple statements i.e. globally visible expressions please consult Section 5.18.2, “Global Expression Declarations” that explains the create expression clause.

The runtime may cache declared expression result values and reuse cache values, see Section 17.6.10.5, “Declared Expression Value Cache Size”.

An expression declaration follows the lambda-style expression syntax. This syntax was chosen as it typically allows for a shorter and more concise expression body that can be easier to read then most procedural code.

The syntax for an expression declaration is:

expression expression_name { expression_body }

An expression declaration consists of the expression name and an expression body. The expression_name is any identifier. The expression_body contains optional parameters and the expression. The parameter types and the return type of the expression is determined by the compiler and do not need to be specified.

Parameters to a declared expression can be a stream name, pattern tag name or wildcard (*). Use wildcard to pass the event itself to the expression. In a join or subquery, or more generally in an expression where multiple streams or pattern tags are available, the EPL must specify the stream name or pattern tag name and cannot use wildcard.

In the expression body the => lambda operator reads as "goes to" (-> may be used and is equivalent). The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression. The lambda expression x => x * x is read "x goes to x times x".

In the expression body, if your expression takes no parameters, you may simply specify the expression and do not need the => lambda operator.

If your expression takes one parameters, specify the input parameter name followed by the => lambda operator and followed by the expression. The synopsis for use with a single input parameter is:

expression_body:   input_param_name => expression 

If your expression takes two or more parameters, specify the input parameter names in parenthesis followed by the => lambda operator followed by the expression. The synopsis for use with a multiple input parameter is:

expression_body:   (input_param [,input_param [,...]]) => expression 

The following example declares an expression that returns two times PI (ratio of the circumference of a circle to its diameter) and demonstrates its use in a select-clause:

expression twoPI { Math.PI * 2} select twoPI() from SampleEvent

The parentheses are optional when the expression accepts no parameters. The below is equivalent to the previous example:

expression twoPI { Math.PI * 2} select twoPI from SampleEvent

The next example declares an expression that accepts one parameter: a MarketData event. The expression computes a new "mid" price based on the buy and sell price:

expression midPrice { x => (x.buy + x.sell) / 2 } 
select midPrice(md) from MarketDataEvent as md

The variable name can be left off if event property names resolve without ambiguity.

This example EPL removes the variable name x:

expression midPrice { x => (buy + sell) / 2 } 
select midPrice(md) from MarketDataEvent as md

The next example EPL specifies wildcard instead:

expression midPrice { x => (buy + sell) / 2 } 
select midPrice(*) from MarketDataEvent

A further example that demonstrates two parameters is listed next. The example joins two streams and uses the price value from MarketDataEvent and the sentiment value of NewsEvent to compute a weighted sentiment:

expression weightedSentiment { (x, y) => x.price * y.sentiment } 
select weightedSentiment(md, news) 
from MarketDataEvent#lastevent as md, NewsEvent#lastevent news

Any expression can be used in the expression body including aggregations, variables, subqueries or further declared or alias expressions. Sub-queries, when used without in or exists, must be placed within parenthesis.

An example subquery within an expression declaration is shown next:

expression newsSubq { md -> 
    (select sentiment from NewsEvent#unique(symbol) where symbol = md.symbol) 
} 
select newsSubq(mdstream)
from MarketDataEvent mdstream

When using expression declarations please note these limitations:

  1. Parameters to a declared expression can only be a stream name, pattern tag name or wildcard (*).

  2. Expression declarations do not remove implicit limitations: For example, aggregation functions cannot be used in a filter expression even if using an expression declaration.

The following scope rules apply for declared expressions:

  1. The scope of the expression body of a declared expression only includes the parameters explicitly listed. Consider using an expression alias instead.

You may refer to a context in the EPL text by specifying the context keyword followed by a context name. Context are described in more detail at Chapter 4, Context and Context Partitions

The effect of referring to a context is that your statement operates according to the context dimensional information as declared for the context.

The synopsis is:

... context context_name ...

You may refer to a context in all statements except for the following types of statements:

  1. create schema for declaring event types.

  2. create variable for declaring a variable.

  3. create index for creating an index on a named window or table.

  4. update istream for updating insert stream events.

The select clause is required in all statements. The select clause can be used to select all properties via the wildcard *, or to specify a list of event properties and expressions. The select clause defines the event type (event property names and types) of the resulting events published by the statement, or pulled from the statement via the iterator methods.

The select clause also offers optional istream, irstream and rstream keywords to control whether input stream, remove stream or input and remove stream events are posted to UpdateListener instances and observers to a statement. By default, the runtime provides only the insert stream to listener and observers. See Section 17.5.4, “Compiler Settings Related to Stream Selection” on how to change the default.

The syntax for the select clause is summarized below.

select [istream | irstream | rstream] [distinct] * | expression_list ... 

The istream keyword is the default, and indicates that the runtime only delivers insert stream events to listeners and observers. The irstream keyword indicates that the runtime delivers both insert and remove stream. Finally, the rstream keyword tells the runtime to deliver only the remove stream.

The distinct keyword outputs only unique rows depending on the column list you have specified after it. It must occur after the select and after the optional stream keywords, as described in more detail below.

The syntax for selecting the event itself is:

select * from stream_def

The following statement selects StockTick events for the last 30 seconds of IBM stock ticks.

select * from StockTick(symbol='IBM')#time(30 sec)

You may well be asking: Why does the statement specify a time window here? First, the statement is meant to demonstrate the use of * wildcard. When the runtime pushes statement results to your listener and as the statement does not select remove stream events via rstream keyword, the listener receives only new events and the time window could be left off. By adding the time window the pull API (iterator API or JDBC driver) returns the last 30 seconds of events.

The * wildcard and expressions can also be combined in a select clause. The combination selects all event properties and in addition the computed values as specified by any additional expressions that are part of the select clause. Here is an example that selects all properties of stock tick events plus a computed product of price and volume that the statement names 'pricevolume':

select *, price * volume as pricevolume from StockTick

When using wildcard (*), the runtime does not actually read or copy your event properties out of your event or events, neither does it copy the event object. It simply wraps your native type in an EventBean interface. Your application has access to the underlying event object through the getUnderlying method and has access to the property values through the get method.

In a join statement, using the select * syntax selects one event property per stream to hold the event for that stream. The property name is the stream name in the from clause.

If your statement is joining multiple streams, your may specify property names that are unique among the joined streams, or use wildcard (*) as explained earlier.

In case the property name in your select or other clauses is not unique considering all joined streams, you will need to use the name of the stream as a prefix to the property.

This example is a join between the two streams StockTick and News, respectively named as 'tick' and 'news'. The example selects from the StockTick event the symbol value using the 'tick' stream name as a prefix:

select tick.symbol from StockTick#time(10) as tick, News#time(10) as news
where news.symbol = tick.symbol

Use the wildcard (*) selector in a join to generate a property for each stream, with the property value being the event itself. The output events of the statement below have two properties: the 'tick' property holds the StockTick event and the 'news' property holds the News event:

select * from StockTick#time(10) as tick, News#time(10) as news

The following syntax can also be used to specify what stream's properties to select:

select stream_name.* [as name] from ...

The selection of tick.* selects the StockTick stream events only:

select tick.* from StockTick#time(10) as tick, News#time(10) as news
where tick.symbol = news.symbol

The next example uses the as keyword to name each stream's joined events. This instructs the compiler to create a property for each named event:

select tick.* as stocktick, news.* as news 
from StockTick#time(10) as tick, News#time(10) as news
where stock.symbol = news.symbol

The output events of the above example have two properties 'stocktick' and 'news' that are the StockTick and News events.

The stream name itself, as further described in Section 5.4.5, “Using the Stream Name”, may be used within expressions or alone.

This example passes events to a user-defined function named compute and also shows insert-into to populate an event stream of combined events:

insert into TickNewStream select tick, news, MyLib.compute(news, tick) as result
from StockTick#time(10) as tick, News#time(10) as news
where tick.symbol = news.symbol
// second statement that uses the TickNewStream stream
select tick.price, news.text, result from TickNewStream

In summary, the stream_name.* streamname wildcard syntax can be used to select a stream as the underlying event or as a property, but cannot appear within an expression. While the stream_name syntax (without wildcard) always selects a property (and not as an underlying event), and can occur anywhere within an expression.

If your statement employs pattern expressions, then your pattern expression tags events with a tag name. Each tag name becomes available for use as a property in the select clause and all other clauses.

For example, here is a very simple pattern that matches on every StockTick event received within 30 seconds after start of the statement. The sample selects the symbol and price properties of the matching events:

select tick.symbol as symbol, tick.price as price
from pattern[every tick=StockTick where timer:within(10 sec)]

The use of the wildcard selector, as shown in the next statement, creates a property for each tagged event in the output. The next statement outputs events that hold a single 'tick' property whose value is the event itself:

select * from pattern[every tick=StockTick where timer:within(10 sec)]

You may also select the matching event itself using the tick.* syntax. The runtime outputs the StockTick event itself to listeners:

select tick.* from pattern[every tick=StockTick where timer:within(10 sec)]

A tag name as specified in a pattern is a valid expression itself. This example uses the insert into clause to make available the events matched by a pattern to further statements:

// make a new stream of ticks and news available
insert into StockTickAndNews 
select tick, news from pattern [every tick=StockTick -> news=News(symbol=tick.symbol)]
// second statement to select from the stream of ticks and news
select tick.symbol, tick.price, news.text from StockTickAndNews

The optional istream, irstream and rstream keywords in the select clause control the event streams posted to listeners and observers to a statement.

If neither keyword is specified, and in the default configuration, the runtime posts only insert stream events via the newEvents parameter to the update method of UpdateListener instances listening to the statement. The runtime does not post remove stream events, by default.

The insert stream consists of the events entering the respective window(s) or stream(s) or aggregations, while the remove stream consists of the events leaving the respective window(s) or the changed aggregation result. See Chapter 2, Basic Concepts for more information on insert and remove streams.

The runtime posts remove stream events to the oldEvents parameter of the update method only if the irstream keyword occurs in the select clause. This behavior can be changed via configuration as described in Section 17.5.4, “Compiler Settings Related to Stream Selection”.

By specifying the istream keyword you can instruct the runtime to only post insert stream events via the newEvents parameter to the update method on listeners. The runtime will then not post any remove stream events, and the oldEvents parameter is always a null value.

By specifying the irstream keyword you can instruct the runtime to post both insert stream and remove stream events.

By specifying the rstream keyword you can instruct the runtime to only post remove stream events via the newEvents parameter to the update method on listeners. The runtime will then not post any insert stream events, and the oldEvents parameter is also always a null value.

The following statement selects only the events that are leaving the 30 second time window.

select rstream * from StockTick#time(30 sec)

The istream and rstream keywords in the select clause are matched by same-name keywords available in the insert into clause. While the keywords in the select clause control the event stream posted to listeners to the statement, the same keywords in the insert into clause specify the event stream that the runtime makes available to other statements.

The optional distinct keyword removes duplicate output events from output. The keyword must occur after the select keyword and after the optional irstream keyword.

The distinct keyword in your select instructs the runtime to consolidate, at time of output, the output event(s) and remove output events with identical property values. Duplicate removal only takes place when two or more events are output together at any one time, therefore distinct is typically used with a batch data window, output rate limiting, fire-and-forget queries, on-select or iterator pull API.

If two or more output event objects have same property values for all properties of the event, the distinct removes all but one duplicated event before outputting events to listeners. Indexed, nested and mapped properties are considered in the comparison, if present in the output event. Further detail on key expressions can be found at Section 5.2.12, “Composite Keys and Array Values as Keys”.

The next example outputs sensor ids of temperature sensor events, but only every 10 seconds and only unique sensor id values during the 10 seconds:

select distinct sensorId from TemperatureSensorEvent output every 10 seconds

Use distinct with wildcard (*) to remove duplicate output events considering all properties of an event.

This example statement outputs all distinct events either when 100 events arrive or when 10 seconds passed, whichever occurs first:

select distinct * from TemperatureSensorEvent#time_length_batch(10, 100)

When selecting nested, indexed, mapped or dynamic properties in a select clause with distinct, it is relevant to know that the comparison uses hash code and the Java equals semantics.

By default, for certain select-clause expressions that output events or a collection of events, the runtime outputs the underlying event objects. The term outputs means the data passed to listeners, subscribers and inserted-into into another stream via insert-into.

The select-clause expressions for which underlying event objects are output by default are:

To have the runtime output EventBean instance(s) instead, add @eventbean to the relevant expressions of the select-clause.

The sample EPL shown below outputs current data window contents as EventBean instances into the stream OutStream, thereby statements consuming the stream may operate on such instances:

insert into OutStream 
select prevwindow(s0) @eventbean as win 
from MyEvent#length(2) as s0

The next EPL consumes the stream and selects the last event:

select win.lastOf() from OutStream

It is not necessary to use @eventbean if an event type by the same name (OutStream in the example) is already declared and a property exist on the type by the same name (win in this example) and the type of the property is the event type (MyEvent in the example) returned by the expression. This is further described in Section 5.10.8, “Select-Clause Expression and Inserted-Into Column Event Type”.

The from clause is required in all statements. It specifies one or more event streams, named windows or tables. Each event stream, named window or table can optionally be given a name by means of the as keyword.

from stream_def [as name] [unidirectional] [retain-union | retain-intersection] 
    [, stream_def [as stream_name]] [, ...]

The event stream definition stream_def as shown in the syntax above can consists of either a filter-based event stream definition or a pattern-based event stream definition.

For joins and outer joins, specify two or more event streams. Joins and the unidirectional keyword are described in more detail in Section 5.12, “Joining Event Streams”. Joins are handy when multiple streams or patterns can trigger output and outer joins can be used to union and connect streams via or.

EPL supports joins against relational databases for access to historical or reference data as explained in Section 5.13, “Accessing Relational Data via SQL”. EPL can also join results returned by an arbitrary invocation, as discussed in Section 5.14, “Accessing Non-Relational Data via Method, Script or UDF Invocation”.

The stream_name is an optional identifier assigned to the stream. The stream name can itself occur in any expression and provides access to the event itself from the named stream. Also, a stream name may be combined with a method name to invoke instance methods on events of that stream.

For all streams with the exception of historical sources your statement may employ data windows as outlined below. The retain-intersection (the default) and retain-union keywords build a union or intersection of two or more data windows as described in Section 5.4.4, “Multiple Data Windows”.

The stream_def syntax for a filter-based event stream is as below:

event_stream_name [(filter_criteria)] [contained_selection] [#window_spec] [#window_spec] [...]

The event_stream_name is either the name of an event type or name of an event stream populated by an insert into statement or the name of a named window or table.

The filter_criteria is optional and consists of a list of expressions filtering the events of the event stream, within parenthesis after the event stream name. Filter criteria cannot be specified for tables.

The contained_selection is optional and is for use with coarse-grained events that have properties that are themselves one or more events, see Section 5.19, “Contained-Event Selection” for the synopsis and examples. Contained-event cannot be specified for tables.

The window_spec specify one or more data windows. Data windows cannot be specified for named windows and tables. Instead of the # hash character the . dot character can also be used, however the dot character requires the data window namespace.

The following statement shows event type, filter criteria and data windows combined in one statement. It selects all event properties for the last 100 events of IBM stock ticks for volume. In the example, the event type is StockTick. The expression filters for events where the property symbol has a value of "IBM". This statement specifies a length window and thus computes the total volume of the last 100 events.

select sum(volume) from StockTick(symbol='IBM')#length(100)

The runtime filters out events in an event stream as defined by filter criteria that are placed in parenthesis, before it sends events to the data window(s) (if any). Thus, compared to search conditions in a where clause, filter criteria remove unneeded events early. In the above example, events with a symbol other than IBM do not enter the length window.

The filtering criteria to filter for events with certain event property values are placed within parenthesis after the event type name:

select * from RfidEvent(category="Perishable")

All expressions can be used in filters, including static methods that return a boolean value:

select * from RfidEvent(MyRFIDLib.isInRange(x, y) or (x < 0 and y < 0))

Filter expressions can be separated via a single comma ','. The comma represents a logical AND between filter expressions:

select * from RfidEvent(zone=1, category=10)
...is equivalent to...
select * from RfidEvent(zone=1 and category=10)

The compiler analyzes the filter expressions and determines the filter indexes to use or to create. The following operators are the preferred means of filtering event streams, especially in the presence of a larger number of filters or statements. Please read Section 2.18.2, “Filter Indexes” for more information. The compiler can translate the following operators, including combinations of these operators connected via and and or, into filter indexes:

  • equals =

  • not equals !=

  • comparison operators < , > , >=, <=

  • ranges

    • use the between keyword for a closed range where both endpoints are included

    • use the in keyword and round () or square brackets [] to control how endpoints are included

    • for inverted ranges use the not keyword and the between or in keywords

  • list-of-values checks using the in keyword or the not in keywords followed by a comma-separated list of values

  • single-row functions that have been registered and are invoked via function name (see user-defined functions) and that either return a boolean value or that have their return value compared to a constant

  • the and and or logical operators

At compile time the compiler scans new filter expressions for sub-expressions that can be placed into filter indexes. Indexing filter values to match event properties of incoming events enables the runtime to match incoming events faster, especially if your application creates a large number of statements or context partitions or requires many similar filters. The above list of operators represents the set of operators that the compiler can best convert into filter index entries. The use of comma or logical and in filter expressions is fully equivalent.

Event pattern expressions can also be used to specify one or more event streams in a statement. For pattern-based event streams, the event stream definition stream_def consists of the keyword pattern and a pattern expression in brackets []. The syntax for an event stream definition using a pattern expression is below. As in filter-based event streams you can specify data windows.

pattern [pattern_expression] [#window_spec] [#window_spec] [...]

The next statement specifies an event stream that consists of both stock tick events and trade events. The example tags stock tick events with the name "tick" and trade events with the name "trade".

select * from pattern [every tick=StockTickEvent or every trade=TradeEvent]

This statement generates an event every time the runtime receives either one of the event types. The generated events resemble a map with "tick" and "trade" keys. For stock tick events, the "tick" key value is the underlying stock tick event, and the "trade" key value is a null value. For trade events, the "trade" key value is the underlying trade event, and the "tick" key value is a null value.

Lets further refine this statement adding a data window the gives us the last 30 seconds of either stock tick or trade events. Lets also select prices and a price total.

select tick.price as tickPrice, trade.price as tradePrice, 
       sum(tick.price) + sum(trade.price) as total
  from pattern [every tick=StockTickEvent or every trade=TradeEvent]#time(30 sec)

Note that in the statement above tickPrice and tradePrice can each be null values depending on the event processed. Therefore, an aggregation function such as sum(tick.price + trade.price)) would always return null values as either of the two price properties are always a null value for any event matching the pattern. Use the coalesce function to handle null values, for example: sum(coalesce(tick.price, 0) + coalesce(trade.price, 0)).

Data windows retain a subset of events. They provide an retain/expiry policy for events and the runtime automatically removes events according to the retain/expiry policy. Data windows can be grouped and data windows can be intersected or unioned. See the section Chapter 14, EPL Reference: Data Windows on the data windows available. Data windows can take parameters. Any expressions can be a parameter, with limitations.

The example statement below outputs a count per expressway for car location events (contains information about the location of a car on a highway) of the last 60 seconds:

select expressway, count(*) from CarLocEvent#time(60) 
group by expressway

The next example declares #groupwin and a #length window to indicate that there is a separate length window per car id:

select cardId, expressway, direction, segment, count(*) 
from CarLocEvent#groupwin(carId)#length(4) 
group by carId, expressway, direction, segment

The #groupwin(carId) groups car location events by car id. The #length(4) keeps a length window of the 4 last events, with one separate length window for each car id. The example reports the number of events per car id and per expressway, direction and segment considering the last 4 events for each car id only.

The special keep-all window keeps all events: It does not expire events and does not provide a remove stream, i.e. events are not removed from the keep-all window unless by means of on-delete or on-merge or fire-and-forget delete.

Data windows provide an expiry policy that indicates when to remove events from the data window, with the exception of the keep-all data window which has no expiry policy and the #groupwin grouped-window for allocating a new data window per group.

EPL allows the freedom to use multiple data windows onto a stream and thus combine expiry policies. Combining data windows into an intersection (the default) or a union can achieve a useful strategy for retaining events and expiring events that are no longer of interest. Named windows, tables and on-merge and on-delete provide an additional degree of freedom.

In order to combine two or more data windows there is no keyword required. The retain-intersection keyword is the default and the retain-union keyword may instead be provided for a stream.

The concept of union and intersection come from Set mathematics. In the language of Set mathematics, two sets A and B can be "added" together: The intersection of A and B is the set of all things which are members of both A and B, i.e. the members two sets have "in common". The union of A and B is the set of all things which are members of either A or B.

Use the retain-intersection (the default) keyword to retain an intersection of all events as defined by two or more data windows. All events removed from any of the intersected data windows are entered into the remove stream. This is the default behavior if neither retain keyword is specified.

Use the retain-union keyword to retain a union of all events as defined by two or more data windows. Only events removed from all data windows are entered into the remove stream.

The next example statement totals the price of OrderEvent events in a union of the last 30 seconds and unique by product name:

select sum(price) from OrderEvent#time(30 sec)#unique(productName) retain-union

In the above statement, all OrderEvent events that are either less then 30 seconds old or that are the last event for the product name are considered.

Here is an example statement totals the price of OrderEvent events in an intersection of the last 30 seconds and unique by product name:

select sum(price) from OrderEvent#time(30 sec)#unique(productName) retain-intersection

In the above statement, only those OrderEvent events that are both less then 30 seconds old and are the last event for the product name are considered. The number of events that the runtime retains is the number of unique events per product name in the last 30 seconds (and not the number of events in the last 30 seconds).

For an intersection the runtime retains the minimal number of events representing that intersection. Thus when combining a time window of 30 seconds and a last-event window, for example, the number of events retained at any time is zero or one event (and not 30 seconds of events).

When combining a batch window into an intersection with another data window the combined data window gains batching semantics: Only when the batch criteria is fulfilled does the runtime provide the batch of intersecting insert stream events. Multiple batch data windows may not be combined into an intersection.

The table below provides additional examples for data window intersections:


Your from clause may assign a name to each stream. This assigned stream name can serve any of the following purposes.

First, the stream name can be used to disambiguate property names. The stream_name.property_name syntax uniquely identifies which property to select if property names overlap between streams. Here is an example:

select prod.productId, ord.productId from ProductEvent as prod, OrderEvent as ord

Second, the stream name can be used with a wildcard (*) character to select events in a join, or assign new names to the streams in a join:

// Select ProductEvent only
select prod.* from ProductEvent as prod, OrderEvent
// Assign column names 'product' and 'order' to each event
select prod.* as product, ord.* as order from ProductEvent as prod, OrderEvent as ord

Further, the stream name by itself can occur in any expression: The runtime passes the event itself to that expression. For example, the runtime passes the ProductEvent and the OrderEvent to the user-defined function 'checkOrder':

select prod.productId, MyFunc.checkOrder(prod, ord) 
from ProductEvent as prod, OrderEvent as ord

Last, you may invoke an instance method on each event of a stream, and pass parameters to the instance method as well. Instance method calls are allowed anywhere in an expression.

The next statement demonstrates this capability by invoking a method 'computeTotal' on OrderEvent events and a method 'getMultiplier' on ProductEvent events:

select ord.computeTotal(prod.getMultiplier()) from ProductEvent as prod, OrderEvent as ord

Instance methods may also be chained: Your EPL may invoke a method on the result returned by a method invocation.

Assume that your product event exposes a method getZone which returns a zone object. Assume that the Zone class declares a method checkZone. This example statement invokes a method chain:

select prod.getZone().checkZone("zone 1") from ProductEvent as prod

Use the backwards apostrophe ` (aka. back tick) character to escape stream names in the from-clause and in on-trigger statements (e.g. from MyEvent as `order`...).

The where clause is an optional clause in statements. Via the where clause event streams can be joined and correlated.

Any expression can be placed in the where clause. Typically you would use comparison operators =, < , > , >=, <=, !=, <>, is null, is not null and logical combinations via and and or for joining, correlating or comparing events. The where clause introduces join conditions as outlined in Section 5.12, “Joining Event Streams”.

Some examples are listed below.

...where settlement.orderId = order.orderId
...where exists (select orderId from Settlement#time(1 min) where settlement.orderId = order.orderId)

The following two statements are equivalent since both query filter events by the amount property value and both statements do not specify a data window.

// preferable: specify filter criteria with the "eventtype(...filters...)" notation
@name('first') select * from Withdrawal(amount > 200)
// equivalent only when there is no data window
@name('second') select * from Withdrawal where amount > 200

You can control whether the compiler rewrites the second statement to the form of the first statement. If you specify @Hint('disable_whereexpr_moveto_filter') you can instruct the compiler to not move the where-clause expression into the filter.

The aggregate functions are further documented in Section 10.2, “Aggregation Functions”. You can use aggregate functions to calculate and summarize data from event properties.

For example, to find out the total price for all stock tick events in the last 30 seconds, type:

select sum(price) from StockTickEvent#time(30 sec)

Aggregation functions do not require the use of data windows. The examples herein specify data windows for the purpose of example. An alternative means to instruct the runtime when to start and stop aggregating and on what level to aggregate is via context declarations.

For example, to find out the total price for all stock tick events since statement start, type:

select sum(price) from StockTickEvent

Here is the syntax for aggregate functions:

aggregate_function( [all | distinct] expression [,expression [,...]] 
    [, group_by:local_group_by] [, filter:filter_expression] )

You can apply aggregate functions to all events in an event stream window or to one or more groups of events (i.e. group by). From each set of events to which an aggregate function is applied the runtime generates a single value.

Expression is usually an event property name. However it can also be a constant, function, or any combination of event property names, constants, and functions connected by arithmetic operators.

You can provide a grouping dimension for each aggregation function by providing the optional group_by parameter as part of aggregation function parameters. Please refer to Section 5.6.4, “Specifying Grouping for Each Aggregation Function”.

You can provide a filter expression for each aggregation function by providing the optional filter parameter as part of aggregation function parameters. Please refer to Section 5.6.5, “Specifying a Filter Expression for Each Aggregation Function”.

For example, to find out the average price for all stock tick events in the last 30 seconds if the price was doubled:

select avg(price * 2) from StockTickEvent#time(30 seconds)

You can use the optional keyword distinct with all aggregate functions to eliminate duplicate values before the aggregate function is applied. The optional keyword all which performs the operation on all events is the default.

You can use aggregation functions in a select clause and in a having clause. You cannot use aggregate functions in a where clause, but you can use the where clause to restrict the events to which the aggregate is applied. The next statement computes the average and sum of the price of stock tick events for the symbol IBM only, for the last 10 stock tick events regardless of their symbol.

select 'IBM stats' as title, avg(price) as avgPrice, sum(price) as sumPrice
from StockTickEvent#length(10)
where symbol='IBM'

In the above example the length window of 10 elements is not affected by the where clause, i.e. all events enter and leave the length window regardless of their symbol. If you only care about the last 10 IBM events, you need to add filter criteria as below.

select 'IBM stats' as title, avg(price) as avgPrice, sum(price) as sumPrice
from StockTickEvent(symbol='IBM')#length(10)
where symbol='IBM'

You can use aggregate functions with any type of event property or expression, with the following exceptions:

  1. You can use sum, avg, median, stddev, avedev with numeric event properties only

The runtime ignores any null values returned by the event property or expression on which the aggregate function is operating, except for the count(*) function, which counts null values as well. All aggregate functions return null if the data set contains no events, or if all events in the data set contain only null values for the aggregated expression.

The group by clause is optional in all statements. The group by clause divides the output of a statement into groups. You can group by one or more event property names, or by the result of computed expressions. When used with aggregate functions, group by retrieves the calculations in each subgroup. You can use group by without aggregate functions, but generally that can produce confusing results.

For example, the below statement returns the total price per symbol for all stock tick events in the last 30 seconds:

select symbol, sum(price) from StockTickEvent#time(30 sec) group by symbol

The syntax of the group by clause is:

group by aggregate_free_expression [, aggregate_free_expression] [, ...]

The compiler places the following restrictions on expressions in the group by clause:

  1. Expressions in the group by cannot contain aggregate functions.

  2. When grouping an unbound stream, i.e. no data window is specified onto the stream providing groups, or when using output rate limiting with the ALL keyword, you should ensure your group-by expression does not return an unlimited number of values. If, for example, your group-by expression is a fine-grained timestamp, group state that accumulates for an unlimited number of groups potentially reduces available memory significantly. Use a @Hint as described below to instruct the runtime when to discard group state.

You can list more then one expression in the group by clause to nest groups. Once the sets are established with group by the aggregation functions are applied. Further detail on key expressions can be found at Section 5.2.12, “Composite Keys and Array Values as Keys”.

This statement posts the median volume for all stock tick events in the last 30 seconds per symbol and tick data feed. The runtime posts one event for each group to statement listeners:

select symbol, tickDataFeed, median(volume) 
from StockTickEvent#time(30 sec) 
group by symbol, tickDataFeed

In the statement above the event properties in the select list (symbol, tickDataFeed) are also listed in the group by clause. The statement thus follows the SQL standard which prescribes that non-aggregated event properties in the select list must match the group by columns.

EPL also supports statements in which one or more event properties in the select list are not listed in the group by clause. The statement below demonstrates this case. It calculates the standard deviation since statement start over stock ticks aggregating by symbol and posting for each event the symbol, tickDataFeed and the standard deviation on price.

select symbol, tickDataFeed, stddev(price) from StockTickEvent group by symbol

The above example still aggregates the price event property based on the symbol, but produces one event per incoming event, not one event per group.

Additionally, EPL supports statements in which one or more event properties in the group by clause are not listed in the select list. This is an example that calculates the mean deviation per symbol and tickDataFeed and posts one event per group with symbol and mean deviation of price in the generated events. Since tickDataFeed is not in the posted results, this can potentially be confusing.

select symbol, avedev(price) 
from StockTickEvent#time(30 sec) 
group by symbol, tickDataFeed

Expressions are also allowed in the group by list:

select symbol * price, count(*) from StockTickEvent#time(30 sec) group by symbol * price

If the group by expression resulted in a null value, the null value becomes its own group. All null values are aggregated into the same group. If you are using the count(expression) aggregate function which does not count null values, the count returns zero if only null values are encountered.

You can use a where clause in a statement with group by. Events that do not satisfy the conditions in the where clause are eliminated before any grouping is done. For example, the statement below posts the number of stock ticks in the last 30 seconds with a volume larger then 100, posting one event per group (symbol).

select symbol, count(*) from StockTickEvent#time(30 sec) where volume > 100 group by symbol

The runtime reclaims aggregation state agressively when it determines that a group has no data points, based on the data in the data windows. When your application data creates a large number of groups with a small or zero number of data points then performance may suffer as state is reclaimed and created anew. EPL provides the @Hint('disable_reclaim_group') hint that you can specify as part of a statement to avoid group reclaim.

When aggregating values over an unbound stream (i.e. no data window is specified onto the stream) and when your group-by expression returns an unlimited number of values, for example when a timestamp expression is used, then please note the next hint.

A sample statement that aggregates stock tick events by timestamp, assuming the event type offers a property by name timestamp that, reflects time in high resolution, for example arrival or system time:

// Note the below statement could lead to an out-of-memory problem:
select symbol, sum(price) from StockTickEvent group by timestamp

As the runtime has no means of detecting when aggregation state (sums per symbol) can be discarded, you may use the following hints to control aggregation state lifetime.

The @Hint("reclaim_group_aged=age_in_seconds") hint instructs the runtime to discard aggregation state that has not been updated for age_in_seconds seconds.

The optional @Hint("reclaim_group_freq=sweep_frequency_in_seconds") can be used in addition to control the frequency at which the runtime sweeps aggregation state to determine aggregation state age and remove state that is older then age_in_seconds seconds. If the hint is not specified, the frequency defaults to the same value as age_in_seconds.

The updated sample statement with both hints:

// Instruct runtime to remove state older then 10 seconds and sweep every 5 seconds
@Hint('reclaim_group_aged=10,reclaim_group_freq=5')
select symbol, sum(price) from StockTickEvent group by timestamp

Variables may also be used to provide values for age_in_seconds and sweep_frequency_in_seconds.

This example statement uses a variable named varAge to control how long aggregation state remains in memory, and the runtime defaults the sweep frequency to the same value as the variable provides:

@Hint('reclaim_group_aged=varAge')
select symbol, sum(price) from StockTickEvent group by timestamp

EPL supports the SQL-standard rollup, cube and grouping sets keywords. These keywords are available only in the group-by clause and instruct the runtime to compute higher-level (or super-aggregate) aggregation values, i.e. to perform multiple levels of analysis (groupings) at the same time.

EPL also supports the SQL-standard grouping and grouping_id functions. These functions can be used in the select-clause, having-clause or order by-clause to obtain information about the current row's grouping level in expressions. Please see Section 10.1.8, “The Grouping Function”.

Detailed examples and information in respect to output rate limiting can be found in Section A.7, “Output for Fully-Aggregated, Grouped Statements With Rollup”.

Use the rollup keyword in the group-by lists of expressions to compute the equivalent of an OLAP dimension or hierarchy.

For example, the following statement outputs for each incoming event three rows. The first row contains the total volume per symbol and feed, the second row contains the total volume per symbol and the third row contains the total volume overall. This example aggregates across all events for each aggregation level (3 groupings) since it declares no data window:

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by rollup(symbol, tickDataFeed)

The value of tickDataFeed is null for the output row that contains the total per symbol and the output row that contains the total volume overall. The value of both symbol and tickDataFeed is null for the output row that contains the overall total.

Use the cube keyword in the group-by lists of expressions to compute a cross-tabulation.

The following statement outputs for each incoming event four rows. The first row contains the total volume per symbol and feed, the second row contains the total volume per symbol, the third row contains the total volume per feed and the forth row contains the total volume overall (4 groupings):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by cube(symbol, tickDataFeed)

The grouping sets keywords allows you to specify only the groupings you want. It can thus be used to generate the same groupings that simple group-by expressions, rollup or cube would produce.

In this example each incoming event causes the runtime to compute two output rows: The first row contains the total volume per symbol and the second row contains the total volume per feed (2 groupings):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by grouping sets(symbol, feed)

Your group-by expression can list grouping expressions and use rollup, cube and grouping sets keywords in addition or in combination.

This statement outputs the total per combination of symbol and feed and the total per symbol (2 groupings):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by symbol, rollup(tickDataFeed)

You can specify combinations of expressions by using parenthesis.

The next statement is equivalent and also outputs the total per symbol and feed and the total per symbol (2 groupings, note the parenthesis):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by grouping sets ((symbol, tickDataFeed), symbol)

Use empty parenthesis to aggregate across all dimensions.

This statement outputs the total per symbol, the total per feed and the total overall (3 groupings):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by grouping sets (symbol, tickDataFeed, ())

The order of any output events for both insert and remove stream data is well-defined and exactly as indicated before. For example, specifying grouping sets ((), symbol, tickDataFeed) outputs a total overall, a total by symbol and a total by feed in that order. If the statement has an order-by-clause then the ordering criteria of the order-by-clause take precedence.

You can use rollup and cube within grouping sets.

This statement outputs the total per symbol and feed, the total per symbol, the total overall and the total by feed (4 groupings):

select symbol, tickDataFeed, sum(volume) from StockTickEvent
group by grouping sets (rollup(symbol, tickDataFeed), tickDataFeed)

Note

In order to use any of the rollup, cube and grouping sets keywords the statement must be fully-aggregated. All non-aggregated properties in the select-clause, having-clause or order-by-clause must also be listed in the group by clause.

EPL allows each aggregation function to specify its own grouping criteria. This is useful for aggregating across multiple dimensions.

The syntax for the group_by parameter for use with aggregation functions is:

group_by: ( [expression [,expression [,...]]] )

The group_by identifier can occur at any place within the aggregation function parameters. It follows a colon and within parenthesis an optional list of grouping expressions. The parenthesis are not required when providing a single expression. For grouping on the top level (overall aggregation) please use () empty parenthesis. Further detail on key expressions can be found at Section 5.2.12, “Composite Keys and Array Values as Keys”.

The presence of group_by aggregation function parameters, the grouping expressions as well as the group-by clause determine the number of output rows for statements as further described in Section 2.15, “Basic Aggregated Statement Types”.

For un-grouped statements (without a group by clause), if any aggregation function specifies a group_by other than the () overall group, the statement executes as aggregated and un-grouped.

For example, the next statement is an aggregated (but not fully aggregated) and ungrouped statement and outputs various totals for each arriving event:

select sum(price, group_by:()) as totalPriceOverall,
  sum(price, group_by:account) as totalPricePerAccount,
  sum(price, group_by:(account, feed)) as totalPricePerAccountAndFeed
from Orders

For grouped statements (with a group by clause), if all aggregation functions specifiy either no group_by or group_by criteria that subsume the criteria in the group by clause, the statement executes as a fully-aggregated and grouped statement. Otherwise the statement executes as an aggregated and grouped statement.

The next example is fully-aggregated and grouped and it computes, for the last one minute of orders, the ratio of orders per account compared to all orders:

select count(*)/count(*, group_by:()) as ratio
from Orders#time(1 min) group by account

The next example is an aggregated (and not fully-aggregated) and grouped statement that in addition outputs a count per order category:

select count(*) as cnt, count(*, group_by:()) as cntOverall,  count(*, group_by:(category))  as cntPerCategory
from Orders#time(1 min) group by account

Please note the following restrictions:

  1. Expressions in the group_by cannot contain aggregate functions.

  2. Hints pertaining to group-by are not available when a statement specifies aggregation functions with group_by.

  3. The group_by aggregation function parameters are not available in subqueries, match-recognize, statements that aggregate into tables using into table or in combination with rollup and grouping sets.

Use the having clause to pass or reject events defined by the group-by clause. The having clause sets conditions for the group by clause in the same way where sets conditions for the select clause, except where cannot include aggregate functions, while having often does.

This statement is an example of a having clause with an aggregate function. It posts the total price per symbol for the last 30 seconds of stock tick events for only those symbols in which the total price exceeds 1000. The having clause eliminates all symbols where the total price is equal or less then 1000.

select symbol, sum(price) 
from StockTickEvent#time(30 sec) 
group by symbol 
having sum(price) > 1000

To include more then one condition in the having clause combine the conditions with and, or or not. This is shown in the statement below which selects only groups with a total price greater then 1000 and an average volume less then 500.

select symbol, sum(price), avg(volume)
from StockTickEvent#time(30 sec) 
group by symbol 
having sum(price) > 1000 and avg(volume) < 500

A statement with the having clause should also have a group by clause. If you omit group-by, all the events not excluded by the where clause return as a single group. In that case having acts like a where except that having can have aggregate functions.

The having clause can also be used without group by clause as the below example shows. The example below posts events where the price is less then the current running average price of all stock tick events in the last 30 seconds.

select symbol, price, avg(price) 
from StockTickEvent#time(30 sec) 
having price < avg(price)

When you include filters, the where condition, the group by clause and the having condition in a statement the sequence in which each clause affects events determines the final result:

The following statement illustrates the use of filter, where, group by and having clauses in one statement with a select clause containing an aggregate function.

select tickDataFeed, stddev(price)
from StockTickEvent(symbol='IBM')#length(10) 
where volume > 1000
group by tickDataFeed 
having stddev(price) > 0.8

The runtime filters events using the filter criteria for the event stream StockTickEvent. In the example above only events with symbol IBM enter the length window over the last 10 events, all other events are simply discarded. The where clause removes any events posted by the length window (events entering the window and event leaving the window) that do not match the condition of volume greater then 1000. Remaining events are applied to the stddev standard deviation aggregate function for each tick data feed as specified in the group by clause. Each tickDataFeed value generates one event. The runtime applies the having clause and only lets events pass for tickDataFeed groups with a standard deviation of price greater then 0.8.

The keyed segmented context create context ... partition by and the group by clause as well as the built-in #groupwin are similar in their ability to group events but very different in their semantics. This section explains the key differences in their behavior and use.

The keyed segmented context as declared with create context ... partition by and context .... select ... creates a new context partition per key value(s). The runtime maintains separate data windows as well as separate aggregations per context partition; thereby the keyed segmented context applies to both. See Section 4.2.2, “Keyed Segmented Context” for additional examples.

The group by clause works together with aggregation functions in your statement to produce an aggregation result per group. In greater detail, this means that when a new event arrives, the runtime applies the expressions in the group by clause to determine a grouping key. If the runtime has not encountered that grouping key before (a new group), the runtime creates a set of new aggregation results for that grouping key and performs the aggregation changing that new set of aggregation results. If the grouping key points to an existing set of prior aggregation results (an existing group), the runtime performs the aggregation changing the prior set of aggregation results for that group.

The #groupwin instructs the system to have a separate data window per group, see Section 14.3.15, “Grouped Data Window (groupwin or std:groupwin)”. It causes allocation of separate data window(s) for each grouping key encountered.

The table below summarizes the point:


Please review the performance section for advice related to performance or memory-use.

The next example shows statements that produce equivalent results. The statement using the group by clause is generally preferable as is easier to read. The second form introduces the #uni special data window which computes univariate statistics for a given property:

select symbol, avg(price) from StockTickEvent group by symbol
// ... is equivalent to ...
select symbol, average from StockTickEvent#groupwin(symbol)#uni(price)

The next example shows two statements that are NOT equivalent as the length window is ungrouped in the first statement, and grouped in the second statement:

select symbol, sum(price) from StockTickEvent#length(10) group by symbol
// ... NOT equivalent to ...
select symbol, sum(price) from StockTickEvent#groupwin(symbol)#length(10)

The key difference between the two statements is that in the first statement the length window is ungrouped and applies to all events regardless of group. While in the second statement each group gets its own length window. For example, in the second statement events arriving for symbol "ABC" get a length window of 10 events, and events arriving for symbol "DEF" get their own length window of 10 events.

The output clause is optional in EPL and is used to control or stabilize the rate at which events are output and to suppress output events. The EPL language provides for several different ways to control output rate.

Here is the syntax for the output clause that specifies a rate in time interval or number of events:

output [after suppression_def] 
  [[all | first | last | snapshot] every output_rate [seconds | events]]
[and when terminated]

An alternate syntax specifies the time period between output as outlined in Section 5.2.1, “Specifying Time Periods” :

output [after suppression_def] 
  [[all | first | last | snapshot] every time_period]
[and when terminated]

A crontab-like schedule can also be specified. The schedule parameters follow the pattern observer parameters and are further described in Section 7.6.4, “Crontab (timer:at)” :

output [after suppression_def] 
  [[all | first | last | snapshot] at 
   (minutes, hours, days of month, months, days of week [, seconds])]
[and when terminated]

For use with contexts, in order to trigger output only when a context partition terminates, specify when terminated as further described in Section 4.5, “Output When a Context Partition Ends or Terminates”:

output [after suppression_def] 
  [[all | first | last | snapshot] when terminated 
  [and termination_expression]
  [then set variable_name = assign_expression [, variable_name = assign_expression [,...]]]
  ]

Last, output can be controlled by an expression that may contain variables, user-defined functions and information about the number of collected events. Output that is controlled by an expression is discussed in detail below.

The after keyword and suppression_def can appear alone or together with further output conditions and suppresses output events.

For example, the following statement outputs, every 60 seconds, the total price for all orders in the 30-minute time window:

select sum(price) from OrderEvent#time(30 min) output snapshot every 60 seconds

The all keyword is the default and specifies that all events in a batch should be output, each incoming row in the batch producing an output row. Note that for statements that group via the group by clause, the all keyword provides special behavior as below.

The first keyword specifies that only the first event in an output batch is to be output. Using the first keyword instructs the runtime to output the first matching event as soon as it arrives, and then ignores matching events for the time interval or number of events specified. After the time interval elapsed, or the number of matching events has been reached, the next first matching event is output again and the following interval the runtime again ignores matching events. For statements that group via the group by clause, the first keywords provides special behavior as below.

The last keyword specifies to only output the last event at the end of the given time interval or after the given number of matching events have been accumulated. Again, for statements that group via the group by clause the last keyword provides special behavior as below.

The snapshot keyword is often used with unbound streams and/or aggregation to output current aggregation results. While the other keywords control how a batch of events between output intervals is being considered, the snapshot keyword outputs current state of a statement independent of the last batch. Its output is comparable to the iterator method provided by a statement. More information on output snapshot can be found in Section 5.7.1.3, “Output Snapshot”.

The output_rate is the frequency at which the runtime outputs events. It can be specified in terms of time or number of events. The value can be a number to denote a fixed output rate, or the name of a variable whose value is the output rate. By means of a variable the output rate can be controlled externally and changed dynamically at runtime.

Please consult the Appendix A, Output Reference and Samples for detailed information on insert and remove stream output for the various output clause keywords.

For use with contexts you may append the keywords and when terminated to trigger output at the rate defined and in addition trigger output when the context partition terminates. Please see Section 4.5, “Output When a Context Partition Ends or Terminates” for details.

Note

Please see Appendix B, Runtime Considerations for Output Rate Limiting for information on how the system retains input events and computes output events according to the specified output rate.

The time interval can also be specified in terms of minutes; the following statement is identical to the first one.

select * from StockTickEvent output every 1.5 minutes

A second way that output can be stabilized is by batching events until a certain number of events have been collected:

select * from StockTickEvent output every 5 events

Additionally, event output can be further modified by the optional last keyword, which causes output of only the last event to arrive into an output batch.

select * from StockTickEvent output last every 5 events

Using the first keyword you can be notified at the start of the interval. The allows to watch for situations such as a rate falling below a threshold and only be informed every now and again after the specified output interval, but be informed the moment it first happens.

select * from TickRate where rate<100 output first every 60 seconds

A sample statement using the Unix "crontab"-command schedule is shown next. See Section 7.6.4, “Crontab (timer:at)” for details on schedule syntax. Here, output occurs every 15 minutes from 8am to 5:45pm (hours 8 to 17 at 0, 15, 30 and 45 minutes past the hour):

select symbol, sum(price) from StockTickEvent group by symbol output at (*/15, 8:17, *, *, *)

Output can also be controlled by an expression that may check variable values, use user-defined functions and statement built-in properties that provide additional information. The synopsis is as follows:

output [after suppression_def] 
  [[all | first | last | snapshot] when trigger_expression 
    [then set variable_name = assign_expression [, variable_name = assign_expression [,...]]]
  [and when terminated 
    [and termination_expression]
    [then set variable_name = assign_expression [, variable_name = assign_expression [,...]]]
  ]

The when keyword must be followed by a trigger expression returning a boolean value of true or false, indicating whether to output. Use the optional then keyword to change variable values after the trigger expression evaluates to true. An assignment expression assigns a new value to variable(s).

For use with contexts you may append the keywords and when terminated to also trigger output when the context partition terminates. Please see Section 4.5, “Output When a Context Partition Ends or Terminates” for details. You may optionally specify a termination expression. If that expression is provided the runtime evaluates the expression when the context partition terminates: The evaluation result of true means output occurs when the context partition terminates, false means no output occurs when the context partition terminates. You may specify then set followed by a list of assignments to assign variables. Assignments are executed on context partition termination regardless of the termination expression, if present.

Lets consider an example. The next statement assumes that your application has defined a variable by name OutputTriggerVar of boolean type. The statement outputs rows only when the OutputTriggerVar variable has a boolean value of true:

select sum(price) from StockTickEvent output when OutputTriggerVar = true

The runtime evaluates the trigger expression when streams and data windows (if any) post one or more insert or remove stream events after considering the where clause, if present. It also evaluates the trigger expression when any of the variables used in the trigger expression, if any, changes value. Thus output occurs as follows:

  1. When there are insert or remove stream events and the when trigger expression evaluates to true, the runtime outputs the resulting rows.

  2. When any of the variables in the when trigger expression changes value, the runtime evaluates the expression and outputs results. Result output occurs within the minimum time interval of timer resolution.

By adding a then part to the EPL, you can reset any variables after the trigger expression evaluated to true:

select sum(price) from StockTickEvent 
  output when OutputTriggerVar = true  
  then set OutputTriggerVar = false

Expressions in the when and then may, for example, use variables, user defined functions or any of the built-in named properties that are described in the below list.

The following built-in properties are available for use:


The values provided by count_insert and count_remove are non-continues: The number returned for these properties may 'jump' up rather then count up by 1. The counts reset to zero upon output.

The following restrictions apply to expressions used in the output rate clause:

  • Event property names cannot be used in the output clause.

  • Aggregation functions cannot be used in the output clause.

  • The prev previous event function and the prior prior event function cannot be used in the output clause.

Remove stream events can also be useful in conjunction with aggregation and the output clause: When the runtime posts remove stream events for fully-aggregated statements, it presents the aggregation state before the expiring event leaves the data window. Your application can thus easily obtain a delta between the new aggregation value and the prior aggregation value.

The runtime evaluates the having-clause at the granularity of the data posted by data windows (if any) or when an event arrives (without a data windows). That is, if you utilize a time window and output every 10 events, the having clause applies to each individual event or events entering and leaving the time window (and not once per batch of 10 events).

The output clause interacts in two ways with the group by and having clauses. First, in the output every n events case, the number n refers to the number of events arriving into the group by clause. That is, if the group by clause outputs only 1 event per group, or if the arriving events don't satisfy the having clause, then the actual number of events output by the statement could be fewer than n.

Second, the last, all and first keywords have special meanings when used in a statement with aggregate functions and the group by clause:

Please consult the Appendix A, Output Reference and Samples for detailed information on insert and remove stream output for aggregation and group-by.

By adding an output rate limiting clause to a statement that contains a group by clause you can control output of groups to obtain one row for each group, generating an event per group at the given output frequency.

The next statement outputs total price per symbol cumulatively (no data window was used here). As it specifies the all keyword, the statement outputs the current value for all groups seen so far, regardless of whether the group was updated in the last interval. Output occurs after an interval of 5 seconds passed and at the end of each subsequent interval:

select symbol, sum(price) from StockTickEvent group by symbol output all every 5 seconds

The below statement outputs total price per symbol considering events in the last 3 minutes. When events leave the 3-minute data window output also occurs as new aggregation values are computed. The last keyword instructs the runtime to output only those groups that had changes. Output occurs after an interval of 10 seconds passed and at the end of each subsequent interval:

select symbol, sum(price) from StockTickEvent#time(3 min)
group by symbol output last every 10 seconds

This statement also outputs total price per symbol considering events in the last 3 minutes. The first keyword instructs the runtime to output as soon as there is a new value for a group. After output for a given group the runtime suppresses output for the same group for 10 seconds and does not suppress output for other groups. Output occurs again for that group after the interval when the group has new value(s):

select symbol, sum(price) from StockTickEvent#time(3 min)
group by symbol output first every 10 seconds

The order by clause is optional. It is used for ordering output events by their properties, or by expressions involving those properties. .

For example, the following statement outputs batches of 5 or more stock tick events that are sorted first by price ascending and then by volume ascending:

select symbol from StockTickEvent#time(60 sec) 
output every 5 events 
order by price, volume

Here is the syntax for the order by clause:

order by expression [asc | desc] [, expression [asc | desc]] [, ...]

If the order by clause is absent then the runtime still makes certain guarantees about the ordering of output:

  • If the statement is not a join, does not group via group by clause and does not declare grouped data windows via #groupwin, the order in which events are delivered to listeners and through the iterator pull API is the order of event arrival.

  • If the statement is a join or outer join, or groups, then the order in which events are delivered to listeners and through the iterator pull API is not well-defined. Use the order by clause if your application requires events to be delivered in a well-defined order.

The compiler places the following restrictions on the expressions in the order by clause:

  1. All aggregate functions that appear in the order by clause must also appear in the select expression.

Otherwise, any kind of expression that can appear in the select clause, as well as any name defined in the select clause, is also valid in the order by clause.

By default all sort operations on string values are performed via the compare method and are thus not locale dependent. To account for differences in language or locale, see Section 17.5.5, “Compiler Settings Related to Language and Locale” to change this setting.

The limit clause is typically used together with the order by and output clause to limit your statement results to those that fall within a specified range. You can use it to receive the first given number of result rows, or to receive a range of result rows.

There are two syntaxes for the limit clause, each can be parameterized by integer constants or by variable names. The first syntax is shown below:

limit row_count [offset offset_count]

The required row_count parameter specifies the number of rows to output. The row_count can be an integer constant and can also be the name of the integer-type variable to evaluate at runtime.

The optional offset_count parameter specifies the number of rows that should be skipped (offset) at the beginning of the result set. A variable can also be used for this parameter.

The next sample statement outputs the top 10 counts per property 'uri' every 1 minute.

select uri, count(*) from WebEvent 
group by uri 
output snapshot every 1 minute
order by count(*) desc 
limit 10

The next statement demonstrates the use of the offset keyword. It outputs ranks 3 to 10 per property 'uri' every 1 minute:

select uri, count(*) from WebEvent 
group by uri 
output snapshot every 1 minute
order by count(*) desc 
limit 8 offset 2

The second syntax for the limit clause is for SQL standard compatibility and specifies the offset first, followed by the row count:

limit offset_count[, row_count]

The following are equivalent:

limit 8 offset 2
// ...equivalent to
limit 2, 8

A negative value for row_count returns an unlimited number or rows, and a zero value returns no rows. If variables are used, then the current variable value at the time of output dictates the row count and offset. A variable returning a null value for row_count also returns an unlimited number or rows.

A negative value for offset is not allowed. If your variable returns a negative or null value for offset then the value is assumed to be zero (i.e. no offset).

The iterator pull API also honors the limit clause, if present.

The insert into clause is optional in EPL. The clause can be specified to make the results of a statement available as an event stream for use in further statements, or to insert events into a named window or table. The clause can also be used to merge multiple event streams to form a single stream of events.

The syntax for the insert into clause is as follows:

insert [istream | irstream | rstream] into event_stream_name  [ ( [property_name [, property_name]] ) ]

The istream (default) and rstream keywords are optional. If no keyword or the istream keyword is specified, the runtime supplies the insert stream events generated by the statement. The insert stream consists of the events entering the respective window(s) or stream(s). If the rstream keyword is specified, the runtime supplies the remove stream events generated by the statement. The remove stream consists of the events leaving the respective window(s).

If your application specifies irstream, the runtime inserts into the new stream both the insert and remove stream. This is often useful in connection with the istream built-in function that returns an inserted/removed boolean indicator for each event, see Section 10.1.11, “The Istream Function”.

The event_stream_name is an identifier that names the event stream (and also implicitly names the types of events in the stream) generated by the compiler. It may also specify a named window name or a table name. The identifier can be used in further statements to filter and process events of that event stream, unless inserting into a table. The insert into clause can consist of just an event stream name, or an event stream name and one or more property names.

The runtime also allows listeners to be attached to a statement that contain an insert into clause. Listeners receive all events posted to the event stream.

To merge event streams, simply use the same event_stream_name identifier in all statements that merge their result event streams. Make sure to use the same number and names of event properties and event property types match up.

The compiler places the following restrictions on the insert into clause:

  1. The number of elements in the select clause must match the number of elements in the insert into clause if the clause specifies a list of event property names

  2. If the event stream name has already been defined by a prior statement or configuration, and the event property names and/or event types do not match, an exception is thrown at statement compile time.

The following sample inserts into an event stream by name CombinedEvent:

insert into CombinedEvent
select A.customerId as custId, A.timestamp - B.timestamp as latency
  from EventA#time(30 min) A, EventB#time(30 min) B
 where A.txnId = B.txnId

Each event in the CombinedEvent event stream has two event properties named "custId" and "latency". The events generated by the above statement can be used in further statements, such as shown in the next statement:

select custId, sum(latency)
  from CombinedEvent#time(30 min)
 group by custId

The example statement below shows the alternative form of the insert into clause that explicitly defines the property names to use.

insert into CombinedEvent (custId, latency)
select A.customerId, A.timestamp - B.timestamp 
...

The rstream keyword can be useful to indicate to the runtime to generate only remove stream events. This can be useful if you want to trigger actions when events leave a window rather then when events enter a window. The statement below generates CombinedEvent events when EventA and EventB leave the window after 30 minutes.

insert rstream into CombinedEvent
select A.customerId as custId, A.timestamp - B.timestamp as latency
  from EventA#time(30 min) A, EventB#time(30 min) B
 where A.txnId = B.txnId

The insert into clause can be used in connection with patterns to provide pattern results to further statements for analysis:

insert into ReUpEvent
select linkUp.ip as ip 
from pattern [every linkDown=LinkDownEvent -> linkUp=LinkUpEvent(ip=linkDown.ip)]

The insert into clause allows to merge multiple event streams into a event single stream. The clause names an event stream to insert into by specifing an event_stream_name. The first statement that inserts into the named stream defines the stream's event types. Further statements that insert into the same event stream must match the type of events inserted into the stream as declared by the first statement.

One approach to merging event streams specifies individual colum names either in the select clause or in the insert into clause of the statement. This approach has been shown in earlier examples.

Another approach to merging event streams specifies the wildcard (*) in the select clause (or the stream wildcard) to select the underlying event. The events in the event stream must then have the same event type as generated by the from clause.

Assume a statement creates an event stream named MergedStream by selecting OrderEvent events:

insert into MergedStream select * from OrderEvent

A statement can use the stream wildcard selector to select only OrderEvent events in a join:

insert into MergedStream select ord.* from ItemScanEvent, OrderEvent as ord

And a statement may also use an application-supplied user-defined function to convert events to OrderEvent instances:

insert into MergedStream select MyLib.convert(item) from ItemScanEvent as item

The compiler specifically recognizes a conversion function as follows: A conversion function must be the only selected column, and it must return either a Java object or java.util.Map or Object[] (object array). Your EPL should not use the as keyword to assign a column name.

A variant stream is a predefined stream into which events of multiple disparate event types can be inserted.

A variant stream name may appear anywhere in a pattern or from clause. In a pattern, a filter against a variant stream matches any events of any of the event types inserted into the variant stream. In a from clause including for named windows, data windows may hold events of any of the event types inserted into the variant stream.

A variant stream is thus useful in problems that require different types of event to be treated the same.

Variant streams can be declared by means of create variant schema or can be predefined via runtime or initialization-time configuration as described in Section 17.4.16, “Variant Stream”. Your application may declare or predefine variant streams to carry events of a limited set of event types, or you may choose the variant stream to carry any and all types of events. This choice affects what event properties are available for consuming statements or patterns of the variant stream.

Assume that an application predefined a variant stream named OrderStream to carry only ServiceOrder and ProductOrder events. An insert into clause inserts events into the variant stream:

insert into OrderStream select * from ServiceOrder
insert into OrderStream select * from ProductOrder

Here is a sample statement that consumes the variant stream and outputs a total price per customer id for the last 30 seconds of ServiceOrder and ProductOrder events:

select customerId, sum(price) from OrderStream#time(30 sec) group by customerId

If your application predefines the variant stream to hold specific type of events, as the sample above did, then all event properties that are common to all specified types are visible on the variant stream, including nested, indexed and mapped properties. For access to properties that are only available on one of the types, the dynamic property syntax must be used. In the example above, the customerId and price were properties common to both ServiceOrder and ProductOrder events.

For example, here is a consuming statement that selects a service duraction property that only ServiceOrder events have, and that must therefore be casted to double and null values removed in order to aggregate:

select customerId, sum(coalesce(cast(serviceDuraction?, double), 0)) 
from OrderStream#time(30 sec) group by customerId

If your application predefines a variant stream to hold any type of events (the any type variance), then all event properties of the variant stream are effectively dynamic properties.

For example, an application may define an OutgoingEvents variant stream to hold any type of event. The next statement is a sample consumer of the OutgoingEvents variant stream that looks for the destination property and fires for each event in which the property exists with a value of 'email':

select * from OutgoingEvents(destination = 'email')

When you declare the inserted-into event type in advance to the statement that inserts, the runtime compares the inserted-into event type information to the return type of expressions in the select-clause. The comparison uses the column alias assigned to each select-clause expression using the as keyword.

When the inserted-into column type is an event type and when using a subquery or the new operator, the runtime compares column names assigned to subquery columns or new operator columns.

For example, assume a PurchaseOrder event type that has a property called items that consists of Item rows:

create schema Item(name string, price double)
create schema PurchaseOrder(orderId string, items Item[])

Declare a statement that inserts into the PurchaseOrder stream:

insert into PurchaseOrder 
select '001' as orderId, new {name='i1', price=10} as items
from TriggerEvent

The alias assigned to the first and second expression in the select-clause, namely orderId and items, both match the event property names of the Purchase Order event type. The column names provided to the new operator also both match the event property names of the Item event type.

When the event type declares the column as a single value (and not an array) and when the select-clause expression produces a multiple rows, the runtime only populate the first row.

Consider a PurchaseOrder event type that has a property called item that consists of a single Item event:

create schema PurchaseOrder(orderId string, items Item)

The sample subquery below populates only the very first event, discarding remaining subquery result events, since the items property above is declared as holding a single Item-typed event only (versus Item[] to hold multiple Item-typed events).

insert into PurchaseOrder select 
(select 'i1' as name, 10 as price from HistoryEvent#length(2)) as items 
from TriggerEvent

Consider using a subquery with filter, or one of the enumeration methods to select a specific subquery result row.

A subquery is a select within another statement. The compiler supports subqueries in the select clause, where clause, having clause and in stream and pattern filter expressions. Subqueries provide an alternative way to perform operations that would otherwise require complex joins. Subqueries can also make statements more readable then complex joins.

EPL supports both simple subqueries as well as correlated subqueries. In a simple subquery, the inner query is not correlated to the outer query. Here is an example simple subquery within a select clause:

select assetId, (select zone from ZoneClosed#lastevent) as lastClosed from RFIDEvent

If the inner query is dependent on the outer query, you will have a correlated subquery. An example of a correlated subquery is shown below. Notice the where clause in the inner query, where the condition involves a stream from the outer query:

select * from RfidEvent as RFID where 'Dock 1' = 
  (select name from Zones#unique(zoneId) where zoneId = RFID.zoneId)

The example above shows a subquery in the where clause. The statement selects RFID events in which the zone name matches a string constant based on zone id. The statement sets #unique to guarantee that only the last event per zone id is retained for processing by the subquery.

The next example is a correlated subquery within a select clause. In this statement the select clause retrieves the zone name by means of a subquery against the Zones set of events correlated by zone id:

select zoneId, (select name from Zones#unique(zoneId) 
  where zoneId = RFID.zoneId) as name from RFIDEvent

Note that when a simple or correlated subquery returns multiple rows, the runtime returns a null value as the subquery result. To limit the number of events returned by a subquery consider using one of the #lastevent, #unique data windows or aggregation functions or the multi-row and multi-column-select as described below.

The select clause of a subquery also allows wildcard selects, which return as an event property the underlying event object of the event type as defined in the from clause. An example:

select (select * from MarketData#lastevent) as md 
  from pattern [every timer:interval(10 sec)]

The output events to the statement above contain the underlying MarketData event in a property named "md". The statement populates the last MarketData event into a property named "md" every 10 seconds following the pattern definition, or populates a null value if no MarketData event has been encountered so far.

Aggregation functions may be used in the select clause of the subselect as this example outlines:

select * from MarketData
where price > (select max(price) from MarketData(symbol='GOOG')#lastevent)

As the sub-select expression is evaluated first (by default), the query above actually never fires for the GOOG symbol, only for other symbols that have a price higher then the current maximum for GOOG. As a sidenote, the insert into clause can also be handy to compute aggregation results for use in multiple subqueries.

When using aggregation functions in a correlated subselect the runtime computes the aggregation based on data window (if provided), named window or table contents matching the where-clause.

The following example compares the quantity value provided by the current order event against the total quantity of all order events in the last 1 hour for the same client.

select * from OrderEvent oe
where qty > 
  (select sum(qty) from OrderEvent#time(1 hour) pd 
  where pd.client = oe.client)

Filter expressions in a pattern or stream may also employ subqueries. Subqueries can be uncorrelated or can be correlated to properties of the stream or to properties of tagged events in a pattern. Subqueries may reference named windows and tables as well.

The following example filters BarData events that have a close price less then the last moving average (field movAgv) as provided by stream SMA20Stream (an uncorrelated subquery):

select * from BarData(ticker='MSFT', closePrice < 
    (select movAgv from SMA20Stream(ticker='MSFT')#lastevent))

A few generic examples follow to demonstrate the point. The examples use short event and property names so they are easy to read. Assume A and B are streams and DNamedWindow is a named window, and ETable is a table and properties a_id, b_id, d_id, e_id, a_val, b_val, d_val, e_val respectively:

// Sample correlated subquery as part of stream filter criteria
select * from A(a_val in 
  (select b_val from B#unique(b_val) as b where a.a_id = b.b_id)) as a
// Sample correlated subquery against a named window
select * from A(a_val in 
  (select d_val from DNamedWindow as d where a.a_id = d.d_id)) as a
// Sample correlated subquery in the filter criteria as part of a pattern, querying a named window
select * from pattern [
  a=A -> b=B(bvalue = 
    (select d_val from DNamedWindow as d where d.d_id = b.b_id and d.d_id = a.a_id))
]
// Sample correlated subquery against a table
select * from A(a_val in 
  (select e_val from ETable as e where a.a_id = e.e_id)) as a

Subquery state starts to accumulate as soon as a statement starts (and not only when a pattern-subexpression activates).

The following restrictions apply to subqueries:

  1. Subqueries can only consist of a select clause, a from clause, a where clause, a group by clause and a having clause. Joins, outer-joins and output rate limiting are not permitted within subqueries.

  2. If using aggregation functions in a subquery, note these limitations:

    1. None of the properties of the correlated stream(s) can be used within aggregation functions.

    2. The properties of the subselect stream must all be within aggregation functions.

  3. With the exception of subqueries against named windows and tables and subqueries that are both uncorrelated and fully-aggregated, the subquery stream definition must define a data window to limit subquery results, for the purpose of identifying the events held for subquery execution.

  4. The having-clause, if present, requires that properties of the selected stream are aggregated and does not allow un-aggregated properties of the selected stream. You may use the first aggregation function to obtain properties of the selected stream instead.

The order of evaluation of subqueries relative to the containing statement is guaranteed: If the containing statement and its subqueries are reacting to the same type of event, the subquery will receive the event first before the containing statement's clauses are evaluated. This behavior can be changed via configuration. The order of evaluation of subqueries is not guaranteed between subqueries.

Performance of your statement containing one or more subqueries principally depends on two parameters. First, if your subquery correlates one or more columns in the subquery stream with the enclosing statement's streams, the compiler determines and the runtime automatically builds the appropriate indexes for fast row retrieval based on the key values correlated (joined). The second parameter is the number of rows found in the subquery stream and the complexity of the filter criteria (where clause), as each row in the subquery stream must evaluate against the where clause filter.

The any subquery condition is true if the expression returns true for one or more of the values returned by the subquery.

The synopsis for the any keyword is as follows:

expression operator any (subquery)
expression operator some (subquery)

The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of any is "true" if any true result is obtained. The result is "false" if no true result is found (including the special case where the subquery returns no rows).

The operator can be any of the following values: =, !=, <>, <, <=, >, >=.

The some keyword is a synonym for any. The in construct is equivalent to = any.

The right-hand side subquery must return exactly one column.

The next statement demonstrates the use of the any subquery condition:

select * from ProductOrder as ord
  where quantity < any
    (select minimumQuantity from MinimumQuantity#keepall)

The above statement compares ProductOrder event's quantity value with all rows from the MinimumQuantity stream of events and returns only those ProductOrder events that have a quantity that is less then any of the minimum quantity values of the MinimumQuantity events.

Note that if there are no successes and at least one right-hand row yields null for the operator's result, the result of the any construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.

Your subquery may select multiple columns in the select clause including multiple aggregated values from a data window or named window or table.

The following example is a correlated subquery that selects wildcard and in addition selects the bid and offer properties of the last MarketData event for the same symbol as the arriving OrderEvent:

select *,
  (select bid, offer from MarketData#unique(symbol) as md 
   where md.symbol = oe.symbol) as bidoffer
from OrderEvent oe

Output events for the above statement contain all properties of the original OrderEvent event. In addition each output event contains a bidoffer nested property that itself contains the bid and offer properties. You may retrieve the bid and offer from output events directly via the bidoffer.bid property name syntax for nested properties.

The next example is similar to the above statement but instead selects aggregations and selects from a named window by name OrderNamedWindow (creation not shown here). For each arriving OrderEvent it selects the total quantity and count of all order events for the same client, as currently held by the named window:

select *,
  (select sum(qty) as sumPrice, count(*) as countRows 
   from OrderNamedWindow as onw
   where onw.client = oe.client) as pastOrderTotals
from OrderEvent as oe

The next statement computes a prorated quantity considering the maximum and minimum quantity for the last 1 minute of order events:

expression subq {
  (select max(quantity) as maxq, min(quantity) as minq from OrderEvent#time(1 min))
}
select (quantity - minq) / (subq().maxq  - subq().minq) as prorated
from OrderEvent

Output events for the above statement contain all properties of the original OrderEvent event. In addition each output event contains a pastOrderTotals nested property that itself contains the sumPrice and countRows properties.

While a subquery cannot change the cardinality of the selected stream, a subquery can return multiple values from the selected data window or named window or table. This section shows examples of the window aggregation function as well as the use of enumeration methods with subselects.

Consider using an inner join, outer join or unidirectional join instead to achieve a 1-to-many cardinality in the number of output events.

The next example is an uncorrelated subquery that selects all current ZoneEvent events considering the last ZoneEvent per zone for each arriving RFIDEvent.

select assetId,
 (select window(z.*) as winzones from ZoneEvent#unique(zone) as z) as zones
 from RFIDEvent

Output events for the above statement contain two properties: the assetId property and the zones property. The latter property is a nested property that contains the winzones property. You may retrieve the zones from output events directly via the zones.winzones property name syntax for nested properties.

In this example for a correlated subquery against a named window, assume that the OrderNamedWindow has been created and contains order events. The statement returns for each MarketData event the list of order ids for orders with the same symbol:

select price,
 (select window(orderId) as winorders 
  from OrderNamedWindow onw 
  where onw.symbol = md.symbol) as orderIds
 from MarketData md

Output events for the above statement contain two properties: the price property and the orderIds property. The latter property is a nested property that contains the winorders property of type array.

Another option to reduce selected rows to a single value is through the use of enumeration methods.

select price,
 (select *  from OrderNamedWindow onw
  where onw.symbol = md.symbol).selectFrom(v => v) as ordersSymbol
 from MarketData md

Output events for the above statement also contain a Collection of underlying events in the ordersSymbol property.

The following hints are available to tune performance and memory use of subqueries.

Use the @Hint('set_noindex') hint for a statement that utilizes one or more subqueries. It instructs the runtime to always perform a full scan. The runtime does not build an implicit index or use an explicitly-created index when this hint is provided. Use of the hint may result in reduced memory use but poor statement performance.

The following hints are available to tune performance and memory use of subqueries that select from named windows (does not apply to tables).

Named windows are globally-visible data windows. As such an application may create explicit indexes as discussed in Section 6.9, “Explicitly Indexing Named Windows and Tables”. The runtime may also elect to create implicit indexes (no create-index EPL required) for index-based lookup of rows when executing on-select, on-merge, on-update and on-delete statements and for statements that subquery a named window.

By default and without specifying a hint, each statement that subqueries a named window also maintains its own index for looking up events held by the named window. The runtime maintains the index by consuming the named window insert and remove stream. When the statement is undeployed it releases that index.

Specify the @Hint('enable_window_subquery_indexshare') hint to enable subquery index sharing for named windows. When using this hint, indexes for subqueries are maintained by the named window itself (and not each statement context partition). However only indexes explictly created with create index are used in this case. Specify the hint once as part of the create window statement.

This sample statement creates a named window with subquery index sharing enabled:

@Hint('enable_window_subquery_indexshare')
create window OrdersNamedWindow#keepall as OrderMapEventType

When subquery index sharing is enabled, performance may increase as named window stream consumption is no longer needed for correlated subqueries. You may also expect reduced memory use especially if a large number of statements perform similar subqueries against a named window. Subquery index sharing may require additional short-lived object creation and may slightly increase lock held time for named windows.

The following statement performs a correlated subquery against the named window above. When a settlement event arrives it select the order detail for the same order id as provided by the settlement event:

select 
  (select * from OrdersNamedWindow as onw 
    where onw.orderId = se.orderId) as orderDetail
  from SettlementEvent as se

With subquery index sharing enabled and only when a suitable index exists the query planner uses the index. A sample index is:

create index MyIndex on OrdersNamedWindow(orderId)

You may disable subquery index sharing for a specific statement by specifying the @Hint('disable_window_subquery_indexshare') hint, as this example shows, causing the statement to maintain its own index:

@Hint('disable_window_subquery_indexshare')
select 
  (select * from OrdersNamedWindow as onw 
    where onw.orderId = se.orderId) as orderDetail
  from SettlementEvent as se

Two or more event streams can be part of the from-clause and thus both (all) streams determine the resulting events. This section summarizes the important concepts. The sections that follow present more detail on each topic.

The default join is an inner join which produces output events only when there is at least one match in all streams.

Consider the sample statement shown next:

select * from TickEvent#lastevent, NewsEvent#lastevent

The above statement outputs the last TickEvent and the last NewsEvent in one output event when either a TickEvent or a NewsEvent arrives. If no TickEvent was received before a NewsEvent arrives, no output occurs. Similarly when no NewsEvent was received before a TickEvent arrives, no output occurs.

The where-clause lists the join conditions that the compiler uses to relate events in the two or more streams.

The next example statement retains the last TickEvent and last NewsEvent per symbol, and joins the two streams based on their symbol value:

select * from TickEvent#unique(symbol) as t, NewsEvent#unique(symbol) as n
where t.symbol = n.symbol

As before, when a TickEvent arrives for a symbol that has no matching NewsEvent then there is no output event.

An outer join does not require each event in either stream to have a matching event. The full outer join is useful when output is desired when no match is found. The different outer join types (full, left, right) are explained in more detail below.

This example statement is an outer-join and also returns the last TickEvent and last NewsEvent per symbol:

select * from TickEvent#unique(symbol) as t
full outer join NewsEvent#unique(symbol) as n on t.symbol = n.symbol

In the sample statement above, when a TickEvent arrives for a symbol that has no matching NewsEvent, or when a NewsEvent arrives for a symbol that has no matching TickEvent, the statement still produces an output event with a null column value for the missing event.

Note that each of the sample statements above defines a data window. The sample statements above use the last-event data window (#lastevent) or the unique data window (#unique). A data window serves to indicate the subset of events to join from each stream and may be required depending on the join.

In above statements, when either a TickEvent arrives or when a NewsEvent arrives then the statement evaluates and there is output. The same holds true if additional streams are added to the from-clause: Each of the streams in the from-clause trigger the join to evaluate.

The unidirectional keyword instructs the runtime to evaluate the join only when an event arrives from the single stream that was marked with the unidirectional keyword. In this case no data window should be specified for the stream marked as unidirectional since the keyword implies that the current event of that stream triggers the join.

Here is the sample statement above with unidirectional keyword, so that output occurs only when a TickEvent arrives and not when a NewsEvent arrives:

select * from TickEvent as t unidirectional, NewsEvent#unique(symbol) as n 
where t.symbol = n.symbol

It is oftentimes the case that an aggregation (count, sum, average) only needs to be calculated in the context of an arriving event or timer. Consider using the unidirectional keyword when aggregating over joined streams.

An EPL pattern is a normal citizen also providing a stream of data consisting of pattern matches. A time pattern, for example, can be useful to evaluate a join and produce output upon each interval.

This sample statement includes a pattern that fires every 5 seconds and thus triggers the join to evaluate and produce output, computing an aggregated total quantity per symbol every 5 seconds:

select symbol, sum(qty) from pattern[every timer:interval(5 sec)] unidirectional, 
  TickEvent#unique(symbol) t, NewsEvent#unique(symbol) as n 
where t.symbol = n.symbol group by symbol

Named windows as well as reference and historical data such as stored in your relational database, and data returned by a method/script/UDF invocation, can also be included in joins as discussed in Section 5.13, “Accessing Relational Data via SQL” and Section 5.14, “Accessing Non-Relational Data via Method, Script or UDF Invocation”.

Related to joins are subqueries: A subquery is a select within another statement, see Section 5.11, “Subqueries”

The compiler performs extensive statement analysis and planning, building internal indexes and strategies as required to allow fast evaluation of many types of statements.

Each point in time that an event arrives to one of the event streams, the two event streams are joined and output events are produced according to the where clause when matching events are found for all joined streams.

This example joins 2 event streams. The first event stream consists of fraud warning events for which it keep the last 30 minutes. The second stream is withdrawal events for which it considers the last 30 seconds. The streams are joined on account number.

select fraud.accountNumber as accntNum, fraud.warning as warn, withdraw.amount as amount,
       max(fraud.timestamp, withdraw.timestamp) as timestamp, 'withdrawlFraud' as desc
  from FraudWarningEvent#time(30 min) as fraud, WithdrawalEvent#time(30 sec) as withdraw
 where fraud.accountNumber = withdraw.accountNumber

Joins can also include one or more pattern statements as the next example shows:

select * from FraudWarningEvent#time(30 min) as fraud,
    pattern [every w=WithdrawalEvent -> PINChangeEvent(acct=w.acct)]#lastevent as withdraw
 where fraud.accountNumber = withdraw.w.accountNumber

The statement above joins the last 30 minutes of fraud warnings with a pattern. The pattern consists of every withdrawal event that is followed by a PIN change event for the same account number. It joins the two event streams on account number. The last-event window instucts the join to only consider the last pattern match.

In a join and outer join, your statement must declare a data window onto each stream. Streams that are marked as unidirectional and named windows and tables as well as database or methods in a join are an exception and do not require a data window. If you are joining an event to itself via contained-event selection, data windows also do not need to be specified. The reason that a data window must be declared is that a data window specifies which events are considered for the join (i.e. last event, last 10 events, all events, last 1 second of events etc.).

The next example joins all FraudWarningEvent events that arrived since the statement was started, with the last 20 seconds of PINChangeEvent events:

select * from FraudWarningEvent#keepall as fraud, PINChangeEvent#time(20 sec) as pin
 where fraud.accountNumber = pin.accountNumber

The above example employed the special keep-all window that retains all events.

EPL supports left outer joins, right outer joins, full outer joins and inner joins in any combination between an unlimited number of event streams. Outer and inner joins can also join reference and historical data as explained in Section 5.13, “Accessing Relational Data via SQL”, as well as join data returned by a method, script or UDF invocation as outlined in Section 5.14, “Accessing Non-Relational Data via Method, Script or UDF Invocation”.

The keywords left, right, full and inner control the type of the join between two streams. The optional on clause specifies one or more properties that join each stream. The synopsis is as follows:

...from stream_def [as name] 
  ((left|right|full outer) | inner) join stream_def 
  [on property = property [and property = property ...] ]
  [ ((left|right|full outer) | inner) join stream_def [on ...]]...

If the outer join is a left outer join, there will be at least one output event for each event of the stream on the left-hand side of the clause. For example, in the left outer join shown below you get output for each event in the stream RfidEvent, even if the event does not match any event in the event stream OrderList.

select * from RfidEvent#time(30 sec) as rfid
       left outer join
       OrderList#length(10000) as orderlist
     on rfid.itemId = orderList.itemId

Similarly, if the join is a Right Outer Join, then there will be at least one output event for each event of the stream on the right-hand side of the clause. For example, in the right outer join shown below you get output for each event in the stream OrderList, even if the event does not match any event in the event stream RfidEvent.

select * from RfidEvent#time(30 sec) as rfid
       right outer join
       OrderList#length(10000) as orderlist
       on rfid.itemId = orderList.itemId

For all types of outer joins, if the join condition is not met, the select list is computed with the event properties of the arrived event while all other event properties are considered to be null.

The next type of outer join is a full outer join. In a full outer join, each point in time that an event arrives to one of the event streams, one or more output events are produced. In the example below, when either an RfidEvent or an OrderList event arrive, one or more output event is produced. The next example shows a full outer join that joins on multiple properties:

select * from RfidEvent#time(30 sec) as rfid
       full outer join
       OrderList#length(10000) as orderlist
       on rfid.itemId = orderList.itemId and rfid.assetId = orderList.assetId

The last type of join is an inner join. In an inner join, the runtime produces at least one output event for each event of the stream on the left-hand side that matches at least one event on the right hand side considering the join properties. For example, in the inner join shown below you get output for each event in the RfidEvent stream that matches one or more events in the OrderList data window:

select * from RfidEvent#time(30 sec) as rfid
       inner join
       OrderList#length(10000) as orderlist
       on rfid.itemId = orderList.itemId and rfid.assetId = orderList.assetId

Patterns as streams in a join follow this rule: If your statement does not specify a data window for the pattern then the pattern stream retains the last match. Thus a pattern must have matched at least once for the last match to become available in a join. Multiple rows from a pattern stream may be retained by declaring a data window onto a pattern using the pattern [...]#window_spec syntax.

This example outer joins multiple streams. Here the RfidEvent stream is outer joined to both ProductName and LocationDescription via left outer join:

select * from RfidEvent#time(30 sec) as rfid
      left outer join ProductName#keepall as refprod
        on rfid.productId = refprod.prodId
      left outer join LocationDescription#keepall as refdesc
        on rfid.location = refdesc.locId

If the optional on clause is specified, it may only employ the = equals operator and property names. Any other operators must be placed in the where-clause. The stream names that appear in the on clause may refer to any stream in the from-clause.

Your EPL may also provide no on clause. This is useful when the streams that are joined do not provide any properties to join on, for example when joining with a time-based pattern.

The next example employs a unidirectional left outer join such that the runtime, every 10 seconds, outputs a count of the number of RfidEvent events in the 60-second time window.

select count(*) from
  pattern[every timer:interval(1)] unidirectional 
  left outer join
  RfidEvent#time(60 sec)

In a join or outer join your statement lists multiple event streams, data windows and/or patterns in the from clause. As events arrive into the runtime, each of the streams (data windows, patterns) provides insert and remove stream events. The runtime evaluates each insert and remove stream event provided by each stream, and joins or outer joins each event against data window contents of each stream, and thus generates insert and remove stream join results.

The direction of the join execution depends on which stream or streams are currently providing an insert or remove stream event for executing the join. A join is thus multidirectional, or bidirectional when only two streams are joined. A join can be made unidirectional if your application does not want new results when events arrive on a given stream or streams.

The unidirectional keyword can be used in the from clause to identify streams that provide the events to execute the join. If the keyword is present for a stream, all other streams in the from clause become passive streams. When events arrive or leave a data window of a passive stream then the join does not generate join results.

For example, consider a use case that requires us to join stock tick events (TickEvent) and news events (NewsEvent). The unidirectional keyword allows to generate results only when TickEvent events arrive, and not when NewsEvent arrive or leave the 10-second time window:

select * from TickEvent unidirectional, NewsEvent#time(10 sec) 
where tick.symbol = news.symbol

Aggregation functions in a unidirectional join aggregate within the context of each unidirectional event evaluation and are not cumulative. Thereby aggregation functions when used with unidirectional may evaluate faster as they do not need to consider a remove stream (data removed from data windows or named windows).

The count function in the next statement returns, for each TickEvent, the number of matching NewEvent events:

select count(*) from TickEvent unidirectional, NewsEvent#time(10 sec) 
where tick.symbol = news.symbol

The following restrictions apply to unidirectional joins:

  1. The unidirectional keyword can only be specified for a single stream in the from clause, unless all streams are in a full outer join and all streams declare unidirectional.

  2. Receiving data from a unidirectional join via the pull API (iterator method) is not allowed. This is because the runtime holds no state for the single stream that provides the events to execute the join.

  3. The stream that declares the unidirectional keyword cannot declare a data window for that stream, since remove stream events are not processed for the single stream.

When joining 3 or more streams (including any relational or non-relational sources as below) it can sometimes help to provide the query planner instructions how to best execute the join. The compiler compiles a query plan for the statement. You can output the query plan to logging (see configuration).

An outer join that specifies only inner keywords for all streams is equivalent to an default (inner) join. The following two statements are equivalent:

select * from TickEvent#lastevent, 
    NewsEvent#lastevent where tick.symbol = news.symbol

Equivalent to:

select * from TickEvent#lastevent 
	inner join NewsEvent#lastevent on tick.symbol = news.symbol

For all types of joins, the query planner determines a query graph: The term is used here for all the information regarding what properties or expressions are used to join the streams. The query graph thus includes the where-clause expressions as well as outer-join on-clauses if this statement is an outer join. The query planner also computes a dependency graph which includes information about all historical data streams (relational and non-relational as below) and their input needs.

For default (inner) joins the query planner first attempts to find a path of execution as a nested iteration. For each stream the query planner selects the best order of streams available for the nested iteration considering the query graph and dependency graph. If the full depth of the join is achievable via nested iteration for all streams without full table scan then the query planner uses that nested iteration plan. If not, then the query planner re-plans considering a merge join (Cartesian) approach instead.

Specify the @Hint('prefer_merge_join') to instruct the query planner to prefer a merge join plan instead of a nested iteration plan. Specify the @Hint('force_nested_iter') to instruct the query planner to always use a nested iteration plan.

For example, consider the below statement. Depending on the number of matching rows in OrderBookOne and OrderBookTwo (named windows in this example, and assumed to be defined elsewhere) the performance of the join may be better using the merge join plan.

@Hint('prefer_merge_join') 
select * from TickEvent#lastevent t, 
	OrderBookOne ob1, OrderBookOne ob2
where ob1.symbol = t.symbol and ob2.symbol = t.symbol 
and ob1.price between t.buy and t.sell and ob2.price between t.buy and t.sell

For outer joins the query planner considers nested iteration and merge join (Cartesian) equally and above hints don't apply.

For NEsper .NET also see Section J.13, “.NET Accessing Relational Data via SQL”.

This chapter outlines how reference data and historical data that are stored in a relational database can be queried via SQL within statements.

EPL can access via join and outer join as well as via iterator (poll) API all types of event streams to stored data. In order for such data sources to become accessible to EPL, some configuration is required. The Section 17.4.12, “Relational Database Access” explains the required configuration for database access in greater detail, and includes information on configuring a query result cache.

The compiler does not parse or otherwise inspect your SQL query. Therefore your SQL can make use of any database-specific SQL language extensions or features that your database provides.

If you have enabled SQL query result caching in your database configuration, the runtime retains SQL query results in cache following the configured cache eviction policy.

Also if you have enabled SQL query result caching in your database configuration and provide EPL where clause and/or on clause (outer join) expressions, then the runtime builds indexes on the SQL query results to enable fast lookup. This is especially useful if your SQL queries return a large number of rows. For building the proper indexes, the compiler inspects the expression found in your statement where clause, if present. For outer joins, the compiler also inspects your statement on clause. The compiler analyzes the EPL on clause and where clause expressions, if present, looking for property comparison with or without logical AND-relationships between properties. When a SQL query returns rows for caching, the runtime builds and caches the appropriate index and lookup strategies for fast row matching against indexes.

Joins or outer joins in which only SQL statements or method, script and UDF invocations are listed in the from clause and no other event streams are termed passive joins. A passive join does not produce an insert or remove stream and therefore does not invoke statement listeners with results. A passive join can be iterated on (polled) using a statement's safeIterator and iterator methods.

There are no restrictions to the number of SQL statements or types of streams joined. The following restrictions currently apply:

  • An SQL query cannot declare data windows; That is, you cannot create a time or length window on an SQL query. Instead, use insert into to make join results available for further processing.

  • Your database software must support JDBC prepared statements that provide statement meta data at compilation time. Most major databases provide this function. A workaround is available for databases that do not provide this function.

  • JDBC drivers must support the getMetadata feature. A workaround is available as below for JDBC drivers that don't support getting metadata.

The next sections assume basic knowledge of SQL (Structured Query Language).

To join an event stream against stored data, specify the sql keyword followed by the name of the database and a parameterized SQL query. The syntax to use in the from clause of a statement is:

sql:database_name [" parameterized_sql_query "]

The runtime uses the database_name identifier to obtain configuration information in order to establish a database connection, as well as settings that control connection creation and removal. Please see Section 17.4.12, “Relational Database Access” to configure an runtime for database access.

Following the database name is the SQL query to execute. The SQL query can contain one or more substitution parameters. The SQL query string is placed in single brackets [ and ]. The SQL query can be placed in either single quotes (') or double quotes ("). The SQL query grammer is passed to your database software unchanged, allowing you to write any SQL query syntax that your database understands, including stored procedure calls.

Substitution parameters in the SQL query string take the form ${expression}. The compiler resolves expression at statement execution time to the actual expression result by evaluating the events in the joined event stream or current variable values, if any event property references or variables occur in the expression. An expression may not contain EPL substitution parameters.

The compiler determines the type of the SQL query output columns by means of the result set metadata that your database software returns for the statement. The actual SQL query results are obtained via the getObject on java.sql.ResultSet.

The sample statement below joins an event stream consisting of CustomerCallEvent events with the results of an SQL query against the database named MyCustomerDB and table Customer:

select custId, cust_name from CustomerCallEvent,
  sql:MyCustomerDB [' select cust_name from Customer where cust_id = ${custId} ']

The example above assumes that CustomerCallEvent supplies an event property named custId. The SQL query selects the customer name from the Customer table. The where clause in the SQL matches the Customer table column cust_id with the value of custId in each CustomerCallEvent event. The runtime executes the SQL query for each new CustomerCallEvent encountered.

If the SQL query returns no rows for a given customer id, the runtime generates no output event. Else the runtime generates one output event for each row returned by the SQL query. An outer join as described in the next section can be used to control whether the runtime should generate output events even when the SQL query returns no rows.

The next example adds a time window of 30 seconds to the event stream CustomerCallEvent. It also renames the selected properties to customerName and customerId to demonstrate how the naming of columns in an SQL query can be used in the select clause in the statement. And the example uses explicit stream names via the as keyword.

select customerId, customerName from
  CustomerCallEvent#time(30 sec) as cce,
  sql:MyCustomerDB ["select cust_id as customerId, cust_name as customerName from Customer 
                  where cust_id = ${cce.custId}"] as cq

Any window, such as the time window, generates insert stream (istream) events as events enter the window, and remove stream (rstream) events as events leave the window. The runtime executes the given SQL query for each CustomerCallEvent in both the insert stream and the remove stream. As a performance optimization, the istream or rstream keywords in the select clause can be used to instruct the runtime to only join insert stream or remove stream events, reducing the number of SQL query executions.

Since any expression may be placed within the ${...} syntax, you may use variables or user-defined functions as well.

The next example assumes that a variable by name varLowerLimit is defined and that a user-defined function getLimit exists on the MyLib imported class that takes a LimitEvent as a parameter:

select * from LimitEvent le, 
  sql:MyCustomerDB [' select cust_name from Customer where 
      amount > ${max(varLowerLimit, MyLib.getLimit(le))} ']

The example above takes the higher of the current variable value or the value returned by the user-defined function to return only those customer names where the amount exceeds the computed limit.

Consider using the EPL where clause to join the SQL query result to your event stream. Similar to EPL joins and outer-joins that join event streams or patterns, the EPL where clause provides join criteria between the SQL query results and the event stream (as a side note, an SQL where clause is a filter of rows executed by your database on your database server before returning SQL query results).

The compiler analyzes the expression in the EPL where clause and outer-join on clause, if present, and builds the appropriate indexes from that information at runtime, to ensure fast matching of event stream events to SQL query results, even if your SQL query returns a large number of rows. Your applications must ensure to configure a cache for your database using configuration, as such indexes are held with regular data in a cache. If you application does not enable caching of SQL query results, the runtime does not build indexes on cached data.

The sample statement below joins an event stream consisting of OrderEvent events with the results of an SQL query against the database named MyRefDB and table SymbolReference:

select symbol, symbolDesc from OrderEvent as orders,
  sql:MyRefDB ['select symbolDesc from SymbolReference'] as reference
  where reference.symbol = orders.symbol

Notice how the EPL where clause joins the OrderEvent stream to the SymbolReference table. In this example, the SQL query itself does not have a SQL where clause and therefore returns all rows from table SymbolReference.

If your application enables caching, the SQL query fires only at the arrival of the first OrderEvent event. When the second OrderEvent arrives, the join execution uses the cached SQL query result. If the caching policy that you specified in the database configuration evicts the SQL query result from cache, then the runtime fires the SQL query again to obtain a new result and places the result in cache.

If SQL result caching is enabled and your EPL where clause, as show in the above example, provides the properties to join, then the runtime indexes the SQL query results in cache and retains the index together with the SQL query result in cache. Thus your application can benefit from high performance index-based lookups as long as the SQL query results are found in cache.

The SQL result caches operate on the level of all result rows for a given parameter set. For example, if your SQL query returns 10 rows for a certain set of parameter values then the cache treats all 10 rows as a single entry keyed by the parameter values, and the expiry policy applies to all 10 rows and not to each individual row.

It is also possible to join multiple autonomous database systems in a single statement, for example:

select symbol, symbolDesc from OrderEvent as orders,
  sql:My_Oracle_DB ['select symbolDesc from SymbolReference'] as reference,
  sql:My_MySQL_DB ['select orderList from orderHistory'] as history
  where reference.symbol = orders.symbol
  and history.symbol = orders.symbol 

Certain JDBC database drivers are known to not return metadata for precompiled prepared SQL statements. This can be a problem as metadata is required by the compiler. The compiler obtains SQL result set metadata to validate a statement and to provide column types for output events. JDBC drivers that do not provide metadata for precompiled SQL statements require a workaround. Such drivers do generally provide metadata for executed SQL statements, however do not provide the metadata for precompiled SQL statements.

Please consult the Chapter 17, Configuration for the configuration options available in relation to metadata retrieval.

To obtain metadata for an SQL statement, the compiler can alternatively fire a SQL statement which returns the same column names and types as the actual SQL statement but without returning any rows. This kind of SQL statement is referred to as a sample statement in below workaround description. The compiler can then use the sample SQL statement to retrieve metadata for the column names and types returned by the actual SQL statement.

Applications can provide a sample SQL statement to retrieve metadata via the metadatasql keyword:

sql:database_name ["parameterized_sql_query" metadatasql "sql_meta_query"] 

The sql_meta_query must be an SQL statement that returns the same number of columns, the same type of columns and the same column names as the parameterized_sql_query, and does not return any rows.

Alternatively, applications can choose not to provide an explicit sample SQL statement. If the statement does not use the metadatasql syntax, the compiler applies lexical analysis to the SQL statement. From the lexical analysis the compiler generates a sample SQL statement adding a restrictive clause "where 1=0" to the SQL statement.

Alternatively, you can add the following tag to the SQL statement: ${$ESPER-SAMPLE-WHERE}. If the tag exists in the SQL statement, the compiler does not perform lexical analysis and simply replaces the tag with the SQL where clause "where 1=0". Therefore this workaround is applicable to SQL statements that cannot be correctly lexically analyzed. The SQL text after the placeholder is not part of the sample SQL query. For example:

select mycol from sql:myDB [
  'select mycol from mytesttable ${$ESPER-SAMPLE-WHERE} where ....'], ...

If your parameterized_sql_query SQL query contains vendor-specific SQL syntax, generation of the metadata query may fail to produce a valid SQL statement. If you experience an SQL error while fetching metadata, use any of the above workarounds with the Oracle JDBC driver.

Your application may need to join data that originates from a web service, a distributed cache, an object-oriented database or simply data held in memory by your application. One way to join in external data is by means of method, script or user-defined function invocation (or procedure call or function) in the from clause of a statement.

The results of such a method, script or UDF invocation in the from clause plays the same role as a relational database table in an inner and outer join in SQL.

EPL can join and outer join an unlimited number and all types of event streams to the data returned by your invocation. In addition, the runtime can be configured to cache the data returned by your method, script or UDF invocations.

Joins or outer joins in which only SQL statements or method, script or UDF invocations are listed in the from clause and no other event streams are termed passive joins. A passive join does not produce an insert or remove stream and therefore does not invoke statement listeners with results. A passive join can be iterated on (polled) using a statement's safeIterator and iterator methods.

The following restrictions currently apply:

  • A invocation cannot declare data windows; That is, you cannot create a time or length window on an invocation. Instead, use insert into to make join results available for further processing.

The syntax for a method, script or UDF invocation in the from clause of a statement is:

method: [class_or_variable_name.]method_script_udf_name[(parameter_expressions)] [@type(eventtype_name)]

The method keyword denotes a method, script or UDF invocation. It is followed by an optional class or variable name. The method_script_udf_name is the name of the method, script or user-defined function. If you have parameters to your method, script or UDF invocation, these are placed in parentheses after the method or script name. Any expression is allowed as a parameter, and individual parameter expressions are separated by a comma. Expressions may also use event properties of the joined stream.

In case the return type of the method is EventBean instances, you must provide the @type annotation to name the event type of events returned. Otherwise @type is not allowed.

In the sample join statement shown next, the method lookupAsset provided by class (or variable) MyLookupLib returns one or more rows based on the asset id (a property of the AssetMoveEvent) that is passed to the method:

select * from AssetMoveEvent, method:MyLookupLib.lookupAsset(assetId)

The following statement demonstrates the use of the where clause to join events to the rows returned by an invocation, which in this example does not take parameters:

select assetId, assetDesc from AssetMoveEvent as asset, 
       method:MyLookupLib.getAssetDescriptions() as desc 
where asset.assetid = desc.assetid

Your method, scipt or UDF invocation may return zero, one or many rows for each invocation. If you have caching enabled through configuration, then the runtime can avoid the invocation and instead use cached results. Similar to SQL joins, the runtime also indexes cached result rows such that join operations based on the where clause or outer-join on clause can be very efficient, especially if your invocation returns a large number of rows.

If the time taken by method, script or UDF invocations is critical to your application, you may configure local caches as Section 17.4.11, “From-Clause Method Invocation” describes.

The compiler analyzes the expression in the EPL where clause and outer-join on clause, if present, and builds the appropriate indexes from that information at runtime, to ensure fast matching of event stream events to invocation results, even if your invocation returns a large number of rows. Your applications must ensure to configure a cache for your invocation using configuration, as such indexes are held with regular data in a cache. If you application does not enable caching of invocation results, the runtime does not build indexes on cached data.

You application can provide a public static method or can provide an instance method of an existing object. The method must accept the same number and type of parameters as listed in the parameter expression list.

The examples herein mostly use public static methods. For a detail description of instance methods please see Section 5.17.5, “Class and Event-Type Variables” and below example.

If your invocation returns either no row or only one row, then the return type of the method can be a Java class, java.util.Map or Object[] (object-array). If your invocation can return more then one row, then the return type of the method must be an array of Java class, array of Map, Object[][] (object-array 2-dimensional) or Collection or Iterator (or subtypes thereof).

If you are using a Java class, an array of Java class or a Collection<Class> or an Iterator<Class> as the return type, then the class must adhere to JavaBean conventions: it must expose properties through getter methods.

If you are using java.util.Map or an array of Map or a Collection<Map> or an Iterator<Map> as the return type, please note the following:

  • Your application must provide a second method that returns event property metadata, as the next section outlines.
  • Each map instance returned by your method should have String-type keys and object values (Map<String, Object>).

If you are using Object[] (object-array) or Object[][] (object-array 2-dimensional) or Collection<Object[]> or Iterator<Object[]> as the return type, please note the following:

  • Your application must provide a second method that returns event property metadata, as the next section outlines.
  • Each object-array instance returned by your method should have the exact same array position for values as the property metadata indicates and the array length must be the same as the number of properties.

Your application method must return either of the following:

  1. A null value or an empty array to indicate an empty result (no rows).

  2. A Java object or Map or Object[] to indicate a zero (null) or one-row result.

  3. Return multiple result rows by returning either:

    • An array of Java objects.
    • An array of Map instances.
    • An array of Object[] instances.
    • An array of EventBean[] instances (requires @type).
    • A Collection of Java objects.
    • A Collection of Map instances.
    • A Collection of Object[] instances.
    • An Collection of EventBean[] instances (requires @type).
    • An Iterator of Java objects.
    • An Iterator of Map instances.
    • An Iterator of Object[] instances.
    • An Iterator of EventBean[] instances (requires @type).

As an example, consider the method 'getAssetDescriptions' provided by class 'MyLookupLib' as discussed earlier:

select assetId, assetDesc from AssetMoveEvent as asset,
       method:com.mypackage.MyLookupLib.getAssetDescriptions() as desc 
  where asset.assetid = desc.assetid

The 'getAssetDescriptions' method may return multiple rows and is therefore declared to return an array of the class 'AssetDesc'. The class AssetDesc is a POJO class (not shown here):

public class MyLookupLib {
  ...
  public static AssetDesc[] getAssetDescriptions() {
    ...
    return new AssetDesc[] {...};
  }

The example above specifies the full Java class name of the class 'MyLookupLib' class in the statement. The package name does not need to be part of the EPL if your application imports the package using the auto-import configuration through the API or XML, as outlined in Section 17.4.2, “Class and Package Imports”.

Alternatively the example above could return a Collection wherein the method declares as public static Collection<AssetDesc> getAssetDescriptions() {...} or an Iterator wherein the method declares as public static Iterator<AssetDesc> getAssetDescriptions() {...}.

Method overloading is allowed as long as overloaded methods return the same result type.

If you application has an existing object instance such as a service or a dependency injected bean then it must make the instance available as a variable. Please see Section 5.17.5, “Class and Event-Type Variables” for more information.

For example, assuming you provided a stateChecker variable that points to an object instance that provides a public getMatchingAssets instance method and that returns property assetIds, you may use the state checker service in the from-clause as follows:

select assetIds from AssetMoveEvent, method:stateChecker.getMatchingAssets(assetDesc)

Your application may return java.util.Map or an array of Map from invocations. If doing so, your application must provide metadata about each row: it must declare the property name and property type of each Map entry of a row. This information allows the compiler to perform type checking of expressions used within the statement.

You declare the property names and types of each row by providing a method that returns property metadata. The metadata method must follow these conventions:

In the following example, a class 'MyLookupLib' provides a method to return historical data based on asset id and asset code:

select assetId, location, x_coord, y_coord from AssetMoveEvent as asset,
       method:com.mypackage.MyLookupLib.getAssetHistory(assetId, assetCode) as history

A sample implementation of the class 'MyLookupLib' is shown below.

public class MyLookupLib {
  ...
  // For each column in a row, provide the property name and type
  //
  public static Map<String, Class> getAssetHistoryMetadata() {
    Map<String, Class> propertyNames = new HashMap<String, Class>();
    propertyNames.put("location", String.class);
    propertyNames.put("x_coord", Integer.class);
    propertyNames.put("y_coord", Integer.class);
    return propertyNames;
  }
... 
  // Lookup rows based on assetId and assetCode
  // 
  public static Map<String, Object>[] getAssetHistory(String assetId, String assetCode) {
    Map rows = new Map[2];	// this sample returns 2 rows
    for (int i = 0; i < 2; i++) {
      rows[i] = new HashMap();
      rows[i].put("location", "somevalue");
      rows[i].put("x_coord", 100);
      // ... set more values for each row
    }
    return rows;
  }

In the example above, the 'getAssetHistoryMetadata' method provides the property metadata: the names and types of properties in each row. The compiler calls this method once per statement to determine event typing information.

The 'getAssetHistory' method returns an array of Map objects that are two rows. The implementation shown above is a simple example. The parameters to the method are the assetId and assetCode properties of the AssetMoveEvent joined to the method. The runtime calls this method for each insert and remove stream event in AssetMoveEvent.

To indicate that no rows are found in a join, your application method may return either a null value or an array of size zero.

Alternatively the example above could return a Collection wherein the method declares as public static Collection<Map> getAssetHistory() {...} or an Iterator wherein the method declares as public static Iterator<Map> getAssetHistory() {...}.

Your application may return Object[] (object-array) or an array of Object[] (object-array 2-dimensional) from invocations. If doing so, your application must provide metadata about each row: it must declare the property name and property type of each array entry of a row in the exact same order as provided by value rows. This information allows the runtime to perform type checking of expressions used within the statement.

You declare the property names and types of each row by providing a method that returns property metadata. The metadata method must follow these conventions:

In the following example, a class 'MyLookupLib' provides a method to return historical data based on asset id and asset code:

select assetId, location, x_coord, y_coord from AssetMoveEvent as asset,
       method:com.mypackage.MyLookupLib.getAssetHistory(assetId, assetCode) as history

A sample implementation of the class 'MyLookupLib' is shown below.

public class MyLookupLib {
  ...
  // For each column in a row, provide the property name and type
  //
  public static LinkedHashMap<String, Class> getAssetHistoryMetadata() {
    LinkedHashMap<String, Class> propertyNames = new LinkedHashMap<String, Class>();
    propertyNames.put("location", String.class);
    propertyNames.put("x_coord", Integer.class);
    propertyNames.put("y_coord", Integer.class);
    return propertyNames;
  }
... 
  // Lookup rows based on assetId and assetCode
  // 
  public static Object[][] getAssetHistory(String assetId, String assetCode) {
    Object[][] rows = new Object[5][];	// this sample returns 5 rows
    for (int i = 0; i < 5; i++) {
      rows[i] = new Object[2]; // single row has 2 fields
      rows[i][0]  = "somevalue";
      rows[i][1] = 100;
      // ... set more values for each row
    }
    return rows;
  }

In the example above, the 'getAssetHistoryMetadata' method provides the property metadata: the names and types of properties in each row. The compiler calls this method once per statement to determine event typing information.

The 'getAssetHistory' method returns an Object[][] that represents five rows. The implementation shown above is a simple example. The parameters to the method are the assetId and assetCode properties of the AssetMoveEvent joined to the method. The runtime calls this method for each insert and remove stream event in AssetMoveEvent.

To indicate that no rows are found in a join, your application method may return either a null value or an array of size zero.

Alternatively the example above could return a Collection wherein the method declares as public static Collection<Object[]> getAssetHistory() {...} or an Iterator wherein the method declares as public static Iterator<Object[]> getAssetHistory() {...}.

EPL allows declaring an event type via the create schema clause and also by configuring predefined types. The term schema and event type has the same meaning in EPL.

When using the create schema syntax to declare an event type, the runtime automatically removes the event type on undeploy.

The synopsis of the create schema syntax providing property names and types is:

create [map | objectarray | json | avro | xml] schema schema_name [as] 
    (property_name property_type [,property_name property_type [,...])
  [inherits inherited_event_type[, inherited_event_type] [,...]]
  [starttimestamp timestamp_property_name]
  [endtimestamp timestamp_property_name]
  [copyfrom copy_type_name [, copy_type_name] [,...]]

The create keyword can be followed by map to instruct the compiler to represent events of that type by the Map event representation, or objectarray to denote an Object-array event type, or json to denote a JSON event type, or avro to denote an Avro event type, or xml to denote an XML event type. If neither the map or objectarray or json or avro or xml keywords are provided, the compiler default event representation applies.

After create schema follows a schema_name. The schema name is the event type name.

The property_name is an identifier providing the event property name. The property_type is also required for each property. Valid property types are listed in Section 5.17.1, “Creating Variables: The Create Variable Clause” and in addition include:

  1. Any Java class name, fully-qualified or the simple class name if imports are configured.

  2. Add left and right square brackets [] to any type to denote an array-type event property, and [][] for two-dimensional arrays.

  3. Use an event type name as a property type.

  4. The null keyword for a null-typed property.

For XML event types please check Section I.5, “Using XML-Schema Annotations with create xml schema” for information on how to declare properties.

The optional inherits keywords is followed by a comma-separated list of event type names that are the supertypes to the declared type.

The optional starttimestamp keyword is followed by a property name. Use this to tell the compiler that your event has a timestamp. The compiler checks that the property name exists on the declared type and returns a date-time value. Declare a timestamp property if you want your events to implicitly carry a timestamp value for convenient use with interval algebra methods as a start timestamp.

The optional endtimestamp keyword is followed by a property name. Use this together with starttimestamp to tell the compiler that your event has a duration. The compiler checks that the property name exists on the declared type and returns a date-time value. Declare an endtimestamp property if you want your events to implicitly carry a duration value for convenient use with interval algebra methods.

The optional copyfrom keyword is followed by a comma-separate list of event type names. For each event type listed, the compiler looks up that type and adds all event property definitions to the newly-defined type, in addition to those listed explicitly (if any). The resulting order of properties is that copied-from properties are first (in the order of event types and their property order) and explicitly-listed properties last.

A few example event type declarations follow:

// Declare type SecurityEvent
create schema SecurityEvent as (ipAddress string, userId String, numAttempts int)
			
// Declare type AuthorizationEvent with the roles property being an array of String 
// and the hostinfo property being a POJO object
create schema AuthorizationEvent(group String, roles String[], hostinfo com.mycompany.HostNameInfo)

// Declare type CompositeEvent in which the innerEvents property is an array of SecurityEvent
create schema CompositeEvent(group String, innerEvents SecurityEvent[])

// Declare type WebPageVisitEvent that inherits all properties from PageHitEvent
create schema WebPageVisitEvent(userId String) inherits PageHitEvent

// Declare a type with start and end timestamp (i.e. event with duration).
create schema RoboticArmMovement (robotId string, startts long, endts long) 
  starttimestamp startts endtimestamp endts
  
// Create a type that has all properties of SecurityEvent plus a userName property
create schema ExtendedSecurityEvent (userName string) copyfrom SecurityEvent

// Create a type that has all properties of SecurityEvent 
create schema SimilarSecurityEvent () copyfrom SecurityEvent

// Create a type that has all properties of SecurityEvent and WebPageVisitEvent plus a userName property
create schema WebSecurityEvent (userName string) copyfrom SecurityEvent, WebPageVisitEvent

To elaborate on the inherits keyword, consider the following two schema definitions:

create schema Foo as (prop1 string)
create schema Bar() inherits Foo

Following above schema, Foo is a supertype or Bar and therefore any Bar event also fulfills Foo and matches where Foo matches. A statement such as select * from Foo returns any Foo event as well as any event that is a subtype of Foo such as all Bar events. When your statements don't use any Foo events there is no cost, thus inherits is generally an effective way to share properties between types. The start and end timestamp are also inherited from any supertype that has the timestamp property names defined.

The optional copyfrom keyword is for defining a schema based on another schema. This keyword causes the compiler to copy property definitions: There is no inherits, extends, supertype or subtype relationship between the types listed.

To define an event type Bar that has the same properties as Foo:

create schema Foo as (prop1 string)
create schema Bar() copyfrom Foo

To define an event type Bar that has the same properties as Foo and that adds its own property prop2:

create schema Foo as (prop1 string)
create schema Bar(prop2 string) copyfrom Foo

If neither the map or objectarray or json or avro keywords are provided, the following rule applies:

  • If the create-schema statement provides the @EventRepresentation(objectarray) annotation the runtime expects object array events.

  • If the statement provides the @EventRepresentation(json) annotation the runtime expects JSON strings as events.

  • If the statement provides the @EventRepresentation(avro) annotation the runtime expects Avro objects as events.

  • If the statement provides the @EventRepresentation(map) annotation the runtime expects Map objects as events.

  • If neither annotation is provided, the runtime uses the configured default event representation as discussed in Section 17.4.9.1, “Default Event Representation”.

The following two statements both instructs the compiler to represent Foo events as object arrays. When sending Foo events into the runtime use the sendEventObjectArray(Object[] data, String typeName) footprint.

create objectarray schema Foo as (prop1 string)
@EventRepresentation(objectarray) create schema Foo as (prop1 string)

The next two statements both instructs the compiler to represent Foo events as Maps. When sending Foo events into the runtime use the sendEventMap(Map data, String typeName) footprint.

create map schema Foo as (prop1 string)
@EventRepresentation(map) create schema Foo as (prop1 string)

The following two statements both instructs the compiler to represent Foo events as JSON. When sending Foo events into the runtime use the sendEventJson(String json, String typeName) footprint.

create json schema Foo as (prop1 string)
@EventRepresentation(json) create schema Foo as (prop1 string)

The following two statements both instructs the compiler to represent Foo events as Avro GenericData.Record. When sending Foo events into the runtime use the sendEventAvro(Object genericDataDotRecord, String typeName) footprint.

create avro schema Foo as (prop1 string)
@EventRepresentation(avro) create schema Foo as (prop1 string)

A variant stream is a predefined stream into which events of multiple disparate event types can be inserted. Please see Section 5.10.3, “Merging Disparate Types of Events: Variant Streams” for rules regarding property visibility and additional information.

The synopsis is:

create variant schema schema_name [as] eventtype_name|* [, eventtype_name|*] [,...]

Provide the variant keyword to declare a variant stream.

The '*' wildcard character declares a variant stream that accepts any type of event inserted into the variant stream.

Provide eventtype_name if the variant stream should hold events of the given type only. When using insert into to insert into the variant stream the compiler checks to ensure the inserted event type or its supertypes match the required event type.

A few examples are shown below:

// Create a variant stream that accepts only LoginEvent and LogoutEvent event types
create variant schema SecurityVariant as LoginEvent, LogoutEvent

// Create a variant stream that accepts any event type
create variant schema AnyEvent as *

EPL offers a convenient syntax to splitting, routing or duplicating events into multiple streams, and for receiving unmatched events among a set of filter criteria.

For splitting a single event that acts as a container and expose child events as a property of itself consider the contained-event syntax as described in Section 5.19, “Contained-Event Selection”. For generating marker events for contained-events please see below.

You may define a triggering event or pattern in the on-part of the statement followed by multiple insert into, select and where clauses.

The synopsis is:

[context context_name]
on event_type[(filter_criteria)] [as stream_name]
insert into insert_into_def select select_list [where condition]
[insert into insert_into_def select select_list [from contained-event-selection] [where condition]]
[insert into insert_into_def select select_list [from contained-event-selection] [where condition]]
[insert into...]
[output first | all]

The event_type is the name of the type of events that trigger the split stream. It is optionally followed by filter_criteria which are filter expressions to apply to arriving events. The optional as keyword can be used to assign a stream name. Patterns and named windows can also be specified in the on clause.

Following the on-clause is one or more insert into clauses as described in Section 5.10, “Merging Streams and Continuous Insertion: The Insert Into Clause” and select clauses as described in Section 5.3, “Choosing Event Properties and Events: The Select Clause”.

The second and subsequent insert into and select clause pair can have a from clause for contained-event-selection. This is useful when your trigger events themselves contain events that must be processed individually and that may be delimited by marker events that you can define.

Each select clause may be followed by a where clause containing a condition. If the condition is true for the event, the runtime transforms the event according to the select clause and inserts it into the corresponding stream.

At the end of the statement can be an optional output clause. By default the runtime inserts into the first stream for which the where clause condition matches if one was specified, starting from the top. If you specify the output all keywords, then the runtime inserts into each stream (not only the first stream) for which the where clause condition matches or that do not have a where clause.

If, for a given event, none of the where clause conditions match, the statement listener receives the unmatched event. The statement listener only receives unmatched events and does not receive any transformed or inserted events. The iterator method to the statement returns no events.

You may specify an optional context name to the effect that the split-stream operates according to the context dimensional information as declared for the context. See Chapter 4, Context and Context Partitions for more information.

In the below sample statement, the runtime inserts each OrderEvent into the LargeOrders stream if the order quantity is 100 or larger, or into the SmallOrders stream if the order quantity is smaller then 100:

on OrderEvent 
  insert into LargeOrders select * where orderQty >= 100
  insert into SmallOrders select *

The next example statement adds a new stream for medium-sized orders. The new stream receives orders that have an order quantity between 20 and 100:

on OrderEvent 
  insert into LargeOrders select orderId, customer where orderQty >= 100
  insert into MediumOrders select orderId, customer where orderQty between 20 and 100
  insert into SmallOrders select orderId, customer where orderQty > 0

As you may have noticed in the above statement, orders that have an order quantity of zero don't match any of the conditions. The runtime does not insert such order events into any stream and the listener to the statement receives these unmatched events.

By default the runtime inserts into the first insert into stream without a where clause or for which the where clause condition matches. To change the default behavior and insert into all matching streams instead (including those without a where clause), add the output all keywords.

The sample statement below shows the use of the output all keywords. The statement populates both the LargeOrders stream with large orders as well as the VIPCustomerOrders stream with orders for certain customers based on customer id:

on OrderEvent 
  insert into LargeOrders select * where orderQty >= 100
  insert into VIPCustomerOrders select * where customerId in (1001, 1002)
  output all

Since the output all keywords are present, the above statement inserts each order event into either both streams or only one stream or none of the streams, depending on order quantity and customer id of the order event. The statement delivers order events not inserted into any of the streams to the listeners and/or subscriber to the statement.

The following limitations apply to split-stream statements:

  1. Aggregation functions and the prev and prior operators are not available in conditions and the select-clause.

When a trigger event contains properties that are themselves events, or more generally when your application needs to split the trigger event into multiple events, or to generate marker events (begin, end etc.) or process contained events in a defined order, you may specify a from clause.

The from clause is only allowed for the second and subsequent insert into and select clause pair. It specifies how the trigger event should get unpacked into individual events and is based on the Section 5.19, “Contained-Event Selection”.

For example, assume there is an order event that contains order items:

create schema OrderItem(itemId string)
create schema OrderEvent(orderId string, items OrderItem[])

We can tell the runtime that, for each order event, it should process in the following order:

  1. Process a single OrderBeginEvent that holds just the order id.

  2. Process all order items contained in an order event.

  3. Process a single OrderEndEvent that holds just the order id.

The EPL is:

on OrderEvent
  insert into OrderBeginEvent select orderId
  insert into OrderItemEvent select * from [select orderId, * from items]
  insert into OrderEndEvent select orderId
  output all

When an OrderEvent comes in, the runtime first processes an OrderBeginEvent. The runtime unpacks the order event and for each order item processes an OrderItemEvent containing the respective item. The runtime last processes an OrderEndEvent.

Such begin and end marker events are useful to initiate and terminate an analysis using context declaration, for example. The next two statements declare a context and perform a simple count of order items per order:

create context OrderContext 
  initiated by OrderBeginEvent as obe
  terminated by OrderEndEvent(orderId = obe.orderId)
context OrderContext select count(*) as orderItemCount from OrderItemEvent output when terminated

A variable is a scalar, object, event or set of aggregation values that is available for use in all statements including patterns. Variables can be used in an expression anywhere in a statement as well as in the output clause for output rate limiting.

Variables must first be declared or configured before use, by defining each variable's type and name. Variables can be created via the create variable syntax or declared by runtime or static configuration. Variables can be assigned new values by using the on set syntax or via the setVariableValue methods on EPVariableService. The EPVariableService also provides method to read variable values.

A variable can be declared constant. A constant variable always has the initial value and cannot be assigned a new value. A constant variable can be used like any other variable and can be used wherever a constant is required. By declaring a variable constant you enable the runtime to optimize and perform query planning knowing that the variable value cannot change.

When declaring a class-typed, event-typed or aggregation-typed variable you may read or set individual properties within the same variable.

The runtime guarantees consistency and atomicity of variable reads and writes on the level of context partition (this is a soft guarantee, see below). Variables are optimized for fast read access and are also multithread-safe.

When you associate a context to the variable then each context partition maintains its own variable value. See Section 4.8, “Context and Variables” for more information.

Your application can only undeploy the statement that created the variable after all statements using the variables are also undeployed.

The create variable syntax creates a new variable by defining the variable type and name. In alternative to the syntax, variables can also be declared in the configuration object.

The synopsis for creating a variable is as follows:

create [constant] variable variable_type [[]] variable_name [ = assignment_expression ]

Specify the optional constant keyword when the variable is a constant whose associated value cannot be altered. Your EPL design should prefer constant variables over non-constant variables.

The variable_type can be any of the following:

variable_type
	:  string
	|  char 
	|  character
	|  bool 
	|  boolean
	|  byte
	|  short 
	|  int 
	|  integer 
	|  long 
	|  double
	|  float
	|  object
	|  enum_class
	|  class_name
	|  event_type_name

Variable types can accept null values. The object type is for an untyped variable that can be assigned any value. You can provide a class name (use imports) or a fully-qualified class name to declare a variable of that Java class type including an enumeration class. You can also supply the name of an event type to declare a variable that holds an event of that type.

Append [] to the variable type to declare an array variable. A limitation is that if your variable type is an event type then array is not allowed (applies to variables only and not to named windows or tables). For arrays of primitives, specify [primitive], for example int[primitive].

The variable_name is an identifier that names the variable. The variable name should not already be in use by another variable.

The assignment_expression is optional. Without an assignment expression the initial value for the variable is null. If present, it supplies the initial value for the variable.

The EPStatement object of the create variable statement provides access to variable values. The pull API methods iterator and safeIterator return the current variable value. Listeners to the create variable statement subscribe to changes in variable value: the runtime posts new and old value of the variable to all listeners when the variable value is updated by an on set statement.

The example below creates a variable that provides a threshold value. The name of the variable is var_threshold and its type is long. The variable's initial value is null as no other value has been assigned:

create variable long var_threshold

This statement creates an integer-type variable named var_output_rate and initializes it to the value ten (10):

create variable integer var_output_rate = 10

The next statement declares a constant string-type variable:

create constant variable string const_filter_symbol = 'GE'

In addition to creating a variable via the create variable syntax, the configuration also allows adding variables. The next code snippet illustrates the use of the configuration API to declare a string-typed variable:

Configuration configuration = new Configuration();
configuration.getCommon()..addVariable("myVar", String.class, "init value");

The following example declares a constant that is an array of string:

create constant variable string[] const_filters = {'GE', 'MSFT'}

The next example declares a constant that is an array of enumeration values. It assumes the Color enumeration class was imported:

create constant variable Color[] const_colors = {Color.RED, Color.BLUE}

For an array of primitive-type bytes, specify the primitive keyword in square brackets, as the next example shows:

create variable byte[primitive] mybytes = SomeClass.getBytes()

Use the new keyword to initialize object instances (the example assumes the package or class was imported):

create constant variable AtomicInteger cnt = new AtomicInteger(1)

The runtime removes the variable if the deployment that created the variable is undeployed.

The on set statement assigns a new value to one or more variables when a triggering event arrives or a triggering pattern occurs. Use the setVariableValue methods on EPVariableService to assign variable values programmatically.

The synopsis for setting variable values is:

on event_type[(filter_criteria)] [as stream_name]
  set variable_name = expression [, variable_name = expression [,...]]

The event_type is the name of the type of events that trigger the variable assignments. It is optionally followed by filter_criteria which are filter expressions to apply to arriving events. The optional as keyword can be used to assign an stream name. Patterns and named windows can also be specified in the on clause.

The comma-separated list of variable names and expressions set the value of one or more variables. Subqueries may by part of expressions however aggregation functions and the prev or prior function may not be used in expressions.

All new variable values are applied atomically: the changes to variable values by the on set statement become visible to other statements all at the same time. No changes are visible to other processing threads until the on set statement completed processing, and at that time all changes become visible at once.

The EPStatement object provides access to variable values. The pull API methods iterator and safeIterator return the current variable values for each of the variables set by the statement. Listeners to the statement subscribe to changes in variable values: the runtime posts new variable values of all variables to any listeners.

In the following example, a variable by name var_output_rate has been declared previously. When a NewOutputRateEvent event arrives, the variable is updated to a new value supplied by the event property 'rate':

on NewOutputRateEvent set var_output_rate = rate

The next example shows two variables that are updated when a ThresholdUpdateEvent arrives:

on ThresholdUpdateEvent as t 
  set var_threshold_lower = t.lower,
      var_threshold_higher = t.higher

The sample statement shown next counts the number of pattern matches using a variable. The pattern looks for OrderEvent events that are followed by CancelEvent events for the same order id within 10 seconds of the OrderEvent:

on pattern[every a=OrderEvent -> (CancelEvent(orderId=a.orderId) where timer:within(10 sec))]
  set var_counter = var_counter + 1

A variable name can be used in any expression and can also occur in an output rate limiting clause. This section presents examples and discusses performance, consistency and atomicity attributes of variables.

The next statement assumes that a variable named 'var_threshold' was created to hold a total price threshold value. The statement outputs an event when the total price for a symbol is greater then the current threshold value:

select symbol, sum(price) from TickEvent 
group by symbol 
having sum(price) > var_threshold

This example uses a variable to dynamically change the output rate on-the-fly. The variable 'var_output_rate' holds the current rate at which the statement posts a current count to listeners:

select count(*) from TickEvent output every var_output_rate seconds

Variables are optimized towards high read frequency and lower write frequency. Variable reads do not incur locking overhead (99% of the time) while variable writes do incur locking overhead.

The runtime softly guarantees consistency and atomicity of variables when your statement executes in response to an event or timer invocation. Variables acquire a stable value (implemented by versioning) when your statement starts executing in response to an event or timer invocation, and variables do not change value during execution. When one or more variable values are updated via on set statements, the changes to all updated variables become visible to statements as one unit and only when the on set statement completes successfully.

The atomicity and consistency guarantee is a soft guarantee. If any of your application statements, in response to an event or timer invocation, execute for a time interval longer then 15 seconds (default interval length), then the runtime may use current variable values after 15 seconds passed, rather then then-current variable values at the time the statement started executing in response to an event or timer invocation.

The length of the time interval that variable values are held stable for the duration of execution of a given statement is by default 15 seconds, but can be configured via runtime settings.

The create variable syntax and the API accept a fully-qualified class name or alternatively the name of an event type. This is useful when you want a single variable to have multiple property values to read or set.

The next statement assumes that the event type PageHitEvent is declared:

create variable PageHitEvent varPageHitZero

These example statements show two ways of assigning to the variable:

// You may assign the complete event
on PageHitEvent(ip='0.0.0.0') pagehit set varPageHitZero = pagehit
// Or assign individual properties of the event
on PageHitEvent(ip='0.0.0.0') pagehit set varPageHitZero.userId = pagehit.userId

Similarly statements may use properties of class or event-type variables as this example shows:

select * from FirewallEvent(userId=varPageHitZero.userId)

Instance method can also be invoked:

create variable com.example.StateCheckerService stateChecker
select * from TestEvent as e where stateChecker.checkState(e)

A variable that represents a service for calling instance methods could be initialized by calling a factory method. This example assumes the classes were added to imports:

create constant variable StateCheckerService stateChecker = StateCheckerServiceFactory.makeService()

You can add a variable via the configuration API; an example code snippet is next:

Configuration configuration = new Configuration();
configuration.getCommon().addVariable("stateChecker", StateCheckerService.class, StateCheckerServiceFactory.makeService(), true);

Application objects can also be passed via transient configuration information as described in Section 17.7, “Passing Services or Transient Objects”.

Note

When using non-constant class or event-type variables and when your EPL intends to set property values on the variable itself (i.e. set varPageHitZero.userId), please note the following requirements. In order for the runtime to assign property values, the underlying event type must allow writing property values. If using JavaBean event classes the class must have setter methods and a default constructor. The underlying event type must also be copy-able i.e. implement Serializable or configure a copy method (only for non-constant variables and when setting property values).

Your application can declare an expression or script using the create expression clause. Such expressions or scripts become available globally to any statement.

The synopsis of the create expression syntax is:

create expression expression_or_script

Use the create expression keywords and append the expression or scripts.

Expression aliases are the simplest means of sharing expressions and do not accept parameters. Expression declarations limit the expression scope to the parameters that are passed.

The runtime may cache declared expression result values and reuse cache values, see Section 17.6.10.5, “Declared Expression Value Cache Size”.

The syntax and additional examples for declaring an expression is outlined in Section 5.2.8, “Expression Alias”, which discusses expression aliases that are visible within the same statement i.e. visible locally only.

When using the create expression syntax to declare an expression the runtime remembers the expression alias and expression and allows the alias to be referenced in all other statements.

The below EPL declares a globally visible expression alias for an expression that computes the total of the mid-price which is the buy and sell price divided by two:

create expression totalMidPrice alias for { sum((buy + sell) / 2) }

The next EPL returns mid-price for events for which the mid-price per symbol stays below 10:

select symbol, midPrice from MarketDataEvent group by symbol having midPrice < 10

The expression name must be unique among all other expression aliases and expression declarations.

Your application can provide an expression alias of the same name local to a given statement as well as globally using create expression. The locally-provided expression alias overrides the global expression alias.

The compiler validates global expression aliases at the time your application creates a statement that references the alias. When a statement references a global alias, the compiler uses the that statement's local expression scope to validate the expression. Expression aliases can therefore be dynamically typed and type information does not need to be the same for all statements that reference the expression alias.

The syntax and additional examples for declaring an expression is outlined in Section 5.2.9, “Expression Declaration”, which discusses declaring expressions that are visible within the same statement i.e. visible locally only.

When using the create expression syntax to declare an expression the compiler remembers the expression and allows the expression to be referenced in all other statements.

The below EPL declares a globally visible expression that computes a mid-price and that requires a single parameter:

create expression midPrice { in => (buy + sell) / 2 }

The next EPL returns mid-price for each event:

select midPrice(md) from MarketDataEvent as md

The expression name must be unique for global expressions. It is not possible to declare the same global expression twice with the same name.

Your application can declare an expression of the same name local to a given statement as well as globally using create expression. The locally-declared expression overrides the globally declared expression.

The compiler validates globally declared expressions at the time your application creates a statement that references the global expression. When a statement references a global expression, the compiler uses that statement's type information to validate the global expressions. Global expressions can therefore be dynamically typed and type information does not need to be the same for all statements that reference the global expression.

This example shows a sequence of EPL, that can be created in the order shown, and that demonstrates expression validation at time of referral:

create expression minPrice {(select min(price) from OrderWindow)}
create window OrderWindow#time(30) as OrderEvent
insert into OrderWindow select * from OrderEvent
// Validates and incorporates the declared global expression
select minPrice() as minprice from MarketData

The syntax and additional examples for declaring scripts is outlined in Chapter 18, Script Support, which discusses declaring scripts that are visible within the same statement i.e. visible locally only.

When using the create expression syntax to declare a script the compiler remembers the script and allows the script to be referenced in all other statements.

The below EPL declares a globally visible script in the JavaScript dialect that computes a mid-price:

create expression midPrice(buy, sell) [ (buy + sell) / 2 ]

The next EPL returns mid-price for each event:

select midPrice(buy, sell) from MarketDataEvent

The compiler validates globally declared scripts at the time your application creates a statement that references the global script. When a statement references a global script, the compiler uses that statement's type information to determine parameter types. Global scripts can therefore be dynamically typed and type information does not need to be the same for all statements that reference the global script.

The script name in combination with the number of parameters must be unique for global scripts. It is not possible to declare the same global script twice with the same name and number of parameters.

Your application can declare a script of the same name and number of parameters that is local to a given statement as well as globally using create expression. The locally-declared script overrides the globally declared script.

Contained-event selection is for use when an event contains properties that are themselves events, or more generally when your application needs to split an event into multiple events. One example is when application events are coarse-grained structures and you need to perform bulk operations on the rows of the property graph in an event.

Use the contained-event selection syntax in a filter expression such as in a pattern, from clause, subselect, on-select and on-delete. This section provides the synopsis and examples.

To review, in the from clause a contained_selection may appear after the event stream name and filter criteria, and before any data windows.

The synopsis for contained_selection is as follows:

[select select_expressions from] 
  contained_expression [@type(eventtype_name)] [as alias_name]
  [where filter_expression]

The select clause and select_expressions are optional and may be used to select specific properties of contained events.

The contained_expression is required and returns individual events. The expression can, for example, be an event property name that returns an event fragment, i.e. a property that can itself be represented as an event by the underlying event representation. The expression can also be any other expression such as a single-row function or a script that returns either an array or a java.util.Collection of events. Simple values such as integer or string are not fragments but can be used as well as described below.

Provide the @type(name) annotation after the contained expression to name the event type of events returned by the expression. The annotation is optional and not needed when the contained-expression is an event property that returns a class or other event fragment.

The alias_name can be provided to assign a name to the expression result value rows.

The where clause and filter_expression is optional and may be used to filter out properties.

As an example event, consider a media order. A media order consists of order items as well as product descriptions. A media order event can be represented as an object graph (POJO event representation), or a structure of nested Maps (Map event representation) or a XML document (XML DOM or Axiom event representation) or other custom plug-in event representation.

To illustrate, a sample media order event in XML event representation is shown below. Also, a XML event type can optionally be strongly-typed with an explicit XML XSD schema that is not shown here. Note that Map and POJO representation can be considered equivalent for the purpose of this example.

Let us now assume the event type MediaOrder as being represented by the root node <mediaorder> of such XML snip:

<mediaorder>
  <orderId>PO200901</orderId>
  <items>
    <item>
      <itemId>100001</itemId>
      <productId>B001</productId>
      <amount>10</amount>
      <price>11.95</price>
    </item>
  </items>
  <books>
    <book>
      <bookId>B001</bookId>
      <author>Heinlein</author>
      <review>
        <reviewId>1</reviewId>
        <comment>best book ever</comment>
      </review>
    </book>
    <book>
      <bookId>B002</bookId>
      <author>Isaac Asimov</author>
    </book>
  </books>
</mediaorder>

The next statement utilizes the contained-event selection syntax to return each book:

select * from MediaOrder[books.book]

The result of the above statement is one event per book. Output events contain only the book properties and not any of the mediaorder-level properties.

Note that, when using listeners, the runtime delivers multiple results in one invocation of each listener. Therefore listeners to the above statement can expect a single invocation passing all book events within one media order event as an array.

To better illustrate the position of the contained-event selection syntax in a statement, consider the next two statements:

select * from MediaOrder(orderId='PO200901')[books.book]

The above statement the returns each book only for media orders with a given order id. This statement illustrates a contained-event selection and a data window:

select count(*) from MediaOrder[books.book]#unique(bookId)

The sample above counts each book unique by book id.

Contained-event selection can be staggered. When staggering multiple contained-event selections the staggered contained-event selection is relative to its parent.

This example demonstrates staggering contained-event selections by selecting each review of each book:

select * from MediaOrder[books.book][review]

Listeners to the statement above receive a row for each review of each book. Output events contain only the review properties and not the book or media order properties.

The following is not valid:

// not valid
select * from MediaOrder[books.book.review]

The book property in an indexed property (an array or collection) and thereby requires an index in order to determine which book to use. The expression books.book[1].review is valid and means all reviews of the second (index 1) book.

The contained-event selection syntax is part of the filter expression and may therefore occur in patterns and anywhere a filter expression is valid.

A pattern example is below. The example assumes that a Cancel event type has been defined that also has an orderId property:

select * from pattern [c=Cancel -> books=MediaOrder(orderId = c.orderId)[books.book] ]

When used in a pattern, a filter with a contained-event selection returns an array of events, similar to the match-until clause in patterns. The above statement returns, in the books property, an array of book events.

The optional select clause provides control over which fields are available in output events. The expressions in the select-clause apply only to the properties available underneath the property in the from clause, and the properties of the enclosing event.

When no select is specified, only the properties underneath the selected property are available in output events.

In summary, the select clause may contain:

The next statement's select clause selects each review for each book, and the order id as well as the book id of each book:

select * from MediaOrder[select orderId, bookId from books.book][select * from review]
// ... equivalent to ...
select * from MediaOrder[select orderId, bookId from books.book][review]]

Listeners to the statement above receive an event for each review of each book. Each output event has all properties of the review row, and in addition the bookId of each book and the orderId of the order. Thus bookId and orderId are found in each result event, duplicated when there are multiple reviews per book and order.

The above statement uses wildcard (*) to select all properties from reviews. As has been discussed as part of the select clause, the wildcard (*) and property_alias.* do not copy properties for performance reasons. The wildcard syntax instead specifies the underlying type, and additional properties are added onto that underlying type if required. Only one wildcard (*) and property_alias.* (unless used with a column rename) may therefore occur in the select clause list of expressions.

All the following statements produce an output event for each review of each book. The next sample statements illustrate the options available to control the fields of output events.

The output events produced by the next statement have all properties of each review and no other properties available:

select * from MediaOrder[books.book][review]

The following statement is not a valid statement, since the order id and book id are not part of the contained-event selection:

// Invalid select-clause: orderId and bookId not produced.
select orderId, bookId from MediaOrder[books.book][review]

This statement is valid. Note that output events carry only the orderId and bookId properties and no other data:

select orderId, bookId from MediaOrder[books.book][select orderId, bookId from review]
//... equivalent to ...
select * from MediaOrder[select orderId, bookId from books.book][review]

This variation produces output events that have all properties of each book and only reviewId and comment for each review:

select * from MediaOrder[select * from books.book][select reviewId, comment from review]
// ... equivalent to ...
select * from MediaOrder[books.book as book][select book.*, reviewId, comment from review]

The output events of the next EPL have all properties of the order and only bookId and reviewId for each review:

select * from MediaOrder[books.book as book]
    [select mediaOrder.*, bookId, reviewId from review] as mediaOrder

This EPL produces output events with 3 columns: a column named mediaOrder that is the order itself, a column named book for each book and a column named review that holds each review:

insert into ReviewStream select * from MediaOrder[books.book as book]
  [select mo.* as mediaOrder, book.* as book, review.* as review from review as review] as mo
// .. and a sample consumer of ReviewStream...
select mediaOrder.orderId, book.bookId, review.reviewId from ReviewStream

Please note these limitations:

This section discusses contained-event selection in joins.

When joining within the same event it is not required to specify a data window. Recall, in a join or outer join there must be a data window specified that defines the subset of events available to be joined. For self-joins, no data window is required and the join executes against the data returned by the same event.

This statement inner-joins items to books where book id matches the product id:

select book.bookId, item.itemId 
from MediaOrder[books.book] as book, 
      MediaOrder[items.item] as item 
where productId = bookId

Statement results for the above statement when sending the media order event as shown earlier are:

book.bookIditem.itemId
B001100001

The next example statement is a left outer join. It returns all books and their items, and for books without item it returns the book and a null value:

select book.bookId, item.itemId 
from MediaOrder[books.book] as book 
  left outer join 
    MediaOrder[items.item] as item 
  on productId = bookId

Statement results for the above statement when sending the media order event as shown earlier are:

book.bookIditem.itemId
B001100001
B002null

A full outer join combines the results of both left and right outer joins. The joined table will contain all records from both tables, and fill in null values for missing matches on either side.

This example statement is a full outer join, returning all books as well as all items, and filling in null values for book id or item id if no match is found:

select orderId, book.bookId,item.itemId 
from MediaOrder[books.book] as book 
  full outer join 
     MediaOrder[select orderId, * from items.item] as item 
  on productId = bookId 
order by bookId, item.itemId asc

As in all other statements, aggregation results are cumulative from the time the statement was created.

The following statement counts the cumulative number of items in which the product id matches a book id:

select count(*) 
from MediaOrder[books.book] as book, 
      MediaOrder[items.item] as item 
where productId = bookId

The unidirectional keyword in a join indicates to the runtime that aggregation state is not cumulative. The next statement counts the number of items in which the product id matches a book id for each event:

select count(*) 
from MediaOrder[books.book] as book unidirectional, 
      MediaOrder[items.item] as item 
where productId = bookId

The examples herein are not based on the POJO events of the prior example. They are meant to demonstrate different types of contained-event expressions and the use of @type(type_name) to identify the event type of the return values of the contained-event expression.

The example first defines a few sample event types:

create schema SentenceEvent(sentence String)
create schema WordEvent(word String)
create schema CharacterEvent(char String)

The following EPL assumes that your application defined a plug-in single-row function by name splitSentence that returns an array of Map, producting output events that are WordEvent events:

insert into WordStream select * from SentenceEvent[splitSentence(sentence)@type(WordEvent)]

The example EPL shown next invokes a JavaScript function which returns some events of type WordEvent:

expression Collection js:splitSentenceJS(sentence) [
  var CollectionsClazz = Java.type('java.util.Collections');
  var words = new java.util.ArrayList();
  words.add(CollectionsClazz.singletonMap('word', 'wordOne'));
  words.add(CollectionsClazz.singletonMap('word', 'wordTwo'));
  words;
]
select * from SentenceEvent[splitSentenceJS(sentence)@type(WordEvent)]

In the next example the sentence event first gets split into words and then each word event gets split into character events via an additional splitWord single-row function, producing events of type CharacterEvent:

select * from SentenceEvent
  [splitSentence(sentence)@type(WordEvent)]
  [splitWord(word)@type(CharacterEvent)]

The update istream statement allows declarative modification of event properties of events entering a stream. Update is a pre-processing step to each new event, modifying an event before the event applies to any statements.

The synopsis of update istream is as follows:

update istream event_type [as stream_name]
  set property_name = set_expression [, property_name = set_expression] [,...]
  [where where_expression]

The event_type is the name of the type of events that the update applies to. The optional as keyword can be used to assign a name to the event type for use with subqueries, for example. Following the set keyword is a comma-separated list of property names and expressions that provide the event properties to change and values to set.

The optional where clause and expression can be used to filter out events to which to apply updates.

Listeners to an update statement receive the updated event in the insert stream (new data) and the event prior to the update in the remove stream (old data). Note that if there are multiple update statements that all apply to the same event then the runtime will ensure that the output events delivered to listeners or subscribers are consistent with the then-current updated properties of the event (if necessary making event copies, as described below, in the case that listeners are attached to update statements). Iterating over an update statement returns no events.

As an example, the below statement assumes an AlertEvent event type that has properties named severity and reason:

update istream AlertEvent 
  set severity = 'High'
  where severity = 'Medium' and reason like '%withdrawal limit%'

The statement above changes the value of the severity property to "High" for AlertEvent events that have a medium severity and contain a specific reason text.

Update statements apply the changes to event properties before other statements receive the event(s) for processing, e.g. "select * from AlertEvent" receives the updated AlertEvent. This is true regardless of the order in which your application creates statements.

When multiple update statements apply to the same event, the runtime executes updates in the order in which update statements were deployed. We recommend the @Priority EPL annotation to define a deterministic order of processing updates, especially in the case where update statements get deployed and undeployed dynamically or multiple update statements update the same fields. The update statement with the highest @Priority value applies last.

The update clause can be used on streams populated via insert into, as this example utilizing a pattern demonstrates:

insert into DoubleWithdrawalStream 
select a.id, b.id, a.account as account, 0 as minimum 
from pattern [a=Withdrawal -> b=Withdrawal(id = a.id)]
update istream DoubleWithdrawalStream set minimum = 1000 where account in (10002, 10003)

When using update istream with named windows, any changes to event properties apply before an event enters the named window. The update istream is not available for tables.

Consider the next example (shown here with statement names in @Name EPL annotation, multiple statements):

@Name("CreateWindow") create window MyWindow#time(30 sec) as AlertEvent

@Name("UpdateStream") update istream MyWindow set severity = 'Low' where reason = '%out of paper%'

@Name("InsertWindow") insert into MyWindow select * from AlertEvent

@Name("SelectWindow") select * from MyWindow

The UpdateStream statement specifies an update clause that applies to all events entering the named window. Note that update does not apply to events already in the named window at the time an application creates the UpdateStream statement, it only applies to new events entering the named window (after an application created the update statement).

Therefore, in the above example listeners to the SelectWindow statement as well as the CreateWindow statement receive the updated event, while listeners to the InsertWindow statement receive the original AlertEvent event (and not the updated event).

Subqueries can also be used in all expressions including the optional where clause.

This example demonstrates a correlated subquery in an assignment expression and also demonstrates the optional as keyword. It assigns the phone property of an AlertEvent event a new value based on the lookup within all unique PhoneEvent events (according to an empid property) correlating the AlertEvent property reporter with the empid property of PhoneEvent:

update istream AlertEvent as ae
  set phone = 
    (select phone from PhoneEvent#unique(empid) where empid = ae.reporter)

When updating indexed properties use the syntax propertyName[index] = value with the index value being an integer number. When updating mapped properties use the syntax propertyName(key) = value with the key being a string value.

When using update, please note these limitations:

The runtime delivers all result events of a given statement to the statement's listeners and subscriber (if any) in a single invocation of each listener and subscriber's update method passing an array of result events. For example, a statement using a time-batch window may provide many result events after a time period passes, a pattern may provide multiple matching events or in a join the join cardinality could be multiple rows.

For statements that typically post multiple result events to listeners the for keyword controls the number of invocations of the runtime to listeners and subscribers and the subset of all result events delivered by each invocation. This can be useful when your application listener or subscriber code expects multiple invocations or expects that invocations only receive events that belong together by some additional criteria.

The for keyword is a reserved keyword. It is followed by either the grouped_delivery keyword for grouped delivery or the discrete_delivery keyword for discrete delivery. The for clause is valid after any EPL select statement.

The synopsis for grouped delivery is as follows:

... for grouped_delivery (group_expression [, group_expression] [,...])

The group_expression expression list provides one or more expressions to apply to result events. The runtime invokes listeners and subscribers once for each distinct set of values returned by group_expression expressions passing only the events for that group. Further detail on key expressions can be found at Section 5.2.12, “Composite Keys and Array Values as Keys”.

The synopsis for discrete delivery is as follows:

... for discrete_delivery

With discrete delivery the runtime invokes listeners and subscribers once for each result event passing a single result event in each invocation.

Consider the following example without for-clause. The time batch data window collects RFIDEvent events for 10 seconds and posts an array of result events:

select * from RFIDEvent#time_batch(10 sec)

Let's consider an example event sequence as follows:


Without for-clause and after the 10-second time period passes, the runtime delivers an array of 3 events in a single invocation to listeners and the subscriber.

The next example specifies the for-clause and grouped delivery by zone:

select * from RFIDEvent#time_batch(10 sec) for grouped_delivery (zone)

With grouped delivery and after the 10-second time period passes, the above statement delivers result events in two invocations to listeners and the subscriber: The first invocation delivers an array of two events that contains zone A events with id 1 and 3. The second invocation delivers an array of 1 event that contains a zone B event with id 2.

The next example specifies the for-clause and discrete delivery:

select * from RFIDEvent#time_batch(10 sec) for discrete_delivery

With discrete delivery and after the 10-second time period passes, the above statement delivers result events in three invocations to listeners and the subscriber: The first invocation delivers an array of 1 event that contains the event with id 1, the second invocation delivers an array of 1 event that contains the event with id 2 and the third invocation delivers an array of 1 event that contains the event with id 3.

Remove stream events are also delivered in multiple invocations, one for each group, if your statement selects remove stream events explicitly via irstream or rstream keywords.

The insert into for inserting events into a stream is not affected by the for-clause.

The delivery order respects the natural sort order or the explicit sort order as provided by the order by clause, if present.

The following are known limitations:

  1. The compiler validates group_expression expressions against the output event type, therefore all properties specified in group_expression expressions must occur in the select clause.