WSO2 Complex Event Processor is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.

This extension provides Natural Language Processing capabilities to Siddhi. Functions of the NLP extension are as follows.

Find Name Entity Type function

Syntax: <string> nlp:findNameEntityType(<string> entityType, <bool> groupSuccessiveMatch, <string> string-variable)
Extension Type: Function
Description

This function uses the following input parameters.

  • entityType: A user-specified string constant, e.g., PERSON, LOCATION, ORGANIZATION, MONEY, PERCENT, DATE, or TIME.
  • groupSuccessiveMatch: A user-specified boolean constant. If true, successive matches of the specified entityType in the text are grouped into a single result.
  • string-variable: A string or the stream attribute that contains the text.

This function returns the entities of the specified type found in the text. If groupSuccessiveMatch is true, the result aggregates successive words of the same entity type.

Example

findNameEntityType("PERSON",true,text)

In the above example, if the text attribute contains "Bill Gates donates £31million to fight Ebola", the result is Bill Gates. If groupSuccessiveMatch is set to false, two events are generated: one for Bill and one for Gates.
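
The effect of grouping successive matches can be illustrated outside Siddhi with a minimal Python sketch. It is not the extension's implementation: the entity tags in TAGGED are hard-coded assumptions standing in for the output of a real named-entity recognizer, and find_entities is a hypothetical helper name.

```python
from typing import List, Tuple

# Hypothetical (token, entity tag) pairs; a real NER model would produce these.
TAGGED: List[Tuple[str, str]] = [
    ("Bill", "PERSON"), ("Gates", "PERSON"), ("donates", "O"),
    ("£31million", "MONEY"), ("to", "O"), ("fight", "O"), ("Ebola", "O"),
]


def find_entities(tagged: List[Tuple[str, str]], entity_type: str,
                  group_successive: bool) -> List[str]:
    """Return tokens tagged entity_type, optionally merging adjacent matches."""
    results: List[str] = []
    previous_matched = False
    for token, tag in tagged:
        matched = tag == entity_type
        if matched and group_successive and previous_matched:
            # Extend the previous result instead of starting a new one.
            results[-1] += " " + token
        elif matched:
            results.append(token)
        previous_matched = matched
    return results


print(find_entities(TAGGED, "PERSON", True))   # ['Bill Gates']
print(find_entities(TAGGED, "PERSON", False))  # ['Bill', 'Gates']
```

Each element of the returned list corresponds to one generated event in the description above.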


Find Name Entity Type Via Dictionary function

Syntax: <string> nlp:findNameEntityTypeViaDictionary(<string> entityType, <string> dictionaryFilePath, <string> string-variable)
Extension Type: Function
Description

This function uses the following input parameters.

  • entityType: A user-specified string constant, e.g., PERSON, LOCATION, ORGANIZATION, MONEY, PERCENT, DATE, or TIME.
  • dictionaryFilePath: The path to the dictionary in which the function searches for the specified entries. The relevant entries for the entity types should be available in the dictionary as shown in the example below.
  • string-variable: A string or the stream attribute that contains the text.

This function returns the entities in the text that match the dictionary entries for the specified entity type.

Example

findNameEntityTypeViaDictionary("PERSON","dictionary.xml",text)

In the above example, if the text attribute contains "Bill Gates donates £31million to fight Ebola", and the dictionary consists of the above entries (i.e., the entries of the example in the Description), the result is "Bill".
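
The dictionary lookup can be pictured with the following Python sketch. It rests on several assumptions: the real dictionary is a file whose format is not reproduced here, so DICTIONARY is a hypothetical in-memory stand-in, its contents are invented for illustration, and find_entities_via_dictionary is not the extension's actual code.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for the dictionary file:
# entity type -> set of known entries.
DICTIONARY: Dict[str, Set[str]] = {
    "PERSON": {"Bill"},
    "LOCATION": {"Lagos"},
}


def find_entities_via_dictionary(entity_type: str,
                                 dictionary: Dict[str, Set[str]],
                                 text: str) -> List[str]:
    """Return the words of text that are listed under entity_type."""
    entries = dictionary.get(entity_type, set())
    return [word for word in text.split() if word in entries]


text = "Bill Gates donates £31million to fight Ebola"
print(find_entities_via_dictionary("PERSON", DICTIONARY, text))  # ['Bill']
```

Because only "Bill" is listed under PERSON in this hypothetical dictionary, "Gates" is not returned.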


Find Relationship By Verb function

Syntax: <string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByVerb(<string> verb, <string> string-variable)
Extension Type: Function
Description

findRelationshipByVerb takes a user-specified string constant as the verb, together with a text stream, and returns the whole text, the subject, the object and the verb for each relationship built on the specified verb. This information can be extracted only if the specified verb exists in the text stream. However, the tense of the verb does not have to match.

The input parameters used are as follows.

  • verb: A user-specified string constant.
  • string-variable: A string or the stream attribute that contains the text.

Example

findRelationshipByVerb("say", "Information just reaching us says another Liberian With Ebola Arrested At Lagos Airport") returns the following.

  • The whole text
  • "Information" as the subject
  • "Liberian" as the object
  • "says" as the verb


Find Relationship By Regex function

Syntax: <string> text, <string> subject, <string> object, <string> verb nlp:findRelationshipByRegex(<string> regex, <string> string-variable)
Extension Type: Function
Description

This function returns the whole text, subject, object and verb from the text stream that matches the named nodes of the Semgrex pattern.

Example

findRelationshipByRegex('{}=verb >/nsubj|agent/ {}=subject >/dobj/ {}=object', "gates foundation donates $50M in support of #Ebola relief") returns the following.

  • The whole text
  • "foundation" as the subject
  • "$" as the object
  • "donates" as the verb
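
What the pattern in this example asks for can be sketched in Python: a governing word that has both a subject-like dependent (nsubj or agent) and a direct object (dobj). This is only an illustration and does not use Semgrex; the dependency edges in EDGES are hard-coded assumptions about how a parser might analyze the sentence, and find_relationship is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical dependency edges (governor, relation, dependent) for
# "gates foundation donates $50M in support of #Ebola relief".
EDGES: List[Tuple[str, str, str]] = [
    ("foundation", "nn", "gates"),
    ("donates", "nsubj", "foundation"),
    ("donates", "dobj", "$"),
    ("$", "num", "50M"),
]


def find_relationship(
        edges: List[Tuple[str, str, str]]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, object, verb) for the first governor that has both
    an nsubj/agent dependent and a dobj dependent, or None if there is none."""
    governors: List[str] = []
    for governor, _, _ in edges:
        if governor not in governors:
            governors.append(governor)
    for verb in governors:
        subject = next((dep for gov, rel, dep in edges
                        if gov == verb and re.fullmatch(r"nsubj|agent", rel)),
                       None)
        obj = next((dep for gov, rel, dep in edges
                    if gov == verb and rel == "dobj"), None)
        if subject is not None and obj is not None:
            return subject, obj, verb
    return None


print(find_relationship(EDGES))  # ('foundation', '$', 'donates')
```

The object is "$" rather than "$50M" because, under the assumed parse, the currency symbol is the head of the phrase and the amount depends on it.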


Find Semgrex Pattern function

Syntax: <string> text, <string> match, <string> object, <string> verb nlp:findSemgrexPattern(<string> regex, <string> string-variable)
Extension Type: Function
Description

The findSemgrexPattern function returns the whole text, the match, and the words and relations bound to the named nodes and named relations of the Semgrex pattern, for each match found in the text stream.

This function uses the following input parameters.

  • regex: A user-specified regular expression that follows the Semgrex pattern syntax.
  • string-variable: A string or the stream attribute that contains the text.

Example

findSemgrexPattern('{lemma:die} >/.*subj|num.*/=reln {}=diedsubject', "Sierra Leone doctor dies of Ebola after failed evacuation.")

In this example, the function searches for words with the lemma die that govern any subject or numeric relation. The dependent is named diedsubject, and the relation is named reln. The query therefore returns an output stream that contains the full match of this expression, i.e., the governing word whose lemma is die, together with the value bound to each named node and relation for every match it finds.

The following is the list of elements in the output stream.

  • The whole text
  • "dies" as the match
  • "nsubj" as reln
  • "doctor" as diedsubject


Find Tokens Regex Pattern function

Syntax: <string> text, <string> match, <string> group_1, etc. nlp:findTokensRegexPattern(<string> regex, <string> string-variable)
Extension Type: Function
Description

findTokensRegexPattern returns the whole text, the full match, and the text captured by each group defined in the regular expression (group_1, group_2, etc.) for each part of the text stream that matches the pattern.

This function uses the following input parameters.

  • regex: A user-specified regular expression that follows the TokensRegex pattern syntax.
  • string-variable: A string or the stream attribute that contains the text.

Example

findTokensRegexPattern('([ner:/PERSON|ORGANIZATION|LOCATION/]+) (?:[]* [lemma:donate]) ([ner:MONEY]+)', text) defines three groups:

  • The first group looks for one or more successive words that are entities of type PERSON, ORGANIZATION, or LOCATION.
  • The middle group is a non-capturing group that matches any number of words followed by a word with the lemma donate.
  • The third group looks for one or more successive entities of type MONEY.

This function returns the following.

  • The whole text
  • "Paul Allen donates $ 9million" as the match
  • "Paul Allen" as group_1
  • "$ 9million" as group_2
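
The way capturing groups map to the match, group_1 and group_2 values can be mimicked in Python. This sketch makes strong assumptions: it does not use TokensRegex, the token annotations in TOKENS are hard-coded stand-ins for real NER and lemma output, and the pattern is re-expressed as an ordinary regular expression over one-letter codes per token.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical annotated tokens: (word, lemma, NER tag).
TOKENS: List[Tuple[str, str, str]] = [
    ("Paul", "Paul", "PERSON"),
    ("Allen", "Allen", "PERSON"),
    ("donates", "donate", "O"),
    ("$", "$", "MONEY"),
    ("9million", "9million", "MONEY"),
]


def encode(token: Tuple[str, str, str]) -> str:
    """Map a token to a one-letter code so that re can match token sequences."""
    _, lemma, ner = token
    if ner in ("PERSON", "ORGANIZATION", "LOCATION"):
        return "E"
    if ner == "MONEY":
        return "M"
    if lemma == "donate":
        return "D"
    return "x"


def find_tokens_pattern(
        tokens: List[Tuple[str, str, str]]) -> Optional[Dict[str, str]]:
    """Match (entities)(?:anything, then 'donate')(money) over the tokens."""
    codes = "".join(encode(t) for t in tokens)
    found = re.search(r"(E+)(?:.*D)(M+)", codes)
    if found is None:
        return None

    def words(span: Tuple[int, int]) -> str:
        # One code letter per token, so string offsets are token indexes.
        return " ".join(word for word, _, _ in tokens[span[0]:span[1]])

    return {
        "match": words(found.span(0)),
        "group_1": words(found.span(1)),
        "group_2": words(found.span(2)),
    }


print(find_tokens_pattern(TOKENS))
```

The non-capturing middle group contributes to the match but produces no group of its own, which is why only group_1 and group_2 appear in the result.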