.Rule()

A Rule object is a grammar rule that describes the syntax tree that will result from parsing some text with the rule. Rules are, in spirit, a JavaScript adaptation of EBNF notation.

Rules may be built from other rules and have an object structure that resembles the syntax tree that results when the rule's .parse() method is called.

Terminal sub-rules, which have no sub-rules of their own, match the input text against a lexicon or a regular expression pattern.

Constructor

ishml.Rule()

Returns a new ishml.Rule object instance. Use of the new operator is optional. Takes no arguments.

Properties

Enumerable properties are of type ishml.Rule. They are created with the .snip() method, which forms a tree structure of rules, mirroring the intended syntax tree resulting from parsing.

The following non-enumerable properties set the rule's behavior when its .parse() method is called. These properties may be set directly or with the .configure() method.

.caseSensitive Boolean

Applies to terminal rules. Defaults to false. Set to true for case-sensitive parsing.

.entire Boolean

If set to true, the rule must match the entire input text, with no remainder, to be considered a valid match. Defaults to false, which allows partial matching of the input text.

.filter function

Filters the array of definitions associated with the token(s) processed when the rule's .parse() method is called. Applies to terminal rules. Defaults to (definition)=>true. Return true from the filter function to keep a definition; return false to remove it from the definitions array of the token in the resulting interpretation. A token with no definitions left after filtering is considered a non-matching token for the rule.
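To make the keep/remove rule concrete, here is a minimal sketch of filtering applied to one token. It is not ishml's implementation: the token shape ({lexeme, definitions}) and the helper name applyFilter are hypothetical, chosen only to illustrate the behavior described above.

```javascript
// Hypothetical sketch, not ishml's source: prune a token's definitions
// with a filter function, as the .filter property is described to do.
function applyFilter(token, filter = (definition) => true) {
  // Keep only the definitions for which the filter returns true.
  const definitions = token.definitions.filter(filter);
  // A token with no definitions left no longer matches the rule (null here).
  return definitions.length > 0 ? { ...token, definitions } : null;
}

// A toy token with several possible parts of speech.
const token = {
  lexeme: "light",
  definitions: [{ part: "noun" }, { part: "verb" }, { part: "adjective" }]
};

const asNoun = applyFilter(token, (definition) => definition.part === "noun");
const asPreposition = applyFilter(token, (definition) => definition.part === "preposition");
```

Here asNoun keeps a single noun definition, while asPreposition is null because every definition was filtered out.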

.full Boolean

Applies to terminal rules. Defaults to false for partial matching. Set to true for full matching.

A partial match succeeds when the lexicon entry's entire lexeme matches the initial characters of the text to be parsed. The text may continue beyond the lexeme, but not the other way around: the lexeme may not extend beyond the text.

A full match requires all the characters of the text to be parsed to match the lexicon entry, with no characters left over.
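The difference between partial and full matching can be sketched with a toy lexicon. This is an illustration under stated assumptions, not ishml's lexicon API: the lexicon here is a plain array of {lexeme, definition} entries, matchTerminal is a hypothetical helper, and separators and word boundaries are ignored.

```javascript
// Hypothetical toy lexicon; ishml's real lexicon has its own API.
const lexicon = [
  { lexeme: "take", definition: { part: "verb" } },
  { lexeme: "take off", definition: { part: "verb" } },
  { lexeme: "lamp", definition: { part: "noun" } }
];

// Partial (full=false): the whole lexeme must match the start of the text;
// whatever follows is returned as the remainder.
// Full (full=true): the lexeme must match the entire text, nothing left over.
// This sketch ignores separators and word boundaries.
function matchTerminal(text, entries, { full = false, caseSensitive = false } = {}) {
  const source = caseSensitive ? text : text.toLowerCase();
  const results = [];
  for (const entry of entries) {
    const lexeme = caseSensitive ? entry.lexeme : entry.lexeme.toLowerCase();
    const matched = full ? source === lexeme : source.startsWith(lexeme);
    if (matched) {
      results.push({
        lexeme: entry.lexeme,
        definition: entry.definition,
        remainder: text.slice(lexeme.length)
      });
    }
  }
  return results;
}
```

With partial matching, "take off lamp" matches both "take" and "take off", leaving " off lamp" and " lamp" as remainders; with full matching it matches nothing.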

.greedy Boolean

Set to true to consider only the longest possible array of terms fitting the rule's criteria. Only applicable when minimum and maximum are set to different values and maximum is greater than one. Applies to both terminal and non-terminal rules. Defaults to false, which generates all possible interpretations between minimum and maximum, inclusive.

.keep Boolean

Includes the result of the rule's parsing in the final result of the parent rule. Applies to both terminal and non-terminal rules. Defaults to true. Set to false to require the rule to parse successfully while omitting its result.

.longest Boolean

Applies to terminal rules. Defaults to false. Set to true to return only the longest match from the lexicon. Only applicable when full is set to false for partial matching.
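A minimal sketch of what "only the longest match" means when several lexicon entries partially match at the same position. The match objects and the helper longestOnly are hypothetical; ishml's internal representation may differ. Ties are kept in this sketch.

```javascript
// Hypothetical sketch, not ishml's source: keep only the partial
// matches whose lexeme is the longest among the candidates.
function longestOnly(matches) {
  // Math.max(0) guards the empty case, which simply yields [].
  const maxLength = Math.max(0, ...matches.map((m) => m.lexeme.length));
  return matches.filter((m) => m.lexeme.length === maxLength);
}

// Two partial matches for the text "take off lamp".
const partialMatches = [
  { lexeme: "take", remainder: " off lamp" },
  { lexeme: "take off", remainder: " lamp" }
];

console.log(longestOnly(partialMatches).map((m) => m.lexeme)); // [ 'take off' ]
```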

.maximum Integer

Sets the maximum number of times to repeat the rule. Applies to both terminal and non-terminal rules. Defaults to 1. To allow an indefinite number of repetitions, set maximum to Infinity.

.minimum Integer

Sets the minimum number of times to repeat the rule.

Defaults to 1. Set minimum to 0 to make the rule optional.
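The interplay of minimum, maximum, and greedy can be sketched with a toy repetition helper. This is not ishml's algorithm: repeat and isAdjective are hypothetical, the input is a plain array of words, and each interpretation simply records the words consumed and the words remaining.

```javascript
// Hypothetical sketch, not ishml's source: enumerate the interpretations
// of a repeated rule under minimum, maximum and greedy settings.
function repeat(words, matches, { minimum = 1, maximum = 1, greedy = false } = {}) {
  // Count how many leading words satisfy the rule, capped at maximum.
  let count = 0;
  while (count < words.length && count < maximum && matches(words[count])) {
    count++;
  }
  // Too few repetitions: the rule does not match at all.
  if (count < minimum) return [];
  // Greedy keeps only the longest run; otherwise every length from
  // the longest down to minimum yields its own interpretation.
  const shortest = greedy ? count : minimum;
  const interpretations = [];
  for (let n = count; n >= shortest; n--) {
    interpretations.push({ matched: words.slice(0, n), remainder: words.slice(n) });
  }
  return interpretations;
}

const isAdjective = (word) => ["big", "red", "shiny"].includes(word);
const words = ["big", "red", "ball"];
```

With minimum 0 and maximum Infinity, the words "big red ball" yield three interpretations (two, one, or zero adjectives); adding greedy: true leaves only the one that consumes "big red".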

.mismatch function

Modifies the rule's generated interpretations, according to the custom function assigned, when the rule does not match the input text. Generally used to provide metadata about the reason for the rule's failure. Each interpretation is passed to the function, which returns a modified interpretation. Typically, the function sets the interpretation's .valid property to false and adds information to the interpretation's .gist property. See example below.

//Example: attach mismatch handlers to rules and sub-rules.
//Each handler records why the rule failed and marks the interpretation invalid.
nounPhrase.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of nounPhrase. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
nounPhrase.noun.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected noun. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
command.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected end of command. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}
command.verb.mismatch=(interpretation)=>
{
    interpretation.gist.error=true
    interpretation.gist.errorMessage=
        `Expected verb. Found: "${interpretation.remainder}".`
    interpretation.valid=false
    return interpretation
}

.mode all | any | apt

Sets the parsing mode for the sub-rules of a rule. Applies to non-terminal rules. Defaults to ishml.enum.all, which treats the sub-rules as a sequence, each of which must parse successfully for the parent rule to be considered successfully parsed. The syntax trees generated by the sub-rules are appended to the node generated by the parent rule.

ishml.enum.any treats each sub-rule as a choice. At least one sub-rule must parse successfully in order for the rule to parse successfully. If more than one choice parses successfully, multiple alternative interpretations are generated. The resulting sub-tree generated by the sub-rule has its root node removed and becomes the syntax tree generated by the parent rule.

ishml.enum.apt treats each sub-rule as a choice. At least one choice must parse successfully in order for the rule to parse successfully. Parsing of sub-rules stops after the first successful choice is parsed and only one interpretation is generated. The resulting sub-tree generated by the sub-rule has its root node removed and becomes the syntax tree generated by the parent rule.
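The three modes behave much like classic parser combinators for sequence, choice, and ordered choice. The sketch below is an analogy, not ishml's implementation: word, all, any, and apt are hypothetical toy functions that map an array of words to an array of {tree, remainder} interpretations.

```javascript
// Hypothetical toy parsers, not ishml's source.
// A terminal that matches one exact word.
const word = (w) => (words) =>
  words[0] === w ? [{ tree: w, remainder: words.slice(1) }] : [];

// all: every sub-parser must succeed in sequence; each child's tree
// is appended under the parent's node.
const all = (...parsers) => (words) => {
  let results = [{ tree: [], remainder: words }];
  for (const parser of parsers) {
    results = results.flatMap(({ tree, remainder }) =>
      parser(remainder).map((r) => ({ tree: [...tree, r.tree], remainder: r.remainder }))
    );
  }
  return results;
};

// any: every successful choice contributes its own interpretation,
// and the choice's tree stands in for the parent's tree.
const any = (...parsers) => (words) => parsers.flatMap((parser) => parser(words));

// apt: choices are tried in order and parsing stops at the first success.
const apt = (...parsers) => (words) => {
  for (const parser of parsers) {
    const results = parser(words);
    if (results.length > 0) return results;
  }
  return [];
};

const choices = [word("take"), all(word("take"), word("lamp"))];
```

For the input ["take", "lamp"], any(...choices) returns two interpretations, while apt(...choices) stops after the first choice and returns one.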

.regex RegExp

Applies to terminal rules. Defaults to false, which searches the lexicon. May be set to any regular expression, which causes the text to be parsed to be matched against the regular expression instead of searching the lexicon for definitions. The resulting token's definition is {fuzzy:true}.
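A minimal sketch of a regular-expression terminal, assuming (this is an assumption, not stated above) that the pattern must match at the very start of the remaining text. The helper matchRegex and the token shape are hypothetical; only the {fuzzy:true} definition comes from the description.

```javascript
// Hypothetical sketch, not ishml's source: match the text against a
// pattern instead of a lexicon and give the token {fuzzy: true}.
function matchRegex(text, regex) {
  const match = regex.exec(text);
  // Assumption: only a match anchored at the start of the text counts.
  if (!match || match.index !== 0) return null;
  return {
    lexeme: match[0],
    definitions: [{ fuzzy: true }],
    remainder: text.slice(match[0].length)
  };
}

const number = matchRegex("42 coins", /^\d+/);
```

Here number.lexeme is "42" and the remainder is " coins"; "coins 42" produces no token.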

.semantics function

Checks the rule's generated syntax tree for semantic correctness and optionally edits the syntax tree. Applies to non-terminal rules. Defaults to (interpretation)=>true, which accepts all interpretations as semantically correct. Returning false removes the interpretation from further consideration. Returning true allows the interpretation to continue processing. Alternatively, instead of returning true, you may alter the content of interpretation.gist and return the altered interpretation.
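The three possible outcomes of a semantics function (reject, accept, or replace) can be sketched as follows. The helper applySemantics and the candidate interpretations are hypothetical and only model the contract described above; they are not ishml's internals.

```javascript
// Hypothetical sketch, not ishml's source: apply a semantics function
// to candidate interpretations. true keeps, false drops, and a returned
// object replaces the original interpretation.
function applySemantics(interpretations, semantics = (interpretation) => true) {
  const kept = [];
  for (const interpretation of interpretations) {
    const result = semantics(interpretation);
    if (result === true) kept.push(interpretation);
    else if (result) kept.push(result); // an altered interpretation was returned
    // false discards the interpretation
  }
  return kept;
}

const candidates = [
  { gist: { verb: "take", noun: "lamp" } },
  { gist: { verb: "take", noun: "wall" } }
];

// Reject "take wall" as semantically nonsensical.
const sensible = applySemantics(candidates, (interpretation) => interpretation.gist.noun !== "wall");

// Return altered copies that add information to the gist.
const annotated = applySemantics(candidates, (interpretation) => ({
  ...interpretation,
  gist: { ...interpretation.gist, portable: interpretation.gist.noun === "lamp" }
}));
```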

.separator RegExp

Applies to partial matching in terminal rules. Defaults to /^\s+/, one or more whitespace characters. May be set to any regular expression. For no separator, set it to an empty string.
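One plausible reading of the separator, sketched below as an assumption rather than documented behavior: after a partial match, a separator must begin the remaining text (unless nothing remains), so that "take" does not match the start of "takeoff". The helper consumeSeparator is hypothetical.

```javascript
// Hypothetical sketch, not ishml's source: consume the separator that
// follows a partially matched lexeme. Returns the text after the
// separator, or null when a required separator is missing.
function consumeSeparator(remainder, separator = /^\s+/) {
  // Nothing left to parse, so no separator is needed.
  if (remainder === "") return "";
  // An empty-string separator means tokens may abut directly.
  if (separator === "") return remainder;
  const match = separator.exec(remainder);
  // Only a separator at the very start of the remainder counts.
  return match && match.index === 0 ? remainder.slice(match[0].length) : null;
}
```

For example, after matching "take" in "take  lamp", the remainder "  lamp" becomes "lamp"; after matching "take" in "takeoff", the remainder "off" has no separator, so the match is rejected.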

Methods

.clone()

Creates a deep copy of the rule.

.configure(options)

Configures the rule's behavior.

The options argument is a plain JavaScript object whose properties are the same as the non-enumerable properties of ishml.Rule.

Returns the rule. This method is chainable.
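The split between enumerable sub-rules and non-enumerable options can be illustrated with a toy class. ToyRule and defineOption are hypothetical and only mimic the documented behavior (options are non-enumerable, and .configure() returns the rule for chaining); they are not ishml's source.

```javascript
// Hypothetical helper: store an option as a non-enumerable property so
// that Object.keys() lists only sub-rules.
function defineOption(target, name, value) {
  Object.defineProperty(target, name, { value, writable: true, configurable: true, enumerable: false });
}

class ToyRule {
  constructor() {
    // A few of the documented defaults.
    defineOption(this, "minimum", 1);
    defineOption(this, "maximum", 1);
    defineOption(this, "greedy", false);
    defineOption(this, "keep", true);
  }
  // Copies each option onto the rule and returns the rule for chaining.
  configure(options) {
    for (const [name, value] of Object.entries(options)) {
      defineOption(this, name, value);
    }
    return this;
  }
}

const adjectives = new ToyRule().configure({ minimum: 0, maximum: Infinity });
```

After configuration, adjectives.minimum is 0 and adjectives.maximum is Infinity, while Object.keys(adjectives) is still empty because no sub-rules have been added.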

.parse(tokenization)

Parses a tokenization into one or more interpretations.

If the rule contains sub-rules, the parse method of each sub-rule is called recursively to build the syntax tree. If the rule has no sub-rules, it is a terminal rule, and the next token(s) in the tokenization are processed.

Returns an array of interpretations.

.snip(key [, rule])

Creates a sub-rule as an enumerable property of the rule.

The key argument is the name to be used for the sub-rule and may be a string or integer. If the sub-rule is to be accessed using dot notation, the requirements for dot notation must be observed when naming the key. For convenience, spaces are automatically converted to underscores.

The rule argument is the ishml.Rule instance to be assigned to the new property. Cloning the rule is recommended, for example command.snip("subject", nounPhrase.clone()), unless the rule is being defined recursively.

If rule is omitted, a new instance of ishml.Rule is used.

Returns the rule. This method is chainable.
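The way .snip() builds a tree of named sub-rules can be sketched with a toy class. SketchRule is hypothetical and mirrors only the documented behavior: spaces in keys become underscores, a new rule is used when none is given, and the parent rule is returned so that calls chain into siblings.

```javascript
// Hypothetical sketch, not ishml's source.
class SketchRule {
  snip(key, rule = new SketchRule()) {
    // Spaces become underscores so the key works with dot notation.
    const name = typeof key === "string" ? key.replace(/ /g, "_") : key;
    this[name] = rule; // an ordinary, enumerable property
    return this; // the parent rule, so chained calls add siblings
  }
}

const command = new SketchRule();
command.snip("verb").snip("noun phrase");
command.noun_phrase.snip("adjectives").snip("noun");
```

Because each call returns the parent, command ends up with the sub-rules verb and noun_phrase, and noun_phrase in turn holds adjectives and noun, mirroring the intended syntax tree.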

See also the parsing tutorial.