.Lexicon()

A Lexicon stores a collection of tokens.

Constructor

ishml.Lexicon()

Returns an instance of the Lexicon object. Use of the new operator is optional. Takes no arguments.

Used by ishml.Parser.analyze() and ishml.Rule.parse().

Methods

.register(lexeme...).as(definition)

Adds tokens to the lexicon.

The lexeme is the string of characters to be matched. More than one lexeme may be specified for the same definition. The same lexeme may have multiple definitions, but only one definition is permitted for each call of the register method.

The definition may be a simple value or a complex object. It is an arbitrary payload to be retrieved when the lexeme is matched. A definition typically holds one or more references to objects and functions defined elsewhere in the application.

Returns the instance of ishml.Lexicon. This method is chainable.
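The chaining described above can be illustrated with a small stand-in. This is only a sketch: TinyLexicon, its entries map, and the part/key properties in the definitions are hypothetical names invented for illustration and are not part of ishml. It mirrors the register(...).as(...) call shape and shows several lexemes sharing one definition, and one lexeme collecting two definitions over two calls.

```javascript
// Hypothetical stand-in, NOT the ishml implementation. It only mirrors
// the register(...).as(...) call shape described above.
class TinyLexicon {
  constructor() {
    this.entries = new Map(); // lexeme -> array of definitions
  }

  register(...lexemes) {
    const lexicon = this;
    return {
      as(definition) {
        for (const lexeme of lexemes) {
          if (!lexicon.entries.has(lexeme)) lexicon.entries.set(lexeme, []);
          lexicon.entries.get(lexeme).push(definition);
        }
        return lexicon; // returning the lexicon keeps the call chainable
      },
    };
  }
}

const lexicon = new TinyLexicon()
  .register("take", "grab").as({ part: "verb", key: "take" }) // two lexemes, one definition
  .register("ring").as({ part: "noun", key: "ring" })
  .register("ring").as({ part: "verb", key: "ring" }); // second definition, second call

const ringDefinitions = lexicon.entries.get("ring"); // holds both definitions
```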

.search(lexeme[, options])

Searches for full and partial matches in the lexicon.

Returns an array of search results. Each search result is a plain JavaScript object with a token property containing the matching token and a remainder property containing the remaining unmatched string of characters from the lexeme argument.

The lexeme argument is a string of characters to be matched against the entries in the lexicon.

The options argument is a plain JavaScript object whose properties, listed below, override the default behavior of search.

.caseSensitive Boolean

Defaults to false. Set to true for case-sensitive searches.

.full Boolean

Defaults to false for partial matching. Set to true for full matching.

A partial match occurs when a lexicon entry's entire lexeme matches the initial characters of the lexeme argument. The reverse does not count: a lexeme argument that matches only the beginning of a longer lexicon entry is not a match.

A full match matches all the characters in the lexeme argument against the lexicon entry, with no characters left over.
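The difference between the two modes can be sketched with a single comparison. The matchEntry helper below is hypothetical and is not how ishml is implemented. It checks one lexicon entry against the search text and ignores the separator, lax, and caseSensitive options. Whether the real remainder keeps the leading separator is not stated here, so the sketch leaves it in.

```javascript
// Hypothetical helper illustrating partial versus full matching.
function matchEntry(entryLexeme, text, { full = false } = {}) {
  if (full) {
    // Full match: every character of the text must be consumed.
    return text === entryLexeme ? { lexeme: entryLexeme, remainder: "" } : null;
  }
  // Partial match: the whole entry must appear at the start of the text.
  return text.startsWith(entryLexeme)
    ? { lexeme: entryLexeme, remainder: text.slice(entryLexeme.length) }
    : null;
}

const partialResult = matchEntry("take", "take ring"); // entry fits the start of the text
const reversedResult = matchEntry("take ring", "take"); // text is shorter than the entry: null
const fullResult = matchEntry("take", "take ring", { full: true }); // characters left over: null
```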

.lax Boolean

Applies to partial matching. Defaults to false. Set to true to return partial matches even if the next character in the lexeme argument does not match the separator or end of string.

.longest Boolean

Defaults to false. Set to true to return the longest match. Only applicable when full is set to false for partial matching.
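As a rough illustration of the longest option, the sketch below filters a list of candidate lexemes down to the longest partial match. The function name longestPartialMatches and the sample lexemes are hypothetical. Returning every match that ties for the greatest length is an assumption, since the text above does not say how ties are handled.

```javascript
// Hypothetical sketch: keep only the longest entries that match the start of the text.
function longestPartialMatches(entryLexemes, text) {
  const matches = entryLexemes.filter((lexeme) => text.startsWith(lexeme));
  const longest = Math.max(0, ...matches.map((lexeme) => lexeme.length));
  return matches.filter((lexeme) => lexeme.length === longest);
}

const candidates = ["take", "take off", "ring"];
// "take" and "take off" both match the start of the text; only "take off" is kept.
const longestOnly = longestPartialMatches(candidates, "take off hat");
```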

.regex RegExp

Defaults to false. May be set to any regular expression. Causes the text supplied in the lexeme argument to be matched against the regular expression without searching the lexicon for definitions. Instead, {fuzzy:true} is provided as the token's definition.
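A minimal sketch of the regex behavior follows. Everything here is an assumption made for illustration: the function name regexSearchSketch, the token shape (lexeme and definitions properties), and the requirement that the match start at the first character. Only the {fuzzy:true} definition and the token/remainder result shape come from the description above.

```javascript
// Hypothetical sketch: match the text against a regular expression instead of the lexicon.
function regexSearchSketch(text, regex) {
  const match = regex.exec(text);
  // Assumption: only a match at the very start of the text counts.
  if (match === null || match.index !== 0) return [];
  return [
    {
      token: { lexeme: match[0], definitions: [{ fuzzy: true }] }, // assumed token shape
      remainder: text.slice(match[0].length),
    },
  ];
}

const numberResults = regexSearchSketch("42 apples", /^\d+/);
```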

.separator RegExp

Applies to partial matching. Defaults to /^\s+/ (whitespace). May be set to any regular expression. For no separator, set to an empty string. When lax is set to false, a potential partial match is only considered a match if the next character in the lexeme argument matches the separator or is the end of the string.
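The interaction between separator and lax can be sketched as a boundary check on whatever text follows a candidate partial match. The boundaryAccepted helper is hypothetical and is not taken from ishml. It does not model the empty-string separator case, whose exact behavior is not spelled out above.

```javascript
// Hypothetical sketch: decide whether a candidate partial match ends at an acceptable boundary.
function boundaryAccepted(remainder, { lax = false, separator = /^\s+/ } = {}) {
  if (lax) return true; // lax: accept regardless of what follows
  // Strict: the match must be followed by the separator or by the end of the string.
  return remainder === "" || separator.test(remainder);
}

const strictWord = boundaryAccepted(" ring"); // "take" in "take ring": accepted
const strictGlued = boundaryAccepted("off"); // "take" in "takeoff": rejected
const laxGlued = boundaryAccepted("off", { lax: true }); // accepted because lax is true
const customSeparator = boundaryAccepted(", ring", { separator: /^[\s,]+/ }); // accepted
```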

.unregister(lexeme[,definition])

Removes a definition from a token in the lexicon.

The lexeme argument is a string of characters identifying a token in the lexicon.

The definition argument is a JavaScript object matching the original definition under which the lexeme was registered. The function does a shallow comparison of the properties and values of the definition argument to the definition stored in the lexicon. If they are found to be equal, the definition in the lexicon is deleted.

If no definition argument is provided, all definitions connected with the lexeme argument are removed from the lexicon, which, in essence, deletes the token.

Returns the instance of ishml.Lexicon. This method is chainable.
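The shallow comparison used when removing a definition might look something like the sketch below. The names shallowEqual and unregisterSketch, the Map used to hold entries, and the sample definitions are all hypothetical. The sketch assumes definitions are plain objects, and leaving an empty array behind when the last definition is removed is an arbitrary choice made here, not documented behavior.

```javascript
// Hypothetical sketch: two objects are shallowly equal when they have the same
// own keys and each key holds a strictly equal value.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every(
      (key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key]
    )
  );
}

// entries is a Map of lexeme -> array of definitions (an assumed storage layout).
function unregisterSketch(entries, lexeme, definition) {
  if (!entries.has(lexeme)) return entries;
  if (definition === undefined) {
    entries.delete(lexeme); // no definition given: drop the whole entry
    return entries;
  }
  const kept = entries.get(lexeme).filter((stored) => !shallowEqual(stored, definition));
  entries.set(lexeme, kept);
  return entries;
}

const entries = new Map([
  ["ring", [{ part: "noun", key: "ring" }, { part: "verb", key: "ring" }]],
  ["take", [{ part: "verb", key: "take" }]],
]);

unregisterSketch(entries, "ring", { part: "verb", key: "ring" }); // removes one definition
unregisterSketch(entries, "take"); // removes every definition for "take"
```

Because the comparison is shallow, a definition whose properties hold nested objects only matches when those properties reference the very same objects.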

See also the parsing tutorial.