# Hacking Stew

*([Follow this link to go back to the README file.](../README.html))*

[Stew](https://github.com/rodw/stew) is a JavaScript library, implemented in [CoffeeScript](http://coffeescript.org/).  It is primarily intended to be used in a [Node.js](http://nodejs.org/) environment.^[Although it probably wouldn't be difficult to make Stew work in a browser context, we haven't had any need for that, and so we haven't (yet) attempted to do it. Drop us a [note](https://github.com/rodw/stew/issues) if this is something you'd like to see Stew support.]

Both the (original) CoffeeScript and (generated) JavaScript files are included in the [binary distribution](https://npmjs.org/package/stew-select), so clients can use whichever they prefer.

Stew's source code is hosted at [github.com/rodw/stew](https://github.com/rodw/stew). Any [issues](https://github.com/rodw/stew/issues) or [pull-requests](https://github.com/rodw/stew/pulls) you'd like to submit are appreciated.

Stew is published under [an MIT license](../MIT-LICENSE.txt).

This document provides information that is primarily of interest to those who want to *make changes* to Stew. Most clients (users) of the Stew library will be more interested in the [README](../README.html) file.

## How it Works

Stew is partitioned into three classes: *DOMUtil*, *PredicateFactory* and *Stew*.

[***Stew***](#stew) is the real driver behind the API, parsing CSS selector expressions and collecting matching nodes from the DOM tree.

[***PredicateFactory***](#predicatefactory) defines methods that implement individual CSS selection rules.

[***DOMUtil***](#domutil) provides fairly generic utility methods for working with DOM structures.

We'll cover those bottom-up, from the most generic to the most specific.

### DOMUtil

[***DOMUtil***](./docco/dom-util.html) provides generic utilities for working with the DOM (Document Object Model) structure generated by [node-htmlparser](https://github.com/tautologistics/node-htmlparser).

For our purposes, the most important of these utilities is the `walk_dom` method, which implements a depth-first walk of a given DOM tree. `walk_dom` will invoke the given `visit` (callback) method for every node in the DOM.

For example, to convert a DOM structure into text, we might create a `visit` method like this:

```javascript
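var buffer = "";
// Append the raw content of every text node to the buffer.
// (Note: a plain function expression, not `new function`, which would
// create an object rather than a callable function.)
var visit = function(node,node_metadata,all_metadata) {
  if(node.type === 'text') {
    buffer = buffer + node.raw;
  }
  return true;
};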
```

and invoke it like this:

```javascript
domutil.walk_dom(dom,visit);
console.log("The text was");
console.log(buffer);
```

*Stew* uses `DOMUtil.walk_dom` to traverse the DOM tree.

(`node_metadata` and `all_metadata` contain metadata about the current node, and all previously visited nodes, respectively.  For example, `node_metadata.parent` contains the parent of the current node and `node_metadata.siblings` contains an array of all of `node_metadata.parent`'s children. See the comments within `dom-util.coffee` for more detail.)
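
To make the shape of these arguments more concrete, here is a minimal sketch of a depth-first walk that hands `parent` and `siblings` metadata to `visit`. It is not Stew's actual `walk_dom`: the name `walk_sketch` is hypothetical, the `all_metadata` argument is omitted, and the DOM is assumed to be an array of plain objects with `type`, `raw` and (optionally) `children` properties. Treating a `false` return value from `visit` as "skip this node's children" is also an assumption of this sketch.

```javascript
// Hypothetical sketch; not Stew's actual implementation of walk_dom.
function walk_sketch(nodes, visit, parent) {
  // Accept either a single node or an array of sibling nodes.
  var siblings = Array.isArray(nodes) ? nodes : [nodes];
  siblings.forEach(function (node) {
    var node_metadata = { parent: parent || null, siblings: siblings };
    // Visit the node first, then (unless told otherwise) its children.
    var descend = visit(node, node_metadata);
    if (descend !== false && Array.isArray(node.children)) {
      walk_sketch(node.children, visit, node);
    }
  });
}

// A tiny hand-built DOM, shaped like <p>Hello <b>world</b></p>.
var sample_dom = [
  { type: 'tag', name: 'p', raw: 'p', children: [
    { type: 'text', raw: 'Hello ' },
    { type: 'tag', name: 'b', raw: 'b', children: [
      { type: 'text', raw: 'world' }
    ] }
  ] }
];

var text_buffer = "";
walk_sketch(sample_dom, function (node, node_metadata) {
  if (node.type === 'text') {
    text_buffer = text_buffer + node.raw;
  }
  return true;
});
console.log(text_buffer); // Hello world
```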

See the [annotated source](./docco/dom-util.html) for more detail.

### PredicateFactory

[***PredicateFactory***](./docco/predicate-factory.html) generates predicate functions that test whether a given node matches a specific CSS selector.

For example, the "universal selector" (`*`) matches any and every "tag" node.  Here's a predicate function that implements the `*` selector:

```javascript
function universal_selector_predicate(node) {
  return node.type === 'tag';
}
```

Here's a predicate that implements a "tag" selector, selecting all tags with the type (name) `foo`:

```javascript
function foo_tag_predicate(node) {
  return node.type === 'tag' && node.name === 'foo';
}
```

*PredicateFactory* methods generate functions like these (bound to particular input parameters such as tag or attribute names).
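
As a rough illustration of what "bound to particular input parameters" means, the sketch below builds predicates from a tag name or an attribute name/value pair. The function names `by_tag_name` and `by_attribute_value` are hypothetical and are not taken from *PredicateFactory*; the sketch also assumes that a node's attributes live in an `attribs` object, as they do in node-htmlparser's output.

```javascript
// Hypothetical factory functions; the names are illustrative, not Stew's API.
function by_tag_name(name) {
  // The returned predicate "remembers" `name` via the closure.
  return function (node) {
    return node.type === 'tag' && node.name === name;
  };
}

function by_attribute_value(attr_name, attr_value) {
  return function (node) {
    return node.type === 'tag' &&
      !!node.attribs &&
      node.attribs[attr_name] === attr_value;
  };
}

var is_foo = by_tag_name('foo');
var is_author_link = by_attribute_value('rel', 'author');

console.log(is_foo({ type: 'tag', name: 'foo' })); // true
console.log(is_foo({ type: 'tag', name: 'bar' })); // false
console.log(is_author_link({ type: 'tag', name: 'a', attribs: { rel: 'author' } })); // true
```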

*PredicateFactory* includes generators for each of the core CSS selectors (tag, ID, class, attribute name and attribute value) as well as combinators such as "and" (no space), "or" (`,`), "descendant" (space), "direct descendant" (`>`) and "adjacent sibling" (`+`).
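
The combinators can be pictured as functions that take predicates and return a new predicate. The sketch below is one possible way to write a few of them, under an assumption that differs from Stew's real signatures: every predicate here takes `(node, ancestors)`, where `ancestors` is an array running from the root down to the node's parent. All of the function names are hypothetical.

```javascript
// Hypothetical combinators. Each predicate takes (node, ancestors), where
// `ancestors` is an array running from the root down to the node's parent.
function and_predicate() {
  var predicates = Array.prototype.slice.call(arguments);
  return function (node, ancestors) {
    return predicates.every(function (p) { return p(node, ancestors); });
  };
}

function or_predicate() {
  var predicates = Array.prototype.slice.call(arguments);
  return function (node, ancestors) {
    return predicates.some(function (p) { return p(node, ancestors); });
  };
}

function descendant_predicate(ancestor_predicate, node_predicate) {
  return function (node, ancestors) {
    if (!node_predicate(node, ancestors)) { return false; }
    // Some ancestor, at any depth, must match the left-hand selector.
    return (ancestors || []).some(function (ancestor, i) {
      return ancestor_predicate(ancestor, ancestors.slice(0, i));
    });
  };
}

function child_predicate(parent_predicate, node_predicate) {
  return function (node, ancestors) {
    var path = ancestors || [];
    if (path.length === 0 || !node_predicate(node, path)) { return false; }
    // Only the immediate parent is considered.
    return parent_predicate(path[path.length - 1], path.slice(0, -1));
  };
}

// A simple leaf predicate to combine.
function tag(name) {
  return function (node) { return node.type === 'tag' && node.name === name; };
}

var div = { type: 'tag', name: 'div' };
var ul = { type: 'tag', name: 'ul' };
var li = { type: 'tag', name: 'li' };

var div_li = descendant_predicate(tag('div'), tag('li'));  // "div li"
var div_child_li = child_predicate(tag('div'), tag('li')); // "div > li"
var ul_or_li = or_predicate(tag('ul'), tag('li'));         // "ul, li"

console.log(div_li(li, [div, ul]));       // true
console.log(div_child_li(li, [div, ul])); // false
console.log(ul_or_li(ul, [div]));         // true
```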

*Stew* uses these predicates to implement the CSS selection logic.

See the [annotated source](./docco/predicate-factory.html) for more detail.

### Stew

[***Stew***](./docco/stew.html) is the main entry point for the overall library. *Stew* parses a `String` representation of a CSS Selector, generates the appropriate predicates (using *PredicateFactory*) and then processes the DOM tree (using *DOMUtil*) to select the matching nodes.

The CSS parsing is primarily accomplished via regular expressions.  This is a multi-step process.

For example, let's assume a complicated CSS expression such as:

    'div#main .sidebar ul.links li:first-child a[rel="author"][href]'

1. The expression is split into individual selectors by `_parse_selectors` using `_SPLIT_ON_WS_REGEXP`.  Naively this is the same as splitting the expression on white-space characters, but we also need to take into account the use of spaces within `"quoted strings"` and `/regular expressions/` and non-whitespace delimiters like `,` or `+`.  In our example, we obtain these five tokens:

        [ 'div#main',  '.sidebar',  'ul.links',  'li:first-child', 'a[rel="author"][href]' ]

2. Each of these tokens is then parsed into one or more CSS specific selectors by `_parse_selector` using `_CSS_SELECTOR_REGEXP` (and where needed, `_ATTRIBUTE_CLAUSE_REGEXP`). For example, from the first token (`div#main`) we identify two individual predicates, one that implements "tag name is `div`" and another that implements "node id is `main`".  These two predicates are then joined by an "and" predicate.  All together, these five tokens are converted into predicates (something) like these:

    a. `div#main` becomes `and( tag_name_is_div(), node_id_is_main() )`
    b. `.sidebar` becomes `class_name_is_sidebar()`
    c. `ul.links` becomes `and( tag_name_is_ul(), class_name_is_links() )`
    d. `li:first-child` becomes `and( tag_name_is_li(), tag_is_parents_first_child() )`
    e. `a[rel="author"][href]` becomes `and( tag_name_is_a(), rel_attr_is_author(), has_href_attr() )`

3. Back in `_parse_selectors` these five predicates are joined into a "descendant selector" predicate, yielding a single predicate that returns `true` if and only if the current node matches the complete CSS expression.
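
The first two steps above can be sketched roughly as follows. This is a simplified illustration, not Stew's parser: the helper names `split_selectors` and `describe_selector` are hypothetical, the regular expressions are stand-ins for `_SPLIT_ON_WS_REGEXP`, `_CSS_SELECTOR_REGEXP` and `_ATTRIBUTE_CLAUSE_REGEXP` rather than copies of them, and `/regular expressions/` and the `,`, `>` and `+` delimiters are ignored entirely. Instead of building predicates, the second helper only returns a plain description of each token; step 3 is left out.

```javascript
// Hypothetical helpers; Stew's real parser uses its own regular expressions.

// Step 1: split on white-space, except inside "quoted" or 'quoted' strings.
function split_selectors(expression) {
  var tokens = [];
  var current = '';
  var quote = null; // the quote character we are currently inside, if any
  for (var i = 0; i < expression.length; i++) {
    var ch = expression.charAt(i);
    if (quote) {
      current += ch;
      if (ch === quote) { quote = null; }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (/\s/.test(ch)) {
      if (current.length > 0) { tokens.push(current); current = ''; }
    } else {
      current += ch;
    }
  }
  if (current.length > 0) { tokens.push(current); }
  return tokens;
}

// Step 2: describe one token as tag, id, classes, pseudo-classes and attributes.
function describe_selector(token) {
  var description = { tag: null, id: null, classes: [], pseudo_classes: [], attributes: [] };
  // Pull out the [name] and [name="value"] clauses first, so that their
  // contents are not mistaken for tag, id or class names.
  var attribute_regexp = /\[([^\]=]+)(?:=(?:"([^"]*)"|([^\]]*)))?\]/g;
  var rest = token.replace(attribute_regexp, function (match, name, quoted, bare) {
    var value = (quoted !== undefined) ? quoted : bare;
    description.attributes.push(value === undefined ? { name: name } : { name: name, value: value });
    return '';
  });
  // What remains is an optional tag name followed by #id, .class and :pseudo parts.
  var part_regexp = /([#.:]?)([\w-]+)/g;
  var part;
  while ((part = part_regexp.exec(rest)) !== null) {
    if (part[1] === '#') { description.id = part[2]; }
    else if (part[1] === '.') { description.classes.push(part[2]); }
    else if (part[1] === ':') { description.pseudo_classes.push(part[2]); }
    else { description.tag = part[2]; }
  }
  return description;
}

var expression = 'div#main .sidebar ul.links li:first-child a[rel="author"][href]';
var tokens = split_selectors(expression);
console.log(tokens.length);                                   // 5
console.log(describe_selector(tokens[0]).id);                 // main
console.log(describe_selector(tokens[4]).attributes.length);  // 2
```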

CSS-Selector-implementing predicate in hand, *Stew*'s `select` method then visits every node in the DOM tree, collecting each node that matches the predicate.
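
In outline, this final step amounts to "walk the tree, keep what matches". The sketch below shows that idea in isolation; `select_sketch` is a hypothetical name, the predicate takes only the node (Stew's real predicates also receive metadata), and the DOM is assumed to be an array of plain objects with optional `children` arrays.

```javascript
// Hypothetical sketch of "walk the tree and collect matches".
function select_sketch(dom, predicate) {
  var matches = [];
  function visit(node) {
    if (predicate(node)) { matches.push(node); }
    // Depth-first: descend into the children before moving to the next sibling.
    (node.children || []).forEach(function (child) { visit(child); });
  }
  (Array.isArray(dom) ? dom : [dom]).forEach(function (node) { visit(node); });
  return matches;
}

var page = [
  { type: 'tag', name: 'div', children: [
    { type: 'tag', name: 'a', attribs: { href: '/one' } },
    { type: 'text', raw: 'and' },
    { type: 'tag', name: 'span', children: [
      { type: 'tag', name: 'a', attribs: { href: '/two' } }
    ] }
  ] }
];

var links = select_sketch(page, function (node) {
  return node.type === 'tag' && node.name === 'a';
});
// Prints the href of each matching <a> node, in document order.
console.log(links.map(function (node) { return node.attribs.href; }).join(' '));
```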

See the [annotated source](./docco/stew.html) for more detail.

### Unit Tests

The `./test` directory contains unit tests for each of these types.  These tests can be executed by running

```console
make test
```

or

```console
npm test
```

The [test-coverage report](./coverage.html) identifies the lines of code^[The generated JavaScript code, not source CoffeeScript, for better or worse.] that are exercised by the test suite.  This report can be generated by running:

```console
make coverage
```

## How you can help.

Your contributions, [bug reports](https://github.com/rodw/stew/issues) and [pull-requests](https://github.com/rodw/stew/pulls) are greatly appreciated.

### Areas that need work.

If you're looking for areas in which to contribute, here are a few ideas:

 * Documentation and examples are *always* welcome. There are several Markdown-format files within [./docs/](https://github.com/rodw/stew/tree/master/docs) that are always in need of editing and improvement, and please feel free to plug any documentation gaps that you see.

 * New and improved unit-tests are also always welcome. You could help us ensure we've tested all the relevant parts of the CSS selector specification, or review the [test coverage report](http://heyrod.com/stew/docs/coverage.html) to identify areas that aren't currently exercised by our unit test suite.

 * Stew has a few known limitations we'd like to eliminate. See the "Limitations" section of the [README file](../README.html) for details.

 * Browser-side Stew isn't yet supported, or at least not fully tested.  This probably doesn't require substantial changes, but no one has gotten around to it just yet.

 * Run the target `make todo` to see a list of `TODO`, `FIXME` and similar comments within the code and documentation.

### How to contribute.

We're happy to accept any help you can offer, but the following guidelines can help streamline the process for everyone.

 * You can report any bugs at [github.com/rodw/stew/issues](https://github.com/rodw/stew/issues).

    - We'll be able to address the issue more easily if you can provide a demonstration of the problem you are encountering. The best format for this demonstration is a failing unit test (like those found in [./test/](https://github.com/rodw/stew/tree/master/test)), but your report is welcome with or without that.

 * Our preferred channel for contributions or changes to Stew's source code and documentation is as a Git "patch" or "pull-request".

    - If you've never submitted a pull-request, here's one way to go about it:

        1. Fork or clone the Stew repository.
        2. Create a local branch to contain your changes (`git checkout -b my-new-branch`).
        3. Make your changes and commit them to your local repository.
        4. Create a pull request [as described here](https://help.github.com/articles/creating-a-pull-request).

    - If you'd rather use a private (or just non-GitHub) repository, you might find [these generic instructions on creating a "patch" with Git](https://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git/) helpful.

 * If you are making changes to the code please ensure that the [unit test suite](#unit-tests) still passes.

 * If you are making changes to the code to address a bug or introduce new features, we'd *greatly* appreciate it if you can provide one or more [unit tests](#unit-tests) that demonstrate the bug or exercise the new feature.

**Please Note:** We'd rather have a contribution that doesn't follow these guidelines than no contribution at all.  If you are confused or put-off by any of the above, your contribution is still welcome.  Feel free to contribute or comment in whatever channel works for you.

## Nuts and Bolts

### Run-time Dependencies

Technically Stew doesn't have any run-time dependencies. No external libraries are required.

Practically speaking, Stew depends upon [Chris Winberry's node-htmlparser](https://github.com/tautologistics/node-htmlparser). Stew assumes the structure of the DOM object passed to `select` and similar methods is compatible with that generated by node-htmlparser.

If `node-htmlparser` is available (via a `require` call) then some (optional) `DOMUtil` methods will make use of it.
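
A common way to express this kind of optional dependency in Node.js is to wrap the `require` call in a `try`/`catch`. The sketch below illustrates that general pattern only; it is not copied from `dom-util.coffee`, and the helper name `optional_require` is hypothetical. It assumes node-htmlparser is installed under its npm package name, `htmlparser`.

```javascript
// Hypothetical helper: returns the module if it can be loaded, else null.
function optional_require(module_name) {
  try {
    return require(module_name);
  } catch (err) {
    // The module is not installed (or failed to load); carry on without it.
    return null;
  }
}

var htmlparser = optional_require('htmlparser');
if (htmlparser) {
  console.log('node-htmlparser is available');
} else {
  console.log('node-htmlparser is not installed; skipping the optional helpers');
}
```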

Stew makes use of several libraries to support development, documentation and testing.  These are enumerated in the `package.json` file.

### Building and Testing

**Downloading**

You can clone [Stew's Git repository](https://github.com/rodw/stew) via:

```console
git clone git@github.com:rodw/stew.git
```

You can also [download a ZIP archive of the latest source](https://github.com/rodw/stew/archive/master.zip).

**Installing**

Once you have Stew cloned into a local working directory, you can use [npm](https://npmjs.org/) to install any build-time dependencies, as follows:

```console
npm install
```

(This may take a few minutes, as some external libraries may need to be downloaded and natively compiled.)

**Testing**

Once installed, you can also run Stew's unit test suite using npm:

```console
npm test
```

If everything is working properly, you should expect to see a message like `68 tests complete (633 ms)` (although the specific numbers might be different, of course).

**Compiling the CoffeeScript files into JavaScript**

You can run

```console
npm run-script compile
```

to generate JavaScript files from the CoffeeScript files in `./lib`.

### Using Make

If you have [GNU Make](http://www.gnu.org/software/make/) installed, the best and easiest way to work with Stew's source code is using the provided makefile.

#### Installing and Testing

You can use:

```console
make install
```

and:

```console
make test
```

and:

```console
make js
```

in place of the npm equivalents above, but the makefile can help you to do much more than that.

#### Generating Documentation

**`make markdown`** will generate Stew's HTML documentation from various [Markdown](http://daringfireball.net/projects/markdown/) files in the repository.  Most of these files will be written to the `./docs` directory.  Note that the Makefile uses [Pandoc](http://johnmacfarlane.net/pandoc/) to generate HTML from the Markdown sources, but in theory other Markdown processors could be used.

**`make docco`** will generate an annotated version of Stew's source code using the nifty [Docco](http://jashkenas.github.io/docco/) documentation generator.  These files will be written to `./docs/docco/`.

**`make docs`** will do both of these at once.

#### Test Coverage

**`make coverage`** will generate a report that shows which source code lines are touched (and not touched) by the test suite.  This runs the same unit tests as `make test`, but uses [JSCoverage](http://siliconforks.com/jscoverage/) to evaluate the test coverage.  The coverage report is written to `./docs/coverage.html`.

#### npm Packaging

**`make module`** will generate a package suitable for distribution via npm (into a directory called `./module`).

**`make test-module-install`** will generate the `./module` directory and then validate it by trying to install it into a temporary directory. You should expect to see `It worked!` as the last line of output.

#### Some other targets

**`make clean`** will remove various generated files.

**`make todo`** will display a list of "TODO" and related comments found in the source code.

**`make targets`** will list all available targets.
