# JupiterOne Graph Data Model

- [Data Model Guide](https://community.askj1.com/kb/articles/846-jupiterone-data-model)

The **JupiterOne Graph Data Model** describes a set of common classifications
for data found in an organization's set of digital assets, as well as common
property names and relationships.

The model does not represent a strict requirement for data stored in the
JupiterOne graph. It is acceptable and common to include many additional
properties on any class of entity or relationship when those properties provide
value for querying and reporting. It is, however, strongly recommended that
similar data use common class and property names where possible.

The value of the model is realized when writing queries, when using queries
others have written, and when viewing a list of similar assets from any number
of external systems. For example, `find Host with ipAddress="192.168.10.23"`
relies on the data model and works whether the asset is in AWS, Azure, or GCP
or was detected by an on-prem scanner, and whether it is a machine in the
classic sense or a serverless function. The results share common property names
no matter how a value is labeled in the external systems.
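
As a rough illustration of why shared class and property names matter, the sketch below filters a small in-memory list the way the example query would. The `Entity` shape, the sample data, and the `findByClassAndProperty` helper are hypothetical and exist only for this example; they are not part of this package or of the JupiterOne query engine.

```typescript
// Hypothetical, simplified entity shape used only for this illustration.
interface Entity {
  _type: string;
  _class: string | string[];
  [property: string]: unknown;
}

// Made-up sample data from different sources; `_type` values other than
// `aws_instance` are illustrative assumptions.
const entities: Entity[] = [
  { _type: "aws_instance", _class: "Host", ipAddress: "192.168.10.23", instanceType: "t3.micro" },
  { _type: "azure_vm", _class: ["Host"], ipAddress: "10.0.0.5", vmSize: "Standard_B1s" },
  { _type: "aws_s3_bucket", _class: "DataStore", bucketName: "logs" },
];

// `_class` may be a single string or an array, so normalize before comparing.
function hasClass(entity: Entity, className: string): boolean {
  const classes = Array.isArray(entity._class) ? entity._class : [entity._class];
  return classes.some((c) => c === className);
}

// In-memory analogue of: find Host with ipAddress="192.168.10.23"
function findByClassAndProperty(
  all: Entity[],
  className: string,
  property: string,
  value: unknown
): Entity[] {
  return all.filter((e) => hasClass(e, className) && e[property] === value);
}

const matches = findByClassAndProperty(entities, "Host", "ipAddress", "192.168.10.23");
```

Because both hosts expose `ipAddress` under the same name, one filter covers them regardless of the originating system.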

Though the data model is not a strict schema, [there are schemas](/src/schemas)
that communicate the data model and are used in JupiterOne UIs to support
entity creation, editing, and visualization. Additionally, integrations are
encouraged to generate entities and relationships that conform to the schemas,
which helps advance the data model and provides consistency in the data we
ingest. See the
[Integration SDK](https://github.com/JupiterOne/integration-sdk) for functions
that make this easy to do.

## Entities and Relationships

The data model is built for a knowledge graph -- entities and relationships, or
nodes and edges -- that reflects the stateful representation of the cyber
infrastructure and digital operations of an organization.

The schema for each entity and relationship describes a collection of common
attributes for that specific abstract class, along with graph object metadata as
described in [`GraphObject.json`](/src/schemas/GraphObject.json).

The data model combines the benefits of vendor/provider-specific attributes
with those of abstract/normalized attributes. The vendor/provider-specific
attributes are dynamically assigned and not defined by the data model.
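
A minimal sketch of what this mix can look like on a single entity follows. The `RawProviderInstance` record, the `HostEntity` interface, and the `toHostEntity` function are hypothetical names invented for this example; only `ipAddress`, `Host`, and `aws_instance` come from this document.

```typescript
// Hypothetical raw record as it might arrive from a provider API.
interface RawProviderInstance {
  privateIp: string;
  instanceType: string;
}

// Simplified entity: normalized properties are declared explicitly, and the
// index signature leaves room for dynamically assigned vendor properties.
interface HostEntity {
  _type: string;
  _class: string[];
  ipAddress: string; // normalized name, shared across providers
  [vendorProperty: string]: unknown;
}

function toHostEntity(raw: RawProviderInstance): HostEntity {
  return {
    _type: "aws_instance",
    _class: ["Host"],
    // Normalized: mapped onto the common property name.
    ipAddress: raw.privateIp,
    // Vendor-specific: carried through, not defined by the data model.
    instanceType: raw.instanceType,
  };
}

const host = toHostEntity({ privateIp: "192.168.10.23", instanceType: "t3.micro" });
```

Queries can rely on `ipAddress` for every host, while `instanceType` remains available for provider-specific reporting.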

## The Concept of `_type` and `_class`

Each entity represents an actual operating element (a "thing") that is part of
an organization's cyber operations or infrastructure. This "thing" can be either
physical or logical.

The metadata attributes `_type` and `_class` are used to define what the asset
is:

- `_type`: The value is a single string typically in the format of
  `${vendor}_${resource}` or `${vendor}_${product}_${resource}` in `snake_case`.

  > For example: `aws_instance`, `google_cloud_function`, `apple_tv`,
  > `sentinelone_agent`

  It is important to note that in some cases, `${vendor}_${resource}` may not be
  ideal or feasible.

  For example, we may have directory data that comes in from an HR integration
  such as BambooHR or Rippling. The `Person` entity being created should have
  `_type: 'employee'` or `_type: 'contractor'` rather than
  `_type: 'bamboohr_employee'` or `_type: 'bamboohr_contractor'`.

  Another exception is data that comes from an integration with another ITSM,
  asset discovery tool, device management tool, or CMDB. While such a system
  might be a good "source of truth" or "system of record," it is not the actual
  vendor of those devices.

  - If a server or application is ingested from **ServiceNow**, the `_type`
    should _not_ be `servicenow_server` or `servicenow_application`.

  - If a Cisco switch is ingested from **Rumble** or **Netbox**, the `_type`
    should be `cisco_switch` instead of `rumble_asset` or `netbox_device`.

  - If a smartphone/mobile device is managed by Google Workspace and ingested
    via the integration, the `_type` for the device should _not_ be
    `google_mobile_device` because the device could be an Apple iPhone and it
    would be very confusing to call an iPhone a Google mobile device. Instead,
    it should be `apple_iphone` when the type of device is known or a generic
    value of `mobile_device`.

- `_class`: The value is a string or string array in `TitleCase` using a generic
  IT or Security term to describe the higher level category of the asset.

  > These are defined in [`src/schemas`](src/schemas).
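
The `_type` naming convention above can be sketched as a small helper. This is an illustration only: `toSnakeCase`, `buildType`, and `mobileDeviceType` are hypothetical names, not functions provided by this package or the Integration SDK.

```typescript
// Lowercase a name and collapse anything that is not a letter or digit
// into single underscores, e.g. "Google Cloud" -> "google_cloud".
function toSnakeCase(value: string): string {
  return value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Build `${vendor}_${resource}` or `${vendor}_${product}_${resource}`.
function buildType(vendor: string, resource: string, product?: string): string {
  const parts = product === undefined ? [vendor, resource] : [vendor, product, resource];
  return parts
    .map(toSnakeCase)
    .filter((part) => part.length > 0)
    .join("_");
}

// Follows the mobile device guidance above: use the real vendor and model
// when they are known, otherwise fall back to the generic `mobile_device`.
function mobileDeviceType(vendor?: string, model?: string): string {
  return vendor && model ? buildType(vendor, model) : "mobile_device";
}

const instanceType = buildType("aws", "instance");
const functionType = buildType("google", "function", "cloud");
const knownDevice = mobileDeviceType("Apple", "iPhone");
const unknownDevice = mobileDeviceType();
```

Note that the vendor passed in should be the vendor of the asset itself, not the system the data was ingested from.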

## Versioning this package

Versioning and publishing are automated. Don't bump `package.json`, don't run
`npm version`, and don't run `npm publish` by hand. Just commit your changes
with conventional-commit messages (`feat: ...`, `fix: ...`, etc.) and merge to
main.

On merge, the `Monorepo Release` workflow runs `nx release`, which:

1. Computes the next version from the conventional commits since the last
   `@jupiterone/data-model@*` tag.
2. Updates `package.json` and `CHANGELOG.md` in a `Release affected projects
   [skip ci]` commit.
3. Creates a git tag and a GitHub Release named `@jupiterone/data-model@<x.y.z>`.

Creating the GitHub Release triggers `Monorepo Deploy`, whose
`release-npm-packages` job (gated on this package's `type:library` /
`scope:public` NX tags) builds the package and runs `npm publish`.
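
To make step 1 more concrete, here is a rough sketch of how a next version can be derived from conventional-commit messages. It is not the actual `nx release` implementation; it assumes the common convention that `feat` maps to a minor bump, `fix` to a patch bump, and a `!` marker or `BREAKING CHANGE:` footer to a major bump. `bumpForCommit` and `nextVersion` are hypothetical names.

```typescript
type Bump = "none" | "patch" | "minor" | "major";

const rank: Record<Bump, number> = { none: 0, patch: 1, minor: 2, major: 3 };

// Classify one commit message by its conventional-commit header.
function bumpForCommit(message: string): Bump {
  // Matches e.g. "feat: ...", "fix(scope): ...", "feat!: ..."
  const match = /^(\w+)(\([^)]*\))?(!)?:\s/.exec(message);
  if (match === null) return "none";
  if (match[3] === "!" || /^BREAKING[ -]CHANGE:/m.test(message)) return "major";
  if (match[1] === "feat") return "minor";
  if (match[1] === "fix") return "patch";
  return "none";
}

// Apply the highest bump found in the commits to an x.y.z version.
function nextVersion(current: string, messages: string[]): string {
  const parsed = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (parsed === null) throw new Error("expected an x.y.z version: " + current);
  const major = Number(parsed[1]);
  const minor = Number(parsed[2]);
  const patch = Number(parsed[3]);

  let highest: Bump = "none";
  for (const message of messages) {
    const bump = bumpForCommit(message);
    if (rank[bump] > rank[highest]) highest = bump;
  }

  switch (highest) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return current; // no releasable commits
  }
}
```

In this sketch, commits such as `chore: ...` or `docs: ...` leave the version unchanged, which is why the commit type you choose matters.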

### Common pitfalls

- **Don't pre-bump `package.json` in your feature PR.** If `nx release`
  finds `package.json` already at the version it would compute, it produces
  no version commit, no tag, and no GitHub Release, so nothing publishes.
- **Don't edit `CHANGELOG.md` by hand.** It's regenerated by `nx release`.
- If a release was missed because of one of the above, recover by creating
  the missing GitHub Release manually
  (`gh release create '@jupiterone/data-model@<x.y.z>' --target main`); that
  fires the deploy workflow and publishes to npm the same way an automated
  release would.
