# scheduler

This library exposes two main classes: `Scheduler` and `Processor`. It runs jobs on a schedule, much like cron jobs, but without relying on externally scheduled cron jobs (because we usually want a run-once guarantee). Because nothing is scheduled externally, the dev environment behaves more like the remote environments.
The jobs also form a job queue that any number of parallel Processors can consume, so the library can be used for workload scaling as well.

A Scheduler has methods to schedule jobs, each of which is handled by a custom handler.
This means you can schedule a function to be called once in the future or on a repeating schedule,
and have this information persisted in the DB, which guarantees that:

- The job will be executed by calling the handler you passed
- The job will still be executed if the service instance you scheduled on goes offline
- The job will be executed close to the specified time (there is no to-the-millisecond guarantee)
- There will be no duplicate executions

It _does not_ guarantee that:

- The job will be executed exactly at the specified time, since this depends on how often Processors check for new jobs
- Job A, scheduled a few ms before job B, will be executed before B, since one Processor might work faster than another

Each job has a `method` property. A Processor picks up jobs from the database and executes the `handle` function registered under that `method`. Each Processor is in charge of either the jobs for exactly one method or all jobs regardless of method. The number of Processors defines the level of parallelism available to a method, which gives the user of the library full control over scaling for all methods or for specific ones.
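The pickup rule described above can be illustrated with a toy in-memory model. This is only a sketch and not the library's implementation: the real library persists jobs in the database, and the `ToyJob` type, the `claimNextJob` function and the `sendEmail` method name are hypothetical names invented for this illustration.

```typescript
// Toy in-memory model of job pickup. All names here are hypothetical;
// the real library stores jobs in a database table.
type ToyJobStatus = 'pending' | 'running'

interface ToyJob {
  id: number
  method: string
  runAt: Date
  status: ToyJobStatus
}

// Claims the first pending job that is already due.
// When `method` is given, only jobs for that method are considered;
// when it is omitted, jobs for any method match.
function claimNextJob(
  jobs: ToyJob[],
  now: Date,
  method?: string,
): ToyJob | undefined {
  const job = jobs.find(
    (candidate) =>
      candidate.status === 'pending' &&
      candidate.runAt.getTime() <= now.getTime() &&
      (method === undefined || candidate.method === method),
  )
  if (job !== undefined) {
    // Once claimed, the job is no longer visible to other callers,
    // which models the "no duplicate executions" guarantee.
    job.status = 'running'
  }
  return job
}
```

In this model, a caller that passes `'sendEmail'` only ever receives `sendEmail` jobs, while a caller that passes no method drains whatever is due. Adding more callers for one method increases the parallelism for that method only.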

## Usage

### DB setup

Wherever you configure TypeORM, import `entities` and `migrations` from the lib and pass them to the data source configuration.

```
import { DataSource } from 'typeorm'
import {
  entities as schedulerEntities,
  migrations as schedulerMigrations,
} from '@minka/scheduler'

const MyDataSource = new DataSource({
  type: 'postgres',
  ...
  entities: [
    ...[MyEntity1, MyEntity2],
    ...schedulerEntities,
  ],
  migrations: [
    ...[MyMigration1, MyMigration2],
    ...schedulerMigrations,
  ],
  ...
})
```

### Configuration

While setting up your service, import `configure` from this lib and call it with the data source you configured for TypeORM and the handlers you wish to register for each job `method` (the `method` of a job determines which handler is called). The library then uses this data source to manage jobs in the `job` table.

If you want jobs to recur, or failed jobs to be retried, the recommended approach is to implement this in the job's `method` handler itself. There you can reschedule by calling `.schedule` on a `Scheduler`, either when you catch an error or when you simply want to repeat the job at a later time.

Since each job has a custom `params` object, you can store any data you need regarding recurrence in there.
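One way to keep recurrence data in `params` is sketched below. The `RecurrenceParams` shape (`intervalMs`, `remainingRuns`), the `SchedulerLike` interface and both helper functions are assumptions made for this illustration and are not part of the library. `SchedulerLike` only mirrors the `schedule(method, params, date)` call used in the bootstrap example, and its return type is a guess.

```typescript
// Hypothetical recurrence data stored in a job's `params`.
interface RecurrenceParams {
  url: string
  intervalMs: number // delay between two runs
  remainingRuns: number // how many further runs may still be scheduled
}

// Assumed minimal shape of a Scheduler, mirroring the
// `schedule(method, params, date)` call from the bootstrap example.
interface SchedulerLike {
  schedule(method: string, params: object, runAt: Date): Promise<unknown>
}

// Computes the params and run time of the next occurrence,
// or returns undefined when the job should not repeat anymore.
function planNextRun(
  params: RecurrenceParams,
  now: Date,
): { params: RecurrenceParams; runAt: Date } | undefined {
  if (params.remainingRuns <= 0) {
    return undefined
  }
  return {
    params: { ...params, remainingRuns: params.remainingRuns - 1 },
    runAt: new Date(now.getTime() + params.intervalMs),
  }
}

// Intended to be called from inside a `callUrl` handler once the work is done.
// Resolves to true when a follow-up job was scheduled.
async function rescheduleIfNeeded(
  scheduler: SchedulerLike,
  params: RecurrenceParams,
  now: Date,
): Promise<boolean> {
  const next = planNextRun(params, now)
  if (next === undefined) {
    return false
  }
  await scheduler.schedule('callUrl', next.params, next.runAt)
  return true
}
```

Because the counter travels with the job in `params`, any Processor that picks up the next occurrence has everything it needs to decide whether to schedule another one.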

Example from a service bootstrap file:

```
    import { ContextualLogger } from '@minka/logger'
    import {
      Processor,
      Scheduler,
      configure as configureScheduler,
    } from '@minka/scheduler'

    const jobHandlers = {
      callUrl: {
        handle: (job) => {
          console.log(`Calling url ${job.params['url']}`)
          const data = ... // try catch logic with axios etc.
          return {
            success: true,
            data,
          }
        },
        // if a `callUrl` job times out due to never finishing or the server
        // instance crashing, Scheduler will internally schedule a job that
        // will cause `handleTimeout` to be called.
        // the `callUrl` job will remain in status `timed-out` while the new job
        // will reference the `callUrl` job's id in `params.originalJobId`
        // and the result of the new internal job will be whatever `handleTimeout` returns
        // the `job`'s `params` will also contain all the `params` of the original `callUrl` job
        handleTimeout: (job) => {
          const timedOutJobId = job.params.originalJobId

          // define custom on timeout logic here
          // such as rescheduling, cleanup, etc.
          // if rescheduling, beware of endless timeout loops
          return { data: {}, success: true }
        },
      },
    }

    // if you want the scheduler to log,
    // pass it a function that can return an ILogger
    const getSchedulerLogger = () => {
      const schedulerLogger = new ContextualLogger()
      schedulerLogger.appendInstancePrefix(`Ledger scheduler`)
      return schedulerLogger
    }
    configureScheduler({
      postgresDataSource,
      jobHandlers,
      getLogger: getSchedulerLogger,
    })

    const scheduler = new Scheduler()
    // example job just to see that processor picks it up
    await scheduler.schedule('callUrl', { url: 'http://www.google.com' }, new Date())

    const processor = new Processor()
    processor.start()
```

### Timeout cleanup

The scheduler applies a default, configurable timeout of 30 minutes to jobs. A job stuck in `running` status for longer than that is marked as `timed-out`. If the job's handler also defines a `handleTimeout` function, the scheduler schedules an internally defined job whose `handle` function invokes that `handleTimeout`.

If this internal job also times out, _there will be no_ timeout handler for the timeout itself: a job only gets one chance to perform post-timeout actions. You _can_ schedule another job in the `handleTimeout` function, but exercise caution, as this could turn into an endless loop.
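A simple guard against such a loop is to carry a retry counter in `params` and stop rescheduling once it reaches a limit. The sketch below assumes a `timeoutRetries` field and a limit of 3. Both are hypothetical choices for this example, not library features.

```typescript
// Hypothetical params shape: `timeoutRetries` is a counter this sketch
// stores in `params` itself; the library knows nothing about it.
interface ParamsWithRetryCount {
  timeoutRetries?: number
  [key: string]: unknown
}

// Assumed limit, chosen arbitrarily for this example.
const MAX_TIMEOUT_RETRIES = 3

// Returns the params to use for a rescheduled job, with the counter
// incremented, or undefined when the retry budget is used up.
function paramsForTimeoutRetry(
  params: ParamsWithRetryCount,
  maxRetries: number = MAX_TIMEOUT_RETRIES,
): ParamsWithRetryCount | undefined {
  const used = params.timeoutRetries ?? 0
  if (used >= maxRetries) {
    return undefined
  }
  return { ...params, timeoutRetries: used + 1 }
}
```

Inside `handleTimeout`, the result decides whether to reschedule: schedule a new job with the returned params when a value comes back, and give up (for example by returning `success: false`) when it is `undefined`.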

## Running unit tests

Run `nx test scheduler` to execute the unit tests via [Jest](https://jestjs.io).

## Running lint

Run `nx lint scheduler` to execute the lint via [ESLint](https://eslint.org/).