## Table of contents

-   [Introduction](#introduction)
    -   [Summary vs snapshot](#summary-vs-snapshot)
-   [Why do we need summaries?](#why-do-we-need-summaries)
-   [Who generates summaries?](#who-generates-summaries)
-   [When are summaries generated?](#when-are-summaries-generated)
-   [How are summaries generated?](#how-are-summaries-generated)
    -   [Summary Lifecycle](#summary-lifecycle)
    -   [Single-commit vs two-commit summaries](#single-commit-vs-two-commit-summaries)
    -   [Incremental summaries](#incremental-summaries)
    -   [Resiliency](#resiliency)
-   [What does a summary look like?](#what-does-a-summary-look-like)

## Introduction

This document provides a conceptual overview of summarization. It describes what summaries are, how / when they are generated, and what they look like. The goal is for this to be an entry point into summarization for users and developers alike.

### Summary vs snapshot

The terms summary and snapshot are sometimes used interchangeably. Both represent the state of a container at a point in time. They differ in some respects which are described in [this FAQ](https://fluidframework.com/docs/faq/#summarization).

## Why do we need summaries?

A 'summary' captures the state of a container at a point in time so that future clients can start from this point. Without it, a client would have to apply every operation in the op log, even if those operations (hereafter ops) no longer affect the current state (e.g. op 1 inserts 'h' and op 2 deletes 'h'). For large op logs, this would be very expensive for clients, both to download from the service and to process.
Instead, when a client opens a collaborative document, it downloads the latest snapshot of the container and simply processes new operations from that point forward.
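
The difference can be illustrated with a minimal sketch. The types and function names below (`TextOp`, `TextSummary`, `loadFromOpLog`, `loadFromSummary`) are hypothetical and are not Fluid Framework APIs; the sketch assumes a document that is a single string.

```typescript
// Hypothetical, simplified op type for a single text string (not a Fluid type).
type TextOp =
    | { seq: number; kind: "insert"; pos: number; text: string }
    | { seq: number; kind: "remove"; pos: number; length: number };

// Hypothetical summary: the full state as of a given sequence number.
interface TextSummary {
    sequenceNumber: number;
    text: string;
}

function applyOp(text: string, op: TextOp): string {
    if (op.kind === "insert") {
        return text.slice(0, op.pos) + op.text + text.slice(op.pos);
    }
    return text.slice(0, op.pos) + text.slice(op.pos + op.length);
}

// Without a summary: replay every op in the log, starting from empty state.
function loadFromOpLog(opLog: TextOp[]): string {
    let text = "";
    for (const op of opLog) {
        text = applyOp(text, op);
    }
    return text;
}

// With a summary: start from the summarized state and apply only newer ops.
function loadFromSummary(summary: TextSummary, opLog: TextOp[]): string {
    let text = summary.text;
    for (const op of opLog) {
        if (op.seq > summary.sequenceNumber) {
            text = applyOp(text, op);
        }
    }
    return text;
}

const opLog: TextOp[] = [
    { seq: 1, kind: "insert", pos: 0, text: "h" },
    { seq: 2, kind: "remove", pos: 0, length: 1 }, // cancels op 1
    { seq: 3, kind: "insert", pos: 0, text: "hello" },
];
const summaryAtSeq2: TextSummary = { sequenceNumber: 2, text: "" };

const fromLog = loadFromOpLog(opLog);
const fromSummary = loadFromSummary(summaryAtSeq2, opLog);
```

Both paths reach the same state, but the second one never processes ops 1 and 2. In this sketch the full log is passed in for simplicity; a real client would only need to download the ops after the summary's sequence number.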

## Who generates summaries?

Any client connected in "write" mode can generate summaries. In practice, they are generated by a separate non-interactive client called the summarizer client. Using a separate client is an optimization: because this client has no local changes, it does not have to take them into account, which would make the summary process more complicated.
A summarizer client is like any other client connected to the document, except that users cannot interact with it and it only works on the state it receives from other clients. It has a brand-new container with its own connection to services.
All the clients connected to the document participate in a process called "summary client election" to elect a "parent summarizer" client. Typically, it's the oldest "write" client connected to the document. The parent summarizer client spawns a "summarizer" client which is responsible for summarization.

Note: If the summarizer client closes, the "summary client election" process will choose a new one, if there are eligible clients.
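
The "oldest write client" rule can be sketched as follows. This is an illustration only: `ClientInfo`, its fields, and `electParentSummarizer` are hypothetical names, and the actual election logic may consider additional criteria.

```typescript
// Hypothetical record describing a connected client (field names are illustrative).
interface ClientInfo {
    clientId: string;
    mode: "read" | "write";
    // Sequence number at which the client joined; lower means older.
    joinSequenceNumber: number;
}

// Picks the oldest "write" client, or undefined if no client is eligible.
function electParentSummarizer(clients: ClientInfo[]): ClientInfo | undefined {
    const eligible = clients.filter((c) => c.mode === "write");
    if (eligible.length === 0) {
        return undefined;
    }
    return eligible.reduce((oldest, c) =>
        c.joinSequenceNumber < oldest.joinSequenceNumber ? c : oldest,
    );
}

const connected: ClientInfo[] = [
    { clientId: "a", mode: "read", joinSequenceNumber: 1 },
    { clientId: "b", mode: "write", joinSequenceNumber: 5 },
    { clientId: "c", mode: "write", joinSequenceNumber: 9 },
];

const elected = electParentSummarizer(connected);

// If the elected client leaves, running the election again picks the next oldest.
const reElected = electParentSummarizer(connected.filter((c) => c.clientId !== "b"));
```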

## When are summaries generated?

The summarizer client periodically generates a summary based on heuristics. These heuristics are driven by configurations such as the number of user or system operations received, the amount of time a client has been idle (hasn't received any ops), the maximum time since the last summary, the maximum number of ops since the last summary, etc. The heuristic configurations are described by the `ISummaryConfigurationHeuristics` interface defined in [containerRuntime.ts in the container-runtime package][container-runtime].

The summarizer client uses a default set of configurations defined by `DefaultSummaryConfiguration` in [containerRuntime.ts in the container-runtime package][container-runtime]. These can be overridden by providing a new set of configurations as part of container runtime options during creation.
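
As a rough illustration of how such heuristics could combine, here is a minimal sketch. The `SketchHeuristics` interface, its field names, and the default values are assumptions made for this example; they are not the fields or values of `ISummaryConfigurationHeuristics` or `DefaultSummaryConfiguration`.

```typescript
// Hypothetical, simplified configuration. These are NOT the real
// ISummaryConfigurationHeuristics fields; names and values are illustrative.
interface SketchHeuristics {
    idleTimeMs: number; // summarize after this long without receiving ops
    maxTimeMs: number; // summarize at least this often while ops are pending
    maxOps: number; // summarize once this many ops have accumulated
}

// Hypothetical view of what the summarizer has observed since the last summary.
interface SketchState {
    opsSinceLastSummary: number;
    msSinceLastSummary: number;
    msSinceLastOp: number;
}

// Illustrative defaults only.
const defaultSketchHeuristics: SketchHeuristics = {
    idleTimeMs: 15000,
    maxTimeMs: 60000,
    maxOps: 100,
};

function shouldSummarize(state: SketchState, config: SketchHeuristics): boolean {
    if (state.opsSinceLastSummary === 0) {
        return false; // nothing new to summarize
    }
    return (
        state.opsSinceLastSummary >= config.maxOps ||
        state.msSinceLastSummary >= config.maxTimeMs ||
        state.msSinceLastOp >= config.idleTimeMs
    );
}

// Overriding part of the defaults, analogous to passing custom options at creation.
const customHeuristics: SketchHeuristics = { ...defaultSketchHeuristics, maxOps: 10 };

const busyState: SketchState = {
    opsSinceLastSummary: 20,
    msSinceLastSummary: 5000,
    msSinceLastOp: 100,
};
const withDefaults = shouldSummarize(busyState, defaultSketchHeuristics);
const withOverride = shouldSummarize(busyState, customHeuristics);
```

With the illustrative defaults, 20 ops within 5 seconds do not trigger a summary, while the overridden `maxOps` of 10 does.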

## How are summaries generated?

When the summarization process is triggered, every object in the container's object tree that has data to be summarized is asked to generate its summary, starting at the container runtime, which is at the root. Various objects participate in the summary process and generate their own summaries, such as data stores, DDSes, the garbage collector, the blob manager, the id compressor, etc. Note that the user data is in the DDSes.
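
The top-down traversal can be sketched as a recursive walk over the object tree. Everything below (`SketchSummary`, `Summarizable`, `createLeaf`, `createNode`) is a hypothetical simplification, not the runtime's actual interfaces; in particular, leaves here produce a single blob, whereas real objects typically produce a subtree.

```typescript
// Hypothetical, simplified summary shape: a tree of named children or a blob.
type SketchSummary =
    | { type: "tree"; tree: { [name: string]: SketchSummary } }
    | { type: "blob"; content: string };

// Anything that takes part in summarization exposes a summarize() method.
interface Summarizable {
    summarize(): SketchSummary;
}

// A leaf object (for example, a DDS holding user data) serializes its own state.
function createLeaf(data: unknown): Summarizable {
    return {
        summarize: () => ({ type: "blob", content: JSON.stringify(data) }),
    };
}

// An interior object (for example, the runtime or a data store) asks each
// child for its summary and assembles the results into a tree.
function createNode(children: { name: string; node: Summarizable }[]): Summarizable {
    return {
        summarize: () => {
            const tree: { [name: string]: SketchSummary } = {};
            for (const child of children) {
                tree[child.name] = child.node.summarize();
            }
            return { type: "tree", tree };
        },
    };
}

// Illustrative object tree: runtime -> data store -> DDS, plus a GC node.
const runtime = createNode([
    {
        name: "dataStore1",
        node: createNode([{ name: "map", node: createLeaf({ a: 1 }) }]),
    },
    { name: "gc", node: createLeaf({ unreferenced: [] }) },
]);

const rootSummary = runtime.summarize();
```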

### Summary Lifecycle

The lifecycle of a summary starts when a "parent summarizer" client is elected.

-   The parent summarizer spawns a non-interactive summarizer client.
-   The summarizer client periodically starts a summary as per heuristics. A summary happens at a particular sequence number called the "summary sequence number" or reference sequence number for the summary.
-   The container runtime (hereafter runtime) generates a summary tree (described in the ["What does a summary look like?"](#what-does-a-summary-look-like) section below).
-   The runtime uploads the summary tree to the Fluid storage service, along with the handle of the last successful summary. If the upload is successful, the service returns a handle (unique id) to the new summary. Otherwise, it returns a failure. If the last successful summary's handle is incorrect, the service will reject this summary. This is done to ensure that [incremental summaries](#incremental-summaries) are correct.
-   The runtime submits a "summarize" op to the Fluid ordering service containing the uploaded summary handle and the summary sequence number.
-   The ordering service stamps it with a sequence number (like any other op) and broadcasts the summarize op. This creates a record in the op log that a summary was submitted and lets other clients know about it. Non-summarizer clients don't do anything with the summarize op. The summarizer client that submitted it processes it and waits for a summary ack / nack. Future summarizer clients also process these ops and validate that a corresponding summary ack / nack is received.
-   The ordering service then responds to the summarize op:
    -   If the summary is accepted, it sends a "summary ack" with the summary sequence number and a summary handle.
    -   If the summary is rejected, it sends a "summary nack" with the details of the summary op.
-   The runtime processes the summary ack or nack and completes the summary process as a success or failure accordingly.
    -   If the summary is successful, the handle in the ack becomes the last successful summary's handle, which is used when uploading summaries as described earlier.
    -   If the summary failed, the summarizer client closes and the summary election process starts to elect a new one.
-   The runtime has a timeout called "maxAckWaitTime" and if the summary op, ack or nack is not received within this time, it will fail this summary.
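
The client-side portion of the steps above (after upload and op submission) can be sketched as a small state machine. The names `PendingSummary`, `SummaryEvent`, `startSummary`, and `onSummaryEvent` are hypothetical, and the timeout is modeled as an explicit event rather than a real timer; this is an illustration under those assumptions, not the runtime's implementation.

```typescript
// Hypothetical model of one summary attempt as seen by the summarizer client.
interface PendingSummary {
    status: "waitingForOp" | "waitingForAck" | "succeeded" | "failed";
    uploadedHandle: string; // handle returned by storage after upload
    summarySequenceNumber: number;
    lastSuccessfulHandle: string | undefined;
    failureReason?: string;
}

// Hypothetical events; "maxAckWaitTimeElapsed" stands in for the timeout firing.
type SummaryEvent =
    | { type: "summarizeOp"; handle: string }
    | { type: "summaryAck"; handle: string }
    | { type: "summaryNack"; message: string }
    | { type: "maxAckWaitTimeElapsed" };

// Called after the tree was uploaded and the summarize op was submitted.
function startSummary(
    uploadedHandle: string,
    summarySequenceNumber: number,
    lastSuccessfulHandle: string | undefined,
): PendingSummary {
    return { status: "waitingForOp", uploadedHandle, summarySequenceNumber, lastSuccessfulHandle };
}

function onSummaryEvent(pending: PendingSummary, event: SummaryEvent): PendingSummary {
    if (pending.status === "succeeded" || pending.status === "failed") {
        return pending; // the attempt already completed
    }
    switch (event.type) {
        case "summarizeOp":
            // Our own summarize op came back from the ordering service.
            return event.handle === pending.uploadedHandle
                ? { ...pending, status: "waitingForAck" }
                : pending;
        case "summaryAck":
            if (pending.status !== "waitingForAck") {
                return pending;
            }
            // The acked handle becomes the last successful summary's handle.
            return { ...pending, status: "succeeded", lastSuccessfulHandle: event.handle };
        case "summaryNack":
            if (pending.status !== "waitingForAck") {
                return pending;
            }
            return { ...pending, status: "failed", failureReason: event.message };
        case "maxAckWaitTimeElapsed":
            return { ...pending, status: "failed", failureReason: "timed out waiting for op, ack or nack" };
    }
}

// Successful attempt: op broadcast, then ack.
let accepted = startSummary("uploaded-1", 100, "previous-handle");
accepted = onSummaryEvent(accepted, { type: "summarizeOp", handle: "uploaded-1" });
accepted = onSummaryEvent(accepted, { type: "summaryAck", handle: "acked-1" });

// Failed attempt: nothing arrives before the timeout.
const timedOut = onSummaryEvent(startSummary("uploaded-2", 200, "acked-1"), {
    type: "maxAckWaitTimeElapsed",
});
```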

### Incremental summaries

Summaries are incremental, i.e., if an object (or node) did not change since the last summary, it doesn't have to re-summarize its entire contents. Fluid supports the concept of a summary handle defined in [summary.ts in the protocol-definitions package][summary-protocol]. A handle is a path to a subtree in a snapshot. It allows objects to reference a subtree in the previous snapshot, which is essentially an instruction to storage to find that subtree and populate it into the new summary.

For example, if a data store or DDS did not change since the last summary, it doesn't have to go through the whole summary process described above. It can instead return an `ISummaryHandle` with the path to its subtree in the last successful summary. The same applies to other types of content, like a single content blob within an object's summary tree.

For incremental summaries, objects diff their content against the last summary to determine whether to send a summary handle. It is therefore crucial that the last summary information be correct, or else the new summary will be incorrect. This is why the last summary's handle is also sent during upload and validated by the service.
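
The following sketch illustrates the idea with simplified stand-in types. `SketchBlob`, `SketchHandle`, `SketchTree`, and the functions below are hypothetical; the real summary objects in summary.ts differ in detail, and `resolveHandles` only approximates what a storage service might do when it encounters a handle.

```typescript
// Simplified stand-ins for summary objects; every name here is hypothetical.
interface SketchBlob { type: "blob"; content: string; }
// A handle is a path to a subtree in the last successful summary.
interface SketchHandle { type: "handle"; handle: string; }
interface SketchTree { type: "tree"; tree: { [name: string]: SketchNode }; }
type SketchNode = SketchBlob | SketchHandle | SketchTree;

interface SketchChild {
    name: string;
    changedSinceLastSummary: boolean;
    content: string;
}

// Client side: unchanged children send a handle instead of their full subtree.
function summarizeIncrementally(children: SketchChild[]): SketchTree {
    const tree: { [name: string]: SketchNode } = {};
    for (const child of children) {
        tree[child.name] = child.changedSinceLastSummary
            ? { type: "tree", tree: { content: { type: "blob", content: child.content } } }
            : { type: "handle", handle: "/" + child.name };
    }
    return { type: "tree", tree };
}

// Finds the node at a "/"-separated path in a summary tree.
function lookup(root: SketchTree, path: string): SketchNode | undefined {
    let node: SketchNode = root;
    for (const part of path.split("/")) {
        if (part === "") continue;
        if (node.type !== "tree") return undefined;
        const next: SketchNode | undefined = node.tree[part];
        if (next === undefined) return undefined;
        node = next;
    }
    return node;
}

// Storage side (approximation): replace each handle with the referenced
// subtree from the previous summary. Fails if the previous summary is wrong.
function resolveHandles(node: SketchNode, previous: SketchTree): SketchNode {
    if (node.type === "handle") {
        const found = lookup(previous, node.handle);
        if (found === undefined) {
            throw new Error("Handle not found in previous summary: " + node.handle);
        }
        return found;
    }
    if (node.type === "blob") return node;
    const tree: { [name: string]: SketchNode } = {};
    for (const name of Object.keys(node.tree)) {
        const child: SketchNode | undefined = node.tree[name];
        if (child !== undefined) tree[name] = resolveHandles(child, previous);
    }
    return { type: "tree", tree };
}

const previousSummary: SketchTree = {
    type: "tree",
    tree: {
        ds1: { type: "tree", tree: { content: { type: "blob", content: "unchanged" } } },
        ds2: { type: "tree", tree: { content: { type: "blob", content: "old" } } },
    },
};
const newSummary = summarizeIncrementally([
    { name: "ds1", changedSinceLastSummary: false, content: "unchanged" },
    { name: "ds2", changedSinceLastSummary: true, content: "new" },
]);
const resolvedSummary = resolveHandles(newSummary, previousSummary);
```

Here `ds1` is uploaded as a handle and `ds2` as a full subtree. If the handle were resolved against the wrong previous summary, the lookup would either fail or silently pull in stale content, which is why the last summary's handle is validated during upload.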

### Resiliency

The summarization process is designed to be resilient: a document will eventually summarize and make progress even if there are intermittent failures or disruptions. Some examples of steps taken to achieve this:

-   Last summary - Usually, if the "parent summarizer" client disconnects or shuts down, the "summarizer" client also shuts down and the summarizer election process begins. However, if there are a certain number of un-summarized ops, the summarizer client will perform a "last summary" even if the parent shuts down. This is done to make progress in scenarios where new summarizer clients are closed quickly because the parent summarizer keeps disconnecting.
-   Retries - The summarizer has a retry mechanism which can identify certain types of intermittent failures, either in the client or in the server. It will retry the summary attempt for these failures a certain number of times. This helps in cases where there are intermittent failures, such as throttling errors from the server, which go away after waiting for a while.
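
A retry loop of this kind might look like the sketch below. The `AttemptResult` type, the `summarizeWithRetries` function, and the default delay are assumptions for illustration and do not reflect the summarizer's actual retry policy. Waiting is injected as a callback so that the sketch stays synchronous.

```typescript
// Hypothetical result of one summary attempt.
type AttemptResult =
    | { ok: true }
    | { ok: false; retriable: boolean; retryAfterMs?: number };

// Retries retriable failures up to maxAttempts, waiting between attempts.
// `wait` is injected (e.g. a sleep function) so callers control how to delay.
function summarizeWithRetries(
    attempt: (attemptNumber: number) => AttemptResult,
    maxAttempts: number,
    wait: (ms: number) => void,
    defaultDelayMs: number = 1000, // illustrative default
): { succeeded: boolean; attempts: number } {
    for (let n = 1; n <= maxAttempts; n++) {
        const result = attempt(n);
        if (result.ok) {
            return { succeeded: true, attempts: n };
        }
        if (!result.retriable) {
            return { succeeded: false, attempts: n }; // permanent failure
        }
        if (n < maxAttempts) {
            // Honor a server-provided delay (e.g. throttling) when present.
            wait(result.retryAfterMs ?? defaultDelayMs);
        }
    }
    return { succeeded: false, attempts: maxAttempts };
}

// Example: throttled on the first attempt, generic transient failure on the
// second, success on the third.
const waits: number[] = [];
const outcome = summarizeWithRetries(
    (n) => {
        if (n === 1) return { ok: false, retriable: true, retryAfterMs: 50 };
        if (n === 2) return { ok: false, retriable: true };
        return { ok: true };
    },
    5,
    (ms) => {
        waits.push(ms);
    },
);
```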

## What does a summary look like?

The format of summaries (and snapshots) is described in [summary and snapshot formats](./summaryFormats.md).

[container-runtime]: ../../src/containerRuntime.ts
[summary-protocol]: /common/lib/protocol-definitions/src/summary.ts
