.. index:: ! text format, Unicode, UTF-8, S-expression, identifier, file extension, abstract syntax

Conventions
-----------

The textual format for WebAssembly :ref:`modules <module>` is a rendering of their :ref:`abstract syntax <syntax-module>` into |SExpressions|_.

Like the :ref:`binary format <binary>`, the text format is defined by an *attribute grammar*.
A text string is a well-formed description of a module if and only if it is generated by the grammar.
Each production of this grammar has at most one synthesized attribute: the abstract syntax that the respective character sequence expresses.
Thus, the attribute grammar implicitly defines a *parsing* function.
Some productions also take a :ref:`context <text-context>` as an inherited attribute
that records bound :ref:`identifiers <text-id>`.

With a few exceptions, the core of the text grammar closely mirrors the grammar of the abstract syntax.
However, it also defines a number of *abbreviations* that are "syntactic sugar" over the core syntax.

The recommended extension for source files containing WebAssembly modules in text format is ":math:`\T{.wat}`".
Files with this extension are assumed to be encoded in UTF-8, as per |Unicode|_ (Section 2.5).


.. index:: grammar notation, notation, Unicode
   single: text format; grammar
   pair: text format; notation
.. _text-grammar:

Grammar
~~~~~~~

The following conventions are adopted in defining grammar rules for the text format.
They mirror the conventions used for :ref:`abstract syntax <grammar>` and for the :ref:`binary format <binary>`.
In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.

* Terminal symbols are either literal strings of characters enclosed in quotes: :math:`\text{module}`;
  or expressed as |Unicode|_ code points: :math:`\unicode{0A}`.
  (All characters written literally are unambiguously drawn from the 7-bit |ASCII|_ subset of Unicode.)

* Nonterminal symbols are written in typewriter font: :math:`\T{valtype}, \T{instr}`.

* :math:`T^n` is a sequence of :math:`n\geq 0` iterations of :math:`T`.

* :math:`T^\ast` is a possibly empty sequence of iterations of :math:`T`.
  (This is a shorthand for :math:`T^n` used where :math:`n` is not relevant.)

* :math:`T^+` is a sequence of one or more iterations of :math:`T`.
  (This is a shorthand for :math:`T^n` where :math:`n \geq 1`.)

* :math:`T^?` is an optional occurrence of :math:`T`.
  (This is a shorthand for :math:`T^n` where :math:`n \leq 1`.)

* :math:`x{:}T` denotes the same language as the nonterminal :math:`T`, but also binds the variable :math:`x` to the attribute synthesized for :math:`T`.

* Productions are written :math:`\T{sym} ::= T_1 \Rightarrow A_1 ~|~ \dots ~|~ T_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\T{sym}` in the given case, usually from attribute variables bound in :math:`T_i`.

* Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.

.. _text-syntactic:

* A distinction is made between *lexical* and *syntactic* productions. For the latter, arbitrary :ref:`white space <text-space>` is allowed in any place where the grammar contains spaces. The productions defining :ref:`lexical syntax <text-lexical>` and the syntax of :ref:`values <text-value>` are considered lexical; all others are syntactic.
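
The iteration operators above can be read operationally as parser combinators.
The following Python sketch is illustrative only and not part of this specification: it assumes the input has already been split into a list of tokens, and the names ``lit``, ``star``, ``plus``, and ``opt`` are hypothetical.
Each parser maps a token list and a position to a pair of the synthesized attribute and the next position, or to ``None`` if the production does not apply; binding :math:`x{:}T` corresponds to unpacking that attribute.

.. code-block:: python

   from typing import Callable, List, Optional, Tuple

   # A parser maps (tokens, position) to (attribute, next position), or None on failure.
   Parser = Callable[[List[str], int], Optional[Tuple[object, int]]]


   def lit(s: str) -> Parser:
       """Terminal symbol: matches exactly one token equal to ``s``."""
       def parse(toks: List[str], i: int):
           if i < len(toks) and toks[i] == s:
               return s, i + 1
           return None
       return parse


   def star(t: Parser) -> Parser:
       """T^*: zero or more iterations; the attribute is the list of T's attributes."""
       def parse(toks: List[str], i: int):
           xs = []
           while (r := t(toks, i)) is not None:
               x, i = r  # x:T binds the attribute synthesized for T
               xs.append(x)
           return xs, i
       return parse


   def plus(t: Parser) -> Parser:
       """T^+: like T^*, but requires at least one iteration (n >= 1)."""
       def parse(toks: List[str], i: int):
           xs, j = star(t)(toks, i)
           return (xs, j) if xs else None
       return parse


   def opt(t: Parser) -> Parser:
       """T^?: at most one iteration (n <= 1); the attribute has length 0 or 1."""
       def parse(toks: List[str], i: int):
           r = t(toks, i)
           if r is None:
               return [], i
           x, j = r
           return [x], j
       return parse

For instance, ``star(lit("i32"))(["i32", "i32", "f64"], 0)`` returns ``(["i32", "i32"], 2)``.
Unlike the grammar, which is declarative, ``star`` is greedy and never backtracks, and it assumes that :math:`T` never matches the empty input, since otherwise the loop would not terminate.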

.. note::
   For example, the :ref:`textual grammar <text-valtype>` for :ref:`value types <syntax-valtype>` is given as follows:

   .. math::
     \begin{array}{llcll@{\qquad\qquad}l}
     \production{value types} & \Tvaltype &::=&
       \text{i32} &\Rightarrow& \I32 \\ &&|&
       \text{i64} &\Rightarrow& \I64 \\ &&|&
       \text{f32} &\Rightarrow& \F32 \\ &&|&
       \text{f64} &\Rightarrow& \F64 \\
     \end{array}

   The :ref:`textual grammar <text-limits>` for :ref:`limits <syntax-limits>` is defined as follows:

   .. math::
      \begin{array}{llclll}
      \production{limits} & \Tlimits &::=&
        n{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
        n{:}\Tu32~~m{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
      \end{array}

   The variables :math:`n` and :math:`m` name the attributes of the respective |Tu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` those parse into.
   The attribute of the complete production is then the abstract syntax for the limits, expressed in terms of these values.
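
   Read as a recursive-descent parser, each production becomes a function that returns its synthesized attribute.
   The following Python sketch of the two productions above is illustrative only: it assumes tokens are already split, represents abstract syntax with plain strings and dictionaries (``None`` standing for :math:`\epsilon`), and, as a simplification, accepts only plain decimal digits for |Tu32|, whereas the actual lexical syntax of integers is richer.
   All function names are hypothetical.

   .. code-block:: python

      from typing import Dict, List, Optional

      # valtype ::= 'i32' => I32 | 'i64' => I64 | 'f32' => F32 | 'f64' => F64
      VALTYPES = {"i32": "I32", "i64": "I64", "f32": "F32", "f64": "F64"}


      def parse_valtype(tok: str) -> str:
          if tok not in VALTYPES:
              raise SyntaxError(f"not a value type: {tok!r}")
          return VALTYPES[tok]


      def parse_u32(tok: str) -> int:
          # Simplification: plain decimal digits only (no hex, no '_' separators).
          if not tok or any(c not in "0123456789" for c in tok):
              raise SyntaxError(f"not a u32: {tok!r}")
          n = int(tok)
          if n >= 2**32:
              raise SyntaxError(f"u32 out of range: {tok!r}")
          return n


      def parse_limits(toks: List[str]) -> Dict[str, Optional[int]]:
          # limits ::= n:u32        => {min n, max eps}
          #          | n:u32 m:u32  => {min n, max m}
          if len(toks) == 1:
              return {"min": parse_u32(toks[0]), "max": None}
          if len(toks) == 2:
              n, m = parse_u32(toks[0]), parse_u32(toks[1])
              return {"min": n, "max": m}
          raise SyntaxError("limits expect one or two u32 tokens")

   No check that :math:`n \leq m` is made here, since that is a validation concern rather than a syntactic one.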


.. index:: ! abbreviations, rewrite rule
.. _text-abbreviations:

Abbreviations
~~~~~~~~~~~~~

In addition to the core grammar, which corresponds directly to the :ref:`abstract syntax <syntax>`, the textual syntax also defines a number of *abbreviations* that can be used for convenience and readability.

Abbreviations are defined by *rewrite rules* specifying their expansion into the core syntax:

.. math::
   \X{abbreviation~syntax} \quad\equiv\quad \X{expanded~syntax}

These expansions are assumed to be applied, recursively and in order of appearance, before applying the core grammar rules to construct the abstract syntax.
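
As an illustration, rewrite rules can be applied as a tree transformation over S-expressions before parsing proper.
The Python sketch below assumes S-expressions are represented as nested lists of token strings.
Its example rule, :math:`(\text{param}~\Tvaltype^\ast) \equiv (\text{param}~\Tvaltype)^\ast`, has the shape of the abbreviation the text format defines for anonymous parameter lists; the function names ``is_param_group`` and ``expand`` are hypothetical.

.. code-block:: python

   from typing import List, Union

   SExpr = Union[str, List["SExpr"]]


   def is_param_group(e: SExpr) -> bool:
       """True for (param t*) without an identifier, the only form this rule expands."""
       return (
           isinstance(e, list)
           and len(e) >= 1
           and e[0] == "param"
           and all(isinstance(a, str) and not a.startswith("$") for a in e[1:])
       )


   def expand(e: SExpr) -> SExpr:
       """Apply the rule (param t*) == (param t)* recursively, left to right."""
       if not isinstance(e, list):
           return e
       out: List[SExpr] = []
       for child in e:
           child = expand(child)  # expand nested abbreviations first
           if is_param_group(child):
               # Splice in one (param t) per value type; zero types yield nothing.
               out.extend(["param", t] for t in child[1:])
           else:
               out.append(child)
       return out

This sketch rewrites every matching list regardless of where it occurs; a real implementation would apply each rule only in the positions where the grammar allows it.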


.. index:: ! identifier context, identifier, index, index space
.. _text-context-wf:
.. _text-context:

Contexts
~~~~~~~~

The text format allows symbolic :ref:`identifiers <text-id>` to be used in place of :ref:`indices <syntax-index>`.
To resolve these identifiers into concrete indices,
some grammar productions are indexed by an *identifier context* :math:`I` as an inherited attribute that records the declared identifiers in each :ref:`index space <syntax-index>`.
In addition, the context records the types defined in the module, so that :ref:`parameter <text-param>` indices can be computed for :ref:`functions <text-func>`.

It is convenient to define identifier contexts as :ref:`records <notation-record>` :math:`I` with abstract syntax as follows:

.. math::
   \begin{array}{llll}
   \production{(identifier context)} & I &::=&
     \begin{array}[t]{l@{~}ll}
     \{ & \ITYPES & (\Tid^?)^\ast, \\
        & \IFUNCS & (\Tid^?)^\ast, \\
        & \ITABLES & (\Tid^?)^\ast, \\
        & \IMEMS & (\Tid^?)^\ast, \\
        & \IGLOBALS & (\Tid^?)^\ast, \\
        & \ILOCALS & (\Tid^?)^\ast, \\
        & \ILABELS & (\Tid^?)^\ast, \\
        & \ITYPEDEFS & \functype^\ast ~\} \\
     \end{array}
   \end{array}

For each index space, such a context contains the list of :ref:`identifiers <text-id>` assigned to the defined indices.
Unnamed indices are associated with empty (:math:`\epsilon`) entries in these lists.

An identifier context is *well-formed* if no index space contains duplicate identifiers.
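
One way to model identifier contexts in an implementation is sketched below in Python.
It is an illustration under stated assumptions, not part of this specification: ``None`` stands for an empty (:math:`\epsilon`) entry, the function types in ``typedefs`` are left abstract, and the class and method names are hypothetical.
A default-constructed ``IdContext()`` has all components empty.

.. code-block:: python

   from dataclasses import dataclass, field, fields
   from typing import List, Optional

   Id = Optional[str]  # None models an unnamed index (an empty entry)


   @dataclass
   class IdContext:
       types: List[Id] = field(default_factory=list)
       funcs: List[Id] = field(default_factory=list)
       tables: List[Id] = field(default_factory=list)
       mems: List[Id] = field(default_factory=list)
       globals: List[Id] = field(default_factory=list)
       locals: List[Id] = field(default_factory=list)
       labels: List[Id] = field(default_factory=list)
       typedefs: List[object] = field(default_factory=list)  # function types, left abstract

       def index_spaces(self) -> List[List[Id]]:
           """All identifier lists, i.e. every component except typedefs."""
           return [getattr(self, f.name) for f in fields(self) if f.name != "typedefs"]

       def well_formed(self) -> bool:
           """No single index space contains the same identifier twice."""
           for space in self.index_spaces():
               named = [x for x in space if x is not None]
               if len(named) != len(set(named)):
                   return False
           return True

       def lookup(self, space: str, ident: str) -> int:
           """Resolve an identifier to its index; unique if the context is well-formed."""
           entries: List[Id] = getattr(self, space)
           if ident not in entries:
               raise KeyError(f"unbound identifier {ident} in {space}")
           return entries.index(ident)

Duplicates are only forbidden within one index space, so the same identifier may, for example, name both a function and a global.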


Conventions
...........

To avoid unnecessary clutter, empty components are omitted when writing out identifier contexts.
For example, the record :math:`\{\}` is shorthand for an :ref:`identifier context <text-context>` whose components are all empty.


.. index:: vector
   pair: text format; vector
.. _text-vec:

Vectors
~~~~~~~

:ref:`Vectors <syntax-vec>` are written as plain sequences, but with a restriction on the length of these sequences.

.. math::
   \begin{array}{llclll@{\qquad\qquad}l}
   \production{vector} & \Tvec(\T{A}) &::=&
     (x{:}\T{A})^n &\Rightarrow& x^n & (\iff n < 2^{32}) \\
   \end{array}
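
The side condition simply makes the production inapplicable to over-long sequences.
A minimal Python sketch, assuming an item parser that maps a token list and a position to an (attribute, next position) pair or ``None``; the function ``vec`` and its ``limit`` parameter are hypothetical, with ``limit`` defaulting to :math:`2^{32}` so the bound can be exercised with small inputs.

.. code-block:: python

   from typing import Callable, List, Optional, Tuple

   Parser = Callable[[List[str], int], Optional[Tuple[object, int]]]


   def vec(item: Parser, limit: int = 2**32) -> Parser:
       """vec(A) ::= (x:A)^n => x^n, only if n < limit."""
       def parse(toks: List[str], i: int):
           xs = []
           while (r := item(toks, i)) is not None:
               x, i = r
               xs.append(x)
           if len(xs) >= limit:
               return None  # side condition n < 2^32 violated: production does not apply
           return xs, i
       return parse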
