Datt P2P Protocol
=================

The goal of this document is to design a minimalist p2p protocol for the datt
prototype. If it turns out there is another protocol that already exists that
solves our problem, it would be better to leverage that. However, such
protocols may either not exist, or may not solve the problem we have. This
document can serve as a reference for what a protocol would need to do to solve
our problem.

Note that this protocol is nothing like bittorrent, and is a lot more like
bitcoin. This does not describe a protocol intended for use with a DHT. It is
not assumed there is global consensus about what content is on the network.
Each node can and will have a different set of content, although with a lot of
overlap between nodes. In order to download all content, a user would want to
connect to multiple nodes.

First, let's discuss the 'high level' protocol, which will be layered on top of
'low level' protocols such as TCP, WebSockets, and WebRTC.

## Message overview

Basic messages:
- initiation message - send connection info
- init ack - send connection info
- get peers - get other peers' connection info
- send peers - send your own peers' connection info
- post data - contains data to be posted
- get data - contains hash of data to be received
- get datas - contains multiple hashes of data to be received
- get inv - get inventory of what data is available for download
- send inv - send an inventory of what data is available
- set filter - set a filter on data to be received
- request payment - contains payment amount and address
- send payment - probably contains actual bitcoin tx
- payment ack - just an ack of received payment

These messages are NOT final and subject to change, but just provide an
overview of what types of messages are probably needed.
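As a purely illustrative sketch, the message types above could be represented as a Python enum. The command strings are hypothetical spellings, chosen only so that each fits in the 12-character `cmd` field described under Message Structure.

```python
from enum import Enum


class Command(str, Enum):
    """Hypothetical wire names for the basic messages (not final)."""
    INIT = "init"
    INIT_ACK = "initack"
    GET_PEERS = "getpeers"
    SEND_PEERS = "sendpeers"
    POST_DATA = "postdata"
    GET_DATA = "getdata"
    GET_DATAS = "getdatas"
    GET_INV = "getinv"
    SEND_INV = "sendinv"
    SET_FILTER = "setfilter"
    REQUEST_PAYMENT = "reqpayment"
    SEND_PAYMENT = "sendpayment"
    PAYMENT_ACK = "paymentack"


# Each (hypothetical) name must fit in the 12-byte cmd field of the header.
assert all(len(c.value.encode("ascii")) <= 12 for c in Command)
```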

The way this might work is that you request data, but the node responds with a
"request payment" message, to which you must respond with a payment before the
data is sent. The "payment ack" may not be necessary, as you could simply start
receiving the data. The same goes for posting data. When you try to post, the
node responds with "request payment", to which you must respond with a payment
before continuing. This is similar to HTTP's 402 Payment Required status code.
We could consider just using HTTP as a messaging protocol, particularly HTTP/2,
which is binary and more efficient. However, that may be more complicated than
we need. We only need something very simple.
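The request, pay, receive exchange can be sketched as a toy serving node. Everything here is an assumption for illustration: the reply tuples, the `price_satoshis` field, the placeholder address, and the simplification that a peer stays paid up after one payment. Payment validation is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class ServingNode:
    """Toy model of the "request payment" flow (similar in spirit to HTTP 402).

    Field names and reply shapes are hypothetical, not part of any spec.
    """
    store: dict[str, bytes]
    price_satoshis: int = 0
    pay_address: str = "hypothetical-address"
    paid_peers: set[str] = field(default_factory=set)

    def handle_get_data(self, peer: str, content_hash: str):
        if self.price_satoshis > 0 and peer not in self.paid_peers:
            # Ask for payment first instead of sending the data.
            return ("request payment", self.price_satoshis, self.pay_address)
        return ("data", self.store[content_hash])

    def handle_send_payment(self, peer: str, payment_tx: bytes):
        # A real node would check payment_tx against the requested amount
        # before trusting it; this sketch accepts any payment.
        self.paid_peers.add(peer)
        return ("payment ack",)
```

A free node (`price_satoshis=0`) answers "get data" immediately, which matches the note later in this document that most content may be free.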

### Some example behaviors

#### Connecting

A node must have at least one other node's connection information to get
started. The node sends the initiation message and then receives the init ack.
The nodes are now connected and can send other messages to and from each other.

The node may now request a list of other nodes to connect to with the get peers
command. A list of peers will be sent back with a send peers command. The node
may now try to connect to those other nodes.
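The handshake and peer discovery steps can be simulated in memory. The method names below mirror the message names but are hypothetical; real nodes would exchange these messages over one of the underlying transports rather than calling each other directly.

```python
class Node:
    """In-memory sketch of connecting (init / init ack) and peer discovery."""

    def __init__(self, name: str):
        self.name = name
        self.peers: dict[str, "Node"] = {}

    def on_init(self, other: "Node") -> str:
        # Receiving an initiation message: remember the peer and acknowledge.
        self.peers[other.name] = other
        return "init ack"

    def connect(self, other: "Node") -> bool:
        if other.name == self.name or other.name in self.peers:
            return False  # never connect to ourselves or twice to one peer
        if other.on_init(self) == "init ack":
            self.peers[other.name] = other
            return True
        return False

    def on_get_peers(self) -> list["Node"]:
        # "send peers": share the peers we currently know about.
        return list(self.peers.values())

    def discover(self) -> None:
        # Ask each current peer for its peers, then try to connect to them.
        for peer in list(self.peers.values()):
            for candidate in peer.on_get_peers():
                self.connect(candidate)
```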

Note that because we must support three different protocols (TCP, WebSockets,
and WebRTC), the peer connection information will need to include whatever
information is necessary to connect to that node. In other words, you will need
more than just an IP address and port: you also need to know which protocol to
use and perhaps which rendezvous server to use.
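One possible shape for this connection info is sketched below. The field names and the rule that WebRTC peers must name a rendezvous (signaling) server are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Transport(Enum):
    TCP = "tcp"
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical entry in a "send peers" message."""
    transport: Transport
    host: str
    port: int
    rendezvous: Optional[str] = None  # assumed: only needed for WebRTC

    def __post_init__(self):
        if not 0 < self.port < 65536:
            raise ValueError("port out of range")
        if self.transport is Transport.WEBRTC and not self.rendezvous:
            raise ValueError("WebRTC peers need a rendezvous server")
```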

#### Posting a new top-level comment

The user, who we assume is running a node, constructs the comment (probably
in the form of JSON data), signs it, and then "posts" the data to all the nodes
they are connected to. In order to prevent nodes from rebroadcasting old data
that was generated by other users and thus spamming the network, the user's
comment needs to contain the latest block hash. That is included when the user
signs the comment, proving the signature is new. If a node tries to rebroadcast
unsolicited old data, the signatures will be "stale", i.e. they signed an old
block, and nodes will know not to rebroadcast this data.
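A minimal sketch of the freshness rule, assuming comments are JSON with sorted keys (one possible canonical encoding) and that "stale" means the signed block hash is not among the last few blocks. The six-block window, the field names, and both function names are hypothetical, and the signing step itself is omitted.

```python
import json


def build_comment(author: str, text: str, latest_block_hash: str) -> bytes:
    """Serialize a comment that commits to the latest block hash.

    The returned bytes are what the user would sign (signing not shown).
    """
    body = {"author": author, "text": text, "block": latest_block_hash}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def is_fresh(comment: bytes, recent_block_hashes: list[str], max_age: int = 6) -> bool:
    """True if the comment signed one of the newest `max_age` blocks.

    recent_block_hashes is ordered newest first; max_age is an arbitrary
    illustrative threshold, not a value defined by this protocol.
    """
    block = json.loads(comment)["block"]
    return block in recent_block_hashes[:max_age]
```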

It is possible that when the node tries to post a comment, the receiving node
will respond with a request for payment. In that case, the node will need to
send a payment as the next message. Then the data will be transmitted.

#### Posting a new comment response

The only difference between a top-level comment and a response comment is that
the response needs to contain a "link", i.e. the hash, of the comment it is
responding to. Each piece of content may be identified by its hash. The hashed
content should include the user's name, the block hash, and possibly also a
nonce, so that if two users post the same message (e.g., "hello"), they are
interpreted as different comments.
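A sketch of content identifiers, assuming the hash is SHA-256 over a sorted-key JSON encoding and that a reply stores its parent's hash in a `parent` field. The encoding, field names, and 8-byte random nonce are all assumptions.

```python
import hashlib
import json
import os
from typing import Optional


def make_comment(author: str, text: str, block_hash: str,
                 parent: Optional[str] = None) -> dict:
    # parent is the hash of the comment being replied to (None = top-level).
    # A random nonce keeps identical posts from hashing to the same id.
    return {"author": author, "text": text, "block": block_hash,
            "parent": parent, "nonce": os.urandom(8).hex()}


def content_id(content: dict) -> str:
    # Canonical encoding first, so equal content always gets the same hash.
    encoded = json.dumps(content, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()
```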

Nodes that receive a posted message that is new will rebroadcast this message
to their peers, i.e. they will issue a new post command to the other peers
containing the same data and its original signature.
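The "rebroadcast only what is new" rule can be sketched with a seen-set keyed by the SHA-256 of each message. The `send` callback is a hypothetical stand-in for issuing a post command to a peer; signature and freshness checks would run before relaying.

```python
import hashlib
from typing import Callable


class Relay:
    """Relays each distinct message once to every peer except its sender."""

    def __init__(self, peers: list[str], send: Callable[[str, bytes], None]):
        self.peers = peers
        self.send = send  # hypothetical transport callback
        self.seen: set[str] = set()

    def on_post(self, from_peer: str, data: bytes) -> bool:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.seen:
            return False  # already relayed; do not echo it again
        self.seen.add(digest)
        for peer in self.peers:
            if peer != from_peer:
                # Forward the same bytes, original signature included.
                self.send(peer, data)
        return True
```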

#### Upvoting a comment

Upvoting requires paying an amount to a user via the bitcoin p2p network, then
sending proof-of-payment, either in the form of a transaction, or transaction
hash, or a lightning-network based proof, so that other nodes can validate that
the payment actually occurred. Upvotes occur using the same "post data" message
as posting a new comment, except the data is the payment proof, and it must
refer to the comment by including the hash of the comment being paid.
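An upvote payload might look like the sketch below. The `proof_kind` labels correspond to the three proof forms named above, but the field names and the well-formedness check are assumptions; confirming that the payment actually happened still requires querying the bitcoin or lightning network, which is not shown.

```python
from dataclasses import dataclass

# Hypothetical labels for the three proof forms: full tx, txid, lightning.
PROOF_KINDS = {"tx", "txid", "lightning"}


@dataclass(frozen=True)
class Upvote:
    """Hypothetical "post data" payload for an upvote."""
    comment_hash: str  # links the vote to the comment being paid
    proof_kind: str
    proof: bytes

    def is_well_formed(self, known_comments: set[str]) -> bool:
        # Structural checks only; payment verification is out of scope.
        return (self.proof_kind in PROOF_KINDS
                and self.comment_hash in known_comments
                and len(self.proof) > 0)
```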

#### Moderation actions

Downvotes and other actions work similarly to upvotes. A special piece of data
is posted to the network containing the action performed along with any
necessary proof. For instance, a mod action to delete a comment needs to be
signed by the mod. Other nodes can verify that the mod did in fact remove the
comment.
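Checking a mod action reduces to two questions: is the signer a moderator, and is the signature valid? In the sketch below the signed-message encoding and the moderator set are assumptions (this document does not say how moderators are designated), and `verify` is injected because secp256k1 ECDSA is not in Python's standard library.

```python
from dataclasses import dataclass
from typing import Callable

# (pubkey, sig, message) -> valid?  Supplied by a real secp256k1 library.
Verify = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class ModAction:
    action: str        # e.g. "delete"; action names are hypothetical
    target_hash: str   # hash of the comment being acted on
    mod_pubkey: bytes
    sig: bytes

    def message(self) -> bytes:
        # Bytes the moderator signed; this encoding is an assumption.
        return f"{self.action}:{self.target_hash}".encode()


def accept_mod_action(act: ModAction, moderators: set[bytes], verify: Verify) -> bool:
    """Accept only actions signed by a known moderator key."""
    return act.mod_pubkey in moderators and verify(act.mod_pubkey, act.sig, act.message())
```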

#### Getting new comments

A node can request an inventory to see what new comments are available. The
sending node sends the "get inv" command, and the responding node sends an
inventory of recent data, which can then be requested for download by the node.
The responding node does not send an inventory of all data, because that would
potentially be too large. Instead, the responding node will only send recent
data. The data can be paginated so that old data can be sent in a separate request.

A good way to paginate data in a decentralized way is by block hash. In order
to get the "second page", which may change a moment after the first page is
received, simply request data that occurred at a particular block hash. That
way you are guaranteed not to miss some data that has slipped to the third page
in the meantime. Since all signed data also signs the block hash when it was
created, a node can be sure that old data isn't being sent.
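Block-hash pagination can be sketched as follows, assuming items are grouped by the block hash they signed and that a page spans a fixed number of blocks (two here, an arbitrary choice). The function and parameter names are hypothetical. Because the cursor is a block hash rather than an offset, a new block arriving between requests does not shift the second page.

```python
from typing import Optional


def get_inv(chain: list[str], by_block: dict[str, list[str]],
            from_block: Optional[str] = None,
            blocks_per_page: int = 2) -> tuple[list[str], Optional[str]]:
    """Return one page of content hashes plus the cursor for the next page.

    chain: block hashes, newest first.
    by_block: content hashes grouped by the block hash each item signed.
    from_block: cursor returned by the previous call (None = newest page).
    """
    # Locate the cursor by hash, not position, so prepended blocks are harmless.
    start = 0 if from_block is None else chain.index(from_block)
    end = start + blocks_per_page
    items = [h for block in chain[start:end] for h in by_block.get(block, [])]
    next_cursor = chain[end] if end < len(chain) else None
    return items, next_cursor
```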

#### Getting new comments in a particular subcommunity

Comments can be labeled, and a node can set a filter on another node to only
receive data matching the filter. An easy way to do this is for each comment to
have a label. A node can set a filter to only receive data with a particular
label. That ensures the node does not receive data for every subcommunity, most
of which it is not interested in; only the data with that label will be received.

The filter should probably work both on inv and data. In other words, once you
set a filter, all "get inv" commands respond with an inventory matching the
filter. Subsequent get data commands also apply the same filter.
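A per-connection filter that applies to both "get inv" and "get data" might look like this. The `label` key and the plain set of labels are assumptions; bloom filters are another option.

```python
from typing import Iterable, Optional


class FilteredPeer:
    """Serving side of one connection, with an optional label filter.

    store maps content hash -> comment dict carrying a hypothetical "label".
    """

    def __init__(self, store: dict[str, dict]):
        self.store = store
        self.labels: Optional[set[str]] = None  # None: no filter set yet

    def set_filter(self, labels: Iterable[str]) -> None:
        self.labels = set(labels)

    def _matches(self, comment: dict) -> bool:
        return self.labels is None or comment.get("label") in self.labels

    def get_inv(self) -> list[str]:
        return [h for h, c in self.store.items() if self._matches(c)]

    def get_data(self, content_hash: str) -> Optional[dict]:
        comment = self.store.get(content_hash)
        if comment is None or not self._matches(comment):
            return None  # filtered out (or unknown): nothing is sent
        return comment
```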

The filters need to be designed to be easy to match so it's not a burden on the
responding node. Bitcoin handles this by using bloom filters. Those filters may
be appropriate in some cases - for instance, if you want to receive data from
multiple subcommunities, simply combine them into the same bloom filter. There
may be other types of filters we want to use.
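A minimal Bloom filter over subcommunity labels is sketched below. Note this is NOT Bitcoin's BIP 37 filter, which uses MurmurHash3; here the k bit positions are derived from SHA-256 for simplicity, and the default size and hash count are illustrative.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item.encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Combining several subcommunities is just calling `add` once per label on the same filter, which is the property the paragraph above relies on.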

#### Note on payments

Note that there are two subtly different ways to pay with this p2p protocol. A
node may pay to download content, or a user may pay another user for the
quality of their comment. The difference is that nodes are connected to each
other, and users are not. When a node pays another node, they can send the
payment directly (in addition to sending it over the bitcoin p2p network or
lightning network), but when a user pays another user, they are not connected,
and thus cannot send the payment directly. Instead, the user receiving the
payment simply sees it show up on the blockchain or on a payment channel.

Also note that payments are not necessarily required for downloading or posting
content. Most or all content may be free to download or post. But the protocol
includes support for payments in case nodes desire to charge for content, which
may be useful in cases where a node requests or posts an enormous amount of
data.

#### Posting data to the blockchain

An alternative way for users to advertise their content and to query recent
content is to post the hash of the content to the bitcoin blockchain. Any
content hash posted in an OP_RETURN cannot be censored. Users may wish to pay a
transaction fee to ensure that the hash of their content exists forever. Users
who wish to query data posted in this manner may look it up on the
blockchain. Once a user finds a hash on the blockchain, they can query nodes on
the datt p2p network or potentially alternative networks such as BitTorrent or
IPFS to find that data.
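The output script for such a commitment is short. The sketch assumes the committed value is the SHA-256 of the content (this document does not fix the hash function). A 32-byte hash fits in a single direct push (opcodes 0x01 to 0x4b push that many bytes), so the script is OP_RETURN (0x6a), 0x20, then the hash. Building and funding the transaction is out of scope.

```python
import hashlib

OP_RETURN = 0x6A


def op_return_script(content: bytes) -> bytes:
    """Output script committing to sha256(content) (hash choice assumed)."""
    digest = hashlib.sha256(content).digest()
    # len(digest) == 32, which doubles as the direct-push opcode 0x20.
    return bytes([OP_RETURN, len(digest)]) + digest
```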

## Message details

### Message Structure

[version][cmd][datalen][checksum][data]

- version: uint32, defaults to 0. The value should probably be ignored, but
higher values can signal to newer nodes to process data differently.

- cmd: 12 chars of command characters, similar to bitcoin. Command strings are
padded with 0 bytes to fill the 12 chars.

- datalen: uint32, length of data to follow.

- checksum: First 4 bytes of sha256 hash of everything that follows (similar to
bitcoin).

- data: Binary data of length datalen. Interpretation of data depends on the
cmd.
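The envelope can be encoded and decoded with Python's `struct` module. Byte order is not specified above, so little-endian (as in bitcoin) is assumed; the checksum follows the text (first 4 bytes of a single SHA-256 of the data), whereas bitcoin itself uses double SHA-256 here.

```python
import hashlib
import struct

# version (uint32), cmd (12 bytes), datalen (uint32), checksum (4 bytes).
HEADER = struct.Struct("<I12sI4s")  # little-endian assumed


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:4]


def encode_message(cmd: str, data: bytes, version: int = 0) -> bytes:
    raw_cmd = cmd.encode("ascii")
    if len(raw_cmd) > 12:
        raise ValueError("command longer than 12 bytes")
    # struct's "12s" pads shorter byte strings with zero bytes.
    return HEADER.pack(version, raw_cmd, len(data), checksum(data)) + data


def decode_message(buf: bytes) -> tuple[int, str, bytes]:
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    version, raw_cmd, datalen, check = HEADER.unpack_from(buf)
    data = buf[HEADER.size:HEADER.size + datalen]
    if len(data) != datalen:
        raise ValueError("truncated data")
    if checksum(data) != check:
        raise ValueError("bad checksum")
    return version, raw_cmd.rstrip(b"\x00").decode("ascii"), data
```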

### Signed Content

[pubkey][sig][datalen][data]

- pubkey: secp256k1 compressed pubkey (SEC1 encoding), same as bitcoin, always 33 bytes.

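- sig: "Compact" ECDSA signature of data; NOT the DER-encoded format used in the
bitcoin blockchain, but the compact format used by bitcoin's "Bitcoin Signed
Message" signatures (a more compact and easier to parse version of a
signature). Always 64 bytes: Bitcoin Signed Message signatures also prepend a
one-byte recovery flag, which is unnecessary here because the pubkey is sent
explicitly.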

- datalen: uint32, length of data to follow.

- data: Arbitrary data of length datalen.
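The signed-content layout can be packed and unpacked as below. Producing and verifying the secp256k1 signature needs an external library and is not shown; datalen is assumed to be a little-endian uint32, as in bitcoin.

```python
import struct

PUBKEY_LEN = 33
SIG_LEN = 64


def pack_signed(pubkey: bytes, sig: bytes, data: bytes) -> bytes:
    """Lay out [pubkey][sig][datalen][data]."""
    # Compressed SEC1 keys are 33 bytes and start with 0x02 or 0x03.
    if len(pubkey) != PUBKEY_LEN or pubkey[0] not in (2, 3):
        raise ValueError("expected a 33-byte compressed pubkey")
    if len(sig) != SIG_LEN:
        raise ValueError("expected a 64-byte compact signature")
    return pubkey + sig + struct.pack("<I", len(data)) + data


def unpack_signed(buf: bytes) -> tuple[bytes, bytes, bytes]:
    pubkey = buf[:PUBKEY_LEN]
    sig = buf[PUBKEY_LEN:PUBKEY_LEN + SIG_LEN]
    (datalen,) = struct.unpack_from("<I", buf, PUBKEY_LEN + SIG_LEN)
    start = PUBKEY_LEN + SIG_LEN + 4
    data = buf[start:start + datalen]
    if len(data) != datalen:
        raise ValueError("truncated data")
    return pubkey, sig, data
```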

### ECIES encrypted content

[mac][pubkey_receiver][pubkey_sender][sig][datalen][data]

- mac: HMAC of the data, keyed with the shared secret

- pubkey_receiver: compressed public key of the receiver, always 33 bytes. The
receiver doesn't have to be a person; it may be the shared public key of a
channel whose private key anyone knows how to derive, e.g. the private key is
the hash of the name "science" for the global science subcommunity.

- pubkey_sender: the 33 bytes compressed public key of the sender

- sig: 64 byte compact ECDSA signature of data

- datalen: uint32, length of data to follow

- data: arbitrary data
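Two pieces of this format need only the standard library: deriving a channel's private key from its name and computing the MAC. The sketch assumes a single SHA-256 of the name, HMAC-SHA256 (giving a 32-byte mac), and little-endian datalen. ECDH, encryption, and signing require a secp256k1 library and are not shown; `shared_secret` and the payload are simply taken as inputs. Since anyone can derive a channel key, such messages are scoped to a channel rather than kept secret.

```python
import hashlib
import hmac
import struct

# Order of the secp256k1 group; valid private keys lie in [1, n - 1].
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def channel_privkey(name: str) -> int:
    """Private key of a public channel, e.g. "science" (single SHA-256 assumed)."""
    k = int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")
    if not 0 < k < SECP256K1_N:
        raise ValueError("hash is not a valid secp256k1 private key")
    return k


def pack_ecies(shared_secret: bytes, pubkey_receiver: bytes, pubkey_sender: bytes,
               sig: bytes, data: bytes) -> bytes:
    """Lay out [mac][pubkey_receiver][pubkey_sender][sig][datalen][data]."""
    for pk in (pubkey_receiver, pubkey_sender):
        if len(pk) != 33:
            raise ValueError("pubkeys must be 33-byte compressed keys")
    if len(sig) != 64:
        raise ValueError("sig must be 64 bytes")
    # HMAC-SHA256 of the data, keyed with the shared secret (assumed hash).
    mac = hmac.new(shared_secret, data, hashlib.sha256).digest()
    return (mac + pubkey_receiver + pubkey_sender + sig
            + struct.pack("<I", len(data)) + data)
```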

### Tradle's work on p2p protocol

genevayngrib 8:19 AM @ryanxcharles: agree, p2p work is done by so many people!
Every multisig wallet needs it, other group sigs need it, lightning net needs
it! It is insane that we do not have a standard yet.

At Tradle we have spent a huge amount of time designing a p2p protocol
associated with the blockchain and producing the associated OSS code. The basic
ideas are:

- all actions are associated with the identity of your choice. An identity can
  be totally fake or verified by the government (as required on Airbnb, for
  example). You can create as many of them as you like, and all identities,
  their keys, and key restore/revocation are anchored on the blockchain, so the
  chain is your key server.

- p2p messages can be ephemeral or permanent. Permanent messages can be
  anchored to the public blockchain or not (i.e., anchored to local/federated
  chains, but for now we just use simpler logs).

- p2p messages have an open-ended structure (JSON for now), and types of
  messages (models) are loaded freely from GitHub or other web sites. These
  messages can be new community, new post, upvote/downvote, moderator actions,
  etc., or could be verification of identity or anything else needed for the
  web of trust (or, as keybase.io calls them, tracked identities).

- discovery of peers and topics should be 100% decentralized (we use a
  combination of identities on chain for finding verified pub keys and the
  BitTorrent DHT for finding the IP/port of peers).

- peer discovery leads to NAT traversal and UDP-based rUDP or uTP for reliable
  delivery of JSON over this line. It could support WebRTC too, of course, but
  does not yet.
