123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427 |
- <html>
- <title>
- -
- </title>
- <body BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#330088" ALINK="#FF0044">
- <H1>The IL protocol
- </H1>
- <DL><DD><I>Dave Presotto<br>
- Phil Winterbottom<br>
- <br> <br>
- presotto,philw@plan9.bell-labs.com<br>
- </I></DL>
- <DL><DD><H4>ABSTRACT</H4>
- To transport the remote procedure call messages of the Plan 9 file system
- protocol 9P, we have implemented a new network protocol, called IL.
- It is a connection-based, lightweight transport protocol that carries
- datagrams encapsulated by IP.
- IL provides retransmission of lost messages and in-sequence delivery, but has
- no flow control and no blind retransmission.
- </DL>
- <H4>Introduction
- </H4>
- <P>
- Plan 9 uses a file system protocol, called 9P [PPTTW93], that assumes
- in-sequence guaranteed delivery of delimited messages
- holding remote procedure call
- (RPC) requests and responses.
- None of the standard IP protocols [RFC791] is suitable for transmission of
- 9P messages over an Ethernet or the Internet.
- TCP [RFC793] has a high overhead and does not preserve delimiters.
- UDP [RFC768], while cheap and preserving message delimiters, does not provide
- reliable sequenced delivery.
- When we were implementing IP, TCP, and UDP in our system we
- tried to choose a protocol suitable for carrying 9P.
- The properties we desired were:
- </P>
- <DL COMPACT>
- <DT>*<DD>
- Reliable datagram service
- <DT>*<DD>
- In-sequence delivery
- <DT>*<DD>
- Internetworking using IP
- <DT>*<DD>
- Low complexity, high performance
- <DT>*<DD>
- Adaptive timeouts
- </dl>
- <br> <br>
- No standard protocol met our needs so we designed a new one,
- called IL (Internet Link).
- <P>
- IL is a lightweight protocol encapsulated by IP.
- It is connection-based and
- provides reliable transmission of sequenced messages.
- No provision is made for flow control since the protocol
- is designed to transport RPC
- messages between client and server, a structure with inherent flow limitations.
- A small window for outstanding messages prevents too
- many incoming messages from being buffered;
- messages outside the window are discarded
- and must be retransmitted.
- Connection setup uses a two-way handshake to generate
- initial sequence numbers at each end of the connection;
- subsequent data messages increment the
- sequence numbers to allow
- the receiver to resequence out of order messages.
- In contrast to other protocols, IL avoids blind retransmission.
- This helps performance in congested networks,
- where blind retransmission could cause further
- congestion.
- Like TCP, IL has adaptive timeouts,
- so the protocol performs well both on the
- Internet and on local Ethernets.
- A round-trip timer is used
- to calculate acknowledge and retransmission times
- that match the network speed.
- </P>
- <H4>Connections
- </H4>
- <P>
- An IL connection carries a stream of data between two end points.
- While the connection persists,
- data entering one side is sent to the other side in the same sequence.
- The functioning of a connection is described by the state machine in Figure 1,
- which shows the states (circles) and transitions between them (arcs).
- Each transition is labeled with the list of events that can cause
- the transition and, separated by a horizontal line,
- the messages sent or received on that transition.
- The remainder of this paper is a discussion of this state machine.
- <DL><DT><DD><TT><PRE>
- <br><img src="-.15070.gif"><br>
- <DL><DD>
- </PRE></TT></DL>
- </P>
- <DL COMPACT>
- <DT><I>ackok</I><DD>
- any sequence number between id0 and next inclusive
- <DT><I>!x</I><DD>
- any value except x
- <DT>-<DD>
- any value
- </DL>
- <br> <br>
- <I>Figure 1 - IL State Transitions</I>
- </dl>
- <P>
- The IL state machine has five states:
- <I>Closed</I>,
- <I>Syncer</I>,
- <I>Syncee</I>,
- <I>Established</I>,
- and
- <I>Closing</I>.
- The connection is identified by the IP address and port number used at each end.
- The addresses ride in the IP protocol header, while the ports are part of the
- 18-byte IL header.
- The local variables identifying the state of a connection are:
- <DL><DD>
- </P>
- <DL COMPACT>
- <DT>state<DD>
- one of the states
- <DT>laddr<DD>
- 32-bit local IP address
- <DT>lport<DD>
- 16-bit local IL port
- <DT>raddr<DD>
- 32-bit remote IP address
- <DT>rport<DD>
- 16-bit remote IL port
- <DT>id0<DD>
- 32-bit starting sequence number of the local side
- <DT>rid0<DD>
- 32-bit starting sequence number of the remote side
- <DT>next<DD>
- sequence number of the next message to be sent from the local side
- <DT>rcvd<DD>
- the last in-sequence message received from the remote side
- <DT>unacked<DD>
- sequence number of the first unacked message
- </DL>
- </dl>
- <P>
- Unused connections are in the
- <I>Closed</I>
- state with no assigned addresses or ports.
- Two events open a connection: the reception of
- a message whose addresses and ports match no open connection
- or a user explicitly opening a connection.
- In the first case, the message's source address and port become the
- connection's remote address and port and the message's destination address
- and port become the local address and port.
- The connection state is set to
- <I>Syncee</I>
- and the message is processed.
- In the second case, the user specifies both local and remote addresses and ports.
- The connection's state is set to
- <I>Syncer</I>
- and a
- <TT>sync</TT>
- message is sent to the remote side.
- The legal values for the local address are constrained by the IP implementation.
- </P>
- <H4>Sequence Numbers
- </H4>
- <P>
- IL carries data messages.
- Each message corresponds to a single write from
- the operating system and is identified by a 32-bit
- sequence number.
- The starting sequence number for each direction in a
- connection is picked at random and transmitted in the initial
- <TT>sync</TT>
- message.
- The number is incremented for each subsequent data message.
- A retransmitted message contains its original sequence number.
- </P>
- <H4>Transmission/Retransmission
- </H4>
- <P>
- Each message contains two sequence numbers:
- an identifier (ID) and an acknowledgement.
- The acknowledgement is the last in-sequence
- data message received by the transmitter of the message.
- For
- <TT>data</TT>
- and
- <TT>dataquery</TT>
- messages, the ID is its sequence number.
- For the control messages
- <TT>sync</TT>,
- <TT>ack</TT>,
- <TT>query</TT>,
- <TT>state</TT>,
- and
- <TT>close</TT>,
- the ID is one greater than the sequence number of
- the highest sent data message.
- </P>
- <P>
- The sender transmits data messages with type
- <TT>data</TT>.
- Any messages traveling in the opposite direction carry acknowledgements.
- An
- <TT>ack</TT>
- message will be sent within 200 milliseconds of receiving the data message
- unless a returning message has already piggy-backed an
- acknowledgement to the sender.
- </P>
- <P>
- In IP, messages may be delivered out of order or
- may be lost due to congestion or faults.
- To overcome this,
- IL uses a modified ``go back n'' protocol that also attempts
- to avoid aggravating network congestion.
- An average round trip time is maintained by measuring the delay between
- the transmission of a message and the
- receipt of its acknowledgement.
- Until the first acknowledge is received, the average round trip time
- is assumed to be 100ms.
- If an acknowledgement is not received within four round trip times
- of the first unacknowledged message
- (<I>rexmit timeout</I>
- in Figure 1), IL assumes the message or the acknowledgement
- has been lost.
- The sender then resends only the first unacknowledged message,
- setting the type to
- <TT>dataquery</TT>.
- When the receiver receives a
- <TT>dataquery</TT>,
- it responds with a
- <TT>state</TT>
- message acknowledging the highest received in-sequence data message.
- This may be the retransmitted message or, if the receiver has been
- saving up out-of-sequence messages, some higher numbered message.
- Implementations of the receiver are free to choose whether to save out-of-sequence messages.
- Our implementation saves up to 10 packets ahead.
- When the sender receives the
- <TT>state</TT>
- message, it will immediately resend the next unacknowledged message
- with type
- <TT>dataquery</TT>.
- This continues until all messages are acknowledged.
- </P>
- <P>
- If no acknowledgement is received after the first
- <TT>dataquery</TT>,
- the transmitter continues to timeout and resend the
- <TT>dataquery</TT>
- message.
- The intervals between retransmissions increase exponentially.
- After 300 times the round trip time
- (<I>death timeout</I>
- in Figure 1), the sender gives up and
- assumes the connection is dead.
- </P>
- <P>
- Retransmission also occurs in the states
- <I>Syncer</I>,
- <I>Syncee</I>,
- and
- <I>Close</I>.
- The retransmission intervals are the same as for data messages.
- </P>
- <H4>Keep Alive
- </H4>
- <P>
- Connections to dead systems must be discovered and torn down
- lest they consume resources.
- If the surviving system does not need to send any data and
- all data it has sent has been acknowledged, the protocol
- described so far will not discover these connections.
- Therefore, in the
- <I>Established</I>
- state, if no other messages are sent for a 6 second period,
- a
- <TT>query</TT>
- is sent.
- The receiver always replies to a
- <TT>query</TT>
- with a
- <TT>state</TT>
- message.
- If no messages are received for 30 seconds, the
- connection is torn down.
- This is not shown in Figure 1.
- </P>
- <H4>Byte Ordering
- </H4>
- <P>
- All 32- and 16-bit quantities are transmitted high-order byte first, as
- is the custom in IP.
- </P>
- <H4>Formats
- </H4>
- <P>
- The following is a C language description of an IP+IL
- header, assuming no IP options:
- <DL><DT><DD><TT><PRE>
- typedef unsigned char byte;
- struct IPIL
- {
- byte vihl; /* Version and header length */
- byte tos; /* Type of service */
- byte length[2]; /* packet length */
- byte id[2]; /* Identification */
- byte frag[2]; /* Fragment information */
- byte ttl; /* Time to live */
- byte proto; /* Protocol */
- byte cksum[2]; /* Header checksum */
- byte src[4]; /* Ip source */
- byte dst[4]; /* Ip destination */
- byte ilsum[2]; /* Checksum including header */
- byte illen[2]; /* Packet length */
- byte iltype; /* Packet type */
- byte ilspec; /* Special */
- byte ilsrc[2]; /* Src port */
- byte ildst[2]; /* Dst port */
- byte ilid[4]; /* Sequence id */
- byte ilack[4]; /* Acked sequence */
- };
- </PRE></TT></DL>
- </P>
- <br> <br>
- Data is assumed to immediately follow the header in the message.
- <TT>Ilspec</TT>
- is an extension reserved for future protocol changes.
- <P>
- The checksum is calculated with
- <TT>ilsum</TT>
- and
- <TT>ilspec</TT>
- set to zero.
- It is the standard IP checksum, that is, the 16-bit one's complement of the one's
- complement sum of all 16 bit words in the header and text. If a
- message contains an odd number of header and text bytes to be
- checksummed, the last byte is padded on the right with zeros to
- form a 16-bit word for the checksum.
- The checksum covers from
- <TT>cksum</TT>
- to the end of the data.
- </P>
- <P>
- The possible
- <I>iltype</I>
- values are:
- <DL><DT><DD><TT><PRE>
- enum {
- sync= 0,
- data= 1,
- dataquery= 2,
- ack= 3,
- query= 4,
- state= 5,
- close= 6,
- };
- </PRE></TT></DL>
- </P>
- <br> <br>
- The
- <TT>illen</TT>
- field is the size in bytes of the IL header (18 bytes) plus the size of the data.
- <H4>Numbers
- </H4>
- <P>
- The IP protocol number for IL is 40.
- </P>
- <P>
- The assigned IL port numbers are:
- <DL><DD>
- </P>
- <DL COMPACT>
- <DT>7<DD>
- echo all input to output
- <DT>9<DD>
- discard input
- <DT>19<DD>
- send a standard pattern to output
- <DT>565<DD>
- send IP addresses of caller and callee to output
- <DT>566<DD>
- Plan 9 authentication protocol
- <DT>17005<DD>
- Plan 9 CPU service, data
- <DT>17006<DD>
- Plan 9 CPU service, notes
- <DT>17007<DD>
- Plan 9 exported file systems
- <DT>17008<DD>
- Plan 9 file service
- <DT>17009<DD>
- Plan 9 remote execution
- <DT>17030<DD>
- Alef Name Server
- </DL>
- </dl>
- <H4>References
- </H4>
- <br> <br>
- [PPTTW93] Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom,
- ``The Use of Name Spaces in Plan 9'',
- <I>Op. Sys. Rev.,</I>
- Vol. 27, No. 2, April 1993, pp. 72-76,
- reprinted in this volume.
- <br>
- [RFC791] RFC791,
- <I>Internet Protocol,</I>
- <I>DARPA Internet Program Protocol Specification,</I>
- September 1981.
- <br>
- [RFC793] RFC793,
- <I>Transmission Control Protocol,</I>
- <I>DARPA Internet Program Protocol Specification,</I>
- September 1981.
- <br>
- [RFC768] J. Postel, RFC768,
- <I>User Datagram Protocol,</I>
- <I>DARPA Internet Program Protocol Specification,</I>
- August 1980.
- <br> <br>
- <A href=http://www.lucent.com/copyright.html>
- Copyright</A> © 2000 Lucent Technologies Inc. All rights reserved.
- </body></html>
|