.HTML "The IL Protocol
.TL
The IL protocol
.AU
Dave Presotto
Phil Winterbottom
.sp
presotto,philw@plan9.bell-labs.com
.AB
To transport the remote procedure call messages of the Plan 9 file system
protocol 9P, we have implemented a new network protocol, called IL.
It is a connection-based, lightweight transport protocol that carries
datagrams encapsulated by IP.
IL provides retransmission of lost messages and in-sequence delivery, but has
no flow control and no blind retransmission.
.AE
.SH
Introduction
.PP
Plan 9 uses a file system protocol, called 9P [PPTTW93], that assumes
in-sequence guaranteed delivery of delimited messages
holding remote procedure call
(RPC) requests and responses.
None of the standard IP protocols [RFC791] is suitable for transmission of
9P messages over an Ethernet or the Internet.
TCP [RFC793] has a high overhead and does not preserve delimiters.
UDP [RFC768], while cheap and preserving message delimiters, does not provide
reliable sequenced delivery.
When we were implementing IP, TCP, and UDP in our system we
tried to choose a protocol suitable for carrying 9P.
The properties we desired were:
.IP \(bu
Reliable datagram service
.IP \(bu
In-sequence delivery
.IP \(bu
Internetworking using IP
.IP \(bu
Low complexity, high performance
.IP \(bu
Adaptive timeouts
.LP
No standard protocol met our needs so we designed a new one,
called IL (Internet Link).
.PP
IL is a lightweight protocol encapsulated by IP.
It is connection-based and
provides reliable transmission of sequenced messages.
No provision is made for flow control since the protocol
is designed to transport RPC
messages between client and server, a structure with inherent flow limitations.
A small window for outstanding messages prevents too
many incoming messages from being buffered;
messages outside the window are discarded
and must be retransmitted.
Connection setup uses a two-way handshake to generate
initial sequence numbers at each end of the connection;
subsequent data messages increment the
sequence numbers to allow
the receiver to resequence out-of-order messages.
In contrast to other protocols, IL avoids blind retransmission.
This helps performance in congested networks,
where blind retransmission could cause further
congestion.
Like TCP, IL has adaptive timeouts,
so the protocol performs well both on the
Internet and on local Ethernets.
A round-trip timer is used
to calculate acknowledgement and retransmission times
that match the network speed.
.SH
Connections
.PP
An IL connection carries a stream of data between two end points.
While the connection persists,
data entering one side is sent to the other side in the same sequence.
The functioning of a connection is described by the state machine in Figure 1,
which shows the states (circles) and transitions between them (arcs).
Each transition is labeled with the list of events that can cause
the transition and, separated by a horizontal line,
the messages sent or received on that transition.
The remainder of this paper is a discussion of this state machine.
.KF
\s-2
.PS 5.5i
copy "transition.pic"
.PE
\s+2
.RS
.IP \fIackok\fR 1.5i
any sequence number between id0 and next inclusive
.IP \fI!x\fR 1.5i
any value except x
.IP \- 1.5i
any value
.RE
.sp
.ce
.I "Figure 1 - IL State Transitions
.KE
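.PP
The ackok condition in the legend of Figure 1 can be sketched in C as follows.
This is only an illustration: the name il_ackok is hypothetical, and the use of
unsigned 32-bit subtraction to cope with sequence numbers that wrap around is
an assumption, since the text does not discuss wraparound.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nonzero if ack lies between id0 and next,
 * inclusive. Unsigned subtraction keeps the comparison meaningful
 * when the sequence space wraps past 2^32 (an assumption).
 */
int
il_ackok(uint32_t id0, uint32_t next, uint32_t ack)
{
	return (uint32_t)(ack - id0) <= (uint32_t)(next - id0);
}
```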
.PP
The IL state machine has five states:
.I Closed ,
.I Syncer ,
.I Syncee ,
.I Established ,
and
.I Closing .
The connection is identified by the IP address and port number used at each end.
The addresses ride in the IP protocol header, while the ports are part of the
18-byte IL header.
The local variables identifying the state of a connection are:
.RS
.IP state 10
one of the states
.IP laddr 10
32-bit local IP address
.IP lport 10
16-bit local IL port
.IP raddr 10
32-bit remote IP address
.IP rport 10
16-bit remote IL port
.IP id0 10
32-bit starting sequence number of the local side
.IP rid0 10
32-bit starting sequence number of the remote side
.IP next 10
sequence number of the next message to be sent from the local side
.IP rcvd 10
sequence number of the last in-sequence message received from the remote side
.IP unacked 10
sequence number of the first unacked message
.RE
.PP
Unused connections are in the
.I Closed
state with no assigned addresses or ports.
Two events open a connection: the reception of
a message whose addresses and ports match no open connection
or a user explicitly opening a connection.
In the first case, the message's source address and port become the
connection's remote address and port and the message's destination address
and port become the local address and port.
The connection state is set to
.I Syncee
and the message is processed.
In the second case, the user specifies both local and remote addresses and ports.
The connection's state is set to
.I Syncer
and a
.CW sync
message is sent to the remote side.
The legal values for the local address are constrained by the IP implementation.
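.PP
The connection variables and the two ways of opening a connection might be
written down as in the sketch below.
The field names follow the list above; the type name Ilconn, the functions
il_passive_open and il_active_open, and the choice of zeroing the remaining
fields are assumptions made for illustration and are not taken from any
particular implementation.

```c
#include <stdint.h>
#include <string.h>

enum { Closed, Syncer, Syncee, Established, Closing };

typedef struct Ilconn Ilconn;
struct Ilconn
{
	int	state;		/* one of the states above */
	uint32_t	laddr;	/* local IP address */
	uint16_t	lport;	/* local IL port */
	uint32_t	raddr;	/* remote IP address */
	uint16_t	rport;	/* remote IL port */
	uint32_t	id0;	/* starting sequence number, local side */
	uint32_t	rid0;	/* starting sequence number, remote side */
	uint32_t	next;	/* next sequence number to send */
	uint32_t	rcvd;	/* last in-sequence number received */
	uint32_t	unacked;	/* first unacknowledged sequence number */
};

/*
 * A message arrived that matches no open connection: its source
 * becomes the remote end and its destination the local end.
 * The caller then processes the message.
 */
void
il_passive_open(Ilconn *c, uint32_t src, uint16_t sport,
	uint32_t dst, uint16_t dport)
{
	memset(c, 0, sizeof *c);
	c->raddr = src;
	c->rport = sport;
	c->laddr = dst;
	c->lport = dport;
	c->state = Syncee;
}

/*
 * The user opens a connection, giving both ends explicitly.
 * The caller then sends a sync message to the remote side.
 */
void
il_active_open(Ilconn *c, uint32_t laddr, uint16_t lport,
	uint32_t raddr, uint16_t rport)
{
	memset(c, 0, sizeof *c);
	c->laddr = laddr;
	c->lport = lport;
	c->raddr = raddr;
	c->rport = rport;
	c->state = Syncer;
}
```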
.SH
Sequence Numbers
.PP
IL carries data messages.
Each message corresponds to a single write from
the operating system and is identified by a 32-bit
sequence number.
The starting sequence number for each direction in a
connection is picked at random and transmitted in the initial
.CW sync
message.
The number is incremented for each subsequent data message.
A retransmitted message contains its original sequence number.
.SH
Transmission/Retransmission
.PP
Each message contains two sequence numbers:
an identifier (ID) and an acknowledgement.
The acknowledgement is the sequence number of the last in-sequence
data message received by the transmitter of the message.
For
.CW data
and
.CW dataquery
messages, the ID is the message's sequence number.
For the control messages
.CW sync ,
.CW ack ,
.CW query ,
.CW state ,
and
.CW close ,
the ID is one greater than the sequence number of
the highest sent data message.
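.PP
The rule for choosing the ID can be stated as a small C function.
This is a sketch only: the name il_msgid, its parameters, and the T-prefixed
type names (chosen to avoid clashing with C library functions such as sync
and close) are hypothetical; the numeric values are those given under Formats.

```c
#include <stdint.h>

enum { Tsync, Tdata, Tdataquery, Tack, Tquery, Tstate, Tclose };

/*
 * Hypothetical helper: the ID to place in an outgoing message.
 * seq is the message's own sequence number (used for data and
 * dataquery only); highest is the sequence number of the highest
 * data message sent so far.
 */
uint32_t
il_msgid(int type, uint32_t seq, uint32_t highest)
{
	if(type == Tdata || type == Tdataquery)
		return seq;
	return highest + 1;
}
```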
.PP
The sender transmits data messages with type
.CW data .
Any messages traveling in the opposite direction carry acknowledgements.
An
.CW ack
message will be sent within 200 milliseconds of receiving the data message
unless a returning message has already piggy-backed an
acknowledgement to the sender.
.PP
In IP, messages may be delivered out of order or
may be lost due to congestion or faults.
To overcome this,
IL uses a modified ``go back n'' protocol that also attempts
to avoid aggravating network congestion.
An average round trip time is maintained by measuring the delay between
the transmission of a message and the
receipt of its acknowledgement.
Until the first acknowledgement is received, the average round trip time
is assumed to be 100ms.
If an acknowledgement is not received within four round trip times
of the first unacknowledged message
.I "rexmit timeout" "" (
in Figure 1), IL assumes the message or the acknowledgement
has been lost.
The sender then resends only the first unacknowledged message,
setting the type to
.CW dataquery .
When the receiver receives a
.CW dataquery ,
it responds with a
.CW state
message acknowledging the highest received in-sequence data message.
This may be the retransmitted message or, if the receiver has been
saving up out-of-sequence messages, some higher numbered message.
Implementations of the receiver are free to choose whether to save out-of-sequence messages.
Our implementation saves up to 10 packets ahead.
When the sender receives the
.CW state
message, it will immediately resend the next unacknowledged message
with type
.CW dataquery .
This continues until all messages are acknowledged.
.PP
If no acknowledgement is received after the first
.CW dataquery ,
the transmitter continues to time out and resend the
.CW dataquery
message.
The intervals between retransmissions increase exponentially.
After 300 times the round trip time
.I "death timeout" "" (
in Figure 1), the sender gives up and
assumes the connection is dead.
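.PP
The timer arithmetic described above can be sketched as follows.
The constants 100 ms, 4 and 300 come from the text.
Everything else is an assumption made for illustration: the function names,
the 7/8 to 1/8 weighting of the running average (the text says only that an
average is maintained), and doubling as the base of the exponential growth.

```c
#include <stdint.h>

enum {
	Defaultrtt = 100,	/* ms, assumed until the first acknowledgement */
	Rexmitmult = 4,		/* rexmit timeout, in round trip times */
	Deathmult = 300,	/* death timeout, in round trip times */
};

/*
 * Fold one measured round trip (ms) into the running average.
 * The weighting is an assumption, not taken from the text.
 */
uint32_t
il_rttupdate(uint32_t avg, uint32_t sample)
{
	return (7*avg + sample) / 8;
}

/*
 * Delay in ms before retransmission number n (n = 0 for the first).
 * Doubling on each attempt is an assumed growth rate.
 */
uint32_t
il_rexmitdelay(uint32_t rtt, unsigned n)
{
	uint32_t delay = Rexmitmult * rtt;

	while(n-- > 0)
		delay *= 2;
	return delay;
}

/*
 * Nonzero once elapsed ms of waiting for an acknowledgement
 * reach the death timeout.
 */
int
il_dead(uint32_t elapsed, uint32_t rtt)
{
	return elapsed >= Deathmult * rtt;
}
```

With the default round trip time, the first retransmission in this sketch
happens after 400 ms and the connection is declared dead after 30 seconds
without an acknowledgement.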
.PP
Retransmission also occurs in the states
.I Syncer ,
.I Syncee ,
and
.I Closing .
The retransmission intervals are the same as for data messages.
.SH
Keep Alive
.PP
Connections to dead systems must be discovered and torn down
lest they consume resources.
If the surviving system does not need to send any data and
all data it has sent has been acknowledged, the protocol
described so far will not discover these connections.
Therefore, in the
.I Established
state, if no other messages are sent for a 6 second period,
a
.CW query
is sent.
The receiver always replies to a
.CW query
with a
.CW state
message.
If no messages are received for 30 seconds, the
connection is torn down.
This is not shown in Figure 1.
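.PP
A periodic keep-alive check along these lines might look like the sketch below.
The 6 second and 30 second limits are from the text; the function il_keepalive,
its return codes, and the millisecond timestamps kept by the caller are
hypothetical.

```c
#include <stdint.h>

enum {
	Queryidle = 6000,	/* ms without sending before a query goes out */
	Deadidle = 30000,	/* ms without receiving before teardown */
};

enum { Kanone, Kaquery, Kateardown };

/*
 * Hypothetical check for a connection in the Established state.
 * now, lastsent and lastrcvd are millisecond timestamps; unsigned
 * subtraction tolerates the clock wrapping around.
 */
int
il_keepalive(uint32_t now, uint32_t lastsent, uint32_t lastrcvd)
{
	if((uint32_t)(now - lastrcvd) >= Deadidle)
		return Kateardown;
	if((uint32_t)(now - lastsent) >= Queryidle)
		return Kaquery;
	return Kanone;
}
```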
.SH
Byte Ordering
.PP
All 32- and 16-bit quantities are transmitted high-order byte first, as
is the custom in IP.
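.PP
As an illustration of this byte order, the following helpers store and fetch
16-bit and 32-bit quantities high-order byte first, independent of the host's
own byte order.
The names put16, put32, get16 and get32 are hypothetical.

```c
#include <stdint.h>

typedef unsigned char byte;

/* Store v at p, high-order byte first. */
void
put16(byte *p, uint16_t v)
{
	p[0] = (byte)(v >> 8);
	p[1] = (byte)v;
}

void
put32(byte *p, uint32_t v)
{
	p[0] = (byte)(v >> 24);
	p[1] = (byte)(v >> 16);
	p[2] = (byte)(v >> 8);
	p[3] = (byte)v;
}

/* Fetch a value stored high-order byte first at p. */
uint16_t
get16(const byte *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

uint32_t
get32(const byte *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
		| (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```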
.SH
Formats
.PP
The following is a C language description of an IP+IL
header, assuming no IP options:
.P1
typedef unsigned char byte;
struct IPIL
{
	byte	vihl;		/* Version and header length */
	byte	tos;		/* Type of service */
	byte	length[2];	/* Packet length */
	byte	id[2];		/* Identification */
	byte	frag[2];	/* Fragment information */
	byte	ttl;		/* Time to live */
	byte	proto;		/* Protocol */
	byte	cksum[2];	/* Header checksum */
	byte	src[4];		/* IP source */
	byte	dst[4];		/* IP destination */
	byte	ilsum[2];	/* Checksum including header */
	byte	illen[2];	/* Packet length */
	byte	iltype;		/* Packet type */
	byte	ilspec;		/* Special */
	byte	ilsrc[2];	/* Src port */
	byte	ildst[2];	/* Dst port */
	byte	ilid[4];	/* Sequence id */
	byte	ilack[4];	/* Acked sequence */
};
.P2
.LP
Data is assumed to immediately follow the header in the message.
.CW Ilspec
is an extension reserved for future protocol changes.
.PP
The checksum is calculated with
.CW ilsum
and
.CW ilspec
set to zero.
It is the standard IP checksum, that is, the 16-bit one's complement of the one's
complement sum of all 16-bit words in the header and text. If a
message contains an odd number of header and text bytes to be
checksummed, the last byte is padded on the right with zeros to
form a 16-bit word for the checksum.
The checksum covers from
.CW ilsum
to the end of the data.
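.PP
The checksum computation can be sketched in C as below.
The function name il_cksum is hypothetical; the caller is assumed to pass the
region to be checksummed, with the checksum field already set to zero when
computing a checksum for transmission.
The 32-bit accumulator is adequate for messages that fit in an IP datagram.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement of the one's complement sum of the 16-bit
 * words in p[0..n-1], taken high-order byte first. An odd
 * trailing byte is padded on the right with zero.
 */
uint16_t
il_cksum(const unsigned char *p, size_t n)
{
	uint32_t sum = 0;

	while(n > 1){
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		n -= 2;
	}
	if(n == 1)
		sum += (uint32_t)p[0] << 8;
	/* fold the carries back into the low 16 bits */
	while(sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A receiver can run the same function over the region as received, checksum
field included; the result is zero when the message is intact.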
.PP
The possible
.I iltype
values are:
.P1
enum {
	sync=		0,
	data=		1,
	dataquery=	2,
	ack=		3,
	query=		4,
	state=		5,
	close=		6,
};
.P2
.LP
The
.CW illen
field is the size in bytes of the IL header (18 bytes) plus the size of the data.
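.PP
Putting the header layout, the byte order and the length rule together, the IL
part of a message header could be filled in as in the following sketch.
The function il_mkhdr and its parameters are hypothetical; the field offsets
follow the IPIL structure above, counted from
ilsum, and the checksum field is left zero for the caller to compute afterwards.

```c
#include <stdint.h>
#include <string.h>

enum { Ilhdrsize = 18 };	/* bytes in the IL header */

typedef unsigned char byte;

/*
 * Fill the 18-byte IL header at h for a message carrying datalen
 * bytes of data. All multi-byte fields are stored high-order byte
 * first; ilsum and ilspec are left zero.
 */
void
il_mkhdr(byte *h, int type, uint16_t src, uint16_t dst,
	uint32_t id, uint32_t ack, uint16_t datalen)
{
	uint16_t illen = (uint16_t)(Ilhdrsize + datalen);

	memset(h, 0, Ilhdrsize);
	h[2] = (byte)(illen >> 8);	/* illen */
	h[3] = (byte)illen;
	h[4] = (byte)type;		/* iltype */
	h[6] = (byte)(src >> 8);	/* ilsrc */
	h[7] = (byte)src;
	h[8] = (byte)(dst >> 8);	/* ildst */
	h[9] = (byte)dst;
	h[10] = (byte)(id >> 24);	/* ilid */
	h[11] = (byte)(id >> 16);
	h[12] = (byte)(id >> 8);
	h[13] = (byte)id;
	h[14] = (byte)(ack >> 24);	/* ilack */
	h[15] = (byte)(ack >> 16);
	h[16] = (byte)(ack >> 8);
	h[17] = (byte)ack;
}
```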
.SH
Numbers
.PP
The IP protocol number for IL is 40.
.PP
The assigned IL port numbers are:
.RS
.IP 7 15
echo all input to output
.IP 9 15
discard input
.IP 19 15
send a standard pattern to output
.IP 565 15
send IP addresses of caller and callee to output
.IP 566 15
Plan 9 authentication protocol
.IP 17005 15
Plan 9 CPU service, data
.IP 17006 15
Plan 9 CPU service, notes
.IP 17007 15
Plan 9 exported file systems
.IP 17008 15
Plan 9 file service
.IP 17009 15
Plan 9 remote execution
.IP 17030 15
Alef Name Server
.RE
.SH
References
.LP
[PPTTW93] Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom,
``The Use of Name Spaces in Plan 9'',
.I "Op. Sys. Rev.,
Vol. 27, No. 2, April 1993, pp. 72-76,
reprinted in this volume.
.br
[RFC791] J. Postel, ed., RFC791,
.I "Internet Protocol,
.I "DARPA Internet Program Protocol Specification,
September 1981.
.br
[RFC793] J. Postel, ed., RFC793,
.I "Transmission Control Protocol,
.I "DARPA Internet Program Protocol Specification,
September 1981.
.br
[RFC768] J. Postel, RFC768,
.I "User Datagram Protocol,
.I "DARPA Internet Program Protocol Specification,
August 1980.