LIBCURL-STRUCTS

                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|

                             Structs in libcurl

This document should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

1. The main structs in libcurl
  1.1 SessionHandle
  1.2 connectdata
  1.3 Curl_multi
  1.4 Curl_handler
  1.5 conncache
  1.6 Curl_share
  1.7 CookieInfo

==============================================================================

1. The main structs in libcurl

1.1 SessionHandle

The SessionHandle struct is the one returned to the outside in the external
API as a "CURL *". It is usually known as an easy handle in the API
documentation and examples.

Information and state that relate to the actual connection are kept in the
'connectdata' struct. When a transfer is about to be made, libcurl will
either create a new connection or re-use an existing one. The particular
connectdata that is used by this handle is pointed to by
SessionHandle->easy_conn.

Data and information that concern this particular single transfer are put in
the SingleRequest sub-struct.

When the SessionHandle struct is added to a multi handle, as it must be in
order to do any transfer, the ->multi member will point to the Curl_multi
struct it belongs to. The ->prev and ->next members will then be used by the
multi code to keep a linked list of SessionHandle structs that are added to
that same multi handle. libcurl always uses multi, so ->multi *will* point to
a Curl_multi when a transfer is in progress.
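
The list bookkeeping just described can be pictured with the minimal sketch
below. It is not libcurl's code: the struct and function names (easy_sketch,
multi_sketch, sketch_add_handle(), sketch_remove_handle()) are hypothetical,
only the list-related members are modeled, the ->easyp and ->num_easy names
are borrowed from the Curl_multi description in section 1.3, and new handles
are linked in at the front purely for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct multi_sketch;

/* Hypothetical stand-in for SessionHandle: only the list members. */
struct easy_sketch {
  struct multi_sketch *multi; /* multi handle this handle is added to */
  struct easy_sketch *prev;   /* previous handle in the multi's list */
  struct easy_sketch *next;   /* next handle in the multi's list */
};

/* Hypothetical stand-in for Curl_multi: only the list members. */
struct multi_sketch {
  struct easy_sketch *easyp;  /* start of the list of added handles */
  int num_easy;               /* number of added handles */
};

/* Link 'data' in at the front of the multi handle's list. */
void sketch_add_handle(struct multi_sketch *multi, struct easy_sketch *data)
{
  assert(data->multi == NULL); /* a handle belongs to one multi at a time */
  data->multi = multi;
  data->prev = NULL;
  data->next = multi->easyp;
  if(multi->easyp)
    multi->easyp->prev = data;
  multi->easyp = data;
  multi->num_easy++;
}

/* Unlink 'data' from the multi handle it was added to. */
void sketch_remove_handle(struct multi_sketch *multi,
                          struct easy_sketch *data)
{
  assert(data->multi == multi);
  if(data->prev)
    data->prev->next = data->next;
  else
    multi->easyp = data->next;  /* 'data' was the first node */
  if(data->next)
    data->next->prev = data->prev;
  data->prev = data->next = NULL;
  data->multi = NULL;
  multi->num_easy--;
}
```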

->mstate is the multi state of this particular SessionHandle. When
multi_runsingle() is called, it will act on this handle according to the
state it is in. The mstate is also what decides which sockets to return for a
specific SessionHandle when curl_multi_fdset() is called, and so on.
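
As a rough illustration of a per-handle state machine, the sketch below
dispatches on a state member and also derives from it what the handle is
waiting for. The five states, sketch_runsingle() and sketch_wait_for() are
hypothetical and much simpler than libcurl's real set of multi states; the
step_done argument stands in for whatever I/O the real code performs, and
mapping each state to a single read or write wait is an assumption made for
brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much reduced set of multi states. */
enum sketch_mstate {
  SKETCH_INIT,     /* just added, nothing done yet */
  SKETCH_CONNECT,  /* waiting for the connection to complete */
  SKETCH_DO,       /* sending the transfer request */
  SKETCH_PERFORM,  /* transferring the data */
  SKETCH_DONE      /* transfer complete */
};

/* What a handle in a given state waits for on its socket. */
enum sketch_wait { WAIT_NONE, WAIT_READ, WAIT_WRITE };

struct sketch_handle {
  enum sketch_mstate mstate;
};

/* Act on the handle according to its current state. */
void sketch_runsingle(struct sketch_handle *data, bool step_done)
{
  switch(data->mstate) {
  case SKETCH_INIT:
    data->mstate = SKETCH_CONNECT;  /* always moves on immediately */
    break;
  case SKETCH_CONNECT:
    if(step_done)
      data->mstate = SKETCH_DO;
    break;
  case SKETCH_DO:
    if(step_done)
      data->mstate = SKETCH_PERFORM;
    break;
  case SKETCH_PERFORM:
    if(step_done)
      data->mstate = SKETCH_DONE;
    break;
  case SKETCH_DONE:
    break;  /* nothing left to do */
  }
}

/* The state alone decides what to report to an fdset-style caller
   (simplified: one kind of wait per state). */
enum sketch_wait sketch_wait_for(const struct sketch_handle *data)
{
  switch(data->mstate) {
  case SKETCH_CONNECT:
  case SKETCH_DO:
    return WAIT_WRITE;
  case SKETCH_PERFORM:
    return WAIT_READ;
  default:
    return WAIT_NONE;
  }
}
```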

The libcurl source code generally uses the name 'data' for the variable that
points to the SessionHandle.

1.2 connectdata

A general idea in libcurl is to keep connections around in a connection
"cache" after they have been used, in case they will be used again, and then
re-use an existing one instead of creating a new one, since that gives a
significant performance boost.

Each 'connectdata' identifies a single physical connection to a server. If
the connection can't be kept alive, it will be closed after use and this
struct can then be removed from the cache and freed.

Thus, the same SessionHandle can be used multiple times and each time select
another connectdata struct to use for the connection. Keep this in mind, as
it is then important to consider whether options or choices are based on the
connection or on the SessionHandle.
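
The reuse logic can be sketched roughly as follows: before each transfer the
handle looks for an idle cached connection to the same host and port and only
creates a new one if none matches, and a connection that cannot be kept alive
is dropped after use. This assumes a tiny fixed-size array as the cache and
hypothetical names (conn_sketch, cache_sketch, sketch_pick_conn(),
sketch_done_conn()); libcurl's real matching considers far more than host
and port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 4  /* tiny, fixed-size cache (assumption) */

/* Hypothetical stand-in for connectdata: only what the sketch needs. */
struct conn_sketch {
  bool used;       /* slot holds a live connection */
  char host[64];
  int port;
  bool keepalive;  /* may this connection be kept for re-use? */
  void *data;      /* handle currently using it, cf. connectdata->data */
};

struct cache_sketch {
  struct conn_sketch slot[SKETCH_CACHE_SIZE];
};

/* Before a transfer: re-use an idle cached connection to host:port or
   create a new one in a free slot. Returns NULL if the cache is full. */
struct conn_sketch *sketch_pick_conn(struct cache_sketch *cache, void *data,
                                     const char *host, int port,
                                     bool *reused)
{
  struct conn_sketch *free_slot = NULL;
  size_t i;
  for(i = 0; i < SKETCH_CACHE_SIZE; i++) {
    struct conn_sketch *c = &cache->slot[i];
    if(!c->used) {
      if(!free_slot)
        free_slot = c;
    }
    else if(!c->data && c->port == port && !strcmp(c->host, host)) {
      c->data = data;  /* re-use: the connection now serves this handle */
      *reused = true;
      return c;
    }
  }
  if(!free_slot)
    return NULL;
  memset(free_slot, 0, sizeof(*free_slot));
  free_slot->used = true;
  strncpy(free_slot->host, host, sizeof(free_slot->host) - 1);
  free_slot->port = port;
  free_slot->keepalive = true;  /* until the protocol says otherwise */
  free_slot->data = data;
  *reused = false;
  return free_slot;
}

/* After a transfer: keep the connection for later or close it. */
void sketch_done_conn(struct conn_sketch *conn)
{
  assert(conn->used);
  conn->data = NULL;       /* no handle uses it any longer */
  if(!conn->keepalive)
    conn->used = false;    /* "closed": removed from the cache */
}
```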

Functions in libcurl will assume that connectdata->data points to the
SessionHandle that uses this connection.

As a special complexity, some protocols supported by libcurl require a
special disconnect procedure that is more than just shutting down the
socket. It can involve sending one or more commands to the server before
doing so. Since connections are kept in the connection cache after use, the
original SessionHandle may no longer be around when the time comes to shut
down a particular connection. For this purpose, libcurl holds a special
dummy 'closure_handle' SessionHandle in the Curl_multi struct to use when
needed.
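
The following minimal sketch shows the idea: when a connection whose
protocol needs a disconnect procedure is closed and no handle owns it any
more, the multi handle's dummy handle is lent to it for the duration.
Everything here is hypothetical and simplified (the names, the needs_quit
flag, and sketch_send_quit() as a stand-in for sending the protocol's
closing commands); it is not how libcurl's disconnect code is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, reduced to what the sketch needs. */
struct easy_sketch { int id; };

struct conn_sketch {
  struct easy_sketch *data;  /* handle using the connection, if any */
  bool needs_quit;           /* protocol sends commands before closing */
};

struct multi_sketch {
  struct easy_sketch closure_handle;  /* dummy handle owned by the multi */
};

/* Stand-in for a protocol's disconnect procedure: it needs a handle to
   work with and returns the one it was given. */
static struct easy_sketch *sketch_send_quit(struct conn_sketch *conn)
{
  assert(conn->data != NULL);
  return conn->data;
}

/* Close a cached connection, borrowing the closure handle if the protocol
   must talk to the server first and no handle owns the connection. */
struct easy_sketch *sketch_close_conn(struct multi_sketch *multi,
                                      struct conn_sketch *conn)
{
  struct easy_sketch *used = NULL;
  if(conn->needs_quit) {
    if(!conn->data)
      conn->data = &multi->closure_handle;
    used = sketch_send_quit(conn);
  }
  conn->data = NULL;
  /* shutting down the socket and freeing the struct would follow here */
  return used;
}
```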

FTP uses two TCP connections for a typical transfer, but it keeps both in
this single struct, so it can be considered a single connection for most
internal concerns.

The libcurl source code generally uses the name 'conn' for the variable that
points to the connectdata.

1.3 Curl_multi

Internally, the easy interface is implemented as a wrapper around multi
interface functions. This means that internally, everything goes through the
multi interface.

Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.

This struct holds a list of SessionHandle structs that have been added to
this handle with curl_multi_add_handle(). The start of the list is ->easyp
and ->num_easy is a counter of added SessionHandles.

->msglist is a linked list of messages to send back when
curl_multi_info_read() is called. A node is added to that list when an
individual SessionHandle's transfer has completed.
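
A minimal sketch of such a message queue follows, assuming a simple
first-in, first-out singly linked list with a tail pointer. The names
(msg_sketch, msglist_sketch, sketch_msg_add(), sketch_info_read()) are
hypothetical; sketch_info_read() only mimics the calling convention of the
public curl_multi_info_read(), which returns one message per call and
reports the number still queued through its msgs_in_queue argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion message, loosely modeled on CURLMsg. */
struct msg_sketch {
  void *easy_handle;          /* handle whose transfer completed */
  int result;                 /* its final result code */
  struct msg_sketch *next;
};

/* Hypothetical stand-in for ->msglist: a FIFO with a tail pointer. */
struct msglist_sketch {
  struct msg_sketch *head;    /* oldest message, handed out first */
  struct msg_sketch *tail;    /* newest message */
  int count;
};

/* Called when a transfer has completed: append its message. */
void sketch_msg_add(struct msglist_sketch *list, struct msg_sketch *msg)
{
  msg->next = NULL;
  if(list->tail)
    list->tail->next = msg;
  else
    list->head = msg;
  list->tail = msg;
  list->count++;
}

/* Hand out the oldest message (or NULL) and report how many remain. */
struct msg_sketch *sketch_info_read(struct msglist_sketch *list,
                                    int *msgs_in_queue)
{
  struct msg_sketch *msg = list->head;
  if(msg) {
    list->head = msg->next;
    if(!list->head)
      list->tail = NULL;
    list->count--;
  }
  *msgs_in_queue = list->count;
  return msg;
}
```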

->hostcache points to the name cache. It is a hash table for looking up
name-to-IP mappings. The nodes have a limited lifetime in there, and the
cache is meant to save time when the same name is wanted again within a
short period of time.
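
A minimal sketch of a name cache with a limited entry lifetime follows. It
assumes a small fixed-size table with one entry per bucket (a newer entry
simply overwrites a colliding one) and a 60-second lifetime, which matches
the default of libcurl's public CURLOPT_DNS_CACHE_TIMEOUT option. All names
are hypothetical, and the current time is passed in explicitly so the
expiry behaviour is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define SKETCH_DNS_SLOTS   8   /* table size (assumption) */
#define SKETCH_DNS_TIMEOUT 60  /* entry lifetime in seconds (assumption) */

/* Hypothetical cache entry: one name and one textual address. */
struct dns_sketch {
  char name[64];
  char addr[46];   /* large enough for an IPv6 address string */
  time_t stamp;    /* when the entry was stored */
};

struct hostcache_sketch {
  struct dns_sketch slot[SKETCH_DNS_SLOTS];
};

/* djb2 string hash, reduced to a slot index. */
static size_t sketch_hash(const char *name)
{
  size_t h = 5381;
  while(*name)
    h = h * 33 + (unsigned char)*name++;
  return h % SKETCH_DNS_SLOTS;
}

/* Store name -> addr; a colliding older entry is simply overwritten. */
void sketch_dns_add(struct hostcache_sketch *c, const char *name,
                    const char *addr, time_t now)
{
  struct dns_sketch *e = &c->slot[sketch_hash(name)];
  memset(e, 0, sizeof(*e));
  strncpy(e->name, name, sizeof(e->name) - 1);
  strncpy(e->addr, addr, sizeof(e->addr) - 1);
  e->stamp = now;
}

/* Return the cached address, or NULL if absent or too old. */
const char *sketch_dns_lookup(const struct hostcache_sketch *c,
                              const char *name, time_t now)
{
  const struct dns_sketch *e = &c->slot[sketch_hash(name)];
  if(!e->name[0] || strcmp(e->name, name))
    return NULL;            /* not cached */
  if(now - e->stamp >= SKETCH_DNS_TIMEOUT)
    return NULL;            /* stale: a fresh resolve is needed */
  return e->addr;
}
```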

->timetree points to a tree of SessionHandles, sorted by the remaining time
until each of them should be checked, normally some sort of timeout. Each
SessionHandle has one node in the tree.
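
In libcurl the timetree is a splay tree. The sketch below swaps that for a
sorted singly linked list, which is simpler but costs O(n) per insertion,
just to show the two operations the text implies: (re)setting a handle's
single node and taking out the earliest node once its time has come. The
names and the millisecond representation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-handle timer node; a handle owns exactly one. */
struct timer_sketch {
  long expire_ms;               /* when the handle should be checked */
  struct timer_sketch *next;    /* next node, expiring at or after this */
};

/* Set (or move) a handle's node so the list stays sorted by expiry. */
void sketch_timer_set(struct timer_sketch **head, struct timer_sketch *node,
                      long expire_ms)
{
  struct timer_sketch **pp;
  /* unlink the node first if it is already queued: one node per handle */
  for(pp = head; *pp; pp = &(*pp)->next) {
    if(*pp == node) {
      *pp = node->next;
      break;
    }
  }
  node->expire_ms = expire_ms;
  /* walk past all nodes that expire no later than this one */
  for(pp = head; *pp && (*pp)->expire_ms <= expire_ms; pp = &(*pp)->next)
    ;
  node->next = *pp;
  *pp = node;
}

/* Remove and return the earliest node if its time has come, else NULL. */
struct timer_sketch *sketch_timer_expired(struct timer_sketch **head,
                                          long now_ms)
{
  struct timer_sketch *first = *head;
  if(!first || first->expire_ms > now_ms)
    return NULL;
  *head = first->next;
  first->next = NULL;
  return first;
}
```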

->sockhash is a hash table that allows fast lookups from a socket descriptor
to the SessionHandle that uses it. This is necessary for the multi_socket
API.
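
The lookup direction (socket to handle) can be illustrated with a small
open-addressing hash table that uses linear probing. This is an assumed,
simplified design rather than libcurl's own hash implementation; the table
size, the names and the lack of a removal function are all simplifications
of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOCK_SLOTS 16   /* table size (assumption) */

struct sock_entry_sketch {
  int fd;        /* socket descriptor, -1 marks an empty slot */
  void *data;    /* the handle that uses this socket */
};

struct sockhash_sketch {
  struct sock_entry_sketch slot[SKETCH_SOCK_SLOTS];
};

void sketch_sockhash_init(struct sockhash_sketch *h)
{
  size_t i;
  for(i = 0; i < SKETCH_SOCK_SLOTS; i++) {
    h->slot[i].fd = -1;
    h->slot[i].data = NULL;
  }
}

/* Map fd -> handle, probing linearly past occupied slots.
   Returns 0 on success or -1 if the table is full. */
int sketch_sockhash_add(struct sockhash_sketch *h, int fd, void *data)
{
  size_t i, n;
  assert(fd >= 0);
  i = (size_t)fd % SKETCH_SOCK_SLOTS;
  for(n = 0; n < SKETCH_SOCK_SLOTS; n++, i = (i + 1) % SKETCH_SOCK_SLOTS) {
    if(h->slot[i].fd == -1 || h->slot[i].fd == fd) {
      h->slot[i].fd = fd;
      h->slot[i].data = data;
      return 0;
    }
  }
  return -1;
}

/* Which handle uses fd? NULL if none is recorded. */
void *sketch_sockhash_find(const struct sockhash_sketch *h, int fd)
{
  size_t i = (size_t)fd % SKETCH_SOCK_SLOTS;
  size_t n;
  for(n = 0; n < SKETCH_SOCK_SLOTS; n++, i = (i + 1) % SKETCH_SOCK_SLOTS) {
    if(h->slot[i].fd == fd)
      return h->slot[i].data;
    if(h->slot[i].fd == -1)
      break;              /* an empty slot ends the probe sequence */
  }
  return NULL;
}
```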

->conn_cache points to the connection cache. It keeps track of all
connections that are kept after use. The cache has a maximum size.

->closure_handle is described in the 'connectdata' section.

The libcurl source code generally uses the name 'multi' for the variable that
points to the Curl_multi struct.

1.4 Curl_handler

Each unique protocol that is supported by libcurl needs to provide at least
one Curl_handler struct. It defines what the protocol is called and what
functions the main code should call to deal with protocol-specific issues.
In general, there's a source file named [protocol].c in which there's a
"struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's
then the main array, with pointers to all the individual Curl_handler
structs, which is scanned through when a URL is given to libcurl to work
with.

->scheme is the URL scheme name, usually spelled out in uppercase, such as
"HTTP" or "FTP". SSL versions of a protocol need their own Curl_handler
setup, so HTTPS is separate from HTTP.
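
To make the lookup concrete, here is a minimal sketch of a handler table and
a case-insensitive scan over it. The struct is a hypothetical reduction of
Curl_handler with a single callback, and the names (handler_sketch,
sketch_find_handler() and the handler variables) are made up for
illustration. In this sketch HTTPS gets its own entry while reusing the
HTTP callback.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, reduced Curl_handler: a name, a port, one callback. */
struct handler_sketch {
  const char *scheme;     /* uppercase, e.g. "HTTP" */
  int defport;
  int (*do_it)(void);     /* stands in for the protocol callbacks */
};

static int sketch_http_do(void) { return 1; }
static int sketch_ftp_do(void) { return 2; }

/* One handler per protocol; HTTPS has its own entry. */
static const struct handler_sketch handler_http =
  { "HTTP", 80, sketch_http_do };
static const struct handler_sketch handler_https =
  { "HTTPS", 443, sketch_http_do };
static const struct handler_sketch handler_ftp =
  { "FTP", 21, sketch_ftp_do };

/* The single array scanned when a URL is given; NULL-terminated. */
static const struct handler_sketch *const protocols[] = {
  &handler_http, &handler_https, &handler_ftp, NULL
};

/* Case-insensitive ASCII string equality. */
static int sketch_strcaseequal(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;  /* equal only if both strings ended together */
}

/* Find the handler for a URL scheme such as "http", or NULL. */
const struct handler_sketch *sketch_find_handler(const char *scheme)
{
  size_t i;
  for(i = 0; protocols[i]; i++)
    if(sketch_strcaseequal(protocols[i]->scheme, scheme))
      return protocols[i];
  return NULL;
}
```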

->setup_connection is called to allow the protocol code to allocate
protocol-specific data that then gets associated with that SessionHandle for
the rest of this transfer. It gets freed again at the end of the transfer.
It will be called before the 'connectdata' for the transfer has been
selected/created. Most protocols will allocate their private
'struct [PROTOCOL]' here and assign SessionHandle->req.protop to point to it.
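
A minimal sketch of that allocate-and-attach pattern follows. The types are
hypothetical stand-ins (proto_sketch plays the role of a 'struct [PROTOCOL]',
easy_sketch and req_sketch play SessionHandle and its req member), and the
return convention of 0 for success and -1 for out of memory is an assumption
of the sketch rather than libcurl's own CURLcode-based signature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-transfer protocol state, playing 'struct [PROTOCOL]'. */
struct proto_sketch {
  long bytes_sent;
};

/* Hypothetical stand-ins for SessionHandle and its req member. */
struct req_sketch { void *protop; };
struct easy_sketch { struct req_sketch req; };

/* setup_connection-style callback: allocate the private struct and hang
   it off the handle. Returns 0 on success, -1 when out of memory. */
int sketch_setup_connection(struct easy_sketch *data)
{
  struct proto_sketch *p = calloc(1, sizeof(*p));
  if(!p)
    return -1;
  data->req.protop = p;
  return 0;
}

/* At the end of the transfer the private struct is freed again. */
void sketch_transfer_done(struct easy_sketch *data)
{
  free(data->req.protop);
  data->req.protop = NULL;
}
```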

->connect_it allows a protocol to do some specific actions after the TCP
connect is done that can still be considered part of the connection phase.
Some protocols will alter the connectdata->recv[] and connectdata->send[]
function pointers in this function.

->connecting is similarly a function that keeps getting called as long as the
protocol considers itself still in the connecting phase.

->do_it is the function called to issue the transfer request. This is what
we call the DO action internally. If the DO is not enough and things need to
keep getting done for the entire DO sequence to complete, ->doing is then
usually also provided. Each protocol that needs to do multiple commands or
similar for do/doing needs to implement its own state machine (see SCP,
SFTP, FTP). FTP, and only FTP, for historical reasons has a separate piece
of the DO state called DO_MORE.

->doing keeps getting called while issuing the transfer request command(s).

->done gets called when the transfer is complete and DONE. That's after the
main data has been transferred.

->do_more gets called during the DO_MORE state. The FTP protocol uses this
state when setting up the second connection.

->proto_getsock
->doing_getsock
->domore_getsock
->perform_getsock
Functions that return socket information: which socket(s) to wait for, and
for which action(s), during the particular multi state.

->disconnect is called immediately before the TCP connection is shut down.

->readwrite gets called during the transfer to allow the protocol to do extra
reads/writes.

->defport is the default TCP or UDP port this protocol uses.

->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
their "base" protocol set and then the SSL variation, like "HTTP|HTTPS".

->flags is a bitmask with additional information about the protocol that
will make it get treated differently by the generic engine:

  PROTOPT_SSL - will make it connect and negotiate SSL
  PROTOPT_DUAL - this protocol uses two connections
  PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet it is still set for
    a bunch of protocol handlers.
  PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions the main engine will concern
    itself with.
  PROTOPT_NONETWORK - a protocol that doesn't use the network (such as
    file:)
  PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
    one unless one is provided
  PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
    (?foo=bar)
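
The way a generic engine can branch on these bits, rather than on scheme
names, is sketched below. The SK_* constants are sketch-local stand-ins with
made-up values (the real CURLPROTO_* and PROTOPT_* values live in libcurl's
headers), and the flag combinations given to the two example handlers are
illustrative, not copied from libcurl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch-local bit values; not libcurl's actual definitions. */
#define SK_PROTO_HTTP  (1u << 0)
#define SK_PROTO_HTTPS (1u << 1)
#define SK_PROTO_FTP   (1u << 2)

#define SK_PROTOPT_SSL        (1u << 0)  /* connect and negotiate SSL */
#define SK_PROTOPT_DUAL       (1u << 1)  /* uses two connections */
#define SK_PROTOPT_NOURLQUERY (1u << 2)  /* no ?query part on the URL */

struct handler_sketch {
  const char *scheme;
  int defport;
  unsigned int protocol;  /* base protocol bit plus the SSL variation */
  unsigned int flags;
};

/* Illustrative handlers: HTTPS carries both the HTTP and HTTPS bits. */
static const struct handler_sketch sk_https = {
  "HTTPS", 443, SK_PROTO_HTTP | SK_PROTO_HTTPS, SK_PROTOPT_SSL
};
static const struct handler_sketch sk_ftp = {
  "FTP", 21, SK_PROTO_FTP, SK_PROTOPT_DUAL | SK_PROTOPT_NOURLQUERY
};

bool sketch_needs_ssl(const struct handler_sketch *h)
{
  return (h->flags & SK_PROTOPT_SSL) != 0;
}

/* True for any handler whose protocol bits include the HTTP base bit. */
bool sketch_is_http_family(const struct handler_sketch *h)
{
  return (h->protocol & SK_PROTO_HTTP) != 0;
}

/* A URL query part (NULL if absent) is only acceptable when the handler
   does not set the NOURLQUERY flag. */
bool sketch_query_allowed(const struct handler_sketch *h, const char *query)
{
  return query == NULL || (h->flags & SK_PROTOPT_NOURLQUERY) == 0;
}
```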

1.5 conncache

This is a hash table with connections for later re-use. Each SessionHandle
has a pointer to its connection cache. Each multi handle sets up a
connection cache that all added SessionHandles share by default.
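
A minimal sketch of this sharing arrangement follows: a small chained hash
table keyed on host name, owned by the multi handle, with every added handle
pointing at it. Keying on the host name alone, the bucket count and all
names are assumptions of the sketch; it leaves out the maximum size and the
rest of libcurl's matching rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUCKETS 8  /* number of hash buckets (assumption) */

/* Hypothetical cached connection, chained within its bucket. */
struct conn_sketch {
  const char *host;
  struct conn_sketch *next;
};

struct conncache_sketch {
  struct conn_sketch *bucket[SKETCH_BUCKETS];
  size_t num_connections;
};

/* Hypothetical stand-ins for Curl_multi and SessionHandle. */
struct multi_sketch { struct conncache_sketch conn_cache; };
struct easy_sketch { struct conncache_sketch *conn_cache; };

static size_t sketch_bucket(const char *host)
{
  size_t h = 0;
  while(*host)
    h = h * 31 + (unsigned char)*host++;
  return h % SKETCH_BUCKETS;
}

/* By default, a handle added to a multi uses the multi's cache. */
void sketch_multi_add(struct multi_sketch *multi, struct easy_sketch *data)
{
  data->conn_cache = &multi->conn_cache;
}

void sketch_cache_put(struct conncache_sketch *cache,
                      struct conn_sketch *conn)
{
  size_t b = sketch_bucket(conn->host);
  conn->next = cache->bucket[b];
  cache->bucket[b] = conn;
  cache->num_connections++;
}

/* Only the bucket that 'host' hashes to has to be searched. */
struct conn_sketch *sketch_cache_find(const struct conncache_sketch *cache,
                                      const char *host)
{
  struct conn_sketch *c;
  for(c = cache->bucket[sketch_bucket(host)]; c; c = c->next)
    if(!strcmp(c->host, host))
      return c;
  return NULL;
}
```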

1.6 Curl_share

The libcurl share API allocates a Curl_share struct, exposed to the external
API as "CURLSH *".

The idea is that the struct can hold its own set of caches and pools, and
that the SessionHandles given this struct in the CURLOPT_SHARE option will
then use the caches/pools that this share handle holds.

Individual SessionHandle structs can thus be made to share specific things
that they otherwise wouldn't, such as cookies.

The Curl_share struct can currently hold cookies, the DNS cache and the SSL
session cache.
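
The effect of a share handle on a single easy handle can be sketched as a
choice, made per kind of data, between the handle's private store and the
shared one. The types, the SK_SHARE_* bits and the accessor functions are
hypothetical; in the public API the corresponding steps are
curl_share_setopt() with CURLSHOPT_SHARE (for example CURL_LOCK_DATA_COOKIE)
on the share handle and CURLOPT_SHARE on each easy handle. Locking, which
shared data needs when handles are used from several threads, is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stores standing in for CookieInfo, the DNS cache, etc. */
struct cookies_sketch { int count; };
struct dns_sketch { int count; };

/* Which kinds of data a share handle holds (sketch-local values). */
#define SK_SHARE_COOKIES (1u << 0)
#define SK_SHARE_DNS     (1u << 1)

struct share_sketch {
  unsigned int specifier;          /* which kinds are shared */
  struct cookies_sketch cookies;
  struct dns_sketch dns;
};

struct easy_sketch {
  struct share_sketch *share;      /* set via a CURLOPT_SHARE-like option */
  struct cookies_sketch own_cookies;
  struct dns_sketch own_dns;
};

/* Pick the cookie store this handle should use. */
struct cookies_sketch *sketch_cookies(struct easy_sketch *data)
{
  if(data->share && (data->share->specifier & SK_SHARE_COOKIES))
    return &data->share->cookies;
  return &data->own_cookies;
}

/* Pick the DNS store this handle should use. */
struct dns_sketch *sketch_dns(struct easy_sketch *data)
{
  if(data->share && (data->share->specifier & SK_SHARE_DNS))
    return &data->share->dns;
  return &data->own_dns;
}
```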

1.7 CookieInfo

This is the main cookie struct. It holds all known cookies and related
information. Each SessionHandle has its own private CookieInfo, even when it
is added to a multi handle. SessionHandles can be made to share cookies by
using the share API.