HTTP-COOKIES.md 6.5 KB

HTTP Cookies

Cookie overview

Cookies are name=contents pairs that an HTTP server tells the client to hold and then the client sends back those to the server on subsequent requests to the same domains and paths for which the cookies were set.

Cookies are either "session cookies" which typically are forgotten when the session is over which is often translated to equal when browser quits, or the cookies are not session cookies they have expiration dates after which the client throws them away.

Cookies are set to the client with the Set-Cookie: header and are sent to servers with the Cookie: header.

For a long time, the only spec explaining how to use cookies was the original Netscape spec from 1994.

In 2011, RFC 6265 was finally published and details how cookies work within HTTP. In 2016, an update which added support for prefixes was proposed, and in 2017, another update was drafted to deprecate modification of 'secure' cookies from non-secure origins. Both of these drafts have been incorporated into a proposal to replace RFC 6265. Cookie prefixes and secure cookie modification protection has been implemented by curl.

curl considers http://localhost to be a secure context, meaning that it allows and uses cookies marked with the secure keyword even when done over plain HTTP for this host. curl does this to match how popular browsers work with secure cookies.

Super cookies

A single cookie can be set for a domain that matches multiple hosts. Like if set for example.com it gets sent to both aa.example.com as well as bb.example.com.

A challenge with this concept is that there are certain domains for which cookies should not be allowed at all, because they are Public Suffixes. Similarly, a client never accepts cookies set directly for the top-level domain like for example .com. Cookies set for too broad domains are generally referred to as super cookies.

If curl is built with PSL (Public Suffix List) support, it detects and discards cookies that are specified for such suffix domains that should not be allowed to have cookies.

if curl is not built with PSL support, it has no ability to stop super cookies.

Cookies saved to disk

Netscape once created a file format for storing cookies on disk so that they would survive browser restarts. curl adopted that file format to allow sharing the cookies with browsers, only to see browsers move away from that format. Modern browsers no longer use it, while curl still does.

The Netscape cookie file format stores one cookie per physical line in the file with a bunch of associated meta data, each field separated with TAB. That file is called the cookie jar in curl terminology.

When libcurl saves a cookie jar, it creates a file header of its own in which there is a URL mention that links to the web version of this document.

Cookie file format

The cookie file format is text based and stores one cookie per line. Lines that start with # are treated as comments. An exception is lines that start with #HttpOnly_, which is a prefix for cookies that have the HttpOnly attribute set.

Each line that specifies a single cookie consists of seven text fields separated with TAB characters. A valid line must end with a newline character.

Fields in the file

Field number, what type and example data and the meaning of it:

  1. string example.com - the domain name
  2. boolean FALSE - include subdomains
  3. string /foobar/ - path
  4. boolean TRUE - send/receive over HTTPS only
  5. number 1462299217 - expires at - seconds since Jan 1st 1970, or 0
  6. string person - name of the cookie
  7. string daniel - value of the cookie

Cookies with curl the command line tool

curl has a full cookie "engine" built in. If you just activate it, you can have curl receive and send cookies exactly as mandated in the specs.

Command line options:

-b, --cookie

tell curl a file to read cookies from and start the cookie engine, or if it is not a file it passes on the given string. -b name=var works and so does -b cookiefile.

-j, --junk-session-cookies

when used in combination with -b, it skips all "session cookies" on load so as to appear to start a new cookie session.

-c, --cookie-jar

tell curl to start the cookie engine and write cookies to the given file after the request(s)

Cookies with libcurl

libcurl offers several ways to enable and interface the cookie engine. These options are the ones provided by the native API. libcurl bindings may offer access to them using other means.

CURLOPT_COOKIE

Is used when you want to specify the exact contents of a cookie header to send to the server.

CURLOPT_COOKIEFILE

Tell libcurl to activate the cookie engine, and to read the initial set of cookies from the given file. Read-only.

CURLOPT_COOKIEJAR

Tell libcurl to activate the cookie engine, and when the easy handle is closed save all known cookies to the given cookie jar file. Write-only.

CURLOPT_COOKIELIST

Provide detailed information about a single cookie to add to the internal storage of cookies. Pass in the cookie as an HTTP header with all the details set, or pass in a line from a Netscape cookie file. This option can also be used to flush the cookies etc.

CURLOPT_COOKIESESSION

Tell libcurl to ignore all cookies it is about to load that are session cookies.

CURLINFO_COOKIELIST

Extract cookie information from the internal cookie storage as a linked list.

Cookies with JavaScript

These days a lot of the web is built up by JavaScript. The web browser loads complete programs that render the page you see. These JavaScript programs can also set and access cookies.

Since curl and libcurl are plain HTTP clients without any knowledge of or capability to handle JavaScript, such cookies are not detected or used.

Often, if you want to mimic what a browser does on such websites, you can record web browser HTTP traffic when using such a site and then repeat the cookie operations using curl or libcurl.