HTTP is arguably the most important application level network protocol for what we consider to be the Internet. It is the protocol that allows web browsers and web servers to communicate. It is also becoming the most popular protocol for implementing web services.
With Zinc, Pharo has out of the box support for HTTP. Zinc is a robust, fast and elegant HTTP client and server library written and maintained by Sven van Caekenberghe.
HTTP, short for Hypertext Transfer Protocol, functions as a request-response protocol in the client-server computing model. As an application level protocol it is layered on top of a reliable transport such as a TCP socket stream. The most important standard specification document describing HTTP version 1.1 is RFC 2616. As usual, a good starting point for learning about HTTP is its Wikipedia article.
A client, often called user-agent, submits an HTTP request to a server which will respond with an HTTP response (see Fig. 0.1). The initiative of
the communication lies with the client. In HTTP parlance, the client requests a resource. A resource, sometimes also called an entity, is the combination of a
collection of bytes and a mime-type. A simple text resource will consist of bytes encoding the string in some encoding, for example UTF-8, and the mime-type
text/plain;charset=utf-8. In contrast, an HTML resource will have a mime-type like text/html;charset=utf-8.
To specify which resource you want, a URL (Uniform Resource Locator) is used. Web addresses are the most common form of URL. Consider for example http://pharo.org/files/pharo-logo-small.png : it is a URL that refers to a PNG image resource on a specific server.
The reliable transport connection between an HTTP client and server is used bidirectionally: both to send the request as well as to receive the response. It can be used for just one request/response cycle, as was the case for HTTP version 1.0, or it can be reused for multiple request/response cycles, as is the default for HTTP version 1.1.
Zinc, the short form for Zinc HTTP Components, is an open-source Smalltalk framework to deal with HTTP. It models most concepts of HTTP and its related standards and offers both client and server functionality. One of its key goals is to offer understandability (Smalltalk's design principle number one). Anyone with a basic understanding of Smalltalk and the HTTP principles should be able to understand what is going on and learn, by looking at the implementation. Zinc, or Zn, after its namespace prefix, is an integral part of Pharo Smalltalk since version 1.3. It has been ported to other Smalltalk implementations such as Gemstone.
The reference Zn implementation lives in several places:
Installation or updating instructions can be found on its web site.
The key object to programmatically execute HTTP requests is called
ZnClient. You instantiate it, use its rich API to configure and execute an HTTP request
and access the response.
ZnClient is a stateful object that acts as a builder.
Let's get started with the simplest possible usage.
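As a minimal sketch of such a first request, assuming a small public test page at the URL below (the URL is an assumption, any reachable HTML page will do):

```smalltalk
"Fetch a resource in one message send; the URL is an assumed test page"
ZnClient new get: 'http://zn.stfx.eu/zn/small.html'.
```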
Select the expression and print its result. You should get a
String back containing a very small HTML document. The get: message used here
belongs to the convenience API. Let's use a more general API to be a bit more explicit about what happened.
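A sketch of the more explicit variant could look like this, again assuming the same test URL:

```smalltalk
"Configure the client step by step, execute a GET and ask for the response"
ZnClient new
  url: 'http://zn.stfx.eu/zn/small.html';
  get;
  response.
```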
Here we explicitly set the url of the resource to access using
url:, then we execute an HTTP GET using
get and we finally ask for the response object
response. The above returns a
ZnResponse object. Of course you can inspect it. It consists of 3 elements: a status line, headers and an optional entity.
The status line says HTTP/1.1 200 OK, which means the request was successful. This can be tested by sending
isSuccess to either the response object or the
client itself. The headers contain meta data related to the response, such as its content type and content length.
The entity is the actual resource: the bytes that should be interpreted in the context of the content-type mime-type. Zn automatically converts non-binary
content into Strings using the correct encoding. In our example, the entity is an instance of
ZnStringEntity, a concrete subclass of ZnEntity.
Like any Smalltalk object, you can inspect or explore the
ZnResponse object. You might be wondering how this response was actually transferred over the
network. That is easy with Zinc, as the key HTTP objects all implement
writeOn: that displays the raw format of the response i.e. what has been transmitted
through the network.
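One way to try this is sketched below. It assumes the same test URL as before and that the Transcript accepts the written output in your Pharo version:

```smalltalk
| response |
"Execute a request and keep the response object"
response := ZnClient new
  url: 'http://zn.stfx.eu/zn/small.html';
  get;
  response.
"Write the raw representation of the response to the Transcript"
response writeOn: Transcript.
Transcript flush.
```

The request object of a client, obtained by sending request, can be written out in the same way.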
If you have the Transcript open, you should see something like the following:
The first CRLF terminated line is the status line. Next are the headers, each on a line with a key and a value. An empty line ends the headers. Finally, the entity bytes follow, either up to the content length or up to the end of the stream.
You might wonder what the request looked like when it went over the network? You can find it out using the same technique.
In an opened Transcript you will now see:
A ZnRequest object consists of 3 elements: a request line, headers and an optional entity.
The request line contains the HTTP method (sometimes called verb), URL and the HTTP protocol version. Next come the request headers which, like the response headers, carry meta data such as the Host, Accept and User-Agent headers.
If you look carefully at the Transcript you will see the empty line terminating the headers. For most kinds of requests, like for a GET, there is no entity.
For debugging and for learning, it can be helpful to enable logging on the client. Try the following.
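A minimal sketch, assuming the logToTranscript convenience of ZnClient and the test URL used earlier:

```smalltalk
"Enable logging to the Transcript before executing the request"
ZnClient new
  logToTranscript;
  get: 'http://zn.stfx.eu/zn/small.html'.
```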
This will print out some information on the Transcript, as shown below.
In a later subsection about server logging, which uses the same mechanism, you will learn how to interpret and customize logging.
Although ZnClient is absolutely the preferred object to deal with all the intricacies of HTTP, you sometimes wish you could do a quick HTTP request with an
absolute minimum amount of typing, especially during debugging. For these occasions there is
ZnEasy, a class side only API for quick HTTP requests.
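For instance, a quick GET might look like this (the URL is an assumption):

```smalltalk
"One-line GET through the class side API"
ZnEasy get: 'http://zn.stfx.eu/zn/numbers.txt'.
```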
The result is always a
ZnResponse object. Apart from basic authentication, there are no other options. A nice feature here, more as an example, is that there are some
direct ways to ask for image resources as ready to use Forms.
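As a sketch, the PNG image mentioned at the start of this chapter could be fetched as a Form like this, assuming that URL is still reachable:

```smalltalk
"Download a PNG and decode it into a Form"
ZnEasy getPng: 'http://pharo.org/files/pharo-logo-small.png'.
```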
When you explore the implementation, you will notice that
ZnEasy uses a
ZnClient object internally.
A simple view of HTTP is: you request a resource and get a response back containing the resource. But even if the mechanics of HTTP did work, and even that is not guaranteed (see the next section), the response might not be what you expected.
HTTP defines a whole set of so-called status codes to describe various situations. These codes turn up as part of the status line of a response. The dictionary mapping numeric codes to their textual reason string is predefined.
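A small sketch of looking up that mapping, assuming it is exposed on the class side of ZnConstants as httpStatusCodes:

```smalltalk
"Look up the reason string that belongs to a numeric status code"
ZnConstants httpStatusCodes at: 404.
```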
A good overview can be found in the Wikipedia article List of HTTP status codes. The most common code,
the one that indicates success, is numeric code 200 with reason 'OK'. Have a look at the
testing protocol of
ZnResponse for how to interpret some of them.
So if you do an HTTP request and get something back, you cannot just assume that all is well. You first have to make sure that the call itself (more
specifically the response) was successful. As mentioned before, this is done by sending
isSuccess to the response or the client.
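A sketch of such a check follows. It assumes a test resource at the URL below that contains one number per line:

```smalltalk
| client |
client := ZnClient new.
client get: 'http://zn.stfx.eu/zn/numbers.txt'.
"Only interpret the contents when the response was successful"
client isSuccess
  ifTrue: [ client contents lines collect: [ :each | each asNumber ] ]
  ifFalse: [ self inform: 'Something went wrong' ].
```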
To make it easier to write better HTTP client code,
ZnClient offers some useful status handling methods in its API. You can ask the client to consider
non-successful HTTP responses as errors with the
enforceHttpSuccess: option. The client will then automatically signal a
ZnHttpUnsuccessful exception. This
is generally useful when the application code that uses Zinc handles errors.
Additionally, to install a local failure handler, there is the
ifFail: option. This will invoke a block, optionally passing an exception, whenever something
goes wrong. Together, this allows the above code to be rewritten as follows.
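A sketch of the rewritten version, under the same assumption about the test resource:

```smalltalk
"Treat non-successful responses as errors and handle any failure in one place"
ZnClient new
  enforceHttpSuccess: true;
  ifFail: [ :exception | self inform: 'Cannot get numbers: ', exception printString ];
  get: 'http://zn.stfx.eu/zn/numbers.txt'.
```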
Maybe it doesn't look like a big difference, but combined with some other options and features of
ZnClient that we'll see later on, the code does become more
elegant and more reliable at the same time.
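As a network protocol, HTTP is much more complicated than an ordinary message send. The famous Fallacies of Distributed Computing paper by Deutsch et al. eloquently lists the issues involved, phrased as false assumptions: the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology doesn't change; there is one administrator; transport cost is zero; the network is homogeneous.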
Zn will signal various exceptions when things go wrong, at different levels.
ZnClient and the underlying framework have constants, settings and options to
deal with various aspects related to these issues.
Doing an HTTP request-response cycle can take an unpredictable amount of time. Client code has to specify a timeout: the maximum amount of time to wait for a response, and be prepared for when that timeout is exceeded. Getting no answer within the specified timeout can mean that some networking component is extremely slow, but it could also mean that the server simply refuses to answer.
Setting the timeout directly on a
ZnClient is the easiest.
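For example, a one second timeout could be set as follows (the URL is an assumption):

```smalltalk
"Wait at most 1 second for each socket level operation"
ZnClient new
  timeout: 1;
  get: 'http://zn.stfx.eu/zn/small.html'.
```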
The timeout counts for each socket level connect, read and write operation, separately. You can dynamically redefine the timeout using the
ZnConnectionTimeout class, which is a DynamicVariable subclass.
Zn defines its global default timeout in seconds as a setting.
This setting affects most framework level operations, if nothing else is specified.
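The two mechanisms can be sketched as follows. This assumes the dynamic variable ZnConnectionTimeout and the defaultSocketStreamTimeout accessors on the class side of ZnNetworkingUtils; check both in your image before relying on them:

```smalltalk
"Redefine the timeout for all Zn operations executed inside the block"
ZnConnectionTimeout
  value: 5
  during: [ ZnClient new get: 'http://zn.stfx.eu/zn/small.html' ].

"Query and change the global default timeout, expressed in seconds"
ZnNetworkingUtils defaultSocketStreamTimeout.
ZnNetworkingUtils defaultSocketStreamTimeout: 60.
```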
During the execution of HTTP, various network exceptions, as subclasses of NetworkError, might be thrown. These will all be caught by the
ifFail: block when installed.
To deal with temporary or intermittent network or server problems,
ZnClient offers a retry protocol. You can set how many times a request should be retried
and how many seconds to wait between retries.
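A sketch with three attempts and a two second delay, using an assumed test URL:

```smalltalk
"Retry a failing request, waiting between attempts"
ZnClient new
  numberOfRetries: 3;
  retryDelay: 2;
  get: 'http://zn.stfx.eu/zn/small.html'.
```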
In the above example, the request will be tried up to 3 times, with a 2 second delay between attempts. Note that the definition of failure/success is broad: it includes for example the option to enforce HTTP success.
Zn uses ZnUrl objects to deal with URLs.
ZnClient also contains an API to build URLs. Let us revisit our initial example, using explicit URL construction.
Instead of giving a string argument to be parsed into a
ZnUrl, we now provide the necessary elements to construct the URL manually, by sending messages to
our ZnClient object. With
http we set what is called the scheme. Then we set the hostname. Since we don't specify a port, the default port for HTTP will
be used, port 80. Next we add path elements, extending the path one by one.
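The construction described above can be sketched like this; the host and path elements are assumptions that match the test page used earlier:

```smalltalk
"Build the URL piece by piece instead of parsing a string"
ZnClient new
  http;
  host: 'zn.stfx.eu';
  addPath: 'zn';
  addPath: 'small.html';
  get.
```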
A URL can also contain query parameters. Let's do a Google search as an example:
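A possible version of such a search request, assuming the search engine accepts a q query parameter on its /search path:

```smalltalk
"Add a query parameter; encoding of the space is handled for us"
ZnClient new
  http;
  host: 'www.google.com';
  addPath: 'search';
  queryAt: 'q' put: 'Pharo Smalltalk';
  get.
```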
Query parameters have a name and a value. Certain special characters have to be encoded. You can build the same URL with the
ZnUrl object, in several ways.
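One of these ways is sketched here, setting each URL component explicitly on a fresh ZnUrl:

```smalltalk
| url |
"Construct the URL without a client and without any network access"
url := ZnUrl new
  scheme: #http;
  host: 'www.google.com';
  port: 80;
  addPathSegment: 'search';
  queryAt: 'q' put: 'Pharo Smalltalk';
  yourself.
url printString.
```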
If you print the above expression, it gives you the printable representation of the URL.
This string version can easily be parsed again into a ZnUrl object:
'http://www.google.com/search?q=Pharo%20Smalltalk' asZnUrl.
'http://www.google.com:80/search?q=Pharo Smalltalk' asZnUrl.
Note how the
ZnUrl parser is forgiving with respect to the space, like most browsers would do. When producing an external representation, proper encoding
will take place. Please consult the class comment of
ZnUrl for a more detailed look at the capabilities of
ZnUrl as a standalone object.
In many web applications HTML forms are used. Examples are forms to enter a search string, a form with a username and password to log in or complex registration
forms. In the classic and most common way, this is implemented by sending the data entered in the fields of a form to the server when a submit button is clicked.
It is possible to implement the same behavior programmatically using ZnClient.
First you have to find out how the form is implemented by looking at the HTML code. Here is an example.
This form shows one text input field, preceded by a ‘Search for:’ label and followed by a submit button with ‘Go!’ as label. Assuming this appears on a page with the URL
http://www.search-engine.com/, we can implement the behavior of the browser when the user clicks the button, submitting or sending the form data to the server.
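A sketch of such a submission follows. The action name search-handler and the field name search-field are hypothetical; the real values have to be read from the HTML of the form:

```smalltalk
"Submit a url-encoded form with a single field using POST"
ZnClient new
  url: 'http://www.search-engine.com/search-handler';
  formAt: 'search-field' put: 'Pharo Smalltalk';
  post.
```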
The URL is composed by combining the URL of the page that contains the form with the action specified. There is no need to set the encoding of the request here
because the form uses the default encoding
application/x-www-form-urlencoded. By using the
formAt:put: method to set the value of a field, an entity of
ZnApplicationFormUrlEncodedEntity will be created if needed, and the field name/value association will be stored in it. When finally
post is invoked,
the HTTP request sent to the server will include a properly encoded entity. As far as the server is concerned, it will seem as if a real user submitted the form.
Consequently, the response should be the same as when you submit the form manually using a browser. Be careful to include all relevant fields, even the hidden ones.
There is a second type of form encoding called
multipart/form-data. Here, instead of adding fields, you add parts (instances of ZnMimePart).
The code to submit this form would then be as follows.
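A sketch of a multipart submission, again with a hypothetical action and field name:

```smalltalk
"Submit a multipart/form-data form by adding a part per field"
ZnClient new
  url: 'http://www.search-engine.com/search-handler';
  addPart: (ZnMimePart fieldName: 'search-field' value: 'Pharo Smalltalk');
  post.
```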
In this case, an entity of type
ZnMultiPartFormDataEntity is created and used. This type is often used in forms that upload files. Here is an example.
This would be the way to do the upload programmatically.
Sometimes, the form's submit method is GET instead of POST. In that case, just send
get instead of
post to the client. Note that this technique of sending form data to
a server is different than what happens with raw POST or PUT requests using a REST API. In a later subsection we will come back to this.
There are various techniques to add authentication, a mechanism to control who accesses which resources, to HTTP. This is orthogonal to HTTP itself. The simplest and most common form of authentication is called 'Basic Authentication'.
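With ZnClient, basic authentication can be sketched as follows. The credentials and URL are hypothetical:

```smalltalk
"Send a username and password using basic authentication"
ZnClient new
  username: 'john@hacker.com' password: 'trustno1';
  get: 'http://www.example.com/secret.txt'.
```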
That is all there is to it. If you want to understand how this works, look at how
ZnRequest>>#setBasicAuthenticationUsername:password: is implemented.
Basic authentication over plain HTTP is insecure because it transfers the username/password combination obfuscated by encoding it using the trivial Base64 encoding. When used over HTTPS, basic authentication is secure though. Note that when sending multiple requests while reusing the same client, authentication is reset for each request, to prevent the accidental transfer of sensitive data.
Basic authentication is not the same as a web application where you have to log in using a form. In such web applications, e.g. an online store that has a login part and a shopping cart per user, state is needed. During the interaction with the web application, the server needs to know that your requests/responses are part of your session: you log in, you add items to your shopping cart and you finally check out and pay. It would be problematic if the server mixed the requests/responses of different users. However, HTTP is by design a stateless protocol: each request/response cycle is independent. This principle is crucial to the scalability of the internet.
The most commonly used technique to overcome this issue, enabling the tracking of state across different request/response cycles, is the use of so-called cookies. Cookies are basically key/value pairs connected to a specific server domain. Using a special header, the server asks the client to remember or update the value of a cookie for a domain. On subsequent requests to the same domain, the client will use a special header to present the cookie and its value back to the server. Semantically, the server manages a key/value pair on the client.
As we saw before, a
ZnClient instance is essentially stateful. It not only tries to reuse a network connection but it also maintains a
ZnUserAgentSession object, which represents the session. One of the main functions of this session object is to manage cookies, just like your browser does.
ZnCookie objects are held in a
ZnCookieJar object inside the session object.
Cookie handling will happen automatically. This is a hypothetical example of how this might work, assuming a site where you have to log in before you are able to access a specific file.
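Such an interaction might be sketched like this. The site, the form field names and the credentials are all hypothetical:

```smalltalk
| client |
client := ZnClient new.
"Log in by posting a form; the server is expected to set a cookie"
client
  url: 'http://cloud-storage.com/login';
  formAt: 'username' put: 'john.doe@acme.com';
  formAt: 'password' put: 'trustno1';
  post.
"The same client presents the remembered cookie on the next request"
client get: 'http://cloud-storage.com/my-file'.
"Inspect the session object that holds the cookies"
client session.
```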
After the post, the server will presumably set a cookie to acknowledge a successful login. When a specific file is next requested from the same domain, the
client presents the cookie to prove the login. The server knows it can send back the file because it recognizes the cookie as valid. By sending session to
the client object, you can access the session object and then the remembered cookies.
A regular request for a resource is done using a GET request. A GET request does not send an entity to the server. The only way for a GET request to transfer information to the server is by encoding it in the URL, either in the path or in query variables. (To be 100% correct we should add that data can be sent as custom headers as well.)
HTTP provides for two methods (or verbs) to send information to a server. These are called PUT and POST. They both send an entity to the server in order to transfer data.
In the subsection about submitting HTML forms we already saw how POST is used to send either a
ZnApplicationFormUrlEncodedEntity or to send a
ZnMultiPartFormDataEntity containing structured data to a server.
Apart from that, it is also possible to send a raw entity to a server. Of course, the server needs to be prepared to handle this kind of entity coming in. Here are a couple of examples of doing a raw PUT and POST request.
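Three such requests are sketched below. The echo URL is an assumption; any server prepared to accept raw entities would do:

```smalltalk
"PUT a string; a text/plain string entity is created automatically"
ZnClient new
  put: 'http://zn.stfx.eu/echo' contents: 'Hello there!'.

"POST a byte array; a binary entity is created automatically"
ZnClient new
  post: 'http://zn.stfx.eu/echo' contents: #[0 1 2 3 4 5 6 7 8 9].

"POST an explicitly constructed XML entity"
ZnClient new
  url: 'http://zn.stfx.eu/echo';
  entity: (ZnEntity
    with: '<xml><object><id>42</id></object></xml>'
    type: ZnMimeType applicationXml);
  post.
```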
In the last example we explicitly set the entity to be XML and do a POST. In the first two examples, the convenience contents system is used to automatically create a
ZnStringEntity of the type
ZnMimeType textPlain, respectively a
ZnByteArrayEntity of the type ZnMimeType applicationOctetStream.
The difference between PUT and POST is semantic. POST is generally used to create a new resource inside an existing collection or container, or to initiate some
action or process. For this reason, the normal response to a POST request is to return the URL (or URI) of the newly created resource. Conventionally, the response
contains this URL both in the
Location header accessible via the message
location and in the entity part.
When a POST successfully created the resource, its HTTP response will be 201 Created. PUT is generally used to update an existing resource of which you know the exact URL (or URI). When a PUT is successful, its HTTP response will be just 200 OK and nothing else will be returned. When we discuss REST Web Service APIs, we will come back to this.
The fourth member of the common set of HTTP methods is DELETE. It is very similar to both GET and PUT: you just specify a URL of the resource that you want to delete or remove. When successful, the server will just reply with a 200 OK. That is all there is to it.
Certain HTTP based protocols, like WebDAV, use even more HTTP methods. These can be issued explicitly using the
method: setter and the execute operation.
An OPTIONS request does not return an entity, but only meta data that is included in the headers of the response. For such a request, the response headers typically contain
an extra header named
Allow which specifies the list of HTTP methods that may be used on the resource.
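An OPTIONS request as described above might be sketched like this; the target URL is an assumption:

```smalltalk
"Set the HTTP method explicitly and execute the request"
ZnClient new
  url: 'http://www.apache.org';
  method: #OPTIONS;
  execute;
  response.
```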
HTTP 1.1 defaults to keeping the client connection to a server open, and the server will do the same. This is useful and faster if you need to issue more than one request.
ZnClient implements this behavior by default.
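A sketch of several requests over one connection follows. It assumes a test server that answers random strings of a given size under a /random/ path:

```smalltalk
Array streamContents: [ :stream | | client |
  "One client, hence one reusable connection to the host"
  client := ZnClient new url: 'http://zn.stfx.eu'.
  1 to: 10 do: [ :each |
    "Change only the path and execute another GET"
    stream nextPut: (client path: '/random/', each asString; get) ].
  "Release the network connection when done"
  client close ].
```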
The above example sets up a client to connect to a specific host. Then it collects the results of 10 different requests, asking for random strings of a specific size. All requests will go over the same network connection.
Neither party is required to keep the connection open for a long time, as this consumes resources. Both parties should be prepared to deal with connections
closing; this is not an error.
ZnClient will try to reuse an existing connection and reconnect once if this reuse fails. The option connectionReuseTimeout
limits the maximum age for a connection to be reused.
Note how we also close the client using the message
close. A network connection is an external resource, like a file, that should be properly closed after use.
If you don't do that, it will get cleaned up eventually by the system, but it is more efficient to do it yourself.
In many situations, you only want to do one single request. HTTP 1.1 has provisions for this situation. The beOneShot option of
ZnClient will do just that.
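A minimal sketch of a one shot request, using an assumed test URL:

```smalltalk
"Announce a single request so that the connection is closed after use"
ZnClient new
  beOneShot;
  get: 'http://zn.stfx.eu/zn/numbers.txt'.
```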
With the beOneShot option, the client notifies the server that it will do just one request and both parties will consequently close the connection after use,
automatically. In this case, an explicit close of the
ZnClient object is no longer needed.
Sometimes when requesting a URL, an HTTP server will not answer immediately but redirect you to another location. For example, Seaside actually does this on each
request. This is done with a 301 or 302 response code. You can ask a
ZnResponse whether it's a redirect with
isRedirect. In case of a redirect response, the
Location header will contain the location the server redirects you to. You can access that URL using location.
ZnClient will follow redirects automatically for up to 3 redirects. You won't even notice unless you activate logging. If for some reason you
want to disable this feature, send a
followRedirects: false to your client. To modify the maximum number of redirects that could be followed, use maxNumberOfRedirects:.
Following redirects can be tricky when PUT or POST are involved. Zn implements the common behavior of changing a redirected PUT or POST into a GET while dropping the body entity. Cookies will be resubmitted. Zn also handles relative redirect URLs, although these are not strictly part of the standard.
A client that already requested a resource in the past can also ask a server if that resource has been modified, i.e. is newer, since it was last requested. If not,
the server will give a quick 304 Not Modified response without sending the resource over again. This is done by setting the If-Modified-Since request header.
This works both for regular requests as well as for downloads.
For this to work, the server has to honor this particular protocol interaction, of course.
Asking for a resource with a certain mime-type does not mean that the server will return something of this type. The extension at the end of a URL has no real
significance, and the server might have been reconfigured since you last asked for this resource. For example, asking for variants of a URL such as
http://example.com/foo.text, with or without a file extension, could all return the same resource or all different ones, and this may change over time. This is why HTTP resources
(entities) are accompanied by a content-type: a mime-type that is an official, cross-platform definition of a file or document type or format. Again, see the
Wikipedia article Internet media type for more details.
Zn models mime-types using its
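ZnMimeType object which has 3 components: a main type (for example text or image), a sub type (for example plain, html or png) and optional parameters (for example charset=utf-8).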
The class side of
ZnMimeType has some convenience methods for accessing well known mime-types, for example:
Note that for textual (non-binary) types, the encoding defaults to UTF-8, the prevalent internet standard. Creating a
ZnMimeType object is also as easy as sending
asZnMimeType to a String.
The subtype can be a wildcard, indicated by a
*. This allows for matching.
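The following sketch pulls these mime-type features together. It needs no network access:

```smalltalk
| html text |
"A well known mime-type from the class side"
html := ZnMimeType textHtml.
"Parse a mime-type from its string form"
text := 'text/plain;charset=utf-8' asZnMimeType.
"Access the components"
html main.
html sub.
text charSet.
"Match a concrete type against a wildcard sub type"
html matches: 'text/*' asZnMimeType.
```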
With ZnClient you can set the accept request header to indicate what you as a client expect, and optionally enforce that the server returns the type you asked for.
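A sketch of asking for plain text and enforcing it, using an assumed test resource:

```smalltalk
"Ask for text/plain and fail when the server answers another type"
ZnClient new
  enforceAcceptContentType: true;
  accept: ZnMimeType textPlain;
  get: 'http://zn.stfx.eu/zn/numbers.txt'.
```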
The above code indicates to the server that we want a
text/plain type resource by means of the
Accept header. When the response comes back and it is not
of that type, the client will raise a
ZnUnexpectedContentType exception. Again, this will be handled by the
ifFail: block, when specified.
HTTP meta data, both for requests and for responses, is specified using headers. These are key/value pairs, both strings. A large number of predefined headers
exist, see the Wikipedia article List of HTTP header fields. The exact semantics of each header, especially their value, can be very
complicated. Also, although headers are key/value pairs, they are more than a regular dictionary. There can be more values for the same key and keys are often
written using a canonical capitalization, like Content-Type.
HTTP provides for a way to do a request, just like a regular GET but with a response that contains only the meta data, the status line and headers, but not the actual resource or entity. This is called a HEAD request.
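A HEAD request could be sketched as follows, with an assumed test URL:

```smalltalk
"Ask only for the status line and headers of a resource"
ZnClient new
  head: 'http://zn.stfx.eu/zn/small.html';
  response.
```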
Since there is no content, we have to look at the
headers of the response object. Note that the content-type and content-length headers will be set, as if
there was an entity, although none is transferred.
ZnClient allows you to easily specify custom headers for which there is not yet a predefined accessor, which is most of them. At the framework level,
ZnRequest and ZnResponse offer some more predefined accessors, as well as a way to set and query any custom header by accessing their headers sub object.
The following are all equivalent:
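As a sketch, here are several ways to set the Accept header of a request. None of them touches the network:

```smalltalk
| client |
client := ZnClient new.
"Predefined accessor on the client"
client accept: 'text/*'.
"Predefined accessor on the request"
client request setAccept: 'text/*'.
"Direct access to the headers object; keys are not case sensitive"
client request headers at: 'Accept' put: 'text/*'.
client request headers at: 'ACCEPT' put: 'text/*'.
client request headers at: 'accept' put: 'text/*'.
"Generic header setter on the client"
client headerAt: 'accept' put: 'text/*'.
```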
Once a request is executed, you can query the response headers like this:
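As mentioned before, requests and responses
(ZnRequests and ZnResponses) can hold an optional
ZnEntity as body. By now we used almost all concrete subclasses of ZnEntity: ZnStringEntity, ZnByteArrayEntity, ZnApplicationFormUrlEncodedEntity, ZnMultiPartFormDataEntity and ZnStreamingEntity.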
Like all other fundamental Zn domain model objects, these can be and are used by both clients and servers. All
ZnEntities have a content type (a mime-type) and
a content length (in bytes). Their basic behavior is that they can be written to or read from a binary stream. All but the last one are classic, in-memory objects.
ZnStreamingEntity is special: it contains a read or write stream to be used once in one direction only. If you want to transfer a 10 MB file, using a normal
entity, this would result in the 10 MB being taken into memory. With a streaming entity, a file stream is opened to the file, and the data is then copied using
a buffer of a couple of tens of KB. This is obviously more efficient. The limitation is that this only works if the exact size is known upfront.
Knowing that a
ZnStringEntity has a content type of XML or JSON is however not enough to interpret the data correctly. You might need a parser to convert the
representation to Smalltalk or a writer to convert Smalltalk into the proper representation. That is where the contentReader and
contentWriter options are useful.
If the content reader is nil (the default),
contents will return the
contents of the response object, usually a String or a ByteArray.
To customize the content reader, you specify a block that will be given the incoming entity and that is then supposed to parse the incoming representation, for example as below:
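A sketch of a content reader that parses one number per line. The URL and the format of the resource are assumptions:

```smalltalk
"Parse the incoming text entity into a collection of numbers"
ZnClient new
  systemPolicy;
  url: 'http://zn.stfx.eu/zn/numbers.txt';
  accept: ZnMimeType textPlain;
  contentReader: [ :entity |
    entity contents lines collect: [ :each | each asNumber ] ];
  get.
```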
In this example,
get (which returns the same as
contents) will no longer return a
String but a collection of numbers. Note also that by using
systemPolicy in combination with an
accept: we handle most error cases before the content reader starts doing its work, so it no longer has to check for good incoming
data. In any case, when the
contentReader throws an exception, it can be caught by the ifFail: block.
If the content writer is nil (the default),
contents: will take a Smalltalk object and pass it to the ZnEntity
with: instance creation method.
This will create either a text/plain
String entity or an application/octet-stream ByteArray entity.
You could further customize the entity by sending
contentType: with another mime type. Or you could completely skip the
contents: mechanism and supply your own entity to entity:.
To customize the content writer, you need to pass a one-argument block to the
contentWriter: message. The block should create and return an entity. A theoretical example is given next.
Assuming there is a web service at
http://internet-calculator.com where you can send numbers to, we send a whitespace separated list of numbers to its sum URI
and expect a number back. Exceptions occurring in the content writer can be caught with the ifFail: block.
Often, you want to download a resource from some internet server and store its contents in a file. The well known curl and wget Unix utilities are often used to
do this in scripts. There is a handy convenience method in
ZnClient to do just that.
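A download might be sketched like this, with an assumed test URL and a relative file name:

```smalltalk
"Download a resource and stream it straight into a file"
ZnClient new
  url: 'http://zn.stfx.eu/zn/numbers.txt';
  downloadTo: 'numbers.txt'.
```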
The example will download the URL and save it in a file named
numbers.txt next to your image. The argument to
downloadTo: can be a FileReference or
a path string, designating either a file or a directory. When it is a directory, the last component of the URL will be used to create a new file in that directory.
When it is a file, that file will be used as given. Additionally, the
downloadTo: operation will use streaming so that a large file will not be taken into
memory all at once, but will be copied in a loop using a buffer.
The inverse, uploading the raw contents of a file, is just as easy thanks to the convenience method
uploadEntityFrom:. Given a file reference or a path string, it
will set the current request entity to a
ZnStreamingEntity reading bytes from the named file. The content type will be guessed based on the file
name extension. If needed you can next override that mime type using
contentType:. Here is a hypothetical example uploading the contents of the file
numbers.txt using a POST to the URL specified, again using an efficient streaming copy.
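Such an upload could be sketched as follows. The target URL is hypothetical and the file is assumed to exist next to the image:

```smalltalk
"Stream the contents of a local file to the server using POST"
ZnClient new
  url: 'http://cloud-storage.com/files/numbers.txt';
  uploadEntityFrom: 'numbers.txt';
  post.
```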
Some HTTP operations, particularly those involving large resources, might take some time, especially when slower networks or servers are involved. During
interactive use, Pharo Smalltalk often indicates progress during operations that take a bit longer.
ZnClient can do that too using the signalProgress option.
By default this is off. Here is an example.
To handle its large set of options,
ZnClient implements a uniform, generic option mechanism using the optionAt:put: and optionAt:ifAbsent: methods
(this last one always defines an explicit default), storing them lazily in a dictionary. The method category
options includes all accessors to actual settings.
Options are generally named after their accessor; a notable exception is
beOneShot. For example, the timeout option has a getter named
timeout and a setter named
timeout:. The getter's implementation defines the default value.
The set of all option defaults defines the default policy of
ZnClient. For certain scenarios, there are policy methods that set several options at once. The
most useful one is called
systemPolicy. It specifies good practice behavior for when system level code does an HTTP call:
Also, in some networks you do not talk to internet web servers directly, but indirectly via a proxy. Such a proxy controls and regulates traffic. A proxy can improve performance by caching often used resources, but only if there is a sufficiently high hit rate.
Zn client functionality will automatically use the proxy settings defined in your Pharo image. The UI to set a proxy host, port, username or password can be
found in the Settings browser under the Network category. Accessing localhost will bypass the proxy. To find out more about Zn's usage of the proxy settings,
start by browsing the
proxy method category of ZnNetworkingUtils.
Zinc is a solid and very flexible HTTP library. This chapter only presented the client-side of Zinc i.e. how to use it to send HTTP requests and receive responses back. Through several code examples, we demonstrated some of the possibilities of Zinc and also its simplicity. Zinc relies on a very good object-centric decomposition of the HTTP concepts. It results in an easy to understand and extensible library.