October 21, 2014

Hosting Web Applications

Web frameworks of all languages include a way to run an application in the browser during development.

Things get more complicated when we want to host an application. We can't just run python app.py or php -S 0.0.0.0:80 on a server used for real traffic and hope for the best!

This article will cover what goes into hosting a web application on a real server.

A "real server" in this context is a production server. However it could be any remote server, whether for development, staging, or production.

Three Actors

Hosting a web application requires the orchestration of three actors:

  1. The Application
  2. The Gateway
  3. The Web Server

Here's the general flow of a web request into an application. We'll discuss this flow going from right to left.

application gateway request flow

Applications & HTTP Interfaces

Web Applications are generally coded using a framework or suite of libraries. These typically have tooling to handle HTTP requests.

Libraries created to accept and translate HTTP requests are referred to as HTTP Interfaces. These accept requests and translate them for application code.

For example, Python has the WSGI specification. This specifies an interface between web servers and Python applications.

A popular implementation of WSGI is Werkzeug. This is a Python library that can accept and parse WSGI-compliant web requests.

Similarly, Ruby has Rack. Rack is a specification and library that can accept Rack-compliant web requests.

Most languages have HTTP interfaces available. Python and Ruby have HTTP interfaces added on via specifications and libraries. However, many newer languages include HTTP interfaces as part of their standard library.

NodeJS and Golang are two examples of languages that can listen for HTTP requests "out of the box". HTTP requests can be given and parsed without needing libraries or specifications.

PHP, notably, is lacking such a specification. However, there are talks of PHP-FIG defining such an interface in PSR-7.

In any case, web applications must incorporate a way to accept web requests and return valid responses.

We've discussed how languages specify how to accept web requests. Next, we'll discuss how application gateways translate HTTP requests.

The application request flow and its interaction with a gateway is pictured here.

application gateway request flow

The Gateway

Gateways sit between a web server (Apache, Nginx) and a web application. They accept requests from a web server and translate them for a web application.

Unfortunately, gateways typically don't label themselves as such. The exact definition of a gateway is somewhat fluid.

Some call themselves HTTP servers. Some consider themselves process managers. Others are more of a platform, supporting multiple use cases, protocols, and programming languages.

It might be useful to describe what gateways do rather than pin down an exact definition. Some common functionality of gateways include:

  1. Listen for requests (HTTP, FastCGI, uWSGI and more)
  2. Translate requests to application code
  3. Spawn multiple processes and/or threads of applications
  4. Monitor spawned processes
  5. Load balance requests between processes
  6. Reporting/logging

A gateway's main purpose is usually to translate requests. It's also common for a gateway to control application processes and threads.

We'll concentrate on the translation of requests.

Consider a gateway receiving a request meant for a Python application. The gateway will translate the request into a WSGI-compliant request.

It's the same for a gateway receiving a request for a Rack application. The way gateway will translate the request into a rack-compliant request.

Of course, in order for a gateway to translate a request, they must first receive one.

PHP-FPM, the gateway for PHP, is an implementation of FastCGI. It will listen for FastCGI requests from a web server.

Many gateways can accept HTTP requests directly. uWSGI, Gunicorn, and Unicorn are examples of such gateways.

Other protocols are also often supported. For example, uWSGI will accept HTTP, FastCGI and uwsgi (lowercase, the protocol) requests.

Don't confuse Python's PEP 3333 (WSGI) specification with uWSGI's protocol "uwsgi".

WSGI is Python specification for handling web requests. uWSGI can translate a request to be compatible with WSGI applications.

Similarly, uWSGI has it's own specification called "uwsgi". This specifies how other clients (web servers) can communicate with uWSGI!

A web server such as Nginx can "speak" uwsgi in order to communicate with the gateway uWSGI. uWSGI, in turn, can translate that request to WSGI in order to communicate with an application. The application will accept the WSGI-compliant request for an application. The Werkzeug library is capable of reading such a WSGI request.

No matter what protocol is used, gateways can accept a request and translate it to speak a web application's "language".

The following gateways will translate requests for WSGI (Python) applications:

  • Gunicorn
  • Tornado
  • Gevent
  • Twisted Web
  • uWSGI

The following gateways will translate requests to Rack (Ruby) applications:

  • Unicorn
  • Phusion Passenger
  • Thin
  • Puma

A modern way to run PHP applications is to use the PHP-FPM gateway. PHP-FPM will listens for FastCGI connections.

Users of HHVM can use the included FastCGI server to listen for web requests. It acts much like PHP-FPM.

Before PHP-FPM, PHP was commonly run directly in Apache. A Gateway was not used. Instead, Apache's PHP module loaded PHP directly, allowing PHP to be run inline of any files processed.

This is still a common way to run PHP.

Skipping Gateways

I mentioned above that some languages include HTTP interfaces in their standard library.

Applications built in such languages can skip the use of gateways. In that scenario, a web server will send HTTP requests directly to the application.

Such applications can still benefit from the use of a gateway. For example, NodeJS applications. Node's asynchronous model allows it to run efficiently as a single-process. However, you may want to use multiple processes on multi-core servers.

A NodeJS gateway such as PM2 could manage multiple processes. This would allow for more concurrent application requests to be handled.

Gateways aren't necessarily language-specific! For example, uWSGI, Gunicorn and Unicorn have all been used with applications of various languages.

You'll find that tutorials often match a gateway with applications of a specific language. This isn't a hard rule. In fact, uWSGI is written in C rather than Python!

This is why specifications exist. They allow for language-agnostic implementations.

The gateway request flow described is pictured here. We've discussed the flow from a web server to the gateway and from the gateway to the application.

application gateway request flow

The Web Server

Web servers excel at serving requested files, but usually serve other purposes as well.

Popular web-server features include:

  • Hosting multiple sites
  • Serving static files
  • Proxying requests to other processes
  • Load balancing
  • HTTP caching
  • Streaming media

Here we're concerned with the web server's ability to act as a (reverse) proxy.

The web server and the gateway are middlemen between the client and an application. The web server accepts a request and relays it to a gateway, which in turn translates it to the application. The response is relayed back, finally reaching the client.

We've briefly discussed how a gateway can accept a request and translate it for an application. We'll get in a little more detail here.

As mentioned, a web server will translate an HTTP request to something a gateway can understand. Gateways listen for requests using various protocols.

Some gateways can listen for HTTP connections. In this case, the web server can relay the HTTP request to the gateway directly.

Other gateways listen for FastCGI or uwsgi connections. Web servers which support these protocols must translate an HTTP request to those protocols.

Nginx and Apache can both "speak" HTTP, uwsgi, and FastCGI. These web servers will accept an HTTP request and relay them to a gateway in whichever protocol the gateway needs.

More specifically, the web server will translate a request into whatever you configure it to use. It's up to the developer/sysadmin to configure the web server correctly.

The web server request flow described is the flow from the client to the web server and from the web server to the gateway.

application gateway request flow

PHP is Special

PHP is unique in that it's a language built specifically for the web.

Most other languages are either general purpose or do not concentrate on the web. As a result, PHP is fairly different in how it handles HTTP requests.

PHP was originally built under the assumption that it is run during an HTTP request. It contained no process for converting the bytes of an HTTP request into something for code to handle. By the time the code was run, that was dealt with.

PHP is no longer limited to being run within context of an HTTP request. However, running PHP outside of a web request was an evolution of PHP, not a starting point.

Conversely, other languages have a process of translating an HTTP request into code. This usually means parsing the bytes constituting HTTP request data.

Many libraries have been created as a language add-on to handle HTTP requests. Some newer languages can handle HTTP requests directly. These languages don't assume code is always run in the context of an HTTP request, however.

This process in PHP is roughly equivalent to using cURL (or perhaps the Guzzle package) to accept web requests.

On a practical level, this means that PHP's super globals ($_SERVER, $_GET, $_POST, $_SESSION, $_COOKIE and so on) are already populated by the time your PHP code is run. PHP doesn't need to do work to get this data.

PHP-FPM, the gateway for PHP, takes a web request's data and fills in the PHP super globals. It sets up the state of the request (the environment) before running code.

PHP never needed a specification or library for accepting bytes of data and converting it to a request. It's already done by the time PHP code is run! This is great for simplicity.

Modern applications, however, are not simple. More structured applications ("Enterprise") often need to deal with HTTP in great detail. Perhaps we want to encrypt cookies or set cache headers.

This often requires us to write objects describing HTTP. These objects are populated by PHP's environment/state. These objects then adjust HTTP-related state, applying correct HTTP business logic and constraints.

Under the hood, such libraries use PHP's built in HTTP-related functions to read in a request and create a response.

Popular examples of these libraries includes Symfony's HTTP libraries, and the possible PSR-7 standard.

An HTTP interface can help standardize and encapsulate HTTP concerns as well. This is great for testing in PHP. Super globals and managing state can often lead to issues in testing and code quality.

For the end-user (developer), that's a hidden difference between PHP and other languages.

Languages like Python and Ruby have a specification for how raw HTTP information should be sent. It's up to frameworks and libraries to handle HTTP concerns. Code manipulating HTTP response data must be written to send responses in the proper HTTP format.

PHP frameworks simply manipulate HTTP request state given. For HTTP responses, PHP has built-in methods to let you adjust HTTP headers and data.

PHP developers don't need to concern themselves with how data gets translated into a HTTP response. Instead, that's a concern for the internals team.

All Topics