posted last year in Dev Platform category by Esen Sagynov
Apache, which had been at the forefront of the web's revival, is now giving way to Nginx. Nginx is spreading quickly worldwide, and at NHN we have also been replacing more and more of our web servers with Nginx.
Let’s look at what Nginx is and how it differs from Apache Web Server.
At the beginning of the 21st century, as internet use grew, people became more interested in web servers capable of processing more requests. This era gave rise to the C10k problem: enabling a single web server to handle 10,000 clients at the same time. This created a need for better network I/O and thread management technology. During this period, technologies far more advanced than their predecessors, such as epoll and NPTL, were introduced to Linux.
epoll was designed to handle I/O events on Linux more efficiently. kqueue (FreeBSD) and IOCP (Windows) are similar. NPTL is a Linux thread library included in kernel 2.6 and later (counting stable versions). It brought an enormous improvement in thread performance.
It is fair to say that these technologies did not emerge from an effort to tackle the C10k problem as a common problem; rather, "The C10k problem" succeeded in bringing together efforts to improve the performance of network servers. As a result, various web servers built on the new technology, such as Nginx, lighttpd and Cherokee, came onto the market.
Nginx was first developed by a Russian engineer, Igor Sysoev, in 2002. Two years later, in 2004, it made its debut as an HTTP server and a reverse proxy/IMAP/POP3 server. To address the C10k problem, Nginx adopted an event-driven (asynchronous) structure rather than the conventional approach (as in Apache) of processing a single client in a single thread. Besides Nginx, various newly developed web servers such as lighttpd, Tornado, Magnum and Aleph have adopted an event-driven structure too.
Event-driven Architecture (EDA)
The EDA approach can handle more clients with fewer threads than the conventional approach (one thread per client). In Apache, each thread is dedicated to handling one client. As a result, a thread often has to wait on I/O, from the moment the connection is accepted until the data has been read from the persistence layer and delivered to the client. There must be as many threads as there are clients.
With EDA, however, each connection is assigned a state, and every event (for example, a new request arriving or a response being completed) is processed as it occurs. In other words, the CPU can be used more efficiently with fewer threads (or even without any additional threads).
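The contrast with one thread per client can be sketched in a few lines of Python. This is a minimal illustration, not Nginx's implementation: the function name `serve_clients` is made up for this example, and `socket.socketpair()` stands in for accepted TCP connections so that the sketch is self-contained. A single thread waits on all sockets at once and handles whichever one becomes readable.

```python
import selectors
import socket


def serve_clients(client_count: int = 3) -> dict:
    """Answer `client_count` clients from one thread using a single selector."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    peers = []
    for i in range(client_count):
        # socketpair() stands in for an accepted TCP connection (an assumption
        # made only to keep this sketch self-contained).
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=i)
        client_side.sendall(f"hello from client {i}".encode())
        peers.append(client_side)

    pending = client_count
    while pending:
        # The only place the thread waits: until some socket is readable.
        for key, _mask in sel.select(timeout=1.0):
            conn = key.fileobj
            request = conn.recv(1024)    # reported ready, so this does not block
            conn.send(request.upper())   # tiny reply fits in the send buffer
            sel.unregister(conn)
            conn.close()
            pending -= 1
    sel.close()

    replies = {}
    for i, peer in enumerate(peers):
        replies[i] = peer.recv(1024).decode()
        peer.close()
    return replies


if __name__ == "__main__":
    print(serve_clients())
```

No thread is ever parked on a single client here; the number of threads stays at one regardless of `client_count`.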
Nginx provides almost all the functions Apache web server does. For example:
- handling of static files
- reverse proxy
- load balancing
- SSL support
- Virtual Host
- FLV Streaming
- MP4 streaming
- Web page access authentication
- URL Rewriting
- Custom Logging
According to research by W3Techs.com in April 2011, Nginx holds 6.9% of the web server market and is growing steadily. The graph below shows Nginx's market share over the past year.
We can see that many websites process massive traffic using Nginx. Alexa.com is well known for publishing website rankings by traffic. The following list, released on April 18, 2011, shows the websites in the Top 500 that use Nginx.
Characteristics of Nginx
Non-blocking event-driven method
The most notable characteristic of Nginx is its non-blocking, event-driven method. All network connections operate in a non-blocking way. Some socket interfaces return immediately, while others may block in certain cases. socket() and setsockopt() return without blocking, whereas connect(), send(), recv() and close() may block in certain cases. Nginx calls these functions only after confirming that they will not block. To prevent connect() from blocking, the socket is switched to non-blocking mode in advance using ioctl().
For send() and recv(), epoll is used to confirm that the call will not block. The code that calls send() or recv() is written in an event-driven style. Each event consists of a socket, a socket state and a handler function. Almost all of Nginx's internals work in an event-driven way, since a web server processes requests from web browsers over the network.
Nginx runs with a preset number of worker processes. Each process runs as a single thread. The need for multithreading is relatively low, since the non-blocking event-driven method allows a single worker process to handle requests from multiple clients.
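The pattern described above (switch the socket to non-blocking mode before connect(), then call send() and recv() only once readiness has been confirmed) can be sketched as follows. This is an illustration under stated assumptions rather than Nginx's actual code: the `Event` class and all handler names are hypothetical, Python's `setblocking(False)` plays the role that ioctl() plays in the text, and the `selectors` module stands in for direct epoll calls. The sketch connects to a local echo listener that it creates itself.

```python
import errno
import selectors
import socket
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    """A socket, its current state, and the function to run when it is ready."""
    sock: socket.socket
    state: str
    handler: Callable[["Event"], None]


def nonblocking_roundtrip(payload: bytes = b"ping") -> bytes:
    sel = selectors.DefaultSelector()
    replies = []

    def on_accept(ev: Event) -> None:
        conn, _addr = ev.sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, Event(conn, "reading", on_echo))

    def on_echo(ev: Event) -> None:
        ev.sock.send(ev.sock.recv(1024))  # readable, so recv() returns at once
        sel.unregister(ev.sock)
        ev.sock.close()

    def on_connected(ev: Event) -> None:
        # Writable after a non-blocking connect: check how the attempt ended.
        err = ev.sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, "connect failed")
        ev.sock.send(payload)
        ev.state, ev.handler = "reading", on_reply
        sel.modify(ev.sock, selectors.EVENT_READ, ev)

    def on_reply(ev: Event) -> None:
        replies.append(ev.sock.recv(1024))
        ev.state = "done"
        sel.unregister(ev.sock)
        ev.sock.close()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ,
                 Event(listener, "listening", on_accept))

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.setblocking(False)             # switch modes before connect()
    rc = client.connect_ex(listener.getsockname())
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        raise OSError(rc, "connect failed")
    client_ev = Event(client, "connecting", on_connected)
    sel.register(client, selectors.EVENT_WRITE, client_ev)

    for _ in range(100):                  # bounded loop so the sketch always ends
        if client_ev.state == "done":
            break
        for key, _mask in sel.select(timeout=1.0):
            key.data.handler(key.data)    # dispatch on the stored event

    sel.unregister(listener)
    listener.close()
    sel.close()
    return replies[0] if replies else b""
```

Calling `nonblocking_roundtrip(b"ping")` should return the echoed bytes. The loop never calls a socket function directly; it only dispatches to whichever handler is attached to the ready socket, and handlers advance the state by swapping in the next handler.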
Advantages of Nginx
Nginx is generally known to perform better than Apache. However, we cannot simply conclude that it is faster, since performance varies greatly with workload. The table below shows the system calls made to serve a single favicon.ico, traced using strace. As the table shows, Apache made 31 system calls whereas Nginx made 16. Although the number of system calls cannot accurately indicate performance, it still gives a sense of relative performance.
Handling many clients with fewer processes
Disadvantages of Nginx
Difficult to create modules
Does not support HTTP/1.1 in communication with the backend
Things to note while using Nginx
- Nginx has a non-blocking event-driven structure. Such a structure suits network transmission and reception. However, it can block when files are read or written. Recent Linux versions support asynchronous I/O (AIO), and Nginx has supported AIO since version 0.8.11.
- Nginx uses AIO through the eventfd() system call of Linux. However, it only works when Linux AIO is used with DirectIO. Thus, using AIO on Linux bypasses the OS buffer cache.
- When the same file is read over and over, the first read comes from disk, but from the second read onward the file can be served quickly from the buffer cache. With DirectIO, the file is read from disk every time, no matter how many times it is read.
- If an application such as a DBMS maintains its own buffer, DirectIO can be efficient. In other cases, however, DirectIO can decrease file I/O performance.
- If most of the files Nginx reads are in the OS buffer cache, most file I/O is effectively non-blocking. However, if Nginx has to serve files that are not in the buffer cache, the worker process blocks on file I/O. In this case, you need to increase the number of worker processes so that a blocked worker does not stall the other requests.
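To see why adding worker processes helps when file I/O blocks, here is a deliberately simplified model in Python. It is not a measurement of Nginx. It assumes that all requests arrive at the same moment, that each worker handles one request at a time while blocked, and that the millisecond costs are invented for illustration (one cache miss followed by three cached files). The function name `finish_times` is hypothetical.

```python
import heapq


def finish_times(service_ms, workers):
    """Completion time of each request when a worker handles one request at a time."""
    free_at = [0] * workers             # time at which each worker becomes free
    heapq.heapify(free_at)
    done = []
    for cost in service_ms:
        start = heapq.heappop(free_at)  # earliest available worker takes the request
        end = start + cost
        heapq.heappush(free_at, end)
        done.append(end)
    return done


# Hypothetical costs: one cache miss (10 ms of disk I/O), then three cached files (1 ms each).
requests = [10, 1, 1, 1]
print(finish_times(requests, workers=1))  # [10, 11, 12, 13]
print(finish_times(requests, workers=2))  # [10, 1, 2, 3]
```

With one worker, every cached request waits behind the disk read. With two, the second worker serves the cached files while the first is blocked, which is the effect the last bullet describes.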