Written by Jaehong Kim on 06/16/2017

The previous blog article covered Vert.x, a Java application framework that offers a noticeable performance advantage over competing technologies and supports multiple programming languages. That article explained the philosophy of Vert.x, compared its performance with Node.js, described its internal structure, and more. Today, I would like to continue the conversation and talk more about the Vert.x architecture.

Considerations Used to Develop Vert.x

Polyglot support is the feature that makes Vert.x stand out from other server frameworks. In the past, server frameworks could not support multiple languages. Supporting several languages does more than expand the range of users. More importantly, services written in different languages can communicate with each other easily in a distributed environment. Of course, supporting a variety of languages is not sufficient on its own for a distributed environment. Higher-priority essentials include an address system and a message bus, and the Vert.x framework provides both. Because Vert.x provides these functions in addition to polyglot support, its benefits are worth considering for a distributed environment.

As Vert.x aims to be a universal server framework, a variety of workloads must be considered, including unusual cases that differ from those of Nginx, which is typically used as a Web server, or Node.js. The goal is to build universal server applications that process a variety of protocols other than HTTP (i.e., not just a Web server that executes simple operations with scalability in a 3-tier environment in mind). To accomplish this, Vert.x provides an additional thread pool while using the Run Loop method.

We will discuss the Vert.x architecture, starting with the thread pool and then the considerations for a distributed environment.

Run Loop and Thread Pool

Vert.x and other asynchronous server applications (or frameworks), including Nginx and Node.js, use the Run Loop method. Vert.x uses the term 'Event Loop' instead of 'Run Loop'. However, as Run Loop is the more familiar term among some developers, I use it here. A Run Loop, as you can guess from the name, is a method that checks for new events in an infinite loop and calls an event handler when an event has been received. As such, 'asynchronous server application' and 'event-based server application' are different terms for the same thing, similar to 'enharmonic' notes in music. To use the Run Loop method, all I/O must be managed as events.
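The loop described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not how Vert.x or Netty implement it: the class name RunLoopSketch, the event type strings, and the "stop" sentinel are all hypothetical, and a real loop would wait on sockets rather than on an in-memory queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical single-threaded run loop; not the Vert.x implementation. */
final class RunLoopSketch {
    private static final String STOP = "stop"; // assumed sentinel type that ends the loop

    /** An event is a type (used to look up the handler) plus a payload. */
    private static final class Event {
        final String type;
        final String payload;

        Event(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    /** Registers the handler to call for one event type. */
    void on(String type, Consumer<String> handler) {
        handlers.put(type, handler);
    }

    /** Enqueues an event; safe to call from any thread. */
    void post(String type, String payload) {
        queue.add(new Event(type, payload));
    }

    /** Waits for the next event, calls its handler, and repeats until STOP arrives. */
    void run() {
        try {
            while (true) {
                Event event = queue.take();
                if (STOP.equals(event.type)) {
                    return;
                }
                Consumer<String> handler = handlers.get(event.type);
                if (handler != null) {
                    handler.accept(event.payload);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Posts two events and returns what the handlers recorded, in order. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        RunLoopSketch loop = new RunLoopSketch();
        loop.on("http.request", payload -> log.add("request " + payload));
        loop.on("db.response", payload -> log.add("response " + payload));
        loop.post("http.request", "GET /users");
        loop.post("db.response", "3 rows");
        loop.post(STOP, "");
        loop.run();
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Both the HTTP request and the database response arrive as events on the same queue, so a single thread handles them one after the other without blocking on either.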

For example, imagine a general Web server application that queries a database to respond to an HTTP request from a Web browser. The CPU of the Web server is used while a thread parses the HTTP request, executes the proper business logic, and creates a query statement. However, the CPU is not used while the thread sends the query to the database and waits for a response. If one thread is created per HTTP request (Thread per Connection), another thread can process a task that requires the Web server CPU while the first thread waits for the database response. As a result, the Web server CPU stays busy processing HTTP requests. As you know, the weakness of Thread per Connection is the cost of context switching at the kernel level, since many threads must be created. This cost is waste.

The asynchronous event handling method can overcome this weakness (figuratively speaking, 'asynchronous event handling' is the 'purpose' and 'Run Loop' is the 'means'). If the HTTP request itself and the receipt of a response from the database are both treated as events, and the Run Loop calls the corresponding event handler whenever an event is received, the application performs better because unnecessary context switching is avoided. To utilize the CPU efficiently in this way, the number of Run Loops should equal the number of cores (i.e., one thread should be created per core, and each thread should run a Run Loop).
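As a rough sketch of the "one Run Loop per core" rule, the following Java snippet starts one single-threaded executor per core and spreads events over them round-robin. The class LoopPerCoreSketch and the round-robin assignment are assumptions made for illustration. The source does not say how Vert.x distributes events across its loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one single-threaded loop per core. Not Vert.x code. */
final class LoopPerCoreSketch {

    /** Spreads events round-robin over the loops and returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int loopCount, int eventCount) {
        List<ExecutorService> loops = new ArrayList<>();
        for (int i = 0; i < loopCount; i++) {
            loops.add(Executors.newSingleThreadExecutor()); // one thread per loop
        }
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        for (int e = 0; e < eventCount; e++) {
            // Assumed policy: event e always goes to loop (e mod loopCount).
            loops.get(e % loopCount).execute(() -> threads.add(Thread.currentThread()));
        }
        for (ExecutorService loop : loops) {
            loop.shutdown(); // already queued events still run before the loop stops
            try {
                loop.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        return threads.size();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int used = distinctThreadsUsed(cores, cores * 100);
        System.out.println(cores + " cores, " + used + " loop threads");
    }
}
```

However many events are posted, the number of threads doing the work never exceeds the number of loops, which is what keeps context switching low.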

However, creating only as many threads as there are cores, which keeps context switching to a minimum, introduces another problem. If a handler that uses server resources takes a long time to handle an event, other events received while the handler is running are not handled in a timely manner. A common example is searching for files on the server disk. In this case, it is better to create a separate thread for the file search.

Therefore, to build a universal server framework with asynchronous event handling, the framework should be able to manage a thread pool. This is what Vert.x aims for. Apart from polyglot support, thread pool management is the biggest difference between Vert.x and Node.js. Vert.x creates as many Run Loops (Event Loops) as there are cores and provides thread pool functions for tasks that use server resources and take a long time to handle.
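The division of labor between a Run Loop and a separate thread pool might look like the sketch below, written with the JDK's CompletableFuture rather than any Vert.x API. The names BackgroundPoolSketch and runBlocking, the thread names, and the pool size of 4 are all hypothetical. The slow task runs on the background pool, and its result is handed back to the loop thread so the handler never blocks the loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of offloading slow work from a run loop. Not the Vert.x API. */
final class BackgroundPoolSketch {
    // One thread plays the role of the event loop.
    private final ExecutorService eventLoop =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
    // Assumed pool size of 4, chosen only for this example.
    private final ExecutorService background =
            Executors.newFixedThreadPool(4, r -> new Thread(r, "background"));

    /** Runs slowTask on the background pool, then passes its result to handler on the event loop. */
    <T> CompletableFuture<Void> runBlocking(Supplier<T> slowTask, Consumer<T> handler) {
        return CompletableFuture.supplyAsync(slowTask, background)
                .thenAcceptAsync(handler, eventLoop);
    }

    void shutdown() {
        eventLoop.shutdown();
        background.shutdown();
    }

    /** Returns the names of the threads that ran the slow task and the handler. */
    static String demo() {
        BackgroundPoolSketch sketch = new BackgroundPoolSketch();
        String[] seen = new String[2];
        try {
            sketch.runBlocking(
                    () -> {
                        seen[0] = Thread.currentThread().getName();
                        return "3 files found"; // stands in for a slow disk search
                    },
                    result -> seen[1] = Thread.currentThread().getName()
            ).join();
        } finally {
            sketch.shutdown();
        }
        return seen[0] + " -> " + seen[1];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

While the background thread is busy with the disk search, the loop thread stays free to dispatch other events.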

Why is Hazelcast Used?

Vert.x uses Hazelcast, an In-Memory Data Grid (IMDG). The Hazelcast API is not exposed directly to users but is used inside Vert.x. When Vert.x starts, Hazelcast starts as an embedded component.

Hazelcast is a type of distributed storage. Embedding such storage in a server framework brings the benefits expected in a distributed environment.

The most popular case is session data processing. Vert.x calls this feature Shared Data; it allows multiple Vert.x instances to share the same data. Of course, a separate RDBMS used in place of Hazelcast would provide the same functionality. However, it is natural that embedded in-memory storage returns results faster than a remote RDBMS. Therefore, users who need sessions for e-commerce or chat servers can build a system with a simple configuration by using only Vert.x.

Hazelcast also makes a message queue available without additional cost or investment (no extra servers and no monitoring of message queue instances). As mentioned before, Hazelcast is distributed storage, and it can replicate stored data for reliability. By using this distributed storage as a queue, a server application implemented with Vert.x becomes both a message processing server application and a distributed queue.

These benefits make Vert.x a strong framework in a distributed environment.

 

Understanding Vert.x Components

 

vertx-architecture-diagram.png

Figure 1: Vert.x Architecture (Component) Diagram.

Figure 1 above shows a diagram of the Vert.x components. As shown in the figure, a Hazelcast instance is embedded and runs in every Vert.x instance (each of which can be understood as a JVM). The embedded Hazelcast is connected to the Hazelcast in other Vert.x instances. The Event Bus uses Hazelcast functions. Hazelcast itself provides a certain level of reliability (because of WAL records and data duplication), so events can be forwarded with a certain level of reliability.
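To make the idea of an address-based Event Bus concrete, here is a deliberately small, in-process sketch. It is an assumption-laden illustration: EventBusSketch, registerHandler, and publish are hypothetical names, the addresses are made up, and nothing here is clustered or backed by Hazelcast. It shows only the core contract, which is that senders know an address and never the receivers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process event bus keyed by address. Not the Vert.x Event Bus. */
final class EventBusSketch {
    private final Map<String, List<Consumer<String>>> handlersByAddress = new ConcurrentHashMap<>();

    /** Subscribes a handler to every message published on the address. */
    void registerHandler(String address, Consumer<String> handler) {
        handlersByAddress
                .computeIfAbsent(address, a -> new CopyOnWriteArrayList<>())
                .add(handler);
    }

    /** Delivers the message to all handlers on the address and returns how many received it. */
    int publish(String address, String message) {
        List<Consumer<String>> handlers =
                handlersByAddress.getOrDefault(address, Collections.emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }

    /** Two services listen on one address; a third address has no listeners. */
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        EventBusSketch bus = new EventBusSketch();
        bus.registerHandler("orders.created", m -> received.add("billing got " + m));
        bus.registerHandler("orders.created", m -> received.add("shipping got " + m));
        bus.publish("orders.created", "order-42");
        bus.publish("orders.cancelled", "order-7"); // no handlers, so nothing is recorded
        return received;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In a clustered setup, the address-to-handler table would have to be shared between instances, which is the kind of job the embedded distributed storage can take on.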

HTTP Server and Net Server

The HTTP Server and the Net Server control network events and event handlers. The Net Server is for the events and handlers of private protocols, while the HTTP Server allows registering a handler for an HTTP event such as GET or POST. The HTTP Server is provided because it eliminates the need to add event types and because HTTP itself is universal. The HTTP Server supports WebSocket as well as HTTP.

vertx-event-and-handler-of-http-server.png

Figure 2: Event and Handler of HTTP Server.

Vert.x Thread Pool

Vert.x has three types of thread pools:

  1. Acceptor: A thread that accepts sockets. One thread is created per port.
  2. Event Loops: (same as Run Loops) The number of Event Loops equals the number of cores. When an event occurs, an Event Loop executes the corresponding handler. When the handler finishes, the loop goes back to reading the next event.
  3. Background: Used when an Event Loop executes a handler and an additional thread is required. Users can specify the number of threads in vertx.backgroundPoolSize, an environment variable. The default is 20. Using too many threads increases context switching costs, so be cautious.

Event Loops can be described in more detail as follows. Event Loops use the Netty NioWorker as is. All handlers specified by verticles run on Event Loops. Each verticle instance is assigned its own NioWorker, so a verticle instance is guaranteed to always execute on the same thread. As a result, verticles can be written as single-threaded code and remain thread-safe.
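The thread-affinity guarantee can be demonstrated with a small stand-in written against the JDK only. CounterVerticle below is a hypothetical class, not a Vert.x verticle, and a single-threaded executor plays the role of its assigned NioWorker. Its fields are deliberately unsynchronized: because every handler invocation runs on the same thread, the count stays exact even when several threads send events at once.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of pinning one component to one loop thread. Not Vert.x code. */
final class VerticleAffinitySketch {

    /** Stand-in for a verticle: plain fields, no locks, all access on its own loop thread. */
    static final class CounterVerticle {
        private final ExecutorService loop = Executors.newSingleThreadExecutor();
        private final Set<Thread> threadsSeen = new HashSet<>(); // touched only on the loop thread
        private int count;                                       // unsynchronized on purpose

        /** May be called from any thread; the handler body itself runs on the loop thread. */
        void deliver() {
            loop.execute(() -> {
                threadsSeen.add(Thread.currentThread());
                count++;
            });
        }

        /** Reads the state on the loop thread after all queued events, then stops the loop. */
        int[] stop() {
            try {
                return loop.submit(() -> new int[] { count, threadsSeen.size() }).get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            } finally {
                loop.shutdown();
            }
        }
    }

    /** Returns {events handled, distinct handler threads} after concurrent senders finish. */
    static int[] demo(int senders, int eventsPerSender) {
        CounterVerticle verticle = new CounterVerticle();
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < eventsPerSender; j++) {
                    verticle.deliver();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return verticle.stop();
    }

    public static void main(String[] args) {
        int[] result = demo(4, 1000);
        System.out.println(result[0] + " events handled on " + result[1] + " thread(s)");
    }
}
```

If the same counter were incremented directly from the four sender threads without a lock, updates could be lost. Confining the state to one thread avoids that without any synchronization in the handler.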

Conclusion

So far, I have briefly described the Vert.x architecture. Since the Vert.x framework is not yet widely used, I believe it is more useful to explain the design concepts behind Vert.x than to detail each Vert.x component. Even if you have no interest in network server frameworks, it is helpful to review new products and identify how they differ from existing ones. Doing so helps in understanding the evolution and direction of the software products flooding today's market.

