posted 7 months ago in Dev Platform category by Kim Sung Kyu
PostgreSQL shows excellent functionalities and performance. Considering its high quality, it may seem strange that PostgreSQL is not more popular. However, PostgreSQL continues to make progress. This article will discuss this database.
Why You Should Know about PostgreSQL
PostgreSQL is an RDBMS, which is popular mainly in North America and Japan. It is not used much in Korea yet, but as it is a very excellent RDBMS in terms of functionality and performance, it is worth learning about what kind of database PostgreSQL is.
PostgreSQL (pronounced as [Post-Gres-Q-L]) is an object-relational database system (ORDBMS), and is an open-source DBMS that provides the enterprise-level DBMS functionalities and many other functionalities you can find only in advanced DBMS. PostgreSQL is also known as an open-source DBMS that Oracle users can adapt themselves to the most easily, as it has many functionalities similar to those of Oracle.
There were many ancestors of PostgreSQL, and of them, Ingres (INteractive Graphics REtrieval System) can be said to be the progenitor of PostgreSQL. Ingres was a project launched by Michael Stonebraker (Picture 1), a great master in the area of databases who is still working hard even today.
Picture 1: Michael Stonebraker started Ingres project.
The Ingres project was launched at Berkeley University in the US in 1977. After Ingres Michael Stonebraker had started another project called Postgres (Post-Ingres). As Postgres version 3 was released in 1991, its user base grew to be quite large. But as the burden of providing support to users became too high, the project was terminated in 1993 (Postgres is known to have had a huge influence on the current Informix product, even after the end of the project. Illustra, a commercial version of POSTGRES, was taken over by Informix in 1997, and then by IBM in 2001.).
Figure 1: Product History.
Despite the project having ended, Postgres users and students continued its development and finally created Postgres95, which achieved 40% better performance than Postgres by supporting SQL and improving its structure.
When Postgre95 became an open-source system in 1996, it was given the name PostgreSQL, its current name, to reflect the fact that it succeeded Postgres and supports SQL (Postgres supported a language called QUEL instead of SQL). In 1997, PostgreSQL was finally released after determining its first version as 6.0.
Since then, PostgreSQL has been actively developed to this day through an open-source community, and the latest release is 9.2, as of May 2013. In addition, due to its open license (like the BSD or MIT license, PostgreSQL allows commercial use and modification, but it also clarifies that the original developers are not liable for any problem that may occur in its use), there have been more than 20 various forks, some of which have had an influence on PostgreSQL and some of which have disappeared.
PostgreSQL's logo is an elephant named 'Slonik' (a baby elephant in Russian language). The true reason why an elephant was used for the logo is not known, but it has been said that just after it became an open-source system, one of its users was inspired by Agatha Christie's novel "Elephants Can Remember" and suggested it. Since then, the elephant logo has been visible at every official PostgreSQL event.
As elephants are thought of as large, strong, reliable and have a good memory, Hadoop and Evernote also use an elephant as their official logo.
Functionalities and Limitations
PostgreSQL supports transaction and ACID, which are the basic functionalities of a relational DBMS. Moreover, PostgreSQL also has many progressive functionalities or expanded functionalities for academic research as well as for basic reliability and stability. Even a general list of PostgreSQL functionalities includes a large number of functionalities.
- Nested transactions (savepoints)
- Point in time recovery
- Online/hot backups, Parallel restore
- Rules system (query rewrite system)
- B-tree, R-tree, hash, GiST method indexes
- Multi-Version Concurrency Control (MVCC)
- Procedural Language
- Information Schema
- I18N, L10N
- Database & Column level collation
- Array, XML, UUID type
- Auto-increment (sequences),
- Asynchronous replication
- Full text search
- SSL, IPv6
- Key/Value storage
- Table inheritance
In addition to these, it features a variety of functionalities and new functionalities of enterprise-level DBMS.
In general, PostgreSQL has the following limits:
|Max. Database Size||Unlimited|
|Max. Table Size||32 TB|
|Max. Row Size||1.6 TB|
|Max. Field Size||1 GB|
|Max. Rows per Table||Unlimited|
|Max. Columns per Table||250~1600|
|Max. Indexes per Table||Unlimited|
As of May 2013, the latest release is 9.2. Figure 2 provides some brief information on the progress of PostgreSQL by year.
Figure 2: Progress of PostgreSQL by Year.
The main functionalities of each version are as follows:
|Version||Release Year||Main Functionalities|
The next PostgreSQL release under development is PostgreSQL 9.3, which is due to be released in the third quarter of 2013. This release features many functionalities, including an enhanced management functionality, parallel query, MERGE/UPSERT, multi-master replication, materialized view, and enhanced multi-language support.
The following shows the process structure:
Figure 3: Process Structure.
If the client requests connection with the server through the (1) interface library (variety of Interfaces Including libpg, JDBC and ODBC), the Postmaster process relays connection with the server (2). Then, the client executes a query through connection with the allocated server (Figure 3).
The following shows the process of query execution in the server:
Figure 4: Query Execution Procedure.
If it receives a query request from the client, the system creates a parse tree through the syntax analytics process (1), starts a new transaction through the semantic checking process (2) and creates a query tree.
Next, a query tree is re-generated according to the rules defined in the server (3), and of the many available execution plans, the most optimized plan tree is created (4). The server executes this (5) and sends the result of the requested query to the client.
While the server executes a query, a system catalog in the database is frequently used. In the system catalog, users can directly define the type of functions and data, as well as index access methods and rules. In PostgreSQL, therefore, a system catalog is utilized as an important point in adding or expanding its functionalities.
A file that stores data consists of multiple pages, and a single page has a scalable slotted page structure (Figures 5 and 6).
Figure 5: Data Page Structure.
Figure 6: Index Page Structure.
The development process model of PostgreSQL can be explained by the following sentence:
‘A community-based open-source project led by a few.’
Like the Linux, Apache and Eclipse projects, the PostgreSQL project is also composed only of a few administrators, a variety of developers and a large number of users. The small administrator group (Core Team) collects requests and feedback (the group sometimes takes a vote to determine priorities at http://postgresql.uservoice.com) from a large number of users, determines the direction of the product, has final approval right for the code and exerts its right for release. This is a different model from corporate management development processes such as MySQL and JBoss.
The developer group consists of code committers and code developers/contributors. They are located in many countries, including the U.S., Japan and Europe.
Figure 7: Distribution of PostgreSQL Developers by Region.
Codes developed by a variety of developers go through a variety of review processes (Submission Review, Usability Review, Feature Test, Performance Review, Coding Review, Architecture Review, Review Review), and are reflected in the product after approval by the Core Team. The mailing list that has been used by the community for a long time is usually used, and a variety of documents, including manuals, are well maintained through the official website.
Products in Competition
PostgreSQL wants to be compared with enterprise-level commercial DBs, but it has been compared mainly with popular open-source DBMSs. The following are the catchphrases of these open-source DBMSs, each of which reflects its features:
- PostgreSQL: The world's most advanced open source database
- MySQL: The world's most popular open source database
- CUBRID: Open Source Database Highly Optimized for Web Applications
- Firebird: The true open source database
- SQLite: self-contained library, serverless, zero-configuration, transactional SQL database engine
It is not easy to compare these products using their catchphrases alone, but you can see that PostgreSQL seeks progressiveness and openness.
The following is brief comparison of PostgreSQL and its competitiors:
|DB2, MS SQL||
|Other commercial DBs||
|Other open source DBs||
For a long time, the PostgreSQL community has made attempts to enter the enterprise DBMS market. In 2004, EnterpriseDB, a company using PostgreSQL, was established, and it is striving to strengthen its position in the enterprise DBMS market. One of the company's main products is Postgres Plus Advanced Server. Postgres Plus Advanced Server was developed by adding Oracle-compatible functionalities (PL/SQL, SQL statements, functions, DB Links, OCI library, etc.) to the open-source PostgreSQL, featuring easy data and application migration and a cost reduction of 20% compared to Oracle (Figure 7).
Figure 8: Cost Reduction Compared to Oracle.
In addition, Postgres Plus Advanced Server provides differentiated services, including a training, consulting and migration, and technical support service from PostgreSQL experts. Through approximately 300 reference sites in a variety of areas, the product is promoted as a database for all industries, with a growing base of users across the world.
Present Status and Trend
As you can see from most posts on PostgreSQL, most PostgreSQL users have a developer-like tendency, and are very loyal to the product.
In fact, they have a good reason for their loyalty. PostgreSQL provides sufficient functionalities and conservative performance compared to other products, and one of its advantages is that it has good enough conditions for beginners to attract new developers.
These good conditions include a well-written manual on the project page, related documents, over 300 reference publications, and over 10 seminars and conferences held in a variety of countries every year. More recently, a PostgreSQL magazine has even appeared. And these are the results of the active PostgreSQL community.
The representative features that PostgreSQL users identify as being important are as follows:
- Reliability is the top priority of the product
- ACID and transaction
- A variety of indexing techniques
- Flexible full-text search
- MVCC for better concurrency performance
- Diverse and flexible replication methods
- A variety of procedures (PL/pgSQL, Perl, Python, Ruby, TCL, etc.)/Interface (JDBC, ODBC, C/C++, .Net, Perl, Python, etc.) languages
- Excellent community and commercial support
- Well-made documents and a thorough manual
A variety of expansion functionalities and ease of development of such functionalities are also advantages of PostgreSQL. The following are the differentiated expansion functionalities of PostgreSQL:
- GIS add-on (PostGIS)
- Key-Value store expansion (HStore)
- Support for a variety of functions and types, including Crypto and UUID
There are many other practical and experimental expansion functionalities as well.
Of these, you will see a brief account of GIS (Geographic Information System), which has recently become a hot topic. PostGIS is a middleware expansion functionality that enables PostgreSQL to conform to the OpenGIS standard and support geographic objects (Figure 9).
Figure 9: PostGIS Structure.
PostGIS began to be developed from 2001, and with many functionality and performance improvements, it currently has the most users among the open-source products. There are some commercial products, such as Oracle Spatial, DB2 and MS SQL Server, but the commercial products have not been as well-received in terms of price-performance ratio.
In addition, you can easily find benchmark data that shows that the functionalities and performance of PostGIS/PostgreSQL are worthy of comparison to Oracle.
According to the recent trend, PostgreSQL is also much talked about in relation to cloud as well as GIS. With the recent increase in the number of companies providing DBaaS (Database as a Service), the demand for PostgreSQL, which has advantages in terms of costs and license, has increased, and as such EnterpriseDB has released Postgres Plus Cloud Database in the cloud market, with the following features:
- Simple setup & web-based management
- Automatic scaling, load balancing and failover
- Automated online backup
- Database Cloning
It is used in many web services, including Amazon EC2, Eucalyptus cloud, and Red Hat Openshift development platform cloud. Other cloud service providers such as Heroku and dotCloud also provide services using PostgreSQL.
As Sun, which had acquired MySQL, was acquired by Oracle in 2009, MySQL began to be developed as a more closed corporate project, and many MySQL developers left the community around the same time. Afraid of this change, MySQL users are paying attention not only to the forks (MariaDB, Drizzle, Percona, etc.) of MySQL to which they can easily migrate, but also to the migration to PostgreSQL.
Looking at the trend of help-wanted ads related to PostgreSQL and MySQL in the most popular job finding portal http://www.indeed.com (Figure 9), we can see the increase in help-wanted ads related to MySQL is slowing down, while help-wanted ads related to PostgreSQL continue to increase.
Figure 10: Trend of Help-wanted Ads.
According to the trend of search frequency in search sites (Figure 10), MySQL shows a continued downtrend, while PostgreSQL seems to have almost no change. In Korea, however, the search frequency for PostgreSQL has shown an upward trend since mid 2010.
Figure 11: Search Frequency Trend (source).
Of course, the popularity and usage of MySQL is still much higher than PostgreSQL. Although you may not be able to determine the true status or prospects of these products from the above data alone, you could infer that if the popularity of MySQL declines, the popularity of PostgreSQL will increase.
PostgreSQL is not yet powerful enough to surpass MySQL in popularity, but the PostgreSQL open source project community continues to make the following efforts:
- Improvement of the reliability of basic DBMS functionalities
- Provision of progressive and differentiated functionality expansion
- Continuous attraction of more open source developers
In addition, EnterpriseDB, which has stronger business purposes, is also striving to achieve the following objectives:
- Expansion of its share in the enterprise market
- Expansion of its share in the cloud market
- Efforts to replace Oracle and MySQL
By Kim Sung Kyu, Senior Software Engineer at CUBRID DBMS Lab, NHN Corporation.