Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

Current Events
Join our developers event to win one of the valuable prizes!
posted last year
viewed 14342 times
Share this article

PostgreSQL at a glance

PostgreSQL shows excellent functionalities and performance. Considering its high quality, it may seem strange that PostgreSQL is not more popular. However, PostgreSQL continues to make progress. This article will discuss this database.

Why You Should Know about PostgreSQL

PostgreSQL is an RDBMS, which is popular mainly in North America and Japan. It is not used much in Korea yet, but as it is a very excellent RDBMS in terms of functionality and performance, it is worth learning about what kind of database PostgreSQL is.

PostgreSQL (pronounced as [Post-Gres-Q-L]) is an object-relational database system (ORDBMS), and is an open-source DBMS that provides the enterprise-level DBMS functionalities and many other functionalities you can find only in advanced DBMS. PostgreSQL is also known as an open-source DBMS that Oracle users can adapt themselves to the most easily, as it has many functionalities similar to those of Oracle.

History

There were many ancestors of PostgreSQL, and of them, Ingres (INteractive Graphics REtrieval System) can be said to be the progenitor of PostgreSQL. Ingres was a project launched by Michael Stonebraker (Picture 1), a great master in the area of databases who is still working hard even today.

320px-Michael_Stonebraker_1.jpg

Picture 1: Michael Stonebraker started Ingres project.

The Ingres project was launched at Berkeley University in the US in 1977. After Ingres Michael Stonebraker had started another project called Postgres (Post-Ingres). As Postgres version 3 was released in 1991, its user base grew to be quite large. But as the burden of providing support to users became too high, the project was terminated in 1993 (Postgres is known to have had a huge influence on the current Informix product, even after the end of the project. Illustra, a commercial version of POSTGRES, was taken over by Informix in 1997, and then by IBM in 2001.).

postgresql_history.png

Figure 1: Product History.

Despite the project having ended, Postgres users and students continued its development and finally created Postgres95, which achieved 40% better performance than Postgres by supporting SQL and improving its structure.

When Postgre95 became an open-source system in 1996, it was given the name PostgreSQL, its current name, to reflect the fact that it succeeded Postgres and supports SQL (Postgres supported a language called QUEL instead of SQL). In 1997, PostgreSQL was finally released after determining its first version as 6.0.

Since then, PostgreSQL has been actively developed to this day through an open-source community, and the latest release is 9.2, as of May 2013. In addition, due to its open license (like the BSD or MIT license, PostgreSQL allows commercial use and modification, but it also clarifies that the original developers are not liable for any problem that may occur in its use), there have been more than 20 various forks, some of which have had an influence on PostgreSQL and some of which have disappeared.

postgresql_logo.pngPostgreSQL's logo is an elephant named 'Slonik' (a baby elephant in Russian language). The true reason why an elephant was used for the logo is not known, but it has been said that just after it became an open-source system, one of its users was inspired by Agatha Christie's novel "Elephants Can Remember" and suggested it. Since then, the elephant logo has been visible at every official PostgreSQL event.

As elephants are thought of as large, strong, reliable and have a good memory, Hadoop and Evernote also use an elephant as their official logo.

Functionalities and Limitations

PostgreSQL supports transaction and ACID, which are the basic functionalities of a relational DBMS. Moreover, PostgreSQL also has many progressive functionalities or expanded functionalities for academic research as well as for basic reliability and stability. Even a general list of PostgreSQL functionalities includes a large number of functionalities.

  • Nested transactions (savepoints)
  • Point in time recovery
  • Online/hot backups, Parallel restore
  • Rules system (query rewrite system)
  • B-tree, R-tree, hash, GiST method indexes
  • Multi-Version Concurrency Control (MVCC)
  • Tablespaces
  • Procedural Language
  • Information Schema
  • I18N, L10N
  • Database & Column level collation
  • Array, XML, UUID type
  • Auto-increment (sequences), 
  • Asynchronous replication
  • LIMIT/OFFSET
  • Full text search
  • SSL, IPv6
  • Key/Value storage
  • Table inheritance

In addition to these, it features a variety of functionalities and new functionalities of enterprise-level DBMS.

In general, PostgreSQL has the following limits:

Table 1: Basic Limits of PostgreSQL.
Limit Value
Max. Database Size Unlimited
Max. Table Size 32 TB
Max. Row Size 1.6 TB
Max. Field Size 1 GB
Max. Rows per Table Unlimited
Max. Columns per Table 250~1600
Max. Indexes per Table Unlimited

Roadmap

As of May 2013, the latest release is 9.2. Figure 2 provides some brief information on the progress of PostgreSQL by year.

progress_of_postgresql_by_year.png

Figure 2: Progress of PostgreSQL by Year.

The main functionalities of each version are as follows:

Table 2: Main Functionalities by Version.
VersionRelease YearMain Functionalities
0.01 1995
  • Postgres95 release
1.0 1995
  • Copyright change, open source
6.0~6.5

1997~1999

  • Renamed PostgreSQL
  • Index, VIEWs and RULEs
  • Sequences, Triggers
  • Genetic Query Optimizer
  • Constraints, Subselect
  • MVCC, JDBC interface,
7.0~7.4

2000~2010

 

  • Foreign keys
  • SQL92 syntax JOINs
  • Write-Ahead Log
  • Information Schema, Internationalization
8.0~8.4

2005~2012

  • Native Support for MS Windows
  • Savepoint, Point-in-time recovery
  • Two-phase commit
  • Table spaces, Partitioning
  • Full text search
  • Common table expressions (CTE)
  • SQL/XML, ENUM, UUID Type
  • Window functions
  • Per-database collation
  • Replication, Warm standby
9.0

2010-09

  • Streaming replication, Hot standby
  • Support for 64bit MS Windows
  • Per-column conditional trigger
9.1

2011-09

  • Functionality differentiation
  • Synchronous replication
  • Per-column collations
  • Unlogged tables
  • K-nearest-neighbor indexing
  • Serializable isolation level
  • Writeable CTE (WITH)
  • SQL/MED External Data
  • SE-Linux integration
9.2

2012-09

  • Performance optimization
  • linear scalability to 64 cores
  • Reduction in CPU power consumption
  • Cascade streaming replication
  • JSON, Range Type
  • Improved lock management
  • Space-partitioned GiST index
  • Index-only scans (covering)

The next PostgreSQL release under development is PostgreSQL 9.3, which is due to be released in the third quarter of 2013. This release features many functionalities, including an enhanced management functionality, parallel query, MERGE/UPSERT, multi-master replication, materialized view, and enhanced multi-language support.

Internal Structure

The following shows the process structure:

postgresql_process_structure.png
Figure 3: Process Structure.

If the client requests connection with the server through the (1) interface library (variety of Interfaces Including libpg, JDBC and ODBC), the Postmaster process relays connection with the server (2). Then, the client executes a query through connection with the allocated server (Figure 3).

The following shows the process of query execution in the server:

postgresql_query_execution_procedure.png

Figure 4: Query Execution Procedure.

If it receives a query request from the client, the system creates a parse tree through the syntax analytics process (1), starts a new transaction through the semantic checking process (2) and creates a query tree.

Next, a query tree is re-generated according to the rules defined in the server (3), and of the many available execution plans, the most optimized plan tree is created (4). The server executes this (5) and sends the result of the requested query to the client.

While the server executes a query, a system catalog in the database is frequently used. In the system catalog, users can directly define the type of functions and data, as well as index access methods and rules. In PostgreSQL, therefore, a system catalog is utilized as an important point in adding or expanding its functionalities.

A file that stores data consists of multiple pages, and a single page has a scalable slotted page structure (Figures 5 and 6).

postgresql_data_page_structure.png

Figure 5: Data Page Structure.

postgresql_index_page_structure.png

Figure 6: Index Page Structure.

Development Process

The development process model of PostgreSQL can be explained by the following sentence:

‘A community-based open-source project led by a few.’

Like the Linux, Apache and Eclipse projects, the PostgreSQL project is also composed only of a few administrators, a variety of developers and a large number of users. The small administrator group (Core Team) collects requests and feedback (the group sometimes takes a vote to determine priorities at http://postgresql.uservoice.com) from a large number of users, determines the direction of the product, has final approval right for the code and exerts its right for release. This is a different model from corporate management development processes such as MySQL and JBoss.

The developer group consists of code committers and code developers/contributors. They are located in many countries, including the U.S., Japan and Europe.

distribution_of_postgresql_developers_by_region.png

Figure 7: Distribution of PostgreSQL Developers by Region.

Codes developed by a variety of developers go through a variety of review processes (Submission Review, Usability Review, Feature Test, Performance Review, Coding Review, Architecture Review, Review Review), and are reflected in the product after approval by the Core Team. The mailing list that has been used by the community for a long time is usually used, and a variety of documents, including manuals, are well maintained through the official website.

Products in Competition

PostgreSQL wants to be compared with enterprise-level commercial DBs, but it has been compared mainly with popular open-source DBMSs. The following are the catchphrases of these open-source DBMSs, each of which reflects its features:

  • PostgreSQL: The world's most advanced open source database
  • MySQL: The world's most popular open source database
  • CUBRID: Open Source Database Highly Optimized for Web Applications
  • Firebird: The true open source database
  • SQLite: self-contained library, serverless, zero-configuration, transactional SQL database engine

It is not easy to compare these products using their catchphrases alone, but you can see that PostgreSQL seeks progressiveness and openness.

The following is brief comparison of PostgreSQL and its competitiors:

Table 3: Comparison of Products in Competition.
Oracle
  • An enormous amount of long-proven code and a variety of references.
  • High cost
DB2, MS SQL
  • Similar to Oracle
MySQL
  • A variety of applications and references.
  • Corporate development model
  • And the burden of licensing
CUBRID
  • An alternative to MySQL
  • Built-in HA and database sharding
  • Dual licensing
Other commercial DBs
  • Other commercial DBs show a downtrend due to open-source DBMSs
Other open source DBs
  • Struggle to attract developers

For a long time, the PostgreSQL community has made attempts to enter the enterprise DBMS market. In 2004, EnterpriseDB, a company using PostgreSQL, was established, and it is striving to strengthen its position in the enterprise DBMS market. One of the company's main products is Postgres Plus Advanced Server. Postgres Plus Advanced Server was developed by adding Oracle-compatible functionalities (PL/SQL, SQL statements, functions, DB Links, OCI library, etc.) to the open-source PostgreSQL, featuring easy data and application migration and a cost reduction of 20% compared to Oracle (Figure 7).

postgresql_cost_reduction_compared_to_oracle.png

Figure 8: Cost Reduction Compared to Oracle.

In addition, Postgres Plus Advanced Server provides differentiated services, including a training, consulting and migration, and technical support service from PostgreSQL experts. Through approximately 300 reference sites in a variety of areas, the product is promoted as a database for all industries, with a growing base of users across the world.

Present Status and Trend

As you can see from most posts on PostgreSQL, most PostgreSQL users have a developer-like tendency, and are very loyal to the product.

In fact, they have a good reason for their loyalty. PostgreSQL provides sufficient functionalities and conservative performance compared to other products, and one of its advantages is that it has good enough conditions for beginners to attract new developers.

These good conditions include a well-written manual on the project page, related documents, over 300 reference publications, and over 10 seminars and conferences held in a variety of countries every year. More recently, a PostgreSQL magazine has even appeared. And these are the results of the active PostgreSQL community.

The representative features that PostgreSQL users identify as being important are as follows:

  • Reliability is the top priority of the product
  • ACID and transaction
  • A variety of indexing techniques
  • Flexible full-text search
  • MVCC for better concurrency performance
  • Diverse and flexible replication methods
  • A variety of procedures (PL/pgSQL, Perl, Python, Ruby, TCL, etc.)/Interface (JDBC, ODBC, C/C++, .Net, Perl, Python, etc.) languages
  • Excellent community and commercial support
  • Well-made documents and a thorough manual

A variety of expansion functionalities and ease of development of such functionalities are also advantages of PostgreSQL. The following are the differentiated expansion functionalities of PostgreSQL:

  • GIS add-on (PostGIS)
  • Key-Value store expansion (HStore)
  • DBLink
  • Support for a variety of functions and types, including Crypto and UUID

There are many other practical and experimental expansion functionalities as well.

Of these, you will see a brief account of GIS (Geographic Information System), which has recently become a hot topic. PostGIS is a middleware expansion functionality that enables PostgreSQL to conform to the OpenGIS standard and support geographic objects (Figure 9).

postgis_structure.png

 

Figure 9: PostGIS Structure.

 

PostGIS began to be developed from 2001, and with many functionality and performance improvements, it currently has the most users among the open-source products. There are some commercial products, such as Oracle Spatial, DB2 and MS SQL Server, but the commercial products have not been as well-received in terms of price-performance ratio.

In addition, you can easily find benchmark data that shows that the functionalities and performance of PostGIS/PostgreSQL are worthy of comparison to Oracle.

According to the recent trend, PostgreSQL is also much talked about in relation to cloud as well as GIS. With the recent increase in the number of companies providing DBaaS (Database as a Service), the demand for PostgreSQL, which has advantages in terms of costs and license, has increased, and as such EnterpriseDB has released Postgres Plus Cloud Database in the cloud market, with the following features:

  • Simple setup & web-based management
  • Automatic scaling, load balancing and failover
  • Automated online backup
  • Database Cloning

It is used in many web services, including Amazon EC2, Eucalyptus cloud, and Red Hat Openshift development platform cloud. Other cloud service providers such as Heroku and dotCloud also provide services using PostgreSQL.

Conclusion

As Sun, which had acquired MySQL, was acquired by Oracle in 2009, MySQL began to be developed as a more closed corporate project, and many MySQL developers left the community around the same time. Afraid of this change, MySQL users are paying attention not only to the forks (MariaDB, Drizzle, Percona, etc.) of MySQL to which they can easily migrate, but also to the migration to PostgreSQL.

Looking at the trend of help-wanted ads related to PostgreSQL and MySQL in the most popular job finding portal http://www.indeed.com (Figure 9), we can see the increase in help-wanted ads related to MySQL is slowing down, while help-wanted ads related to PostgreSQL continue to increase.

trend_of_help_wanted_ads.png

Figure 10: Trend of Help-wanted Ads.

According to the trend of search frequency in search sites (Figure 10), MySQL shows a continued downtrend, while PostgreSQL seems to have almost no change. In Korea, however, the search frequency for PostgreSQL has shown an upward trend since mid 2010.

search_frequency_trend.png

Figure 11: Search Frequency Trend (source).

Of course, the popularity and usage of MySQL is still much higher than PostgreSQL. Although you may not be able to determine the true status or prospects of these products from the above data alone, you could infer that if the popularity of MySQL declines, the popularity of PostgreSQL will increase.

PostgreSQL is not yet powerful enough to surpass MySQL in popularity, but the PostgreSQL open source project community continues to make the following efforts: 

  • Improvement of the reliability of basic DBMS functionalities
  • Provision of progressive and differentiated functionality expansion
  • Continuous attraction of more open source developers

In addition, EnterpriseDB, which has stronger business purposes, is also striving to achieve the following objectives:

  • Expansion of its share in the enterprise market
  • Expansion of its share in the cloud market
  • Efforts to replace Oracle and MySQL

By Kim Sung Kyu, Senior Software Engineer at CUBRID DBMS Lab, NHN Corporation.



comments powered by Disqus