posted 5 years ago in Dev Platform category by Esen Sagynov
If you have a smartphone, you are probably using LINE, KakaoTalk, or WhatsApp, if not all of them. But unlike KakaoTalk and WhatsApp, LINE also allows users to call other LINE users for free - a great value-added service, isn't it!? Today I will explain more about LINE, its background story, and how its developers use NoSQL together with an in-memory data store to manage billions of user messages per month!
So, you already know about the first advantage of LINE - free calling service. But there is another one which gives even more convenience for users - it is available not only on leading mobile platforms (iOS and Android), but also on Mac OS X and Windows desktop operating systems. Now in order to send a message to your friend, you no longer need to take out your mobile phone and type on its tiny keyboard. You can use the desktop LINE to make the conversation fast and more convenient! You type on your computer, your friends and family read the message on their mobile or desktop app. Staying in touch for FREE has never been easier. I am sold!
Now the main question: how was the LINE app created?
It may sound somewhat bizarre, but it is all about the 2011 Japan Earthquake. You all know how tragic it was. Communication was disrupted, and major damage was caused to many buildings, roads and transportation systems. In such situations cell phones still work but service is sketchy. Data works better than voice. This was the impetus for my colleagues at NHN Japan to design an app accessible on smartphones, tablets, and PCs, which would work over the data network and provide continuous and free instant messaging and calling service.
The name LINE originated from the fact that after the incident people had to line up at public phones, because in Japan the public phones "are programmed to take priority over networks during and after an earthquake".
Later, in June 2011, the LINE app was launched.
Today LINE is one of the most popular, beautifully designed instant messaging and calling apps available for iOS, Android, tablet, and desktop users. It took only nine months for LINE to pass the milestone of 30 million registered users. At the time of writing this article LINE already has 36 million users worldwide and is recognized as a "Fast and Light" messenger that is considered "The Number 1 Free App" in many countries, especially in South East Asia.
The LINE app allows group chatting with up to 100 people at once, sending photo and video files as well as location info, and making free voice calls both on Wi-Fi and cellular networks. But one of the greatest values in LINE is its collection of 250+ expressive stickers and emoticons, which makes chatting even more exciting and fun.
Database Storage for Instant Messaging App
With the increasing number of LINE users, the number of chat messages to store grows exponentially (see the graph below). These days LINE needs to store billions of rows in a database every month. However, back in the early development stage LINE developers had anticipated load from at most 1 million registered users. Under that assumption the developers deployed Redis, an in-memory key-value store, which provides synchronous and asynchronous replication and enables DB administrators to take periodic disk snapshots.
Since Redis does not provide server-side sharding, LINE developers had to implement sharding on the client side and manage it with ZooKeeper. Thus, the entire LINE storage stack was initially based on a single Redis cluster of 3 nodes, sharded at the application layer. As the service grew, more nodes were added to the cluster.
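To make the idea of client-side sharding more concrete, here is a minimal sketch in Python. It is not LINE's actual code: the `ShardedClient` class, the node names, and the hash-modulo routing rule are all assumptions made for illustration, and plain dictionaries stand in for Redis servers. In the setup described above the list of nodes would be managed with ZooKeeper; in this sketch it is simply hard-coded.

```python
import hashlib


class ShardedClient:
    """Routes each key to one of several storage nodes on the client side."""

    def __init__(self, nodes):
        # `nodes` maps a node name to a store. Plain dicts stand in for
        # Redis servers here (an assumption made for this sketch).
        self.nodes = nodes
        self.names = sorted(nodes)

    def node_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so that every client maps the same key to the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.names)
        return self.names[index]

    def set(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


# Three hypothetical nodes, mirroring the initial 3-node cluster.
cluster = ShardedClient({"redis-1": {}, "redis-2": {}, "redis-3": {}})
cluster.set("user:42:profile", "Alice")
```

Note that simple modulo routing like this moves most keys to a different node whenever the node count changes, so a real deployment that keeps adding nodes would likely need a smarter scheme, such as consistent hashing or an explicit shard map.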
However, everything changed around October 2011 when LINE started to experience extreme load due to unexpected growth worldwide (see above graphs). This caused some outages and delays in message delivery. The root of the problem was that Redis is an in-memory store: the whole data set has to fit in RAM, so it requires more servers than a persistent, disk-based storage system to hold the same amount of data. Managing LINE's tremendous growth with Redis would cost an arm and a leg. It was critical to find an alternative solution which would provide high scalability and availability at a relatively low cost.
To solve this problem, LINE developers first classified the types of data they have to store according to how important availability and scalability are for each type. They came to the conclusion that:
1. for user messages in the delivery queue, availability is critical;
2. for user profile info as well as contacts and groups, both scalability and availability are important;
3. for messages in the Inbox as well as change-sets of (2), scalability is vital because of their massive size.
To achieve availability, LINE developers chose to stick with the Redis in-memory data store. For scalability they had three options: HBase, Cassandra, and MongoDB.
HBase
- Pro: Best matches the requirements
- Pro: Easy to operate (storage system built on DFS, multiple ad hoc partitions per server)
- Con: Random read and deletion are somewhat slow
- Con: Slightly lower availability (there are some single points of failure)
Cassandra
- Pro: Also suitable for dealing with the latest workload
- Pro: High availability (decentralized architecture, rack/DC-aware replication)
- Con: High operation costs due to weak consistency
- Con: Counter increments are expected to be slightly slower
MongoDB
- Pro: Auto sharding, auto failover
- Pro: A rich range of operations (but LINE storage doesn't require most of them)
- Con: NOT suitable for the timeline workload (B-tree indexing)
- Con: Ineffective disk and network utilization
In the end, out of the three, HBase was chosen as the primary storage, running on top of HDFS, for the third type of user data. Sharded Redis is still running as a front-end cache for data stored in HBase and as the primary storage for user profiles, contacts and groups. Below is the current LINE storage stack architecture.
LINE storage stack: over 600 nodes and growing.
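The "in-memory cache in front of a persistent store" arrangement can be sketched as follows. This is only an illustration under stated assumptions, not the LINE implementation: the `CachedStore` class is hypothetical, a dictionary stands in for HBase, an `OrderedDict` stands in for Redis, and the least-recently-used eviction policy is my own choice, since the article does not say how the cache is managed.

```python
from collections import OrderedDict


class CachedStore:
    """Small in-memory cache in front of a larger persistent store."""

    def __init__(self, primary, capacity):
        self.primary = primary      # stand-in for the persistent store (HBase)
        self.cache = OrderedDict()  # stand-in for the in-memory cache (Redis)
        self.capacity = capacity    # maximum number of cached entries
        self.primary_reads = 0      # counts reads that missed the cache

    def put(self, key, value):
        # Write to the primary store first, then refresh the cache.
        self.primary[key] = value
        self._remember(key, value)

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        # Cache miss: fall back to the primary store.
        self.primary_reads += 1
        value = self.primary.get(key)
        if value is not None:
            self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used


store = CachedStore(primary={}, capacity=2)
for i in range(3):
    store.put(f"msg:{i}", f"hello {i}")
```

With a capacity of two, the oldest message is pushed out of the cache after the third write but remains in the primary store, so reading it later costs one read from the slower store. Recent messages, which are the ones most likely to be read, are served from memory.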
LINE developers at NHN Japan have posted a very detailed blog article where they explain how everything is designed and how they migrated from the original Redis-based architecture to the HBase-based architecture. You are highly encouraged to read this article if you would like to learn more about how they installed and configured the separate components of their storage stack.