Our company, NHN, has several huge products. One of them is NAVER, the most popular portal in Korea, with more than 50 million users. Another is "Line", one of the most popular mobile messengers, with more than 90 million users. There are many other products as well, ranging from web services to games. Besides those, we are also developing various open source platforms, including Cubrid (an open source RDBMS that supports High Availability), where the nGrinder wiki site lives. :-) There are more than 1,000 developers at NHN. They are always striving to turn brilliant ideas into products and to make these products more scalable.
To make these products more stable and faster, we have been using nGrinder intensively since 2011. We run multiple instances of nGrinder internally. However, we ask our employees to start with the biggest instance, which consists of 5 controllers and 40 agents in 5 different IDCs. We have an official DNS name (http://ngrinder.nhncorp.com, accessible only to NHN employees) pointing to an L4 load balancer in front of those 5 controllers. How can you deploy nGrinder at such a large scale? See the Controller Clustering Guide.
All but one of the IDCs are located in Korea; the exception is in Singapore. The communication speed between an nGrinder controller and its agents is not critical, so we could place all the controllers in a single IDC in Korea and distribute the agents across the IDCs. We also deployed the nGrinder SSO plugin and the Network overflow plugin. With these, every developer can access nGrinder whenever they want without an additional login step, and abnormal test executions that could cause excessive network traffic are kept under control. Our nGrinder SSO plugin removes the user management overhead from nGrinder administration. It works with SiteMinder, and when a user logs in with his/her SSO account and no matching nGrinder account exists yet, it creates a new one.
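The create-on-first-login behavior of the SSO plugin can be illustrated with a small sketch. This is not the plugin's actual code: the `User`, `UserStore`, and `login_with_sso` names are hypothetical, the user store is a plain in-memory dictionary, and the SiteMinder authentication step is assumed to have already succeeded before this function is called.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    """Hypothetical user record; the real nGrinder user model has more fields."""
    user_id: str
    role: str = "USER"


class UserStore:
    """Hypothetical in-memory stand-in for the nGrinder user database."""

    def __init__(self) -> None:
        self._users: Dict[str, User] = {}

    def find(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)

    def save(self, user: User) -> None:
        self._users[user.user_id] = user

    def __len__(self) -> int:
        return len(self._users)


def login_with_sso(store: UserStore, sso_user_id: str) -> User:
    """Return the account for an already authenticated SSO user.

    If no account exists yet, one is created on the fly, so an
    administrator never has to register users by hand.
    """
    user = store.find(sso_user_id)
    if user is None:
        user = User(user_id=sso_user_id)
        store.save(user)
    return user


store = UserStore()
first = login_with_sso(store, "developer1")   # account is created here
second = login_with_sso(store, "developer1")  # existing account is reused
```

The point of the design is that account provisioning becomes a side effect of the first login, which is why the administrator does not have to manage users.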
Do we need to explain the network overflow plugin? Currently the real service areas in each IDC are built on a 10Gbps backbone, but the dev areas in each IDC only have 3Gbps. The bandwidth between IDCs is even smaller. We therefore judged that other systems might fail to serve correctly if nGrinder generated too much traffic, especially at each network edge. This really happened when we used Performance Center before. So we set up the network overflow plugin so that each test can use 1Gbps at maximum. If a test uses more than 1Gbps of network bandwidth, the plugin automatically forces it to stop. If developers need to run tests that generate more than 1Gbps of traffic, we ask them to install a separate nGrinder controller and agents under the same switch where the test target is located. It usually takes them about an hour to set up their own nGrinder instance.
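The 1Gbps rule described above boils down to a threshold check on sampled traffic. The following is a minimal sketch of that check, not the plugin's real implementation: the function names, the sampling interval, and the idea of passing in a byte count per interval are all assumptions made for illustration.

```python
GBPS = 1_000_000_000  # bits per second


def bandwidth_bps(bytes_transferred: int, interval_seconds: float) -> float:
    """Convert a byte count sampled over an interval into bits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return bytes_transferred * 8 / interval_seconds


def should_stop_test(bytes_transferred: int,
                     interval_seconds: float,
                     limit_bps: float = 1 * GBPS) -> bool:
    """Return True when the sampled traffic exceeds the per-test limit.

    The comparison is strict: a test at exactly the limit keeps running,
    and only a test using more than the limit is stopped.
    """
    return bandwidth_bps(bytes_transferred, interval_seconds) > limit_bps


# 100 MB in one second is 800 Mbps, which is under the limit.
under_limit = should_stop_test(100_000_000, 1.0)
# 200 MB in one second is 1.6 Gbps, which is over the limit.
over_limit = should_stop_test(200_000_000, 1.0)
```

A controller-side monitor could call such a check once per sampling interval and abort the test as soon as it returns True.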
We only allow each user to use 5 agents without additional steps. This maximizes agent sharing among multiple users. We observed that more than 90% of tests finish within 10 minutes. Some IDCs have 10 agents, which means at least 2 tests can run there at the same time. If a user needs more agents, instead of raising the limit, we recommend that he/she install user agents that belong only to that user.
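The "at least 2 tests" figure follows from dividing the agents in an IDC by the per-user limit. A small sketch of that arithmetic, with a hypothetical function name:

```python
def min_concurrent_tests(agents_in_idc: int, per_user_limit: int = 5) -> int:
    """Number of tests guaranteed to run at once in one IDC.

    This is the worst case, where every test claims the full per-user
    limit. Tests that use fewer agents leave room for more tests.
    """
    if per_user_limit <= 0:
        raise ValueError("per_user_limit must be positive")
    return agents_in_idc // per_user_limit


guaranteed = min_concurrent_tests(10)  # 10 agents, 5 per user -> 2
```

Raising the per-user limit would lower this guarantee for everyone, which is why dedicated user agents are the recommended route for bigger tests.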
Besides these systematic approaches, we also use a human touch to make nGrinder easier to adopt. There is a dedicated nGrinder engineer who monitors all tests (This guy is on the left. His name is Jo, JiWon. “JiWon” means “Support” in Korean. He spends 30% of his working time on this). Most developers here are Java engineers and don’t have experience writing Python code. His job is to keep developers from getting lost while writing scripts. He has an account with super user permission, which lets him see all tests performed in nGrinder and run other users' scripts without asking for additional permission. So whenever red balls pop up in the nGrinder performance test list view like below, he opens the scripts and checks them to find out what the users are doing wrong. Whenever he finds a clue, he contacts the user through our internal messenger. Thanks to this, nGrinder users are able to write a correct nGrinder script after only 2 to 3 failed attempts. Occasionally a few developers say “Keep out of it!”, but most developers say “Thanks”, because they are in a really busy phase and this small help reduces their overhead.
As described earlier, we pre-deployed 40 nGrinder agents across all our IDCs and installed plugins that make it easy and safe for users to run their tests. Therefore, nGrinder users don’t need to contact the nGrinder admin before they run a performance test. Developers can visit the nGrinder URL and run tests whenever they want. This results in a user experience quite different from what we had before. As mentioned above, we previously used Performance Center. At that time, developers needed to reserve a specific time slot in advance, because Performance Center only allowed one test in a given time frame, and the user had to download and install a bunch of applications. Now that has changed. Everything is web based and no reservation is required to use nGrinder. This has made performance testing cheap, and many developers have come to treat nGrinder as part of their daily development toolset, like Eclipse. Most products are now tested with nGrinder not only at the end of the development phase but also in the middle of development. We have observed up to 3 tests running concurrently, and a new test is started every 10 minutes on average. Considering that this is a performance test tool, which is usually used only at the end of a project, we can say this execution rate is quite impressive.
This is how we have applied nGrinder at large scale. We believe that sharing our experience can help make the world more stable.
Any feedback is welcome. Please send your feedback to our mailing list.