Load testing Sendbird Calls at scale with Selenium and Kubernetes

September 2, 2020

How to commit to high performance at scale

Introduction: The key challenges of load-testing Calls vs. Chat

The recent launch of Sendbird Calls includes a commitment to high-uptime, low-latency server infrastructure. As an engineering team, before we could make this commitment, we had to validate that we could meet the uptime and latency requirements at scale.

We’re excited about the results: Our load tests successfully simulate 10K concurrent 10-minute HD video calls via TURN. Over 600 c5.9xlarge instances generate the number of JavaScript SDK-equipped clients required to achieve this load. 

The process for load testing Sendbird Calls was initially patterned after the load tests performed previously on the Sendbird Chat infrastructure. For load testing our chat infrastructure, we created Node.js processes across EC2 instances and had them send and receive messages and call APIs using the Sendbird Chat JS SDK. There were, however, two major differences when load testing Sendbird Calls:

1. Controlling the runtime environment for Sendbird Calls is more complex than it is for Chat because Calls depends on WebRTC.
   Load testing clients execute in a runtime environment. In Sendbird Calls’ case, we use the JavaScript SDK. We initially ran chat load testing clients using a Node.js runtime environment. But Calls cannot run inside Node.js given its current SDK structure because it is dependent on WebRTC. Instead, we must run it inside virtual instances of the Chrome browser.

2. Generating a load for Sendbird Calls is far more resource-intensive than it is for Sendbird Chat, for three reasons:
   - Computing resources: We must spin up more instances in order to generate the required load.
   - Packet size: Both inbound and outbound packets are much bigger for Calls than they are for Chat.
   - Environment: Virtual instances of the Chrome browser support WebRTC. These instances require a substantial memory footprint.

Given these differences between load-testing Chat and Calls, load-testing Calls presented our team with two main challenges:

1. To run the JavaScript Call clients inside of virtual browsers
2. To spawn enough instances to generate the required load while optimizing for cost at the same time

Running a headless browser

We selected Selenium WebDriver to handle the clients within headless instances of Chrome. Selenium WebDriver’s interfaces enable testers to send commands that execute within an environment that emulates the specified browser (Chrome in this case).
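As a rough illustration of this setup, the sketch below starts one headless Chrome session through Selenium WebDriver's Node.js bindings. The fake-media flags, the `buildChromeArgs` and `launchClient` helper names, and the page URL are assumptions made for illustration; they are not taken from the actual load-test script.

```javascript
// Sketch only: the flag list and page URL are illustrative assumptions.

// Chrome switches for a headless client that can "see" a camera and microphone.
function buildChromeArgs() {
  return [
    '--headless',                         // run without a visible window
    '--use-fake-ui-for-media-stream',     // auto-accept the camera/mic permission prompt
    '--use-fake-device-for-media-stream'  // feed synthetic audio/video instead of real devices
  ];
}

// Launch one headless Chrome session and open the page that hosts the client.
// selenium-webdriver is required lazily so buildChromeArgs() works without it.
async function launchClient(pageUrl) {
  const { Builder } = require('selenium-webdriver');
  const chrome = require('selenium-webdriver/chrome');

  const options = new chrome.Options();
  options.addArguments(...buildChromeArgs());

  const driver = await new Builder()
    .forBrowser('chrome')
    .setChromeOptions(options)
    .build();

  await driver.get(pageUrl); // hypothetical page that loads the Calls JavaScript SDK
  return driver;             // the caller should call driver.quit() when the call ends
}
```

Calling `launchClient` with a real page URL requires Chrome, a matching ChromeDriver, and the `selenium-webdriver` package to be installed on the machine.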

We created two Sendbird Calls SDK users and connected them to call for 10 minutes. The following code briefly demonstrates how we implemented Selenium WebDriver in the load-test: https://github.com/sendbird/calls-loadtesting-blog/blob/master/loadtest.js

Unexpected issues

- Chrome unexpectedly shut down far more frequently than it should have. When this occurred, we had no choice but to reinitialize the client.
- We identified a limiting factor in Selenium ChromeDriver. Despite the EC2 instance having ample CPU capacity, the Selenium ChromeDriver seemed to have a limit to the number of Chrome processes that it can manage. When this limit is surpassed, performance drops significantly. We tested out many process numbers but figured out that 15 seemed to be the maximum value that ChromeDriver can handle stably. Therefore, we manually set the number of processes to 15.

Managing instances via Kubernetes

We used Kubernetes to orchestrate client instances. Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management. The Cloud Native Computing Foundation maintains it. Kubernetes uses the concept of “pods”
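To get a feel for the scale the orchestrator has to cover, the figures quoted earlier can be turned into a back-of-the-envelope estimate. The sketch assumes two clients per call, both running on load generators, and one Chrome process per client; neither assumption is confirmed in the post, and `estimateLoad` is a hypothetical helper.

```javascript
// Back-of-the-envelope sizing. Inputs marked "assumed" are not stated in the post.
function estimateLoad({ calls, clientsPerCall, chromePerDriver, instances }) {
  const totalClients = calls * clientsPerCall;               // one Chrome process per client (assumed)
  const drivers = Math.ceil(totalClients / chromePerDriver); // ChromeDriver processes needed
  return {
    totalClients,
    drivers,
    clientsPerInstance: totalClients / instances,
    driversPerInstance: drivers / instances,
  };
}

const estimate = estimateLoad({
  calls: 10000,        // concurrent calls, from the post
  clientsPerCall: 2,   // caller and callee (assumed)
  chromePerDriver: 15, // stable ceiling reported in the post
  instances: 600,      // "over 600", so per-instance figures are upper bounds
});

console.log(estimate);
```

Under these assumptions the estimate comes to 20,000 clients and at least 1,334 ChromeDriver processes, or roughly 33 clients per instance.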

