By Steven Warwick, Solutions Architect – AWS
By Josue Lima, Director of Software Engineering – Pega
The Pega Digital Messaging Service allows customer service applications to receive and send messages in a simplified, consistent format over digital channels such as Apple Messages for Business, Facebook, SMS, WhatsApp, Twitter, and web chat. Each customer is considered a tenant of the system and can create multiple sessions, each defined by a unique identifier, and each session can open multiple connections.
Handling traffic surges and unpredictable load is critical for providing stable and consistent performance in multi-tenant customer service applications.
During a web chat conversation, for example, customers can use multiple web browser tabs and windows together as a single session using a unique identifier. Sessions provide the flexibility to maintain message state across browser tabs or windows, but this leads to higher message rates because the conversation is maintained across all open connections.
Message floods can result in overloaded call centers and are typically experienced during new product or service releases, promotions, and customer incidents.
Amazon API Gateway has the option to leverage usage plans with HTTP connections, but usage plans are not available for WebSockets. Because of this, an API Gateway Lambda Authorizer can be used to validate a Pega customer (tenant) and session, as well as control the connect and message rates.
Amazon API Gateway has an upper bound on how many connections per second it can handle. Controlling the connection rate per tenant allows each tenant to have a specific number of the total connections per second of the API Gateway. Meanwhile, message rate limiting enables the control of the load on the downstream systems that process the messages.
This post explores how Pegasystems built a solution to enable the Pega Digital Messaging service to manage inbound multi-tenant WebSocket connection and message rates.
The solution shown in Figure 1 below leverages Amazon Simple Queue Service (Amazon SQS) FIFO queues, AWS Lambda, and Amazon DynamoDB. These services control the overall message flow; Lambda functions check and update the session and tenant rate limits, which are stored in DynamoDB tables.
The services work together to throttle connections and messages at specific rates per tenant. A sample application for this solution demonstrates the interactions between the services.
Incoming connections are rate limited during the Amazon API Gateway connect phase. Messages are rate limited during the message processing stage before the message is sent to any downstream system.
A simple token bucket algorithm is used to keep track of connection and message rates for each tenant and session in DynamoDB tables.
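The token bucket algorithm itself is simple; a minimal in-memory sketch is shown below. In the actual solution the bucket state lives in DynamoDB tables per tenant and session, so the class and field names here are purely illustrative:

```python
import time

class TokenBucket:
    """Token bucket with `capacity` tokens, refilled at `refill_rate` tokens/second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; False signals throttling (429)."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last_refill
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per tenant or session, e.g. 5 connections/second per tenant.
tenant_bucket = TokenBucket(capacity=5, refill_rate=5)
```

When the state is persisted in DynamoDB, the refill-and-consume step would typically be done with a conditional update so concurrent Lambda invocations don't double-spend tokens.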
Connections and messages are processed in four stages:
- A connection made with an invalid tenant or session is rejected with a 403 Forbidden response (see Figure 2 later in this post).
- User connection hits a connection rate limit and receives a 429 Too Many Requests response (see Figure 3).
- User connection hits a message rate limit and receives a 429 Too Many Requests response. This stops any further message processing to lower resource usage (see Figure 4).
- User connection is successful and processes messages without throttling (see Figure 5).
Figure 1 – Amazon API Gateway WebSocket multi-tenant rate limit architecture.
- Client sends an HTTP PUT request to the Amazon API Gateway HTTP endpoint to create a session for a tenant. A tenant is defined by a unique identifier which is created during the onboarding process of a tenant (this is outside the scope of this post). If required, the call can be authenticated but that, too, is outside the scope of this post.
- A Lambda function will create a session for a tenant and store it in a DynamoDB table with a Time To Live (TTL) value specified. Amazon DynamoDB Streams are used to remove all session connections if no communication is sent or received over a specific period of time. Each call to the database layer is restricted by the dynamodb:LeadingKeys IAM condition key so each tenant can only access its specific rows based on the tenant ID.
- Once a session is created, the client will initiate a WebSocket connection to the API Gateway WebSocket endpoint. A session can be used multiple times to create connections from multiple web browser windows. The session is used to keep these different connections in sync.
- A Lambda authorizer is used as the authorizer for the WebSocket connection. The authorizer will do the following:
- Validate the tenant exists.
- Validate the session exists.
- Add the tenant ID, session ID, connection ID, and tenant settings to the authorization context.
- A Lambda function is used for the connect route, which throttles inbound connections and returns a 429 response code if a limit is exceeded. The connection is rejected if it is:
- Over the total number of connections allowed for this tenant.
- Over the total number of connections allowed for this session.
- Over the total number of connections per minute allowed for the tenant.
- Over the total number of connections per minute allowed for the session.
- If all checks pass, the function adds the connection ID to the session's connection ID set, updates the session TTL, and increments the total number of connections for the tenant.
- Messages are processed via a siloed or pooled SQS FIFO queue, depending on the API Gateway route. SQS FIFO queues are used to keep messages in order: if messages were sent directly to the Lambda function, a cold start could delay processing of the first message while a follow-up message hits a warm Lambda function, processes faster, and returns an out-of-order reply. The tenant ID, session ID, connection ID, and tenant settings are added to each message as message metadata. SQS FIFO queues use a combination of tenant ID and session ID for the SQS message group ID to keep messages in order. Each inbound message updates the DynamoDB session TTL, which resets the session timeout and allows the connection to stay connected longer due to activity.
- Silo-based messages are processed by the tenant’s corresponding SQS FIFO queue, which is named using the tenant ID. A Lambda function per tenant is used to read messages from the tenant’s SQS FIFO queue.
- Pool-based messages are processed by a single pooled SQS FIFO queue. A single Lambda function is used by all tenants to read messages from the pooled SQS FIFO queue.
- A Lambda function is used during disconnect to do the following:
- Remove the connection ID from the session connection ID set.
- Decrement the total number of connections for the tenant.
- Once all connections are closed, the client will send an HTTP DELETE request to the API Gateway HTTP endpoint to remove the session.
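The connect-route limit checks described in the walkthrough above can be sketched as follows. In-memory dicts stand in for the DynamoDB tables, and all names, limits, and return shapes are illustrative rather than taken from the actual implementation:

```python
# Sketch of the connect-route throttling checks. In the real solution the
# counters live in DynamoDB; plain dicts stand in here for demonstration.
TENANT_LIMITS = {"tenant-a": {"max_connections": 2, "max_per_minute": 10}}

tenant_connections = {}           # tenant_id -> set of active connection IDs
tenant_connects_this_minute = {}  # tenant_id -> connects in the current window

def handle_connect(tenant_id, connection_id):
    limits = TENANT_LIMITS.get(tenant_id)
    if limits is None:
        return {"statusCode": 403}  # unknown tenant: Forbidden

    conns = tenant_connections.setdefault(tenant_id, set())
    per_minute = tenant_connects_this_minute.get(tenant_id, 0)

    if len(conns) >= limits["max_connections"]:
        return {"statusCode": 429}  # over total connection limit
    if per_minute >= limits["max_per_minute"]:
        return {"statusCode": 429}  # over connections-per-minute limit

    # All checks passed: track the connection (the session TTL would
    # also be refreshed here) and bump the per-minute counter.
    conns.add(connection_id)
    tenant_connects_this_minute[tenant_id] = per_minute + 1
    return {"statusCode": 200}
```

The session-level checks mirror the tenant-level ones with a second set of counters keyed by session ID.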
Sequence of WebSocket Connections and Messages
Each connection is validated to make sure the tenant and session information is correct. If a tenant or session is invalid, a 403 Forbidden response code is returned.
Figure 2 – Connection is made with invalid tenant or session rejected with a 403 Forbidden response.
After a tenant and session have been validated the connection is processed by the connect Lambda function to validate the number of connections for the current tenant. If the number of connections is over the current limit, a 429 response code is returned to the user.
Figure 3 – User connection hits a connection rate limit and receives a 429 Too Many Requests response.
When a tenant and session have been validated and the connection is accepted, messages can be processed by the message Lambda function, which increments the total message count for the tenant and session over the current period.
If the total message count is over the current limit for the tenant or session, a Too Many Requests response message is sent and no further processing of the current message occurs.
Figure 4 – User connection hits a message rate limit and receives a Too Many Requests response. This stops any further message processing to lower resource usage.
A message is processed successfully if the tenant and session are valid and no connection or message limits are exceeded. Messages will continue to be processed until the user disconnects.
Figure 5 – User connection is successful and processes messages without throttling.
SQS FIFO Queue vs. Lambda Function
The main reason SQS FIFO queues are used is to ensure messages will be consumed in the correct order in a chat conversation.
Amazon API Gateway can directly call a Lambda function to handle message processing, but the potential exists for messages to be processed out of order. SQS FIFO queues provide the following benefits:
- Group messages by a group ID to reduce problems with noisy neighbors. This results in more parallel processing and a non-blocking message flow. For example, if Tenant A has a surge in messages received, then Tenant B’s speed of processing won’t be affected since the grouping and order are ensured at the tenant level.
- In case of failure, the order is kept during the retries. When a message from a Group ID is received, no more messages for that same Group ID are processed unless you delete the message being processed or it becomes visible again (retry).
- Avoid race conditions between Amazon API Gateway and Lambda functions. Lambda function cold starts, or even calls to third-party services inside the Lambda function code, can cause messages to be processed out of order. For example, if message A hits a cold Lambda function and message B hits a warm Lambda function, message B would be consumed ahead of message A, which is not ideal. Even if the order of execution is preserved, an external dependency could fail and message A would have to be retried. The order of the messages is crucial to keeping the context and integrity of a customer service interaction (chat conversation).
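To illustrate the grouping described above, the snippet below builds the parameters one might pass to SQS `send_message`, combining tenant ID and session ID into the message group ID. The function name, queue URL, and attribute names are assumptions for illustration; no AWS call is made here:

```python
import hashlib
import json

def build_send_message_kwargs(queue_url, tenant_id, session_id, connection_id, body):
    """Build kwargs for an SQS FIFO `send_message` call. One message group per
    tenant/session pair keeps each conversation strictly ordered while letting
    different conversations process in parallel."""
    payload = json.dumps(body)
    return {
        "QueueUrl": queue_url,
        "MessageBody": payload,
        # FIFO ordering scope: tenant ID + session ID.
        "MessageGroupId": f"{tenant_id}#{session_id}",
        # Explicit deduplication ID (content-based dedup is an alternative).
        "MessageDeduplicationId": hashlib.sha256(
            f"{connection_id}:{payload}".encode()
        ).hexdigest(),
        # Tenant context travels with the message as metadata.
        "MessageAttributes": {
            "tenantId": {"DataType": "String", "StringValue": tenant_id},
            "sessionId": {"DataType": "String", "StringValue": session_id},
            "connectionId": {"DataType": "String", "StringValue": connection_id},
        },
    }
```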
If FIFO ordering is not required, calling a Lambda function directly could be beneficial. In that case, consider provisioned concurrency to ensure each tenant processes messages at a rate acceptable for its tier level.
Silo vs. Pool vs. Bridge
The services used in this solution can be configured in multiple ways to process messages. Configuring the services for a silo, pool, or bridge mode can affect different aspects of the system, such as increasing flow rate per tenant, creating a tenant tiering model, or reducing operational overhead and cost.
Pegasystems chose to use a bridged mode which adds the flexibility to use tiers and isolate customers requiring more capacity into dedicated queues.
- Silo message handling is done by creating an SQS FIFO queue per tenant. Each tenant queue is processed by one or more dedicated Lambda functions. This model provides the most flexibility for controlling flow rates per tenant, but it can increase operational overhead and cost. Depending on the SQS message group ID, more than one Lambda function may be used to process the FIFO queue, which would increase performance but also increase cost.
- Pooled message handling is accomplished by creating a single SQS FIFO queue for all tenants. One or more Lambda functions will process the queue and use the tenant information contained in the metadata to handle tenant flow rates. This method can decrease operational overhead and cost due to shared resources. One major drawback is handling multiple tenants may hit the FIFO queue limits, such as message throughput on a single FIFO queue.
- Bridged message handling is a mix of both pool and silo message handling. Some tenants may be assigned a single queue, while others are mixed together depending on the tenants’ tier level. Using the bridged model allows for multiple tiers to be created; some queues may have a lower number of tenants per queue and others may have a higher number, with each having a different pay structure.
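The bridged model's routing decision can be sketched as a simple tier lookup. The tier names and queue-naming convention below are assumptions, not the names Pega uses:

```python
# Bridged routing sketch: premium tenants get a dedicated (silo) FIFO queue,
# while standard tenants share a pooled queue. All names are illustrative.
TENANT_TIERS = {"tenant-a": "premium", "tenant-b": "standard"}

def queue_for_tenant(tenant_id):
    tier = TENANT_TIERS.get(tenant_id, "standard")
    if tier == "premium":
        return f"messages-{tenant_id}.fifo"  # silo: one queue per tenant
    return "messages-pooled.fifo"            # pool: shared queue
```

Moving a tenant between tiers then becomes a configuration change rather than an architectural one.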
AWS Lambda Authorizer and Connect Route Performance
The architecture shown (see Figure 1) leverages Amazon DynamoDB as the database to store connection, session, and message rates for each tenant.
DynamoDB was selected after weighing performance, durability, cost, and serverless operation. Durability is required so the system retains session information.
Serverless makes the operational aspect of the system easier to maintain, and DynamoDB can auto scale during active periods. Its millisecond response time is acceptable for this use case.
Amazon DynamoDB Accelerator (DAX) could be added to the solution to increase performance while maintaining a serverless approach if lower latency is required.
Amazon ElastiCache or Amazon MemoryDB for Redis could also be used in this scenario; the choice comes down to use case requirements for latency, durability, cost, and operational management. If latency is critical, ElastiCache delivers sub-millisecond latency vs. DynamoDB which is in the single-digit millisecond range.
Cost calculations should consider the number of connections per second and total bytes for each read/write.
Note that Amazon API Gateway authorization caching is currently not available for WebSocket connections, and each connection will invoke the authorizer and connect routes.
Caching data with AWS Lambda extensions should also be considered to reduce the number of request calls to the database layer.
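The general pattern behind such caching can be sketched as a module-scoped cache that survives warm Lambda invocations; the function name, TTL value, and `fetch` callback below are illustrative, with `fetch` standing in for a DynamoDB lookup:

```python
import time

_CACHE = {}  # module scope: persists across warm Lambda invocations
CACHE_TTL_SECONDS = 30

def get_tenant_settings(tenant_id, fetch, now=None):
    """Return cached settings for `tenant_id`, calling `fetch(tenant_id)`
    (e.g. a DynamoDB read) only when the entry is missing or stale."""
    now = time.monotonic() if now is None else now
    entry = _CACHE.get(tenant_id)
    if entry and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]  # fresh cache hit: no database read
    settings = fetch(tenant_id)
    _CACHE[tenant_id] = (now, settings)
    return settings
```

A short TTL keeps the reduction in database reads while bounding how stale a tenant's settings can get across invocations.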
In this post, we showed how you can build a multi-tenant messaging platform using Amazon API Gateway, AWS Lambda, Amazon SQS, and Amazon DynamoDB to rate limit WebSocket connections and messages.
Noisy neighbor concerns and message ordering are important for messaging systems and can be reduced using this architecture. Care must be taken to give each tenant the best experience while also providing the choice of lower cost or increased performance when required.
Pegasystems – AWS Partner Spotlight
Pegasystems is an AWS Partner that simplifies the most complex business issues through automation, engagement, and low-code development.