The ForgeRock Core Token Service (CTS) is a specific DS repository used to store and persist AM SSO, OAuth2/OIDC and SAML tokens. In terms of architecture, typical primary/secondary aka active/standby or all active 1:1 AM topologies are adopted for AM to CTS connections (described in the Traditional AM - CTS topology section). Both of these approach have inherent limitations, namely:
- Active/standby topologies do not maximize available hardware (only the primary node services requests) and horizontal scaling of the CTS pool of servers is not possible.
- Active/standby topologies require expensive vertically scaled instances under high load.
- 1:1 all active CTS topologies require stickiness for all session requests to the server which created the token or replication delay becomes a functional issue.
Affinity based load balancing for CTS elegantly resolves these problems.
This article covers the following topics:
Traditional AM - CTS topology
Assuming CTS connections strings are used; the recommended approach to avoid the need for a load balancer and allow AM to manage its own connections is to typically use one of two AM - CTS topologies - Primary/Secondary (aka Active/Standby) or 1:1 all active. These are described below.
Mode 1 - Primary/Secondary
All AMs in the pool communicate with a single CTS server. If this server fails, all connections failover to the secondary CTS server and so on. The following topology depicts this architecture:
In this mode, replication delay is not an issue as all traffic hits the primary and as long as replication happens successfully over time; functionality is unaffected.
A connection string for this mode would look like this:cts1.example.com:1636,cts2.example.com:1636
- In a high load environment, CTS needs to be vertically scaled to ensure a single server can handle all requests. This is extremely costly for both on-prem and cloud based deployments.
- It does not make efficient use of the available hardware; only one server is handling traffic, while the others are sat idle awaiting primary failure.
Mode 2 - All Active CTS with 1:1 Mapping between AM and CTS nodes
Each AM communicates with its own CTS server, if this node fails it connects to its failover CTS. For example, AM1 connects to CTS1 as its primary with CTS2 as it secondary, AM2 connects to CTS2 as its primary and CTS1 as its secondary. The following topology depicts this architecture:
This mode allows an all active CTS farm to maximize hardware usage. This approach must ensure end-to-end stickiness to the AM node that created the session/token for all interactions to prevent functional issues. If this cannot be ensured, this is not a viable option.
The connection string would look like the following, where the serverID for AM1=01 and AM2=02.cts1.example.com:1636|01,cts2.example.com:1636|01,cts2.example.com:1636|02, cts1.example.com:1636|02
- An all active architecture like this is highly susceptible to replication delay if stickiness to the AM node that created a given session cannot be guaranteed. DS replication is based on a “loose” algorithm where all CTS nodes will fully converge over time, but not instantly. This means under high load a token create may hit AM1 and CTS1, however AM2 (which is connected to CTS2) may be hit for the token validate request. If replication has not happened fast enough, the request would error.
What is CTS Affinity based load balancing?
Affinity based load balancing for CTS addresses the key issues highlighted with the more traditional approaches; namely, unlimited horizontal scaling of the CTS pool using smaller, cheaper instances and eliminating functional issues as a result of replication delay.
It does this by making use of the new affinity based load balancing algorithm built into the DS SDK deployed with AM. For each and every inbound token DN, the SDK creates a hash and allocates the result to a specific CTS DS instance in the CTS connection string. All AM servers in the pool compute the same hash and thus send the request to the same CTS instance. The next request for another token DN is hashed and may be sent to another CTS instance in the pool and so on. The end result of this is, for each and every token a create occurs on a specific CTS instance (the token origin server) and from that point on all read, update and delete operations will be sent to this same origin server from any AM node. The following topology depicts this architecture:
The DS SDK also makes sure token creates are spread close to evenly across all CTS nodes in the pool to ensure one CTS server is not overloaded with requests, while the others remain idle. Finally, the DS SDK is instance aware; if the origin server goes down, the request goes to another in the pool and remains sticky to that CTS instance from then on. Assuming replication has happened there will be no functional impact. When the original CTS server comes back online, requests fail back to that server for any requests where it is the origin servers.
The connection string for affinity simply lists all CTS instances as a comma separate list:cts1.example.com:1636,cts2.example.com:1636
It is imperative that all AM servers share the same connection string; affinity will fail if each AM server changes the ordering of the connection string.
What are the advantages of CTS Affinity based load balancing?
- Allows horizontal scaling of the CTS server farm, this is not possible with the traditional topologies.
- Maximizes available resources by introducing an all active CTS pool of servers.
- Allows smaller, cheaper nodes for handling CTS traffic with the all active CTS architecture, rather than expensive vertically scaled nodes.
- Combats replication delay between CTS nodes.
- Removes the need for end-to-end stickiness to AM for session activities (stickiness is still required for authentication when using authentication modules or using authentication trees in AM 5.5 as the AuthID is held in the memory of the AM server that was first hit for the ../json/authenticate call; see OPENAM-8336 (XUI+REST authentication with chains must have sticky load balancing) and Sessions Guide › CTS-Based Sessions for further information).
How do you configure CTS Affinity based load balancing?
- Build an external CTS repository per the following guides: Core Token Service Guide (CTS) › Configuring CTS Token Stores (AM 6.5 and later) or Best practice for configuring an external DS instance for the Core Token Service (CTS) in AM 5.x and 6.
- As the connection string for CTS Affinity needs to be exactly the same across all AM nodes, it is recommended to make the necessary CTS changes at the Server Default level rather than at an instance level to ensure all AM nodes inherit the settings from these defaults. Navigate to: Configure > Server Defaults > CTS.
- Configure the following settings:
- Change the Store Mode to External Token Store.
- Set the Root Suffix to that configured in step 1.
- Set Max Connections appropriately (see Max Connections subsection for more information).
Each AM will create a small number of connections on startup to each CTS node in the pool. As load increases, the number of connections from each AM to each CTS node will increase up to this Max Connection setting. In addition, the first CTS node in the connection string will receive additional connections for the AM Reaper and Blacklisting threads. For example, if Max Connections is set to 20 and there are 2 AM nodes and 2 CTS nodes:
- AM1 will create a maximum of 20 connections to CTS1* and a maximum of 20 connections to CTS2.
- AM2 will create a maximum of 20 connections to CTS1* and a maximum of 20 connections to CTS2.
* CTS1 will however receive additional connections from each AM for the AM Reaper and AM Blacklisting threads, so total connections from AM1 and AM2 to CTS1 will be greater than 20. The Reaper and Blacklisting threads execute on the first CTS node in the connection list only, unless it is down in which case it will create a new set of connections to the next CTS in the CTS connection string list.
- Click Save Changes
- Select the External Store Configuration tab and configure the following settings:
- Enable SSL/TLS Enabled if CTS is LDAPS enabled.
- Define the Connection String with a comma separated list in the format <server FQDN>:<port>. An example connection string is: cts1.example.com:1636,cts2.example.com:1636
- Set the Login Id and Password as appropriate.
- Enable Affinity Enabled.
- Click Save Changes.
- Restart the AM node(s). Configuration complete.
If there are problems, you should check the debug logs - usually it is something like the bind credentials to connect to CTS are wrong. See Maintenance Guide › Debug Logging (AM 7 and later) or How do I enable Message level debugging in AM 5.x, 6.x and OpenAM 13.x debug files?
What about multi-data center deployments?
As always the optimal AM to CTS topology for a multi-data center (DC) deployment is it depends. For example, is failover across DCs really required? The worst case is a user needs to re-authenticate, there is cost and complexity for DC SSO and OAuth2 session failover. Another consideration may be latency; for example, some customers have dedicated communication lines which guarantee low latency between their DCs; others are subject to public internet conditions. These considerations and more should be factored in when deciding on which CTS architecture to go with.
However, if for example, inter-DC failover for SSO and OAuth2 sessions is required and latency levels are guaranteed, the standard affinity approach could be used, whereby all CTS nodes from both sites are set in the connection string and AM sees the CTS pool as one collection of servers and is unaware that some CTS nodes are local and some remote. Example connection string:cts1-dc1.example.com:1636,cts2-dc1.example.com:1636,cts1-dc2.example.com:1636,cts2-dc2.example.com:1636
If however latency levels cannot be guaranteed, then perhaps affinity within a site would be more appropriate, whereby the connection string is set at an instance level for all CTS nodes within a particular DC. However, this approach would be subject to replication delay for token replication between DCs; if the data does not make the remote DC quick enough and stickiness to a particular DC cannot be guaranteed, there is a chance for failure at the token validation phase. Example connection string:AM1 and AM2 in DC1: cts1-dc1.example.com:1636,cts2-dc1.example.com:1636 AM1 and AM2 in DC2: cts1-dc2.example.com:1636,cts2-dc2.example.com:1636
Multi-DC deployments are complex; ForgeRock Deployment Support Services are available to help customers to architect, design and build in the best way. For more information, please refer to Deployment Support Services.
Related Issue Tracker IDs