The ForgeRock Core Token Service (CTS) is a specific DS/OpenDJ repository used to store and persist AM/OpenAM SSO, OAuth2/OIDC and SAML tokens. In terms of architecture, typical primary/secondary aka active/standby or all active 1:1 AM/OpenAM topologies are adopted for AM/OpenAM to CTS connections (described in the Traditional AM/OpenAM - CTS topology section). Both of these approach have inherent limitations, namely:
- Active/standby topologies do not maximize available hardware (only the primary node services requests) and horizontal scaling of the CTS pool of servers is not possible.
- Active/standby topologies require expensive vertically scaled instances under high load.
- 1:1 all active CTS topologies require stickiness for all session requests to the server which created the token or replication delay becomes a functional issue.
Affinity based load balancing for CTS elegantly resolves these problems.
This article covers the following topics:
Traditional AM/OpenAM - CTS topology
Assuming CTS connections strings are used; the recommended approach to avoid the need for a load balancer and allow AM/OpenAM to manage its own connections is to typically use one of two AM/OpenAM - CTS topologies - Primary/Secondary (aka Active/Standby) or 1:1 all active. These are described below.
Mode 1 - Primary/Secondary
All AM/OpenAMs in the pool communicate with a single CTS server. If this server fails, all connections failover to the secondary CTS server and so on. The following topology depicts this architecture:
In this mode, replication delay is not an issue as all traffic hits the primary and as long as replication happens successfully over time; functionality is unaffected.
A connection string for this mode would look like this:
- In an high load environment, CTS needs to be vertically scaled to ensure a single server can handle all requests. This is extremely costly for both on-prem and cloud based deployments.
- It does not make efficient use of the available hardware; only one server is handling traffic, while the others are sat idle awaiting primary failure.
Mode 2 - All Active CTS with 1:1 Mapping between AM/OpenAM and CTS nodes
Each AM/OpenAM communicates with its own CTS server, if this node fails it connects to its failover CTS. For example, AM1 connects to CTS1 as its primary with CTS2 as it secondary, AM2 connects to CTS2 as its primary and CTS1 as its secondary. The following topology depicts this architecture:
This mode allows an all active CTS farm to maximize hardware usage. This approach must ensure end-to-end stickiness to the AM/OpenAM node that created the session/token for all interactions to prevent functional issues. If this cannot be ensured, this is not a viable option.
The connection string would look like the following, where the serverID for AM1=01 and AM2=02.
- An all active architecture like this is highly susceptible to replication delay if stickiness to the AM/OpenAM node that created a given session cannot be guaranteed. DS/OpenDJ replication is based on a “loose” algorithm where all CTS nodes will fully converge over time, but not instantly. This means under high load a token create may hit AM1 and CTS1, however AM2 (which is connected to CTS2) may be hit for the token validate request. If replication has not happened fast enough, the request would error.
What is CTS Affinity based load balancing?
Affinity based load balancing for CTS addresses the key issues highlighted with the more traditional approaches; namely, unlimited horizontal scaling of the CTS pool using smaller, cheaper instances and eliminating functional issues as a result of replication delay.
It does this by making use of the new affinity based load balancing algorithm built into the DS/OpenDJ SDK deployed with AM/OpenAM. For each and every inbound token DN, the SDK creates a hash and allocates the result to a specific CTS DS/OpenDJ instance in the CTS connection string. All AM/OpenAM servers in the pool compute the same hash and thus send the request to the same CTS instance. The next request for another token DN is hashed and may be sent to another CTS instance in the pool and so on. The end result of this is, for each and every token a create occurs on a specific CTS instance (the token origin server) and from that point on all read, update and delete operations will be sent to this same origin server from any AM/OpenAM node. The following topology depicts this architecture:
The DS/OpenDJ SDK also makes sure token creates are spread close to evenly across all CTS nodes in the pool to ensure one CTS server is not overloaded with requests, while the others remain idle. Finally, the DS/OpenDJ SDK is instance aware; if the origin server goes down, the request goes to another in the pool and remains sticky to that CTS instance from then on. Assuming replication has happened there will be no functional impact. When the original CTS server comes back online, requests fail back to that server for any requests where it is the origin servers.
The connection string for affinity simply lists all CTS instances as a comma separate list:
It is imperative that all AM/OpenAM servers share the same connection string; affinity will fail if each AM/OpenAM server changes the ordering of the connection string.
What are the advantages of CTS Affinity based load balancing?
- Allows horizontal scaling of the CTS server farm, this is not possible with the traditional topologies.
- Maximizes available resources by introducing an all active CTS pool of servers.
- Allows smaller, cheaper nodes for handling CTS traffic with the all active CTS architecture, rather than expensive vertically scaled nodes.
- Combats replication delay between CTS nodes.
- Removes the need for end-to-end stickiness to AM/OpenAM for session activities (stickiness is still required for authentication as the AuthID is held in the memory of the AM/OpenAM server that was first hit for the ../json/authenticate call; see OPENAM-8336 (XUI+REST authentication with chains must have sticky load balancing) for further information).
How do you configure CTS Affinity based load balancing?
- Build an external CTS repository per the following guides: Installation Guide › CTS Deployment Steps and Best practice for configuring an external DS/OpenDJ instance for the Core Token Service (CTS) in AM/OpenAM (All versions) .
- As the connection string for CTS Affinity needs to be exactly the same across all AM/OpenAM nodes, it is recommended to make the necessary CTS changes at the Server Default level rather than at an instance level to ensure all AM/OpenAM nodes inherit the settings from these defaults. Navigate to: Configure > Server Defaults > CTS.
- Configure the following settings:
- Change the Store Mode to External Token Store.
- Set the Root Suffix to that configured in step 1.
- Set Max Connections as appropriate depending on your version:
AM 5 and later: Each AM will create a small number of connections on startup to each CTS node in
the pool. As load increases, the number of connections from each AM to each CTS node
will increase up to this Max Connection setting. In addition, the first CTS node in
the connection string will receive additional connections for the AM Reaper and Blacklisting
threads. For example, if Max Connections is set to 20 and there are 2 AM nodes and
2 CTS nodes:
- AM1 will create a maximum of 20 connections to CTS1* and a maximum of 20 connections to CTS2.
- AM2 will create a maximum of 20 connections to CTS1* and a maximum of 20 connections to CTS2.
- Pre-AM 5: The Max connections is shared as follows; 1 connection for CTS cleanup Reaper and the rest shared equally between the nodes specified in the connection string. For example, if there are 2 CTS nodes and the connection string is set to 21; 10 connections will be allocated to CTS1; 10 to CTS2 and 1 reserved for the reaper. Load testing will determine optimal value - increase as appropriate.
- AM 5 and later: Each AM will create a small number of connections on startup to each CTS node in the pool. As load increases, the number of connections from each AM to each CTS node will increase up to this Max Connection setting. In addition, the first CTS node in the connection string will receive additional connections for the AM Reaper and Blacklisting threads. For example, if Max Connections is set to 20 and there are 2 AM nodes and 2 CTS nodes:
- Click Save Changes
- Select the External Store Configuration tab and configure the following settings:
- Enable SSL/TLS Enabled if CTS is LDAPS enabled.
- Define the Connection String with a comma separated list in the format <server FQDN>:<port>. An
example connection string is:
- Set the Login Id and Password as appropriate.
- Enable Affinity Enabled.
- Click Save Changes.
- Restart the AM/OpenAM node(s). Configuration complete.
If there are problems be sure to check out AM/OpenAM’s debug logs - usually it is something like the bind credentials to connect to CTS are wrong. See How do I enable Message level debugging in AM/OpenAM (All versions) debug files?
What about multi-data center deployments?
As always the optimal AM/OpenAM to CTS topology for a multi-data center (DC) deployment is it depends. For example, is failover across DCs really required? The worst case is a user needs to re-authenticate, there is cost and complexity for DC SSO and OAuth2 session failover. Another consideration may be latency; for example, some customers have dedicated communication lines which guarantee low latency between their DCs; others are subject to public internet conditions. These considerations and more should be factored in when deciding on which CTS architecture to go with.
However, if for example, inter-DC failover for SSO and OAuth2 sessions is required and latency levels are guaranteed, the standard affinity approach could be used, whereby all CTS nodes from both sites are set in the connection string and AM/OpenAM sees the CTS pool as one collection of servers and is unaware that some CTS nodes are local and some remote. Example connection string:
If however latency levels cannot be guaranteed, then perhaps affinity within a site would be more appropriate, whereby the connection string is set at an instance level for all CTS nodes within a particular DC. However, this approach would be subject to replication delay for token replication between DCs; if the data does not make the remote DC quick enough and stickiness to a particular DC cannot be guaranteed, there is a chance for failure at the token validation phase. Example connection string:
AM1 and AM2 in DC1: cts1-dc1.example.com:1636,cts2-dc1.example.com:1636 AM1 and AM2 in DC2: cts1-dc2.example.com:1636,cts2-dc2.example.com:1636
Multi-DC deployments are complex; ForgeRock Deployment Support Services are available to help customers to architect, design and build in the best way. For more information, please contact firstname.lastname@example.org
Related Issue Tracker IDs