Sizing Hardware and Services For Deployment

Determine the correct size of your deployment by understanding what you need out of it. Is availability the top priority? Fast response times? Can you find a compromise?

Sizing for Availability

Any part of a system that can fail eventually will fail. Keeping your service available means tolerating failure in any part of the system without interrupting the service. You make AM services highly available through good maintenance practices and by removing single points of failure from your architectures.

Removing single points of failure involves replicating each system component, so that when one component fails, another can take its place. Replicating components comes with costs, not only for deploying and maintaining more individual components, but also for synchronizing anything those components share. Because of that necessary synchronization, what you spend on availability is never fully recovered as gains in capacity: two servers cannot do quite twice the work of one server. Instead, you must determine the right trade-offs for the deployment.
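To make the idea concrete, the following sketch models that effect with a single assumed synchronization overhead factor. The 10% overhead per peer and the helper function are illustrative assumptions, not measured AM behavior.

    # Illustrative model: each added replica spends part of its capacity
    # synchronizing with its peers, so throughput does not scale linearly.
    # The 10% overhead per peer is an assumption for the example.
    def effective_capacity(servers: int, per_server_throughput: float,
                           sync_overhead: float = 0.10) -> float:
        """Estimate aggregate throughput for a set of replicated servers."""
        return servers * per_server_throughput * (1 - sync_overhead * (servers - 1))

    # Two servers handling 1,000 authentications/hour each fall short of 2,000:
    print(effective_capacity(2, 1_000))  # 1800.0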

To start thinking about the trade-offs, answer the following questions.

What is the impact if the AM service becomes unavailable? In an online system this could be a severe problem, interrupting all access to protected resources. Most deployments fall into this category.

In an embedded system protecting local resources, it might be acceptable to restart the service.

Deployments that require always-on service availability require, at a minimum, a load balancing solution between AM and AM client applications. The load balancer itself must be redundant, too, so that it does not become a single point of failure. Illustrations in Example Deployment Topology show multiple levels of load balancing for availability and firewalls for security.

AM allows you to deploy replicated configurations in different physical locations, so that if the service experiences complete failure at one site, you can redirect client traffic to another site and continue operation. The question is whether the benefit in reducing the likelihood of failure outweighs the costs of maintaining multiple sites.

When you need failover across sites, one of the costs is (redundant) WAN links scaled for inter-site traffic. AM synchronizes configuration and policy data across sites and, by default, also synchronizes session data. AM also expects profiles in identity stores to remain in sync. As shown in Example Deployment Topology, the deployment involves directory service replication between sites.

AM offers different ways of storing session information to suit your environment needs. If your environment requires session high availability, AM can store sessions in the CTS token store. Even if a server fails or the client browser crashes, the session remains available as long as the CTS is available.

In deployments where an interruption in access to a protected resource could cause users to lose valuable information, configuring the CTS correctly can prevent the loss. The underlying directory service should replicate sessions stored in the CTS. Session information can be more volatile than configuration and policy data, so replicating the CTS across sites can call for more WAN bandwidth, as more information is shared across sites.

Once you have the answers to these questions for the deployment, you can draw a diagram of the deployment, checking for single points of failure to avoid. In the end, you should have a count of the load balancers, network links, and servers that you need, with the types of clients and an estimated number of clients that access the AM service.

Also, you must consider the requirements for non-functional testing, covered in "Planning Tests". While you might be able to perform functional testing by using a single AM server, other tests require a more complete environment with multiple servers, secure connections, and so forth. Performance testing should reveal any scalability issues. It should also run through scenarios where components fail, and check that critical functionality remains available and continues to provide acceptable levels of service.

Sizing for Service Levels

Beyond service availability, your aim is to provide some level of service. You can express service levels in terms of throughput and response times. For example, the service level goal could be to handle an average of 10,000 authentications per hour with peaks of 25,000 authentications per hour, and no more than 1 second wait for 95% of users authenticating, with an average of 100,000 live SSO sessions at any given time. Another service level goal could be to handle an average of 500 policy requests per minute per web or Java agent with an average response time of 0.5 seconds.
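If it helps to sanity-check such goals, the following sketch converts the example targets above into per-second rates. The numbers are the sample figures from this section; the variable names are ours.

    # Convert the example service level goals into per-second rates.
    avg_authn_per_hour = 10_000
    peak_authn_per_hour = 25_000
    policy_requests_per_minute_per_agent = 500

    print(f"Average authentications/s: {avg_authn_per_hour / 3600:.1f}")                    # ~2.8
    print(f"Peak authentications/s:    {peak_authn_per_hour / 3600:.1f}")                    # ~6.9
    print(f"Policy requests/s per agent: {policy_requests_per_minute_per_agent / 60:.1f}")   # ~8.3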

When you have established your service level goals, you can create load tests for each key service as described in Planning Service Performance Testing. Use the load tests to check your sizing assumptions.

To estimate sizing based on service levels, take some initial measurements and extrapolate from those measurements.

To estimate the size of configuration and policy data, measure the current disk space and memory occupied by the configuration directory data. Next, create a representative sample of the policies that you expect to see in the deployment, and measure the difference. Then, derive the average policy size, and use it to estimate the total size.
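As a minimal sketch of that extrapolation, assuming placeholder measurements rather than real ones, the arithmetic looks like this:

    # Derive an average policy size from a before/after measurement,
    # then extrapolate to the expected number of policies.
    # All figures below are placeholders, not measured values.
    baseline_config_kb = 250_000      # configuration directory before the sample policies
    with_sample_kb = 270_000          # configuration directory after adding the sample
    sample_policy_count = 500
    expected_policy_count = 10_000

    avg_policy_kb = (with_sample_kb - baseline_config_kb) / sample_policy_count
    estimated_policy_data_mb = avg_policy_kb * expected_policy_count / 1024

    print(f"Average policy size: {avg_policy_kb:.0f} KB")               # 40 KB
    print(f"Estimated policy data: {estimated_policy_data_mb:.0f} MB")  # ~391 MB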

To measure rates of policy evaluations, you can monitor policy evaluation counts over SNMP. For details, see "SNMP Monitoring for Policy Evaluation".

The average total size depends on the number of live CTS entries, which in turn depends on session and token lifetimes. The lifetimes are configurable, and they also depend on user actions, such as logout, that are specific to the deployment.

For example, suppose that the deployment only handles CTS-based SSO sessions, that session entries occupy on average 20 KB of memory, and that you anticipate on average 100,000 active sessions. In that case, you would estimate the need for 2 GB (100,000 x 20 KB) of RAM on average if you wanted to cache all the session data in memory. If you expect that sometimes the number of active sessions could rise to 200,000, then you would plan for 4 GB of RAM for the session cache. Keep in mind that this is extra memory needed in addition to the memory needed for everything else that the system does, including running the AM server.
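The same estimate can be expressed as a short sketch. The 20 KB entry size and the session counts are the example figures above; decimal units are used so the result matches the 2 GB figure.

    # Estimate RAM needed to cache all CTS session entries in memory.
    avg_session_entry_kb = 20

    def session_cache_gb(active_sessions: int) -> float:
        return active_sessions * avg_session_entry_kb / 1_000_000  # decimal GB

    print(f"Average load: {session_cache_gb(100_000):.1f} GB")  # 2.0 GB
    print(f"Peak load:    {session_cache_gb(200_000):.1f} GB")  # 4.0 GB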

Session data is relatively volatile: the CTS creates session entries when sessions are created, and deletes them when sessions are destroyed through logout or timeout. Sessions are also modified regularly to update the idle timeout. Suppose the rate of session creation is about 5 per second, and the rate of session destruction is also about 5 per second. Then you know that the underlying directory service must handle on average 5 adds and 5 deletes per second. The added entries generate on the order of 100 KB of replication traffic per second (5/s x 20 KB plus some overhead). The deleted entries generate less replication traffic, as the directory service only needs to know the distinguished name (DN) of the entry to delete, not its content.
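A rough version of that replication-traffic arithmetic follows. The 5% protocol overhead factor is an assumption added for illustration; the other figures are the example values above.

    # Estimate replication traffic generated by new session entries.
    adds_per_second = 5
    avg_session_entry_kb = 20
    replication_overhead = 1.05   # assumed small protocol overhead, illustrative only

    add_traffic_kb_per_s = adds_per_second * avg_session_entry_kb * replication_overhead
    print(f"Replication traffic from session creation: ~{add_traffic_kb_per_s:.0f} KB/s")
    # Deletes replicate little more than the entry DN, so their traffic is much smaller.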

You can also gather statistics about CTS operations by using the AM monitoring services. For more information, see Monitoring Instances.

When sizing the network, you must account for change notifications from the Core Token Service (CTS) token store to the AM server.

In AM deployments, there is no direct network traffic between AM servers.

AM stores data in user profile attributes. AM can use or provision many profile attributes, as described in "To Configure an Identity Store".

When you know which attributes are used, you can estimate the average increase in size by measuring the identity store as you did for configuration and CTS-related data. If you do not manage the identity store as part of the deployment, you can communicate this information to the maintainers. For a large deployment, the increase in profile size can affect sizing for the underlying directory service.
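A sketch of that estimate, again assuming placeholder measurements, might look as follows:

    # Estimate identity store growth caused by AM profile attributes.
    # All figures are placeholders for illustration.
    baseline_store_mb = 500           # identity store before provisioning AM attributes
    with_am_attributes_mb = 530       # identity store after provisioning a sample of users
    sample_user_count = 10_000
    total_user_count = 1_000_000

    per_user_increase_kb = (with_am_attributes_mb - baseline_store_mb) * 1024 / sample_user_count
    total_increase_gb = per_user_increase_kb * total_user_count / (1024 * 1024)

    print(f"Per-user profile growth: {per_user_increase_kb:.1f} KB")        # ~3.1 KB
    print(f"Estimated identity store growth: {total_increase_gb:.1f} GB")   # ~2.9 GB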

In a centrally managed deployment with only a few realms, the size of realm configuration data might not be consequential. Also, you might have already estimated the size of policy data. For example, each new realm might add about 1 MB of configuration data to the configuration directory, not counting the policies added to the realm.

In a multi-tenant deployment or any deployment where you expect to set up many new realms, the realm configuration data and the additional policies for the realm can add significantly to the size of the configuration data overall. You can measure the configuration directory data as you did previously, but specifically for realm creation and policy configuration, so that you can estimate an average for a new realm with policies and the overall size of realm configuration data for the deployment.
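One way to put numbers on that, assuming placeholder figures for the per-realm measurements and policy counts, is sketched below.

    # Estimate total realm configuration data for a multi-tenant deployment.
    # The 1 MB per-realm figure echoes the example above; the rest are placeholders.
    realm_config_mb = 1.0             # configuration added per new realm, without policies
    avg_policies_per_realm = 200
    avg_policy_kb = 40                # derived earlier from a representative sample
    expected_realms = 50

    per_realm_mb = realm_config_mb + avg_policies_per_realm * avg_policy_kb / 1024
    total_mb = per_realm_mb * expected_realms

    print(f"Estimated size per realm: {per_realm_mb:.1f} MB")                # ~8.8 MB
    print(f"Estimated realm configuration data overall: {total_mb:.0f} MB")  # ~441 MB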

See "Sizing Systems" for about CPU, memory, network, and disk-specific sizing information.
