Your key performance requirement is to satisfy your users or customers with the resources available to you. Before you can solve potential performance problems, define what those users or customers expect. Determine which resources you will have to satisfy their expectations.
A service level objective (SLO) is a target for a directory service level that you can measure quantitatively. If possible, base SLOs on what your key users expect from the service in terms of performance.
Define SLOs for at least the following areas:
Directory service response times
Directory service response times range from less than a millisecond on average, across a low latency connection on the same network, to however long it takes your network to deliver the response.
More important than average or best response times is the response time distribution, because applications set timeouts based on worst case scenarios.
An example response time performance requirement is: Directory response times must average less than 10 milliseconds for all operations except searches returning more than 10 entries, with 99.9% of response times under 40 milliseconds.
Directory service throughput
Directories can serve many thousands of operations per second. In fact there is no upper limit for read operations such as searches, because only write operations must be replicated. To increase read throughput, simply add additional replicas.
More important than average throughput is peak throughput. You might have peak write throughput in the middle of the night when batch jobs update entries in bulk, and peak binds for a special event or first thing Monday morning.
An example throughput performance requirement is: The directory service must sustain a mix of 5,000 operations per second made up of 70% reads, 25% modifies, 3% adds, and 2% deletes.
Ideally, you mimic the behavior of key operations during performance testing, so that you understand the patterns of operations in the throughput you need to provide.
Directory service availability
DS software is designed to let you build directory services that are basically available, including during maintenance and even upgrade of individual servers.
To reach very high levels of availability, you must also ensure that your operations execute in a way that preserves availability.
Availability requirements can be as lax as a best effort, or as stringent as 99.999% or more uptime.
Replication is the DS feature that allows you to build a highly available directory service.
Directory service administrative support
Be sure to understand how you support your users when they run into trouble.
While directory services can help you turn password management into a self-service visit to a web site, some users still need to know what they can expect if they need your help.
Creating an SLO, even if your first version consists of guesses, helps you reduce performance tuning from an open-ended project to a clear set of measurable goals for a manageable project with a definite outcome.
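As an illustration of turning the example response time requirement into a measurable check, the following sketch reads one response time in milliseconds per line and compares the mean and the 99.9th percentile against limits. The function name check_response_slo and the input format are assumptions made for this example; they are not part of DS.

```shell
# check_response_slo MEAN_LIMIT_MS TAIL_LIMIT_MS
# Hypothetical helper: reads one response time (ms) per line on stdin and
# prints the mean, the 99.9th percentile, and PASS or FAIL.
check_response_slo() {
    sort -n | awk -v mean_limit="$1" -v tail_limit="$2" '
        { t[NR] = $1 + 0; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            mean = sum / NR
            # Nearest-rank percentile: ceil(NR * 999 / 1000).
            rank = int((NR * 999 + 999) / 1000)
            p = t[rank]
            ok = (mean < mean_limit + 0 && p < tail_limit + 0)
            printf "mean=%.2f p99.9=%.2f %s\n", mean, p, (ok ? "PASS" : "FAIL")
        }'
}

# Example: limits from the sample requirement (10 ms average, 40 ms tail).
printf '5\n5\n5\n45\n' | check_response_slo 10 40
```

The input could be response times collected during a test run. Because the check looks at the tail of the distribution as well as the average, a few slow outliers are enough to fail it.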
With your SLOs in hand, inventory the server, networks, storage, people, and other resources at your disposal. Now is the time to estimate whether it is possible to meet the requirements at all.
If, for example, you are expected to serve more throughput than the network can transfer, maintain high availability with only one physical machine, store 100 GB of backups on a 50 GB partition, or provide 24/7 support all alone, no amount of tuning will fix the problem.
When checking that the resources you have at least theoretically suffice to meet your requirements, do not forget that high availability in particular requires at least two of everything to avoid single points of failure. Be sure to list the resources you expect to have, when and how long you expect to have them, and why you need them. Make note of what is missing and why.
DS servers are pure Java applications, making them very portable. DS servers tend to perform best on single-board, x86 systems due to low memory latency.
High-performance storage is essential for handling high write throughput. When the database stays fully cached in memory, directory read operations do not result in disk I/O. Only writes result in disk I/O. You can further improve write performance by using solid-state disks for storage or file system cache.
DS directory servers are designed to work with local storage for database backends. Do not use network file systems, such as NFS, where there is no guarantee that a single process has access to files.
Storage area networks (SANs) and attached storage are fine for use with DS directory servers.
Regarding database size on disk, sustained write traffic can cause the database to grow to more than twice its initial size on disk. This is normal behavior. The size on disk does not impact the DB cache size requirements.
To avoid directory database file corruption after crashes or power failures on Linux systems, enable file system write barriers, and make sure that the file system journaling mode is ordered. For details on how to enable write barriers and set the journaling mode for data, see the options for your file system in the mount command manual page.
Even if you do not need high availability, you still need two of everything, because your test environment needs to mimic your production environment as closely as possible.
In your test environment, set up DS servers just as you do in production. Conduct experiments to determine how to best meet your SLOs.
The following command-line tools help with basic performance testing:
The makeldif command generates sample data with great flexibility.
The addrate command measures add and delete throughput and response time.
The authrate command measures bind throughput and response time.
The modrate command measures modification throughput and response time.
The searchrate command measures search throughput and response time.
All *rate commands display response time distribution measurements, and support testing at specified levels of throughput.
For additional precision when evaluating response times, use the global configuration setting etime-resolution. To change elapsed processing time resolution from milliseconds (default) to nanoseconds:
$ dsconfig \
 set-global-configuration-prop \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --set etime-resolution:nanoseconds \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt
etime, recorded in the server access log, indicates the elapsed time to process the request.
etime starts when the decoded operation is available to be processed by a worker thread.
Test performance with your production-ready configuration. If, however, you simply want to demonstrate top performance, take the following points into account:
Incorrect JVM tuning slows down server and tool performance. Make sure the JVM is tuned for best performance.
For example, set the following environment variable, then restart the server and run the performance tools again to take the change into account:
export OPENDJ_JAVA_ARGS="-XX:+UseParallelGC -XX:MaxTenuringThreshold=1"
If the server heap is very large, see the details in Java Settings.
Unfiltered access logs record messages for each client request. Turn off full access logging.
For example, set enabled:false for the Json File-Based Access Logger log publisher, and any other unfiltered log publishers that are enabled.
Secure connections are recommended, and they can be costly.
Set require-secure-authentication:false in the password policies governing the bind entries, and bind using insecure connections.
Use the following suggestions when your tests show that DS performance is lacking, even though you have the right underlying network, hardware, storage, and system resources in place.
DS servers must open many file descriptors when handling thousands of client connections.
Linux systems often set a limit of 1024 per user. That setting is too low to accept thousands of client connections.
Make sure the server can use at least 64K (65536) file descriptors.
For example, when running the server as user opendj on a Linux system that uses /etc/security/limits.conf to set user level limits, set soft and hard limits by adding these lines to the file:
opendj soft nofile 65536
opendj hard nofile 131072
The example above assumes the system has enough file descriptors available overall. Check the Linux system overall maximum as follows:
$ cat /proc/sys/fs/file-max
204252
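To verify the limit that actually applies to a shell session, a small check like the following sketch can help. The function name has_enough_descriptors is hypothetical; the 65536 threshold is the minimum suggested above.

```shell
# has_enough_descriptors LIMIT [REQUIRED]
# Hypothetical helper: succeeds when LIMIT is "unlimited" or at least REQUIRED
# (default 65536, the minimum suggested in this section).
has_enough_descriptors() {
    limit="$1"
    required="${2:-65536}"
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$required" ]
}

# Check the soft limit of the current shell.
if has_enough_descriptors "$(ulimit -n)"; then
    echo "open file limit is sufficient"
else
    echo "open file limit is below 65536"
fi
```

Run the check as the user that starts the server, because limits are set per user.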
Default Linux virtual memory settings cause significant buildup of dirty data pages before flushing them. When the kernel finally flushes the pages to disk, the operation can exhaust the disk I/O for up to several seconds. Application operations waiting on the file system to synchronize to disk are blocked.
The default virtual memory settings can therefore cause DS server operations to block for seconds at a time. Symptoms include high outlier etimes, even for very low average etimes. For sustained high loads, such as import operations, the server has to maintain thousands of open file descriptors.
To avoid these problems, tune Linux page caching.
As a starting point for testing and tuning, set vm.dirty_background_bytes to one quarter of the disk I/O per second, and vm.dirty_expire_centisecs to 1000 (10 seconds) using the sysctl command. This causes the kernel to flush more often, and limits the pauses to a maximum of 250 milliseconds.
For example, if the disk I/O is 80 MB/second for writes, the following example shows an appropriate starting point. It updates the /etc/sysctl.conf file to change the setting permanently, and uses the sysctl -p command to reload the settings:
$ echo vm.dirty_background_bytes=20971520 | sudo tee -a /etc/sysctl.conf
[sudo] password for admin:
$ echo vm.dirty_expire_centisecs=1000 | sudo tee -a /etc/sysctl.conf
$ sudo sysctl -p
vm.dirty_background_bytes = 20971520
vm.dirty_expire_centisecs = 1000
Be sure to test and adjust the settings for your deployment.
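The arithmetic behind the starting value is one quarter of the bytes the disk can write per second. As a minimal sketch, assuming write throughput is given in whole MB per second and 1 MB is 1024 * 1024 bytes (the function name is hypothetical):

```shell
# dirty_background_bytes WRITE_MB_PER_SECOND
# Hypothetical helper: prints one quarter of the write throughput in bytes.
dirty_background_bytes() {
    echo $(( $1 * 1024 * 1024 / 4 ))
}

dirty_background_bytes 80   # prints 20971520
```

A quarter of one second of write throughput corresponds to roughly 250 milliseconds of flushing, which is where the maximum pause figure comes from.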
Default Java settings let you evaluate DS servers using limited system resources. For high performance production systems, test and run with a tuned JVM.
To apply JVM settings for a server, edit
Availability of the following java options depends on the JVM:
If you observe any internal node evictions, add more RAM to the system. If adding RAM is not an option, increase the maximum heap size to optimize RAM allocation. For details, see Cache Internal Nodes.
Use at least a 2 GB heap unless your data set is small.
When using JMX, add this option to the list of start-ds.java-args arguments to avoid periodic full GC events.
JMX is based on RMI, which uses references to objects. By default, the JMX client and server perform a full GC periodically to clean up stale references. As a result, with the default settings JMX triggers a full GC every hour.
Avoid using this argument with import-ldif.offline.java-args or when using the import-ldif command. The import process uses garbage collection to manage memory and references to memory-mapped files.
This sets the maximum number of GC cycles an object stays in survivor spaces before it is promoted into the old generation space.
Setting this option as suggested reduces the new generation GC frequency and duration. The JVM quickly promotes long-lived objects to the old generation space, rather than letting them accumulate in new generation survivor spaces, copying them for each GC cycle.
Log garbage collection messages when diagnosing JVM tuning problems. You can turn the option off when everything is running smoothly.
Always specify the output file for the garbage collection log. Otherwise, the JVM logs the messages to the opendj/logs/server.out file, mixing them with other messages, such as stack traces.
-Xlog:gc=info:file=/path/to/gc.log logs informational messages about garbage collection to the file /path/to/gc.log.
For details, use the
Short-lived client tools, such as the ldapsearch command, start up faster when this option is set to
Use G1 GC (the default) when the heap size is 8 GB or more.
Use parallel GC when the heap size is less than 8 GB.
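The heap size rule of thumb for choosing a collector can be written as a small helper. This is only a sketch: the function name is hypothetical, -XX:+UseParallelGC appears earlier in this text, and -XX:+UseG1GC is the standard HotSpot flag for G1, which is redundant when G1 is already the default.

```shell
# gc_option_for_heap HEAP_GB
# Hypothetical helper: prints the GC option suggested for a heap size in GB.
gc_option_for_heap() {
    if [ "$1" -ge 8 ]; then
        echo "-XX:+UseG1GC"
    else
        echo "-XX:+UseParallelGC"
    fi
}

gc_option_for_heap 4    # prints -XX:+UseParallelGC
gc_option_for_heap 16   # prints -XX:+UseG1GC
```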
By default, DS servers compress attribute descriptions and object class sets to reduce data size. This is called compact encoding.
By default, DS servers do not compress entries stored in their backend databases.
If your entries hold values that compress well, such as text, you can gain space.
Set the backend property entries-compressed:true, and reimport the data from LDIF.
The DS server compresses entries before writing them to the database:
$ dsconfig \
 set-backend-prop \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --backend-name dsEvaluation \
 --set entries-compressed:true \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt

$ import-ldif \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --ldifFile backup.ldif \
 --backendID dsEvaluation \
 --includeBranch dc=example,dc=com \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin
DS directory servers do not proactively rewrite all entries after you change the settings. To force the DS server to compress all entries, you must import the data from LDIF.
By default, the temporary directory used for scratch files is
Use the import-ldif --tmpDirectory option to set this directory to a tmpfs file system.
If you are certain your LDIF contains only valid entries with correct syntax, you can skip schema validation.
Use the import-ldif --skipSchemaValidation option.
By default, DS directory servers:
If you require fine-grained control over JE backend cache settings, you can configure the amount of memory requested for database cache per database backend:
Percentage of JVM memory to allocate to the database cache for the backend.
If the directory server has multiple database backends, the total percent of JVM heap used must remain less than 100 (percent), and must leave space for other uses.
Default: 50 (percent)
JVM memory to allocate to the database cache.
This is an alternative to db-cache-percent. If you set its value larger than 0, then it takes precedence over db-cache-percent.
Default: 0 MB
Set the global property
Restart the server for the changes to take effect.
A JE backend is implemented as a B-tree data structure. A B-tree is made up of nodes that can have children. Nodes with children are called internal nodes. Nodes without children are called leaf nodes.
The directory stores data in key-value pairs. Internal nodes hold the keys, and can also hold small values. Leaf nodes hold the values. One internal node usually holds keys to values in many leaf nodes. A B-tree has many more leaf nodes than internal nodes.
To read a value by its key, the backend traverses all internal nodes on the branch from the B-tree root to the leaf node holding the value. The backend is more likely to access nodes the closer they are to the B-tree root. Internal nodes are accessed far more frequently than leaf nodes, and must remain cached in memory. In addition to the worker threads serving client application requests, cleaner threads working in the background also access internal nodes frequently. The performance impact of having to fetch frequently used internal nodes from disk can be severe.
When the database cache is full, the backend must begin evicting nodes from cache in order to load others. By default, the backend evicts leaf nodes even when the cache is not full. The backend is less likely to access a leaf node than an internal node, and leaf nodes might remain in the file system cache where they can be accessed quickly. If, however, the internal nodes do not all fit in cache, the backend eventually evicts even critical internal nodes.
Monitor the backend database environment to react if a backend evicts internal nodes, or performs critical evictions. The following example shows no internal node (IN) evictions, and no critical evictions:
$ ldapsearch \
 --hostname localhost \
 --port 1636 \
 --useSsl \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --bindDN uid=admin \
 --bindPassword password \
 --baseDN cn=backends,cn=monitor \
 "(|(ds-mon-db-cache-evict-internal-nodes-count=*)(ds-mon-je-environment-nbytes-evicted-critical=*))" \
 ds-mon-db-cache-evict-internal-nodes-count \
 ds-mon-je-environment-nbytes-evicted-critical

dn: ds-cfg-backend-id=dsEvaluation,cn=backends,cn=monitor
ds-mon-db-cache-evict-internal-nodes-count: 0

dn: cn=raw JE database statistics,ds-cfg-backend-id=dsEvaluation,cn=backends,cn=monitor
ds-mon-je-environment-nbytes-evicted-critical: 0
If ds-mon-db-cache-evict-internal-nodes-count is greater than 0, then the system has too little memory for all internal nodes to remain in DB cache.
If ds-mon-je-environment-nbytes-evicted-critical is greater than 0, then the DB worker threads are evicting data because the normal process of clearing cache using background threads is no longer sufficient.
Increase the DB cache size, and add more RAM to your system if necessary, until there are no internal node evictions, and no critical evictions.
If adding RAM is not an option, increase the maximum heap size (-Xmx) to optimize RAM allocation.
When the DB cache is not large enough to hold all internal nodes, the performance impact can be severe. This section explains how to estimate the minimum DB cache size to hold all internal nodes.
The examples below reflect a directory server with a 10 million entry backend.
The backend holds Example.com entries generated as described in Install DS for Evaluation
with the additional setup option
Base your own calculations on realistic sample data, with the same indexes that you use in production, and with data affected by realistic client application and replication loads. To generate your own sample data, start by reading Generate Test Data. To simulate load, use the tools described in Performance Tests. Even better, learn about real loads from analysis of production access logs, and build custom test clients that reflect the access patterns of your applications.
After you import LDIF, the backend contains the minimum number of internal nodes required for the data. Over time as external applications update the directory server, the number of internal nodes grows.
A JE backend only appends to the database log for update operations, so many internal nodes in the database logs of a live system represent garbage that the backend eventually cleans up. Only the live internal nodes must be cached in memory. Over time, the increase in the number of internal nodes should track backend growth.
After loading the server for some time, stop the server.
Use the backendstat command and JE DbCacheSize tool together to estimate the required DB cache size.
The following example uses the backendstat command to discover information about keys in the backend. Using a script or a spreadsheet on the output, calculate the total number of keys (sum of Total Keys, here: 73255315) and average key size (sum of Keys Size / sum of Total Keys, here: 13). Use the results as input to the JE DbCacheSize tool:
# Stop the server before using backendstat:
$ stop-ds

$ backendstat list-raw-dbs --backendId dsEvaluation
Raw DB Name             ... Total Keys   Keys Size  Values Size   Total Size
----------------------- ... ------------------------------------------------
/compressed_schema/comp ...         50          50          772          822
/compressed_schema/comp ...         17          17          848          865
/dc=com,dc=example/aci. ...          1           1            3            4
/dc=com,dc=example/cn.c ...   10000165   139242471     47887210    187129681
/dc=com,dc=example/cn.c ...     858658     5106085    204936391    210042476
/dc=com,dc=example/dn2i ...   10000181   268892913     80001448    348894361
/dc=com,dc=example/ds-c ...          0           0            0            0
/dc=com,dc=example/ds-c ...          1          18            3           21
/dc=com,dc=example/ds-s ...          0           0            0            0
/dc=com,dc=example/ds-s ...          0           0            0            0
/dc=com,dc=example/entr ...    9988518    39954072     47871653     87825725
/dc=com,dc=example/give ...       8614       51691     20017387     20069078
/dc=com,dc=example/give ...      19652       97670     48312528     48410198
/dc=com,dc=example/id2c ...          8          26           14           40
/dc=com,dc=example/id2e ...   10000181    80001448   4989592300   5069593748
/dc=com,dc=example/json ...          4          74           10           84
/dc=com,dc=example/json ...          2          34            4           38
/dc=com,dc=example/mail ...   10000152   238891751     47887168    286778919
/dc=com,dc=example/mail ...    1222798     7336758    112365106    119701864
/dc=com,dc=example/memb ...          1          40            2           42
/dc=com,dc=example/obje ...         23         379          393          772
/dc=com,dc=example/refe ...          0           0            0            0
/dc=com,dc=example/sn.c ...      13457       92943     20027045     20119988
/dc=com,dc=example/sn.c ...      41585      219522     73713958     73933480
/dc=com,dc=example/stat ...         23        1153           22         1175
/dc=com,dc=example/tele ...    9989952   109889472     47873522    157762994
/dc=com,dc=example/tele ...    1111110     6543210    221282026    227825236
/dc=com,dc=example/uid. ...   10000152   118889928     47887168    166777096
/dc=com,dc=example/uniq ...         10         406           21          427

Total: 29

# Calculate sum of Total Keys, sum of Key Size, and average key size:
$ java -cp /path/to/opendj/lib/opendj.jar com.sleepycat.je.util.DbCacheSize \
 -records 73255315 -key 13

=== Environment Cache Overhead ===

3,158,773 minimum bytes

To account for JE daemon operation, record locks, HA network connections, etc,
a larger amount is needed in practice.

=== Database Cache Size ===

 Number of Bytes  Description
 ---------------  -----------
   2,709,096,544  Internal nodes only

To get leaf node sizing specify -data

For further information see the DbCacheSize javadoc.
The resulting recommendation for DB cache size, 2,709,096,544 bytes in this case, is a minimum estimate.
Round up when configuring the backend DB cache settings.
If the system in this example has 8 GB available memory, use the default setting of db-cache-percent: 50.
(50% * 8 GB = 4 GB, which is larger than the minimum estimate.)
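The key totals used as input to DbCacheSize can be computed with a short script instead of a spreadsheet. The following sketch assumes the backendstat list-raw-dbs output is supplied on standard input, that data rows start with a slash, and that the last four columns are Total Keys, Keys Size, Values Size, and Total Size, as in the listing in this section. The function name key_totals is hypothetical.

```shell
# key_totals
# Hypothetical helper: reads backendstat list-raw-dbs output on stdin and
# prints the total number of keys and the average key size, truncated to
# a whole number of bytes.
key_totals() {
    awk '
        # Data rows start with "/" and end with four numeric columns.
        /^\// && NF >= 5 {
            keys  += $(NF - 3)   # Total Keys column
            bytes += $(NF - 2)   # Keys Size column
        }
        END {
            if (keys == 0) { print "no keys found"; exit 1 }
            printf "records=%d key=%d\n", keys, int(bytes / keys)
        }'
}

# Example with two sample rows:
printf '%s\n' \
    '/dc=com,dc=example/aci. ... 1 1 3 4' \
    '/dc=com,dc=example/id2c ... 8 26 14 40' | key_totals
```

The two printed values correspond to the -records and -key arguments of the DbCacheSize tool. Counting columns from the end of each row keeps the script independent of any columns elided from the listing.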
With default settings, if the database has more than 200 files on disk, then the JE backend must start closing one log file in order to open another. This has serious impact on performance when the file cache starts to thrash.
Having the JE backend open and close log files from time to time is okay. Changing the settings is only necessary if the JE backend has to open and close the files very frequently.
A JE backend stores data on disk in append-only log files. The maximum size of each log file is configurable. A JE backend keeps a configurable maximum number of log files open, caching file handles to the log files. The relevant JE backend settings are the following:
Maximum size of a database log file.
Default: 1 GB
File handle cache size for database log files.
With these defaults, if the size of the database reaches 200 GB on disk (1 GB x 200 files),
the JE backend must close one log file to open another.
To avoid this situation, increase the file handle cache size until the JE backend can cache file handles to all its log files.
When changing the settings, make sure the maximum number of open files is sufficient.
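To estimate how many file handles the backend needs, divide the database size on disk by the maximum log file size and round up. The following sketch uses whole GB values; both function names are hypothetical.

```shell
# log_files_needed DB_SIZE_GB LOG_FILE_MAX_GB
# Hypothetical helper: prints the number of log files for a database size.
log_files_needed() {
    # Ceiling division: a partly filled log file still needs a file handle.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# needs_larger_file_cache DB_SIZE_GB LOG_FILE_MAX_GB CACHED_HANDLES
# Succeeds when the backend has more log files than cached file handles.
needs_larger_file_cache() {
    [ "$(log_files_needed "$1" "$2")" -gt "$3" ]
}

# Example: a 250 GB database with 1 GB log files and 200 cached handles.
if needs_larger_file_cache 250 1 200; then
    echo "increase the file handle cache to at least $(log_files_needed 250 1)"
fi
```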
DS servers implement an entry cache designed for a few large entries that are regularly updated or accessed, such as large static groups. An entry cache is used to keep such groups in memory in a format that avoids the need to constantly read and deserialize the large entries.
When configuring an entry cache, take care to include only the entries that need to be cached.
The memory devoted to the entry cache is not available for other purposes.
Use the configuration properties include-filter and exclude-filter for this.
The following example adds a Soft Reference entry cache to hold entries that match the filter (ou=Large Static Groups). A Soft Reference entry cache releases entries when the JVM runs low on memory. It does not have a maximum size setting.
The number of entries cached is limited only by the
$ dsconfig \
 create-entry-cache \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --cache-name "Large Group Entry Cache" \
 --type soft-reference \
 --set cache-level:1 \
 --set include-filter:"(ou=Large Static Groups)" \
 --set enabled:true \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt
The entry cache configuration takes effect when the entry cache is enabled.
Debug logs trace the internal workings of DS servers, and should be used sparingly. Be particularly careful when activating debug logging in high-performance deployments.
In general, leave other logs active for production environments to help troubleshoot any issues that arise.
For servers handling 100,000 operations per second or more, the access log can be a performance bottleneck. Each client request results in at least one access log message. Test whether disabling the access log improves performance in such cases.
The following command disables the JSON-based LDAP access logger:
$ dsconfig \
 set-log-publisher-prop \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --publisher-name "Json File-Based Access Logger" \
 --set enabled:false \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt
The following command disables the HTTP access logger:
$ dsconfig \
 set-log-publisher-prop \
 --hostname localhost \
 --port 4444 \
 --bindDN uid=admin \
 --bindPassword password \
 --publisher-name "File-Based HTTP Access Logger" \
 --set enabled:false \
 --usePkcs12TrustStore /path/to/opendj/config/keystore \
 --trustStorePassword:file /path/to/opendj/config/keystore.pin \
 --no-prompt
By default, a replication server indexes change numbers for replicated user data. This allows legacy applications to get update notifications by change number, as described in Align Draft Change Numbers. Indexing change numbers requires additional CPU, disk accesses and storage, so it should not be used unless change number-based browsing is required.
Disable change number indexing if it is not needed. For details, see Disable Change Number Indexing.