Redis is designed to be fast, and in most cases it is. However, there are times when Redis may be slow due to network issues, disk latency, or other factors. When this happens, it is important to be able to detect the slowdown and investigate the cause of the latency.
Redis & latency
Latency is the maximum delay between the time a client issues a command and the time the reply to the command is received by the client. Because Redis typically serves demanding, high-throughput workloads, there are strict requirements on both its average and worst-case latency.
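If you simply want to measure the latency a client experiences, the Redis CLI has a built-in latency mode that continuously samples the round trip to the server (the host and port below are placeholders for your own instance; press Ctrl+C to stop):
$ redis-cli -h 127.0.0.1 -p 6379 --latency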
Let’s take a look at the different forms of latency that can affect a Redis deployment, along with suggestions on how to fix any latency issues that may arise:
The 7 types of Redis latency
1. Intrinsic latency
This is latency that is inherently part of the environment where Redis is running. It is induced by the operating system kernel or, if you are using virtualization, by the hypervisor you are using.
To measure intrinsic latency, the Redis CLI provides a dedicated mode (the argument 100 below is the number of seconds the test should run). Note that this test is CPU intensive and will likely saturate a single CPU core.
$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 26 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 73 microseconds.
2. Network latency
As the name suggests this is the latency introduced by the network and hardware for relaying the messages between the client and server. The typical latency of a 1 Gbit/s network is about 200 us, while the latency with a Unix domain socket can be as low as 30 us.
How to fix it? If you think network latency is a problem in your environment and want to optimize it, here are some guidelines (see the example after this list):
- Use a client that supports pipelining, so that several commands can be sent together
- If possible prefer a physical machine to a VM to host the server
- If client and server are on the same host, then use unix domain sockets
- Keep your connections as long-lived as possible.
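As a minimal sketch of two of these guidelines, the commands below assume Redis is also listening on a Unix domain socket at /var/run/redis/redis.sock (a hypothetical path, adjust it to your configuration), and use redis-cli pipe mode to send several commands in a single round trip. Raw Redis protocol is the more robust input format for large pipelines, but plain inline commands are fine for a quick test:
$ redis-cli -s /var/run/redis/redis.sock PING
PONG
$ printf 'PING\r\nPING\r\nPING\r\n' | redis-cli --pipe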
3. Command latency
Redis serves all requests on a single thread, in a sequential manner. This means that when one request is slow to serve, all other clients have to wait. Commands operating on many elements, such as SORT, LREM, or SUNION, or taking the intersection of two big sets, can take a considerable amount of time.
The CPU consumption of the main Redis process is a good indicator of whether you have a slow command problem: if CPU usage is high while traffic is low, slow commands are the likely culprit.
The Redis Slow Log feature allows you to get into the details of slow commands.
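For example, the commands below (run against a local instance) log every command slower than 10 milliseconds, i.e. 10,000 microseconds, and then fetch the ten most recent slow log entries; the threshold is an arbitrary value you would tune for your workload:
$ redis-cli CONFIG SET slowlog-log-slower-than 10000
OK
$ redis-cli SLOWLOG GET 10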
How to fix it? Avoid using slow commands against values composed of many elements, or run a Redis replica where you direct all your slow queries so they do not affect other queries.
4. Fork latency
If persistence is enabled, Redis forks background processes in order to generate RDB files or rewrite the AOF file. The fork operation itself runs on the main thread and can cause latency as well.
This latency can be significant when you have a large Redis instance running on a VM, where the allocation and initialization of the large memory chunks required for BGSAVE are particularly expensive. For example, measured fork times per GB of memory:
- Linux VM on EC2 (old instance type) -> 239.3 milliseconds per GB
- Linux VM on EC2 (new instance type) -> 10 milliseconds per GB
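To see how expensive forks are on your own instance, check the latest_fork_usec field reported by INFO stats, which records the duration of the last fork in microseconds (the value below is only illustrative):
$ redis-cli INFO stats | grep latest_fork_usec
latest_fork_usec:484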
5. Transparent huge pages latency
If the Linux kernel has transparent huge pages enabled, Redis incurs a big latency and memory usage penalty every time the fork call is used to persist on disk.
How to fix it? Disable transparent huge pages using the following command:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
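To check the current setting, read the same sysfs file; the value in brackets is the active one:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]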
6. Swapping latency
If the kernel moves a Redis memory page from RAM to swap, and Redis later needs data stored in that page, the kernel stops the Redis process in order to move the page back into main memory.
This is obviously much slower than accessing a page that is already in memory, and can result in latency spikes experienced by Redis clients.
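To verify whether the Redis process is actually being swapped, look up its process id and check the VmSwap field in /proc (the pid below is a placeholder; on an unaffected instance the value should be 0 kB):
$ redis-cli INFO server | grep process_id
process_id:1234
$ grep VmSwap /proc/1234/status
VmSwap:        0 kB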
How to fix it? Lower the memory pressure in your system by adding more RAM or by avoiding running other memory-hungry processes on the same system.
7. Disk I/O latency
The Redis AOF (Append Only File) persistence mechanism uses write(2) and fdatasync(2) to write data to the append only file and flush the kernel buffer on disk. Both these system calls can induce latency, especially fdatasync(2) which can take anywhere from a few milliseconds to a few seconds to complete if there are other processes doing heavy I/O on the same system.
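How often fdatasync(2) is called is controlled by the appendfsync configuration directive (always, everysec, or no); as a quick check you can inspect the current policy at runtime (everysec shown below is the default):
$ redis-cli CONFIG GET appendfsync
1) "appendfsync"
2) "everysec"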
How to fix it? Avoid running other processes that do I/O on the same system, and use an SSD.
Let us hear from you
If you haven’t already, sign up now for a free Netdata account! Feel free to check out the Redis demo room to explore and interact with Netdata.
For details on all the Redis metrics that Netdata monitors, check out the comprehensive Redis monitoring blog post.
We’d love to hear from you – if you have any questions, complaints or feedback please reach out to us on Discord or Github.
Happy Troubleshooting!