JEMalloc and Cassandra

2017, Sep 13    

Memory management in Cassandra

Cassandra relies on the Java Virtual Machine (JVM) for its memory management. JVM memory is mainly divided into two areas:

  1. Heap - the data area that holds the runtime structures.
  2. Internal data structures - Java methods, thread stacks and native methods.

Cassandra uses memory in the four ways listed below. This includes OS memory as well.

  1. Java heap
  2. Off-heap memory (OS memory that is not managed by the JVM garbage collector, GC)
  3. OS page cache
  4. OS TCP/IP stack I/O cache

Because Cassandra relies on the JVM for memory management, tuning the JVM is necessary to get optimal performance from Cassandra. This tuning includes changing the following settings in cassandra-env.sh:

  • MAX_HEAP_SIZE
  • HEAP_NEWSIZE
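
As a rough illustration, the two settings might look like this in cassandra-env.sh. The values are placeholder assumptions, not recommendations; suitable sizes depend on the node's RAM and workload.

```shell
# cassandra-env.sh (excerpt)
# Illustrative values only. Set both variables together or leave both unset.
MAX_HEAP_SIZE="8G"     # total JVM heap size (assumed value)
HEAP_NEWSIZE="800M"    # young generation size (assumed value)
```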

What is JEMalloc?

JEMalloc is an enhanced memory allocator available on Linux-based platforms. With JEMalloc, memory allocation for multithreaded applications scales well as the number of processors increases. The previously used allocator, malloc(3), suffered from scalability bottlenecks in some multithreaded applications, which led to the emergence of JEMalloc.

Support for JEMalloc was introduced in Cassandra after 2.0.

Ensure that Java Native Access (JNA) and JEMalloc are installed on the Linux AMI. If you are creating an Amazon AMI for Cassandra, install both of them:

yum install -y jna
yum install -y jemalloc
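
After installation, it can help to confirm that the shared library is visible to the dynamic linker. The helper below is a minimal sketch: find_jemalloc is a hypothetical name, and it assumes the usual `ldconfig -p` output format in which the library path is the last field of each line.

```shell
# Print the path of the first libjemalloc shared object found in
# `ldconfig -p`-style output read from stdin; return non-zero if none is found.
find_jemalloc() {
    awk '/libjemalloc\.so/ { print $NF; found = 1; exit } END { exit !found }'
}

# Usage on a real host (requires ldconfig on the PATH):
#   ldconfig -p | find_jemalloc
```

On yum-based systems, `rpm -q jna jemalloc` is another quick way to confirm that both packages are present.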

The cassandra.yaml configuration requires the following change in order to use JEMalloc:

memtable_allocation_type: offheap_objects

Note: this setting defaults to “heap_buffers”.
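
To see which value a node is currently configured with, the setting can be read straight from the file. This is a small sketch: the function name is hypothetical, and the path in the usage comment is an assumption that varies by install method.

```shell
# Print the value of memtable_allocation_type from the given cassandra.yaml.
# Prints nothing if the key is absent or commented out.
memtable_allocation_type() {
    sed -n 's/^memtable_allocation_type:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1"
}

# Usage (assumed path):
#   memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
```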

What is the benefit of using JEMalloc in Cassandra?

Enabling JEMalloc in Cassandra reduces the amount of Java heap space that Cassandra uses. Data written to Cassandra is first stored in memtables in heap memory. Memtables are then flushed to SSTables on disk when they fill up. The JVM’s garbage collection process is used to clear the heap memory, and garbage collection pauses sometimes cause issues in Cassandra.

The benefit of JEMalloc is that it reduces garbage collection pressure, because Cassandra uses off-heap memory allocation with JEMalloc.