Memory Impact of Java Collections

Sometimes it is important to know how much memory you need in order to store the data in a Java collection.

Here is a little overview of the memory overhead of the most important Java collections.

In some cases I also added my own implementations to see how they shape up.

All collections where filled with 10000 Integer elements and then the memory was measured by producing a heap dump and then analyzing the heap dump with MemoryAnalyzer.
Executed in:

Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)

on a

Intel(R) Core(TM) i7 CPU       M 620  2.67GHz (4 CPUs), ~2.7GHz

Set

The data stored in the sets is always the same: 10000 instances of Integer:

Class Name Objects Bytes
java.util.Integer 10,000 240,000
Total 10,000 240,000

Memory Footprint

java.util.HashSet

Class Name Objects Bytes
java.util.HashMap$Entry 10,000 480,000
java.util.HashMap$Entry[] 1 131,096
java.util.HashMap 1 64
java.util.HashSet 1 24
java.util.HashMap$KeySet 1 24
Total 10,004 611,208

java.util.TreeSet

Class Name Objects Bytes
java.util.TreeMap$Entry 10,000 640,000
java.util.TreeMap 1 80
java.util.TreeSet 1 24
Total 10,002 640,104

Collections.synchronizedSet(java.util.HashSet)

A synchronized HashSet by wrapping it with the Collections.synchronizedSet() method.

Class Name Objects Bytes
java.util.HashMap$Entry 10,000 480,000
java.util.HashMap$Entry[] 1 131,096
java.util.HashMap 1 64
java.util.Collections$SynchronizedSet 1 32
java.util.HashSet 1 24
Total 10,004 611,208

Collections.newSetFromMap(ConcurrentHashMap)

Java does not provide a ConcurrentHashSet out of the box, but you can create an equivalent by wrapping a ConcurrentHashMap with Collections.newSetFromMap().

Class Name Objects Bytes
java.util.concurrent.ConcurrentHashMap$HashEntry 10,000 480,000
java.util.concurrent.ConcurrentHashMap$HashEntry[] 16 131,456
java.util.concurrent.ConcurrentHashMap$Segment 16 768
java.util.concurrent.ConcurrentHashMap$NonfairSync 16 768
java.util.concurrent.ConcurrentHashMap$Segment[] 1 152
java.util.concurrent.ConcurrentHashMap 1 72
java.util.Collections$SetFromMap 1 32
java.util.concurrent.ConcurrentHashMap$KeySet 1 24
Total 10,052 613,272

ch.obermuhlner.collection.ArraySet

This is an experimental implementation of an array-based mutable Set that was designed to have minimal memory footprint.

Access with contains() is O(n).

Class Name Objects Bytes
java.lang.Object[] 1 80,024
ch.obermuhlner.collection.ArraySet 1 24
Total 2 80,048

ch.obermuhlner.collection.ImmutableArraySet

Similar to the ArraySet above but immutable. Designed for minimal memory footprint.

Access with contains() is O(n).

Class Name Objects Bytes
java.lang.Object[] 1 80,024
ch.obermuhlner.collection.ImmutableArraySet 1 24
Total 2 80,048

ch.obermuhlner.collection.ImmutableSortedArraySet

Another experimental implementation of an array-based immutable Set.
The array is sorted by hash code and contains() uses a binary search.

Access with contains() is O(log(n)).

The ImmutableSortedArraySet has the option to store the hash codes of all elements in a separate int[] which trading improved performance with additional memory footprint.

Class Name Objects Bytes
java.lang.Object[] 1 80,024
int[] 1 40,024
ch.obermuhlner.collection.ImmutableSortedArraySet 1 32
Total 3 120,080

Performance

The performance of the different Sets was measured by running contains() with an existing random key against a set of a specific size.

Measuring with up to 20 elements shows that ArraySet and ImmutableArraySet really are linear. With less than 10 elements they are actually faster than the O(log(n)) ImmutableSortedArraySet.

In the next chart we see the performance with up to 1000 elements. The linear Sets are no longer shown because they out-scale everything else.

This entry was posted in Development, Java. Bookmark the permalink.

2 Responses to Memory Impact of Java Collections

  1. If overall memory utilization is increasing continuously despite garbage collection, there is a memory leak, which will inevitably lead to an out-of-memory error. In this case, a memory heap analysis is necessary.

    • admin says:

      Sorry for answering so late, there was so much spam that I did not see your comment.

      You are of course correct. Memory utilization will increase if you keep references to your object, for example by having added it to a collection.

      What I am measuring here is how much memory is needed to maintain the data structure of the collection and how much heap memory can be freed by storing the object data in non-heap memory.

      This has nothing to do with memory leaks.

Leave a Reply

Your email address will not be published. Required fields are marked *