Sometimes it is important to know how much memory you need in order to store the data in a Java collection.
Here is a little overview of the memory overhead of the most important Java collections.
In some cases I also added my own implementations to see how they shape up.
All collections where filled with 10000 Integer elements and then the memory was measured by producing a heap dump and then analyzing the heap dump with MemoryAnalyzer.
Executed in:
Java(TM) SE Runtime Environment (build 1.6.0_23-b05) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
on a
Intel(R) Core(TM) i7 CPU M 620 2.67GHz (4 CPUs), ~2.7GHz
Set
The data stored in the sets is always the same: 10000 instances of Integer
:
Class Name | Objects | Bytes |
---|---|---|
java.util.Integer | 10,000 | 240,000 |
Total | 10,000 | 240,000 |
Memory Footprint
java.util.HashSet
Class Name | Objects | Bytes |
---|---|---|
java.util.HashMap$Entry | 10,000 | 480,000 |
java.util.HashMap$Entry[] | 1 | 131,096 |
java.util.HashMap | 1 | 64 |
java.util.HashSet | 1 | 24 |
java.util.HashMap$KeySet | 1 | 24 |
Total | 10,004 | 611,208 |
java.util.TreeSet
Class Name | Objects | Bytes |
---|---|---|
java.util.TreeMap$Entry | 10,000 | 640,000 |
java.util.TreeMap | 1 | 80 |
java.util.TreeSet | 1 | 24 |
Total | 10,002 | 640,104 |
Collections.synchronizedSet(java.util.HashSet)
A synchronized HashSet
by wrapping it with the Collections.synchronizedSet()
method.
Class Name | Objects | Bytes |
---|---|---|
java.util.HashMap$Entry | 10,000 | 480,000 |
java.util.HashMap$Entry[] | 1 | 131,096 |
java.util.HashMap | 1 | 64 |
java.util.Collections$SynchronizedSet | 1 | 32 |
java.util.HashSet | 1 | 24 | Total | 10,004 | 611,208 |
Collections.newSetFromMap(ConcurrentHashMap)
Java does not provide a ConcurrentHashSet out of the box, but you can create an equivalent by wrapping a ConcurrentHashMap
with Collections.newSetFromMap()
.
Class Name | Objects | Bytes |
---|---|---|
java.util.concurrent.ConcurrentHashMap$HashEntry | 10,000 | 480,000 |
java.util.concurrent.ConcurrentHashMap$HashEntry[] | 16 | 131,456 |
java.util.concurrent.ConcurrentHashMap$Segment | 16 | 768 |
java.util.concurrent.ConcurrentHashMap$NonfairSync | 16 | 768 |
java.util.concurrent.ConcurrentHashMap$Segment[] | 1 | 152 |
java.util.concurrent.ConcurrentHashMap | 1 | 72 |
java.util.Collections$SetFromMap | 1 | 32 |
java.util.concurrent.ConcurrentHashMap$KeySet | 1 | 24 |
Total | 10,052 | 613,272 |
ch.obermuhlner.collection.ArraySet
This is an experimental implementation of an array-based mutable Set
that was designed to have minimal memory footprint.
Access with contains()
is O(n).
Class Name | Objects | Bytes |
---|---|---|
java.lang.Object[] | 1 | 80,024 |
ch.obermuhlner.collection.ArraySet | 1 | 24 |
Total | 2 | 80,048 |
ch.obermuhlner.collection.ImmutableArraySet
Similar to the ArraySet
above but immutable. Designed for minimal memory footprint.
Access with contains()
is O(n).
Class Name | Objects | Bytes |
---|---|---|
java.lang.Object[] | 1 | 80,024 |
ch.obermuhlner.collection.ImmutableArraySet | 1 | 24 |
Total | 2 | 80,048 |
ch.obermuhlner.collection.ImmutableSortedArraySet
Another experimental implementation of an array-based immutable Set
.
The array is sorted by hash code and contains()
uses a binary search.
Access with contains()
is O(log(n)).
The ImmutableSortedArraySet
has the option to store the hash codes of all elements in a separate int[]
which trading improved performance with additional memory footprint.
Class Name | Objects | Bytes |
---|---|---|
java.lang.Object[] | 1 | 80,024 |
int[] | 1 | 40,024 |
ch.obermuhlner.collection.ImmutableSortedArraySet | 1 | 32 |
Total | 3 | 120,080 |
Performance
The performance of the different Set
s was measured by running contains()
with an existing random key against a set of a specific size.
Measuring with up to 20 elements shows that ArraySet
and ImmutableArraySet
really are linear. With less than 10 elements they are actually faster than the O(log(n)) ImmutableSortedArraySet
.
In the next chart we see the performance with up to 1000 elements. The linear Set
s are no longer shown because they out-scale everything else.
If overall memory utilization is increasing continuously despite garbage collection, there is a memory leak, which will inevitably lead to an out-of-memory error. In this case, a memory heap analysis is necessary.
Sorry for answering so late, there was so much spam that I did not see your comment.
You are of course correct. Memory utilization will increase if you keep references to your object, for example by having added it to a collection.
What I am measuring here is how much memory is needed to maintain the data structure of the collection and how much heap memory can be freed by storing the object data in non-heap memory.
This has nothing to do with memory leaks.