Bad programmers worry about the code. Good programmers worry about data structures and their relationships. — Linus Torvalds
The Java collections framework is a set of classes and interfaces for implementing a commonly used collection data structure. Examples include lists, queues, etc. Joshua Bloch designed and developed the collections framework, which debuted in JDK 1.2.
Because these collections are well tested, we do not have to implement every algorithm and data structure from scratch.
Almost all collections except the map can be derived from the java.util.Collection interface. The toArray() method converts any Collection into a one-dimensional array. We can use the for-each loop because the Collection interface extends the java.lang.Iterable interface.
The data structure allows us to store and manipulate data efficiently. With proper data structure selection, we can make algorithms faster. Without the proper data structure (PriorityQueue), we cannot solve Dijkstra’s shortest path problem in O((E+V)LogV) rather it will take O(V²) when List is used.
Therefore, data structures have a significant impact on the final performance of the software. Let’s examine them in more detail.
- The List interface is an ordered collection that allows us to store and access items in a sequential manner.
- It extends the Collection interface which extends the Iterable interface.
- Lists can include duplicate elements.
- It gives the user full visibility and control over the ordering of its elements.
- Resizable-array implementation of the List interface.
- This class is roughly equivalent to Vector, except that it is unsynchronized.
- Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. As elements are added to an ArrayList, its capacity grows automatically.
- Random access with index takes O(1) time complexity.
- Insertion/deletion at the given index takes O(N) time complexity.
- Doubly-linked list implementation of the List and Deque interfaces.
- This implementation is not synchronized
- Since the items are not stored next to each other in the memory, so there is no random access.
- Manipulation (insertion/deletion) of the first & last item will take O(1) time complexity.
- Manipulating an arbitrary item will O(N) time complexity.
ArrayList vs LinkedList Performance
Let’s see the performance of ArrayList and LinkedList when insertion happens at the start of the list.
// Output on my system
Time taken by ArrayList : 519
Time taken by LinkedList : 4
LinkedList performance is way better than ArrayList as no data shift happened.
- Vector is similar to ArrayList but it’s synchronized.
- If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector.
- The Stack class represents a last-in-first-out (LIFO) stack of objects.
- It is an abstract data type and it can be implemented either with an ArrayList or with a LinkedList.
- Basic operations are push(), pop() and peek().
- A more complete and consistent set of LIFO stack operations is provided by the Deque interface and its implementations, which should be used in preference to this class.
- Graph algorithms rely heavily on stacks such as depth-first search.
- Finding Eulerial cycles in a G(V, E) graph.
- Finding strongly connected components in a given G(V, E) graph.
- Processing function call.
Peek Element: Tiger
Pop Element: Tiger
Stack extends the Vector class which means that stacks are inherently synchronized. However, if synchronization is not needed, in that case, prefer using ArrayDeque.
- A collection designed for holding elements prior to processing.
- It is an abstract data type and it can be implemented either with an ArrayList or a LinkedList.
- Queues typically, but do not necessarily, order elements in a FIFO (first-in-first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator, or the elements’ natural ordering.
- Graph algorithm breadth-first search use queue as an underlying abstract data type.
- Threads are stored in queues.
- CPU scheduling uses queues.
- Besides basic Collection operations, queues provide additional insertion, extraction, and inspection operations. Each of these methods exists in two forms: one throws an exception if the operation fails, and the other returns a special value.
Check Element: Elephant
Remove Element: Elephant
- A priority queue is based on a priority heap.
- The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used.
- A priority queue does not permit null elements.
- A linear collection that supports element insertion and removal at both ends. The name deque is short for “double-ended queue”.
- This interface defines methods to access the elements at both ends of the deque.
- This interface extends the Queue interface. When a deque is used as a queue, FIFO (First-In-First-Out) behavior results. Elements are added at the end of the deque and removed from the beginning.
- Deques can also be used as LIFO (Last-In-First-Out) stacks. This interface should be used in preference to the legacy Stack class. When a deque is used as a stack, elements are pushed and popped from the beginning of the deque.
- Resizable-array implementation of the Deque interface.
- They are not thread-safe.
- Null elements are prohibited.
- This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.
ArrayDeque vs Stack Performance
Because Stack is synchronized as it extends the Vector class this is why it is going to be slower than the ArrayDequeue. So it’s advisable to use ArrayDeque if we want a LIFO structure.
// Output on my system
Time taken with ArrayDeque: 9ms
Time taken with Stack: 14ms
- Set are abstract data types that can store certain values without any particular order.
- Set contains no repeated values or duplicates.
- This class implements the Set interface, backed by a hash table (actually a HashMap instance).
- It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
- This class permits the null element.
- Hash table and linked list implementation of the Set interface, with predictable iteration order.
- This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
- A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet.
- A Set that further provides a total ordering of its elements.
- The elements are ordered using their natural ordering, or by a Comparator typically provided at a sorted set creation time.
- All elements inserted into a sorted set must implement the Comparable interface
- A NavigableSet implementation based on a TreeMap.
- The elements are ordered using their natural ordering, or by a Comparator provided at a set creation time, depending on which constructor is used.
- This implementation provides guaranteed log(n) time cost for the basic operations (add, remove, and contains).
HashSet vs LinkedHashSet vs TreeSet
- HashSet and LinkedHashSet rely heavily on a one-dimensional array and a hash function.
- HashSet and LinkedHashSet have an average time complexity of O(1) but on collision, it can be O(N).
- HashSet and LinkedHashSet can store null keys as well.
- LinkedHashSet uses a doubly-linked list data structure in addition to maintaining insertion order.
- TreeSet uses a balanced binary search tree and its time complexity is O(logN).
- TreeSet cannot store null keys.
- TreeSet uses less memory compared to HashSet/LinkedHashSet.
- An object that maps keys to values.
- A map cannot contain duplicate keys; each key can map to at most one value.
- This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.
- This class implements a hash table, which maps keys to values.
- Any non-null object can be used as a key or as a value.
- To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
- An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
- Hashtable is similar to HashMap but it’s synchronized so it’s thread-safe.
- Hash table based implementation of the Map interface.
- The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.
- An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
- The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
- Hash table and linked list implementation of the Map interface, with predictable iteration order.
- This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
- This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
- A linked hash map has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashMap.
Before we talk about TreeMap let’s first understand the binary search trees.
- ArrayList can manipulate the last item in O(1) constant time complexity.
- LinkedList can manipulate the first item of the data structure fast.
- Searching for an arbitrary item takes O(N) linear running time for both ArrayList and LinkedList.
What if the array data structure is sorted?
We can search for an arbitrary item in O(logN) time complexity, which is the concept behind the binary search.
Binary Search Trees
- Every node in the tree can have at most 2 children (left and right)
- The left child is smaller than the parent node.
- The right child is greater than the parent node.
- We can access the root node exclusively and all other nodes can be accessed via the root node.
- It is a balanced data structure that has a guaranteed O(logN) time complexity.
- The time complexity of binary search trees depends on the height of the binary search tree.
- AVL trees are faster than red-black trees because they are more rigidly balanced but need more work.
- It’s faster to construct a red-black tree and many operating systems rely on it.
- A Red-Black tree-based NavigableMap implementation.
- The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
- This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations.
HashMap vs TreeMap performance
// Output on my system
Time taken with TreeMap: 53ms
Time taken with HashMap: 22ms
HashMap vs LinkedHashMap vs TreeMap
- HashMap and LinkedHashMap rely on a one-dimensional array and a hash function.
- So they have average-case time complexity of O(1) that can be O(N) because of collisions.
- LinkedHashMap uses a doubly-linked list data structure to maintain insertion order.
- TreeMap uses a balanced binary search tree and there are no collisions at all with the time complexity of O(logN).
- HashMap and LinkedHashMap can store null keys whereas TreeMap cannot store null keys.