1. List of collections
As you may remember, Java has collections — a handy tool for storing objects of the same type.
Let's try to recall the main collection-related interfaces:
List, Set, Map and Queue.
As usual, tools aren't necessarily good or bad — what matters is whether you are using them for their intended purpose. And to do that, we must thoroughly understand their specific features in order to know which collection to use and when.
1. List
Let's start with the most used collection.
List as close as possible to a plain old array.
This collection lets us conveniently store a list of objects of the same type without worrying about the size of the collection itself, as we would have to if we were using an array. Elements of the collection are accessed by index. If we know exactly where an object is and need to access it frequently without often needing to add or remove elements, a List is ideal.
2. Set
Set has a completely different structure.
Set is most suitable when we need to store unique objects. For example, a set of authors in a library where each author is unique. But we can't just go and grab any specific author from it. Set lets us quickly check whether a particular author is present in our library, i.e. we can check whether a unique object is present in a Set. We can also iterate over the entire collection, accessing each element, but doing that is not optimal.
In other words, for our library, a Set can represent the collection of all unique authors to quickly check if any particular author is present.
3. Map
Map is more like a filing cabinet, where each file is signed and can store individual objects or entire structures. Map should be used in cases where we need to maintaining a mapping from one value to another.
For Map, these relationships are called key-value pairs.
We could use this structure in our library by using author objects as the keys and lists (List objects) of books as the values. Thus, after checking a Set to see if an author object exists in the library, we can use the same author object to get a List of his or her books from a Map.
4. Queue
Queue is a collection that — surprise! — implements the behavior of a queue. And the queue can be either LIFO (Last In, First Out) or FIFO (First In, First Out). What's more, the queue can be bidirectional, or "doubled-ended".
This structure is helpful when the objects added to the class need to be used in the order they are received. For example, take our library.
We can add newly arrived visitors to a Queue and serve them in turn, issuing the books they come for.
As we can see, each of these structures is good if used for its intended purpose. And we found good uses for all four types of collections in a single library example.
2. Complexity
As already noted, the collections we considered above are interfaces, which means they must have implementations in order for us to use them.
Just as hammering nails with a microscope is not the best idea, not every implementation of a collection is suited to every situation.
When choosing the right tool for a job, we typically look at 2 characteristics:
- How well does the tool fit the job?
- How fast will it get the job done?
We've spent some time figuring out how to choose a suitable tool for a job, but its speed is something new.
In computing, a tool's speed is often expressed in terms of time complexity and denoted by a capital letter O.
What in the world is time complexity?
In simple terms, time complexity indicates the time required for an algorithm in the collection to perform a particular action (adding/removing an element, searching for an element).
ArrayList vs LinkedList
Let's look at this using two implementations of the List interface — ArrayList and LinkedList.
For outward appearances, working with these collections is similar:
List<String> arrayList = new ArrayList<>();
arrayList.add(String);
arrayList.get(index);
arrayList.remove(index);
arrayList.remove(String);
List<String> linkedList = new LinkedList<>();
linkedList.add(String);
linkedList.get(index);
linkedList.remove(index);
linkedList.remove(String);
As you can see, for both collection types, adding, getting, and removing elements looks the same. This is because these are implementations on the same interface. But that's where the similarities end.
Due to their different implementations of the List interface, these two structures perform different actions more efficiently than others.
Consider removing and adding an element.
If we need to remove an element from the middle of an ArrayList, we have to overwrite whatever part of the list follows the element we remove.
Suppose we have a list of 5 elements and we want to remove the 3rd one.
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
list.remove(2);
In this case, the removal frees up one cell, so we need to write the 4th element where the 3rd was, and the 5th where the 4th was.
This is highly inefficient.
The same happens when adding an element to the middle of the list.
LinkedList is structured differently. Adding or removing elements is fast, since we only need to change the references in the previous and next elements, thereby excluding the object we are removing from the chain of elements.
Returning to the example of the same list of 5 elements, after removing the 3rd element, all we need to do is simply change the 2nd element's reference to the next element and 4th element's reference to the previous one.
When an element is added to the list, the same process happens, but in reverse.
Notice how much less work we need to do in a LinkedList compared to an ArrayList. And that's just 5 elements. If our list had 100 or more elements, the superiority of LinkedList would become even more noticeable.
But how does the situation change if we access an element by index?
Everything is the exact opposite here.
Since ArrayList is structured as an ordinary array, getting any element by its index will be very easy for us. We simply move the pointer to a certain place and get the element from the corresponding cell.
But a LinkedList simply doesn't work that way. We have to go through all the elements of the list to find the element with a certain index.
Shall we try to express all of this in terms of big O?
Let's start by accessing an element by index.
In an ArrayList, this happens in one step, regardless of where the element is located in the list. Whether at the end or the beginning.
In this case, the time complexity will be O(1).
In a LinkedList, we have to iterate over a number of elements equal to the value of the index we need.
The time complexity for such an action is O(n), where n is the index of the element we need.
Here you see that the number we put in the big-O parentheses corresponds to the number of actions performed.
Shell we return to removing and adding?
Let's start with LinkedList.
Because we do not need to do a large number of actions to add or remove an element, and the speed of this operation does not depend in any way on where the element is located, it complexity is expressed as O(1) and is said to be constant.
The time complexity of this operation for ArrayList is also O(n), which we call linear complexity.
In algorithms with linear complexity, the running time depends directly on the number of elements to be processed. It may also depend on the element's position, whether it is at the beginning of the list or towards the end.
Time complexity can also be logarithmic. This is expressed as O(log n).
As an example, consider a sorted TreeSet consisting of 10 numbers. We want to find the number 2.
Since the list is sorted and contains no duplicates, we can split it in half and check which half would contain the desired number, discard the irrelevant part and then repeat this process until we reach the desired element. Ultimately, we will find (or not find) the number after processing log(n) number of elements.
Here's a table that summarizes the time complexity the rest of the collections.
By index | By key | Search | Insertion at the end | Insertion in the end | Removal | |
---|---|---|---|---|---|---|
ArrayList | O(1) | N/A | O(n) | O(1) | O(n) | O(n) |
LinkedList | O(n) | N/A | O(n) | O(1) | O(1) | O(1) |
HashSet | N/A | O(1) | O(1) | N/A | O(1) | O(1) |
TreeSet | N/A | O(1) | O(log n) | N/A | O(log n) | O(log n) |
HashMap | N/A | O(1) | O(1) | N/A | O(1) | O(1) |
TreeMap | N/A | O(1) | O(log n) | N/A | O(log n) | O(log n) |
ArrayDeque | N/A | N/A | O(n) | O(1) | O(1) | O(1) |
Now that we have a table showing the time complexity of popular collections, we can answer the question why, out of so many collections, we most often use ArrayList, HashSet and HashMap.
It's simply that they are the most efficient for most tasks :)
GO TO FULL VERSION