Custom-built ArrayList
Learning Outcomes
- Be able to build a custom list class
- Understand the time costs of various list operations
Author: Gaurav Gupta
Custom-built arraylist
In this chapter, we’ll see how an arraylist holds items and how to implement it.
NOTE! The built-in class for ArrayList in Java holds a collection of objects, not primitive data type, but in our example, we’ll create a collection of double
items, for simplicity.
What an ArrayList class looks like
The ArrayList class has an instance variable array that holds the items. If, at the time of triying to add an item, the array is full, then:
- a bigger array is created,
- items from the instance variable array are copied over
- instance variable array is re-referenced to refer to the new, bigger array
Let’s create one such class holding an array of doubles
1
2
3
4
5
6
7
8
9
10
11
12
13
14
public class MyArrayList {
private double[] data;
private int nItems;
public MyArrayList() {
data = new double[5];
nItems = 0; //no item added yet
}
public void add(double itemToAdd) { //THERE IS A PROBLEM HERE
data[nItems] = itemToAdd;
nItems++;
}
}
The client can create an object of class MyArrayList
as:
1
2
3
4
5
6
MyArrayList list = new MyArrayList();
list.add(10);
list.add(70);
list.add(20);
list.add(90);
list.add(30);
Five items were added to the list, which is ok. But as soon as you try to add a sixth item using,
1
list.add(80);
The method add
will cause,
1
data[5] = 80; //ArrayIndexOutOfBoundsException
Therefore, we must first check if the internal array is full.
1
2
3
4
5
6
7
8
9
//inside class MyArrayList
public boolean isFull() {
if(nItems == data.length) {
return true;
}
else {
return false;
}
}
The method add
should be modified as,
1
2
3
4
5
6
7
8
//inside class MyArrayList
public void add(double itemToAdd) {
if(isFull()) {
???
}
data[nItems] = itemToAdd;
nItems++;
}
If the internal array is full,
- a bigger array is created,
- items from the instance variable array are copied over
- instance variable array is re-referenced to refer to the new, bigger array
These steps are performed in a method grow()
.
1
2
3
4
5
6
7
8
9
public void grow() {
double[] temp = new double[data.length + 5]; //5 extra items
for(int i=0; i < data.length; i++) { //copy all items over
temp[i] = data[i];
}
data = temp; //make instance variable array refer to the bigger array
}
Updated class definition
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
public class MyArrayList {
private double[] data;
private int nItems;
public MyArrayList() {
data = new double[5];
nItems = 0; //no item added yet
}
public boolean isFull() {
if(nItems == data.length)
return true;
else
return false;
}
private void grow() { //should not be called by outside code
double[] temp = new double[data.length + 5]; //5 extra items
for(int i=0; i < data.length; i++) { //copy all items over
temp[i] = data[i];
}
data = temp; //make instance variable array refer to the bigger array
}
public void add(double itemToAdd) {
if(isFull()) {
grow();
}
data[nItems] = itemToAdd;
nItems++;
}
}
Method to add an item at an arbitrary index
Say, the list contains items [12.5, 6.4, 8.5, 9.6, -3.4] and we want to add an item 16.42 at index 0. All items in the array need to move forward by 1.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public boolean add(int idx, double itemToAdd) {
if(idx < 0 || idx > nItems) {
return false; //to indicate failure
}
if(isFull()) {
grow();
}
for(int i=nItems - 1; i >= idx; i--) {
data[i+1] = data[i];
}
data[idx] = itemToAdd;
nItems++;
return true; //to indicate success
}
Method to remove an item from an arbitrary index
Say, the list contains items [12.5, 6.4, 8.5, 9.6, -3.4] and we want to remove the item at index 0. All items in the array need to move backward by 1.
The method should return the item removed.
If we keep the return type as double
, what do we return if the index is invalid and therefore no item exists at that index?
Hence, we keep the return type Double
since we can return null
as error code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
public Double remove(int idx) {
if(idx < 0 || idx >= nItems) {
return null; //error code
}
double itemRemoved = data[idx];
for(int i= idx; i < nItems - 1; i++) {
data[i] = data[i+1];
}
nItems--;
return itemRemoved; //can return double variable as Double
}
Complete class and client (including homework)
- MyArrayList.java (contains several incomplete methods for practice)
- MyArrayListClient.java
Homework tasks
1. Percentage used
Add an instance method percentageUsed
that returns the percentage of the instance array being used. For example, if data.length = 15
and nItems = 12
, method returns the value 80.0
.
2. indexOf
Add an instance method indexOf
that when passed a double
, returns the index of the first occurrence of the item. Return -1 if item doesn’t exist. Method header is:
1
public int indexOf(double target)
3. lastIndexOf
Add an instance method lastIndexOf
that when passed a double
, returns the index of the last occurrence of the item. Return -1 if item doesn’t exist. Method header is:
1
public int lastIndexOf(double target)
4. isValidIndex
Add an instance method isValidIndex
that when passed an int
(index), returns true
if the index is valid, and false
otherwise. Method header is:
1
public boolean isValidIndex(int index)
For example, if nItems = 7
, indices from 0 to 6 (inclusive on both sides) are valid, and all other indices are invalid.
5. set
Add an instance method set
that when passed an int
(index) and a double
(updated value), sets the item at given index (if within range) to the updated value and returns true
. If the index is invalid, return false
. Method header is:
1
public boolean set(int index, double updated)
6. Modified growth
Modify the method grow
such that the array grows only if full. If the array is not full, it should not grow. Original method (that grows the array irresepective of being full or not) provided below:
1
2
3
4
5
6
7
8
9
public void grow() {
double[] temp = new double[data.length + 5]; //5 extra items
for(int i=0; i < data.length; i++) { //copy all items over
temp[i] = data[i];
}
data = temp; //make instance variable array refer to the bigger array
}
7. Remove an item
Add an instance method remove
that when passed a double
, removes the first occurrence of the item from the list (if any). The method should return true
if there was an instance of the item that was removed, and false
if there was no instance of the item in the list. Method header is:
1
public boolean remove(double itemToRemove)
8. Remove all
Add an instance method removeAll
that when passed a double
, removes all occurrences of the item from the list. You can use the method remove
already implemented. The method should return true
if there was an instance of the item that was removed, and false
if there was no instance of the item in the list. Method header is:
1
public boolean removeAll(double itemToRemove)
Array-based lists operation analysis
At the end of the day, an array-based list is an array. In this section, we’ll take a look at the three core operations on arrays:
- accessing at item at a given index
- adding an item at a given index
- removing an item from a given index
Accessing an item at a given index
We need to understand how an array is stored to understand how array items are accessed.
An array is stored as a contiguous block of memory. So, if you have 100 items, you need a block of 400 bytes (given each integer is 4 bytes).
As an example, consider the following array containing 6 integers.
1
int[] data = {20, 50, 10, 90, 70, 30};
24 bytes are reserved for this instance, 4 bytes per integer. The first item is stored in the first four bytes, the second item in the next four bytes and so on.
Thus,
- The address of the first item (at index 0) is the base address of the array (in this example, 600).
- The address of the second item (at index 1) is the base address of the array + 4 bytes.
- The address of the third item (at index 2) is the base address of the array + 2 * 4 bytes.
- …
-
The address of the last item (at index 5) is:
base address of the array
+5
*4
bytes.
With a little tweak we can see:
- Address(
data[0]
) =baseAddress
+ 0 * 4 - Address(
data[1]
) =baseAddress
+ 1 * 4 - Address(
data[2]
) =baseAddress
+ 2 * 4 - …
- Address(
data[data.length - 1]
) =baseAddress
+ (data.length - 1) * 4
Putting it all together,
(address of item at index idx)
= (base address of array)
+
(size of each item)
* (idx)
;
This means, irrespective of the index accessed, it takes the same time to return an item. A sample table illustrating time taken against index is provided below. In this, x
is the amount of time taken and could be any value more than 0.
index | time taken |
---|---|
0 | x milliseconds |
10 | (around) x milliseconds |
800 | (around) x milliseconds |
60000 | (around) x milliseconds |
If you plot a chart with index on x-axis and time taken on y-axis, you’ll get a (near) flat-line. We can this O(1) operation in terms of time complexity (but don’t stress, we will teach the formal terminology in subsequent units).
Example: Accessing an item
Let’s say we want to access data[3]
in the following statement:
1
int item = data[3];
We start off with the reference held by data
and getting the base address (600).
To that, we add 3 * 4 = 12 (3 being the index and 4 being size of each item), giving us the starting address of the item 612.
We grab 4 bytes starting at that address (so bytes 612 to 615) and present it packed as an integer: 90.
Growing the array
The grow
method is as follows:
1
2
3
4
5
6
7
8
private void grow() {
double[] temp = new double[data.length + 5];
for(int i=0; i < data.length; i++) {
temp[i] = data[i];
}
data = temp;
}
The key operation in this method is the loop that iterates data.length
number of times. The following table summarizes the time behaviour of the loop.
current size | time taken |
---|---|
5 | x milliseconds |
10 | (around) 2 times x milliseconds |
500 | (around) 100 times x milliseconds |
10000 | (around) 2000 times x milliseconds |
A chart plotted for this with current size on x-axis and time taken on y-axis would yield a straight (rising) line. We call this O(n) operation in terms of time complexity.
Inserting an item at a given index
There are four cases for inserting an item:
-
Array is not full and you are inserting an item at the end of the array: None of items need to be moved, array doesn’t need to be grown. This requires a fixed number of operations, irrespective of the current state of the array and hence O(1) operation.
-
Array is full and you are inserting an item at the end of the array: None of items need to be moved but array needs to be grown. Growing the array is O(n). Therefore, this is an O(n) scenario.
-
Array is not full but you are inserting an item somewhere (not at the end): Some items need to be moved but array doesn’t need to be grown. The worst case is when you are adding an item at index 0, in which case all items need to move, which is O(n). Therefore, this is also an O(n) scenario.
For example, if there are 5 items currently being used in an array of size 8, and we try to insert an item 5.6 at index 0, all 5 items need to be moved forward.
These shifts are illustrated in the following diagram:
- Array is full and you are inserting an item somewhere (not at the end): Double whammy, but O(n) + O(n) is still O(n), just the straight line is steeper.
Removing at item at a given index
There are two relevant cases for removing an item:
- Best case: removing the last item: No items need to be moved. O(1).
-
Worst case: Removing the first item: All the items need to be moved. O(n) because of the loop that needs to iterate over each item.
For example, if there are 5 items currently being used in an array of size 8, and we try to remove the first item, all 5 items need to be moved backward by one.
These shifts are illustrated in the following diagram:
Homework exercises
-
Explain the calculations performed during calculating the address of
data[6]
if the base address is 4000 and the array is of typedouble[]
. Eachdouble
variable needs 8 bytes of memory. -
Explain the calculations performed during calculating the address of
data[40]
if the base address is 3000 and the array is of typechar[]
. Eachchar
variable needs 2 bytes of memory. -
Consider an array
data = {10, 70, 20, 90, 30, 80, 60}
. Explain, with the help of a diagram, the shifts involved while removing the item 70. -
Consider an array
data = {10, 70, 20, 90, 30, 80, 60}
. Explain, with the help of a diagram, the shifts involved during the process of adding the item 30 at index 1.