When you define an
array in memory, it takes up a certain chunk of
memory. The size of this chunk is fixed, and you can't
grow or
shrink it.
Dynamic data structures are all about storing an amount of data which could be any size up to the size of the computer's total memory. One of the easiest ways to do this is with a linked list.
Suppose I define a linked list, called LList. During the course of my program, I want to perform a number of operations on it - insert, delete and search.
Before discussing these operations, its nessecary to talk a little bit about linked list theory. A linked list is comprised of a number of items, called nodes. Each node comprises of several fields - the actual data held in that node, and one or two pointers. These pointers are the glue that holds the data in the list together. Pointers tell the computer where other items in the list are in memory. In a singly-linked list, each node in a list has a pointer that contains the memory address of the node "in front of" it in the list. In a doubly-linked list, each node has a pointer to the node in front of it, and the node behind it. Imagine it like this -
|-----| |-----| |-----| |-----|
|Node | ---> |Node | ---> |Node | ---> |Node | --->
|-----| |-----| |-----| |-----|
The arrows represent the pointers nodes have to other nodes.
What determines the order the nodes are in? It's simply the order they are inserted into the list. More on insertion later.
The most fundamental operation to perform on a linked list is the search. To search, we start at the head of a list (or possibly the tail, if it's a doubly-linked list), and check if that node is the one we're looking for. If it's not, we ask that node what node is "in front of" (or behind) it, and iterate onto that one. The pattern repeats until we find the node we want.
Right, now we can discuss insertion operations. To insert a new node into a list, we iterate to the end of the list, and create the new node. We set the pointer of the old last node to point to the new last node. The pointer of the last node in the list is typically set to 0 (because there is no node in front of it), so we set the forwards pointer of the new node to 0.
Deletion operations work in a similar fashion. We iterate over the list until we find the node we wish to delete, and wipe it. Then we adjust the pointers around it to point to the relevent other nodes, so the chain is not broken.
Confused? You should be.
Getting rid of the confusion: conceptualising a linked list.
Start pointer: 4.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 3 |
| 2 | 34 | 5 |
| 3 | 45 | 2 |
| 4 | 67 | 1 |
| 5 | 12 | 0 |
-----------------------------------
What you see above is a very simple singly-linked list. The start of the list is in memory address 4, and the last item in the list is in memory address 5.
Stepping through a search operation.
Imagine I want to search for the value 45. I start at memory address 4, because this is the start of the list. Does this contain the value 45? No, so I move onto memory address 1 (as the pointer field of address 4 specifies), and check it. It doesn't contain 45 either, so it's onto address 3. That does contain 45! Job done.
Stepping through an insertion operation.
I want to insert the value 57 into the list. Firstly, I create the new node (as yet unlinked). The table now looks like this -
Start pointer: 4.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 3 |
| 2 | 34 | 5 |
| 3 | 45 | 2 |
| 4 | 67 | 1 |
| 5 | 12 | 0 |
| 6 | 57 | |
-----------------------------------
As you can see, this new node, which is held in memory address 6, doesn't point to anything and isn't pointed to by anything. Let's fix that.
I start at the start pointer, which specifies memory address 4. I then iterate over the entire list until I get to a node which points to address 0. This is the node in memory address 5. As I know the new node is held in memory address 6, I set the pointer of memory address 5 to 6. I then set the pointer of memory address 6 to 0, because it's the last item in the list. This is the new list -
Start pointer: 4.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 3 |
| 2 | 34 | 5 |
| 3 | 45 | 2 |
| 4 | 67 | 1 |
| 5 | 12 | 6 |
| 6 | 57 | 0 |
-----------------------------------
It is actually much more efficient to insert a new item at the start of the list, in which case the start pointer would need to be changed to point to the new first value (6), and the pointer of the new value would need to be set to the old start pointer (4).
The new list:
Start pointer: 6.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 3 |
| 2 | 34 | 5 |
| 3 | 45 | 2 |
| 4 | 67 | 1 |
| 5 | 12 | 0 |
| 6 | 57 | 4 |
-----------------------------------
Deletion operations.
If I want to delete the node with the data value 34, the first thing I need to do is perform a search operation until I find it. I'll eventually find it in address 2. The simplest way to delete this item from the list is simply to remove the pointer to it. (NB: It is important to perform garbage collection on any item you remove the pointer to. Otherwise, the data will be left there using memory, but inaccessible.).
After removing the pointer to item number 2, the list looks like this -
Start pointer: 4.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 3 |
| 2 | 34 | 5 |
| 3 | 45 | 5 |
| 4 | 67 | 1 |
| 5 | 12 | 6 |
| 6 | 57 | 0 |
-----------------------------------
Memory address 2 is no longer a part of the list.
A final note: sorted linked lists and mid-list insertions
All of the lists above are not sorted in any particular order. A list which is sorted in lexicographical order can allow for very fast searches, but requires a bit more nounce when adding items to the list.
For instance, imagine an insertion operation on a sorted linked list. Instead of iterating to the end of the list, I'd have to iterate until I find a point where the item in front of it is greater than the one I am inserting, and the item behind it is less than. Then I'd need to adjust the pointers accordingly for a mid-list insertion.
Mid-list insertion
Start pointer: 5.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 2 |
| 2 | 34 | 3 |
| 3 | 45 | 4 |
| 4 | 67 | 0 |
| 5 | 12 | 1 |
| 6 | 37 | |
-----------------------------------
The value in memory address 6 needs to be inserted in the middle of the list. Here's how we do it -
Start at memory address 5. 37 is greater than 12, so we need to insert past this point. Iterate forward. Next, it's memory address 1, which contains the value 23. Iterate foward. Next value is 34 in memory address 2, iterate forward. What's the next value? 45, aha! We've gone too far, so we back up. Set the pointer of memory address 2 to 6. The older pointer in memory address 2 pointed to 3, so set the pointer of address 6 to 3. Job done.
The new list:
Start pointer: 5.
-----------------------------------
| Memory address | Data | Pointer |
|-------------------------------- |
| 1 | 23 | 2 |
| 2 | 34 | 6 |
| 3 | 45 | 4 |
| 4 | 67 | 0 |
| 5 | 12 | 1 |
| 6 | 37 | 3 |
-----------------------------------
Here's a generic singly-linked list implemented in C++ -
template<class T>struct Link {
T Data;
Link<T>* Next;
Link();
};
template<class T>Link<T>::Link() {
Next = 0;
}
template<class T>class LinkedList {
public:
Link<T>* First;
void InsertAtEnd(T Data);
void InsertAtStart(T Data);
bool InsertAfterPoint(T Data, int Point);
bool DeletePoint(int Point);
T operator(int index);
LinkedList();
~LinkedList();
};
template<class T>
LinkedList<T>::LinkedList() {
First = 0;
}
template<class T>
LinkedList<T>::~LinkedList() {
Link<T>* Next;
while ( First != 0 ) {
Next = First->Next;
delete First;
First = Next;
}
}
template<class T>
void LinkedList<T>::InsertAtEnd(T Data) {
Link<T>* Cur = First;
Link<T>* NewLink = new Link<T>;
NewLink->Data = Data;
NewLink->Next = 0;
while ( Cur->Next ) {
Cur = Cur->Next;
}
Cur->Next = NewLink;
}
template<class T>
void LinkedList<T>::InsertAtStart(T Data) {
Link<T>* NewLink = new Link<T>;
NewLink->Data = Data;
NewLink->Next = First;
First = NewLink;
}
template<class T>
bool LinkedList<T>::InsertAfterPoint(T Data, int Point) {
Link<T>* Cur = First;
if (! Cur )
return false;
for ( int i = 0; i < Point; i++ ) {
if (! Cur->Next )
return false;
Cur = Cur->Next;
}
Link<T>* NewLink = new Link<T>;
NewLink->Data = Data;
NewLink->Next = Cur->Next;
Cur->Next = NewLink;
return true;
}
template<class T>
T LinkedList<T>::operator(int index) {
Link<T>* Cur = First;
if (! Cur )
return NULL;
for (int i = 0;i < index; i++) {
if (! Cur->Next )
return NULL;
Cur = Cur->Next;
}
return Cur->Data;
}
template<class T>
bool LinkedList<T>::DeletePoint(int Point) {
Link<T>* Cur = First;
Link<T>* Prev;
if (! Cur )
return false;
for ( int i = 0;i < Point; i++ ) {
if (! Cur->Next )
return NULL;
Prev = Cur;
Cur = Cur->Next;
}
if ( Point == 0 ) {
First = Cur->Next;
} else {
Prev->Next = Cur->Next;
}
delete Cur;
return true;
}