Linked Lists

By now we’ve covered enough of the basics to begin studying algorithms. For the earlier ones, we’ll hold your hand in the implementation details, but we strongly recommend that you implement all the algorithms and data structures yourself at least once without looking at sample code.

For now let’s look at a way to store an indefinite amount of data. In many programming languages, an object that can store an indefinite amount of items is called a list. So how do we make a list?

The first idea we can have would be to initially allocate an array for, say, ten items. When the list gets full and another item needs to be added, we can allocate a new array of size eleven, then copy everything over and free the original array. Doing this, every addition, we can just allocate a new array, copy everything over and delete the old one. This works, but it’s slow. Can we do better?

Of course, the next intuitive idea is to allocate more space when you need to resize your list. But really, you’re just prolonging the problem.

The next idea is this: what if we just allocate space for each item and store the items there? Well, in this case we’ll have to keep track of where the items are too, otherwise we’ll just have memory leaks and lose track of where our data is. Clearly, if we store an item together with its location, they’ll both be lost at the same time, so we have to store the location somewhere we can already find. This leads us to the most basic idea of the linked list. And at this point, a picture of what we mean will do much more than words.

In this picture, each big rectangle stores the data we want to store and a pointer to the next big rectangle. The last big rectangle’s pointer is the null pointer, or as previously discussed, a pointer to memory address 0. This marks that nothing follows after the fourth big rectangle. In computer language texts, you will find these big rectangles called nodes, and we will use that terminology here too.

To restate the obvious, a node contains some data and a pointer to the next node. This way, we only need to store a pointer to the first node, also called the head of the list, and we can find all the other nodes by following the pointers. The code for a node would look like this:

  1. struct node {
  2. int data;
  3. node* next;
  4. }

data can change. It could be an int or a char. You can name it whatever you want, and you can put as many members as you need there. The only important thing is that you keep a pointer to the next node.

So what would our operations on this look like? Let’s look at two things for now: Adding an item to the list and printing out the entire list.

  1. #include <stdio.h>
  2. struct node {
  3. int data;
  4. node* next;
  5. node(int data) {
  6. this->data = data;
  7. this->next = 0;
  8. }
  9. void add(int x) {
  10. // Look for the last node and store it in `current`
  11. node* current = this;
  12. while(current->next != 0) {
  13. current = current->next;
  14. }
  15. // Create a new node
  16. node* new_node = new node(x);
  17. // Point the last node to the new node
  18. current->next = new_node;
  19. }
  20. void print() {
  21. node* current = this;
  22. // Loop through each node until you hit the null pointer
  23. while(current != 0) {
  24. printf("%d\n", current->data);
  25. current = current->next;
  26. }
  27. }
  28. };
  29. main() {
  30. node* head = new node(1);
  31. head->add(2);
  32. head->add(2);
  33. head->add(3);
  34. head->print();
  35. }

The above code works, but it requires us to initialize the one-element node first. Plus it’s weird to think about because it looks like we are adding data to an object of type node. If we write our code better, we can make it more pleasant to think about.

  1. #include <stdio.h>
  2. struct node {
  3. int data;
  4. node* next;
  5. node(int data) {
  6. this->data = data;
  7. this->next = 0;
  8. }
  9. };
  10. struct linked_list {
  11. node* head;
  12. linked_list() {
  13. this->head = 0;
  14. };
  15. void add(int x) {
  16. if (this->head == 0) {
  17. node* new_node = new node(x);
  18. this->head = new_node;
  19. } else {
  20. node* current = this->head;
  21. while(current->next != 0) {
  22. current = current->next;
  23. }
  24. node* new_node = new node(x);
  25. current->next = new_node;
  26. }
  27. };
  28. void print() {
  29. node* current = this->head;
  30. while(current != 0) {
  31. printf("%d\n", current->data);
  32. current = current->next;
  33. }
  34. }
  35. };
  36. main() {
  37. linked_list list;
  38. list.add(1);
  39. list.add(2);
  40. list.add(2);
  41. list.add(3);
  42. list.print();
  43. }

Isn't that code a little bit nicer to look at now that we’ve defined a linked_list object?

While we’re at it, we should also take care of not causing any memory leaks. Just add the following destructor to our linked list.

  1. ~linked_list() {
  2. node* current = this->head;
  3. while(current != 0) {
  4. // You will want to get a pointer to `next` before you free this one
  5. // Otherwise, you'll get a segfault when you try to access memory after you freed it. :)
  6. node* next = current->next;
  7. delete current;
  8. current = next;
  9. }
  10. };

The improved code quality also means we can easily add items to the start of the list:

  1. void add_start(int x) {
  2. // Create a new node and point it to the current start of the list
  3. node* new_node = new node(x);
  4. new_node->next = this->head;
  5. // Update the start of the list to be the newly-created node
  6. this->head = new_node;
  7. };

If we had kept a reference to the head as we did before, add_start will end up a little bit messier because we would have to remember to update the local variables.

Also, we still have one very big problem with this: our linked list still doesn't solve our original problem of being able to add items quickly to the end! We still have to iterate through our entire list before we can add an item.

The modification we have to do isn’t too difficult. So before you move on, please think of a way to speed up insertion at the end.

You need to add a new member to linked_list to avoid slow computations.

You need to add a new member to linked_list, which points to the last node. Make sure to update it whenever you add a new node. The normal term used for this pointer is tail.

Once you are done, please implement the solution you have found.

Removing Items

Now that we can add items, how do we remove items?

Removing items from the start is easy, just update the head then delete the previous head.

What about removing from the end? From the previous exercise, we’ve already figured out a fast way to get the last node. But if we delete that, how do we update our last node? And how do we make sure the new last node no longer points to a freed memory address?

To remove from the end, all you need to do is make sure each node also has a pointer to the node behind it. This is called a doubly linked list. The first one that we were discussing with only forward links is called a singly linked list.

Please implement a doubly linked list with removal from both sides. :D