Abstract data types

Now we're ready to implement some complex data types that are useful in contests.

An abstract data type is just a data type that is defined by its behavior from the perspective of the user. In other words, an abstract data type is defined only by what it does, not by how it does it. You'll see what we mean in the following sections, where we study three basic ones and implement them. To implement an abstract data type is to write code that performs the behavior of that data type.

We begin with the stack.

Stacks

A stack is an abstract data type that implements a stack of things. Specifically, it supports the following two operations:

- push(x): add the item x to the top of the stack.
- pop(): remove the topmost item of the stack and return it.

You can imagine this as a stack of books for example. You can only place a book on top of the stack (push), and only take the topmost book (pop).

This sounds simple enough, but there are a few details that we need to pin down, such as what pop should do when the stack is empty. (In our implementations below, we'll just return $0$ in that case.)

So given the definition of a stack, how do we implement one?

Stack implementation using arrays

This sounds very simple to implement. Let's investigate an implementation of the stack using an array.

    struct stack {
        int contents[100];
        int length;
        // initialize
        stack() {
            this->length = 0;
        }
        void push(int x) {
            this->contents[this->length++] = x;
        }
        int pop() {
            if (this->length == 0) { // empty stack
                return 0;
            } else {
                return this->contents[--this->length];
            }
        }
        bool is_empty() {
            return this->length == 0;
        }
        // no destructor needed: we never allocated anything with "new",
        // so there is nothing to delete
    };

We just wrote our first stack implementation! This just uses an array, and every push places an element at the “end”. The length attribute signifies how many things have already been inserted, so we know where to put new items, we know where the topmost item is, and we know if the stack is empty.

Here's an example code that uses it:

    #include <stdio.h>

    ... // stack implementation here

    int main() {
        // let's create two stacks
        stack s1;
        stack s2;
        printf("empty? %d %d\n", s1.is_empty(), s2.is_empty());
        s1.push(5);
        printf("empty? %d %d\n", s1.is_empty(), s2.is_empty());
        s1.push(10);
        printf("pop from s1: %d\n", s1.pop());
        printf("pop from s2: %d\n", s2.pop());
        printf("empty? %d %d\n", s1.is_empty(), s2.is_empty());
        printf("pop from s1: %d\n", s1.pop());
        printf("empty? %d %d\n", s1.is_empty(), s2.is_empty());
        s1.push(11);
        s1.push(5);
        s1.push(6);
        s2.push(5);
        s2.push(3);
        s2.push(11);
        printf("pop from s1: %d\n", s1.pop());
        printf("empty? %d %d\n", s1.is_empty(), s2.is_empty());
    }

The expected output is:

empty? 1 1
empty? 0 1
pop from s1: 10
pop from s2: 0
empty? 0 1
pop from s1: 5
empty? 1 1
pop from s1: 6
empty? 0 0

Hooray! Our stack seems to work as intended! Except there is a major problem with our implementation: It can only hold up to $100$ items. This is because we only allocated a size of $100$ for contents. If we try to push more than a hundred items to the stack, then we either get a segmentation fault or we get some obscure hard-to-fix bugs. We should fix our implementation.
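
For instance, here's a small (made-up) snippet that pushes $150$ items onto the stack above; the writes beyond index $99$ land outside contents, which is undefined behavior:

    int main() {
        stack s;
        // contents only has room for 100 items, so pushes number 101 to 150
        // write past the end of the array -- undefined behavior
        for (int i = 0; i < 150; i++) {
            s.push(i);
        }
    }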

The first thought is to simply increase $100$ to something larger, say, $1000000$. But this has two problems:

- If we never actually push that many items, we're setting aside a huge amount of memory for nothing.
- No matter how large we make the array, it's still a fixed size, so pushing more items than that brings back the exact same problem.

Clearly, we need a different solution. Well, why don't we just reallocate the entire thing when we exceed the capacity? Indeed, we can do that. Check out our implementation:

    struct stack {
        int* contents;
        int allocated;
        int length;
        // initialize
        stack() {
            this->length = 0;
            this->allocated = 100;
            this->contents = new int[this->allocated];
        }
        void resize_if_needed() {
            // first, check if we need to resize the array
            if (this->length >= this->allocated) {
                // create a new array with 100 additional slots
                int new_allocated = this->allocated + 100;
                int* new_contents = new int[new_allocated];
                // copy over the old contents to the new array
                for (int i = 0; i < this->length; i++) {
                    new_contents[i] = this->contents[i];
                }
                // delete the old array (to prevent a memory leak)
                delete[] this->contents;
                // set these as the new contents
                this->contents = new_contents;
                this->allocated = new_allocated;
            }
        }
        void push(int x) {
            resize_if_needed();
            this->contents[this->length++] = x;
        }
        int pop() {
            if (this->length == 0) { // empty stack
                return 0;
            } else {
                return this->contents[--this->length];
            }
        }
        bool is_empty() {
            return this->length == 0;
        }
        // destructor
        ~stack() {
            delete[] this->contents;
        }
    };

This implementation creates a new array with room for $100$ more items whenever we run out of space.

Now, does this work? We can test it again using our example code above. You can see that the output is still the same. Furthermore, if you perform a test that pushes lots of items, you'll find that it still works as intended. Thus, we just wrote our first working stack implementation!

Notice that we didn't need to modify our example code to test our new implementation. That's a nice thing about abstract data types. We only specified the stack's behavior, not how it's implemented, so any code that uses our data type should still work as intended. Indeed, your favorite programming language contains tons of implementations of abstract data types. These implementations get constantly updated between versions, but your programs that use them don't really break. They would only break if the abstract data types themselves changed, but language designers and implementors take a lot of care to make sure that happens only rarely.
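
As a concrete example, C++ itself already ships a stack in its standard library: std::stack. Here's a rough sketch of how it's used. Note that the standard version splits our pop into two operations: top looks at the topmost item, and pop removes it without returning anything.

    #include <cstdio>
    #include <stack> // the standard library's stack

    int main() {
        std::stack<int> s;
        printf("empty? %d\n", (int) s.empty()); // prints 1
        s.push(5);
        s.push(10);
        printf("top: %d\n", s.top()); // prints 10
        s.pop();                      // removes the 10 (returns nothing)
        printf("top: %d\n", s.top()); // prints 5
        printf("empty? %d\n", (int) s.empty()); // prints 0
    }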

Stack implementation using linked lists

Unfortunately, while the above works, it runs really slowly, because all that reallocating and copying takes a lot of time. Consider, for example, pushing one million times. Most pushes only perform a few operations, but every $100$ pushes we need to reallocate. On the first reallocation, we allocate an array of size $200$; on the second, $300$; and so on, up to $1000000$. Thus, we can estimate the number of operations to be $$3\cdot 1000000 + (200 + 300 + 400 + \ldots + 1000000) > 5\times 10^9.$$ That's a lot of operations for just one million pushes!
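
To see where that estimate comes from, note that the reallocation sizes alone already add up to $$200 + 300 + \ldots + 1000000 = 100\,(2 + 3 + \ldots + 10000) = 100\left(\frac{10000 \cdot 10001}{2} - 1\right) \approx 5\times 10^9,$$ and that's not even counting the cost of copying the old items over on every reallocation.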

We can state this in a different way: We say that the worst-case running time of a push operation is $O(n)$, where $n$ is the number of items in the stack. We would like to have a faster implementation.

Luckily, we can use linked lists which we learned in the previous section!

    struct stack {
        node* top;
        // initialize
        stack() {
            this->top = NULL; // in C++, "NULL" is just a synonym for "0"
        }
        void push(int x) {
            node* new_top = new node(x);
            new_top->next = this->top;
            this->top = new_top;
        }
        int pop() {
            if (this->top == NULL) { // empty stack
                return 0;
            } else {
                node* old_top = this->top;
                int result = old_top->data;
                this->top = old_top->next;
                delete old_top; // free the removed node to prevent a memory leak
                return result;
            }
        }
        bool is_empty() {
            return this->top == NULL;
        }
        // destructor
        ~stack() {
            node* current = this->top;
            while (current != NULL) {
                node* next = current->next;
                delete current;
                current = next;
            }
        }
    };

This implementation uses a linked list to represent a stack. The first node of the linked list represents the top of the stack, and is referred to by top. Clearly, every operation only does a fixed amount of work, so we say that the running time of each operation is $O(1)$. This is the best you can hope for!

We encourage you to check that it still works.
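
If you also want to convince yourself that it's fast, here's one possible (rough) stress test of ours that you can run against both implementations; the linked-list version should finish almost instantly, while the reallocating array version takes noticeably longer:

    #include <stdio.h>

    ... // stack implementation here

    int main() {
        stack s;
        // push a million items, then pop them all back off
        for (int i = 0; i < 1000000; i++) {
            s.push(i);
        }
        long long total = 0;
        while (!s.is_empty()) {
            total += s.pop();
        }
        // 0 + 1 + ... + 999999 = 499999500000
        printf("%lld\n", total);
    }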

Queues

Let's move on to the next abstract data type. A queue is an abstract data type that implements a queue of things. Specifically, it supports the following two operations:

- enqueue(x): add the item x to the back of the queue.
- dequeue(): remove the item at the front of the queue and return it.

You can imagine this as a queue of people waiting to order at McDollibee for example. People that just arrived go to the back of the queue (enqueue), and the person in front of the queue will be served first (dequeue). First in, first out. Some people call “enqueue” and “dequeue” “push” and “pop” just like in stacks, but the names don't really matter too much. Here we just use “enqueue” and “dequeue” so we don't confuse terms.

Let's try to implement a queue. Here's a simple implementation using arrays:

    struct queue {
        int contents[100];
        int front, back;
        // initialize
        queue() {
            this->front = 0;
            this->back = 0;
        }
        void enqueue(int x) {
            this->contents[this->back++] = x;
        }
        int dequeue() {
            if (this->front == this->back) { // empty queue
                return 0;
            } else {
                return this->contents[this->front++];
            }
        }
        bool is_empty() {
            return this->front == this->back;
        }
        // no destructor needed: we never allocated anything with "new",
        // so there is nothing to delete
    };

This uses an array and two indices called front and back to determine the extent of the queue in the array. The items are always in a contiguous subarray of the array at any point. front points to the first item of this subarray, and back points to the first available space after this contiguous subarray. back is always ahead of front except when the queue is empty.

We encourage you to write some code that tests whether this works as intended. Let us know if it does!
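
For example, here's one small test of ours (you should still try writing your own!); the expected values are in the comments:

    #include <stdio.h>

    ... // queue implementation here

    int main() {
        queue q;
        printf("empty? %d\n", q.is_empty()); // 1
        q.enqueue(5);
        q.enqueue(10);
        q.enqueue(3);
        printf("empty? %d\n", q.is_empty()); // 0
        printf("%d\n", q.dequeue());         // 5 (first in, first out!)
        printf("%d\n", q.dequeue());         // 10
        q.enqueue(7);
        printf("%d\n", q.dequeue());         // 3
        printf("%d\n", q.dequeue());         // 7
        printf("empty? %d\n", q.is_empty()); // 1
        printf("%d\n", q.dequeue());         // 0 (dequeueing from an empty queue)
    }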

Like our first stack implementation, this also suffers from the problem of enqueueing too many items; our queue can only support up to $100$ “enqueue” operations. We can also solve this by reallocating the array every time back goes past the end of the array, like before. But again, this strategy is slow; the worst case for enqueue is $O(n)$.
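
In case you're curious, here's a rough sketch of ours (not a definitive version) of what that extra method might look like. It assumes contents has been changed to an int* with an allocated field, just like in our resizable stack, and it also takes the chance to shift the remaining items back to the start of the new array:

    void resize_if_needed() {
        // check if back has reached the end of the array
        if (this->back >= this->allocated) {
            // allocate a bigger array
            int new_allocated = this->allocated + 100;
            int* new_contents = new int[new_allocated];
            // copy only the items still in the queue, moved back to index 0
            int count = this->back - this->front;
            for (int i = 0; i < count; i++) {
                new_contents[i] = this->contents[this->front + i];
            }
            delete[] this->contents;
            this->contents = new_contents;
            this->allocated = new_allocated;
            this->front = 0;
            this->back = count;
        }
    }

enqueue would then call resize_if_needed() before writing to contents[back], just like push did in our resizable stack.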

Well, let's try using a linked list instead.

    struct queue {
        node* front;
        // initialize
        queue() {
            this->front = NULL;
        }
        void enqueue(int x) {
            if (this->front == NULL) {
                this->front = new node(x);
            } else {
                // walk to the last node (the back of the queue) and append there
                node* current = this->front;
                while (current->next != NULL) {
                    current = current->next;
                }
                current->next = new node(x);
            }
        }
        int dequeue() {
            if (this->front == NULL) { // empty queue
                return 0;
            } else {
                node* old_front = this->front;
                int result = old_front->data;
                this->front = old_front->next;
                delete old_front; // free the removed node to prevent a memory leak
                return result;
            }
        }
        bool is_empty() {
            return this->front == NULL;
        }
        // destructor
        ~queue() {
            node* current = this->front;
            while (current != NULL) {
                node* next = current->next;
                delete current;
                current = next;
            }
        }
    };

Unfortunately, this implementation is still slow. Notice that in an enqueue operation, we perform a while loop just to find the last node of the list, that is, the back of the queue. When the queue contains a lot of things, this is slow! Specifically, the running time of an enqueue is still $O(n)$.

Can we improve this? Actually, we can. We just need to keep another pointer that always points to the last node in the linked list. This way, we don't have to find the back of the queue with a loop every time we enqueue! Check out the following implementation:

    struct queue {
        node* front;
        node* back;
        // initialize
        queue() {
            this->front = NULL;
            this->back = NULL;
        }
        void enqueue(int x) {
            if (this->front == NULL) {
                this->front = new node(x);
                this->back = this->front; // the single node is both front and back
            } else {
                node* new_node = new node(x);
                this->back->next = new_node;
                this->back = new_node; // point back to the new last node
            }
        }
        int dequeue() {
            if (this->front == NULL) { // empty queue
                return 0;
            } else {
                node* old_front = this->front;
                int result = old_front->data;
                this->front = old_front->next;
                if (this->front == NULL) {
                    this->back = NULL; // the queue is now empty
                }
                delete old_front; // free the removed node
                return result;
            }
        }
        bool is_empty() {
            return this->front == NULL;
        }
        // destructor
        ~queue() {
            node* current = this->front;
            while (current != NULL) {
                node* next = current->next;
                delete current;
                current = next;
            }
        }
    };

Study the role of the new back pointer carefully. This time we see that each operation runs in $O(1)$ time, which is really great!

Deques

Finally, we briefly discuss double-ended queues, or deques for short. (“Deque” is pronounced “deck”.) This is a beefed-up version of the stack and the queue. It's like a queue, but you can push and pop elements on both ends! It supports the following operations:

- push_left(x): add the item x to the left end (the front) of the deque.
- push_right(x): add the item x to the right end (the back) of the deque.
- pop_left(): remove the item at the left end and return it.
- pop_right(): remove the item at the right end and return it.

We leave it to you to implement this new data type with arrays. We assure you it won't be very hard, because it is actually very similar to our previous array implementations! Because of that, though, it will probably be slow too. So the next thought is to use linked lists with back and front pointers just like in our queue implementation. Before proceeding, we encourage you to implement it with linked lists yourself.

Upon implementation, notice that push_left, push_right and pop_left are easy to implement, because they are very similar to our stack and queue operations. However, we encounter some issues with the pop_right operation; namely, it's easy to find which node to pop, but how do we update the back pointer so that it points to the new last node? It looks like we will need a loop, which is slow! (It takes $O(n)$.)

The real solution for this problem is to use a doubly linked list, as mentioned in the previous section! The idea is to add a new attribute in each node called prev, in addition to data and next, which points to the previous node in the linked list. There will be more pointers to update on every operation, but at least we'll now be able to implement all operations in $O(1)$ time!
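
To make the idea a bit more concrete, here's a rough sketch of ours (deliberately incomplete) of the new node and of the troublesome pop_right operation; we still leave the rest of the operations to you:

    // a node of a doubly linked list: prev points to the previous node
    struct node {
        int data;
        node* next;
        node* prev;
        node(int data) {
            this->data = data;
            this->next = NULL;
            this->prev = NULL;
        }
    };

    struct deque {
        node* front; // leftmost node
        node* back;  // rightmost node
        // ... constructor, push_left, push_right, pop_left, is_empty, destructor ...
        int pop_right() {
            if (this->back == NULL) { // empty deque
                return 0;
            } else {
                node* old_back = this->back;
                int result = old_back->data;
                // prev lets us find the new last node without a loop
                this->back = old_back->prev;
                if (this->back == NULL) {
                    this->front = NULL; // the deque is now empty
                } else {
                    this->back->next = NULL;
                }
                delete old_back;
                return result;
            }
        }
    };

The other operations just need to keep both next and prev up to date whenever a node is added or removed.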

Problems