Graph Data Structure Demystified

We use Google Search, Google Maps, and social networks regularly nowadays. One thing they all have in common is that, under the hood, they use a remarkable data structure – graphs – to organize and manipulate data. You may have seen this data structure in college but don’t remember much about it. Or maybe it is a scary topic you have been avoiding. Either way, now is an excellent time to get familiar with it. In this blog, we will cover the core concepts, and by the end you should be comfortable moving on to algorithms that work with graphs.

Outline

  1. Definition
  2. Terminology
  3. Representations
  4. Graph algorithms

Definition

A graph is a non-linear data structure that organizes data in an interconnected network. It is very similar to a tree. In fact, a tree is a connected graph with no cycles. We will talk about cycles in a bit.

[Figure: a random graph]

There are two primary components of any graph: Nodes and Edges.

Nodes are typically called Vertices (singular: vertex), and they can represent any data: integers, strings, people, locations, buildings, etc.

Edges are the lines that connect the nodes. They can represent roads, routes, cables, friendships, etc.

Graph Terminology

There is a lot of vocabulary to remember related to graphs. We will list the most common ones.

Undirected and Directed graphs

A graph can be directed or undirected. As you might have already guessed, directed graphs have edges that point in specific directions. Undirected graphs simply connect the nodes to each other, with no notion of direction whatsoever.

Weighted and Unweighted graphs

Let’s say we are using a navigation application and trying to get the best route between point A and point B. Once we enter the details of the two points, the app does some calculations and shows us the fastest way to reach our goal. Typically, there are many ways to get from point A to point B, so to choose the best one, the app needs to differentiate the options by some value. The obvious solution, in this case, is to calculate the distance of each option and pick the one with the shortest distance. Assigning some value to the connection between two points is called adding weight to it. Weighted graphs have some values (distance, cost, time, etc.) attached to their edges.

Cyclic and Acyclic graphs

Earlier, we mentioned that a tree is actually a graph without cycles. So what is a cycle in a graph? We say a graph is cyclic when it contains a path that starts and ends at the same vertex without repeating any vertices or edges along the way. Acyclic graphs do not have cycles. Trees happen to be acyclic, directed graphs with the restriction that a child node can have only one parent node.

Representing graphs in memory

One of the main things that make graphs less intuitive and confusing is probably the way they are stored in computer memory. With nodes scattered all over and a variable number of edges connecting them, it can be challenging to find an obvious way to implement them. However, there are some widely accepted representations we can consider. Let’s store the following undirected graph (five vertices A through E, connected by the edges A–B, A–E, B–C, C–D, and C–E) in three different ways.

Edge List

This representation stores a graph as a list of edges.

const graph = [['A', 'B'], ['A', 'E'], ['C', 'B'], ['C', 'E'], ['C', 'D']];

Each edge appears only once in the list; there is no need to list both A–B and B–A. Additionally, the order of edges in the list does not matter.

Similar to the list of edges, we could also keep the nodes themselves in a separate list, but that is not part of the Edge List representation itself.

Adjacency List

This method relies on array indexes to store the connections of a particular node. In JavaScript, we would create an array of arrays, where each index represents a node in the graph, and the value at that index is the list of its adjacent (neighbor) nodes.

const graph = [
	['B', 'E'],      // index 0 -> A
	['A', 'C'],      // index 1 -> B
	['B', 'D', 'E'], // index 2 -> C
	['C'],           // index 3 -> D
	['A', 'C']       // index 4 -> E
]

Again, the order of the nodes does not really matter, as long as we organize them without duplicates and with correct adjacent vertices.

Moreover, the graph could also be represented as an object. In that case, keys would represent the nodes and values would be the list of neighbor nodes:

const graph = {
	'A': ['B', 'E'],
	'B': ['A', 'C'],
	'C': ['B', 'D', 'E'],
	'D': ['C'],
	'E': ['A', 'C']
}

This option is usually helpful when the vertices do not properly map to array indexes.

Adjacency Matrix

In this representation, we create an array of arrays in which each index indicates a node, and the value at that index is a row showing which nodes this particular node is connected to. A connection is denoted as 1, and a lack of connection is denoted as 0.

const graph = [
	[0, 1, 0, 0, 1], // A
	[1, 0, 1, 0, 0], // B
	[0, 1, 0, 1, 1], // C
	[0, 0, 1, 0, 0], // D
	[1, 0, 1, 0, 0]  // E
]

In this case, the order of the nodes in the list matters.
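
To get a feel for the trade-offs between these representations, here is a minimal sketch of checking whether two vertices are connected; the helper names (and the adjMatrix / adjListObj variables standing in for the graph constants above) are hypothetical:

// Adjacency matrix: an edge check is a single index lookup, O(1)
const hasEdgeMatrix = (matrix, i, j) => matrix[i][j] === 1;

// Adjacency list (object form): we have to scan one vertex's neighbor list
const hasEdgeList = (list, a, b) => list[a].includes(b);

// adjMatrix and adjListObj refer to the matrix and the object defined above
hasEdgeMatrix(adjMatrix, 0, 1);    // true  – A (index 0) and B (index 1) are connected
hasEdgeList(adjListObj, 'A', 'D'); // false – there is no A–D edge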

Graph Algorithms

BFS and DFS

There are two main algorithms that we absolutely need to know when it comes to graphs:

  • Breadth-First Search
  • Depth-First Search

Many graph-related problems can be solved with these two traversal methods.

Breadth-First Traversal
The BFS algorithm traverses a graph by visiting the neighbor nodes first instead of going straight down to the child nodes. It uses a queue data structure to keep track of the vertices to visit next.

A graph does not have to be a tree data structure for us to use the breadth-first search algorithm. In fact, a tree is just a special type of graph.

There are three main steps that this algorithm follows:

  1. Visit an adjacent unvisited node. Mark it as visited by pushing it into a queue.
  2. If no unvisited neighbor is found, dequeue the first node from the queue and use it as the new starting point for the search.
  3. Repeat the steps above until there is nothing left in the queue.
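
Here is a minimal BFS sketch in JavaScript, assuming the adjacency-list object from earlier; the function name and the use of a Set for visited vertices are illustrative choices:

// Breadth-first traversal over an adjacency-list object such as
// { 'A': ['B', 'E'], 'B': ['A', 'C'], ... }
function bfs(graph, start) {
	const visited = new Set([start]);
	const queue = [start];
	const order = [];

	while (queue.length) {
		const node = queue.shift(); // dequeue the front vertex
		order.push(node);
		for (const neighbor of graph[node]) {
			if (!visited.has(neighbor)) {
				visited.add(neighbor); // mark neighbors before enqueueing them
				queue.push(neighbor);
			}
		}
	}
	return order;
}

// bfs(graph, 'A') with the object above returns ['A', 'B', 'E', 'C', 'D']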

Depth-First Traversal
This algorithm visits the child vertices before traversing the sibling nodes; it tries to go as deep as possible before starting a new search on the graph. The significant difference from breadth-first search is that it uses a stack data structure instead of a queue.

DFS follows these steps to traverse through a graph:

  1. Visit an unvisited neighbor node and push it onto a stack. Keep doing this until no unvisited adjacent node is found.
  2. If no adjoining node is found, pop the top node from the stack and use it as the next starting point.
  3. Repeat the steps above until the stack is empty.
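
And a matching DFS sketch over the same adjacency-list object, this time with an explicit stack:

// Depth-first traversal using a stack instead of a queue
function dfs(graph, start) {
	const visited = new Set();
	const stack = [start];
	const order = [];

	while (stack.length) {
		const node = stack.pop(); // take the vertex on top of the stack
		if (visited.has(node)) continue;
		visited.add(node);
		order.push(node);
		for (const neighbor of graph[node]) {
			if (!visited.has(neighbor)) {
				stack.push(neighbor); // deeper vertices get explored first
			}
		}
	}
	return order;
}

// dfs(graph, 'A') with the object above returns ['A', 'E', 'C', 'D', 'B']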

Cheers!

Tree Data Structure Simplified – Part 1

Whether we know it or not, trees are everywhere: our computer file systems, HTML DOM, XML/markup parsers, the PDF format, computer chess games, compilers, and many other things heavily rely on the tree data structure. More practical examples are company hierarchies, family trees, or the comments section of any post. Trees can be tricky to implement in applications and to work with during coding interviews. How about we take a deep dive into the details of trees and learn the concepts in a more straightforward and fun way? In this article, we look at different types of trees and build a few of them from scratch to solidify our knowledge. We also use lots of visuals, which is the key to remembering. Let’s get started!

Outline

  1. What is a tree?
  2. Terminology
  3. Implementation of a general tree
  4. Traversing a general tree
  5. Binary trees
  6. Binary tree traversal

What is a tree?

A tree is a non-linear data structure composed of nodes. It organizes values hierarchically. A node is an entry in a tree, and every node can have zero or more child nodes.

A typical company structure is an excellent example of a hierarchy. There is always a root node at the top of the tree (It looks like a tree when it is flipped upside down) and nodes are connected by edges.

Another example of a hierarchy is the HTML DOM (Document Object Model).

Terminology

– Node

A node is an entry in a tree. It can contain any type of data. It may or may not have child nodes.

– Root node

The root node is the topmost node of the tree. It has no parent of its own and is the parent of its child nodes.

– Edges

Nodes are connected by edges.

– Siblings

Siblings are the child nodes of the same parent.

– Leaf

A leaf node is a node that does not have any child nodes in the tree.

– Depth (or Level)

The number of links or edges from the root node to a selected node is called the depth of the selected node.

– Height

Height of a tree is equal to the maximum depth, or the number of edges from the root to the furthest leaf.

General Tree Implementation

Many programming languages ship with tree-based structures built in. However, to understand trees better, we can build one from scratch. Let’s use JavaScript to implement it.

First, we create the Node class that contains the necessary methods to manage the tree.

class Node {
	constructor(data){
		this.data = data;
		this.children = [];
	}
	add(data){
		this.children.push(new Node(data));
	}
	remove(data){
		this.children = this.children.filter(node => node.data !== data);
	}
}

Then we make the Tree class that can use the Node class to create nodes.

class Tree {
	constructor(){
		this.root = null;
	}
}

We utilize both to create and manage a general tree.

const node = new Node(44);
const tree = new Tree();
tree.root = node;
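
As a small illustrative sketch (the beverage data below is made up), we could grow and prune a tree with the add and remove methods:

const drinks = new Node('Beverages');
drinks.add('Hot');
drinks.add('Cold');
drinks.children[0].add('Tea');    // 'Hot' gets two children
drinks.children[0].add('Coffee');

const menu = new Tree();
menu.root = drinks;

menu.root.remove('Cold');               // removes the 'Cold' subtree
console.log(menu.root.children.length); // 1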

Traversing A General Tree

There are two major algorithms that we can use to traverse a general tree – Breadth-First and Depth-First.

Breadth-First Traversal (BFS)

The Breadth-First method tries to stay as close to the root node as possible. Once it starts going through the nodes, it traverses all sibling nodes on one level before moving down, until it reaches the last level.

Depth-First Traversal (DFS)

In DFS, we explore each branch until the end, before moving on to the next branch. It believes in going all the way down to the leaf nodes.

In order to check if a value exists in a tree, we could use either of these algorithms. Let us add these methods to the Tree class we created earlier and see them in action.

class Tree {
	constructor(){
		this.root = null;
	}
	// Breadth-First Search
	searchBF(value){
		if(this.root === null) return false; // guard against an empty tree
		const queue = [this.root];
		while(queue.length){
			const node = queue.shift(); // dequeue the first node
			if(node.data === value){
				return true;
			} else {
				// enqueue the node's children at the back of the queue
				queue.push(...node.children);
			}
		}
		return false;
	}
}

Breadth-First method first creates a queue with the root element of the tree. It then utilizes a while loop as long as the queue is not empty. When it is empty, it stops. Inside the while loop, it removes the first element and assigns it to a variable. If the removed node contains the value we are looking for, it returns true confirming that the value exists in the tree. Otherwise, it just pushes the children of the removed node to the back of the queue and keeps doing this process until the queue is empty or the value is found.

class Tree {
	constructor(){
		this.root = null;
	}
	// Depth-First Search
	searchDF(value){
		if(this.root === null) return false; // guard against an empty tree
		const stack = [this.root];
		while(stack.length){
			const node = stack.shift(); // take the node at the front (the top of our stack)
			if(node.data === value){
				return true;
			} else {
				// place the node's children at the front so they are explored first
				stack.unshift(...node.children);
			}
		}
		return false;
	}
}

The Depth-First method is very similar to BFS, with one small difference: in non-recursive implementations, we use a stack instead of a queue to keep track of the nodes while exploring them. So instead of pushing the child nodes to the back of the array, we unshift them to the front.

Check out the documentation for shift and unshift if you are not familiar with them.

DFS and BFS are also commonly used on Graphs

Both of the methods perform the same task – traverse through a tree. It depends on the situation when to use which. Here are some points about the two approaches that can assist in determining the best option.

DFS and BFS Comparison

  • DFS is usually preferred in order to explore all the nodes of a graph
  • BFS is often better at finding the shortest path between two nodes
  • BFS is implemented using queues and DFS is implemented using stacks
  • BFS and DFS can also be built with recursion (see the sketch below)
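
For instance, a recursive version of the depth-first search above might look like the following sketch; the method name is illustrative and it assumes the same Node and Tree classes:

// Recursive depth-first search – an alternative to searchDF
searchDFRecursive(value, node = this.root){
	if(node === null) return false;
	if(node.data === value) return true;
	// check each child's subtree until the value is found
	return node.children.some(child => this.searchDFRecursive(value, child));
}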

Binary Trees

A binary tree is a very commonly used type of tree that has one distinctive feature – each node in a binary tree can have at most two children.

Also, there are different types of binary trees that we need to be aware of.

1. Balanced & Unbalanced Binary Trees

A balanced binary tree is a tree that has a “filled look” and can ensure O(log n) time for insertion and search. It does not have to have a perfectly equal number of nodes on each side; it just should not have really short branches or missing pieces.

2. Full Binary Trees

If every single node in a tree has either two or zero children, we can call it a full binary tree. There cannot be a node with only one child in a full binary tree.

3. Complete Binary Trees

A complete binary tree has all of its levels fully filled, except possibly the last one, which is filled from left to right.

4. Perfect Binary Trees

Perfect binary trees are the ones that are both complete and full. A perfect binary tree has exactly 2^k - 1 nodes, where k is the number of levels. For example, a perfect tree with 3 levels has 2^3 - 1 = 7 nodes.

Binary Tree Traversal

There are three main methods of traversing binary trees: Pre-Order, In-Order, and Post-Order traversal.

Pre-Order Traversal

Pre-order traversal visits the current node, then explores the left subtree, then right subtree. In this type of traversal, the root node is always the first node visited.

class Node {
	constructor(data){
		this.data = data;
		this.right = null;
		this.left = null;
	}
}
class Tree {
	constructor(){
		this.root = null;
	}
	
	preOrder(node){
		if(node !== null){
			console.log(node.data);
			this.preOrder(node.left);
			this.preOrder(node.right);
		}
	}
}

In-Order Traversal

With this method, we visit the left branch, then the current node and then the right branch nodes.

inOrder(node){
	if(node !== null){
		this.inOrder(node.left);
		console.log(node.data);
		this.inOrder(node.right);
	}
}

Post-Order Traversal

This method explores the node’s children first (the left subtree, then the right subtree) and visits the current node last.

postOrder(node){
	if(node !== null){
		this.postOrder(node.left);
		this.postOrder(node.right);
		console.log(node.data);
	}
}
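
To see the three traversal orders side by side, here is a small sketch that builds a five-node tree by hand; the values are arbitrary, and it assumes inOrder and postOrder have been added to the Tree class alongside preOrder:

const tree = new Tree();
tree.root = new Node(1);
tree.root.left = new Node(2);
tree.root.right = new Node(3);
tree.root.left.left = new Node(4);
tree.root.left.right = new Node(5);

tree.preOrder(tree.root);  // logs 1 2 4 5 3
tree.inOrder(tree.root);   // logs 4 2 5 1 3
tree.postOrder(tree.root); // logs 4 5 2 3 1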

Summary

A tree is a common non-linear data structure that is part of the applications and devices we use on a daily basis. It can be an extremely efficient way of organizing data when implemented correctly. Furthermore, trees come up often in coding interviews, and candidates can find them challenging to work with. In this article, we simplified the tree data structure by looking at it from a high level and then digging into the details to demystify the tricky concepts.


Tree Data Structure Simplified – Part 2

Outline

  1. Binary search trees
  2. BST Implementation
  3. Binary Heaps
  4. Trie

Binary Search Tree

A binary search tree is a binary tree with a unique feature – all nodes to the left of the current node must be smaller than the current node, and all nodes to the right must be larger. This rule must hold for every node in the tree, not just the root node.

In terms of performance, the Binary Search Tree (BST) is a real competitor for an array. If it takes O(n) to perform an insertion/deletion operation with a sorted array, the same thing can be done in O(log n) with a BST, if it is balanced.

It is important to note that a BST is only useful when it is balanced. An unbalanced BST can be pretty slow – O(n) – which defeats the purpose of the data structure. There are trees such as Red-Black Trees or AVL trees that rearrange the nodes during insertion to make sure the tree is always balanced.

BST Implementation

Let us implement the Binary Search Tree in JavaScript from scratch.

First of all, we need to define the node class for our tree. Each node needs to have three properties: data, a link to the left child, and another link to the right child. Left and right children are set to null during the Node class implementation.

// Node class
class Node {
	constructor(data){
		this.data = data;
		this.left = null;
		this.right = null;
	}
}

Now we create the BST class with all the essential functions needed to manage the data in the tree.

// BST class
class BST {
	constructor(){
		this.root = null;
	}

	// Methods to be implemented
	// insert()
	// remove()
}

Insert

The insert method of the class is fairly simple. First, we create the main method, and then the helper method – insertNode. If the root of the tree is null on the initial insertion, the root is initialized with the new node.

class BST {
	constructor(){
		this.root = null;
	}
	// Creates a new node and calls the insertNode method
	insert(data){
		const newNode = new Node(data);
		if(this.root === null){
			this.root = newNode;
		} else {
			// Finds the right spot to insert the new node
			this.insertNode(this.root, newNode);
		}
	}
	
	insertNode(node, newNode){
		if(newNode.data < node.data){
			// go left: recurse if a left child exists, otherwise attach here
			if(node.left){
				this.insertNode(node.left, newNode);
			} else {
				node.left = newNode;
			}
		} else {
			// go right: recurse if a right child exists, otherwise attach here
			if(node.right){
				this.insertNode(node.right, newNode);
			} else {
				node.right = newNode;
			}
		}
	}
}

Remove

The remove method is a little tricky because we have to consider how the tree is reorganized after removing a non-leaf node. Removing a leaf node is done by simply setting the parent’s link to it to null. But when the node to be deleted has one or two children, we have to take some additional actions.

To remove a node with one child, we link the node’s parent directly to the node’s only child.

In order to remove a node with two children, we have to do three things:

  1. Find the node with the minimum value in its right subtree
  2. Copy that minimum value into the node we want to delete
  3. Remove the minimum-value node from the right subtree

Here is how we implement it in code.

class BST {
	constructor(){
		this.root = null;
	}
	
	remove(data){
		// Re-initialize the root node
		this.root = this.removeNode(this.root, data);
	}
	removeNode(node, data){
		if(node === null){
			return null; // tree is empty
		} else if(node.data > data){ // move left
			node.left = this.removeNode(node.left, data);
			return node;
		} else if(node.data < data){ // move right
			node.right = this.removeNode(node.right, data);
			return node;
		} else {
			// Delete a leaf node
			if(!node.left && !node.right){
				return null;
			} 

			// Delete a node with 1 child
			if(!node.left){
				return node.right;
			} else if(!node.right){
				return node.left;
			}
			
			// Delete a node with 2 children
			const min = this.findMinimumValueNode(node.right);
			node.data = min.data;
			
			node.right = this.removeNode(node.right, min.data);
			return node;
		}
	}
	findMinimumValueNode(node){
		// if left node is null, then this node is the minimum
		// if not, we recursively find the minimum
		return !node.left ? node : this.findMinimumValueNode(node.left);
	}
	inOrderPrint(node){
		if(node !== null){
			this.inOrderPrint(node.left);
			console.log(node.data);
			this.inOrderPrint(node.right);
		}
	}
	
	getRoot(){
		return this.root;	
	}
}

Now we can use methods to create and manage a binary search tree.

const Tree = new BST();

Tree.insert(30);
Tree.insert(9);
Tree.insert(100);
Tree.insert(45);
Tree.insert(166);

Tree.inOrderPrint(Tree.getRoot());
/*
	30
   /  \
  9	  100
	  / \
	45  166 

Print: 9 30 45 100 166
*/


Tree.remove(100);
Tree.inOrderPrint(Tree.getRoot());
/*
	30
   /  \
  9	  166
	  /
	 45

Print: 9 30 45 166
*/

Binary Heaps

Binary heaps are just binary trees with unique features. There are two types of binary heaps – Min-Heaps and Max-Heaps.

Min-Heaps

Min-Heap is a binary tree which must have the following qualities:

  1. It should be a complete binary tree, meaning every level is filled except possibly the last one, which is filled in from left to right.
  2. Each node must be smaller than its children.
  3. Root must be the minimum element in the tree.

Max-Heaps

Max-heaps are almost the same as min-heaps, except that each node must be larger than its children, so the root holds the maximum element instead of the minimum.

There are two main methods used with min-heaps: insert and extract_minimum.

Insert

To insert a value into a min-heap, we place the new node at the rightmost bottom of the tree. That way we can always make sure the tree stays complete. Then we re-organize the tree by repeatedly swapping the new node with its parent while it is smaller, bubbling it up toward the top. The whole process takes O(log n) to execute.

Extracting the minimum value

Extracting the minimum value from a min-heap is simple because the root node always holds the minimum element. The only thing to consider is what happens when we remove the root node from the tree. In that case, we first remove the minimum element and then move the rightmost bottom node into the root position. If the min-heap is then out of order, we keep swapping that node with the smaller of its children until the heap property is restored.

Heaps vs. Arrays

Heaps can also be stored as arrays. Adding a new element to the tree is equivalent to pushing it to the back of the array. Traversing the tree when it is stored as an array can be a little tricky, though, because it is not obvious which elements are the left and right children of a node. Therefore, we need an indexing technique to figure that out. There are various formulas for this, but the following are used most often and are easy to remember.

Left child: A node at index i has its left child at index 2 * i + 1. That means the node at index 0 has its left child at index 2 * 0 + 1 => 1.

Right child: A node at index i has its right child at index 2 * i + 2, meaning the right child comes right after the left child.

Parent: A node’s parent node is located at index (i - 1) / 2, rounded down (Math.floor((i - 1) / 2) in JavaScript).
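
Putting those formulas to work, here is a rough sketch of an array-backed min-heap; this is an assumed implementation for illustration, not taken from a library:

class MinHeap {
	constructor(){
		this.items = [];
	}
	insert(value){
		this.items.push(value); // place the new value at the rightmost bottom
		let i = this.items.length - 1;
		// bubble the value up while it is smaller than its parent
		while(i > 0){
			const parent = Math.floor((i - 1) / 2);
			if(this.items[i] >= this.items[parent]) break;
			[this.items[i], this.items[parent]] = [this.items[parent], this.items[i]];
			i = parent;
		}
	}
	extractMin(){
		if(this.items.length === 0) return undefined;
		const min = this.items[0];
		const last = this.items.pop(); // rightmost bottom node
		if(this.items.length > 0){
			this.items[0] = last; // move it to the root
			let i = 0;
			// keep swapping with the smaller child until the heap is restored
			while(true){
				const left = 2 * i + 1;
				const right = 2 * i + 2;
				let smallest = i;
				if(left < this.items.length && this.items[left] < this.items[smallest]) smallest = left;
				if(right < this.items.length && this.items[right] < this.items[smallest]) smallest = right;
				if(smallest === i) break;
				[this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
				i = smallest;
			}
		}
		return min;
	}
}

const heap = new MinHeap();
heap.insert(5);
heap.insert(2);
heap.insert(8);
heap.extractMin(); // 2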

Tries

A trie, also called a prefix tree (and closely related to the radix tree), is another type of tree with a distinctive feature – it is designed to efficiently store and retrieve strings. In a trie, each node stores a character, and each path down the tree spells out a word.

Here is an example of a trie that stores “Simon”, “Simba” and “Lisa”:

A trie can save a lot of space when implemented correctly. As we can see from the trie above, the words Simon and Simba have the common prefix of “Sim”. So we store that prefix once and use it multiple times.

Tries are especially useful for predicting possible words given a prefix – for example, providing suggestions when we are searching for a state in a form. We type "New " into the form and it shows us the states in the dictionary that start with "New ": "New York," "New Mexico," "New Hampshire," etc.

Each node in a trie may have up to 26 children (English alphabet). And there must be a way of identifying the endings of words. Usually, the endings are indicated by adding a special character node to every word. Let’s say we add a node that contains a hashtag character “#” at the end of each valid word.

In the worst-case scenario, it can take up to O(k) time to perform an insertion or a lookup in a trie, k being the length of the word. Space complexity can be as bad as O(n*k), with n being the number of words. However, a trie is a great data structure for applications that rely heavily on prefixes. It is even better than a hash table in this regard, because a hash table cannot tell us whether a string is a prefix of any stored word.
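
Here is a rough trie sketch in JavaScript. Instead of a separate “#” node, it marks the end of a word with a boolean flag on the last node, which is a simplification assumed here:

class TrieNode {
	constructor(){
		this.children = {}; // character -> TrieNode
		this.isEndOfWord = false; // plays the role of the '#' marker
	}
}

class Trie {
	constructor(){
		this.root = new TrieNode();
	}
	insert(word){
		let node = this.root;
		for(const char of word){
			if(!node.children[char]){
				node.children[char] = new TrieNode();
			}
			node = node.children[char];
		}
		node.isEndOfWord = true;
	}
	startsWith(prefix){
		let node = this.root;
		for(const char of prefix){
			if(!node.children[char]) return false;
			node = node.children[char];
		}
		return true;
	}
}

const trie = new Trie();
['Simon', 'Simba', 'Lisa'].forEach(word => trie.insert(word));
trie.startsWith('Sim'); // true – 'Simon' and 'Simba' share this prefix
trie.startsWith('New'); // false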

Hash Tables Simplified

A hash table is a data structure designed for quick lookups and insertions. On average, its complexity is O(1), or constant time. The main component of a hash table is its hash function. The better the hash function, the more effective and faster the hash table. At a high level, a hash table works as follows: data is sent to a hash function, where it gets evaluated and assigned an index in an array. The hashing algorithm stores the data under the designated index. When a lookup is requested for the same data, the algorithm sends it to the same hash function and retrieves its index in constant time.

A hash table, which is also known as a hash map, dictionary, or just map, is a data structure that arranges data for quickly looking up values by a given key.

The average run time complexity of a hash table is O(1), or constant time.

Hash tables are very similar to arrays. Arrays store values in sequential order, and they have indices that can be used to look up values instantly. However, arrays have several restrictions. First of all, the indices are generated by the computer in sequential order. Second, if we want to add a value at the beginning or in the middle of the array, we need to move the rest of the values, which requires O(n) steps in the worst case. And in some cases, when we are using non-dynamic arrays, their sizes are fixed.

Hash tables are like giving superpowers to arrays. Instead of relying on sequential indices, we use a function that turns keys into indices. That function is known as the hash function: a special function that takes in a key and spits out an index. There are different ways of making hash functions, and we will look into them later. We then store the values under that specific index. Later, we can use the same function to get the index back and look up values in constant time, just like with arrays.

Example

Let’s look at an example case where using a hash table would be a good fit. Let’s say there is a list of students at a university with basic information.

| Name | Major   | Year |
|------|---------|------|
| Bob  | Math    | 2    |
| Sam  | Biology | 1    |
| Lisa | Art     | 3    |

Now we are asked to organize this data for quick lookups and insertions. Using a hash table, we can easily accomplish this task. First, we have to pick a key to identify each record. Typically, we would choose a value that is unique for each element, but for simplicity we just pick the student names. So we have an array with 5 slots, or buckets. Instead of assigning each record in order, we use a hash function to determine which slot each record goes into. We take Bob and give it to the hash function, hash('Bob'), and the function returns an index value of 0, for example. We store Bob’s details under index 0. Then we send Sam to the hash function, and the function tells us to store Sam’s details in slot 4. We do the same for Lisa and store her information in bucket 3.

Hash Function

Now let’s discuss what happens in our hash function. There are countless ways of making hash functions. It is usually preferable to use well-tested, publicly available ones. However, for the sake of understanding what happens under the hood, we can implement a simple one here.

Hash function logic: we take the ASCII value of each character and add them up.

| Name | ASCII values        | Total | Total % 5 |
|------|---------------------|-------|-----------|
| Bob  | 66 + 111 + 98       | 275   | 0         |
| Sam  | 83 + 97 + 109       | 289   | 4         |
| Lisa | 76 + 105 + 115 + 97 | 393   | 3         |

For example, for Bob the sum of ASCII values would be 275. Since we have only 5 slots in our array, we have to reduce that number to a smaller one. We can use the modulo operator to do that: 275 % 5 = 0.

That means we store Bob’s details in slot 0.
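
The toy hash function described above could be sketched like this; the function name and the default bucket count of 5 are illustrative:

// Sum the character codes of the key, then wrap around the number of buckets
function hash(key, bucketCount = 5){
	let total = 0;
	for(const char of key){
		total += char.charCodeAt(0); // character code (ASCII for these letters)
	}
	return total % bucketCount;
}

hash('Bob');  // 275 % 5 = 0
hash('Sam');  // 289 % 5 = 4
hash('Lisa'); // 393 % 5 = 3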

Collisions

Remember we said earlier that the run time complexity of a hash table is constant – O(1)? Well, it’s not entirely true. In the worst-case scenario, the complexity can be as bad as O(n). The main reason for that is collisions. A collision is a situation in which the hash function outputs the same index for two different keys.

For example, if we send the name Mia to our previous hashing function, it returns index 4. However, that slot is already occupied.

There are two common ways of handling this issue. The first one is called Linear Probing. With linear probing, we look at the given index, and if it is occupied, we go to the next available slot. In our example, there is no next slot, so we have to wrap around, meaning we start from the beginning looking for an empty spot. Slot 0 is also taken, so we move to slot 1. That place is empty, so we insert Mia at index 1. We keep repeating this process whenever other collisions occur.

The second method is called Chaining. Here we use a linked list to handle duplicate index issues. If you are not familiar with linked lists, it is worth reading up on them first. For this method to work, the hash table needs to be implemented so that each slot points to the head of a linked list. Then, when there is a collision, we store the value in the next node of that list. During lookups we need to check all nodes by iterating through the list. That is why the run time complexity can sometimes be O(n).
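
Here is a minimal sketch of chaining. To keep it short, each bucket holds an array acting as the chain rather than a hand-rolled linked list, and Mia’s record below is made up for the example:

class HashTable {
	constructor(size = 5){
		this.buckets = Array.from({ length: size }, () => []);
	}
	hash(key){
		let total = 0;
		for(const char of key) total += char.charCodeAt(0);
		return total % this.buckets.length;
	}
	set(key, value){
		const bucket = this.buckets[this.hash(key)];
		const entry = bucket.find(([k]) => k === key);
		if(entry){
			entry[1] = value; // update an existing key
		} else {
			bucket.push([key, value]); // colliding keys share the same bucket
		}
	}
	get(key){
		const bucket = this.buckets[this.hash(key)];
		const entry = bucket.find(([k]) => k === key);
		return entry ? entry[1] : undefined;
	}
}

const table = new HashTable();
table.set('Sam', { major: 'Biology', year: 1 });
table.set('Mia', { major: 'Physics', year: 2 }); // collides with 'Sam' – both hash to 4
table.get('Mia'); // { major: 'Physics', year: 2 }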

Hash Table in JavaScript

In JavaScript, a hash table is called an Object.

const hash_map = {
	'key':'value'
};

Values in objects can be of any type: string, array, integer, or even another object. For example, to create an object for the data we had earlier, we do the following:

const map = {
	Bob: {
		name: 'Bob',
		major:'Math',
		year: 2,
	},
	Sam: {
		name: 'Sam',
		major:'Biology',
		year: 1,
	}
};

Insertion

If we want to insert new key-value pairs into the object, we can do it in two ways: with bracket notation, which is similar to an array lookup, or with dot notation.

map['Lisa'] = {
	name: 'Lisa',
	major: 'Art',
	year: 3,
};

// OR

map.Lisa = {
	name: 'Lisa',
	major: 'Art',
	year: 3,
};

Removal

delete map.Bob;

// OR

delete map['Bob']

Update

map.Lisa.year = 4;
// Or
map['Lisa']['year'] = 4;

Look up

console.log(map.Sam.major);
// Output: 'Biology'

// OR

console.log(map['Sam']['major']);
// Output: 'Biology'

Cheers