Graph Data Structure Demystified

We use Google search, google maps, and social networks regularly nowadays. One of the things they all have in common is the fact they use a remarkable data structure – graphs under the hood to organize and manipulate data. You may have seen this data structure in college but don’t remember much about it. Or maybe it is a scary topic you avoid all the time. Either way, now is an excellent time to get familiar with it. In this blog, we will cover most of the concepts, and you should be comfortable to move on to work with algorithms related to graphs.

Outline

  1. Definition
  2. Terminology
  3. Representations
  4. Graph algorithms

Definition

A graph is a non-linear data structure that organizes data in an interconnected network. It is very similar to trees. Actually, a tree is a connected graph with no cycles. We will talk about the cycles in a little.

Ignore the red stroke around the Trees box. It was supposed to be around the Graphs box. 😅

Random graph

There are two primary components of any graph: Nodes and Edges.

Nodes are typically called Vertices (singular: vertex), and they can represent any data: integers, strings, people, locations, buildings, etc.

Edges are the lines that connect the nodes. They can represent roads, routes, cables, friendships, etc.

Graph Terminology

There is a lot of vocabulary to remember related to graphs. We will list the most common ones.

Undirected and Directed graphs

A graph can be directed or undirected. As you might have already guessed, directed graphs have edges that point to specific directions. Undirected graphs simply connect the nodes to each other, and there is no notion of direction or whatsoever.

Weighted and Unweighted graphs

Let’s say we are using a navigation application and trying to get the best route between point A and point B. Once we enter the details of the two points, the app does some calculations and shows us the fastest way to reach our goal. Typically, there are many ways to get from point A to point B. So to choose the best way, the app would need to differentiate the options by specific values. The obvious solution, in this case, is to calculate the distance each option entails and pick the one with the shortest distance. So assigning some value to the connection between two points is called adding weight to it. Weighted graphs have some values (distance, cost, time, etc.) attached to their edges.

Cyclic and Acyclic graphs

Earlier, we have mentioned that a tree is actually a graph without cycles. So what is a cycle in a graph? We say a graph is cyclic when it has a continuous sequence of vertices that connects back to itself. Vertices or edges cannot be repeated. Acyclic graphs do not have cycles. Trees happen to be acyclic and directed graphs with a restriction that a child node can have only one parent node.

Representing graphs in memory

One of the main things that make graphs less intuitive and confusing is probably the way they are stored in computer memory. With the nodes being all over the place and flexible amounts of edges connecting them together, it can be challenging to find an obvious way to implement them. However, there are some widely accepted representations we can consider. Let’s store the following undirected graph in three different ways.

Edge List

This representation stores a graph as a list of edges.

const graph = [['A', 'B'], ['A', 'E'], ['C', 'B'], ['C', 'E'], ['C', 'D']];

Edges are mentioned only once on the list. There is no need to state A and B, and also B and A. Additionally, the order of edges in the list does not matter.

Similar to the list of edges, we could also store the nodes as a list. But that is not what the Edge List representation is.

Adjacency List

This method relies on the indexes when storing the connections to a particular node. In JavaScript, we would create an array of arrays, where each index indicates a node in the graph, and value at each index represents the adjacent (neighbor) nodes.

const graph = [
	['B', 'E'],
	['A', 'C'],
	['B', 'D', 'E'],
	['C'],
	['A', 'C']
]

Again, the order of the nodes does not really matter, as long as we organize them without duplicates and with correct adjacent vertices.

Moreover, the graph could also be represented as an object. In that case, keys would represent the nodes and values would be the list of neighbor nodes:

const graph = {
	'A': ['B', 'E'],
	'B': ['A', 'C'],
	'C': ['B', 'D', 'E'],
	'D': ['C'],
	'E': ['A', 'C']
}

This option is usually helpful when the vertices do not properly map to array indexes.

Adjecency matrix

In this representation, we create an array of arrays in which each index indicates a node, and value at that node shows the list of nodes this particular node has connections with. A connection is denoted as 1, and a lack of connection is denoted as 0.

const graph = [
	[0, 1, 0, 0, 1],
	[1, 0, 1, 0, 1],
	[0, 1, 0, 1, 1],
	[0, 0, 1, 0, 0],
	[1, 0, 1, 0, 0]
]

In this case, the order of the nodes in the list matters.

Graph Algorithms

BFS and DFS

There are two main graph algorithms that we absolutely need to know when it comes to graphs:

  • Breadth-First Search
  • Depth First Search

Many graph-related problems can be solved with these two traversal methods.

Breadth-First Traversal
BFS algorithm traverses a graph by visiting the neighbor nodes instead of going down to the child nodes. And it uses a queue data structure to keep track of the visited vertices.

The structure given above looks like a tree, but it does not have to be a tree data structure for us to use the breadth-first search algorithm. Actually, a tree is a type of graph.

There are three main steps that these algorithms follows:

  1. Visit the adjacent unvisited node. Mark it as a visited node by pushing it in a queue.
  2. If no neighbor vertex is found, pop the first node from the queue and use it as the new starting point for a search.
  3. Repeat the steps above until there is nothing left in the queue

Depth-First Traversal
This algorithm visits the child vertices before traversing the sibling nodes. It tries to go as deep as possible before starting a new search on the graph. The significant difference of this algorithm from the previous breadth-first is the fact that it uses stack data structure instead of a queue.

DFS follows these steps to traverse through a graph:

  1. Visit the unvisited neighbor node. Push it in a stack. Keep doing it until there is no adjacent node is found.
  2. If no adjoining node is found, pop the first node from the stack and use it as the next starting point
  3. Repeat the steps above until the stack is clear

Cheers!

Tree Data Structure Simplified – Part 1

Whether we know it or not, trees are everywhere: our computer file systems, HTML DOM, XML/Markup parsers, PDF format, computer chess games, compilers, and others heavily rely on the tree data structure. More practical examples are company hierarchies, family trees, or comments section of any posts. Trees are found to be tricky when implementing in applications and during the coding interviews. How about we take a deep dive in the details of trees and learn the concepts in a more straightforward and fun way? In this article, we look at different types of trees and build a few of them from scratch to solidify our knowledge. Also, we use lots of visuals, which is the key to remembering. Let’s get started!

Outline

  1. What is a tree?
  2. Terminology
  3. Implementation of a general tree
  4. Traversing a general tree
  5. Binary trees
  6. Binary tree traversal

What is a tree?

A tree is a non-linear data structure composed of nodes. It organizes values hierarchically. A node is an entry in a tree, and every node can have zero or more child nodes.

A typical company structure is an excellent example of a hierarchy. There is always a root node at the top of the tree (It looks like a tree when it is flipped upside down) and nodes are connected by edges.

Another example of a hierarchy is the HTML DOM (Document Object Module).

Terminology

– Node

A node is an entry in a tree. It can contain any type of data. It may or may not have child nodes.

– Root node

Root node is also a parent node to its children.

– Edges

Nodes are connected by edges.

– Siblings

Siblings are the child nodes of the same parent.

– Leaf

A leaf node is a node that does not have any child nodes in the tree.

– Depth (or Level)

The number of links or edges from the root node to a selected node is called the depth of the selected node.

– Height

Height of a tree is equal to the maximum depth, or the number of edges from the root to the furthest leaf.

General Tree Implementation

Trees are usually built-in with most of the programming languages. However, to understand it better, we can build one from scratch. Let’s use JavaScript to implement it.

First, we create the Node class that contains the necessary methods to manage the tree.

class Node {
	constructor(data){
		this.data = data;
		this.children = [];
	}
	add(data){
		this.children.push(new Node(data));
	}
	remove(data){
		this.children = this.children.filter(node => node.data !== data);
	}
}

Then we make the Tree class that can use the Node class to create nodes.

class Tree {
	constructor(){
		this.root = null;
	}
}

We utilize both to create and manage a general tree.

const node = new Node(44);
const tree = new Tree();
tree.root = node;

Traversing A General Tree

There are two major algorithms that we can use to traverse a general tree – Breadth-First and Depth-First.

Breadth-First Traversal (BFS)

Breadth-First method tries to stay as close to the root node as possible. Once it starts going through the nodes, it traverses siblings nodes in a row until it reaches that last row.

Depth-First Traversal (DFS)

In DFS, we explore each branch until the end, before moving on to the next branch. It believes in going all the way down to the leaf nodes.

In order to check if a value exists in a tree, we could use either of these algorithms. Let us add these methods to the Tree class we have created earlier and see it in action.

class Tree {
	constructor(){
		this.root = null;
	}
	// Breadth-First Search
	searchBF(value){
		const queue = [this.root];
		while(queue.length){
			const node = queue.shift();
			if(node.data === value){
				return true;
			} else {
				queue.push(...node.children);
			}
		}
		return false;
	}
}

Breadth-First method first creates a queue with the root element of the tree. It then utilizes a while loop as long as the queue is not empty. When it is empty, it stops. Inside the while loop, it removes the first element and assigns it to a variable. If the removed node contains the value we are looking for, it returns true confirming that the value exists in the tree. Otherwise, it just pushes the children of the removed node to the back of the queue and keeps doing this process until the queue is empty or the value is found.

class Tree {
	constructor(){
		this.root = null;
	}
	// Depth-First Search
	searchDF(value){
		const stack = [this.root];
		while(stack.length){
			const node = stack.shift();
			if(node.data === value){
				return true;
			} else {
				stack.unshift(...node.children);
			}
		}
		return false;
	}
}

Depth-First method is very similar to the BFS. There is only one small difference. With non-recursive implementations, we use a stack to keep track of the nodes while exploring them instead of a queue. So instead of pushing the child nodes to the back of the array, we unshift them at the front.

Checkout the documentations for shift and unshift, if you are not familiar with them.

DFS and BFS are also commonly used on Graphs

Both of the methods perform the same task – traverse through a tree. It depends on the situation when to use which. Here are some points about the two approaches that can assist in determining the best option.

DFS and BFS Comparison

  • DFS is usually preferred in order to explore all the nodes of a graph
  • BFS is often better at finding the shortest path between two nodes
  • BFS is implemented using queues and DFS is implemented using stacks
  • BFS and DFS can also be built with recursion

Binary Trees

A binary tree is a very commonly used type of tree that has one distinctive feature – each node in a binary tree can have at most two children.

Also, there are different types of binary trees that we need to be aware of.

1. Balanced & Unbalanced Binary Trees

balanced binary tree is a tree that has a “filled look” and can ensure O(log n) times for insertion and search. It does not have to have a perfectly equal number of nodes on each side. It just should not have really short branches or missing pieces.

2. Full Binary Trees

If every single node in a tree has either two or zero children, we can call it a full binary tree. There cannot be a node with only one child in a full binary tree.

3. Complete Binary Trees

A complete binary tree needs to have almost all levels fully filled from left to right. Only the last level might be unfilled.

4. Perfect Binary Trees

Perfect binary trees are the ones that are complete and full. There must be exactly 2^k – 1 nodes in a perfect binary tree (where k is the depth of the tree).

Binary Tree Traversal

There are three main methods of traversing binary trees: Pre-OrderIn-Order and Post-Order traversal.

Pre-Order Traversal

Pre-order traversal visits the current node, then explores the left subtree, then right subtree. In this type of traversal, the root node is always the first node visited.

class Node {
	constructor(data){
		this.data = data;
		this.right = null;
		this.left = null;
	}
}
class Tree {
	constructor(){
		this.root = null;
	}
	
	preOrder(node){
		if(node !== null){
			console.log(node.data);
			this.preOrder(node.left);
			this.preOrder(node.right);
		}
	}
}

In-Order Traversal

With this method, we visit the left branch, then the current node and then the right branch nodes.

inOrder(node){
	if(node !== null){
		this.preOrder(node.left);
		console.log(node.data);
		this.preOrder(node.right);
	}
}

Post-Order Traversal

This method explores the node’s children first, left subtree, right subtree, and then the current node itself.

postOrder(node){
	if(node !== null){
		this.preOrder(node.left);
		this.preOrder(node.right);
		console.log(node.data);
	}
}

Summary

A tree is a common non-linear data structure that is a part of the applications and devices we use on a daily basis. It can be an extremely efficient way of organizing data when implemented correctly. Furthermore, trees are often discussed in coding interviews, and interviewees find it to be challenging to utilize. In this article, we simplify the tree data structure by looking at it from a high level and getting into details to demystify the tricky concepts.

NEXT:  Tree Data Structure Simplified – Part 2

Tree Data Structure Simplified – Part 2

Outline

  1. Binary search trees
  2. BST Implementation
  3. Binary Heaps
  4. Trie

Binary Search Tree

binary search tree is a binary tree with a unique feature – all nodes to the left of the current node must be smaller than the current node and all nodes to the right must be larger. This rule must be valid for all of the nodes in the tree, not just for the root node.

In terms of performance, Binary Search Tree (BST) is a real competitor for an array. If it takes O(n) to perform an insertion/deletion operation with a sorted array, the same thing can be done in O(log \space n) with BST, if it is balanced.

It is important to note that BST is only useful when it is balanced. An unbalanced BST can be pretty slow – O(n), which defies the purpose of the data structure. There are some trees such as Red-Black Trees or AVL trees that rearrange the nodes during insertion to make sure the tree is always balanced.

BST Implementation

Let us implement the Binary Search Tree in JavaScript from scratch.

First of all, we need to define the node class for our tree. Each node needs to have three properties: data, a link to the left child, and another link to the right child. Left and right children are set to null during the Node class implementation.

// Node class
class Node {
	constructor(data){
		this.data = data;
		this.left = null;
		this.right = null;
	}
}

Now we create the BST class with all the essentials functions needed to manage the data in the tree.

// BST class
class BST {
	constructor(){
		this.root = null;
	}

	// Methods to be implemented
	// insert()
	// remove()
}

Insert

Insert method of the class is fairly simple. First, we create the main method, and then the helper method – insertNode. If the root node of the tree is equal to null on initial insertion, the root node will be initialized with the sent in value.

class BST {
	constructor(){
		this.root = null;
	}
	// Creates a new node and calls the insertNode method
	insert(data){
		const newNode = new Node(data);
		if(this.root === null){
			this.root = newNode;
		} else {
			// Finds the right spot to insert the new node
			this.insertNode(this.root, newNode);
		}
	}
	
	insertNode(node, newNode){
		if(newNode.data < node.data && node.left){
			this.insertNode(node.left, newNode);
		} else if(newNode.data < node.data){
			node.left = newNode;
		} else if(newNode.data > node.data && node.right){
			this.insertNode(node.right, newNode);
		} else {
			node.right = newNode;	
		}	
	}
}

Remove

Remove method is little tricky because we have to consider the reorganization of the tree after the removal of a non-leaf node. Removing a leaf node is done by just assigning null to the parent link. But when the node to be deleted has one or two children, we have to take some additional actions.

To remove a node with one child, we set the pointer from the parent node to null.

In order to remove a node with two children, we have to do three things:

  1. Find the node with the minimum value from its right branch
  2. Set the node we want to delete equal to the node found in the first step
  3. Set the node with minimum value to null

Here is how we implement it in code.

class BST {
	constructor(){
		this.root = null;
	}
	
	remove(data){
		// Re-initialize the root node
		this.root = this.removeNode(this.root, data);
	}
	removeNode(node, data){
		if(node === null){
			return null; // tree is empty
		} else if(node.data > data){ // move left
			node.left = this.removeNode(node.left, data);
			return node;
		} else if(node.data < data){ // move right
			node.right = this.removeNode(node.right, data);
			return node;
		} else {
			// Delete a leaf node
			if(!node.left && !node.right){
				return null;
			} 

			// Delete a node with 1 child
			if(!node.left){
				return node.right;
			} else if(!node.right){
				return node.left;
			}
			
			// Delete a node with 2 children
			const min = this.findMinimumValueNode(node.right);
			node.data = min.data;
			
			node.right = this.removeNode(node.right, min.data);
			return node;
		}
	}
	findMinimumValueNode(node){
		// if left node is null, then this node is the minimum
		// if not, we recursively find the minimum
		return !node.left ? node : this.findMinimumValueNode(node.left);
	}
	inOrderPrint(node){
		if(node !== null){
			this.preOrder(node.left);
			console.log(node.data);
			this.preOrder(node.right);
		}
	}
	
	getRoot(){
		return this.root;	
	}
}

Now we can use methods to create and manage a binary search tree.

const Tree = new BST();

Tree.insert(30);
Tree.insert(9);
Tree.insert(100);
Tree.insert(45);
Tree.insert(166);

Tree.inOrderPrint(Tree.getRoot());
/*
	30
   /  \
  9	  100
	  / \
	45  166 

Print: 9 30 45 100 166
*/


Tree.remove(100);
Tree.inOrderPrint(Tree.getRoot());
/*
	30
   /  \
  9	  45
	    \
	    166 

Print: 9 30 45 166
*/

Binary Heaps

Binary heaps are just binary trees with unique features. There are two type of Binary Heaps – Min-Heaps and Max-Heaps.

Min-Heaps

Min-Heap is a binary tree which must have the following qualities:

  1. It should be a complete binary tree. Meaning the tree should be filled except for the last rightmost branch.
  2. Each node must be smaller than its children.
  3. Root must be the minimum element in the tree.

Max-Heaps

Max-Heaps are almost the same as min-heaps except for the elements are in descending order instead of ascending order like in min-heaps.

There are two main methods used with min-heaps: insert and extract_minimum.

Insert

To insert a value into a min-heap, we place the new node at the rightmost bottom of the tree. That way we can always make sure that the tree stays complete. Then we need to re-organize the tree to bring the node with the minimum value to the top. The whole process takes about O(log n) to execute.

Extracting the minimum value

Extracting the minimum value from the min-heap is simple because the root node always holds the minimum element. The only thing to consider is the time when we need to remove the root node from the tree. In that case, we first remove the minimum element and then set the root equal to the rightmost bottom node. If after the swap the min-heap is out of order, we keep swapping the root node with its children until it is restored.

Heaps vs. Arrays

Heaps can also be stored as arrays. Add a new element to a tree is equivalent of pushing to the back of the array. Traversing the tree when it is an array can be little tricky though. Because it is difficult to know which element is the right child or left child of a node. Therefore, we need to use some indexing technique to accomplish this task. There are various formulas for indexing, but the following is used more often and easy to remember.

Left child: A node at index i has its left child at index 2 * i + 1 That means, node at index 0 would have its left child at index 2 * 0 + 1 => 1

Right child: A node at index i has its left child at index 2 * i + 2 Meaning right child comes after the left child.

Parent: A node’s parent node is located at index (i – 1)/2.

Tries

Trie, also called as prefix tree or radix tree, is another type of tree with a distinctive feature – it is designed to efficiently store and retrieve strings. In a trie, we store characters in each node, and each branch down the tree resembles a word.

Here is an example of a trie that stores “Simon”, “Simba” and “Lisa”:

A trie can save a lot of space when implemented correctly. As we can see from the trie above, the words Simon and Simba have the common prefix of “Sim”. So we store that prefix once and use it multiple times.

Tries are especially useful for predicting the possible words given some prefixes. For example, providing suggestions when we are trying to search for a state in a form. We type in “New ” in the form and it shows us what states are available in the dictionary that starts with the word “New “: “New York,” “New Mexico,” “New Hampshire,” etc.

Each node in a trie may have up to 26 children (English alphabet). And there must be a way of identifying the endings of words. Usually, the endings are indicated by adding a special character node to every word. Let’s say we add a node that contains a hashtag character “#” at the end of each valid word.

In the worst case scenario, it can take up to O(k) runtime to perform an insertion or a lookup in a trie. k being the number of nodes. Space complexity can be as bad as O(n*k). However, a trie is a great data structure to implement with applications that heavily rely on prefixes. It is even better than hash tables in this regard because hash tables cannot tell us if a string is a prefix of any word or not.