Back to Notes

tags:

  • data-structure
  • python
  • algorithms

Big O Categories Review

| Big-O | Name | Rating | Description |
|---|---|---|---|
| O(1) | constant | Best | The algorithm always takes the same amount of time, regardless of how much data there is. Example: looking up an item in a list by index. |
| O(log n) | logarithmic | Great | Algorithms that remove a percentage of the total steps with each iteration. Very fast, even with large amounts of data. Example: binary search. |
| O(n) | linear | Good | 100 items, 100 units of work; 200 items, 200 units of work. This is usually the case for a single, non-nested loop. Example: searching an unsorted array. |
| O(n log n) | linearithmic | Okay | Slightly worse than linear, but not too bad. Example: mergesort and other "fast" sorting algorithms. |
| O(n^2) | quadratic | Slow | The amount of work is the square of the input size. 10 inputs, 100 units of work; 100 inputs, 10,000 units of work. Example: a nested for loop to find all the ordered pairs in a list. |
| O(n^3) | cubic | Slower | With 100 items, this does 100^3 = 1,000,000 units of work. Example: a triple-nested for loop to find all the ordered triples in a list. |
| O(2^n) | exponential | Horrible | We want to avoid this kind of algorithm at all costs. Adding one to the input doubles the number of steps. Example: brute-force guessing the results of a sequence of n coin flips. |
| O(n!) | factorial | Even More Horrible | The algorithm becomes so slow so fast that it is practically unusable. Example: generating all the permutations of a list. |
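
As a concrete companion to the O(log n) row, here is a minimal binary search sketch (the function name and sample list are illustrative):

```python
def binary_search(items, target):
    # Each iteration halves the search space, so the loop runs
    # at most about log2(n) + 1 times: O(log n).
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```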

P vs NP

NP (which stands for "nondeterministic polynomial time") is the set of problems whose solutions can be verified in polynomial time, but not necessarily solved in polynomial time.

P is in NP

Because all problems that can be solved in polynomial time can also be verified in polynomial time, all the problems in P are also in NP.


The Oracle

A good way of thinking about problems in NP is to imagine that we have a magic oracle that gives us potential solutions to problems. Here would be our process for finding if a problem is in NP:

  • Present the problem to the magic oracle
  • The magic oracle gives us a potential solution
  • We verify in polynomial time that the solution is correct

If we can do the verification in polynomial time, the problem is in NP, otherwise, it isn't.
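
As a concrete instance of this process, subset sum is in NP: finding a subset of numbers that adds up to a target may take exponential time, but verifying a candidate subset (the oracle's answer) is a quick scan. A rough sketch:

```python
from collections import Counter

def verify_subset_sum(candidate, nums, target):
    # Every element of the candidate must come from nums
    # (multiset containment via Counter subtraction)...
    if Counter(candidate) - Counter(nums):
        return False
    # ...and the candidate must add up to the target.
    # Both checks are linear in the input size: polynomial time.
    return sum(candidate) == target

print(verify_subset_sum([3, 7], [1, 3, 5, 7], 10))  # True
```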

Stack

  • A stack is a LIFO (last in, first out) data structure. It's like a list, but with restrictions on how items can be accessed.
  • All supported operations are O(1) by themselves. However, some tasks, like getting to an item at the bottom of the stack, have a higher time complexity because they require multiple pop operations.
  • Stack operations are limited: no searching, no sorting, no random access
  • Stacks, like all abstract data types, can store items of any type. What makes it a stack is the behavior of the operations, not the type of data it stores.
  • Stacks are often used in the real world for:
    • Function call management
    • Undo/redo functionality
    • Expression evaluation
    • Browser history

Supported Operations

| Operation | Big O | Description |
|---|---|---|
| push | O(1) | Add an item to the top of the stack |
| pop | O(1) | Remove and return the top item from the stack |
| peek | O(1) | Return the top item from the stack without modifying the stack |
| size | O(1) | Return the number of items in the stack |

Code

class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def size(self):
        return len(self.items)

    def peek(self):
        if len(self.items) == 0:
            return None
        return self.items[-1]

    def pop(self):
        if len(self.items) == 0:
            return None
        item = self.items[-1]
        del self.items[-1]
        return item
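
One classic use of the push/pop discipline is checking for balanced brackets (part of "expression evaluation" above). This sketch uses a plain Python list as the stack, since append is push and pop is pop; the function name is illustrative:

```python
def is_balanced(text):
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)          # push the opener
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False          # mismatched or missing opener
    return not stack                  # every opener was closed

print(is_balanced("({[]})"))  # True
print(is_balanced("(]"))      # False
```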

[[Stacks]]

Queue

  • A queue is a data structure that stores ordered items. It's like a list, but, as with a stack, its design is more restrictive: a queue only allows items to be added to the tail and removed from the head.
  • It is a FIFO (first in, first out) data structure.

Operations

| Operation | Big O | Description |
|---|---|---|
| push / enqueue | O(1) | Add an item to the back of the queue |
| pop / dequeue | O(1) | Remove and return the front item from the queue |
| peek | O(1) | Return the front item from the queue without modifying the queue |
| size | O(1) | Return the number of items in the queue |

Code

class Queue:
    def __init__(self):
        self.items = []

    def push(self, item):
        # O(n): inserting at index 0 shifts every existing element
        self.items.insert(0, item)

    def pop(self):
        if len(self.items) == 0:
            return None
        temp = self.items[-1]
        del self.items[-1]
        return temp

    def peek(self):
        if len(self.items) == 0:
            return None
        return self.items[-1]

    def size(self):
        return len(self.items)

    def search_and_remove(self, item):
        if item not in self.items:
            return None
        self.items.remove(item)
        return item

    def __repr__(self):
        return f"[{', '.join(str(item) for item in self.items)}]"
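
Note that this push is O(n), which is the problem the linked list section below solves. If you need a production-ready queue in Python, `collections.deque` provides O(1) appends and pops at both ends:

```python
from collections import deque

q = deque()
q.append("a")        # enqueue at the tail: O(1)
q.append("b")
q.append("c")
first = q.popleft()  # dequeue from the head: O(1)
print(first, list(q))  # a ['b', 'c']
```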

Linked List

  • A linked list is a linear data structure where elements are not stored next to each other in memory. The elements in a linked list are linked using references to each other.
  • The problem with our array (normal Python list) implementation of a queue was that it had O(n) push operations.
  • The linked list implementation of a queue has O(1) push operations.
  • Linked lists are usually a worse choice than standard arrays because:
    • They are less performant due to more memory overhead
    • They are more complex to work with, debug, and reason about
    • They have no random access (indexing to a specific element)
  • Linked lists are a better choice than arrays in very specific circumstances: they offer O(1) insertions and deletions at the head, and, with a tail pointer, O(1) insertions at the tail. (Deleting from the tail of a singly linked list is still O(n), since finding the previous node requires a traversal; a doubly linked list makes that O(1) too.)

Operations

| Operation | Big O | Description |
|---|---|---|
| push / insert | O(1) | Add an item to the front (or, with a tail pointer, the back) of the linked list |
| pop / delete | O(1) | Remove and return the item from the front of the linked list |
| peek | O(1) | Return the item at the head (or tail) without removing it |
| size | O(n) | Requires iterating through the whole linked list, unless a count is maintained separately |

Code (Queue using Linked List)

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

    def set_next(self, node):
        self.next = node

    def __repr__(self):
        return str(self.val)

class LLQueue:
    def __init__(self):
        self.tail = None
        self.head = None

    def add_to_tail(self, node):
        if self.tail is None:
            self.tail = node
            self.head = node
            return
        self.tail.next = node
        self.tail = node
        
    def remove_from_head(self):
        if self.head is None:
            return None
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def __repr__(self):
        nodes = []
        for node in self:
            nodes.append(node.val)
        return " <- ".join(nodes)

Trees

  • Trees are a widely used data structure that simulate a hierarchical... well... tree structure. That said, they're typically drawn upside down - the "root" node is at the top, and the "leaves" are at the bottom.
  • Trees are kind of like linked lists in the sense that the root node simply holds references to its child nodes, which in turn hold references to their children. The difference between a Linked List and a Tree is that a tree's nodes can have multiple children instead of just one.
  • A generic tree structure has the following rules:
    • Each node has a value and a list of "children"
    • Children can only have a single "parent"

Binary Trees

  • Trees aren't particularly useful data structures unless they're ordered in some way. One of the most common types of ordered tree is a Binary Search Tree or BST. In addition to the properties we've already talked about, a BST has some additional constraints:
    1. Instead of an unbounded list of children, each node has at most 2 children
    2. The left child's value must be less than its parent's value
    3. The right child's value must be greater than its parent's value
    4. No two nodes in the BST can have the same value
  • By ordering the tree this way, we'll be able to add, remove, find, and update nodes much more quickly.

Operations

| Operation | Big O | Description |
|---|---|---|
| insert | O(log(n)) | Inserting an element requires only one comparison per level of the tree, making it O(log(n)), at least in a balanced tree. |
| delete | O(log(n)) | Like most BST operations, delete doesn't have to search the entire tree, only one path from the root down to the node to delete. The depth of a balanced tree is on average log base 2 of the number of nodes. |

Preorder Traversal

  • It's called "preorder" because the current node is visited before its children. This tree: ![[Pasted image 20241130142211.png]]
  • This would be traversed in this order: 4,2,1,7,6
  • Sometimes it's useful (albeit a bit slow) to iterate over all the nodes in the tree.

Postorder Traversal

  • It's called "postorder" because the current node is visited after its children. The following tree: ![[Pasted image 20241130142211.png]]
  • This would be traversed in this order: 1,2,6,7,4

Inorder Traversal

  • An "inorder" traversal is the most intuitive way to visit all the nodes in a tree. It's called "inorder" because the current node is visited between its children. It results in an ordered list of the nodes in the tree. The following tree:

    ![[Pasted image 20241130142211.png]]

  • This would be traversed in this order: 1,2,4,6,7

Code

class BSTNode:
    def __init__(self, val=None):
        self.val = val
        self.left = None
        self.right = None
    
    '''
        1. If the current value equals the value to insert, return immediately: there should be no duplicates
        2. If the node has no value, set it to the given value and return
        3. If val is less than the current node's value:
            a. if the left child is empty, create a new left child with the given value and return
            b. otherwise, recursively call insert on the left child and return
        4. Otherwise val is greater than the current node's value:
            a. if the right child is empty, create a new right child with the given value and return
            b. otherwise, recursively call insert on the right child
    '''
    def insert(self, val):
        if self.val == val:
            return
        if self.val is None:
            self.val = val
            return
        if val < self.val:
            if self.left is None:
                self.left = BSTNode(val)
                return
            self.left.insert(val)
            return
        if self.right is None:
            self.right = BSTNode(val)
            return
        self.right.insert(val)

    def get_min(self):
        node = self
        while node.left is not None:
            node = node.left
        return node.val
    
    def get_max(self):
        node = self
        while node.right is not None:
            node = node.right
        return node.val

    '''
        1. Check if the current node's value (self.val) is None. If it is, return None. 
            This represents an empty tree or a leaf node where deletion has already occurred.
        2. If the value to delete (val) is less than the current node's value:
            Check if there's a left child (self.left). If it exists, recursively call the delete method on the left child with the value to delete and set the left child to the return value of the recursive call.
            Regardless of whether the left child exists or not, return the current node.
        3. Do the same (but mirrored) for the right child (self.right).
        4. If the value to delete is equal to the current node's value, we have found the node we want to delete:
            a. If there is no right child, return the left child. This effectively deletes the current node by bypassing it.
            b. If there is no left child, return the right child. This effectively deletes the current node by bypassing it.
            c. If there are both left and right children, find the minimum larger node (min_right_node) by traversing down the left side of the right subtree. This is the node with the smallest value that is still larger than self.val.
            Replace self.val with min_right_node.val.
            d. Delete min_right_node.val from the right subtree and set the right child to the return value of the recursive call.
            e. Return the current node.
    '''
    def delete(self, val):
        if self.val is None:
            return None
        
        if self.val > val:
            if self.left is not None:
                self.left = self.left.delete(val)
            return self
            
        
        if self.val < val:
            if self.right is not None:
                self.right = self.right.delete(val)
            return self

        if self.right is None:
            return self.left

        if self.left is None:
            return self.right
        
        min_right_node = self.right
        while min_right_node.left is not None:
            min_right_node = min_right_node.left
        self.val = min_right_node.val
        self.right = self.right.delete(min_right_node.val)
        return self

    '''
        1. Visit the value of the current node by appending its value to the visited array
        2. Recursively traverse the left subtree
        3. Recursively traverse the right subtree
        4. Return the array of visited nodes
        for the root node call will be tree.preorder([])
    '''
    def preorder(self, visited):
        visited.append(self.val)
        if self.left:
            self.left.preorder(visited)
        if self.right:
            self.right.preorder(visited)
        return visited

    '''
        1. Recursively traverse the left subtree 
        2. Recursively traverse the right subtree
        3. Visit the value of the current node by appending its value to the visited array 
        4. Return the array of visited nodes
    '''
    def postorder(self, visited):
        if self.left:
            self.left.postorder(visited)
        if self.right:
            self.right.postorder(visited)
        visited.append(self.val)
        return visited
    
    '''
        1. Recursively traverse the left subtree 
        2. Visit the value of the current node by appending its value to the visited array 
        3. Recursively traverse the right subtree
        4. Return the array of visited nodes
    '''
    def inorder(self, visited):
        if self.left:
            self.left.inorder(visited)
        visited.append(self.val)
        if self.right:
            self.right.inorder(visited)
        return visited
    
    '''
        Check whether val exists in the tree.
        Uses the BST ordering to descend only one side at each step,
        so only a single root-to-leaf path is searched.
        Returns True if found, otherwise False.
    '''
    def exists(self, val):
        if self.val == val:
            return True
        if val < self.val:
            if self.left:
                return self.left.exists(val)
            return False
        if self.right:
            return self.right.exists(val)
        return False
    
    '''
        Returns all elements with the specified range from the tree in sorted order
        This uses inorder traversal to traverse tree 
    '''
    def search_range(self, lower_bound, upper_bound):
        result = []
        if self.val is None:
            return result
        if self.left and self.val > lower_bound:
            result.extend(self.left.search_range(lower_bound, upper_bound))
        if lower_bound <= self.val <= upper_bound:
            result.append(self.val)
        if self.right and self.val < upper_bound:
            result.extend(self.right.search_range(lower_bound, upper_bound))
        return result
        
    
    '''
        Returns depth of the tree
        1. If the node's value is None, return 0.
        2. Recursively calculate the height of the left subtree.
        3. Recursively calculate the height of the right subtree.
        4. Return the maximum of the left and right subtree heights plus 1.
    '''
    def height(self):
        if self.val is None:
            return 0
        left_depth = 0
        if self.left:
            left_depth = self.left.height()
        right_depth = 0
        if self.right:
            right_depth = self.right.height()
        return max(left_depth, right_depth) + 1
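
To sanity-check the three traversal orders against the example tree above (4 at the root, 2 and 7 as its children, 1 and 6 as leaves), here is a compact standalone version of insert plus the traversals:

```python
# Minimal standalone node + traversals, mirroring the class above.
class Node:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root

def preorder(n):   # node, then children
    return [] if n is None else [n.val] + preorder(n.left) + preorder(n.right)

def inorder(n):    # node between children: sorted output
    return [] if n is None else inorder(n.left) + [n.val] + inorder(n.right)

def postorder(n):  # children, then node
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.val]

root = None
for v in [4, 2, 1, 7, 6]:
    root = insert(root, v)

print(preorder(root))   # [4, 2, 1, 7, 6]
print(inorder(root))    # [1, 2, 4, 6, 7]
print(postorder(root))  # [1, 2, 6, 7, 4]
```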

Red Black Tree

  • BSTs have a problem. While it's true that on average a BST has O(log(n)) lookups, deletions, and insertions, that benefit breaks down quickly if the data arrives mostly or completely sorted: the tree degenerates into a long chain, its depth approaches n, and operations degrade toward O(n).
  • A red-black tree is a kind of binary search tree that solves this "balancing" problem. It ensures that as nodes are inserted and deleted, the tree remains relatively balanced.
  • Each node in an RB tree stores an extra bit, called the "color": either red or black. The "color" ensures that the tree remains approximately balanced during insertions and deletions. When the tree is modified, the new tree is rearranged and repainted to restore the coloring properties that constrain how unbalanced the tree can become in the worst case.

Rules

  • In addition to all the rules of a Binary Search Tree, a red-black tree must follow some additional ones:
    1. Each node is either red or black
    2. The root is black. This rule is sometimes omitted. Since the root can always be changed from red to black, but not necessarily vice versa, this rule has little effect on analysis.
    3. All Nil leaf nodes are black.
    4. If a node is red, then both its children are black.
    5. All paths from a single node go through the same number of black nodes to reach any of its descendant NIL nodes.

Rotations

  • Rotations are what actually keep a red-black tree balanced.
  • A rotation is an O(1) operation.
  • A properly-ordered tree pre-rotation remains a properly-ordered tree post-rotation
  • When rotating left:
    • The "pivot" node's initial parent becomes its left child
    • The "pivot" node's old left child becomes its initial parent's new right child ![[Pasted image 20241130215023.png]]
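
The two steps above can be sketched on bare nodes, leaving out colors and nil sentinels (purely illustrative):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def rotate_left(old_root):
    pivot = old_root.right
    # The pivot's old left child becomes the old root's new right child.
    old_root.right = pivot.left
    # The pivot's initial parent (the old root) becomes its left child.
    pivot.left = old_root
    return pivot  # the pivot is the new root of this subtree

# 1 -> 2 -> 3 chained to the right: rotating at 1 rebalances it,
# and the in-order sequence (1, 2, 3) is preserved.
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)
print(root.val, root.left.val, root.right.val)  # 2 1 3
```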

Code

class RBNode:
    def __init__(self, val):
        self.red = False
        self.parent = None
        self.val = val
        self.left = None
        self.right = None
    

class RBTree:
    def __init__(self):
        self.nil = RBNode(None)
        self.nil.red = False
        self.nil.left = None
        self.nil.right = None
        self.root = self.nil
        
    def insert(self, val):
        new_node = RBNode(val)
        new_node.left = self.nil
        new_node.right = self.nil
        new_node.red = True

        parent = None
        current = self.root
        while current != self.nil:
            parent = current
            if val < current.val:
                current = current.left
            elif current.val < val:
                current = current.right
            else:
                return
        
        new_node.parent = parent
        if parent is None:
            self.root = new_node
        elif val < parent.val:
            parent.left = new_node
        else:
            parent.right = new_node
        self.fix_insert(new_node)
    
    def fix_insert(self, new_node):
        while new_node.parent is not None and new_node.parent.red:
            parent = new_node.parent
            grandparent = parent.parent
            if parent == grandparent.right:
                uncle = grandparent.left
                if uncle.red:
                    uncle.red = False
                    parent.red = False
                    grandparent.red = True
                    new_node = grandparent
                else:
                    if new_node == parent.left:
                        new_node = parent
                        self.rotate_right(new_node)
                    parent.red = False
                    grandparent.red = True
                    self.rotate_left(grandparent)
            else:
                uncle = grandparent.right
                if uncle.red:
                    uncle.red = False
                    parent.red = False
                    grandparent.red = True
                    new_node = grandparent
                else:
                    if new_node == parent.right:
                        new_node = parent
                        self.rotate_left(new_node)
                    parent.red = False
                    grandparent.red = True
                    self.rotate_right(grandparent)
        self.root.red = False

    def rotate_left(self, pivot_parent):
        if pivot_parent == self.nil or pivot_parent.right == self.nil:
            return 
        pivot = pivot_parent.right
        pivot_parent.right = pivot.left
        if pivot.left != self.nil:
            pivot.left.parent = pivot_parent
        pivot.parent = pivot_parent.parent
        if pivot_parent.parent is None:
            self.root = pivot
        elif pivot_parent.parent.val < pivot_parent.val:
            pivot_parent.parent.right = pivot
        else:
            pivot_parent.parent.left = pivot
        pivot.left = pivot_parent
        pivot_parent.parent = pivot

    def rotate_right(self, pivot_parent):
        if pivot_parent == self.nil or pivot_parent.left == self.nil:
            return
        pivot = pivot_parent.left
        pivot_parent.left = pivot.right
        if pivot.right != self.nil:
            pivot.right.parent = pivot_parent
        pivot.parent = pivot_parent.parent
        if pivot_parent.parent is None:
            self.root = pivot
        elif pivot_parent.val < pivot_parent.parent.val:
            pivot_parent.parent.left = pivot
        else:
            pivot_parent.parent.right = pivot
        pivot.right = pivot_parent
        pivot_parent.parent = pivot 

HashMap

  • A hash map (or "hash table") is a data structure that maps keys to values:
"bob" -> "ross"
"pablo" -> "picasso"
"leonardo" -> "davinci"
  • The lookup, insertion, and deletion operations of a hashmap have an average computational cost of O(1). Python's dictionary is an example of a hashmap.
  • In Python, that means dictionaries. In Go, it means maps. In JavaScript, it means object literals. The point is, if you need an in-memory key-value store, hashmaps are awesome, and every language tends to have a built-in implementation.

Under the hood

  • Hash maps are built on top of arrays. They use a hash function to convert a "hashable" key into an index in the array. From a high level, all that matters to us is that the hash function:
    1. Takes a key and returns an integer.
    2. Always returns the same integer for the same key.
    3. Always returns a valid index in the array (e.g. not negative, and not greater than the array size)
  • Ideally the hash function hashes each key to a unique index, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key.
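
A toy hash function with all three properties, and deliberately easy to collide (summing character codes means anagrams always collide):

```python
def key_to_index(key, size):
    # Sum the character codes, then wrap into the array bounds.
    return sum(ord(c) for c in key) % size

# Same key, same index, every time.
print(key_to_index("bob", 8) == key_to_index("bob", 8))          # True
# Always a valid index for an array of the given size.
print(0 <= key_to_index("bob", 8) < 8)                           # True
# Collision: "listen" and "silent" sum to the same value.
print(key_to_index("listen", 8) == key_to_index("silent", 8))    # True
```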

Properties of a good Hash map

  • Fast Lookups: Hash maps have an average time complexity of O(1) for lookups, insertions, and deletions.
  • Unordered: Hash maps (typically) do not guarantee any particular order of keys.
  • No Ranging: While hash maps are great for lookups, they don't provide the ability to look into a range of keys (e.g. the largest ten keys). That's one reason production databases like Postgres use binary trees for indexing.
  • Collision Resistant: Hash maps are built on top of arrays and use a hash function to convert a key into an index. Production-ready implementations (like Python dictionaries) handle hash collisions and make them a non-issue in practice.
  • Hashable Keys: Keys must be hashable. This means they must be immutable and have a consistent hash value. For example, in Python, a tuple can be a key, but a list cannot.
  • Efficient Resizing: When a hashmap's capacity is exceeded, it dynamically resizes (usually by doubling in size) and rehashes the elements. A good hash map manages this resizing efficiently, minimizing performance hits.
  • Uniform Distribution: A good hash function ensures keys are distributed evenly across the hashmap's underlying array, minimizing the number of collisions and optimizing lookup speed.

Handling collisions

  1. Linear Probing: When a collision happens, we find the next available slot by scanning forward sequentially (wrapping around) from the collision index, and place the data there.
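
A minimal sketch of the probing walk itself, separate from the full implementation below:

```python
def probe_insert(slots, index, entry):
    # Walk forward (wrapping around) until an empty slot appears.
    while slots[index] is not None:
        index = (index + 1) % len(slots)
    slots[index] = entry
    return index

slots = [None] * 5
print(probe_insert(slots, 2, "a"))  # 2: slot was free
print(probe_insert(slots, 2, "b"))  # 3: collision, probed one step
print(probe_insert(slots, 2, "c"))  # 4: probed two steps
```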

Code of basic Hash map

from functools import reduce

class HashMap:
    def __init__(self, size):
        self.hashmap = [None for i in range(size)]

    def key_to_index(self, key):
        total = 0
        for c in key:
            total += ord(c)
        return total % len(self.hashmap)

    '''
        Resize mechanism: once the map is at least 5% full, grow it 10x
        and rehash every entry (keeps collisions rare for linear probing)
    '''
    def current_load(self):
        filled = sum(1 for ele in self.hashmap if ele is not None)
        return filled / len(self.hashmap)

    def resize(self):
        if len(self.hashmap) == 0:
            self.hashmap = [None]
            return
        if self.current_load() * 100 < 5:
            return
        key_values = [ele for ele in self.hashmap if ele is not None]
        self.hashmap = [None for i in range(len(self.hashmap) * 10)]
        for key, val in key_values:
            self.insert(key, val)
    
    '''
        Linear probing to handle collisions; resizes first when needed
    '''
    def insert(self, key, value):
        self.resize()
        idx = self.key_to_index(key)
        # Probe forward until a free slot or the same key (update in place)
        while self.hashmap[idx] is not None and self.hashmap[idx][0] != key:
            idx = (idx + 1) % len(self.hashmap)
        self.hashmap[idx] = (key, value)

    def get(self, key):
        idx = self.key_to_index(key)
        start = idx
        while self.hashmap[idx] is not None:
            if self.hashmap[idx][0] == key:
                return self.hashmap[idx][1]
            idx = (idx + 1) % len(self.hashmap)
            if idx == start:
                break  # wrapped all the way around without finding the key
        raise KeyError("key not found")

    def __repr__(self):
        final = ""
        for i, v in enumerate(self.hashmap):
            if v is not None:
                final += f" - {str(v)}\n"
        return final

Tries

  • A Trie is a specialized tree-like data structure that organizes and stores words in an efficient manner. Each node in a Trie represents a single character of a word, allowing for efficient addition, retrieval, and deletion of words.
  • In simpler terms, a Trie is an information retrieval data structure. For prefix-based operations it outperforms more conventional data structures like hash maps and trees, making it particularly useful for tasks involving large sets of strings or words.
  • A trie is also often referred to as a "prefix tree" because it can be used to efficiently find all of the words that start with a given prefix.
  • A trie is easily implemented as a nested tree of dictionaries where each key is a character that maps to the next character in a word. For example, the words:
    • hello
    • help
    • hi
  • Would be represented as:
    {
    	"h": {
    		"e": {
    			"l": {
    				"l": {
    					"o": {
    						"*": True
    					}
    				},
    				"p": {
    					"*": True
    				}
    			}
    		},
    		"i": {
    			"*": True
    		}
    	}
    }
    

The * character is used to indicate the end of a word.
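
The nested-dictionary shape above can be built directly in a few lines (assuming "*" as the end-of-word marker):

```python
def add_word(trie, word, end_symbol="*"):
    node = trie
    for letter in word:
        # setdefault creates the child dict only if it's missing.
        node = node.setdefault(letter, {})
    node[end_symbol] = True  # mark the end of a complete word

trie = {}
for word in ["hello", "help", "hi"]:
    add_word(trie, word)

print(trie["h"]["i"])                   # {'*': True}
print("*" in trie["h"]["e"]["l"]["p"])  # True
```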

Prefix Matching

  • Tries tend to be most useful for prefix matching. For example, if you wanted to find all the words in a dictionary that start with a given prefix, a trie works great! Autocomplete, keyword search, and spellcheck are all good examples of where a trie can be useful.
  • Remember, a hash table is only good for exact matches, whereas a trie allows you to look up all of the words that match a given prefix.
class Trie:
    def search_level(self, cur, cur_prefix, words):
        if self.end_symbol in cur:
            words.append(cur_prefix)
        for letter in sorted(cur.keys()):
            if letter != self.end_symbol:
                self.search_level(cur[letter], cur_prefix + letter, words)
        return words

    def words_with_prefix(self, prefix):
        words = []
        current = self.root
        for letter in prefix:
            if letter not in current:
                return []
            current = current[letter]
        return self.search_level(current, prefix, words)

Find Matches

  • Tries are super efficient when it comes to finding substrings in a large document of text.
class Trie:
    def find_matches(self, document):
        matches = set()
        for i in range(len(document)):
            level = self.root
            for j in range(i, len(document)):
                ch = document[j]
                if ch not in level:
                    break
                level = level[ch]
                if self.end_symbol in level:
                    matches.add(document[i : j + 1])
        return matches

Longest Common Prefix

class Trie:
    def longest_common_prefix(self):
        curr = self.root
        prefix = ""
        while True:
            if self.end_symbol in curr:
                break
            dict_keys = list(curr.keys())
            if len(dict_keys) == 1:
                prefix += dict_keys[0]
                curr = curr[dict_keys[0]]
            else:
                break
        return prefix

Code

class Trie:
    def __init__(self):
        self.root = {}
        self.end_symbol = "*"
    
    def add(self, word):
        curr = self.root
        for letter in word:
            if letter not in curr:
                curr[letter] = {}
            curr = curr[letter]
        curr[self.end_symbol] = True

    def exists(self, word):
        curr = self.root
        for letter in word:
            if letter not in curr:
                return False
            curr = curr[letter]
        if self.end_symbol in curr:
            return True
        return False
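
Putting the trie operations together: a standalone, dictionary-based version of add and prefix search (names mirror the class fragments above, but this runs on its own):

```python
END = "*"  # end-of-word marker, as in the class above

def add(root, word):
    node = root
    for letter in word:
        node = node.setdefault(letter, {})
    node[END] = True

def words_with_prefix(root, prefix):
    # Walk down to the node for the prefix, then collect every
    # word below it (depth-first, in sorted letter order).
    node = root
    for letter in prefix:
        if letter not in node:
            return []
        node = node[letter]
    words = []
    def collect(node, path):
        if END in node:
            words.append(path)
        for letter in sorted(node):
            if letter != END:
                collect(node[letter], path + letter)
    collect(node, prefix)
    return words

root = {}
for w in ["hello", "help", "hi", "cat"]:
    add(root, w)
print(words_with_prefix(root, "he"))  # ['hello', 'help']
```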

Dynamic Programming

[[Dynamic Programming]]