Feineigle.com - Python Algorithms - Mastering Basic Algorithms in the Python Language

# Home · Book Reports · 2017 · Python Algorithms - Mastering Basic Algorithms in the Python Language

Published: March 22, 2017 (6 years ago.)
Tags:  Algorithms · Programming · Python

The book in...
One sentence:
Fairly detailed overview of many of the problems and algorithmic solutions expressed in python.

Five sentences:
It starts out with an overview of big O running times. Next the basics of graphs and trees are discussed, with more advanced discussion later. Many of the common mathematical equations and concepts are covered as well as more advance topics including proofs, NP hardness, and recurrence equations and their applications and running times. Searching and traversing are covered with basic concepts like depth/breadth first search as well as more advanced algorithms. The appendixes list algorithms, problems, and terminology.

designates my notes. / designates important.

# Thoughts

Not for beginners, but if you have a programming language (not python per se as python is pretty close to pseudo) under your belt and aren’t afraid of equations containing funny Greek squiggles, this can provide a lot of insight into problem solving in general.

Graphs are certainly not anything close to new, but with the advent of social media they have come into the “mainstream”. Even so, the graphs discussed here don’t have much in common with the typical social network. They are used here to solve problems that are (sometimes) not intuitively seen as graph problems.

It includes a LOT of code snippets that are well commented as well as list (in the appendixes) of common algorithms and problems. It seems like it might be a worthy reference if you are looking for an algorithmic solution, but aren’t sure which one you need. Heck, you might not even be sure what you problem is and how it might be able to be tweaked so as to take advantage of a time-tested solution.

## · Chapter 1: Introduction

###### page 7:
• Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009). Introduction to Algorithms, second edition. MIT Press.

## · Chapter 2: The Basics

###### page 13:
• The expression O(g), for some function g(n), represents a set of functions, and a function f (n) is in this set if it satisfies the following condition: there exists a natural number n0 and a positive constant c such that f (n) ≤ cg(n) for all n ≥ n0.

• In other words, if we’re allowed to tweak the constant c (for example, by running the algorithms on machines of different speeds), the function g will eventually (that is, at n0) grow bigger than f. See Figure 2-1 for an example.

• omega, the opposite of big oh: f(n) ≥ cg(n)

###### page 14:
• Note Our first two asymptotic operators, O and Ω, are each others’ inverses: if f is O(g ), then g is Ω(f ).

• The sets formed by Θ are simply intersections of the other two, that is, Θ(g) = O(g) ∩ Ω(g).

###### page 16:
• In a sum, only the dominating summand matters. For example, Θ(n2 + n3 + 42) = Θ(n3).

• In a product, constant factors don’t matter. For example, Θ(4.2n lg n) = Θ(n lg n).

###### page 20:
• If you have a tweak you think will improve your program, try it! Implement the tweak, and run some experiments. Is there an improvement? And if the tweak makes your code less readable and the improvement is small, is it really worth it?
###### page 21:
• Tip 2: For timing things, use timeit. (module)

• If you want to time a function (which could, for example, be a test function wrapping parts of your code), it may be even easier to use timeit from the shell command line, using the -m switch: \$ python -m timeit -s"import mymodule as m" “m.myfunction()”

``````import cProfile
cProfile.run('main()')
``````
• want to empirically examine the behavior of your algorithm on a given problem instance, the trace module in the standard library can be useful
###### page 23:
• In many cases, if you can formulate what you’re working on as a graph problem, you’re (at least) halfway to a solution. And if your problem instances are in some form expressible as trees, you stand a good chance of having a really efficient solution.

• Trees are just a special kind of graphs, so most algorithms and representations for graphs will work for them as well.

###### page 25:
• What this means to us is that accessing elements of a dict or set can be assumed to take constant (expected) time, which makes them highly useful building blocks for more complex structures and algorithms.
###### page 26:
• Listing 2-1. A Straightforward Adjacency Set Representation:
``````a, b, c, d, e, f, g, h = range(8)
N = [
{b, c, d, e, f}, # a
{c, e}, # b
{d}, # c
{e}, # d
{f}, # e
{c, g, h}, # f
{f, h}, # g
{f, g}  # h
]
``````
###### page 27:
• in our code, N[v] is now a set of v’s neighbors. Assuming you have defined N as earlier in an interactive interpreter, you can now play around with the graph:
``````>>> b in N[a] # Neighborhood membership
True
>>> len(N[f]) # Degree
3
``````
• Can use lists over sets, is slower but has less overhead if all you have is smaller data sets.
###### page 28:
• Yet another minor tweak on this idea is to use dicts instead of sets or lists. The neighbors would then be keys in this dict, and you’d be free to associate each neighbor (or out-edge) with some extra value, such as an edge weight. How this might look is shown in Listing 2-3 (with arbitrary edge weights added).

• Listing 2-3. Adjacency dicts with Edge Weights:

``````a, b, c, d, e, f, g, h = range(8)
N = [
{b:2, c:1, d:3, e:9, f:4}, # a
{c:4, e:3}, # b
{d:8}, # c
{e:7}, # d
{f:5}, # e
{c:2, g:2, h:2}, # f
{f:1, h:6}, # g
{f:9, g:8} # h
]
``````
• The adjacency dict version can be used just like the others, with the additional edge weight functionality:
``````>>> b in N[a] # Neighborhood membership
True
>>> len(N[f]) # Degree
3
>>> N[a][b]
# Edge weight for (a, b)
2
``````
###### page 29:
• Listing 2-4. A Dict with Adjacency Sets:
``````N = {
'a': set('bcdef'),
'b': set('ce'),
'c': set('d'),
'd': set('e'),
'e': set('f'),
'f': set('cgh'),
'g': set('fh'),
'h': set('fg')
}
``````
###### page 30:
• inf or float(inf) == infinity
###### page 34:
``````class Bunch(dict):
def __init__(self, *args, **kwds):
super(Bunch, self).__init__(*args, **kwds)
self.__dict__ = self
``````
• There are several useful aspects to this pattern. First, it lets you create and set arbitrary attributes by supplying them as command-line arguments:
``````>>> x = Bunch(name="Jayne Cobb", position="Public Relations")
>>> x.name
'Jayne Cobb'
Second, by subclassing dict, you get lots of functionality for free, such as iterating over the keys/attributes
or easily checking whether an attribute is present. Here’s an example:
>>> T = Bunch
>>> t = T(left=T(left="a", right="b"), right=T(left="c"))
>>> t.left
{'right': 'b', 'left': 'a'}
>>> t.left.right
'b'
>>> t['left']['right']
'b'
>>> "left" in t.right
True
>>> "right" in t.right
False
``````
###### page 37:
• if your program is fast enough, there’s no need to optimize it.

• If you are doing multiple lookups in a list, it may be faster to cast it as a set.

• (without the optimizations) you need to create a new string for every += operation,

``````>>> s = ""
>>> for chunk in input():
... s += chunk
``````
• A better solution would be the following:
``````>>> chunks = []
>>> for chunk in input():
... chunks.append(chunk)
...
>>> s = ''.join(chunks)
``````
• You could even simplify this further:
``````>>> s = ''.join(input())
``````
###### page 38:
``````>>> result = sum(lists, [])
``````
• Faster
``````>>> res = []
>>> for lst in lists:
... res.extend(lst)
``````
###### page 39:
• Float inaccuracy:
``````>>> sum(0.1 for i in range(10)) == 1.0
False
``````
• never compare floats for equality.

• check whether they are approximately equal. For example, you could take the approach of assertAlmostEqual from the unittest module:

``````>>> def almost_equal(x, y, places=7):
... return round(abs(x-y), places) == 0
...
>>> almost_equal(sum(0.1 for i in range(10)), 1.0)
True
``````
• There are also tools you can use if you need exact decimal floating-point numbers, for example the decimal module:
``````>>> from decimal import *
>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0")
True
``````
• sage calculates exact values, sagemath.org, but sage is much slower.

## · Chapter 3: Counting 101

###### page 47:
• a technical meaning of the word tournament in graph theory (a complete graph, where each edge is assigned a direction
###### page 48:
• round-robin events are = n(n –1)/2, where n is the number of participants.
###### page 49:
• In a single elimination/knockout tournament:

• In the first round, all the knights are paired, so we have n/2 matches. Only half of them go on to the second round, so there we have n/4 matches. We keep on halving until the last match, giving us the sum n/2 + n/4 + n/8 + … + 1, or, equivalently, 1 + 2 + 4 + … + n/2.

• The upper limit, h –1, is the number of rounds (or h the height of the binary tree, so 2^h = n).

###### page 50:
• summing up the powers of two, you always get one less than the next power of two. For example, 1 + 2 + 4 = 8 – 1, or 1 + 2 + 4 + 8 = 16 – 1, and so forth.
###### page 51:
• Binary searching, how to solve how many “cuts” is needed with log base 2
``````>>> from random import randrange
>>> n = 10**90
>>> p = randrange(10**90)

>>> p < n/2 #cut the search in half
True

>>> from math import log
>>> log(n, 2) # base-two logarithm
298.97352853986263 #how many guesses to always win
``````
###### page 54:
• Permutations are calculated with factorial.
• Combinations are calculated with binomial coefficients. C(n,k) or “n choose k”
###### page 56:
• Recursion and Recurrences
###### page 58:
• The general form of the recurrences you’ll normally encounter is is T(n) = a * T(g(n)) + f(n), where a represents the number of recursive calls, g(n) is the size of each subproblem to be solved recursively, and f(n) is any extra work done in the function, in addition to the recursive calls.

## · Chapter 4: Induction and Recursion… and Reduction

###### page 71:
• Reduction means transforming one problem to another. We normally reduce an unknown problem to one we know how to solve. The reduction may involve transforming both the input (so it works with the new problem) and the output (so it’s valid for the original problem).

• Induction (or, mathematical induction) is used to show that a statement is true for a large class of objects (often the natural numbers). We do this by first showing it to be true for a base case (such as the number 1) and then showing that it “carries over” from one object to the next (if it’s true for n –1, then it’s true for n).

• Recursion is what happens when a function calls itself. Here we need to make sure the function works correctly for a (nonrecursive) base case and that it combines results from the recursive calls into a valid solution.

###### page 78:
• There is usually less overhead with using a loop than with recursion (so it can be faster), and in most languages (Python included), there is a limit to how deep the recursion can go (the maximum stack depth).
###### page 79:
``````'''Listing 4-1. Recursive Insertion Sort'''
def ins_sort_rec(seq, i):
if i==0: return                        #Base case -- do nothing
ins_sort_rec(seq, i-1)                 #Sort 0..i-1
j = i                                  #Start "walking" down
while j > 0 and seq[j-1] > seq[j]:     #Look for OK spot
seq[j-1], seq[j] = seq[j], seq[j-1]    #Keep moving seq[j] down
j -= 1                                 #Decrement j
``````
``````'''Listing 4-2. Insertion Sort'''
def ins_sort(seq):
for i in range(1,len(seq)):             #0..i-1 sorted so far
j = i                                 #Start "walking" down
while j > 0 and seq[j-1] > seq[j]:    #Look for OK spot
seq[j-1], seq[j] = seq[j], seq[j-1] #Keep moving seq[j] down
j -= 1                              #Decrement j
``````
• 7 These algorithms aren’t all that useful, but they’re commonly taught, because they serve as excellent examples. Also, they’re classics, so any algorist should know how they work.
###### page 80:
• Listing 4-3. Recursive Selection Sort

• Listing 4-4. Selection Sort

• The recursive implementation explicitly represents the inductive hypothesis (as a recursive call), while the iterative version explicitly represents repeatedly performing the inductive step.

• These are two major variations of reductions: reducing to a different problem and reducing to a shrunken version of the same. If you think of the subproblems as vertices and the reductions as edges, you get the subproblem graph discussed in Chapter 2, a concept I’ll revisit several times. (It’s especially important in Chapter 8.)

###### page 82:
• in essence, a mapping like this is just a position (0…n –1) associated with each element (also 0…n –1), and we can implement this using a simple list. For example, the example in Figure 4-4 (if a = 0, b = 1, …) can be represented as follows:
``````>>> M = [2, 2, 0, 5, 3, 5, 7, 4]
>>>
0
``````
• Tip When possible, try to use a representation that is as specific to your problem as possible. More general representations can lead to more bookkeeping and complicated code; if you use a representation that implicitly embodies some of the constraints of the problem, both finding and implementing a solution can be much easier.
###### page 83:
``````'''Listing 4-5. A Naïve Implementation of the Recursive Algorithm Idea
for Finding a Maximum Permutation'''
def naive_max_perm(M, A=None):
if A is None:                   # The elt. set not supplied?
A = set(range(len(M)))        # A = {0, 1, ... , n-1}
if len(A) == 1: return A        # Base case -- single-elt. A
B = set(M[i] for i in A)        # The "pointed to" elements
C = A - B                       # "Not pointed to" elements
if C:                           # Any useless elements?
A.remove(C.pop())             # Remove one of them
return naive_max_perm(M, A)   # Solve remaining problem
return A                        # All useful -- return all

>>> naive_max_perm(M)
{0, 2, 5} # So, a, c, and f can take part in the permutation.
``````
###### page 84:
• Tip In recent versions of Python, the collections module contains the Counter class, which can count (hashable) objects for you.
###### page 85:
``````from collections import defaultdict

def counting_sort(A, key=lambda x: x):
B, C = [], defaultdict(list)       # Output and "counts"
for x in A:
C[key(x)].append(x)              # "Count" key(x)
for k in range(min(C), max(C)+1):  # For every key in the range
B.extend(C[k])                   # Add values in sorted order
return B
``````
• By default, I’m just sorting objects based on their values. By supplying a key function, you can sort by anything you’d like.

• key() in C[].append() is interesting.

###### page 87:
• the tasks to be undertaken will have dependencies that partially restrict their ordering.

• Such dependencies are (as mentioned in Chapter 2) easily represented as a directed acyclic graph (DAG), and finding an ordering that respect the dependencies (so that all the edges point forward in the ordering) is called topological sorting. Figure 4-5 illustrates the concept.

###### page 88:
• Tip If you’re using a Unix system of some sort, you can play around with topological sorting of graphs described in plain-text files, using the tsort command.
###### page 89:
• Tip If a problem reminds you of a problem or an algorithm you already know, that’s probably a good sign. In fact, building a mental archive of problems and algorithms is one of the things that can make you a skilled algorist. If you’re faced with a problem and you have no immediate associations, you could systematically consider any relevant (or semirelevant) techniques you know and look for reduction potential.
###### page 91:
• A balance factor is defined for each internal node and is set to the difference between the heights of the left and right subtrees, where height is the greatest distance from the node (downward) to a leaf. For example, the left child of the root in Figure 4-6 has a balance factor of –2 because its left subtree is a leaf (with a height of 0), while its right child has a height of 2.
###### page 93:
• relaxation. You are in an airport, and you can reach several other airports by plane. From each of those airports, you can take the train to several towns and cities. Let’s say that you have a dict or list of flight times, A, so that A[u] is the time it will take you to get to airport u. Similarly, B[u][v] will give you the time it takes to get from airport u to town v by train. (B can be a list of lists or a dict of dicts, for example; see Chapter 2). Consider the following randomized way of estimating the time it will take you to get to each town, C[v]:
``````for v in range(n):
C[v] = float('inf')
for i in range(N):
u, v = randrange(n), randrange(n)
C[v] = min(C[v], A[u] + B[u][v]) # Relax
``````
• The idea here is to repeatedly see whether we can improve our estimate for C[v] by choosing another route. First go to u by plane, and then you take the train to v. If that gives us a better total time, we update C. As long as N is really large, we will eventually get the right answer for every town.
###### page 95:
• If you can (easily) reduce A to B, then B is at least as hard as A.

• If you want to show that X is hard and you know that Y is hard, reduce Y to X.

###### page 97:
• 4-1. A graph that you can draw in the plane without any edges crossing each other is called planar. Such a drawing will have a number of regions, areas bounded by the edges of the graph, as well as the (infinitely large) area around the graph. If the graph has V, E, and F nodes, edges, and regions, respectively, Euler’s formula for connected planar graphs says that V – E + F =
1. Prove that this is correct using induction.
• 4-4. A node is called central if the greatest (unweighted) distance from that node to any other in the same graph is minimum. That is, if you sort the nodes by their greatest distance to any other node, the central nodes will be at the beginning.

## · Chapter 5: Traversal: The Skeleton Key of Algorithmics

###### page 104:
• Listing 5-1. Walking Through a Connected Component of a Graph Represented Using Adjacency Sets
``````def walk(G, s, S=set()):            # Walk the graph from node s
P, Q = dict(), set()              # Predecessors + "to do" queue
P[s] = None                       # s has no predecessor
Q.add(s)                          # We plan on starting with s
while Q:                          # Still nodes to visit
u = Q.pop()                     # Pick one, arbitrarily
for v in G[u].difference(P, S): # New nodes?
Q.add(v)                      # We plan to visit them!
P[v] = u                      # Remember where we came from
return P                          # The traversal tree
``````
• Tip Objects of the set type let you perform set operations on other types as well! For example, in Listing 5-1, I use the dict P as if it were a set (of its keys) in the difference method. This works with other iterables too, such as list or deque, for example, and with other set methods, such as update.

• The S parameter Basically, it represents a “forbidden zone”—a set of nodes that we may not have visited during our traversal but that we have been told to avoid. As for the dictionary P, I’m using it to represent predecessors. Each time we add a new node to the queue, I set its predecessor; that is, I make sure I remember where I came from when I found it. These predecessors will, when taken together, form the traversal tree. If you don’t care about the tree, you’re certainly free to use a set of visited nodes instead .

• Note Whether you add nodes to this sort of “visited” set at the same time as adding them to the queue or later, when you pop them from the queue, is generally not important. It does have consequences for where you need to add an “if visited …” check, though.

###### page 105:
• Listing 5-2. Finding Connected Components
``````def components(G):          # The connected components
comp = []
seen = set()              # Nodes we've already seen
for u in G:               # Try every starting point
if u in seen: continue  # Seen? Ignore it
C = walk(G, u)          # Traverse component
seen.update(C)          # Add keys of C to seen
comp.append(C)          # Collect the components
return comp
``````
• The walk function returns a predecessor map (traversal tree) for the nodes it has visited, and I collect those in the comp list (of connected components). I use the seen set to make sure I don’t traverse from a node in one of the earlier components. Note that even though the operation seen.update(C) is linear in the size of C, the call to walk has already done the same amount of work, so asymptotically, it doesn’t cost us anything. All in all, finding the components like this is Θ(E + V), because every edge and node has to be explored.4 The walk function doesn’t really do all that much. Still, in many ways, this simple piece of code is the backbone of this chapter and (as the chapter title says) a skeleton key to understanding many of the other algorithms you’re going to learn. It might be worth studying it a bit.
###### page 110:
• Trémaux’s algorithm. In this formulation, it is commonly known as depth-first search, and it is one of the most fundamental (and fundamentally important) traversal algorithms.

• Listing 5-4. Recursive Depth-First Search

``````def rec_dfs(G, s, S=None):
if S is None: S = set()  # Initialize the history
for u in G[s]:           # Explore neighbors
if u in S: continue    # Already visited: Skip
rec_dfs(G, u, S)       # New: Explore recursively
``````
• [In depth first search] Once we start working with one node, we make sure we traverse all other nodes we can reach from it before moving on.
###### page 111:
``````def iter_dfs(G, s):
S, Q = set(), []      # Visited-set and queue
Q.append(s)           # We plan on visiting s
while Q:              # Planned nodes left?
u = Q.pop()         # Get one
if u in S: continue # Already visited? Skip it
S.add(u)            # We've visited it now
Q.extend(G[u])      # Schedule all neighbors
yield u             # Report u as visited
``````
• use of a stack (a last-in, first-out or LIFO queue, here implemented by a list, using append and pop), the queue is not a set, so we risk having the same node scheduled for more than one visit. solved by checking a node for membership in S before adding its neighbors. added a yield statement, which will let you iterate over the graph nodes in DFS order. For example, if you had the graph from Figure 2-3 in the variable G, you could try the following:
``````>>> list(iter_dfs(G, 0))
[0, 5, 7, 6, 2, 3, 4, 1]
``````
• both DFS and the other traversal algorithms work just as well for directed graphs. However, if you use DFS on a directed graph, you can’t expect it to explore an entire connected component. For example, for the graph in Figure 2-3, traversing from any other start node than “a” would mean that “a” would never be seen, because it has no in-edges.
###### page 112:
• Listing 5-6. A General Graph Traversal Function
``````def traverse(G, s, qtype=set):
S, Q = set(), qtype()
while Q:
u = Q.pop()
if u in S: continue
for v in G[u]:
yield u
``````
``````>>> list(traverse(G, 0, stack))
[0, 5, 7, 6, 2, 3, 4, 1]
``````
• a traversal without cycles naturally forms a tree.

• Listing 5-6. Depth-First Search with Timestamps

``````def dfs(G, s, d, f, S=None, t=0):
if S is None: S = set()     # Initialize the history
d[s] = t; t += 1            # Set discover time
for u in G[s]:              # Explore neighbors
if u in S: continue       # Already visited. Skip
t = dfs(G, u, d, f, S, t) # Recurse; update timestamp
f[s] = t; t += 1            # Set finish time
return t                    # Return timestamp
``````
###### page 113:
• If we perform DFS on a DAG, we could simply sort the nodes based on their descending finish times, and they’d be topologically sorted
###### page 114:
• If we’re looking for the shortest paths (disregarding edge weights, for now) from our start node to all the others, DFS will, most likely, give us the wrong answer.
###### page 116:
• Listing 5-8. Iterative Deepening Depth-First Search
``````def iddfs(G, s):
yielded = set()                     # Visited for the first time
def recurse(G, s, d, S=None):       # Depth-limited DFS
if s not in yielded:
yield s
if d == 0: return                 # Max depth zero: Backtrack
if S is None: S = set()
for u in G[s]:
if u in S: continue
for v in recurse(G, u, d-1, S): # Recurse with depth-1
yield v
n = len(G)
for d in range(n):                  # Try all depths 0..V-1
if len(yielded) == n: break       # All nodes seen?
for u in recurse(G, s, d):
yield u
``````
• the only salient difference [in BFS] from DFS: we’ve replaced LIFO with FIFO.
###### page 117:
``````def bfs(G, s):
P, Q = {s: None}, deque([s]) # Parents and FIFO queue
while Q:
u = Q.popleft()            # Constant-time for deque
for v in G[u]:
if v in P: continue      # Already has parent
P[v] = u                 # Reached from u: u is parent
Q.append(v)
return P
``````
• 5-9 is similar to 5-5. Deque over list.

• To extract a path to a node u, you can simply “walk backward” in P:

``````>>> path = [u]
>>> while P[u] is not None:
...     path.append(P[u])
...     u = P[u]
...
>>> path.reverse()
``````
• Tip One way of visualizing BFS and DFS is as browsing the Web. DFS is what you get if you keep following links and then use the Back button once you’re done with a page. The backtracking is a bit like an “undo.” BFS is more like opening every link in a new window (or tab) behind those you already have and then closing the windows as you finish with each page.
###### page 118:
• Python has a deque class in the collections module in the standard library. In addition to methods such as append, extend, and pop, which are performed on the right side, it has left equivalents, called appendleft , extendleft, and popleft. Internally, the deque is implemented as a doubly linked list of blocks, each of which is an array of individual elements.

• A connected component is a maximal subgraph where all nodes can reach each other if you ignore edge directions (or if the graph is undirected). To get strongly connected components, though, you need to follow the edge directions; so, SCCs are the maximal subgraphs where there is a directed path from any node to any other.

###### page 120:
• Listing 5-10. Kosaraju’s Algorithm for Finding Strongly Connected Components
``````def tr(G):                  # Transpose (rev. edges of) G
GT = {}
for u in G: GT[u] = set() # Get all the nodes in there
for u in G:
for v in G[u]:
return GT

def scc(G):
GT = tr(G)                # Get the transposed graph
sccs, seen = [], set()
for u in dfs_topsort(G):  # DFS starting points
if u in seen: continue  # Ignore covered nodes
C = walk(GT, u, seen) # Don't go "backward" (seen)
seen.update(C)        # We've now seen C
sccs.append(C)        # Another SCC found
return sccs
``````
• If you try running scc on the graph in Figure 5-7, you should get the three sets {a, b, c, d}; {e, f, g}; and {i, h}.

## · Chapter 6: Divide, Combine, and Conquer

###### page 126:
• Using weak induction generates a recurrance like T(n) = T(n-1) + T(1) + n. Each reduces the problem size by 1 and ends up with quadratic running time. T(n) = 2T(n/2) + n is the equation for dividing the subproblems more evenly (not 1 each step, half). This runs in (n log n) time. See fig 6.1, 6.2.
###### page 127:
• Weak induction is 1 step at a time reduction and strong induction is divide and conquer?
###### page 128:
• Listing 6-1. A General Implementation of the Divide and Conquer Scheme
``````def divide_and_conquer(S, divide, combine):
if len(S) == 1: return S
L, R = divide(S)
A = divide_and_conquer(L, divide, combine)
B = divide_and_conquer(R, divide, combine)
return combine(A, B)
``````
###### page 129:
• bisect module in standard library (written in C).
``````>>> from bisect import bisect
>>> a = [0, 2, 3, 5, 6, 8, 8, 9]
>>> bisect(a, 5)
4
``````
• it doesn’t return the position of the 5 that’s already there. Rather, it reports the position to insert the new 5, making sure it’s placed after all existing items with the same value. In fact, bisect is another name for bisect_right, and there’s also a bisect_left
###### page 133:
• Listing 6-2. Insertion into and Search in a Binary Search Tree
``````class Node:
lft = None
rgt = None
def __init__(self, key, val):
self.key = key
self.val = val

def insert(node, key, val):
if node is None: return Node(key, val)  # Empty leaf: it's not here
if node.key == key: node.val = val      # Found key: return val
elif key < node.key:                    # Less than the key?
node.lft = insert(node.lft, key, val) # Go left
else:                                   # Otherwise...
node.rgt = insert(node.rgt, key, val) # Go right
return node

def search(node, key):
if node is None: raise KeyError         # Empty leaf: add node here
if node.key == key: return node.val     # Found key: replace val
elif key < node.key:                    # Less than the key?
return search(node.lft, key)          # Go left
else:                                   # Otherwise...
return search(node.rgt, key)          # Go right

class Tree:                                 # Simple wrapper
root = None
def __setitem__(self, key, val):
self.root = insert(self.root, key, val)

def __getitem__(self, key):
return search(self.root, key)

def __contains__(self, key):
try: search(self.root, key)
except KeyError: return False
return True
``````
• Using it:
``````>>> tree = Tree()
>>> tree["a"] = 42
>>> tree["a"]
42
>>> "b" in tree
False
``````
• Note The implementation in Listing 6-2 does not permit the tree to contain duplicate keys. If you insert a new value with an existing key, the old value is simply overwritten. This could easily be changed, because the tree structure itself does not preclude duplicates.

• Bisection is fast, with little overhead, but works only on sorted arrays (such as Python lists). And sorted arrays are hard to maintain; adding elements takes linear time.

• Search trees have more overhead but are dynamic and let you insert and remove elements.

• In many cases, though, the clear winner will be the hash table, in the form of dict. Its average asymptotic running time is constant (as opposed to the logarithmic running time of bisection and search trees), and it is very close to that in practice, with little overhead.

• search trees will let you access your values in sorted order—either all of them or just a portion.

• Trees can also be extended to work in multiple dimensions, where hashing may be hard to achieve.

• if you want the entry that is closest to your lookup key, a search tree would be the way to go.

• Actually, more flexible may not be entirely correct. There are many objects (such as complex numbers) that can be hashed but that cannot be compared for size.

###### page 134:
• Tip If you’re looking for the k smallest (or largest) objects in an iterable in Python, you would probably use the nsmallest (or nlargest) function from the heapq module if your k is small, relative to the total number of objects. If the k is large, you should probably sort the sequence (either using the sort method or using the sorted function) and pick out the k first objects.

• Listing 6-3. A Straightforward Implementation of Partition and Select

``````def partition(seq):
pi, seq = seq[0], seq[1:]        # Pick and remove the pivot
lo = [x for x in seq if x <= pi] # All the small elements
hi = [x for x in seq if x > pi]  # All the large ones
return lo, pi, hi                # pi is "in the right place"

def select(seq, k):
lo, pi, hi = partition(seq)      # [<= pi], pi, [> pi]
m = len(lo)
if m == k: return pi             # We found the kth smalles
elif m < k:                      # Too far to the left
return select(hi, k-m-1)       # Remember to adjust k
else:                            # Too far to the right
return select(lo, k)           # Just use original k here
``````
###### page 137:
• Timsort is a close relative to merge sort. It’s an in-place algorithm, in that it merges segments and leaves the result in the original array (although it uses some auxiliary memory during the merging). Instead of simply sorting the array half-and-half and then merging those, though, it starts at the beginning, looking for segments that are already sorted (possibly in reverse), called runs. In random arrays, there won’t be many, but in many kinds of real data, there may be a lot—giving the algorithm a clear edge over a plain merge sort and a linear running time in the best case (and that covers a lot of cases beyond simply getting a sequence that’s already sorted).
###### page 138:
• lg(n!) is asymptotically equivalent to n*lg(n) I think lg here is natural log
###### page 140:
• A region is convex if you can draw a line between any two points inside it, and the line stays inside the region.
###### page 144:
• Note The 2-3-tree is a special case of the B-tree, which forms the basis of almost all database systems, and disk-based trees used in such diverse areas as geographic information systems and image retrieval. The important extension is that B-trees can have thousands of keys (and subtrees), and each node is usually stored as a contiguous block on disk. The main motivation for the large blocks is to minimize the number of disk accesses.
###### page 147:
• BLACK BOX: BINARY HEAPS, HEAPQ, AND HEAPSORT
###### page 149:
• If you like bisection, you should look up interpolation search, which for uniformly distributed data has an average-case running time of O(lg lg n). For implementing sets (that is, efficient membership checking) other than sorted sequences, search trees and hash tables, you could have a look at Bloom filters. If you like search trees and related structures, there are lots of them out there. You could find tons of different balancing mechanisms (red black trees, AVL-trees, splay trees), some of them randomized (treaps), and some of them only abstractly representing trees (skip lists). There are also whole families of specialized tree structures for indexing multidimensional coordinates (so-called spatial access methods) and distances (metric access methods). Other trees structures to check out are interval trees, quadtrees, and octtrees.

## · Chapter 7: Greed Is Good? Prove It!

###### page 151:
• So-called greedy algorithms are short-sighted, in that they make each choice in isolation, doing what looks good right here, right now. In many ways, eager or impatient might be better names for them,
###### page 153:
• Making change, denom is \$10-0.01, owed in cents. It works but is brittle. Reordering denom or different currencies break it.
``````>>> denom = [10000, 5000, 2000, 1000, 500, 200, 100, 50, 25, 10, 5, 1]
>>> owed = 56329
>>> payed = []
>>> for d in denom:
...   while owed >= d:
...   owed -= d
...   payed.append(d)
...
>>> sum(payed)
56329
>>> len(payed)
14
``````
###### page 156:
• If we view each object individually, this is often called 0-1 knapsack, because we can take 0 or 1 of each object.
###### page 157:
• Your goal for the tree structure is still the same. How can you minimize the sum of depth(u) × weight(u) over all leaves u? This problem certainly has other applications as well. In fact, the original (and most common) application is compression—representing a text more compactly—through variable-length codes.

• let a partial solution consist of several tree fragments and then repeatedly combine them. When we combine two trees, we add a new, shared root and give it a weight equal to the sum of its children (that is, the previous roots). This is exactly what the numbers inside the nodes in Figure 7-2 mean.

###### page 158:
• Listing 7-1. Huffman’s Algorithm
``````from heapq import heapify, heappush, heappop
from itertools import count
def huffman(seq, frq):
num = count()
trees = list(zip(frq, num, seq))      # num ensures valid ordering
heapify(trees)                        # A min-heap based on freq
while len(trees) > 1:                 # Until all are combined
fa, _, a = heappop(trees)           # Get the two smallest trees
fb, _, b = heappop(trees)
n = next(num)
heappush(trees, (fa+fb, n, [a, b])) # Combine and re-add them
return trees[0][-1]
``````
• If a future version of the heapq library lets you use a key function, such as in list.sort, you’d no longer need this tuple wrapping at all, of course.
###### page 159:
• Here’s an example of how you might use the code:
``````>>> seq = "abcdefghi"
>>> frq = [4, 5, 6, 9, 11, 12, 15, 16, 20]
>>> huffman(seq, frq)
[['i', [['a', 'b'], 'e']], [['f', 'g'], [['c', 'd'], 'h']]]
``````
• As you can see, the resulting tree structure is equivalent to the one shown in Figure 7-2.

• Listing 7-2. Extracting Huffman Codes from a Huffman Tree

``````def codes(tree, prefix=""):
if len(tree) == 1:
yield (tree, prefix) # A leaf with its code
return
for bit, child in zip("01", tree): # Left (0) and right (1)
for pair in codes(child, prefix + bit): # Get codes recursively
yield pair
``````
• The codes function yields (char, code) pairs suitable for use in the dict constructor, for example. To use such a dict to compress a code, you’d just iterate over the text and look up each character. To decompress the text, you’d rather use the Huffman tree directly, traversing it using the bits in the input for directions (that is, determining whether you should go left or right); I’ll leave the details as an exercise for the reader.
###### page 161:
• [A Minimum Spanning Tree] could be useful in building an electrical grid, building the core of a road or railroad network, laying out a circuit, or even performing some forms of clustering (where we’d only almost connect all the nodes).
###### page 165:
• Listing 7-4. Kruskal’s Algorithm (minimum spanning tree)
``````def find(C, u):
if C[u] != u:
C[u] = find(C, C[u]) # Path compression
return C[u]

def union(C, R, u, v):a
u, v = find(C, u), find(C, v)
if R[u] > R[v]: # Union by rank
C[v] = u
else:
C[u] = v
if R[u] == R[v]: # A tie: Move v up a level
R[v] += 1

def kruskal(G):
E = [(G[u][v],u,v) for u in G for v in G[u]]
T = set()
C, R = {u:u for u in G}, {u:0 for u in G} # Comp. reps and ranks
for _, u, v in sorted(E):
if find(C, u) != find(C, v):
union(C, R, u, v)
return T
``````
• find() above is making sure each node is not already in the subsolution.
###### page 167:
• Listing 7-5. Prim’s Algorithm
``````from heapq import heappop, heappush
def prim(G, s):
P, Q = {}, [(0, None, s)]
while Q:
_, p, u = heappop(Q)
continue
P[u] = p
for v, w in G[u].items():
heappush(Q, (w, u, v))
return P
``````
• prim() above adds duplicate nodes with different weights and discards duplicate heavier edges (u in P: continue).
###### page 168:
• Graham and Hell succinctly explain the algorithms as follows. A partial solution is a spanning forest, consisting of a set of fragments (components, trees). Initially, each node is a fragment. In each iteration, edges are added, joining fragments, until we have a spanning tree.

• Algorithm 1: Add a shortest edge that joins two different fragments.

• Algorithm 2: Add a shortest edge that joins the fragment containing the root to another fragment.

• Algorithm 3: For every fragment, add the shortest edge that joins it to another fragment.

• For algorithm 2, the root is chosen arbitrarily at the beginning. For algorithm 3, it is assumed that all edge weights are different to ensure that no cycles can occur. As you can see, all three algorithms are based on the same fundamental fact—that the shortest edge over a cut is safe. Also, in order to implement them efficiently, you need to be able to find shortest edges, detect whether two nodes belong to the same fragment, and so forth (as explained for algorithms 1 and 2 in the main text). Still, these brief explanations can be useful as a memory aid or to get the bird’s-eye perspective on what’s going on.

###### page 172:
• Greedy algorithms take the best element at any given step, without reguard for previous/later choices.

• Often easy to write, hard to prove they are optimal. Often prove by “keeping up”.

• Important greedy problems and algorithms discussed in this chapter include the knapsack problem (selecting a weight-bounded subset of items with maximum value), where the fractional version can be solved greedily; Huffman trees, which can be used to create optimal prefix codes, and are built greedily by combining the smallest trees in the partial solution; and minimum spanning trees, which can be built using Kruskal’s algorithm (keep adding the smallest valid edge) or Prim’s algorithm (keep connecting the node that is closest to your tree).

## · Chapter 8: Tangled Dependencies and Memoization

###### page 175:
• Another option—one I think is particularly suited to high-level languages such as Python—is to implement the recursive formulation directly but to cache the return values. If a call is made more than once with the same arguments, the result is simply returned directly from the cache. This is called memoization.
###### page 176:
• Listing 8-1. A Naïve Solution to the Longest Increasing Subsequence Problem
``````from itertools import combinations
def naive_lis(seq):
for length in range(len(seq), 0, -1):    # n, n-1, ... , 1
for sub in combinations(seq, length):  # Subsequences of given length
if list(sub) == sorted(sub):         # An increasing subsequence?
return sub                         # Return it!
``````
• combinations from itertools. Generates combinations (!) from lists.

• Algorithms should be DRY as well as code.

###### page 177:
• Listing 8-2. A Memoizing Decorator
``````from functools import wraps
def memo(func):
cache = {}                       # Stored subproblem solutions
@wraps(func)                     # Make wrap look like func
def wrap(*args):                 # The memoized wrapper
if args not in cache:          # Not already computed?
cache[args] = func(*args)    # Compute & cache the solution
return cache[args]             # Return the cached solution
return wrap                      # Return the wrapper
``````
• The idea of a memoized function4 is that it caches its return values. If you call it a second time with the same parameters, it will simply return the cached value. You can certainly put this sort of caching logic inside your function, but the memo function is a more reusable solution. It’s even designed to be used as a decorator.
###### page 178:
``````>>> @memo
... def fib(i):
... if i < 2: return 1
... return fib(i-1) + fib(i-2)
...
>>> fib(100)
573147844013817084101
``````
• The main difference is that the subproblems have tangled dependencies. Or, to put it in another way: We’re faced with overlapping subproblems. This is perhaps even clearer in this rather silly relative of the Fibonacci numbers: a recursive formulation of the powers of two:
``````>>> def two_pow(i):
... if i == 0: return 1
... return two_pow(i-1) + two_pow(i-1)
...
>>> two_pow(10)
1024
>>> two_pow(100) #hangs
``````
• Still horrible. Try adding @memo, and you’ll get the answer instantly. Or, you could try to make the following change, which is actually equivalent:
``````>>> def two_pow(i):
... if i == 0: return 1
... return 2*two_pow(i-1)
...
>>> print(two_pow(10))
1024
>>> print(two_pow(100))
1267650600228229401496703205376
``````
###### page 179:

(n) = (n-1) + (n-1) n choose k (k) (k-1) (k)

• Figure 8-2 shows how the binomial coefficients can be placed in a triangular pattern so that each number is the sum of the two above it. This means that the row (counting from zero) corresponds to n, and the column (the number of the cell, counting from zero at the left in its row) corresponds to k. For example, the value 6 corresponds to C(4, 2), and can be calculated as C(3,1) + C(3,2) = 3 + 3 = 6.
###### page 180:
• The code for C(n, k) is trivial:
``````>>> @memo
>>> def C(n,k):
... if k == 0: return 1
... if n == 0: return 0
... return C(n-1,k-1) + C(n-1,k)
>>> C(4,2)
6
>>> C(10,7)
120
>>> C(100,50)
100891344545564193334812497256
``````
###### page 181:
• Let’s reverse our algorithm, filling out Pascal’s triangle directly. To keep things simple, I’ll use a defaultdict as the cache; feel free to use nested lists, for example. (See also Exercise 8-4.)
``````>>> from collections import defaultdict
>>> n, k = 10, 7
>>> C = defaultdict(int)
>>> for row in range(n+1):
... C[row,0] = 1
... for col in range(1,k+1):
... C[row,col] = C[row-1,col-1] + C[row-1,col]
...
>>> C[n,k]
120
``````
• Basically the same thing is going on. The main difference is that we need to figure out which cells in the cache need to be filled out, and we need to find a safe order to do it in so that when we’re about to calculate C[row,col], the cells C[row-1,col-1] and C[row-1,col] are already calculated. With the memoized function, we needn’t worry about either issue: it will calculate whatever it needs recursively.

• One useful way to visualize dynamic programming algorithms with one or two subproblem parameters (such as n and k, here) is to use a (real or imagined) spreadsheet. For example, try calculating binomial coefficients in a spreadsheet by filling the first column with ones and filling in the rest of the first row with zeros. Put the formula =A1+B1 into cell B2, and copy it to the remaining cells.

###### page 183:
• Listing 8-3. Recursive, Memoized DAG Shortest Path
``````def rec_dag_sp(W, s, t):                   # Shortest path from s to t
@memo # Memoize f
def d(u): # Distance from u to t
if u == t:
return 0                             # We're there!
return min(W[u][v]+d(v) for v in W[u]) # Best of every first step
return d(s)
``````
• Listing 8-4. DAG Shortest Path
``````def dag_sp(W, s, t):
d = {u:float('inf') for u in W}
d[s] = 0
for u in topsort(W):
if u == t: break
for v in W[u]:
d[v] = min(d[v], d[u] + W[u][v])
return d[t]
``````
###### page 185:
• Listing 8-5. A Memoized Recursive Solution to the Longest Increasing Subsequence Problem
``````def rec_lis(seq):                           # Longest increasing subseq.
@memo
def L(cur):                               # Longest ending at seq[cur]
res = 1                                 # Length is at least 1
for pre in range(cur):                  # Potential predecessors
if seq[pre] <= seq[cur]:              # A valid (smaller) predec.
res = max(res, 1 + L(pre))          # Can we improve the solution?
return res
return max(L(i) for i in range(len(seq))) # The longest of them all
``````
• Listing 8-6. A Basic Iterative Solution to the Longest Increasing Subsequence Problem
``````def basic_lis(seq):
L = [1] * len(seq)
for cur, val in enumerate(seq):
for pre in range(cur):
if seq[pre] <= val:
L[cur] = max(L[cur], 1 + L[pre])
return max(L)
``````
###### page 187:
• Listing 8-7. Longest Increasing Subsequence
``````from bisect import bisect
def lis(seq):                         # Longest increasing subseq.
end = []                            # End-values for all lengths
for val in seq:                     # Try every value, in order
idx = bisect(end, val)              # Can we build on an end val?
if idx == len(end): end.append(val) # Longest seq. extended
else: end[idx] = val                # Prev. endpoint reduced
return len(end)                     # The longest we found
``````
• if the length of the sequences are m and n and the length of the longest common subsequence is k, the edit distance is m + n - 2k.
###### page 188:
• Listing 8-8. A Memoized Recursive Solution to the LCS Problem
``````def rec_lcs(a,b):                          # Longest common subsequence
@memo # L is memoized
def L(i,j):                              # Prefixes a[:i] and b[:j]
if min(i,j) < 0: return 0              # One prefix is empty
if a[i] == b[j]: return 1 + L(i-1,j-1) # Match! Move diagonally
return max(L(i-1,j), L(i,j-1))         # Chop off either a[i] or b[j]
return L(len(a)-1,len(b)-1)              # Run L on entire sequences
``````
###### page 189:
• Listing 8-9. An Iterative Solution to the Longest Common Subsequence (LCS)
``````def lcs(a,b):
n, m = len(a), len(b)
pre, cur = [0]*(n+1), [0]*(n+1)      # Previous/current row
for j in range(1,m+1):               # Iterate over b
pre, cur = cur, pre                # Keep prev., overwrite cur.
for i in range(1,n+1):             # Iterate over a
if a[i-1] == b[j-1]:             # Last elts. of pref. equal?
cur[i] = pre[i-1] + 1          # L(i,j) = L(i-1,j-1) + 1
else:                            # Otherwise...
cur[i] = max(pre[i], cur[i-1]) # max(L(i,j-1),L(i-1,j))
return cur[n] # L(n,m)
``````
###### page 190:
• Listing 8-10. A Memoized Recursive Solution to the Unbounded Integer Knapsack Problem
``````def rec_unbounded_knapsack(w, v, c): # Weights, values and capacity
@memo                              # m is memoized
def m(r):                          # Max val. w/remaining cap. r
if r == 0: return 0              # No capacity? No value
val = m(r-1)                     # Ignore the last cap. unit?
for i, wi in enumerate(w):       # Try every object
if wi > r: continue            # Too heavy? Ignore it
val = max(val, v[i] + m(r-wi)) # Add value, remove weight
return val                       # Max over all last objects
return m(c)                        # Full capacity available
``````
• The running time here depends on the capacity and the number of objects. Each memoized call m(r) is computed only once, which means that for a capacity c, we have Θ(c) calls. Each call goes through all the n objects, so the resulting running time is Θ(cn).
###### page 191:
• Listing 8-11. An Iterative Solution to the Unbounded Integer Knapsack Problem
``````def unbounded_knapsack(w, v, c):
m = [0]
for r in range(1,c+1):
val = m[r-1]
for i, wi in enumerate(w):
if wi > r: continue
val = max(val, v[i] + m[r-wi])
m.append(val)
return m[c]
``````
###### page 192:
• Listing 8-12. A Memoized Recursive Solution to the 0-1 Knapsack Problem
``````def rec_knapsack(w, v, c):              # Weights, values and capacity
@memo                                   # m is memoized
def m(k, r):                            # Max val., k objs and cap r
if k == 0 or r == 0: return 0           # No objects/no capacity
i = k-1                                 # Object under consideration
drop = m(k-1, r)                        # What if we drop the object?
if w[i] > r: return drop                # Too heavy: Must drop it
return max(drop, v[i] + m(k-1, r-w[i])) # Include it? Max of in/out
return m(len(w), c)                     # All objects, all capacity
``````
• Listing 8-13. An Iterative Solution to the 0-1 Knapsack Problem
``````def knapsack(w, v, c):                    # Returns solution matrices
n = len(w)                              # Number of available items
m = [[0]*(c+1) for i in range(n+1)]     # Empty max-value matrix
P = [[False]*(c+1) for i in range(n+1)] # Empty keep/drop matrix
for k in range(1,n+1):                  # We can use k first objects
i = k-1                               # Object under consideration
for r in range(1,c+1):                # Every positive capacity
m[k][r] = drop = m[k-1][r]          # By default: drop the object
if w[i] > r: continue               # Too heavy? Ignore it
keep = v[i] + m[k-1][r-w[i]]        # Value of keeping it
m[k][r] = max(drop, keep)           # Best of dropping and keeping
P[k][r] = keep > drop               # Did we keep it?
return m, P                             # Return full results
``````
• Now that the knapsack function returns more information, we can use it to extract the set of objects actually included in the optimal solution. For example, you could do something like this:
``````>>> m, P = knapsack(w, v, c)
>>> k, r, items = len(w), c, set()
>>> while k > 0 and r > 0:
...   i = k-1
...   if P[k][r]:
...     r -= w[i]
...   k -= 1
``````
• In other words, by simply keeping some information about the choices made (in this case, keeping or dropping the element under consideration), we can gradually trace ourselves back from the final state to the initial conditions. In this case, I start with the last object and check P[k][r] to see whether it was included. If it was, I subtract its weight from r; if it wasn’t, I leave r alone (as we still have the full capacity available). In either case, I decrement k, because we’re done looking at the last element and now want to have a look at the next-to-last element (with the updated capacity). You might want to convince yourself that this backtracking operation has a linear running time. The same basic idea can be used in all the examples in this chapter.
###### page 196:
• Problems solved using DP in this chapter include calculating binomial coefficients, finding shortest paths in DAGs, finding the longest increasing subsequence of a given sequence, finding the longest common subsequence of two given sequences, getting the most out of your knapsack with limited and unlimited supplies of indivisible items, and building binary search trees that minimize the expected lookup time.

• For some sequence comparison goodness in the Python standard library, check out the difflib module. If you have Sage installed, you could have a look at its knapsack module (sage.numerical.knapsack).

## · Chapter 9: From A to B with Edsger and Friends

###### page 200:
• Listing 9-1. The Relaxation Operation
``````inf = float('inf')
def relax(W, u, v, D, P):
d = D.get(u,inf) + W[u][v] # Possible shortcut estimate
if d < D.get(v,inf):       # Is it really a shortcut?
D[v], P[v] = d, u        # Update estimate and parent
return True              # There was a change!
``````
• The idea is that we look for an improvement to the currently known distance to v by trying to take a shortcut through v. If it turns out not to be a shortcut, fine. We just ignore it. If it is a shortcut, we register the new distance and remember where we came from (by setting P[v] to u). I’ve also added a small extra piece of functionality: the return value indicates whether any change actually took place; that’ll come in handy later (though you won’t need it for all your algorithms). Here’s a look at how it works:
``````>>> D[u]
7
>>> D[v]
13
>>> W[u][v]
3
>>> relax(W, u, v, D, P)
True
>>> D[v]
10
>>> D[v] = 8
>>> relax(W, u, v, D, P)
>>> D[v]
8
``````
• As you can see, the first call to relax improves D[v] from 13 to 10, because I found a shortcut through u, which I had (presumably) already reached using a distance of 7, and which was just 3 away from v. Now I somehow discover that I can reach v by a path of length 8. I run relax again, but this time, no shortcut is found, so nothing happens. As you can probably surmise, if I now set D[u] to 4 and ran the same relax again, D[v] would improve, this time to 7, propagating the improved estimate from u to v.
###### page 201:
• if you keep randomly relaxing forever, you know that you’ll have the right answer.
###### page 202:
• Tip Imagine each node continuously shouting out bids for supplying short paths to its out-neighbors, based on the shortest path it has gotten itself, so far. If any node gets a better offer than what it already has, it switches its path supplier and lowers its bids accordingly.
###### page 203:
• Listing 9-2. The Bellman-Ford Algorithm
``````def bellman_ford(G, s):
D, P = {s:0}, {}                    # Zero-dist to s; no parents
for rnd in G:                       # n = len(G) rounds
changed = False                   # No changes in round so far
for u in G:                       # For every from-node...
for v in G[u]:                  # ... and its to-nodes...
if relax(G, u, v, D, P):      # Shortcut to v from u?
changed = True              # Yes! So something changed
if not changed: break             # No change in round: Done
else:                               # Not done before round n?
raise ValueError('negative cycle')# Negative cycle detected
return D, P                         # Otherwise: D and P correct
``````
###### page 205:
• Listing 9-3. Dijkstra’s Algorithm
``````from heapq import heappush, heappop
def dijkstra(G, s):
D, P, Q, S = {s:0}, {}, [(0,s)], set() # Est., tree, queue, visited
while Q:                               # Still unprocessed nodes?
_, u = heappop(Q)                    # Node with lowest estimate
if u in S: continue                  # Already visited? Skip it
S.add(u)                             # We've visited it now
for v in G[u]:                       # Go through all its neighbors
relax(G, u, v, D, P)               # Relax the out-edge
heappush(Q, (D[v], v))             # Add to queue, w/est. as pri
return D, P                            # Final D and P returned
``````
• Dijkstra’s algorithm may be similar to Prim’s (with another set of priorities for the queue), but it is also closely related to another old favorite: BFS. Consider the case where the edge weights are positive integers. Now, replace an edge that has weight w with w -1 unweighted edges, connecting a path of dummy nodes (see Figure 9-2). We’re breaking what chances we had for an efficient solution (see Exercise 9-3), but we know that BFS will find a correct solution. In fact, it will do so in a way very similar to Dijkstra’s algorithm: it will spend an amount of time on each (original) edge proportional to its weight
###### page 206:
• It’s a bit like you had set up a series of dominoes along each edge (the number of dominoes proportional to the weight), and you then tip the first domino in the start node. A node may be reached from multiple directions, but we can see which direction won, by looking at which dominoes lie below the others.

• If we started with this approach, we could see Dijkstra’s algorithm as a way of gaining performance by “simulating” BFS, or the dominoes (or flowing water or a spreading sound wave, or …), without bothering to deal with each dummy node (or domino) individually.

###### page 207:
• Listing 9-4. Johnson’s Algorithm
``````from copy import deepcopy
def johnson(G):                 # All pairs shortest paths
G = deepcopy(G)               # Don't want to break original
s = object()                  # Guaranteed unused node
G[s] = {v:0 for v in G}       # Edges from s have zero wgt
h, _ = bellman_ford(G, s)     # h[v]: Shortest dist from s
del G[s]                      # No more need for s
for u in G:                   # The weight from u ...
for v in G[u]:              # ... to v ...
G[u][v] += h[u] - h[v]    # ... is adjusted (nonneg.)
D, P = {}, {}                 # D[u][v] and P[u][v]
for u in G:                   # From every u ...
D[u], P[u] = dijkstra(G, u) # ... find the shortest paths
for v in G:                 # For each destination ...
D[u][v] += h[v] - h[u]    # ... readjust the distance
return D, P                   # These are two-dimensional
``````
###### page 208:
• 7 A common criterion for calling a graph sparse is that m is O(n), for example. In this case, though, Johnson’s will (asymptotically) match Floyd-Warshall as long as m is O(n2/lg n), which allows for quite a lot of edges. On the other hand, Floyd-Warshall has very low constant overhead.
###### page 209:
• Listing 9-5. A Memoized Recursive Implementation of the Floyd-Warshall Algorithm
``````def rec_floyd_warshall(G):                             # All shortest paths
@memo                                                # Store subsolutions
def d(u,v,k):                                        # u to v via 1..k
if k==0: return G[u][v]                            # Assumes v in G[u]
return min(d(u,v,k-1), d(u,k,k-1) + d(k,v,k-1))    # Use k or not?
return {(u,v): d(u,v,len(G)) for u in G for v in G}  # D[u,v] = d(u,v,n)
``````
###### page 210:
• Listing 9-6. The Floyd-Warshall Algorithm, Distances Only
``````def floyd_warshall(G):
D = deepcopy(G) # No intermediates yet
for k in G:     # Look for shortcuts with k
for u in G:
for v in G:
D[u][v] = min(D[u][v], D[u][k] + D[k][v])
return D
``````
• Listing 9-7. The Floyd-Warshall Algorithm
``````def floyd_warshall(G):
D, P = deepcopy(G), {}
for u in G:
for v in G:
if u == v or G[u][v] == inf:
P[u,v] = None
else:
P[u,v] = u
for k in G:
for u in G:
for v in G:
shortcut = D[u][k] + D[k][v]
if shortcut < D[u][v]:
D[u][v] = shortcut
P[u,v] = P[k,v]
return D, P
``````
###### page 211:
• Listing 9-8. Dijkstra’s Algorithm Implemented as a Generator
``````def idijkstra(G, s):
Q, S = [(0,s)], set()            # Queue w/dists, visited
while Q:                         # Still unprocessed nodes?
d, u = heappop(Q)              # Node with lowest estimate
if u in S: continue            # Already visited? Skip it
S.add(u)                       # We've visited it now
yield u, d                     # Yield a subsolution/node
for v in G[u]:                 # Go through all its neighbors
heappush(Q, (d+G[u][v], v))  # Add to queue, w/est. as pri
``````
###### page 212:
• Note that I’ve dropped the use of relax completely—it is now implicit in the heap. Or, rather, heappush is the new relax. Re-adding a node with a better estimate means it will take precedence over the old entry, which is equivalent to overwriting the old one with a relax operation. This is analogous to the implementation of Prim’s algorithm in Chapter 7.
###### page 213:
• Listing 9-9. The Bidirectional Version of Dijkstra’s Algorithm
``````from itertools import cycle
def bidir_dijkstra(G, s, t):
Ds, Dt = {}, {}                             # D from s and t, respectively
forw, back = idijkstra(G,s), idijkstra(G,t) # The "two Dijkstras"
dirs = (Ds, Dt, forw), (Dt, Ds, back)       # Alternating situations
try:                                        # Until one of forw/back ends
for D, other, step in cycle(dirs):        # Switch between the two
v, d = next(step)                       # Next node/distance for one
D[v] = d                                # Remember the distance
if v in other: break                    # Also visited by the other?
except StopIteration: return inf            # One ran out before they met
m = inf                                     # They met; now find the path
for u in Ds:                                # For every visited forw-node
for v in G[u]:                            # ... go through its neighbors
if not v in Dt: continue                # Is it also back-visited?
m = min(m, Ds[u] + G[u][v] + Dt[v])     # Is this path better?
return m                                    # Return the best path
``````
• By now you’ve seen that the basic idea of traversal is pretty versatile, and by simply using different queues, you get several useful algorithms. For example, for FIFO and LIFO queues, you get BFS and DFS, and with the appropriate priorities, you get the core of Prim’s and Dijkstra’s algorithms.
###### page 215:
• Listing 9-10. The A* Algorithm
``````from heapq import heappush, heappop
def a_star(G, s, t, h):
P, Q = {}, [(h(s), None, s)]     # Preds and queue w/heuristic
while Q:                         # Still unprocessed nodes?
d, p, u = heappop(Q)           # Node with lowest heuristic
if u in P: continue            # Already visited? Skip it
P[u] = p                       # Set path predecessor
if u == t: return d - h(t), P  # Arrived! Ret. dist and preds
for v in G[u]:                 # Go through all neighbors
w = G[u][v] - h(u) + h(v)    # Modify weight wrt heuristic
heappush(Q, (d + w, u, v))   # Add to queue, w/heur as pri
return inf, None                 # Didn't get to t
``````
• Of course, in order to get any benefit from the A* algorithm, you need a good heuristic. What this function should be will depend heavily on the exact problem you’re trying to solve, of course. For example, if you’re navigating a road map, you’d know that the Euclidean distance, as the crow flies, from a given node to your destination must be a valid heuristic (lower bound). This would, in fact, be a usable heuristic for any movement on a flat surface, such as monsters walking around in a computer game world. If there are lots of blind alleys and twists and turns, though, this lower bound may not be very accurate.
###### page 216:
• Listing 9-11. An Implicit Graph with Word Ladder Paths
``````from string import ascii_lowercase as chars
class WordSpace:           # An implicit graph w/utils

def __init__(self, words): # Create graph over the words
self.words = words
self.M = M = dict()      # Reachable words

def variants(self, wd, words): # Yield all word variants
wasl = list(wd)              # The word as a list
for i, c in enumerate(wasl): # Each position and character
for oc in chars:           # Every possible character
if c == oc: continue     # Don't replace with the same
wasl[i] = oc             # Replace the character
ow = ''.join(wasl)       # Make a string of the word
if ow in words:          # Is it a valid word?
yield ow               # Then we yield it
wasl[i] = c                # Reset the character

def __getitem__(self, wd): # The adjacency map interface
if wd not in self.M:     # Cache the neighbors
self.M[wd] = dict.fromkeys(self.variants(wd, self.words), 1)
return self.M[wd]

def heuristic(self, u, v):               # The default heuristic
return sum(a!=b for a, b in zip(u, v)) # How many characters differ?

def ladder(self, s, t, h=None):   # Utility wrapper for a_star
if h is None:                   # Allows other heuristics
def h(v):
return self.heuristic(v, t)
_, P = a_star(self, s, t, h)    # Get the predecessor map
if P is None:
return [s, None, t]  # When no path exists
u, p = t, []
while u is not None:  # Walk backward from t
p.append(u)         # Append every predecessor
u = P[u]            # Take another step
p.reverse()           # The path is backward
return p
``````
``````>>> wds = set(line.strip().lower() for line in open("/usr/share/dict/words"))
>>> G = WordSpace(words)
``````
###### page 217:
• this chapter dealt with finding optimal routes in network-like structures and spaces—in other words, shortest paths in graphs.

• One fundamental tactic common to all the shortest path algorithms is that of looking for shortcuts, either through a new possible next-to-last node along a path, using the relax function or something equivalent (most of the algorithms do this), or by considering a shortcut consisting of two subpaths, to and from some intermediate node (the strategy of Floyd-Warshall).

## · Chapter 10: Matchings, Cuts, and Flows

###### page 221:
• The max-flow problem and its variations have almost endless applications. Douglas B. West, in his book Graph Theory (see References, Chapter 2), gives some rather obvious ones, such as determining the total capacities of road and communication networks, or even working with currents in electrical circuits. Kleinberg and Tardos (see “References” in Chapter 1) explain how to apply the formalism to survey design, airline scheduling, image segmentation, project selection, baseball elimination, and assigning doctors to holidays. Ahuja, Magnanti, and Orlin have written one of the most thorough books on the subject and cover well over a hundred applications in such diverse areas as engineering, manufacturing, scheduling, management, medicine, defense, communication, public policy, mathematics, and transportation.
###### page 224:
• Example of pairing as many brides/grooms as possible without regard to preference.

• Listing 10-1. Finding a Maximum Bipartite Matching Using Augmenting Paths

``````from collections import defaultdict
from itertools import chain

def match(G, X, Y):                              # Maximum bipartite matching
H = tr(G)                                      # The transposed graph
S, T, M = set(X), set(Y), set()                # Unmatched left/right + match
while S:                                       # Still unmatched on the left?
s = S.pop()                                  # Get one
Q, P = {s}, {}                               # Start a traversal from it
while Q:                                     # Discovered, unvisited
u = Q.pop()                                # Visit one
if u in T:                                 # Finished augmenting path?
T.remove(u)                              # u is now matched
break                                    # and our traversal is done
forw = (v for v in G[u] if (u,v) not in M) # Possible new edges
back = (v for v in H[u] if (v,u) in M)     # Cancellations
for v in chain(forw, back):                # Along out- and in-edges
if v in P: continue                      # Already visited? Ignore
P[v] = u                                 # Traversal predecessor
while u != s:                                # Augment: Backtrack to s
u, v = P[u], u                             # Shift one step
if v in G[u]:                              # Forward edge?
else:                                      # Backward edge?
M.remove((v,u))                          # Cancellation
return M                                       # Matching -- a set of edges
``````
###### page 225:
• The easiest (and most common) solution is to specify two special nodes, s and t, called the source and the sink. (Such a graph is often called an s-t graph, or an s-t-network.)

• We can do that by introducing two rules: Disjoint path

1. The number of paths going into any node except s or t must equal the number of paths going out of that node.

2. At most one path can go through any given edge.

###### page 227:
• Maximum flow

• More generally, the metaphor here is some form of substance flowing through the network, from the source to the sink, and the capacity represents the limit for how many units can flow through a given edge.

###### page 228:
• magnitude of the flow is the total amount pushed through the network. (This can be found by finding the net flow out of the source, for example.)

• Flow rules

1. The amount of flow going into any node except s or t must equal the amount of flow going out of that node.

2. At most c(e) units of flow can go through any given edge. [c(e) is capacity of edge]

• Flow BACK is always zero, but flow can move “backwards” if there is forward flow that can be cancelled.
###### page 230:
• Listing 10-3. Finding Augmenting Paths with BFS and Labeling
``````from collections import deque
inf = float('inf')

def bfs_aug(G, H, s, t, f):
P, Q, F = {s: None}, deque([s]), {s: inf} # Tree, queue, flow label
def label(inc):                           # Flow increase at v from u?
if v in P or inc <= 0: return           # Seen? Unreachable? Ignore
F[v], P[v] = min(F[u], inc), u          # Max flow here? From where?
Q.append(v)                             # Discovered -- visit later
while Q:                                  # Discovered, unvisited
u = Q.popleft()                         # Get one (FIFO)
if u == t: return P, F[t]               # Reached t? Augmenting path!
for v in G[u]: label(G[u][v]-f[u,v])    # Label along out-edges
for v in H[u]: label(f[v,u])            # Label along in-edges
return None, 0                            # No augmenting path found
``````
• Listing 10-4. The Ford-Fulkerson Method (by Default, the Edmonds-Karp Algorithm)
``````def ford_fulkerson(G, s, t, aug=bfs_aug): # Max flow from s to t
H, f = tr(G), defaultdict(int)          # Transpose and flow
while True:                             # While we can improve things
P, c = aug(G, H, s, t, f)             # Aug. path and capacity/slack
if c == 0: return f                   # No augm. path found? Done!
u = t                                 # Start augmentation
while u != s:                         # Backtrack to s
u, v = P[u], u                      # Shift one step
if v in G[u]: f[u,v] += c           # Forward edge? Add slack
else: f[v,u] -= c                   # Backward edge? Cancel slack
``````
###### page 231:
• Minimum cuts have several applications that don’t really look like max-flow problems. Consider, for example, the problem of allocating processes to two processors in a manner that minimizes the communication between them. Let’s say one of the processors is a GPU and that the processes have different running times on the two processors. Some fit the CPU better, while some should be run on the GPU. However, there might be cases where one fits on the CPU and one on the GPU, but where the two communicate extensively with each other. In that case, we might want to put them on the same processor, just to reduce the communication costs. How would we solve this? We could set up an undirected flow network with the CPU as the source and the GPU as the sink, for example. Each process would have an edge to both source and sink, with a capacity equal to the time it would take to run on that processor. We also add edges between processes that communicate, with capacities representing the communication overhead (in extra computation time) of having them on separate processors. The minimum cut would then distribute the processes on the two processors in such a way that the total cost is as small as possible—a nontrivial task if we couldn’t reduce to the min-cut problem.
###### page 234:
• Listing 10-5. The Busacker-Gowen Algorithm, Using Bellman-Ford for Augmentation
``````def busacker_gowen(G, W, s, t):            # Min-cost max-flow
def sp_aug(G, H, s, t, f):               # Shortest path (Bellman-Ford)
D, P, F = {s:0}, {s:None}, {s:inf,t:0} # Dist, preds and flow
def label(inc, cst):                   # Label + relax, really
if inc <= 0: return False            # No flow increase? Skip it
d = D.get(u,inf) + cst               # New possible aug. distance
if d >= D.get(v,inf): return False   # No improvement? Skip it
D[v], P[v] = d, u                    # Update dist and pred
F[v] = min(F[u], inc)                # Update flow label
return True                          # We changed things!
for rnd in G:                          # n = len(G) rounds
changed = False                      # No changes in round so far
for u in G:                          # Every from-node
for v in G[u]:                     # Every forward to-node
changed |= label(G[u][v]-f[u,v], W[u,v])
for v in H[u]:                     # Every backward to-node
changed |= label(f[v,u], -W[v,u])
if not changed: break                # No change in round: Done
else:                                # Not done before round n?
raise ValueError('negative cycle') # Negative cycle detected
return P, F[t]                         # Preds and flow reaching t
return ford_fulkerson(G, s, t, sp_aug)   # Max-flow with Bellman-Ford
``````
###### page 237:
• This chapter deals with a single core problem, finding maximum flows in flow networks, as well as specialized versions, such as maximum bipartite matching and finding edge-disjoint paths. You also saw how the minimum cut problem is the dual of the maximum flow problem, giving us two solutions for the price of one. Solving the minimum cost flow problem is also a close relative

## · Chapter 11: Hard Problems and (Limited) Sloppiness

###### page 242:
• Let’s say you’ve come to a small town where one of the main attractions is a pair of twin mountain peaks. The locals have affectionately called the two Castor and Pollux, after the twin brothers from Greek and Roman mythology. It is rumored that there’s a long-forgotten gold mine on the top of Pollux, but many an adventurer has been lost to the treacherous mountain. In fact, so many unsuccessful attempts have been made to reach the gold mine that the locals have come to believe it can’t be done. You decide to go for a walk and take a look for yourself.

• After stocking up on donuts and coffee at a local roadhouse, you set off. After a relatively short walk, you get to a vantage point where you can see the mountains relatively clearly. From where you’re standing, you can see that Pollux looks like a really hellish climb—steep faces, deep ravines, and thorny brush all around it. Castor, on the other hand, looks like a climber’s dream. The sides slope gently, and it seems there are lots of handholds all the way to the top. You can’t be sure, but it seems like it might be a nice climb. Too bad the gold mine isn’t up there.

• You decide to take a closer look, and pull out your binoculars. That’s when you spot something odd. There seems to be a small tower on top of Castor, with a zip line down to the peak of Pollux. Immediately, you give up any plans you had to climb Castor. Why? (If you don’t immediately see it, it might be worth pondering for a bit.)1

• Of course, we’ve seen the exact situation before, in the discussions of hardness in Chapters 4 and 6. The zip line makes it easy to get from Castor to Pollux, so if Castor were easy, someone would have found the gold mine already.2 It’s a simple contrapositive: if Castor were easy, Pollux would be too; Pollux is not easy, so Castor can’t be either. This is exactly what we do when we want to prove that a problem (Castor) is hard. We take something we know is hard (Pollux) and show that it’s easy to solve this hard problem using our new, unknown one (we uncover a zip line from Castor to Pollux).

• As I’ve mentioned before, this isn’t so confusing in itself. It can be easy to confuse things when we start talking about it in terms of reductions, though. For example, is it obvious to you that we’re reducing Pollux to Castor here? The reduction is the zip line, which lets us use a solution to Castor as if it were a solution to Pollux. In other words, if you want to prove that problem X is hard, find some hard problem Y and reduce it to X.

• Caution The zip line goes in the opposite direction of the reduction. It’s crucial that you don’t get this mixed up, or the whole idea falls apart. The term reduction here means basically “Oh, that’s easy, you just …” In other words, if you reduce A to B, you’re saying “You want to solve A? That’s easy, you just solve B.” Or in this case: “You want to scale Pollux? That’s easy, just scale Castor (and take the zip line).” In other words, we’ve reduced the scaling of Pollux to the scaling of Castor (and not the other way around).

• Hard Problems:

1. The problem is intractable—any algorithm solving it must be exponential.

2. We don’t know whether the problem is intractable, but no one has ever been able to find a polynomial algorithm for it.

###### page 256:
• I have talked about bipartite graphs several times already—graphs whose nodes can be partitioned into two sets so that all edges are between the sets (that is, no edges connect nodes in the same set). Another way of viewing this is as a two-coloring, where you color every node as either black or white (for example), but you ensure that no neighbors have the same color. If this is possible, the graph is bipartite.
###### page 261:
• Some hard problems can be made easy or at least solvable it we allow ourselves to get sloppy. An approximation ratio is formed by the optimal solution and the and how good you can actually get.

• Let’s consider the maximization case. If we know that the optimum will never be greater than A, and we know our approximation will never be smaller than B, we can be certain that the ratio of the two will never be greater than A/B.

###### page 262:
• A Hamilton cycle connects all nodes, and the absolutely cheapest way of connecting all nodes is using a minimum spanning tree.

• Listing 11-1. The “Twice Around the Tree” Algorithm, a 2-Approximation for Metric TSP

``````from collections import defaultdict # 2-approx for metric TSP
def mtsp(G, r):                     # Tree and cycle
T, C = defaultdict(list), []      # Build a traversable MSP
for c, p in prim(G, r).items():   # Child is parent's neighbor
T[p].append(c)                  # Recursive DFS
def walk(r):                      # Preorder node collection
C.append(r)                     # Visit subtrees recursively
for v in T[r]: walk(v)          # Traverse from the root
walk(r)                           # At least half-optimal cycle
return C
``````
###### page 265:
• Note The nonlocal keyword, which is used in Listing 11-2, lets you modify a variable in a surrounding scope, just like global lets you modify the global scope. However, this feature is new in Python 3.0. If you want similar functionality in earlier Pythons, simply replace the initial sol = 0 by sol = [0], and later access the value using the expression sol[0] instead of just sol .
###### page 266:
• Listing 11-2. Solving the Knapsack Problem with the Branch and Bound Strategy
``````from __future__ import division
from heapq import heappush, heappop
from itertools import count

def bb_knapsack(w, v, c):
sol = 0               # Solution so far
n = len(w)            # Item count
idxs = list(range(n))
idxs.sort(key=lambda i: v[i]/w[i], reverse=True) # Sort by descending unit cost

def bound(sw, sv, m):                     # Greedy knapsack bound
if m == n: return sv                    # No more items?
objs = ((v[i], w[i]) for i in idxs[m:]) # Descending unit cost order
for av, aw in objs:                     # Added value and weight
if sw + aw > c: break                 # Still room?
sw += aw                              # Add wt to sum of wts
sv += av                              # Add val to sum of vals
return sv + (av/aw)*(c-sw)              # Add fraction of last item

def node(sw, sv, m):                  # A node (generates children)
nonlocal sol                        # "Global" inside bb_knapsack
if sw > c: return                   # Weight sum too large? Done
sol = max(sol, sv)                  # Otherwise: Update solution
if m == n: return                   # No more objects? Return
i = idxs[m]                         # Get the right index
ch = [(sw, sv), (sw+w[i], sv+v[i])] # Children: without/with m
for sw, sv in ch:                   # Try both possibilities
b = bound(sw, sv, m+1)            # Bound for m+1 items
if b > sol:                       # Is the branch promising?
yield b, node(sw, sv, m+1)      # Yield child w/bound

num = count()                       # Helps avoid heap collisions
Q = [(0, next(num), node(0, 0, 0))] # Start with just the root
while Q:                            # Any nodes left?
_, _, r = heappop(Q)              # Get one
for b, u in r:                    # Expand it ...
heappush(Q, (b, next(num), u)) # ... and push the children
return sol                          # Return the solution
``````

## · PEDAL TO THE METAL: ACCELERATING PYTHON (Appendix A)

###### page 272:
• GPULib and PyStream. These two packages let you use graphics processing units (GPUs) to accelerate your code. They don’t provide the kind of drop-in acceleration that a JIT compiler such as Psyco would, but if you have a powerful GPU, why not use it? Of the two projects, PyStream is older, and the efforts of Tech-X Corporation have shifted to the newer GPULib project. It gives you a high-level interface for various forms of numeric computation using GPUs.

• Pyrex, Cython, and Shedskin. These three projects let you translate Python code into C or C++. Shedskin compiles plain Python code into C++, while Pyrex and Cython (which is a fork of Pyrex) primarily target C. In Cython (and Pyrex), you can add optional type declarations to your code, stating that a variable is (and will always be) an integer, for example. In Cython, there is also interoperability support for NumPy arrays, letting you write low-level code that accesses the array contents very efficiently. I have used this in my own code, achieving speedup factors of up to 300–400 for suitable code. The code that is generated by Pyrex and Cython can be compiled directly to an extension module that can be imported into Python.

• SWIG, F2PY and Boost.Python. These tools let you wrap C/C++, Fortran, and C++ code, respectively. Although you could write your own wrapper code for accessing your extension modules, using a tool like one of these takes a lot of the tedium out of the job—and makes it more likely that the result will be correct. For example, when using SWIG, you run a command-line tool on your C (or C++) header files, and wrapper code is generated. A bonus to using SWIG is that it can generate wrappers for a lot of other languages, beside Python, so your extension could be available for Java or PHP as well, for example.

• ctypes, llvm-py, and CorePy. These are modules that let you manipulate low-level code objects in your Python code. The ctypes module lets you build C objects in memory and call C functions in shared libraries (such as DLLs) with those objects as parameters. The llvm-py package gives you a Python API to the LLVM, mentioned earlier, which lets you build code and then compile it efficiently. If you wanted, you could use this to build your own compiler (perhaps for a language of your own?) in Python. CorePy also lets you manipulate and efficiently run code objects, although it works at the assembly level. (Note that ctypes is part of the Python standard library.)

## · LIST OF PROBLEMS AND ALGORITHMS (Appendix B)

• In most descriptions in this appendix, n refers to the problem size (such as the number of elements in a sequence). For the special case of graphs, though, n refers to the number of nodes, and m refers to the number of edges.

• Cliques and independent sets. A clique is a graph where there is an edge between every pair of nodes. The main problem of interest here is finding a clique in a larger graph (that is, identifying a clique as a subgraph). An independent set in a graph is a set of nodes where no pair is connected by an edge. In other words, finding an independent set is equivalent to taking the complement of the edge set and finding a clique. Finding a k-clique (a clique of k nodes) or finding the largest clique in a graph (the max- clique problem) is NP-hard. (For more information, see Chapter 11.)

• Closest pair. Given a set of points in the Euclidean plane, find the two points that are closest to each other. This can be solved in loglinear time using the divide-and-conquer strategy (see Chapter 6).

• Compression and optimal decision trees. A Huffman tree is a tree whose leaves have weights (frequencies), and the sum of their weights multiplied by their depth is as small as possible. This makes such trees useful for constructing compression codes and as decision trees when a probability distribution is known for the outcomes. Huffman trees can be built using Huffman’s algorithm, described in Chapter 7 (Listing 7-1).

###### page 276:
• Connected and strongly connected components. An undirected graph is connected if there is a path from every node to every other. A directed graph is connected if its underlying undirected graph is connected. A connected component is a maximal subgraph that is connected. Connected components can be found using traversal algorithms such as DFS (Listing 5-5) or BFS (Listing 5-9), for example. If there is a (directed) path from every node to every other in a directed graph, it is called strongly connected. A strongly connected component (SCC) is a maximal subgraph that is strongly connected. SCCs can be found using Kosaraju’s algorithm (Listing 5-10).

• Convex hulls. A convex hull is the minimum convex region containing a set of points in the Euclidean plane. Convex hulls can be found in loglinear time using the divide-and-conquer strategy (see Chapter 6).

• Finding the minimum/maximum/median. Finding the minimum and maximum of a sequence can be found in linear time by a simple scan. Repeatedly finding and extracting the maximum or minimum in constant time, given linear-time preparation, can be done using a binary heap. It is also possible to find the kth smallest element of a sequence (the median for k = n/2) in linear (or expected linear) time, using the select or randomized select. (For more information, see Chapter 6).

• Flow and cut problems. How many units of flow can be pushed through a network with flow capacities on the edges? That is the max-flow problem. An equivalent problem is finding the set of edge capacities that most constrain the flow; this is the min-cut problem. There are several versions of these problems. For example, you could add costs to the edges, and find the cheapest of the maximum flows. You could add lower bound on each edge and look for a feasible flow. You could even add separate supplies and demands in each node. These problems are dealt with in detail in Chapter 10.

• Graph coloring. Try to color the nodes of a graph so that no neighbors share a color. Now try to do this with a given number of colors, or even to find the lowest such number (the chromatic number of the graph). This is an NP-hard problem in general. If, however, you’re asked to see if a graph is two-colorable (or bipartite), the problem can be solved in linear time using simple traversal. The problem of finding a clique cover is equivalent to finding an independent set cover, which is an identical problem to graph coloring. (See Chapter 11 for more on graph coloring.)

• Hamilton cycles/paths and TSP … and Euler tours. Several path and subgraph problems can be solved efficiently. If, however, you want to visit every node exactly once, you’re in trouble. Any problem involving this constraint is NP-hard, including finding a Hamilton cycle (visit every node once and return), a Hamilton path (visit every node once, without returning), or a shortest tour of a complete graph (the Traveling Salesman/Salesrep problem). The problems are NP-hard both for the directed and undirected case (see Chapter 11). The related problem of visiting every edge exactly once, though— finding a so-called Euler tour—is solvable in polynomial time (see Chapter 5). The TSP problem is NP- hard even for special cases such as using Euclidean distances in the plane, but it can be efficiently approximated to within a factor of 1.5 for this case, and for any other metric distance. Approximating the TSP problem in general, though, is NP-hard. (See Chapter 11 for more information.)

• Longest increasing subsequence. Find the longest subsequence of a given sequence whose elements are in increasing order. This can be solved in loglinear time using dynamic programming (see Chapter 8).

• Matching. There are many matching problems, all of which involve linking some object to others. The problems discussed in this book are bipartite matching and min-cost bipartite matching (Chapter 10) and the stable marriage problem (Chapter 7). Bipartite matching (or maximum bipartite matching) involves finding the greatest subset of edges in a bipartite graph so that no two edges in the subset share a node. The min-cost version does the same, but minimizes the sum of edge costs over this subset. The stable marriage problem is a bit different; there, all men and women have preference rankings of the members of the opposite sex. A stable set of marriages is characterized by the fact that you can’t find a pair that would rather have each other than their current mates.

###### page 277:
• Minimum spanning trees. A spanning tree is a subgraph whose edges form a tree over all the nodes of the original graph. A minimum spanning tree is one that minimizes the sum of edge costs. Minimum spanning trees can be found using Kruskal’s algorithm (Listing 7-4) or Prim’s algorithm (Listing 7-5), for example. Because the number of edges is fixed, a maximum spanning tree can be found by simply negating the edge weights.

• Partitioning and bin packing. Partitioning involves dividing a set of numbers into two sets with equal sums, while the bin packing problem involves packing a set of numbers into a set of “bins” so that the sum in each bin is below a certain limit and so that the number of bins is as small as possible. Both problems are NP-hard. (See Chapter 11.)

• SAT, Circuit-SAT, k-CNF-SAT. These are all varieties of the satisfaction problem (SAT), which asks you to determine whether a given logical (Boolean) formula can ever be true, if you’re allowed to set the variables to whatever truth values you want. The circuit-SAT problem simply uses logical circuits rather than formulas, and k-CNF-SAT involves formulas in conjunctive normal form, where each clause consists of k literals. The latter can be solved in polynomial time for k = 2. The other problems, as well as k-CNF-SAT for k > 2, are NP-complete. (See Chapter 11.)

• Searching. This is a very common and extremely important problem. You have a key and want to find an associated value. This is, for example, how variables work in dynamic languages such as Python. It’s also how you find almost anything on the Internet these days. Two important solutions are hash tables (see Chapter 2) and binary search or search trees (see Chapter 6). Given a probability distribution for the objects in the data set, optimal search trees can be constructed using dynamic programming (see Chapter 8).

• Sequence comparison. You may want to compare two sequences to know how similar (or dissimilar) they are. One way of doing this is to find the longest subsequence the two have in common (longest common subsequence) or to find the minimum number of basic edit operations to go from one sequence to the other (so-called edit distance, or Levenshtein distance). These two problems are more or less equivalent; see Chapter 8 for more information.

• Sequence modification. Inserting an element into the middle of a linked list is cheap (constant time), but finding a given location is costly (linear time); for an array, the opposite is true (constant lookup, linear insert, because all later elements must be shifted). Appending can be done cheaply for both structures, though (see the black box sidebar on list in Chapter 2).

• Set and vertex covers. A vertex cover is a set of vertices that cover (that is, are adjacent to) all the edges of the graph. A set cover is a generalization of this idea, where the nodes are replaced with subsets, and you want to cover the entire set. The problem lies in constraining or minimizing the number of nodes/subsets. Both problems are NP-hard (see Chapter 11).

• Shortest paths. This problem involves finding the shortest path from one node to another, from one node to all the others (or vice versa), or from all nodes to all others. The one-to-one, one-to-all, and all- to-one cases are solved the same way, normally using BFS for unweighted graphs, DAG shortest path for DAGs, Dijkstra’s algorithm for nonnegative edge weights, and Bellman–Ford in the general case. To speed up things in practice (although without affecting the worst-case running time), one can also use bidirectional Dijkstra, or the A* algorithm. For the all pairs shortest paths problem, the algorithms of choice are probably Floyd–Warshall or (for sparse graphs) Johnson’s algorithm. If the edges are nonnegative, Johnson’s algorithm is (asymptotically) equivalent to running Dijkstra’s algorithm from every node (which may be more effective). (For more information on shortest path algorithms, see Chapters 5 and 9.) Note that the longest path problem (for general graphs) can be used to find Hamilton paths, which means that it is NP-hard. This, in fact, means that the shortest path problem is also NP- hard in the general case. If we disallow negative cycles in the graph, however, our polynomial algorithms will work.

###### page 278:
• Sorting and element uniqueness. Sorting is an important operation and an essential subroutine for several other algorithms. In Python, you would normally sort by using the list.sort method or the sorted function, both of which use a highly efficient implementation of the timsort algorithm. Other algorithms include insertion sort, selection sort, and gnome sort (all of which have a quadratic running time), as well as heapsort, mergesort, and quicksort (which are loglinear, although this holds only in the average case for quicksort). For the info on the quadratic sorting algorithms, see Chapter 5; for the loglinear (divide-and-conquer) algorithms, see Chapter 6. Deciding whether a set of real numbers contains duplicates cannot (in the worst case) be solved with a running time better than loglinear. By reduction, neither can sorting.

• The halting problem. Determine whether a given algorithm will terminate with a given input. The problem is undecidable (that is, unsolvable) in the general case (see Chapter 11).

• The knapsack problem and integer programming. The knapsack problem involves choosing a valuable subset of a set of items, under certain constraints. In the (bounded) fractional case, you have a certain amount of some substances, each of which has a unit value (value per unit of weight). You also have a knapsack that can carry a certain maximum weight. The (greedy) solution is to take as much as you can of each substance, starting with the one with the highest unit value. For the integral knapsack problem, you can only take entire items—fractions aren’t allowed. Each item has a weight and a value. For the bounded case (also known as 0-1 knapsack) you have a limited number of objects of each type. (Another perspective would be that you have a fixed set of objects that you either take or not.) In the unbounded case, you can take as many as you want from each of a set of object types (still respecting your carrying capacity, of course). A special case known as the subset sum problem involves selecting a subset of a set of numbers so that the subset has a given sum. These problems are all NP-hard (see Chapter 11), but admit pseudopolynomial solutions based on dynamic programming (see Chapter 8). The fractional knapsack case, as explained, can even be solved in polynomial time using a greedy strategy (see Chapter 7). Integer programming is, in some ways, a generalization of the knapsack problem (and is therefore obviously NP-hard). It is simply linear programming where the variables are constrained to be integers.

• Topological sorting. Order the nodes of a DAG so that all the edges point in the same direction. If the edges represent dependencies, a topological sorting represents an ordering that respects the dependencies. This problem can be solved by a form of reference counting (see Chapter 4) or by using DFS (see Chapter 5).

• Traversal. The problem here is to visit all the objects in some connected structure, usually represented as nodes in a graph or tree. The idea can be either to visit every node or to visit only those needed to solve some problem. The latter strategy of ignoring parts of the graph or tree is called pruning and is used (for example) in search trees and in the branch and bound strategy. For a lot on traversal, see Chapter 5.

## · ALGORITHMS AND DATA STRUCTURES (Appendix B)

• 2-3-trees. Balanced tree structure, allowing insertions, deletions, and search in worst-case Θ(lg n) time. Internal nodes can have two or three children, and the tree is balanced during insertion by splitting nodes, as needed. (See Chapter 6.)

• A*. Heuristically guided single source shortest path algorithm. Suitable for large search spaces. Instead of choosing the node with the lowest distance estimate (as in Dijkstra’s), the node with the lowest heuristic value (sum of distance estimate and guess for remaining distance) is used. Worst-case running time identical to Dijkstra’s algorithm. (See Listing 9-10.)

• AA-tree. 2-3-trees simulated using node rotations in a binary tree with level-numbered nodes. Worst- case running times of Θ(lg n) for insertions, deletions, and search. (See Listing 6-6.)

• Bellman–Ford. Shortest path from one node to all others in weighted graphs. Looks for a shortcut along every edge n times. Without negative cycles, correct answer guaranteed after n –1 iterations. If there’s improvement in the last round, a negative cycle is detected, and the algorithm gives up. Running time Θ(nm). (See Listing 9-2.)

• Bidirectional Dijkstra. Dijkstra’s algorithm run from start and end node simultaneous, with alternating iterations going to each of the two algorithms. The shortest path is found when the two meet up in the middle (although some care must be taken at this point). The worst-case running time is just like for Dijkstra’s algorithm. (See Listings 9-8 and 9-9.)

• Binary search trees. A binary tree structure where each node has a key (and usually an associated value). Descendant keys are partitioned by the node key: smaller keys go in the left subtree, and greater keys go in the right. On the average, the depth of any node is logarithmic, giving an expected insertion and search time of Θ(lg n). Without extra balancing, though (such as in the AA-tree), the tree can become unbalanced, giving linear running times. (See Listing 6-2.)

• Bisection, binary search. A search procedure that works in a manner similar to search trees, by repeated halving the interval of interest in a sorted sequence. The halving is performed by inspecting the middle element, and deciding whether the sought value must lie to the left or right. Running time Θ(lg n). A very efficient implementation can be found in the bisect module. (See Chapter 6.)

• Branch and bound. A general algorithmic design approach. Searches a space of solutions in a depth-first or best-first order by building and evaluating partial solutions. A conservative estimate is kept for the optimal value, while an optimistic estimate is computed for a partial solution. If the optimistic estimate is worse than the conservative one, the partial solution is not extended, and the algorithm backtracks. Often used to solve NP-hard problems. (See Listing 11-2 for a branch-and-bound solution to the 0-1 knapsack problem.)

• Breadth-First Search (BFS). Traversing a graph (possibly a tree) level by level, thereby also identifying (unweighted) shortest path. Implemented by using a FIFO queue to keep track of discovered nodes. Running time Θ(n + m). (See Listing 5-9.)

• Bucket sort. Sort numerical values that are evenly (uniformly) distributed in a given interval by dividing the interval into n equal-sized buckets and placing the values in them. Expected bucket size is constant, so they can be sorted with (for example) insertion sort. Total running time Θ(n). (See Chapter 4.)

• Busacker–Gowen. Finds the cheapest max-flow (or the cheapest flow with a given flow value) in a network by using the cheapest augmenting paths in the Ford–Fulkerson approach. These paths are found using Bellman–Ford or (with some weight adjustments) Dijkstra’s algorithm. The running time in general depends on the maximum flow value and so is pseudopolynomial. For a maximum flow of k, the running time is (assuming Dijkstra’s algorithm is used) O(km lg n). (See Listing 10-5.)

###### page 280:
• Christofides’ algorithm. An approximation algorithm (with an approximation ratio bound of 1.5) for the metric TSP problem. Finds a minimum spanning tree and then a minimum matching2 among the odd- degree nodes of the tree, short-circuiting as needed to make a valid tour of the graph. (See Chapter 11.)

• Counting sort. Sort integers with a small value range (with at most Θ(n) contiguous values) in Θ(n) time. Works by counting occurrences and using the cumulative counts to directly place the numbers in the result, updating the counts as it goes. (See Chapter 4.)

• DAG Shortest Path. Finds the shortest path from one node to all others in a DAG. Works by finding a topological sorting of the nodes and then relaxing all out-edges (or, alternatively, all in-edges) at every node from left to right. Can (because of the lack of cycles) also be used to find longest paths. Running time Θ(n + m). (See Listing 8-4.)

• Depth-First Search (DFS). Traversing a graph (possibly a tree) by going in depth and then backtracking. Implemented by using a LIFO queue to keep track of discovered nodes. By keeping track of discover- and finish-times, DFS can also be used as a subroutine in other algorithms (such as topological sorting or Kosaraju’s algorithm). Running time Θ(n + m). (See Listings 5-4, 5-5, and 5-6.)

• Dijkstra’s algorithm. Find the shortest paths from one node to all others in a weighted graph, as long as there are no negative edge weights. Traverses the graph, repeatedly selecting the next node using a priority queue (a heap). The priority is the current distance estimate of the node. These estimates are updated whenever a shortcut is found from a visited node. The running time is Θ((m+n) lg n), which is simply Θ(m lg n) if the graph is connected.

• Double-ended queues. FIFO queues implemented using linked lists (or linked lists of arrays), so that inserting and extracting objects at either end can be done in constant time. An efficient implementation can be found in the collections.deque class. (See Chapter 5.)

• Dynamic arrays, vectors. The idea of having extra capacity in an array, so appending is efficient. By relocating the contents to a bigger array, growing it by a constant factor, when it fills up, appends can be constant in average (amortized) time. (See Chapter 2.)

• Edmonds–Karp. The concrete instantiation of the Floyd–Warshall method where traversal is performed using BFS. Finds min-cost flow in Θ(nm2) time. (See Listing 10-4.)

• Floyd–Warshall. Finds shortest paths from each nodes to all others. In iteration k, only the first k nodes (in some ordering) are allowed as intermediate nodes along the paths. Extending from k –1 involves checking whether the shortest paths to and from k via the first k –1 nodes is shorter than simply going directly via these nodes. (That is, node k is either used or not, for every shortest path.) Running time is Θ(n3). (See Listing 9-6.)

• Ford–Fulkerson. A general approach to solving max-flow problems. The method involves repeatedly traversing the graph to find a so-called augmenting path, a path along which the flow can be increased (augmented). The flow can be increased along an edge if it has extra capacity, or it can be increased backward across an edge (that is, canceled) if there is flow along the edge. Thus, the traversal can move both forward and backward along the directed edges, depending on the flow across them. The running time depends on the traversal strategy used. (See Listing 10-4.)

• Gale–Shapley. Finds a stable set of marriages given preference rankings for a set of men and women. Any unengaged men propose to the most preferred woman they haven’t proposed to. Each woman will choose her favorite among her current suitors (possibly staying with her fiancé). Can be implemented with quadratic running time. (See Chapter 7.)

###### page 281:
• Gnome sort. A simple sorting algorithm with quadratic running time. Probably not an algorithm you’ll use in practice. (See Listing 3-1.)

• Hashing, hash tables. Look up a key to get the corresponding value, just like in a search tree. Entries are stored in an array, and their positions are found by computing a (pseudorandom, sort of) hash value of the key. Given a good hash function and enough room in the array, the expected running time of insertion, deletion and lookup is Θ(1). (See Chapter 2.)

• Heaps, heapsort. Heaps are efficient priority queues. With linear-time preprocessing, a min- (max-) heap will let you find the smallest (largest) element in constant time and extract or replace it in logarithmic time. Adding an element can also be done in logarithmic time. Conceptually, a heap is a full binary tree where each node is smaller (larger) than its children. When modifications are made, this property can be repaired with Θ(lg n) operations. In practice, heaps are usually implemented using arrays (with nodes encoded as array entries). A very efficient implementation can be found in the heapq module. Heapsort is like selection sort, except that the unsorted region is a heap, so finding the largest element n times gives a total running time of Θ(n lg n). (See Chapter 6.)

• Huffman’s algorithm. Builds Huffman trees, which can be used for building optimal prefix codes, for example. Initially, each element (for example, character in an alphabet) is made into a single-node tree, with a weight equal to its frequency. In each iteration, the two lightest trees are picked, combining them with a new root, giving the new tree a weight equal to the sum of the original two tree weights. This can be done in loglinear time (or, in fact, in linear time if the frequencies are presorted). (See Listing 7-1.)

• Insertion sort. A simple sorting algorithm with quadratic running time. It works by repeatedly inserting the next unsorted element in an initial sorted segment of the array. For small data sets, it can actually be preferable to more advanced (and optimal) algorithms such as merge sort or quicksort. (In Python, though, you should use list.sort or sorted if at all possible.) (See Listing 4-3.)

• Interpolation search. Similar to ordinary binary search, but linear interpolation between the interval endpoints is used to guess the correct position, rather than simply looking at the middle element. The worst-case running time is still Θ(lg n), but the average-case running time is O(lg lg n) for uniformly distributed data. (Mentioned in the “If You’re Curious …” section of Chapter 6.)

• Iterative deepening DFS. Repeated runs of DFS, where each run has a limit to how far it can traverse. For structures with some fanout, the running time will be the same as for DFS or BFS (that is, Θ(n + m)). The point is that it has the advantages of BFS (it finds shortest paths, and explores large state spaces conservatively), with the smaller memory footprint of DFS. (See Listing 5-8.)

• Johnson’s algorithm. Finds shortest paths from every node to all others. Basically runs Dijkstra’s from every node. However, it uses a trick so that it also works with negative edge weights: it first runs Bellman–Ford from a new start node (with added edges to all existing nodes) and then uses the resulting distances to modify the edge weights of the graph. The modified weights are all nonnegative but are set so that the shortest paths in the original graph will also be the shortest paths in the modified graph. Running time Θ(mn lg n). (See Listing 9-4.)

• Kosaraju’s algorithm. Finds strongly connected components, using DFS. First, nodes are ordered by their finish times. Then the edges are reversed, and another DFS is run, selecting start nodes using the first ordering. Running time Θ(n + m). (See Listing 5-10.)

###### page 282:
• Kruskal’s algorithm. Finds a minimum spanning tree by repeatedly adding the smallest remaining edge that doesn’t create a cycle. This cycle checking can (with some cleverness) be performed very efficiently, so the running time is dominated by sorting the edges. All in all, the running time is Θ(m lg n). (See Chapter 7.)

• Linked lists. An alternative to arrays for representing sequences. Although linked lists are cheap (constant time) to modify once you’ve found the right entries, finding those normally takes linear time. Linked lists are implemented sort of like a path, with each node pointing to the next. Note that Python’s list type is implemented as an array, not a linked list. (See Chapter 2.)

• Merge sort. The archetypal divide-and-conquer algorithm. It divides the sequence to be sorted in the middle, sorts the two halves recursively, and then merges the two sorted halves in linear time. The total running time is Θ(n lg n). (See Listing 6-5.)

• Ore’s algorithm. An algorithm for traversing actual mazes in person, by marking passage entries and exits. In many ways similar to iterative deepening DFS or BFS. (See Chapter 5.)

• Prim’s algorithm. Grows a minimum spanning tree by repeatedly adding the node closest to the tree. It is, at core, a traversal algorithm, and uses a priority queue, just like Dijkstra’s algorithm. (See Listing 7-5.)

• Radix sort. Sorts numbers (or other sequences) by digit (element), start with the least significant one. As long as the number of digits is constant, and the digits can be sorted in linear time (using, for example, counting sort), the total running time is linear. It’s important that the sorting algorithm used on the digits is stable. (See Chapter 4.)

• Randomized Select. Finds the median, or, in general, the kth order statistik (the kth smallest element). Works sort of like “half a quicksort.” It chooses a pivot element at random (or arbitrarily), and partitions the other elements to the left (smaller elements) or right (greater elements) of the pivot. The search then continues in the right portion, more or less like binary search. Perfect bisection is not guaranteed, but the expected running time is still linear. (See Listing 6-3.)

• Select. The rather unrealistic, but guaranteed linear, sibling of randomized select. It works as follows: Divide the sequence into groups of five. Find the median in each using insertion sort. Find the median of these medians recursively, using select. Use this median of medians as a pivot, and partition the elements. Now run select on the proper half. In other words, it’s very similar to randomized select—the difference is that it can guarantee that a certain percentage will end up on either side of the pivot, avoiding the totally unbalanced case. Not really an algorithm you’re likely to use in practice, but it’s important to know about. (See Chapter 6.)

• Selection sort. A simple sorting algorithm with quadratic running time. Very similar to insertion sort, but instead of repeatedly inserting the next element into the sorted section, you repeatedly find (that is, select) the largest element in the unsorted region (and swap it with the last unsorted element). (See Listing 4-5.)

• Timsort. A super-duper in-place sorting algorithm based on mergesort. Without any explicit conditions for handling special cases, it is able to take into account partially sorted sequences, including segments that are sorted in reverse, and can therefore sort many real-world sequences faster than what would seem possible. The implementation in list.sort and sorted is also really fast, so if you need to sort something, that’s what you should use. (See Chapter 6.)

• Topological sorting by reference counting. Orders the nodes of a DAG so that all edges go from left to right. This is done by counting the number of in-edges at each node. The nodes with an in-degree of zero are kept in a queue (could just be a set; the order doesn’t matter). Nodes are taken from the queue and placed in the topological sorted order. As you do so, you decrement the counts for the nodes that this node has edges to. If any of them reaches zero, they are placed in the queue. (See Listing 4-11.)

###### page 283:
• Topological sorting with DFS. Another algorithm for sorting DAG nodes topologically. The idea is simple: perform a DFS and sort the nodes by inverse finish time. To easily get a linear running time, you can instead simply append nodes to your ordering as they receive their finish times in DFS. (See Listing 5-7.)

• Tremaux’s algorithm. Like Ore’s algorithm, this is designed to be executed in person, while walking through a maze. The pattern traced by a person executing Tremaux’s algorithm is essentially the same as that of DFS. (See Chapter 5.)

• Twice around the tree. An approximation algorithm for the metric TSP problem, guaranteed to yield a solution with a cost of at most twice the optimum. First, it builds a minimum spanning tree (which is less than the optimum), and then it “walks around” the tree, taking shortcuts to avoid visiting the same edge twice. Because of the metricity, this is guaranteed to be cheaper than walking each edge twice. This last traversal can be implemented by a preorder DFS. (See Listing 11-1.)

## · GRAPH TERMINOLOGY (Appendix C)

###### page 288:
• Another term for an undirected, acyclic graph is forest, and the connected components of a forest are called trees. In other words, a tree is a connected forest (that is, a forest consisting of a single connected component). For example, G1 is a forest with two trees. In a tree, nodes with a degree of one are called leaves (or external nodes),4 while all others are called internal nodes. The larger tree in G1, for example, has three leaves and two internal nodes. The smaller tree consists of only an internal node, although talking about leaves and internal nodes may not make much sense with fewer than three nodes.

• Trees have several interesting and important properties, some of which are dealt with in relation to specific topics throughout the book. I’ll give you a few right here, though. Let T be an undirected graph with n nodes. Then the following statements are equivalent (Exercise 2-9 asks you to show that this is, indeed, the case):

1. T is a tree (that is, it is acyclic and connected).
2. T is acyclic, and has n –1 edges.
3. T is connected, and has n –1 edges.
4. Any two nodes are connected by exactly one path.
5. T is acyclic, but adding any new edge to it will create a cycle.
6. T is connected, but removing any edge yields two connected components.
• In other words, any one of these statements of T, on its own, characterizes it as well as any of the others. If someone tells you that there is exactly one path between any pair of nodes in T, for example, you immediately know that it is connected and has n –1 edges and that it has no cycles. Quite often, we anchor our tree by choosing a root node (or simply root). The result is called a rooted tree, as opposed to the free trees we’ve been looking at so far. (If it is clear from the context whether a tree has a root or not, I will simply use the unqualified term tree in both the rooted and free case.) Singling out a node like this lets us define the notions of up and down. Paradoxically, computer scientists (and graph theorists in general) tend to place the root at the top and the leaves at the bottom. (We probably should get out more …). For any node, up is in the direction of the root (along the single path between the node and the root). Down is then any other direction (automatically in the direction of the leaves). Note that in a rooted tree, the root is considered an internal node, not a leaf, even if it happens to have a degree of one.