Five Sentence Abstract:
It starts out with an overview of big O running times. Next the basics of graphs and trees are discussed, with more advanced discussion later. Many of the common mathematical equations and concepts are covered, as well as more advanced topics including proofs, NP-hardness, and recurrence equations and their applications and running times. Searching and traversing are covered with basic concepts like depth/breadth-first search as well as more advanced algorithms. The appendixes list algorithms, problems, and terminology.
Thoughts:
Not for beginners, but if you have a programming language (not Python per se, as
Python is pretty close to pseudocode) under your belt and aren't afraid of equations
containing funny Greek squiggles, this can provide a lot of insight into problem
solving in general.
Graphs are certainly not anything close to new, but with the advent of social
media they have come into the "mainstream". Even so, the graphs discussed here
don't have much in common with the typical social network. They are used here
to solve problems that are (sometimes) not intuitively seen as graph problems.
It includes a LOT of code snippets that are well commented, as well as lists (in
the appendixes) of common algorithms and problems. It seems like it might be a
worthy reference if you are looking for an algorithmic solution but aren't
sure which one you need. Heck, you might not even be sure what your problem is
and how it might be tweaked to take advantage of a time-tested
solution.
Notes:
// designates my notes.
designates important.
Chapter 1: Introduction
page 7:
 Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009).
Introduction to Algorithms, third edition. MIT Press.
Chapter 2: The Basics
page 13:

The expression O(g), for some function g(n), represents a set of functions,
and a function f(n) is in this set if it satisfies the following condition:
there exists a natural number n0 and a positive constant c such that f(n) ≤
cg(n) for all n ≥ n0.

In other words, if we’re allowed to tweak the constant c (for example, by
running the algorithms on machines of different speeds), the function g will
eventually (that is, at n0) grow bigger than f. See Figure 2-1 for an example.
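// A concrete instance of the definition (my example, not the book's): take f(n) = 3n + 7. With c = 4 and n0 = 7, f(n) ≤ 4n for all n ≥ 7, so f is O(n). A quick sanity check:
>>> f = lambda n: 3*n + 7
>>> all(f(n) <= 4*n for n in range(7, 10**5))
True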

// Omega (Ω), the opposite of big Oh: f(n) ≥ cg(n)
page 14:

Note Our first two asymptotic operators, O and Ω, are each others’ inverses:
if f is O(g ), then g is Ω(f ).

The sets formed by Θ are simply intersections of the other two, that is, Θ(g)
= O(g) ∩ Ω(g).
page 15:
page 16:

In a sum, only the dominating summand matters. For example, Θ(n^2 + n^3 + 42) =
Θ(n^3).

In a product, constant factors don’t matter. For example, Θ(4.2n lg n) = Θ(n
lg n).
page 20:
 If you have a tweak you think will improve your program, try it! Implement
the tweak, and run some experiments. Is there an improvement? And if the
tweak makes your code less readable and the improvement is small, is it really
worth it?
page 21:

Tip 2: For timing things, use timeit. //(module)

If you want to time a function (which could, for example, be a test function
wrapping parts of your code), it may be even easier to use timeit from the
shell command line, using the -m switch:
$ python -m timeit -s "import mymodule as m" "m.myfunction()"
 import cProfile
cProfile.run('main()')

 want to empirically examine the behavior of your algorithm on a given problem
instance, the trace module in the standard library can be useful
page 23:

In many cases, if you can formulate what you’re
working on as a graph problem, you’re (at least) halfway to a solution. And
if your problem instances are in some form expressible as trees, you stand a
good chance of having a really efficient solution.

Trees are just a special kind of graphs, so most algorithms and
representations for graphs will work for them as well.
page 25:
 What this means to us is that accessing elements of a dict or set can be
assumed to take constant (expected) time, which makes them highly useful
building blocks for more complex structures and algorithms.
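// A rough, unscientific way to see the constant-time lookup in practice (my example, using the timeit module mentioned on page 21). The exact numbers vary by machine, but the set lookup stays flat while the list lookup grows with the size of the container:
>>> from timeit import timeit
>>> timeit("10**6 - 1 in s", setup="s = set(range(10**6))", number=1000)
>>> timeit("10**6 - 1 in l", setup="l = list(range(10**6))", number=1000)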
page 26:
 Listing 2-1. A Straightforward Adjacency Set Representation:
 a, b, c, d, e, f, g, h = range(8)
N = [
    {b, c, d, e, f},    # a
    {c, e},             # b
    {d},                # c
    {e},                # d
    {f},                # e
    {c, g, h},          # f
    {f, h},             # g
    {f, g}              # h
]

page 27:
 in our code, N[v] is now a set of v’s neighbors. Assuming you have defined N
as earlier in an interactive interpreter, you can now play around with the
graph:
 >>> b in N[a] # Neighborhood membership
True
>>> len(N[f]) # Degree
3

 // You can use lists instead of sets; lookups are slower, but lists have less
overhead, which can pay off if all you have is small data sets.
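// For comparison, a sketch of the list-based variant of the same graph (the book's adjacency-list listing is along these lines):
a, b, c, d, e, f, g, h = range(8)
N = [
    [b, c, d, e, f],    # a
    [c, e],             # b
    [d],                # c
    [e],                # d
    [f],                # e
    [c, g, h],          # f
    [f, h],             # g
    [f, g]              # h
]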
page 28:

Yet another minor tweak on this idea is to use dicts instead of sets or
lists. The neighbors would then be keys in this dict, and you’d be free to
associate each neighbor (or out-edge) with some extra value, such as an edge
weight. How this might look is shown in Listing 2-3 (with arbitrary edge
weights added).

Listing 2-3. Adjacency dicts with Edge Weights:
 a, b, c, d, e, f, g, h = range(8)
N = [
    {b:2, c:1, d:3, e:9, f:4},  # a
    {c:4, e:3},                 # b
    {d:8},                      # c
    {e:7},                      # d
    {f:5},                      # e
    {c:2, g:2, h:2},            # f
    {f:1, h:6},                 # g
    {f:9, g:8}                  # h
]

 The adjacency dict version can be used just like the others, with the additional edge weight
functionality:
 >>> b in N[a] # Neighborhood membership
True
>>> len(N[f]) # Degree
3
>>> N[a][b]     # Edge weight for (a, b)
2

page 29:
 Listing 2-4. A Dict with Adjacency Sets:
 N = {
    'a': set('bcdef'),
    'b': set('ce'),
    'c': set('d'),
    'd': set('e'),
    'e': set('f'),
    'f': set('cgh'),
    'g': set('fh'),
    'h': set('fg')
}

 This, a dictionary with adjacency lists, is what Guido van Rossum uses in his
article "Python Patterns— Implementing Graphs," which is found online at
http://www.python.org/doc/essays/graphs.html
page 30:
 // inf = float('inf') represents infinity
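// For example (my illustration), float('inf') compares greater than any finite number, which makes it a handy "no edge yet" marker:
>>> inf = float('inf')
>>> inf > 10**300
True
>>> min(3, inf)
3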
page 34:
 class Bunch(dict):
    def __init__(self, *args, **kwds):
        super(Bunch, self).__init__(*args, **kwds)
        self.__dict__ = self

 There are several useful aspects to this pattern. First, it lets you create
and set arbitrary attributes by supplying them as keyword arguments:
>>> x = Bunch(name="Jayne Cobb", position="Public Relations")
>>> x.name
'Jayne Cobb'
Second, by subclassing dict, you get lots of functionality for free, such as iterating over the keys/attributes
or easily checking whether an attribute is present. Here’s an example:
>>> T = Bunch
>>> t = T(left=T(left="a", right="b"), right=T(left="c"))
>>> t.left
{'right': 'b', 'left': 'a'}
>>> t.left.right
'b'
>>> t['left']['right']
'b'
>>> "left" in t.right
True
>>> "right" in t.right
False

page 37:

if your program is fast enough, there’s no need to
optimize it.

// If you are doing multiple lookups in a list, it may be
faster to convert it to a set first.
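// A minimal sketch of that idea (my example): pay a one-time O(n) conversion, then each membership test is expected constant time instead of a linear scan:
>>> L = list(range(10**5))
>>> S = set(L)              # One-time O(n) conversion
>>> 99999 in L              # Linear scan on every lookup
True
>>> 99999 in S              # Expected constant time
True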

(without the optimizations) you need to create a new string for every +=
operation,
 >>> s = ""
>>> for chunk in input():
...     s += chunk

 A better solution would be the following:
 >>> chunks = []
>>> for chunk in input():
...     chunks.append(chunk)
...
>>> s = ''.join(chunks)

 You could even simplify this further: s = ''.join(input())
page 38:
 >>> result = sum(lists, [])    # Tempting one-liner, but quadratic

 >>> res = []
>>> for lst in lists:
...     res.extend(lst)

page 39:
 >>> sum(0.1 for i in range(10)) == 1.0
False


never compare floats for equality.

check whether they are approximately equal. For example, you could take the
approach of assertAlmostEqual from the unittest module:
 >>> def almost_equal(x, y, places=7):
...     return round(abs(x-y), places) == 0
...
>>> almost_equal(sum(0.1 for i in range(10)), 1.0)
True

 There are also tools you can use if you need exact decimal floating-point
numbers, for example the decimal module:
 >>> from decimal import *
>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0")
True

 // Sage (sagemath.org) calculates exact values, but
Sage is much slower.
Chapter 3: Counting 101
page 46:
page 47:
 a technical meaning of the word tournament in graph theory (a complete graph,
where each edge is assigned a direction).
page 48:
 // Round-robin events have n(n - 1)/2 matches, where n is
the number of participants.
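// A quick check of the formula (my example): with n = 8 participants there are 8*7/2 = 28 matches, one per pair:
>>> from itertools import combinations
>>> n = 8
>>> n*(n-1)//2
28
>>> len(list(combinations(range(n), 2)))    # One match per pair of players
28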
page 49:

// In a single elimination/knockout tournament:

In the first round, all the knights are paired, so we have n/2 matches. Only
half of them go on to the second round, so there we have n/4 matches. We keep
on halving until the last match, giving us the sum n/2 + n/4 + n/8 + ... + 1,
or, equivalently, 1 + 2 + 4 + ... + n/2.

The upper limit, h - 1, is the number of rounds (or h, the height of the binary
tree, so 2^h = n).
page 50:
 summing up the powers of two, you always get one less than the next power of
two. For example, 1 + 2 + 4 = 8 – 1, or 1 + 2 + 4 + 8 = 16 – 1, and so
forth.
page 51:
 // Binary searching: how many "cuts" are needed can be
found with the base-2 logarithm.
 >>> from random import randrange
>>> n = 10**90
>>> p = randrange(10**90)
>>> p < n/2                 # Each guess cuts the search space in half
True
>>> from math import log
>>> log(n, 2)               # Base-two logarithm: how many guesses to always win
298.97352853986263

page 53:
page 54:
 // Permutations are calculated with factorials.
 // Combinations are calculated with binomial
coefficients, C(n, k) or "n choose k".
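// For instance (my example), both can be computed from factorials, since C(n, k) = n!/(k!(n-k)!):
>>> from math import factorial
>>> factorial(5)                                    # Permutations of 5 items
120
>>> factorial(5) // (factorial(3) * factorial(2))   # C(5, 3), "5 choose 3"
10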
page 56:
 // Recursion and Recurrences
page 58:
 The general form of the recurrences you’ll normally encounter is T(n) = a
* T(g(n)) + f(n), where a represents the number of recursive calls, g(n) is
the size of each subproblem to be solved recursively, and f(n) is any extra
work done in the function, in addition to the recursive calls.
page 59:
page 60:
page 65:
Chapter 4: Induction and Recursion... and Reduction
page 71:

Reduction means transforming one problem to another. We normally reduce an
unknown problem to one we know how to solve. The reduction may involve
transforming both the input (so it works with the new problem) and the output
(so it’s valid for the original problem).

Induction (or, mathematical induction) is used to show that a statement is
true for a large class of objects (often the natural numbers). We do this by
first showing it to be true for a base case (such as the number 1) and then
showing that it “carries over” from one object to the next (if it’s true for n
–1, then it’s true for n).

Recursion is what happens when a function calls itself. Here we need to make
sure the function works correctly for a (nonrecursive) base case and that it
combines results from the recursive calls into a valid solution.
page 78:
 There is usually less overhead with using a loop than with recursion (so it
can be faster), and in most languages (Python included), there is a limit to
how deep the recursion can go (the maximum stack depth).
page 79:
 '''Listing 4-1. Recursive Insertion Sort'''
def ins_sort_rec(seq, i):
    if i == 0: return                           # Base case - do nothing
    ins_sort_rec(seq, i-1)                      # Sort 0..i-1
    j = i                                       # Start "walking" down
    while j > 0 and seq[j-1] > seq[j]:          # Look for OK spot
        seq[j-1], seq[j] = seq[j], seq[j-1]     # Keep moving seq[j] down
        j -= 1                                  # Decrement j

 '''Listing 4-2. Insertion Sort'''
def ins_sort(seq):
    for i in range(1,len(seq)):                 # 0..i-1 sorted so far
        j = i                                   # Start "walking" down
        while j > 0 and seq[j-1] > seq[j]:      # Look for OK spot
            seq[j-1], seq[j] = seq[j], seq[j-1] # Keep moving seq[j] down
            j -= 1                              # Decrement j

 7 These algorithms aren’t all that useful, but they’re commonly taught,
because they serve as excellent examples. Also, they’re classics, so any
algorist should know how they work.
page 80:

Listing 4-3. Recursive Selection Sort

Listing 4-4. Selection Sort

The recursive implementation explicitly represents
the inductive hypothesis (as a recursive call), while the iterative version
explicitly represents repeatedly performing the inductive step.
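// The notes only name Listings 4-3 and 4-4; here is a minimal sketch of the two selection sort variants they refer to (my reconstruction, not the book's exact code), mirroring the insertion sort pair above:
def sel_sort_rec(seq, i):                       # Sort seq[0..i] (recursive)
    if i == 0: return                           # Base case - one element
    max_j = i                                   # Index of largest value so far
    for j in range(i):                          # Look for a larger value
        if seq[j] > seq[max_j]: max_j = j
    seq[i], seq[max_j] = seq[max_j], seq[i]     # Swap the largest into place
    sel_sort_rec(seq, i-1)                      # Sort 0..i-1

def sel_sort(seq):                              # Iterative version
    for i in range(len(seq)-1, 0, -1):          # n-1, n-2, ..., 1
        max_j = i
        for j in range(i):
            if seq[j] > seq[max_j]: max_j = j
        seq[i], seq[max_j] = seq[max_j], seq[i]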

These are two major variations of reductions: reducing to a different problem
and reducing to a shrunken version of the same. If you think of the
subproblems as vertices and the reductions as edges, you get the subproblem
graph discussed in Chapter 2, a concept I’ll revisit several times. (It’s
especially important in Chapter 8.)
page 81:
page 82:
 in essence, a mapping like this is just a position (0...n-1) associated with
each element (also 0...n-1), and we can implement this using a simple list.
For example, the example in Figure 4-4 (if a = 0, b = 1, ...) can be
represented as follows:
 >>> M = [2, 2, 0, 5, 3, 5, 7, 4]
>>> M[2]
0

 Tip When possible, try to use a representation that is as specific to your
problem as possible. More general representations can lead to more
bookkeeping and complicated code; if you use a representation that implicitly
embodies some of the constraints of the problem, both finding and implementing
a solution can be much easier.
page 83:
'''Listing 4-5. A Naïve Implementation of the Recursive Algorithm Idea
for Finding a Maximum Permutation'''
def naive_max_perm(M, A=None):
    if A is None:                       # The elt. set not supplied?
        A = set(range(len(M)))          # A = {0, 1, ... , n-1}
    if len(A) == 1: return A            # Base case - single-elt. A
    B = set(M[i] for i in A)            # The "pointed to" elements
    C = A - B                           # "Not pointed to" elements
    if C:                               # Any useless elements?
        A.remove(C.pop())               # Remove one of them
        return naive_max_perm(M, A)     # Solve remaining problem
    return A                            # All useful - return all
>>> naive_max_perm(M)
{0, 2, 5}   # So, a, c, and f can take part in the permutation.

page 84:
 Tip In recent versions of Python, the collections module contains the Counter
class, which can count (hashable) objects for you.
page 85:
 from collections import defaultdict
def counting_sort(A, key=lambda x: x):
    B, C = [], defaultdict(list)        # Output and "counts"
    for x in A:
        C[key(x)].append(x)             # "Count" key(x)
    for k in range(min(C), max(C)+1):   # For every key in the range
        B.extend(C[k])                  # Add values in sorted order
    return B


By default, I’m just sorting objects based on their values. By supplying a
key function, you can sort by anything you’d like.

// key() in C[].append() is interesting.
page 87:

the tasks to be undertaken will have dependencies that partially restrict
their ordering.

Such dependencies are (as mentioned in Chapter 2) easily represented as a
directed acyclic graph (DAG), and finding an ordering that respects the
dependencies (so that all the edges point forward in the ordering) is called
topological sorting. Figure 4-5 illustrates the concept.
page 88:
 Tip If you’re using a Unix system of some sort, you can play around with
topological sorting of graphs described in plaintext files, using the tsort
command.
page 89:
 Tip If a problem reminds you of a problem or an algorithm you already know,
that’s probably a good sign. In fact, building a mental archive of problems
and algorithms is one of the things that can make you a skilled algorist. If
you’re faced with a problem and you have no immediate associations, you could
systematically consider any relevant (or semi-relevant) techniques you know and
look for reduction potential.
page 91:
 A balance factor is defined for each internal node and is set to the
difference between the heights of the left and right subtrees, where height
is the greatest distance from the node (downward) to a leaf. For example, the
left child of the root in Figure 4-6 has a balance factor of –2 because its
left subtree is a leaf (with a height of 0), while its right child has a height
of 2.
page 93:
 relaxation. You are in an airport, and you can reach several other airports
by plane. From each of those airports, you can take the train to several
towns and cities. Let’s say that you have a dict or list of flight times, A, so
that A[u] is the time it will take you to get to airport u. Similarly, B[u][v]
will give you the time it takes to get from airport u to town v by train. (B
can be a list of lists or a dict of dicts, for example; see Chapter 2).
Consider the following randomized way of estimating the time it will take you
to get to each town, C[v]:
 for v in range(n):
    C[v] = float('inf')
for i in range(N):
    u, v = randrange(n), randrange(n)
    C[v] = min(C[v], A[u] + B[u][v])    # Relax

 The idea here is to repeatedly see whether we can improve our estimate for
C[v] by choosing another route: first go to u by plane, and then take the
train to v. If that gives us a better total time, we update C. As long as N is
really large, we will eventually get the right answer for every town.
page 95:

If you can (easily) reduce A to B, then B is at least as hard as A.

If you want to show that X is hard and you know that Y is hard, reduce Y to
X.
page 97:
 4-1. A graph that you can draw in the plane without any edges crossing each
other is called planar. Such a drawing will have a number of regions, areas
bounded by the edges of the graph, as well as the (infinitely large) area
around the graph. If the graph has V, E, and F nodes, edges, and regions,
respectively, Euler's formula for connected planar graphs says that V - E + F = 2.

Prove that this is correct using induction.

4-4. A node is called central if the greatest (unweighted) distance from that
node to any other in the same graph is minimum. That is, if you sort the
nodes by their greatest distance to any other node, the central nodes will be
at the beginning.
Chapter 5: Traversal: The Skeleton Key of Algorithmics
page 104:
 Listing 5-1. Walking Through a Connected Component of a Graph Represented
Using Adjacency Sets
 def walk(G, s, S=set()):                   # Walk the graph from node s
    P, Q = dict(), set()                    # Predecessors + "to do" queue
    P[s] = None                             # s has no predecessor
    Q.add(s)                                # We plan on starting with s
    while Q:                                # Still nodes to visit
        u = Q.pop()                         # Pick one, arbitrarily
        for v in G[u].difference(P, S):     # New nodes?
            Q.add(v)                        # We plan to visit them!
            P[v] = u                        # Remember where we came from
    return P                                # The traversal tree


Tip Objects of the set type let you perform set operations on other types as
well! For example, in Listing 5-1, I use the dict P
as if it were a set (of its keys) in the difference method. This works with
other iterables too, such as list or deque, for example, and with other set
methods, such as update.

The S parameter basically represents a "forbidden zone": a set of nodes
that we may not have visited during our traversal but that we have been told
to avoid. As for the dictionary P, I’m using it to represent predecessors. Each
time we add a new node to the queue, I set its predecessor; that is, I make
sure I remember where I came from when I found it. These predecessors will,
when taken together, form the traversal tree. If you don’t care about the tree,
you’re certainly free to use a set of visited nodes instead.

Note Whether you add nodes to this sort of “visited” set at the same time as
adding them to the queue or later, when you pop them from the queue, is
generally not important. It does have consequences for where you need to add an
“if visited ...” check, though.
page 105:
 Listing 5-2. Finding Connected Components
 def components(G):                  # The connected components
    comp = []
    seen = set()                    # Nodes we've already seen
    for u in G:                     # Try every starting point
        if u in seen: continue      # Seen? Ignore it
        C = walk(G, u)              # Traverse component
        seen.update(C)              # Add keys of C to seen
        comp.append(C)              # Collect the components
    return comp

 The walk function returns a predecessor map (traversal tree) for the nodes it
has visited, and I collect those in the comp list (of connected components).
I use the seen set to make sure I don’t traverse from a node in one of the
earlier components. Note that even though the operation seen.update(C) is
linear in the size of C, the call to walk has already done the same amount of
work, so asymptotically, it doesn’t cost us anything. All in all, finding the
components like this is Θ(E + V), because every edge and node has to be
explored.4 The walk function doesn’t really do all that much. Still, in many
ways, this simple piece of code is the backbone of this
chapter and (as the chapter title says) a skeleton key to understanding many of
the other algorithms you’re going to learn. It might be worth studying it a
bit.
page 108:
page 110:

Trémaux’s algorithm. In this formulation, it is commonly known as depth-first
search, and it is one of the most fundamental (and fundamentally important)
traversal algorithms.

Listing 5-4. Recursive Depth-First Search
 def rec_dfs(G, s, S=None):
    if S is None: S = set()     # Initialize the history
    S.add(s)                    # We've visited s
    for u in G[s]:              # Explore neighbors
        if u in S: continue     # Already visited: Skip
        rec_dfs(G, u, S)        # New: Explore recursively

 [In depth first search] Once we start working with one node, we make sure we
traverse all other nodes we can reach from it before moving on.
page 111:
 def iter_dfs(G, s):
    S, Q = set(), []            # Visited-set and queue
    Q.append(s)                 # We plan on visiting s
    while Q:                    # Planned nodes left?
        u = Q.pop()             # Get one
        if u in S: continue     # Already visited? Skip it
        S.add(u)                # We've visited it now
        Q.extend(G[u])          # Schedule all neighbors
        yield u                 # Report u as visited

 This version makes use of a stack (a last-in, first-out, or LIFO, queue, here implemented by a
list, using append and pop). Because the queue is not a set, we risk having the
same node scheduled for more than one visit; this is solved by checking a node for
membership in S before adding its neighbors. There is also a yield statement, which
will let you iterate over the graph nodes in DFS order. For example, if you had
the graph from Figure 2-3 in the variable G, you could try the following:
 >>> list(iter_dfs(G, 0))
[0, 5, 7, 6, 2, 3, 4, 1]

 both DFS and the other traversal algorithms work just as well for directed
graphs. However, if you use DFS on a directed graph, you can’t expect it to
explore an entire connected component. For example, for the graph in Figure
2-3, traversing from any other start node than "a" would mean that "a" would
never be seen, because it has no in-edges.
page 112:
 Listing 5-6. A General Graph Traversal Function
 def traverse(G, s, qtype=set):
    S, Q = set(), qtype()
    Q.add(s)
    while Q:
        u = Q.pop()
        if u in S: continue
        S.add(u)
        for v in G[u]:
            Q.add(v)
        yield u

 >>> list(traverse(G, 0, stack))
[0, 5, 7, 6, 2, 3, 4, 1]
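// The stack used in the call above isn't a builtin; a minimal sketch of what it needs to be (the book defines its own equivalent) is a list with an add alias, so traverse can call Q.add() and Q.pop():
class stack(list):
    add = list.append       # Gives list the add() method that traverse() expects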

 def dfs(G, s, d, f, S=None, t=0):
    if S is None: S = set()         # Initialize the history
    d[s] = t; t += 1                # Set discover time
    S.add(s)                        # We've visited s
    for u in G[s]:                  # Explore neighbors
        if u in S: continue         # Already visited. Skip
        t = dfs(G, u, d, f, S, t)   # Recurse; update timestamp
    f[s] = t; t += 1                # Set finish time
    return t                        # Return timestamp

page 113:
 If we perform DFS on a DAG, we could simply sort the nodes based on their
descending finish times, and they’d be topologically sorted
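// dfs_topsort isn't shown in these notes but is used by scc in Listing 5-10 below. A sketch of how it might look, based on the descending-finish-time idea above (my reconstruction, not the book's exact code):
def dfs_topsort(G):
    S, res = set(), []              # History and result (finish order)
    def recurse(u):                 # DFS subroutine
        if u in S: return
        S.add(u)
        for v in G[u]:
            recurse(v)
        res.append(u)               # u is finished: all descendants done
    for u in G:                     # Cover every node
        recurse(u)
    res.reverse()                   # Descending finish times
    return res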
page 114:
 If we’re looking for the shortest paths (disregarding edge weights, for now)
from our start node to all the others, DFS will, most likely, give us the
wrong answer.
page 115:
page 116:
 Listing 5-8. Iterative Deepening Depth-First Search
 def iddfs(G, s):
    yielded = set()                             # Visited for the first time
    def recurse(G, s, d, S=None):               # Depth-limited DFS
        if s not in yielded:
            yield s
            yielded.add(s)
        if d == 0: return                       # Max depth zero: Backtrack
        if S is None: S = set()
        S.add(s)
        for u in G[s]:
            if u in S: continue
            for v in recurse(G, u, d-1, S):     # Recurse with depth-1
                yield v
    n = len(G)
    for d in range(n):                          # Try all depths 0..V-1
        if len(yielded) == n: break             # All nodes seen?
        for u in recurse(G, s, d):
            yield u

 the only salient difference [in BFS] from DFS: we’ve replaced LIFO with FIFO.
page 117:
 Listing 5-9. Breadth-First Search
 from collections import deque
def bfs(G, s):
    P, Q = {s: None}, deque([s])    # Parents and FIFO queue
    while Q:
        u = Q.popleft()             # Constant-time for deque
        for v in G[u]:
            if v in P: continue     # Already has parent
            P[v] = u                # Reached from u: u is parent
            Q.append(v)
    return P


// 5-9 is similar to 5-5. Deque over list.

To extract a path to a node u, you can simply “walk backward” in P:
 >>> path = [u]
>>> while P[u] is not None:
...     path.append(P[u])
...     u = P[u]
...
>>> path.reverse()

 Tip One way of visualizing BFS and DFS is as browsing the Web. DFS is what
you get if you keep following links and then use the Back button once you’re
done with a page. The backtracking is a bit like an “undo.” BFS is more like
opening every link in a new window (or tab) behind those you already have and
then closing the windows as you finish with each page.
page 118:

Python has a deque class in the collections module in the standard library.
In addition to methods such as append, extend, and pop, which are performed
on the right side, it has left equivalents, called appendleft , extendleft, and
popleft. Internally, the deque is implemented as a doubly linked list of
blocks, each of which is an array of individual elements.

A connected component is a maximal subgraph where all nodes can reach each
other if you ignore edge directions (or if the graph is undirected). To get
strongly connected components, though, you need to follow the edge directions;
so, SCCs are the maximal subgraphs where there is a directed path from any node
to any other.
page 119:
page 120:
 Listing 5-10. Kosaraju’s Algorithm for Finding Strongly Connected Components
 def tr(G):                          # Transpose (rev. edges of) G
    GT = {}
    for u in G: GT[u] = set()       # Get all the nodes in there
    for u in G:
        for v in G[u]:
            GT[v].add(u)            # Add all reverse edges
    return GT

def scc(G):
    GT = tr(G)                      # Get the transposed graph
    sccs, seen = [], set()
    for u in dfs_topsort(G):        # DFS starting points
        if u in seen: continue      # Ignore covered nodes
        C = walk(GT, u, seen)       # Don't go "backward" (seen)
        seen.update(C)              # We've now seen C
        sccs.append(C)              # Another SCC found
    return sccs

 If you try running scc on the graph in Figure 5-7, you should get the three
sets {a, b, c, d}; {e, f, g}; and {i, h}.
Chapter 6: Divide, Combine, and Conquer
page 126:
 // Using weak induction generates a recurrence like
T(n) = T(n-1) + T(1) + n: each step reduces the problem size by 1 and ends up with
quadratic running time. T(n) = 2T(n/2) + n is the equation for dividing the
subproblems more evenly (halving instead of shrinking by 1 each step); this runs in
Θ(n log n) time. See Figures 6-1 and 6-2.
page 127:
 // Weak induction is 1 step at a time reduction and
strong induction is divide and conquer?
page 128:
 Listing 6-1. A General Implementation of the Divide and Conquer Scheme
 def divide_and_conquer(S, divide, combine):
    if len(S) == 1: return S
    L, R = divide(S)
    A = divide_and_conquer(L, divide, combine)
    B = divide_and_conquer(R, divide, combine)
    return combine(A, B)

page 129:
 // bisect module in standard library (written in
C).
 >>> from bisect import bisect
>>> a = [0, 2, 3, 5, 6, 8, 8, 9]
>>> bisect(a, 5)
4

 it doesn’t return the position of the 5 that’s already there. Rather, it
reports the position to insert the new 5, making sure it’s placed after all
existing items with the same value. In fact, bisect is another name for
bisect_right, and there’s also a bisect_left
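// Continuing the same example (my additions): bisect_left gives the leftmost valid position, and insort inserts while keeping the list sorted:
>>> from bisect import bisect_left, insort
>>> a = [0, 2, 3, 5, 6, 8, 8, 9]
>>> bisect_left(a, 5)       # Before the existing 5
3
>>> insort(a, 7)            # Insert, keeping a sorted
>>> a
[0, 2, 3, 5, 6, 7, 8, 8, 9]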
page 131:
page 133:
 Listing 6-2. Insertion into and Search in a Binary Search Tree
 class Node:
    lft = None
    rgt = None
    def __init__(self, key, val):
        self.key = key
        self.val = val

def insert(node, key, val):
    if node is None: return Node(key, val)      # Empty leaf: add node here
    if node.key == key: node.val = val          # Found key: replace val
    elif key < node.key:                        # Less than the key?
        node.lft = insert(node.lft, key, val)   # Go left
    else:                                       # Otherwise...
        node.rgt = insert(node.rgt, key, val)   # Go right
    return node

def search(node, key):
    if node is None: raise KeyError             # Empty leaf: it's not here
    if node.key == key: return node.val         # Found key: return val
    elif key < node.key:                        # Less than the key?
        return search(node.lft, key)            # Go left
    else:                                       # Otherwise...
        return search(node.rgt, key)            # Go right

class Tree:                                     # Simple wrapper
    root = None
    def __setitem__(self, key, val):
        self.root = insert(self.root, key, val)
    def __getitem__(self, key):
        return search(self.root, key)
    def __contains__(self, key):
        try: search(self.root, key)
        except KeyError: return False
        return True

 >>> tree = Tree()
>>> tree["a"] = 42
>>> tree["a"]
42
>>> "b" in tree
False


Note The implementation in Listing 6-2 does not permit the tree to contain
duplicate keys. If you insert a new value with an existing key, the old value
is simply overwritten. This could easily be changed, because the tree structure
itself does not preclude duplicates.

Bisection is fast, with little overhead, but works only on sorted arrays
(such as Python lists). And sorted arrays are hard to maintain; adding
elements takes linear time.

Search trees have more overhead but are dynamic and let you insert and remove
elements.

In many cases, though, the clear winner will be the hash table, in the form
of dict. Its average asymptotic running time is constant (as opposed to the
logarithmic running time of bisection and search trees), and it is very close
to that in practice, with little overhead.

search trees will let you access your values in sorted order—either all of
them or just a portion.

Trees can also be extended to work in multiple dimensions, where hashing may
be hard to achieve.

if you want the entry that is closest to your lookup key, a search tree would
be the way to go.

Actually, more flexible may not be entirely correct. There are many objects
(such as complex numbers) that can be hashed but that cannot be compared for
size.
page 134:

Tip If you’re looking for the k smallest (or largest) objects in an iterable
in Python, you would probably use the nsmallest (or nlargest) function from
the heapq module if your k is small, relative to the total number of objects.
If the k is large, you should probably sort the sequence (either using the sort
method or using the sorted function) and pick out the k first objects.
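// For example (my illustration of the tip):
>>> from heapq import nsmallest, nlargest
>>> seq = [5, 3, 8, 1, 9, 2, 7]
>>> nsmallest(3, seq)
[1, 2, 3]
>>> nlargest(2, seq)
[9, 8]
>>> sorted(seq)[:3]     # Better when k is large relative to len(seq)
[1, 2, 3]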

 Listing 6-3. A Straightforward Implementation of Partition and Select
 def partition(seq):
    pi, seq = seq[0], seq[1:]           # Pick and remove the pivot
    lo = [x for x in seq if x <= pi]    # All the small elements
    hi = [x for x in seq if x > pi]     # All the large ones
    return lo, pi, hi                   # pi is "in the right place"

def select(seq, k):
    lo, pi, hi = partition(seq)         # [<= pi], pi, [> pi]
    m = len(lo)
    if m == k: return pi                # We found the kth smallest
    elif m < k:                         # Too far to the left
        return select(hi, k-m-1)        # Remember to adjust k
    else:                               # Too far to the right
        return select(lo, k)            # Just use original k here

page 137:
 Timsort is a close relative to merge sort. It’s an in-place algorithm, in
that it merges segments and leaves the result in the original array (although
it uses some auxiliary memory during the merging). Instead of simply sorting
the array half-and-half and then merging those, though, it starts at the
beginning, looking for segments that are already sorted (possibly in reverse),
called runs. In random arrays, there won’t be many, but in many kinds of real
data, there may be a lot—giving the algorithm a clear edge over a plain merge
sort and a linear running time in the best case (and that covers a lot of cases
beyond simply getting a sequence that’s already sorted).
page 138:
 lg(n!) is asymptotically equivalent to n*lg(n)   // lg here is the base-2
logarithm (asymptotically the base doesn't matter anyway).
page 140:
 A region is convex if you can draw a line between any two points inside it,
and the line stays inside the region.
page 144:
 Note The 2-3-tree is a special case of the B-tree, which forms the basis of
almost all database systems, and disk-based trees used in such diverse areas
as geographic information systems and image retrieval. The important extension
is that B-trees can have thousands of keys (and subtrees), and each node is
usually stored as a contiguous block on disk. The main motivation for the large
blocks is to minimize the number of disk accesses.
page 147:
 BLACK BOX: BINARY HEAPS, HEAPQ, AND HEAPSORT
page 149:
 If you like bisection, you should look up interpolation search, which for
uniformly distributed data has an average-case running time of O(lg lg n).
For implementing sets (that is, efficient membership checking) other than
sorted sequences, search trees and hash tables, you could have a look at Bloom
filters. If you like search trees and related structures, there are lots of
them out there. You could find tons of different balancing mechanisms (red-black
trees, AVL trees, splay trees), some of them randomized (treaps), and
some of them only abstractly representing trees (skip lists). There are also
whole families of specialized tree structures for indexing multidimensional
coordinates (so-called spatial access methods) and distances (metric access
methods). Other tree structures to check out are interval trees, quadtrees,
and octrees.
Chapter 7: Greed Is Good? Prove It!
page 151:
 So-called greedy algorithms are shortsighted, in that they make each choice
in isolation, doing what looks good right here, right now. In many ways,
eager or impatient might be better names for them.
page 153:
 // Making change: denominations range from $100 bills down to 1-cent coins, and the
amount owed is in cents. It works but is brittle: reordering denom or switching to
other currencies can break it.
 >>> denom = [10000, 5000, 2000, 1000, 500, 200, 100, 50, 25, 10, 5, 1]
>>> owed = 56329
>>> payed = []
>>> for d in denom:
...     while owed >= d:
...         owed -= d
...         payed.append(d)
...
>>> sum(payed)
56329
>>> len(payed)
14

page 156:
 If we view each object individually, this is often called 0-1 knapsack,
because we can take 0 or 1 of each object.
page 157:

Your goal for the tree structure is still the same. How can you minimize the
sum of depth(u) × weight(u) over all leaves u? This problem certainly has
other applications as well. In fact, the original (and most common) application
is compression—representing a text more compactly—through variable-length
codes.

let a partial solution consist of several tree fragments and then repeatedly
combine them. When we combine two trees, we add a new, shared root and give
it a weight equal to the sum of its children (that is, the previous roots).
This is exactly what the numbers inside the nodes in Figure 7-2 mean.
page 158:
 Listing 7-1. Huffman’s Algorithm
 from heapq import heapify, heappush, heappop
from itertools import count
def huffman(seq, frq):
    num = count()
    trees = list(zip(frq, num, seq))            # num ensures valid ordering
    heapify(trees)                              # A min-heap based on freq
    while len(trees) > 1:                       # Until all are combined
        fa, _, a = heappop(trees)               # Get the two smallest trees
        fb, _, b = heappop(trees)
        n = next(num)
        heappush(trees, (fa+fb, n, [a, b]))     # Combine and re-add them
    return trees[0][-1]                         # The tree itself (last tuple item)

 If a future version of the heapq library lets you use a key function, such as
in list.sort, you’d no longer need this tuple wrapping at all, of course.
page 159:
 Here’s an example of how you might use the code:
 >>> seq = "abcdefghi"
>>> frq = [4, 5, 6, 9, 11, 12, 15, 16, 20]
>>> huffman(seq, frq)
[['i', [['a', 'b'], 'e']], [['f', 'g'], [['c', 'd'], 'h']]]


As you can see, the resulting tree structure is equivalent to the one shown
in Figure 7-2.

Listing 7-2. Extracting Huffman Codes from a Huffman Tree
 def codes(tree, prefix=""):
    if len(tree) == 1:
        yield (tree, prefix)                    # A leaf with its code
        return
    for bit, child in zip("01", tree):          # Left (0) and right (1)
        for pair in codes(child, prefix + bit): # Get codes recursively
            yield pair

 The codes function yields (char, code) pairs suitable for use in the dict
constructor, for example. To use such a dict to compress a text, you’d just
iterate over the text and look up each character. To decompress the text, you’d
rather use the Huffman tree directly, traversing it using the bits in the input
for directions (that is, determining whether you should go left or right); I’ll
leave the details as an exercise for the reader.
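// A minimal sketch of the compression direction (my example; decompression via the tree is the part left as an exercise):
>>> code = dict(codes(huffman(seq, frq)))            # char -> bit string
>>> compressed = ''.join(code[c] for c in "deface")  # Look up each character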
page 161:
 [A Minimum Spanning Tree] could be useful in building an electrical grid,
building the core of a road or railroad network, laying out a circuit, or
even performing some forms of clustering (where we’d only almost connect all
the nodes).
page 162:
page 165:
 Listing 7-4. Kruskal’s Algorithm (minimum spanning tree)
 def find(C, u):
    if C[u] != u:
        C[u] = find(C, C[u])                    # Path compression
    return C[u]

def union(C, R, u, v):
    u, v = find(C, u), find(C, v)
    if R[u] > R[v]:                             # Union by rank
        C[v] = u
    else:
        C[u] = v
        if R[u] == R[v]:                        # A tie: Move v up a level
            R[v] += 1

def kruskal(G):
    E = [(G[u][v],u,v) for u in G for v in G[u]]
    T = set()
    C, R = {u:u for u in G}, {u:0 for u in G}   # Comp. reps and ranks
    for _, u, v in sorted(E):
        if find(C, u) != find(C, v):
            T.add((u, v))
            union(C, R, u, v)
    return T

 // find() above returns the representative of a node's component; the check
find(C, u) != find(C, v) makes sure an edge is only added when its endpoints lie in
different fragments of the partial solution (so no cycle is created).
page 167:
 Listing 7-5. Prim’s Algorithm
 from heapq import heappop, heappush
def prim(G, s):
    P, Q = {}, [(0, None, s)]
    while Q:
        _, p, u = heappop(Q)
        if u in P:                  # If we already added u to P, don't add a dupe
            continue
        P[u] = p
        for v, w in G[u].items():
            heappush(Q, (w, u, v))
    return P

 // prim() above adds duplicate nodes with different weights
and discards duplicate heavier edges (u in P: continue).
page 168:

Graham and Hell succinctly explain the algorithms as follows. A partial
solution is a spanning forest, consisting of a set of fragments (components,
trees). Initially, each node is a fragment. In each iteration, edges are added,
joining fragments, until we have a spanning tree.

Algorithm 1: Add a shortest edge that joins two different fragments.

Algorithm 2: Add a shortest edge that joins the fragment containing the root
to another fragment.

Algorithm 3: For every fragment, add the shortest edge that joins it to
another fragment.

For algorithm 2, the root is chosen arbitrarily at the beginning. For
algorithm 3, it is assumed that all edge weights are different to ensure that
no cycles can occur. As you can see, all three algorithms are based on the same
fundamental fact—that the shortest edge over a cut is safe. Also, in order to
implement them efficiently, you need to be able to find shortest edges, detect
whether two nodes belong to the same fragment, and so forth (as explained for
algorithms 1 and 2 in the main text). Still, these brief explanations can be
useful as a memory aid or to get the bird’seye perspective on what’s going on.
page 172:

 // Greedy algorithms take the best element at any given
step, without regard for previous/later choices.

// Often easy to write, hard to prove they are
optimal. Often prove by "keeping up".

Important greedy problems and algorithms discussed in this chapter include
the knapsack problem (selecting a weight-bounded subset of items with maximum
value), where the fractional version can be solved greedily; Huffman trees,
which can be used to create optimal prefix codes, and are built greedily by
combining the smallest trees in the partial solution; and minimum spanning
trees, which can be built using Kruskal’s algorithm (keep adding the smallest
valid edge) or Prim’s algorithm (keep connecting the node that is closest to
your tree).
Chapter 8: Tangled Dependencies and Memoization
page 175:
 Another option—one I think is particularly suited to high-level languages
such as Python—is to implement the recursive formulation directly but to
cache the return values. If a call is made more than once with the same
arguments, the result is simply returned directly from the cache. This is
called memoization.
page 176:
 Listing 8-1. A Naïve Solution to the Longest Increasing Subsequence Problem
 from itertools import combinations
def naive_lis(seq):
    for length in range(len(seq), 0, -1):       # n, n-1, ... , 1
        for sub in combinations(seq, length):   # Subsequences of given length
            if list(sub) == sorted(sub):        # An increasing subsequence?
                return sub                      # Return it!

page 177:
 Listing 8-2. A Memoizing Decorator
 from functools import wraps
def memo(func):
    cache = {}                          # Stored subproblem solutions
    @wraps(func)                        # Make wrap look like func
    def wrap(*args):                    # The memoized wrapper
        if args not in cache:           # Not already computed?
            cache[args] = func(*args)   # Compute & cache the solution
        return cache[args]              # Return the cached solution
    return wrap                         # Return the wrapper

 The idea of a memoized function4 is that it caches its return values. If you
call it a second time with the same parameters, it will simply return the
cached value. You can certainly put this sort of caching logic inside your
function, but the memo function is a more reusable solution. It’s even designed
to be used as a decorator.
page 178:
 >>> @memo
... def fib(i):
...     if i < 2: return 1
...     return fib(i-1) + fib(i-2)
...
>>> fib(100)
573147844013817084101

 The main difference is that the subproblems have tangled dependencies. Or, to
put it in another way: We’re faced with overlapping subproblems. This is
perhaps even clearer in this rather silly relative of the Fibonacci numbers: a
recursive formulation of the powers of two:
 >>> def two_pow(i):
...     if i == 0: return 1
...     return two_pow(i-1) + two_pow(i-1)
...
>>> two_pow(10)
1024
>>> two_pow(100)    # Hangs: exponentially many calls

 Still horrible. Try adding @memo, and you’ll get the answer instantly. Or,
you could try to make the following change, which is actually equivalent:
 >>> def two_pow(i):
...     if i == 0: return 1
...     return 2*two_pow(i-1)
...
>>> print(two_pow(10))
1024
>>> print(two_pow(100))
1267650600228229401496703205376

page 179:
C(n, k) = C(n-1, k-1) + C(n-1, k)   // "n choose k" recurrence (Pascal's rule)
 Figure 8-2 shows how the binomial coefficients can be placed in a triangular
pattern so that each number is the sum of the two above it. This means that
the row (counting from zero) corresponds to n, and the column (the number of
the cell, counting from zero at the left in its row) corresponds to k. For
example, the value 6 corresponds to C(4, 2), and can be calculated as C(3,1) +
C(3,2) = 3 + 3 = 6.
page 180:
 The code for C(n, k) is trivial:
 >>> @memo
... def C(n,k):
...     if k == 0: return 1
...     if n == 0: return 0
...     return C(n-1,k-1) + C(n-1,k)
...
>>> C(4,2)
6
>>> C(10,7)
120
>>> C(100,50)
100891344545564193334812497256

page 181:
 Let’s reverse our algorithm, filling out Pascal’s triangle directly. To keep
things simple, I’ll use a defaultdict as the cache; feel free to use nested
lists, for example. (See also Exercise 8-4.)
 >>> from collections import defaultdict
>>> n, k = 10, 7
>>> C = defaultdict(int)
>>> for row in range(n+1):
...     C[row,0] = 1
...     for col in range(1,k+1):
...         C[row,col] = C[row-1,col-1] + C[row-1,col]
...
>>> C[n,k]
120


Basically the same thing is going on. The main difference is that we need to
figure out which cells in the cache need to be filled out, and we need to
find a safe order to do it in so that when we’re about to calculate C[row,col],
the cells C[row-1,col-1] and C[row-1,col] are already calculated. With the
memoized function, we needn’t worry about either issue: it will calculate
whatever it needs recursively.

One useful way to visualize dynamic programming algorithms with one or two
subproblem parameters (such as n and k, here) is to use a (real or imagined)
spreadsheet. For example, try calculating binomial coefficients in a
spreadsheet by filling the first column with ones and filling in the rest of
the first row with zeros. Put the formula =A1+B1 into cell B2, and copy it to
the remaining cells.
page 183:
 Listing 8-3. Recursive, Memoized DAG Shortest Path
 def rec_dag_sp(W, s, t):                        # Shortest path from s to t
    @memo                                       # Memoize d
    def d(u):                                   # Distance from u to t
        if u == t:
            return 0                            # We're there!
        return min(W[u][v]+d(v) for v in W[u])  # Best of every first step
    return d(s)

 Listing 8-4. DAG Shortest Path
 def dag_sp(W, s, t):
    d = {u:float('inf') for u in W}
    d[s] = 0
    for u in topsort(W):                        # topsort: topological sort (Ch. 4)
        if u == t: break
        for v in W[u]:
            d[v] = min(d[v], d[u] + W[u][v])
    return d[t]

page 185:
 Listing 8-5. A Memoized Recursive Solution to the Longest Increasing
Subsequence Problem
 def rec_lis(seq):                               # Longest increasing subseq.
    @memo
    def L(cur):                                 # Longest ending at seq[cur]
        res = 1                                 # Length is at least 1
        for pre in range(cur):                  # Potential predecessors
            if seq[pre] <= seq[cur]:            # A valid (smaller) predec.
                res = max(res, 1 + L(pre))      # Can we improve the solution?
        return res
    return max(L(i) for i in range(len(seq)))   # The longest of them all

 Listing 8-6. A Basic Iterative Solution to the Longest Increasing Subsequence
Problem
 def basic_lis(seq):
    L = [1] * len(seq)
    for cur, val in enumerate(seq):
        for pre in range(cur):
            if seq[pre] <= val:
                L[cur] = max(L[cur], 1 + L[pre])
    return max(L)

page 187:
 Listing 8-7. Longest Increasing Subsequence
 from bisect import bisect
def lis(seq):                               # Longest increasing subseq.
    end = []                                # End-values for all lengths
    for val in seq:                         # Try every value, in order
        idx = bisect(end, val)              # Can we build on an end val?
        if idx == len(end): end.append(val) # Longest seq. extended
        else: end[idx] = val                # Prev. endpoint reduced
    return len(end)                         # The longest we found

 if the lengths of the sequences are m and n and the length of the longest
common subsequence is k, the edit distance is m + n - 2k.
page 188:
 Listing 8-8. A Memoized Recursive Solution to the LCS Problem
 def rec_lcs(a,b):                               # Longest common subsequence
    @memo                                       # L is memoized
    def L(i,j):                                 # Prefixes a[:i] and b[:j]
        if min(i,j) < 0: return 0               # One prefix is empty
        if a[i] == b[j]: return 1 + L(i-1,j-1)  # Match! Move diagonally
        return max(L(i-1,j), L(i,j-1))          # Chop off either a[i] or b[j]
    return L(len(a)-1,len(b)-1)                 # Run L on entire sequences

page 189:
 Listing 8-9. An Iterative Solution to the Longest Common Subsequence (LCS)
 def lcs(a,b):
    n, m = len(a), len(b)
    pre, cur = [0]*(n+1), [0]*(n+1)             # Previous/current row
    for j in range(1,m+1):                      # Iterate over b
        pre, cur = cur, pre                     # Keep prev., overwrite cur.
        for i in range(1,n+1):                  # Iterate over a
            if a[i-1] == b[j-1]:                # Last elts. of pref. equal?
                cur[i] = pre[i-1] + 1           # L(i,j) = L(i-1,j-1) + 1
            else:                               # Otherwise...
                cur[i] = max(pre[i], cur[i-1])  # max(L(i,j-1),L(i-1,j))
    return cur[n]                               # L(n,m)
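// A quick check of the m + n - 2k relation using lcs above (my example; this counts only insertions and deletions as edits):
>>> a, b = "kitten", "sitting"
>>> k = lcs(a, b)               # Length of the longest common subsequence
>>> k
4
>>> len(a) + len(b) - 2*k       # Edit distance with insert/delete only
5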

page 190:
 Listing 8-10. A Memoized Recursive Solution to the Unbounded Integer Knapsack
Problem
 def rec_unbounded_knapsack(w, v, c):            # Weights, values and capacity
    @memo                                       # m is memoized
    def m(r):                                   # Max val. w/remaining cap. r
        if r == 0: return 0                     # No capacity? No value
        val = m(r-1)                            # Ignore the last cap. unit?
        for i, wi in enumerate(w):              # Try every object
            if wi > r: continue                 # Too heavy? Ignore it
            val = max(val, v[i] + m(r-wi))      # Add value, remove weight
        return val                              # Max over all last objects
    return m(c)                                 # Full capacity available

 The running time here depends on the capacity and the number of objects. Each
memoized call m(r) is computed only once, which means that for a capacity c,
we have Θ(c) calls. Each call goes through all the n objects, so the resulting
running time is Θ(cn).
page 191:
 Listing 8-11. An Iterative Solution to the Unbounded Integer Knapsack Problem
 def unbounded_knapsack(w, v, c):
    m = [0]
    for r in range(1,c+1):
        val = m[r-1]
        for i, wi in enumerate(w):
            if wi > r: continue
            val = max(val, v[i] + m[r-wi])
        m.append(val)
    return m[c]

page 192:
 Listing 8-12. A Memoized Recursive Solution to the 0-1 Knapsack Problem
 def rec_knapsack(w, v, c):                      # Weights, values and capacity
    @memo                                       # m is memoized
    def m(k, r):                                # Max val., k objs and cap r
        if k == 0 or r == 0: return 0           # No objects/no capacity
        i = k-1                                 # Object under consideration
        drop = m(k-1, r)                        # What if we drop the object?
        if w[i] > r: return drop                # Too heavy: Must drop it
        return max(drop, v[i] + m(k-1, r-w[i])) # Include it? Max of in/out
    return m(len(w), c)                         # All objects, all capacity

 Listing 8-13. An Iterative Solution to the 0-1 Knapsack Problem
 def knapsack(w, v, c):                          # Returns solution matrices
    n = len(w)                                  # Number of available items
    m = [[0]*(c+1) for i in range(n+1)]         # Empty max-value matrix
    P = [[False]*(c+1) for i in range(n+1)]     # Empty keep/drop matrix
    for k in range(1,n+1):                      # We can use k first objects
        i = k-1                                 # Object under consideration
        for r in range(1,c+1):                  # Every positive capacity
            m[k][r] = drop = m[k-1][r]          # By default: drop the object
            if w[i] > r: continue               # Too heavy? Ignore it
            keep = v[i] + m[k-1][r-w[i]]        # Value of keeping it
            m[k][r] = max(drop, keep)           # Best of dropping and keeping
            P[k][r] = keep > drop               # Did we keep it?
    return m, P                                 # Return full results

 Now that the knapsack function returns more information, we can use it to
extract the set of objects actually included in the optimal solution. For
example, you could do something like this:
 >>> m, P = knapsack(w, v, c)
>>> k, r, items = len(w), c, set()
>>> while k > 0 and r > 0:
...     i = k-1
...     if P[k][r]:
...         items.add(i)
...         r -= w[i]
...     k -= 1

 In other words, by simply keeping some information about the choices made (in
this case, keeping or dropping the element under consideration), we can
gradually trace ourselves back from the final state to the initial conditions.
In this case, I start with the last object and check P[k][r] to see whether it
was included. If it was, I subtract its weight from r; if it wasn’t, I leave r
alone (as we still have the full capacity available). In either case, I
decrement k, because we’re done looking at the last element and now want to
have a look at the next-to-last element (with the updated capacity). You might
want to convince yourself that this backtracking operation has a linear running
time. The same basic idea can be used in all the examples in this chapter.
page 196:

Problems solved using DP in this chapter include calculating binomial
coefficients, finding shortest paths in DAGs, finding the longest increasing
subsequence of a given sequence, finding the longest common subsequence of two
given sequences, getting the most out of your knapsack with limited and
unlimited supplies of indivisible items, and building binary search trees that
minimize the expected lookup time.

For some sequence comparison goodness in the Python standard library, check
out the difflib module. If you have Sage installed, you could have a look at
its knapsack module (sage.numerical.knapsack).
Chapter 9: From A to B with Edsger and Friends
page 200:
 Listing 9-1. The Relaxation Operation
 inf = float('inf')
def relax(W, u, v, D, P):
    d = D.get(u,inf) + W[u][v]      # Possible shortcut estimate
    if d < D.get(v,inf):            # Is it really a shortcut?
        D[v], P[v] = d, u           # Update estimate and parent
        return True                 # There was a change!

 The idea is that we look for an improvement to the currently known distance
to v by trying to take a shortcut through u. If it turns out not to be a
shortcut, fine. We just ignore it. If it is a shortcut, we register the new
distance and remember where we came from (by setting P[v] to u). I’ve also
added a small extra piece of functionality: the return value indicates whether
any change actually took place; that’ll come in handy later (though you won’t
need it for all your algorithms). Here’s a look at how it works:
>>> D[u]
7
>>> D[v]
13
>>> W[u][v]
3
>>> relax(W, u, v, D, P)
True
>>> D[v]
10
>>> D[v] = 8
>>> relax(W, u, v, D, P)
>>> D[v]
8

 As you can see, the first call to relax improves D[v] from 13 to 10, because
I found a shortcut through u, which I had (presumably) already reached using
a distance of 7, and which was just 3 away from v. Now I somehow discover that
I can reach v by a path of length 8. I run relax again, but this time, no
shortcut is found, so nothing happens. As you can probably surmise, if I now
set D[u] to 4 and ran the same relax again, D[v] would improve, this time to 7,
propagating the improved estimate from u to v.
page 201:
 if you keep randomly relaxing forever, you know that you’ll have the right
answer.
page 202:
 Tip Imagine each node continuously shouting out bids for supplying short
paths to its out-neighbors, based on the shortest path it has gotten itself,
so far. If any node gets a better offer than what it already has, it switches
its path supplier and lowers its bids accordingly.
page 203:
 Listing 9-2. The Bellman-Ford Algorithm
 def bellman_ford(G, s):
    D, P = {s:0}, {}                        # Zero-dist to s; no parents
    for rnd in G:                           # n = len(G) rounds
        changed = False                     # No changes in round so far
        for u in G:                         # For every from-node...
            for v in G[u]:                  # ... and its to-nodes...
                if relax(G, u, v, D, P):    # Shortcut to v from u?
                    changed = True          # Yes! So something changed
        if not changed: break               # No change in round: Done
    else:                                   # Not done before round n?
        raise ValueError('negative cycle')  # Negative cycle detected
    return D, P                             # Otherwise: D and P correct

page 205:
 Listing 9-3. Dijkstra’s Algorithm
 from heapq import heappush, heappop
def dijkstra(G, s):
    D, P, Q, S = {s:0}, {}, [(0,s)], set()  # Est., tree, queue, visited
    while Q:                                # Still unprocessed nodes?
        _, u = heappop(Q)                   # Node with lowest estimate
        if u in S: continue                 # Already visited? Skip it
        S.add(u)                            # We've visited it now
        for v in G[u]:                      # Go through all its neighbors
            relax(G, u, v, D, P)            # Relax the out-edge
            heappush(Q, (D[v], v))          # Add to queue, w/est. as pri
    return D, P                             # Final D and P returned

 Dijkstra’s algorithm may be similar to Prim’s (with another set of priorities
for the queue), but it is also closely related to another old favorite: BFS.
Consider the case where the edge weights are positive integers. Now, replace an
edge that has weight w with w - 1 unweighted edges, connecting a path of dummy
nodes (see Figure 9-2). We’re breaking what chances we had for an efficient
solution (see Exercise 9-3), but we know that BFS will find a correct solution.
In fact, it will do so in a way very similar to Dijkstra’s algorithm: it will
spend an amount of time on each (original) edge proportional to its weight
page 206:

It’s a bit like you had set up a series of dominoes along each edge (the
number of dominoes proportional to the weight), and you then tip the first
domino in the start node. A node may be reached from multiple directions, but
we can see which direction won, by looking at which dominoes lie below the
others.

If we started with this approach, we could see Dijkstra’s algorithm as a way
of gaining performance by “simulating” BFS, or the dominoes (or flowing water or a spreading sound wave, or ...),
without bothering to deal with each dummy node (or domino) individually.
page 207:
 Listing 9-4. Johnson’s Algorithm
 from copy import deepcopy
def johnson(G):                             # All pairs shortest paths
    G = deepcopy(G)                         # Don't want to break original
    s = object()                            # Guaranteed unused node
    G[s] = {v:0 for v in G}                 # Edges from s have zero wgt
    h, _ = bellman_ford(G, s)               # h[v]: Shortest dist from s
    del G[s]                                # No more need for s
    for u in G:                             # The weight from u ...
        for v in G[u]:                      # ... to v ...
            G[u][v] += h[u] - h[v]          # ... is adjusted (non-neg.)
    D, P = {}, {}                           # D[u][v] and P[u][v]
    for u in G:                             # From every u ...
        D[u], P[u] = dijkstra(G, u)         # ... find the shortest paths
        for v in G:                         # For each destination ...
            D[u][v] += h[v] - h[u]          # ... readjust the distance
    return D, P                             # These are two-dimensional

page 208:
 7 A common criterion for calling a graph sparse is that m is O(n), for
example. In this case, though, Johnson’s will (asymptotically) match
Floyd-Warshall as long as m is O(n^2 / lg n), which allows for quite a lot of
edges. On the other hand, Floyd-Warshall has very low constant overhead.
page 209:
 Listing 9-5. A Memoized Recursive Implementation of the Floyd-Warshall Algorithm
 def rec_floyd_warshall(G):                              # All shortest paths
    @memo                                               # Store subsolutions
    def d(u,v,k):                                       # u to v via 1..k
        if k==0: return G[u][v]                         # Assumes v in G[u]
        return min(d(u,v,k-1), d(u,k,k-1) + d(k,v,k-1)) # Use k or not?
    return {(u,v): d(u,v,len(G)) for u in G for v in G} # D[u,v] = d(u,v,n)

page 210:
 Listing 9-6. The Floyd-Warshall Algorithm, Distances Only
 def floyd_warshall(G):
    D = deepcopy(G)                 # No intermediates yet
    for k in G:                     # Look for shortcuts with k
        for u in G:
            for v in G:
                D[u][v] = min(D[u][v], D[u][k] + D[k][v])
    return D

 Listing 9-7. The Floyd-Warshall Algorithm
 def floyd_warshall(G):
    D, P = deepcopy(G), {}
    for u in G:
        for v in G:
            if u == v or G[u][v] == inf:
                P[u,v] = None
            else:
                P[u,v] = u
    for k in G:
        for u in G:
            for v in G:
                shortcut = D[u][k] + D[k][v]
                if shortcut < D[u][v]:
                    D[u][v] = shortcut
                    P[u,v] = P[k,v]
    return D, P

page 211:
 Listing 9-8. Dijkstra’s Algorithm Implemented as a Generator
 def idijkstra(G, s):
    Q, S = [(0,s)], set()               # Queue w/dists, visited
    while Q:                            # Still unprocessed nodes?
        d, u = heappop(Q)               # Node with lowest estimate
        if u in S: continue             # Already visited? Skip it
        S.add(u)                        # We've visited it now
        yield u, d                      # Yield a subsolution/node
        for v in G[u]:                  # Go through all its neighbors
            heappush(Q, (d+G[u][v], v)) # Add to queue, w/est. as pri

page 212:
 Note that I’ve dropped the use of relax completely—it is now implicit in the
heap. Or, rather, heappush is the new relax. Re-adding a node with a better
estimate means it will take precedence over the old entry, which is equivalent
to overwriting the old one with a relax operation. This is analogous to the
implementation of Prim’s algorithm in Chapter 7.
page 213:
 Listing 9-9. The Bidirectional Version of Dijkstra’s Algorithm
 from itertools import cycle
def bidir_dijkstra(G, s, t):
    Ds, Dt = {}, {}                                 # D from s and t, respectively
    forw, back = idijkstra(G,s), idijkstra(G,t)     # The "two Dijkstras"
    dirs = (Ds, Dt, forw), (Dt, Ds, back)           # Alternating situations
    try:                                            # Until one of forw/back ends
        for D, other, step in cycle(dirs):          # Switch between the two
            v, d = next(step)                       # Next node/distance for one
            D[v] = d                                # Remember the distance
            if v in other: break                    # Also visited by the other?
    except StopIteration: return inf                # One ran out before they met
    m = inf                                         # They met; now find the path
    for u in Ds:                                    # For every visited forw-node
        for v in G[u]:                              # ... go through its neighbors
            if not v in Dt: continue                # Is it also back-visited?
            m = min(m, Ds[u] + G[u][v] + Dt[v])     # Is this path better?
    return m                                        # Return the best path

 By now you’ve seen that the basic idea of traversal is pretty versatile, and
by simply using different queues, you get several useful algorithms. For
example, for FIFO and LIFO queues, you get BFS and DFS, and with the
appropriate priorities, you get the core of Prim’s and Dijkstra’s algorithms.
page 215:
 Listing 9-10. The A* Algorithm
 from heapq import heappush, heappop
def a_star(G, s, t, h):
    P, Q = {}, [(h(s), None, s)]            # Preds and queue w/heuristic
    while Q:                                # Still unprocessed nodes?
        d, p, u = heappop(Q)                # Node with lowest heuristic
        if u in P: continue                 # Already visited? Skip it
        P[u] = p                            # Set path predecessor
        if u == t: return d - h(t), P       # Arrived! Ret. dist and preds
        for v in G[u]:                      # Go through all neighbors
            w = G[u][v] - h(u) + h(v)       # Modify weight wrt heuristic
            heappush(Q, (d + w, u, v))      # Add to queue, w/heur as pri
    return inf, None                        # Didn't get to t

 Of course, in order to get any benefit from the A* algorithm, you need a good
heuristic. What this function should be will depend heavily on the exact
problem you’re trying to solve, of course. For example, if you’re navigating a
road map, you’d know that the Euclidean distance, as the crow flies, from a
given node to your destination must be a valid heuristic (lower bound). This
would, in fact, be a usable heuristic for any movement on a flat surface, such
as monsters walking around in a computer game world. If there are lots of blind
alleys and twists and turns, though, this lower bound may not be very accurate.
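// My sketch of the kind of heuristic described above, assuming nodes are
(x, y) coordinates on a flat surface; the straight-line distance never
overestimates the real travel cost, so it is a valid lower bound for a_star.

from math import hypot

def euclid_h(t):                            # Build a heuristic aimed at t
    def h(u):                               # u and t are (x, y) tuples
        return hypot(u[0] - t[0], u[1] - t[1])  # Straight-line distance
    return h

# d, P = a_star(G, s, t, euclid_h(t))       # Assuming edge weights are distances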
page 216:
 Listing 9-11. An Implicit Graph with Word Ladder Paths
 from string import ascii_lowercase as chars
class WordSpace:                            # An implicit graph w/utils
    def __init__(self, words):              # Create graph over the words
        self.words = words
        self.M = M = dict()                 # Reachable words
    def variants(self, wd, words):          # Yield all word variants
        wasl = list(wd)                     # The word as a list
        for i, c in enumerate(wasl):        # Each position and character
            for oc in chars:                # Every possible character
                if c == oc: continue        # Don't replace with the same
                wasl[i] = oc                # Replace the character
                ow = ''.join(wasl)          # Make a string of the word
                if ow in words:             # Is it a valid word?
                    yield ow                # Then we yield it
            wasl[i] = c                     # Reset the character
    def __getitem__(self, wd):              # The adjacency map interface
        if wd not in self.M:                # Cache the neighbors
            self.M[wd] = dict.fromkeys(self.variants(wd, self.words), 1)
        return self.M[wd]
    def heuristic(self, u, v):              # The default heuristic
        return sum(a!=b for a, b in zip(u, v))  # How many characters differ?
    def ladder(self, s, t, h=None):         # Utility wrapper for a_star
        if h is None:                       # Allows other heuristics
            def h(v):
                return self.heuristic(v, t)
        _, P = a_star(self, s, t, h)        # Get the predecessor map
        if P is None:
            return [s, None, t]             # When no path exists
        u, p = t, []
        while u is not None:                # Walk backward from t
            p.append(u)                     # Append every predecessor
            u = P[u]                        # Take another step
        p.reverse()                         # The path is backward
        return p

 >>> wds = set(line.strip().lower() for line in open("/usr/share/dict/words"))
>>> G = WordSpace(wds)
>>> G.ladder('lead', 'gold')
['lead', 'load', 'goad', 'gold']

page 217:

This chapter dealt with finding optimal routes in network-like structures and
spaces—in other words, shortest paths in graphs.

One fundamental tactic common to all the shortest path algorithms is that of
looking for shortcuts, either through a new possible next-to-last node along
a path, using the relax function or something equivalent (most of the
algorithms do this), or by considering a shortcut consisting of two subpaths,
to and from some intermediate node (the strategy of Floyd-Warshall).
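// For reference, my own paraphrase of the relax idea the summary refers to:
check whether reaching v via u is a shortcut, and if so update the distance
estimate and predecessor.

inf = float('inf')

def relax(W, u, v, D, P):                   # Weights, edge (u, v), dists, preds
    d = D.get(u, inf) + W[u][v]             # Possible shortcut estimate
    if d < D.get(v, inf):                   # Is it really a shortcut?
        D[v], P[v] = d, u                   # Update estimate and predecessor
        return True                         # There was a change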
Chapter 10: Matchings, Cuts, and Flows
page 221:
 The max-flow problem and its variations have almost endless applications.
Douglas B. West, in his book Graph Theory (see References, Chapter 2), gives
some rather obvious ones, such as determining the total capacities of road and
communication networks, or even working with currents in electrical circuits.
Kleinberg and Tardos (see “References” in Chapter 1) explain how to apply the
formalism to survey design, airline scheduling, image segmentation, project
selection, baseball elimination, and assigning doctors to holidays. Ahuja,
Magnanti, and Orlin have written one of the most thorough books on the subject
and cover well over a hundred applications in such diverse areas as
engineering, manufacturing, scheduling, management, medicine, defense,
communication, public policy, mathematics, and transportation.
page 224:
 from collections import defaultdict
from itertools import chain
def match(G, X, Y):                         # Maximum bipartite matching
    H = tr(G)                               # The transposed graph
    S, T, M = set(X), set(Y), set()         # Unmatched left/right + match
    while S:                                # Still unmatched on the left?
        s = S.pop()                         # Get one
        Q, P = {s}, {}                      # Start a traversal from it
        while Q:                            # Discovered, unvisited
            u = Q.pop()                     # Visit one
            if u in T:                      # Finished augmenting path?
                T.remove(u)                 # u is now matched
                break                       # and our traversal is done
            forw = (v for v in G[u] if (u,v) not in M)  # Possible new edges
            back = (v for v in H[u] if (v,u) in M)      # Cancellations
            for v in chain(forw, back):     # Along out- and in-edges
                if v in P: continue         # Already visited? Ignore
                P[v] = u                    # Traversal predecessor
                Q.add(v)                    # New node discovered
        while u != s:                       # Augment: Backtrack to s
            u, v = P[u], u                  # Shift one step
            if v in G[u]:                   # Forward edge?
                M.add((u,v))                # New edge
            else:                           # Backward edge?
                M.remove((v,u))             # Cancellation
    return M                                # Matching - a set of edges

page 225:

The easiest (and most common) solution is to specify two special nodes, s and
t, called the source and the sink. (Such a graph is often called an s-t
graph, or an s-t network.)

We can do that by introducing two rules: // Disjoint paths

The number of paths going into any node except s or t must equal the number
of paths going out of that node.

At most one path can go through any given edge.
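// My note: with those two rules, counting edge-disjoint s-t paths is just a
max-flow problem with every capacity set to 1. A quick sketch of my own,
reusing the ford_fulkerson function from later in this chapter:

def disjoint_paths(G, s, t):                    # Count edge-disjoint s-t paths
    U = {u: {v: 1 for v in G[u]} for u in G}    # Same edges, unit capacities
    f = ford_fulkerson(U, s, t)                 # Max flow = number of paths
    out_flow = sum(f[s, v] for v in U[s])       # Flow leaving the source
    in_flow = sum(f[u, s] for u in U if s in U[u])  # Flow entering it (usually 0)
    return out_flow - in_flow                   # Net flow out of s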
page 227:

Maximum flow

More generally, the metaphor here is some form of substance flowing through
the network, from the source to the sink, and the capacity represents the
limit for how many units can flow through a given edge.
page 228:

The magnitude of the flow is the total amount pushed through the network. (This
can be found by finding the net flow out of the source, for example.)

Flow rules

The amount of flow going into any node except s or t must equal the amount
of flow going out of that node.

At most c(e) units of flow can go through any given edge. [c(e) is capacity
of edge]

// Flow BACK is always zero, but flow can move "backwards"
if there is forward flow that can be cancelled.
page 229:
page 230:
 Listing 10-3. Finding Augmenting Paths with BFS and Labeling
 from collections import deque
inf = float('inf')
def bfs_aug(G, H, s, t, f):
    P, Q, F = {s: None}, deque([s]), {s: inf}  # Tree, queue, flow label
    def label(inc):                         # Flow increase at v from u?
        if v in P or inc <= 0: return       # Seen? Unreachable? Ignore
        F[v], P[v] = min(F[u], inc), u      # Max flow here? From where?
        Q.append(v)                         # Discovered - visit later
    while Q:                                # Discovered, unvisited
        u = Q.popleft()                     # Get one (FIFO)
        if u == t: return P, F[t]           # Reached t? Augmenting path!
        for v in G[u]: label(G[u][v] - f[u,v])  # Label along out-edges
        for v in H[u]: label(f[v,u])        # Label along in-edges
    return None, 0                          # No augmenting path found

 Listing 10-4. The Ford-Fulkerson Method (by Default, the Edmonds-Karp Algorithm)
 def ford_fulkerson(G, s, t, aug=bfs_aug):  # Max flow from s to t
    H, f = tr(G), defaultdict(int)          # Transpose and flow
    while True:                             # While we can improve things
        P, c = aug(G, H, s, t, f)           # Aug. path and capacity/slack
        if c == 0: return f                 # No augm. path found? Done!
        u = t                               # Start augmentation
        while u != s:                       # Backtrack to s
            u, v = P[u], u                  # Shift one step
            if v in G[u]: f[u,v] += c       # Forward edge? Add slack
            else: f[v,u] -= c               # Backward edge? Cancel slack

page 231:
 Minimum cuts have several applications that don’t really look like max-flow
problems. Consider, for example, the problem of allocating processes to two
processors in a manner that minimizes the communication between them. Let’s say
one of the processors is a GPU and that the processes have different running
times on the two processors. Some fit the CPU better, while some should be run
on the GPU. However, there might be cases where one fits on the CPU and one on
the GPU, but where the two communicate extensively with each other. In that
case, we might want to put them on the same processor, just to reduce the
communication costs. How would we solve this? We could set up an undirected
flow network with the CPU as the source and the GPU as the sink, for example.
Each process would have an edge to both source and sink, with a capacity equal
to the time it would take to run on that processor. We also add edges between
processes that communicate, with capacities representing the communication
overhead (in extra computation time) of having them on separate processors. The
minimum cut would then distribute the processes on the two processors in such a
way that the total cost is as small as possible—a nontrivial task if we
couldn’t reduce to the min-cut problem.
page 234:
 Listing 10-5. The Busacker-Gowen Algorithm, Using Bellman-Ford for Augmentation
 def busacker_gowen(G, W, s, t):            # Min-cost max-flow
    def sp_aug(G, H, s, t, f):              # Shortest path (Bellman-Ford)
        D, P, F = {s:0}, {s:None}, {s:inf, t:0} # Dist, preds and flow
        def label(inc, cst):                # Label + relax, really
            if inc <= 0: return False       # No flow increase? Skip it
            d = D.get(u, inf) + cst         # New possible aug. distance
            if d >= D.get(v, inf): return False # No improvement? Skip it
            D[v], P[v] = d, u               # Update dist and pred
            F[v] = min(F[u], inc)           # Update flow label
            return True                     # We changed things!
        for rnd in G:                       # n = len(G) rounds
            changed = False                 # No changes in round so far
            for u in G:                     # Every from-node
                for v in G[u]:              # Every forward to-node
                    changed |= label(G[u][v] - f[u,v], W[u,v])
                for v in H[u]:              # Every backward to-node
                    changed |= label(f[v,u], -W[v,u])
            if not changed: break           # No change in round: Done
        else:                               # Not done before round n?
            raise ValueError('negative cycle')  # Negative cycle detected
        return P, F[t]                      # Preds and flow reaching t
    return ford_fulkerson(G, s, t, sp_aug)  # Max-flow with Bellman-Ford

page 237:
 This chapter deals with a single core problem, finding maximum flows in flow
networks, as well as specialized versions, such as maximum bipartite matching
and finding edge-disjoint paths. You also saw how the minimum cut problem is
the dual of the maximum flow problem, giving us two solutions for the price of
one. The minimum cost flow problem is also a close relative.
Chapter 11: Hard Problems and (Limited) Sloppiness
page 242:

Let’s say you’ve come to a small town where one of the main attractions is a
pair of twin mountain peaks. The locals have affectionately called the two
Castor and Pollux, after the twin brothers from Greek and Roman mythology. It
is rumored that there’s a long-forgotten gold mine on the top of Pollux, but
many an adventurer has been lost to the treacherous mountain. In fact, so many
unsuccessful attempts have been made to reach the gold mine that the locals
have come to believe it can’t be done. You decide to go for a walk and take a
look for yourself.

After stocking up on donuts and coffee at a local roadhouse, you set off.
After a relatively short walk, you get to a vantage point where you can see
the mountains relatively clearly. From where you’re standing, you can see that
Pollux looks like a really hellish climb—steep faces, deep ravines, and thorny
brush all around it. Castor, on the other hand, looks like a climber’s dream.
The sides slope gently, and it seems there are lots of handholds all the way to
the top. You can’t be sure, but it seems like it might be a nice climb. Too bad
the gold mine isn’t up there.

You decide to take a closer look, and pull out your binoculars. That’s when
you spot something odd. There seems to be a small tower on top of Castor,
with a zip line down to the peak of Pollux. Immediately, you give up any plans
you had to climb Castor. Why? (If you don’t immediately see it, it might be
worth pondering for a bit.)1

Of course, we’ve seen the exact situation before, in the discussions of
hardness in Chapters 4 and 6. The zip line makes it easy to get from Castor
to Pollux, so if Castor were easy, someone would have found the gold mine
already.2 It’s a simple contrapositive: if Castor were easy, Pollux would be
too; Pollux is not easy, so Castor can’t be either. This is exactly what we do
when we want to prove that a problem (Castor) is hard. We take something we
know is hard (Pollux) and show that it’s easy to solve this hard problem using
our new, unknown one (we uncover a zip line from Castor to Pollux).

As I’ve mentioned before, this isn’t so confusing in itself. It can be easy
to confuse things when we start talking about it in terms of reductions,
though. For example, is it obvious to you that we’re reducing Pollux to Castor
here? The reduction is the zip line, which lets us use a solution to Castor as
if it were a solution to Pollux. In other words, if you want to prove that
problem X is hard, find some hard problem Y and reduce it to X.

Caution The zip line goes in the opposite direction of the reduction. It’s
crucial that you don’t get this mixed up, or the whole idea falls apart. The
term reduction here means basically “Oh, that’s easy, you just ...” In other
words, if you reduce A to B, you’re saying “You want to solve A? That’s easy,
you just solve B.” Or in this case: “You want to scale Pollux? That’s easy,
just scale Castor (and take the zip line).” In other words, we’ve reduced the
scaling of Pollux to the scaling of Castor (and not the other way around).

Hard Problems:

The problem is intractable—any algorithm solving it must be exponential.

We don’t know whether the problem is intractable, but no one has ever been
able to find a polynomial algorithm for it.
page 256:
 I have talked about bipartite graphs several times already—graphs whose nodes
can be partitioned into two sets so that all edges are between the sets (that
is, no edges connect nodes in the same set). Another way of viewing this is as
a two-coloring, where you color every node as either black or white (for
example), but you ensure that no neighbors have the same color. If this is
possible, the graph is bipartite.
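// A quick sketch of that two-coloring test (my own code, assuming an
undirected graph stored as an adjacency map with each edge listed in both
directions): BFS from every uncolored node, giving neighbors the opposite
color; a clash means the graph isn't bipartite.

from collections import deque

def two_color(G):                           # Try to 2-color G
    color = {}                              # node -> 0 or 1
    for s in G:                             # Cover every component
        if s in color: continue
        color[s] = 0
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in G[u]:
                if v not in color:          # Uncolored? Use the other color
                    color[v] = 1 - color[u]
                    Q.append(v)
                elif color[v] == color[u]:  # Same color on both ends?
                    return None             # Not bipartite
    return color                            # A valid two-coloring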
page 261:

// Some hard problems can be made easy, or at least solvable, if we allow
ourselves to get sloppy. An approximation ratio relates the optimal solution
to the best solution you can actually guarantee.

Let’s consider the maximization case. If we know that the optimum will never
be greater than A, and we know our approximation will never be smaller than
B, we can be certain that the ratio of the two will never be greater than A/B.
page 262:

A Hamilton cycle connects all nodes, and the absolutely cheapest way of
connecting all nodes is using a minimum spanning tree.

Listing 11-1. The “Twice Around the Tree” Algorithm, a 2-Approximation for
Metric TSP
 from collections import defaultdict
def mtsp(G, r):                             # 2-approx for metric TSP
    T, C = defaultdict(list), []            # Tree and cycle
    for c, p in prim(G, r).items():         # Build a traversable MSP
        T[p].append(c)                      # Child is parent's neighbor
    def walk(r):                            # Recursive DFS
        C.append(r)                         # Preorder node collection
        for v in T[r]: walk(v)              # Visit subtrees recursively
    walk(r)                                 # Traverse from the root
    return C                                # At least half-optimal cycle

page 265:
 Note The nonlocal keyword, which is used in Listing 11-2, lets you modify a
variable in a surrounding scope, just like global lets you modify the global
scope. However, this feature is new in Python 3.0. If you want similar
functionality in earlier Pythons, simply replace the initial sol = 0 by sol =
[0], and later access the value using the expression sol[0] instead of just
sol.
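// What that workaround looks like in miniature (my illustration, not from
the book): mutate a one-element list instead of rebinding a name, so no
nonlocal is needed and the code runs on older Pythons too.

def counter_demo():
    count = [0]                             # Mutable cell instead of nonlocal
    def bump():
        count[0] += 1                       # Works in Python 2 and 3
    bump(); bump()
    return count[0]                         # -> 2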
page 266:
 Listing 11-2. Solving the Knapsack Problem with the Branch and Bound Strategy
 from __future__ import division
from heapq import heappush, heappop
from itertools import count
def bb_knapsack(w, v, c):
    sol = 0                                 # Solution so far
    n = len(w)                              # Item count
    idxs = list(range(n))
    idxs.sort(key=lambda i: v[i]/w[i], reverse=True)  # Sort by descending unit cost
    def bound(sw, sv, m):                   # Greedy knapsack bound
        if m == n: return sv                # No more items?
        objs = ((v[i], w[i]) for i in idxs[m:])  # Descending unit cost order
        for av, aw in objs:                 # Added value and weight
            if sw + aw > c: break           # Still room?
            sw += aw                        # Add wt to sum of wts
            sv += av                        # Add val to sum of vals
        return sv + (av/aw)*(c - sw)        # Add fraction of last item
    def node(sw, sv, m):                    # A node (generates children)
        nonlocal sol                        # "Global" inside bb_knapsack
        if sw > c: return                   # Weight sum too large? Done
        sol = max(sol, sv)                  # Otherwise: Update solution
        if m == n: return                   # No more objects? Return
        i = idxs[m]                         # Get the right index
        ch = [(sw, sv), (sw+w[i], sv+v[i])] # Children: without/with m
        for sw, sv in ch:                   # Try both possibilities
            b = bound(sw, sv, m+1)          # Bound for m+1 items
            if b > sol:                     # Is the branch promising?
                yield b, node(sw, sv, m+1)  # Yield child w/bound
    num = count()                           # Helps avoid heap collisions
    Q = [(0, next(num), node(0, 0, 0))]     # Start with just the root
    while Q:                                # Any nodes left?
        _, _, r = heappop(Q)                # Get one
        for b, u in r:                      # Expand it ...
            heappush(Q, (b, next(num), u))  # ... and push the children
    return sol                              # Return the solution

page 271:
PEDAL TO THE METAL: ACCELERATING PYTHON (Appendix A)
page 272:

GPULib and PyStream. These two packages let you use graphics processing units
(GPUs) to accelerate your code. They don’t provide the kind of drop-in
acceleration that a JIT compiler such as Psyco would, but if you have a
powerful GPU, why not use it? Of the two projects, PyStream is older, and the
efforts of Tech-X Corporation have shifted to the newer GPULib project. It
gives you a high-level interface for various forms of numeric computation using
GPUs.

Pyrex, Cython, and Shedskin. These three projects let you translate Python
code into C or C++. Shedskin compiles plain Python code into C++, while Pyrex
and Cython (which is a fork of Pyrex) primarily target C. In Cython (and
Pyrex), you can add optional type declarations to your code, stating that a
variable is (and will always be) an integer, for example. In Cython, there is
also interoperability support for NumPy arrays, letting you write low-level
code that accesses the array contents very efficiently. I have used this in my
own code, achieving speedup factors of up to 300–400 for suitable code. The
code that is generated by Pyrex and Cython can be compiled directly to an
extension module that can be imported into Python.

SWIG, F2PY and Boost.Python. These tools let you wrap C/C++, Fortran, and C++
code, respectively. Although you could write your own wrapper code for
accessing your extension modules, using a tool like one of these takes a lot of
the tedium out of the job—and makes it more likely that the result will be
correct. For example, when using SWIG, you run a command-line tool on your C
(or C++) header files, and wrapper code is generated. A bonus to using SWIG is
that it can generate wrappers for a lot of other languages, besides Python, so
your extension could be available for Java or PHP as well, for example.

ctypes, llvmpy, and CorePy. These are modules that let you manipulate
low-level code objects in your Python code. The ctypes module lets you build
C objects in memory and call C functions in shared libraries (such as DLLs)
with those objects as parameters. The llvmpy package gives you a Python API to
the LLVM, mentioned earlier, which lets you build code and then compile it
efficiently. If you wanted, you could use this to build your own compiler
(perhaps for a language of your own?) in Python. CorePy also lets you
manipulate and efficiently run code objects, although it works at the assembly
level. (Note that ctypes is part of the Python standard library.)
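// A tiny ctypes taste (my own example, assuming a Unix-like system where the
C math library can be located): load a shared library, declare the signature,
and call straight into C.

import ctypes, ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))   # Load the C math library
libm.cos.argtypes = [ctypes.c_double]               # Declare the C signature
libm.cos.restype = ctypes.c_double
print(libm.cos(0.0))                                # 1.0, computed by C code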
page 273:
LIST OF PROBLEMS AND ALGORITHMS (Appendix B)

In most descriptions in this appendix, n refers to the problem size (such as
the number of elements in a sequence). For the special case of graphs,
though, n refers to the number of nodes, and m refers to the number of edges.

Cliques and independent sets. A clique is a graph where there is an edge
between every pair of nodes. The main problem of interest here is finding a
clique in a larger graph (that is, identifying a clique as a subgraph). An
independent set in a graph is a set of nodes where no pair is connected by an
edge. In other words, finding an independent set is equivalent to taking the
complement of the edge set and finding a clique. Finding a k-clique (a clique
of k nodes) or finding the largest clique in a graph (the max clique problem)
is NP-hard. (For more information, see Chapter 11.)

Closest pair. Given a set of points in the Euclidean plane, find the two
points that are closest to each other. This can be solved in loglinear time
using the divide-and-conquer strategy (see Chapter 6).

Compression and optimal decision trees. A Huffman tree is a tree whose leaves
have weights (frequencies), and the sum of their weights multiplied by their
depth is as small as possible. This makes such trees useful for constructing
compression codes and as decision trees when a probability distribution is
known for the outcomes. Huffman trees can be built using Huffman’s algorithm,
described in Chapter 7 (Listing 7-1).
page 276:

Connected and strongly connected components. An undirected graph is connected
if there is a path from every node to every other. A directed graph is
connected if its underlying undirected graph is connected. A connected
component is a maximal subgraph that is connected. Connected components can be
found using traversal algorithms such as DFS (Listing 5-5) or BFS (Listing
5-9), for example. If there is a (directed) path from every node to every other
in a directed graph, it is called strongly connected. A strongly connected
component (SCC) is a maximal subgraph that is strongly connected. SCCs can be
found using Kosaraju’s algorithm (Listing 5-10).

Convex hulls. A convex hull is the minimum convex region containing a set of
points in the Euclidean plane. Convex hulls can be found in loglinear time
using the divide-and-conquer strategy (see Chapter 6).

Finding the minimum/maximum/median. Finding the minimum and maximum of a
sequence can be found in linear time by a simple scan. Repeatedly finding and
extracting the maximum or minimum in constant time, given lineartime
preparation, can be done using a binary heap. It is also possible to find the
kth smallest element of a sequence (the median for k = n/2) in linear (or
expected linear) time, using the select or randomized select. (For more
information, see Chapter 6).

Flow and cut problems. How many units of flow can be pushed through a network
with flow capacities on the edges? That is the max-flow problem. An
equivalent problem is finding the set of edge capacities that most constrain
the flow; this is the min-cut problem. There are several versions of these
problems. For example, you could add costs to the edges, and find the cheapest
of the maximum flows. You could add lower bounds on each edge and look for a
feasible flow. You could even add separate supplies and demands in each node.
These problems are dealt with in detail in Chapter 10.

Graph coloring. Try to color the nodes of a graph so that no neighbors share
a color. Now try to do this with a given number of colors, or even to find
the lowest such number (the chromatic number of the graph). This is an NP-hard
problem in general. If, however, you’re asked to see if a graph is
two-colorable (or bipartite), the problem can be solved in linear time using
simple traversal. The problem of finding a clique cover is equivalent to
finding an independent set cover, which is an identical problem to graph
coloring. (See Chapter 11 for more on graph coloring.)

Hamilton cycles/paths and TSP ... and Euler tours. Several path and subgraph
problems can be solved efficiently. If, however, you want to visit every node
exactly once, you’re in trouble. Any problem involving this constraint is
NP-hard, including finding a Hamilton cycle (visit every node once and return),
a Hamilton path (visit every node once, without returning), or a shortest tour
of a complete graph (the Traveling Salesman/Salesrep problem). The problems are
NP-hard both for the directed and undirected case (see Chapter 11). The related
problem of visiting every edge exactly once, though—finding a so-called Euler
tour—is solvable in polynomial time (see Chapter 5). The TSP problem is
NP-hard even for special cases such as using Euclidean distances in the plane,
but it can be efficiently approximated to within a factor of 1.5 for this case,
and for any other metric distance. Approximating the TSP problem in general,
though, is NP-hard. (See Chapter 11 for more information.)

Longest increasing subsequence. Find the longest subsequence of a given
sequence whose elements are in increasing order. This can be solved in
loglinear time using dynamic programming (see Chapter 8).
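// My sketch of the loglinear approach mentioned here, using bisect to keep
the smallest possible "tail" element for each subsequence length (this version
returns only the length):

from bisect import bisect_left

def lis_length(seq):                        # Longest increasing subsequence
    tails = []                              # tails[k]: best tail for length k+1
    for x in seq:
        i = bisect_left(tails, x)           # Where would x extend/replace?
        if i == len(tails): tails.append(x) # Extends the longest one so far
        else: tails[i] = x                  # Improves an existing tail
    return len(tails)

# lis_length([3, 1, 4, 1, 5, 9, 2, 6]) -> 4  (e.g. 1, 4, 5, 9)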

Matching. There are many matching problems, all of which involve linking some
object to others. The problems discussed in this book are bipartite matching
and min-cost bipartite matching (Chapter 10) and the stable marriage problem
(Chapter 7). Bipartite matching (or maximum bipartite matching) involves
finding the greatest subset of edges in a bipartite graph so that no two edges
in the subset share a node. The min-cost version does the same, but minimizes
the sum of edge costs over this subset. The stable marriage problem is a bit
different; there, all men and women have preference rankings of the members of
the opposite sex. A stable set of marriages is characterized by the fact that
you can’t find a pair that would rather have each other than their current
mates.
page 277:

Minimum spanning trees. A spanning tree is a subgraph whose edges form a tree
over all the nodes of the original graph. A minimum spanning tree is one that
minimizes the sum of edge costs. Minimum spanning trees can be found using
Kruskal’s algorithm (Listing 7-4) or Prim’s algorithm (Listing 7-5), for
example. Because the number of edges is fixed, a maximum spanning tree can be
found by simply negating the edge weights.

Partitioning and bin packing. Partitioning involves dividing a set of numbers
into two sets with equal sums, while the bin packing problem involves packing
a set of numbers into a set of “bins” so that the sum in each bin is below a
certain limit and so that the number of bins is as small as possible. Both
problems are NP-hard. (See Chapter 11.)

SAT, Circuit-SAT, k-CNF-SAT. These are all varieties of the satisfaction
problem (SAT), which asks you to determine whether a given logical (Boolean)
formula can ever be true, if you’re allowed to set the variables to whatever
truth values you want. The Circuit-SAT problem simply uses logical circuits
rather than formulas, and k-CNF-SAT involves formulas in conjunctive normal
form, where each clause consists of k literals. The latter can be solved in
polynomial time for k = 2. The other problems, as well as k-CNF-SAT for k > 2,
are NP-complete. (See Chapter 11.)

Searching. This is a very common and extremely important problem. You have a
key and want to find an associated value. This is, for example, how variables
work in dynamic languages such as Python. It’s also how you find almost
anything on the Internet these days. Two important solutions are hash tables
(see Chapter 2) and binary search or search trees (see Chapter 6). Given a
probability distribution for the objects in the data set, optimal search trees
can be constructed using dynamic programming (see Chapter 8).

Sequence comparison. You may want to compare two sequences to know how
similar (or dissimilar) they are. One way of doing this is to find the
longest subsequence the two have in common (longest common subsequence) or to
find the minimum number of basic edit operations to go from one sequence to the
other (so-called edit distance, or Levenshtein distance). These two problems
are more or less equivalent; see Chapter 8 for more information.
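// A compact DP sketch of that edit distance (my own code, quadratic time and
space):

def edit_distance(a, b):                    # Levenshtein distance
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): D[i][0] = i      # Delete everything from a
    for j in range(n + 1): D[0][j] = j      # Insert everything from b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = a[i-1] == b[j-1]         # Free if the characters match
            D[i][j] = min(D[i-1][j] + 1,    # Deletion
                          D[i][j-1] + 1,    # Insertion
                          D[i-1][j-1] + (not same))  # Substitution (or match)
    return D[m][n]

# edit_distance("kitten", "sitting") -> 3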

Sequence modification. Inserting an element into the middle of a linked list
is cheap (constant time), but finding a given location is costly (linear
time); for an array, the opposite is true (constant lookup, linear insert,
because all later elements must be shifted). Appending can be done cheaply for
both structures, though (see the black box sidebar on list in Chapter 2).

Set and vertex covers. A vertex cover is a set of vertices that cover (that
is, are adjacent to) all the edges of the graph. A set cover is a
generalization of this idea, where the nodes are replaced with subsets, and you
want to cover the entire set. The problem lies in constraining or minimizing
the number of nodes/subsets. Both problems are NP-hard (see Chapter 11).

Shortest paths. This problem involves finding the shortest path from one node
to another, from one node to all the others (or vice versa), or from all
nodes to all others. The one-to-one, one-to-all, and all-to-one cases are
solved the same way, normally using BFS for unweighted graphs, DAG shortest
path for DAGs, Dijkstra’s algorithm for nonnegative edge weights, and
Bellman–Ford in the general case. To speed up things in practice (although
without affecting the worst-case running time), one can also use bidirectional
Dijkstra, or the A* algorithm. For the all pairs shortest paths problem, the
algorithms of choice are probably Floyd–Warshall or (for sparse graphs)
Johnson’s algorithm. If the edges are nonnegative, Johnson’s algorithm is
(asymptotically) equivalent to running Dijkstra’s algorithm from every node
(which may be more effective). (For more information on shortest path
algorithms, see Chapters 5 and 9.) Note that the longest path problem (for
general graphs) can be used to find Hamilton paths, which means that it is
NP-hard. This, in fact, means that the shortest path problem is also NP-hard
in the general case. If we disallow negative cycles in the graph, however, our
polynomial algorithms will work.
page 278:

Sorting and element uniqueness. Sorting is an important operation and an
essential subroutine for several other algorithms. In Python, you would
normally sort by using the list.sort method or the sorted function, both of
which use a highly efficient implementation of the timsort algorithm. Other
algorithms include insertion sort, selection sort, and gnome sort (all of which
have a quadratic running time), as well as heapsort, mergesort, and quicksort
(which are loglinear, although this holds only in the average case for
quicksort). For the info on the quadratic sorting algorithms, see Chapter 5;
for the loglinear (divideandconquer) algorithms, see Chapter 6. Deciding
whether a set of real numbers contains duplicates cannot (in the worst case) be
solved with a running time better than loglinear. By reduction, neither can
sorting.

The halting problem. Determine whether a given algorithm will terminate with
a given input. The problem is undecidable (that is, unsolvable) in the
general case (see Chapter 11).

The knapsack problem and integer programming. The knapsack problem involves
choosing a valuable subset of a set of items, under certain constraints. In
the (bounded) fractional case, you have a certain amount of some substances,
each of which has a unit value (value per unit of weight). You also have a
knapsack that can carry a certain maximum weight. The (greedy) solution is to
take as much as you can of each substance, starting with the one with the
highest unit value. For the integral knapsack problem, you can only take entire
items—fractions aren’t allowed. Each item has a weight and a value. For the
bounded case (also known as 0-1 knapsack) you have a limited number of objects
of each type. (Another perspective would be that you have a fixed set of
objects that you either take or not.) In the unbounded case, you can take as
many as you want from each of a set of object types (still respecting your
carrying capacity, of course). A special case known as the subset sum problem
involves selecting a subset of a set of numbers so that the subset has a given
sum. These problems are all NP-hard (see Chapter 11), but admit
pseudopolynomial solutions based on dynamic programming (see Chapter 8). The
fractional knapsack case, as explained, can even be solved in polynomial time
using a greedy strategy (see Chapter 7). Integer programming is, in some ways,
a generalization of the knapsack problem (and is therefore obviously NP-hard).
It is simply linear programming where the variables are constrained to be
integers.

Topological sorting. Order the nodes of a DAG so that all the edges point in
the same direction. If the edges represent dependencies, a topological
sorting represents an ordering that respects the dependencies. This problem can
be solved by a form of reference counting (see Chapter 4) or by using DFS (see
Chapter 5).

Traversal. The problem here is to visit all the objects in some connected
structure, usually represented as nodes in a graph or tree. The idea can be
either to visit every node or to visit only those needed to solve some problem.
The latter strategy of ignoring parts of the graph or tree is called pruning
and is used (for example) in search trees and in the branch and bound strategy.
For a lot on traversal, see Chapter 5.
ALGORITHMS AND DATA STRUCTURES (Appendix B)

2-3-trees. Balanced tree structure, allowing insertions, deletions, and
search in worst-case Θ(lg n) time. Internal nodes can have two or three
children, and the tree is balanced during insertion by splitting nodes, as
needed. (See Chapter 6.)

A*. Heuristically guided single-source shortest path algorithm. Suitable for
large search spaces. Instead of choosing the node with the lowest distance
estimate (as in Dijkstra’s), the node with the lowest heuristic value (sum of
distance estimate and guess for remaining distance) is used. Worst-case running
time identical to Dijkstra’s algorithm. (See Listing 9-10.)

AA-tree. 2-3-trees simulated using node rotations in a binary tree with
level-numbered nodes. Worst-case running times of Θ(lg n) for insertions,
deletions, and search. (See Listing 6-6.)

Bellman–Ford. Shortest path from one node to all others in weighted graphs.
Looks for a shortcut along every edge n times. Without negative cycles,
correct answer guaranteed after n - 1 iterations. If there’s improvement in the
last round, a negative cycle is detected, and the algorithm gives up. Running
time Θ(nm). (See Listing 9-2.)

Bidirectional Dijkstra. Dijkstra’s algorithm run from the start and end nodes
simultaneously, with alternating iterations going to each of the two
algorithms. The shortest path is found when the two meet up in the middle
(although some care must be taken at this point). The worst-case running time
is just like for Dijkstra’s algorithm. (See Listings 9-8 and 9-9.)

Binary search trees. A binary tree structure where each node has a key (and
usually an associated value). Descendant keys are partitioned by the node
key: smaller keys go in the left subtree, and greater keys go in the right. On
average, the depth of any node is logarithmic, giving an expected insertion
and search time of Θ(lg n). Without extra balancing, though (such as in the
AA-tree), the tree can become unbalanced, giving linear running times. (See
Listing 6-2.)

Bisection, binary search. A search procedure that works in a manner similar
to search trees, by repeatedly halving the interval of interest in a sorted
sequence. The halving is performed by inspecting the middle element, and
deciding whether the sought value must lie to the left or right. Running time
Θ(lg n). A very efficient implementation can be found in the bisect module.
(See Chapter 6.)
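// The bisect module in action (my example): bisect_left gives the insertion
point, which doubles as a binary search, and insort keeps the list sorted.

from bisect import bisect_left, insort

a = [2, 3, 5, 7, 11, 13]
i = bisect_left(a, 7)                       # Index where 7 belongs -> 3
found = i < len(a) and a[i] == 7            # Classic binary search test
insort(a, 6)                                # Insert 6, keeping a sorted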

Branch and bound. A general algorithmic design approach. Searches a space of
solutions in a depth-first or best-first order by building and evaluating
partial solutions. A conservative estimate is kept for the optimal value, while
an optimistic estimate is computed for a partial solution. If the optimistic
estimate is worse than the conservative one, the partial solution is not
extended, and the algorithm backtracks. Often used to solve NP-hard problems.
(See Listing 11-2 for a branch-and-bound solution to the 0-1 knapsack problem.)

Breadth-First Search (BFS). Traversing a graph (possibly a tree) level by
level, thereby also identifying (unweighted) shortest paths. Implemented by
using a FIFO queue to keep track of discovered nodes. Running time Θ(n + m).
(See Listing 5-9.)

Bucket sort. Sort numerical values that are evenly (uniformly) distributed in
a given interval by dividing the interval into n equal-sized buckets and
placing the values in them. Expected bucket size is constant, so they can be
sorted with (for example) insertion sort. Total running time Θ(n). (See Chapter
4.)

Busacker–Gowen. Finds the cheapest max-flow (or the cheapest flow with a
given flow value) in a network by using the cheapest augmenting paths in the
Ford–Fulkerson approach. These paths are found using Bellman–Ford or (with some
weight adjustments) Dijkstra’s algorithm. The running time in general depends
on the maximum flow value and so is pseudopolynomial. For a maximum flow of k,
the running time is (assuming Dijkstra’s algorithm is used) O(km lg n). (See
Listing 10-5.)
page 280:

Christofides’ algorithm. An approximation algorithm (with an approximation
ratio bound of 1.5) for the metric TSP problem. Finds a minimum spanning tree
and then a minimum matching among the odd-degree nodes of the tree,
short-circuiting as needed to make a valid tour of the graph. (See Chapter 11.)

Counting sort. Sort integers with a small value range (with at most Θ(n)
contiguous values) in Θ(n) time. Works by counting occurrences and using the
cumulative counts to directly place the numbers in the result, updating the
counts as it goes. (See Chapter 4.)
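// A minimal counting sort sketch (my own code), for items whose keys are
small nonnegative integers; the backward pass keeps it stable.

def counting_sort(A, key=lambda x: x):      # Θ(n + k) for keys in 0..k
    if not A: return []
    k = max(key(x) for x in A)              # Largest key value
    count = [0] * (k + 1)
    for x in A:                             # Count the occurrences
        count[key(x)] += 1
    for i in range(1, k + 1):               # Cumulative counts = positions
        count[i] += count[i - 1]
    out = [None] * len(A)
    for x in reversed(A):                   # Backward pass keeps it stable
        count[key(x)] -= 1
        out[count[key(x)]] = x
    return out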

DAG Shortest Path. Finds the shortest path from one node to all others in a
DAG. Works by finding a topological sorting of the nodes and then relaxing
all out-edges (or, alternatively, all in-edges) at every node from left to
right. Can (because of the lack of cycles) also be used to find longest paths.
Running time Θ(n + m). (See Listing 8-4.)

Depth-First Search (DFS). Traversing a graph (possibly a tree) by going in
depth and then backtracking. Implemented by using a LIFO queue to keep track
of discovered nodes. By keeping track of discover and finish times, DFS can
also be used as a subroutine in other algorithms (such as topological sorting
or Kosaraju’s algorithm). Running time Θ(n + m). (See Listings 5-4, 5-5, and
5-6.)

Dijkstra’s algorithm. Find the shortest paths from one node to all others in
a weighted graph, as long as there are no negative edge weights. Traverses
the graph, repeatedly selecting the next node using a priority queue (a heap).
The priority is the current distance estimate of the node. These estimates are
updated whenever a shortcut is found from a visited node. The running time is
Θ((m+n) lg n), which is simply Θ(m lg n) if the graph is connected.

Double-ended queues. FIFO queues implemented using linked lists (or linked
lists of arrays), so that inserting and extracting objects at either end can
be done in constant time. An efficient implementation can be found in the
collections.deque class. (See Chapter 5.)

Dynamic arrays, vectors. The idea of having extra capacity in an array, so
appending is efficient. By relocating the contents to a bigger array, growing
it by a constant factor, when it fills up, appends can be constant in average
(amortized) time. (See Chapter 2.)

Edmonds–Karp. The concrete instantiation of the Ford–Fulkerson method where
traversal is performed using BFS. Finds max-flow in Θ(nm²) time. (See
Listing 10-4.)

Floyd–Warshall. Finds shortest paths from each node to all others. In
iteration k, only the first k nodes (in some ordering) are allowed as
intermediate nodes along the paths. Extending from k - 1 involves checking
whether the shortest paths to and from k via the first k - 1 nodes are shorter
than simply going directly via these nodes. (That is, node k is either used or
not, for every shortest path.) Running time is Θ(n³). (See Listing 9-6.)

Ford–Fulkerson. A general approach to solving max-flow problems. The method
involves repeatedly traversing the graph to find a so-called augmenting path,
a path along which the flow can be increased (augmented). The flow can be
increased along an edge if it has extra capacity, or it can be increased
backward across an edge (that is, canceled) if there is flow along the edge.
Thus, the traversal can move both forward and backward along the directed
edges, depending on the flow across them. The running time depends on the
traversal strategy used. (See Listing 10-4.)

Gale–Shapley. Finds a stable set of marriages given preference rankings for a
set of men and women. Each unengaged man proposes to the most preferred woman
he hasn’t proposed to yet. Each woman will choose her favorite among her
current suitors (possibly staying with her fiancé). Can be implemented with
quadratic running time. (See Chapter 7.)
page 281:

Gnome sort. A simple sorting algorithm with quadratic running time. Probably
not an algorithm you’ll use in practice. (See Listing 3-1.)

Hashing, hash tables. Look up a key to get the corresponding value, just like
in a search tree. Entries are stored in an array, and their positions are
found by computing a (pseudorandom, sort of) hash value of the key. Given a
good hash function and enough room in the array, the expected running time of
insertion, deletion and lookup is Θ(1). (See Chapter 2.)

Heaps, heapsort. Heaps are efficient priority queues. With lineartime
preprocessing, a min (max) heap will let you find the smallest (largest)
element in constant time and extract or replace it in logarithmic time. Adding
an element can also be done in logarithmic time. Conceptually, a heap is a full
binary tree where each node is smaller (larger) than its children. When
modifications are made, this property can be repaired with Θ(lg n) operations.
In practice, heaps are usually implemented using arrays (with nodes encoded as
array entries). A very efficient implementation can be found in the heapq
module. Heapsort is like selection sort, except that the unsorted region is a
heap, so finding the largest element n times gives a total running time of Θ(n
lg n). (See Chapter 6.)
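// The heapq module in action (my example): heapify is linear, push and pop
are logarithmic, and the smallest element is always at index 0.

from heapq import heapify, heappush, heappop

H = [5, 8, 1, 3]
heapify(H)                                  # Linear-time heap construction
heappush(H, 2)                              # Logarithmic insertion
smallest = heappop(H)                       # -> 1, the minimum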

Huffman’s algorithm. Builds Huffman trees, which can be used for building
optimal prefix codes, for example. Initially, each element (for example,
character in an alphabet) is made into a single-node tree, with a weight equal
to its frequency. In each iteration, the two lightest trees are picked,
combining them with a new root, giving the new tree a weight equal to the sum
of the original two tree weights. This can be done in loglinear time (or, in
fact, in linear time if the frequencies are presorted). (See Listing 7-1.)

Insertion sort. A simple sorting algorithm with quadratic running time. It
works by repeatedly inserting the next unsorted element in an initial sorted
segment of the array. For small data sets, it can actually be preferable to
more advanced (and optimal) algorithms such as merge sort or quicksort. (In
Python, though, you should use list.sort or sorted if at all possible.) (See
Listing 4-3.)

Interpolation search. Similar to ordinary binary search, but linear
interpolation between the interval endpoints is used to guess the correct
position, rather than simply looking at the middle element. The worstcase
running time is still Θ(lg n), but the averagecase running time is O(lg lg n)
for uniformly distributed data. (Mentioned in the “If You’re Curious ...”
section of Chapter 6.)

Iterative deepening DFS. Repeated runs of DFS, where each run has a limit to
how far it can traverse. For structures with some fanout, the running time
will be the same as for DFS or BFS (that is, Θ(n + m)). The point is that it
has the advantages of BFS (it finds shortest paths, and explores large state
spaces conservatively), with the smaller memory footprint of DFS. (See Listing
5-8.)

Johnson’s algorithm. Finds shortest paths from every node to all others.
Basically runs Dijkstra’s from every node. However, it uses a trick so that
it also works with negative edge weights: it first runs Bellman–Ford from a new
start node (with added edges to all existing nodes) and then uses the resulting
distances to modify the edge weights of the graph. The modified weights are all
nonnegative but are set so that the shortest paths in the original graph will
also be the shortest paths in the modified graph. Running time Θ(mn lg n).
(See Listing 9-4.)

Kosaraju’s algorithm. Finds strongly connected components, using DFS. First,
nodes are ordered by their finish times. Then the edges are reversed, and
another DFS is run, selecting start nodes using the first ordering. Running
time Θ(n + m). (See Listing 5-10.)
page 282:

Kruskal’s algorithm. Finds a minimum spanning tree by repeatedly adding the
smallest remaining edge that doesn’t create a cycle. This cycle checking can
(with some cleverness) be performed very efficiently, so the running time is
dominated by sorting the edges. All in all, the running time is Θ(m lg n). (See
Chapter 7.)

Linked lists. An alternative to arrays for representing sequences. Although
linked lists are cheap (constant time) to modify once you’ve found the right
entries, finding those normally takes linear time. Linked lists are
implemented sort of like a path, with each node pointing to the next. Note that
Python’s list type is implemented as an array, not a linked list. (See Chapter
2.)

Merge sort. The archetypal divideandconquer algorithm. It divides the
sequence to be sorted in the middle, sorts the two halves recursively, and
then merges the two sorted halves in linear time. The total running time is Θ(n
lg n). (See Listing 6-5.)

Ore’s algorithm. An algorithm for traversing actual mazes in person, by
marking passage entries and exits. In many ways similar to iterative
deepening DFS or BFS. (See Chapter 5.)

Prim’s algorithm. Grows a minimum spanning tree by repeatedly adding the node
closest to the tree. It is, at core, a traversal algorithm, and uses a
priority queue, just like Dijkstra’s algorithm. (See Listing 7-5.)

Radix sort. Sorts numbers (or other sequences) by digit (element), starting
with the least significant one. As long as the number of digits is constant,
and the digits can be sorted in linear time (using, for example, counting
sort), the total running time is linear. It’s important that the sorting
algorithm used on the digits is stable. (See Chapter 4.)

Randomized Select. Finds the median, or, in general, the kth order statistic
(the kth smallest element). Works sort of like “half a quicksort.” It
chooses a pivot element at random (or arbitrarily), and partitions the other
elements to the left (smaller elements) or right (greater elements) of the
pivot. The search then continues in the right portion, more or less like binary
search. Perfect bisection is not guaranteed, but the expected running time is
still linear. (See Listing 6-3.)

Select. The rather unrealistic, but guaranteed linear, sibling of randomized
select. It works as follows: Divide the sequence into groups of five. Find
the median in each using insertion sort. Find the median of these medians
recursively, using select. Use this median of medians as a pivot, and partition
the elements. Now run select on the proper half. In other words, it’s very
similar to randomized select—the difference is that it can guarantee that a
certain percentage will end up on either side of the pivot, avoiding the
totally unbalanced case. Not really an algorithm you’re likely to use in
practice, but it’s important to know about. (See Chapter 6.)

Selection sort. A simple sorting algorithm with quadratic running time. Very
similar to insertion sort, but instead of repeatedly inserting the next
element into the sorted section, you repeatedly find (that is, select) the
largest element in the unsorted region (and swap it with the last unsorted
element). (See Listing 4-5.)

Timsort. A super-duper in-place sorting algorithm based on mergesort. Without
any explicit conditions for handling special cases, it is able to take into
account partially sorted sequences, including segments that are sorted in
reverse, and can therefore sort many real-world sequences faster than what
would seem possible. The implementation in list.sort and sorted is also really
fast, so if you need to sort something, that’s what you should use. (See
Chapter 6.)

Topological sorting by reference counting. Orders the nodes of a DAG so that
all edges go from left to right. This is done by counting the number of
in-edges at each node. The nodes with an in-degree of zero are kept in a queue
(could just be a set; the order doesn’t matter). Nodes are taken from the queue
and placed in the topologically sorted order. As you do so, you decrement the
counts for the nodes that this node has edges to. If any of them reaches zero,
they are placed in the queue. (See Listing 4-11.)
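// My condensed sketch of that reference-counting idea (the book's Listing
4-11 does essentially the same thing), for a DAG as an adjacency dict:

def topsort(G):                             # G: node -> set of successors
    count = {u: 0 for u in G}               # In-degree counts
    for u in G:
        for v in G[u]:
            count[v] += 1                   # Count every in-edge
    Q = [u for u in G if count[u] == 0]     # Nodes with no incoming edges
    S = []                                  # The topological ordering
    while Q:
        u = Q.pop()                         # Order doesn't matter here
        S.append(u)
        for v in G[u]:
            count[v] -= 1                   # "Remove" the edge u -> v
            if count[v] == 0:               # No incoming edges left?
                Q.append(v)
    return S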
page 283:

Topological sorting with DFS. Another algorithm for sorting DAG nodes
topologically. The idea is simple: perform a DFS and sort the nodes by
inverse finish time. To easily get a linear running time, you can instead
simply append nodes to your ordering as they receive their finish times in DFS.
(See Listing 5-7.)

Tremaux’s algorithm. Like Ore’s algorithm, this is designed to be executed in
person, while walking through a maze. The pattern traced by a person executing
Tremaux’s algorithm is essentially the same as that of DFS. (See Chapter 5.)

Twice around the tree. An approximation algorithm for the metric TSP problem,
guaranteed to yield a solution with a cost of at most twice the optimum.
First, it builds a minimum spanning tree (which is less than the optimum), and
then it “walks around” the tree, taking shortcuts to avoid visiting the same
edge twice. Because of the metricity, this is guaranteed to be cheaper than
walking each edge twice. This last traversal can be implemented by a preorder
DFS. (See Listing 11-1.)
GRAPH TERMINOLOGY (Appendix C)
page 286:
page 288:

Another term for an undirected, acyclic graph is forest, and the connected
components of a forest are called trees. In other words, a tree is a
connected forest (that is, a forest consisting of a single connected
component). For example, G1 is a forest with two trees. In a tree, nodes with a
degree of one are called leaves (or external nodes),4 while all others are
called internal nodes. The larger tree in G1, for example, has three leaves and
two internal nodes. The smaller tree consists of only an internal node,
although talking about leaves and internal nodes may not make much sense with
fewer than three nodes.

Trees have several interesting and important properties, some of which are
dealt with in relation to specific topics throughout the book. I’ll give you
a few right here, though. Let T be an undirected graph with n nodes. Then the
following statements are equivalent (Exercise 29 asks you to show that this
is, indeed, the case):

T is a tree (that is, it is acyclic and connected).
 T is acyclic, and has n - 1 edges.
 T is connected, and has n - 1 edges.
 Any two nodes are connected by exactly one path.
 T is acyclic, but adding any new edge to it will create a cycle.

T is connected, but removing any edge yields two connected components.

In other words, any one of these statements of T, on its own, characterizes
it as well as any of the others. If someone tells you that there is exactly
one path between any pair of nodes in T, for example, you immediately know that
it is connected and has n - 1 edges and that it has no cycles. Quite often, we
anchor our tree by choosing a root node (or simply root). The result is called
a rooted tree, as opposed to the free trees we’ve been looking at so far. (If
it is clear from the context whether a tree has a root or not, I will simply
use the unqualified term tree in both the rooted and free case.) Singling out a
node like this lets us define the notions of up and down. Paradoxically,
computer scientists (and graph theorists in general) tend to place the root at
the top and the leaves at the bottom. (We probably should get out more ...).
For any node, up is in the direction of the root (along the single path between
the node and the root). Down is then any other direction (automatically in the
direction of the leaves). Note that in a rooted tree, the root is considered
an internal node, not a leaf, even if it happens to have a degree of one.