designates my notes. / designates important.
Architecture is different from design. It is done at a higher, enterprise level and sees everything from the top down. The components of the architecture can be delegated to subsystem architects, each responsible for their piece (security, technology, data, etc.).
Even if you didn’t explicitly map out an architecture, you still have one. You always have one, even if it might not be formal.
Some things to think about at the architecture level:
Some of the documentation practice is silly. A loop contains the following:
# sleeps for 1 second every time
time.sleep(1)
No shit! From what I have learned, it is best to write self-documenting code. In this case the time.sleep() call is pretty self-documenting already.
There is some suspect advice on potentially using multiple return types. It also mentions returning objects that can hold data (like url.content, for example) or returning tuples that contain error codes.
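A sketch of mine (not the book's code) shows why this is suspect - every caller has to type-check the result:

import requests

def fetch(url):
    """ Mixed returns: the response object on success,
        an (error code, None) tuple on failure """
    resp = requests.get(url)
    if resp.status_code != 200:
        return (resp.status_code, None)
    return resp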
Some tools to make your code cleaner (the static-analysis excerpt below lists them):
Some things to be on the lookout for within your code (e.g. the overuse of functional constructs, excerpted below):
Chapter three talks about testability and test strategies. It offers advice on using stubs and mocks and when to white-box or black-box your tests.
Specifically it mentions unittest, nose, and py.test.
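For instance, stubbing out a dependency with unittest.mock might look like this (my sketch, not the book's):

import unittest
from unittest.mock import Mock

def get_status(session, url):
    """ Return the HTTP status code for url using the given session """
    return session.get(url).status_code

class TestGetStatus(unittest.TestCase):
    def test_ok(self):
        # stub out the network: no real HTTP request is made
        session = Mock()
        session.get.return_value.status_code = 200
        self.assertEqual(get_status(session, 'http://example.com'), 200)
        session.get.assert_called_once_with('http://example.com')

if __name__ == '__main__':
    unittest.main()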
Code coverage is covered here alongside integration tests. For example, unittest exercises one unit, while integration testing exercises the whole, e.g. with Selenium.
And you won’t get out of the chapter without the buzzword test-driven development. (not that it is terrible)
Chapter four is definitely adjacent to chapter three, literally and figuratively. It talks about testing, but for performance. Should you test for performance throughout development (I say no) or should you do it at the end (the end? modern software has that?)?
There is discussion on big O complexity, data structures, and some tools to aid you in squeezing every bit out of your code.
With chapter five it is all about scalability (horizontal and vertical). Can you take advantage of concurrency? What kind of latency and performance do you have?
An example of a producer/consumer to output thumbnails is given.
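The book builds it around generating thumbnails; here is a stripped-down sketch of the same shape (my stand-in, not the book's code):

import queue
import threading

def producer(q, urls):
    """ Push image names into the shared queue """
    for url in urls:
        q.put(url)
    q.put(None)  # sentinel: tell the consumer we're done

def consumer(q):
    """ Pull names off the queue and 'thumbnail' them """
    while True:
        url = q.get()
        if url is None:
            break
        print('thumbnailing', url)  # real code would resize an image here

q = queue.Queue(maxsize=4)
t = threading.Thread(target=producer, args=(q, ['a.png', 'b.png', 'c.png']))
t.start()
consumer(q)
t.join()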
Should you use locks or semaphores (the book benchmarked semaphores as 4x faster)?
Under certain conditions, can you throttle aspects of your program? If you are going to use concurrency, will you take advantage of threads or processes?
When you deploy, which interpreter will you choose?
Finally it ends with some suggested best practices.
Chapter six deals with security, but is a little lacking. This can be forgiven since the book tries to cover a lot and security is its own book.
It first starts with things like what kind of protection you are looking for: confidentiality, integrity, availability?
How is access going to be handled: authentication, authorization, non-repudiation?
There is a warning about how eval is unsafe and that you should never load pickles from unknown sources (use JSON or YAML instead).
They don’t say it but I will: never trust your users. Always validate and sanitize input.
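For instance (my sketch, not the book's):

import json

def load_user_prefs(raw):
    """ Parse untrusted input with json.loads, which cannot execute code,
        unlike eval or pickle.loads """
    prefs = json.loads(raw)
    # validate the shape before trusting it downstream
    if not isinstance(prefs, dict):
        raise ValueError('expected a JSON object')
    return prefs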
Don’t use the old %-style template method; instead use "{}".format().
There is a little on passwords. Basic stuff is covered like: don't store passwords, store hashes, and don't reference passwords in functions. Also, use a password library. There is no need to reinvent the wheel (and yours will probably be poorly implemented).
While I personally like to keep everything in-house to reduce reliance on libraries as much as reasonable, passwords are something you shouldn’t take a risk with.
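If you do keep it in-house, the standard library at least has the right primitive; a minimal sketch (mine, not the book's, and a vetted password library is still the safer bet):

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """ Derive a salted hash; store (salt, digest), never the password """
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return salt, digest

def check_password(password, salt, digest):
    """ Re-derive with the stored salt and compare in constant time """
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return hmac.compare_digest(candidate, digest)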
Keeping with the thin content, chapter seven touches on design patterns. You can read many (better) books on this (like The Gang of Four).
They break the patterns down into: Creational, Structural, and Behavioral patterns.
singleton versus the (superior) Borg. I really like this, even though it all feels like jumping through hoops to get global state
factory
prototype
builder
adapter
facade
proxy
iterator
observer
state - the state machine changes the class; it feels crazy to do it this way (see the sketch after this list)
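That class-changing trick, as a minimal sketch of my own:

class Off:
    def push(self):
        """ Flip on by swapping the instance's class """
        self.__class__ = On
        return 'turning on'

class On:
    def push(self):
        """ Flip off the same way """
        self.__class__ = Off
        return 'turning off'

switch = Off()
print(switch.push())  # turning on
print(switch.push())  # turning off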
Chapter eight is again a bit skimpy with limited coverage of the model/view/controller architecture.
It mentions event-driven architecture and some examples and use cases: chat servers, select, sockets, twisted, eventlet, gevent.
My least favorite thing, microservices, comes next. They are not an architecture for a particular problem and can be used in many places. What they don’t mention is the fragility this can lead to.
Pipe and filter architecture gets a mention too (the generator example excerpted below illustrates it).
It was touched on before, kind of, when they mentioned WSGI, but chapter nine is all about deploying. How are you going to set up your dev/testing/staging/production environment?
You should install/set up Python packages with pip and run in a virtualenv.
There is some discussion on PyPI and packaging for PyPI distribution: structure and imports, setup.py.
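For flavor, a minimal setup.py looks something like this (all names here are placeholders of mine, not the book's):

from setuptools import setup, find_packages

setup(
    name='mypackage',               # placeholder project name
    version='0.1.0',
    packages=find_packages(),
    install_requires=['requests'],  # placeholder dependency
)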
Interestingly they never mention Docker, but do talk about managing with Fabric vs (the better) Ansible, which is idempotent: it can be run multiple times and won't change what doesn't need changing.
A few deployment patterns are covered too (listed in the excerpts below).
The final chapter, ten, is on debugging: from simple print statements peppered throughout, to eliminating blocks of code, to using sys.exit().
It circles back on itself and re-covers mocking, this time with random data generation via the schematics module.
To save some bandwidth (and time and money) you can use caching (so you don't need to hit external APIs or make expensive database calls). Some ways to handle the cache include the file cache and Redis memoize excerpts below.
Finally it looks at more advanced tools:
logging
pdb - the Python debugger (quick reference below) - and more advanced debuggers like ipdb and pdb++
the trace module
lptrace
strace
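The classic pdb entry point, for reference:

# drop into the debugger at this exact line
import pdb; pdb.set_trace()

# or run a whole script under the debugger from the start
# $ python3 -m pdb myscript.py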
Overuse of functional constructs: Python, being a mixed-paradigm language, provides support for functional programming via its lambda keyword and its map(), reduce(), and filter() functions. However, sometimes experienced programmers, or programmers coming to Python from a functional-programming background, overuse these constructs, producing code that is too cryptic and hence unreadable to other programmers.
No mention of the object-oriented hammer being used for everything, even when it is not the right way to go.
The following are some of the most popular tools in the Python ecosystem which can perform such static analysis:
Pylint: Pylint is a static checker for Python code, which can detect a range of coding errors, code smells, and style errors. Pylint uses a style close to PEP-8. The newer versions of Pylint also provide statistics about code complexity, and can print reports. Pylint requires the code to be executed before checking it. You can refer to the http://pylint.org link.
Pyflakes: Pyflakes is a more recent project than Pylint. It differs from Pylint in that it need not execute the code before checking it for errors. Pyflakes does not check for coding style errors, and only performs logic checks in code. You can refer to the https://launchpad.net/pyflakes link.
McCabe: It is a script which checks and prints a report on the McCabe complexity of your code. You can refer to the https://pypi.python.org/pypi/mccabe link.
Pycodestyle: Pycodestyle is a tool which checks your Python code against some of the PEP-8 guidelines. This tool was earlier called pep8. Refer to the https://github.com/PyCQA/pycodestyle link.
Flake8: Flake8 is a wrapper around the Pyflakes, McCabe, and pycodestyle tools, and can perform a number of checks including the ones provided by these tools. Refer to the https://gitlab.com/pycqa/flake8/ link.
$ pip install coverage
$ pip3 install line_profiler
$ pip3 install memory_profiler
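Typical invocations, for reference (my sketch; note the @profile decorator is injected at run time by kernprof and memory_profiler, not imported):

# profile_demo.py
@profile
def build_squares(n):
    """ A toy function to profile """
    return [i * i for i in range(n)]

if __name__ == '__main__':
    build_squares(100000)

# $ coverage run profile_demo.py && coverage report
# $ kernprof -l -v profile_demo.py              # line_profiler, line by line
# $ python3 -m memory_profiler profile_demo.py  # memory use, line by line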
>>> from collections import OrderedDict
>>> cities = ['Jakarta','Delhi','Newyork','Bonn','Kolkata',
'Bangalore','Bonn','Seoul','Delhi','Jakarta','Mumbai']
>>> cities_odict = OrderedDict.fromkeys(cities)
>>> print(cities_odict.keys())
odict_keys(['Jakarta', 'Delhi', 'Newyork', 'Bonn', 'Kolkata',
'Bangalore', 'Seoul', 'Mumbai'])
Here are some guidelines; a short sketch of my own follows the two lists.
Use multithreading in the following cases:
The program needs to maintain a lot of shared states, especially mutable ones. A lot of the standard data structures in Python, such as lists, dictionaries, and others, are thread-safe, so it costs much less to maintain a mutable shared state using threads than via processes.
The program needs to keep a low memory foot-print.
The program spends a lot of time doing I/O. Since the GIL is released by threads doing I/O, it doesn’t affect the time taken by the threads to perform I/O.
The program doesn’t have a lot of data-parallel operations which it can scale across multiple processes.
Use multiprocessing in these scenarios:
The program performs a lot of CPU-bound heavy computing: byte-code operations, number crunching, and the like on reasonably large inputs.
The program has inputs which can be parallelized into chunks and whose results can be combined afterwards – in other words, the input of the program yields well to data-parallel computations.
The program doesn’t have any limitations on memory usage, and you are on a modern machine with a multicore CPU and large enough RAM.
There is not much shared mutable state between processes that need to be synchronized—this can slow down the system, and offset any benefits gained from multiple processes.
Your program is not heavily dependent on I/O—file or disk I/O or socket I/O.
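A quick illustration of the split between the two lists above (my sketch, not from the book):

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

def fetch(url):
    """ I/O-bound: the GIL is released while waiting on the network """
    return len(urllib.request.urlopen(url).read())

def crunch(n):
    """ CPU-bound: needs a core to itself, so use a process """
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    urls = ['https://example.com'] * 4
    with ThreadPoolExecutor() as pool:    # threads for I/O
        sizes = list(pool.map(fetch, urls))
    with ProcessPoolExecutor() as pool:   # processes for CPU
        sums = list(pool.map(crunch, [10 ** 6] * 4))
    print(sizes, sums)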
class Borg(object):
    """ I ain't a Singleton """
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state

class IBorg(Borg):
    """ I am a Borg """
    def __init__(self):
        Borg.__init__(self)
        self.state = 'init'

    def __str__(self):
        return self.state
>>> i1 = IBorg()
>>> i2 = IBorg()
>>> print(i1)
init
>>> print(i2)
init
>>> i1.state='running'
>>> print(i2)
running
>>> print(i1)
running
>>> i1==i2
False
>>> i1.x='test'
>>> i2.x
'test'
>>> class ABorg(Borg):pass
...
>>> class BBorg(Borg):pass
...
>>> class A1Borg(ABorg):pass
...
>>> a = ABorg()
>>> a1 = A1Borg()
>>> b = BBorg()
# Now let's attach a dynamic attribute x to a with value 100:
>>> a.x = 100
>>> a.x
100
>>> a1.x
100
# Let's check if the instance of the sibling class Borg also gets it:
>>> b.x
100
Singletons fail with this example: singletons A and A1 would work, but B would be ‘out of sync’.
This proves that the Borg pattern is much better at state sharing across classes and sub-classes than the Singleton pattern, and it does so without a lot of fuss or the overhead of ensuring a single instance.
# pipe_recent_gen.py
# Using generators, print details of the most recently modified file
# matching a pattern.

import glob
import os
from time import sleep

def watch(pattern):
    """ Watch a folder for modified files matching a pattern """
    while True:
        files = glob.glob(pattern)
        # sort by modified time
        files = sorted(files, key=os.path.getmtime)
        recent = files[-1]
        yield recent
        # Sleep a bit
        sleep(1)

def get(input):
    """ For a given file input, print its meta data """
    for item in input:
        data = os.popen("ls -lh " + item).read()
        # Clear screen
        os.system("clear")
        yield data

if __name__ == "__main__":
    import sys

    # Source + Filter #1
    stream1 = watch('*.' + sys.argv[1])

    while True:
        # Filter #2 + sink
        stream2 = get(stream1)
        print(stream2.__next__())
        sleep(2)
pip, venv, pypi, ansible.
Deployment architectures:
continuous
blue-green
canary
A/B testing
induced chaos
import hashlib
import json
import os

def unique_key(address, site):
    """ Return a unique key for the given arguments """
    return hashlib.md5(''.join((address['name'],
                                address['street'],
                                address['city'],
                                site)).encode('utf-8')).hexdigest()

def filecache(func):
    """ A file caching decorator """
    def wrapper(*args, **kwargs):
        # Construct a unique cache filename
        filename = unique_key(args[0], args[1]) + '.data'
        if os.path.isfile(filename):
            print('from file')
            # Return cached data from file
            return json.load(open(filename))
        # Else compute and write into file
        result = func(*args, **kwargs)
        json.dump(result, open(filename, 'w'))
        return result
    return wrapper
@filecache
def api_search(address, site='yellowpages.com'):
    """ API to search for a given business address
        on a site and return results """
    # requests, search_api, and get_api_key are assumed to be
    # imported/defined elsewhere in the book's example
    req_params = {}
    req_params.update({'key': get_api_key(site),
                       'term': address['name'],
                       'searchloc': '{0}, {1}, {2}'.format(address['street'],
                                                           address['city'],
                                                           address['state'])})
    return requests.post(search_api % locals(), params=req_params)
from redis import StrictRedis

def memoize(func, ttl=86400):
    """ A memory caching decorator """
    # Local redis as in-memory cache
    cache = StrictRedis(host='localhost', port=6379)

    def wrapper(*args, **kwargs):
        # Construct a unique key
        key = unique_key(args[0], args[1])
        # Check if it's in redis
        cached_data = cache.get(key)
        if cached_data is not None:
            print('from cache')
            return json.loads(cached_data)
        # Else calculate and store while putting a TTL
        result = func(*args, **kwargs)
        cache.set(key, json.dumps(result), ttl)
        return result
    return wrapper
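Usage would presumably mirror the file cache (my sketch; the excerpt doesn't show it):

@memoize
def api_search(address, site='yellowpages.com'):
    ...  # same body as above; results now expire from Redis after ttl seconds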