designates my notes. / designates important.
Architecture is different from design. It is done at a higher, enterprise level and sees everything from the top down. The components of the architecture can be delegated to subsystem architects, each responsible for their piece (security, technology, data, etc.).
Even if you didn’t explicitly map out an architecture, you still have one. You always have one, even if it might not be formal.
Some things to think about at the architecture level:
Some of the documentation practice is silly. A loop contains the following:
# sleeps for 1 second every time
time.sleep(1)
No shit! From what I have learned, it is best to write self-documenting code. In this case the time.sleep() call is pretty self-documenting already.
There is some suspect advice on potentially using multiple return types. It also mentions returning objects that can hold data (like url.content, for example) or returning tuples that contain error codes.
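A sketch of mine (not the book's code) shows why this is suspect - every caller has to type-check the result:

import requests

def fetch(url):
    """ Mixed returns: the response object on success,
        an (error code, None) tuple on failure """
    resp = requests.get(url)
    if resp.status_code != 200:
        return (resp.status_code, None)
    return resp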
Some tools to make your code cleaner (the static-analysis excerpt below lists them):
Some things to be on the lookout for within your code (e.g. the overuse of functional constructs, excerpted below):
Chapter three talks about testability and test strategies. It offers advice on using stubs and mocks and when to white-box or black-box your tests.
Specifically it mentions unittest, nose, and py.test.
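For instance, stubbing out a dependency with unittest.mock might look like this (my sketch, not the book's):

import unittest
from unittest.mock import Mock

def get_status(session, url):
    """ Return the HTTP status code for url using the given session """
    return session.get(url).status_code

class TestGetStatus(unittest.TestCase):
    def test_ok(self):
        # stub out the network: no real HTTP request is made
        session = Mock()
        session.get.return_value.status_code = 200
        self.assertEqual(get_status(session, 'http://example.com'), 200)
        session.get.assert_called_once_with('http://example.com')

if __name__ == '__main__':
    unittest.main()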
Code coverage is covered here alongside integration tests. For example, unittest exercises one unit, while integration testing exercises the whole, e.g. with Selenium.
And you won’t get out of the chapter without the buzzword test-driven development. (not that it is terrible)
Chapter four is definitely adjacent to chapter three, literally and figuratively. It talks about testing, but for performance. Should you test for performance throughout development (I say no) or should you do it at the end (the end? modern software has that?)?
There is discussion on big O complexity, data structures, and some tools to aid you in squeezing every bit out of your code.
With chapter five it is all about scalability (horizontal and vertical). Can you take advantage of concurrency? What kind of latency and performance do you have?
An example of a producer/consumer to output thumbnails is given.
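The book builds it around generating thumbnails; here is a stripped-down sketch of the same shape (my stand-in, not the book's code):

import queue
import threading

def producer(q, urls):
    """ Push image names into the shared queue """
    for url in urls:
        q.put(url)
    q.put(None)  # sentinel: tell the consumer we're done

def consumer(q):
    """ Pull names off the queue and 'thumbnail' them """
    while True:
        url = q.get()
        if url is None:
            break
        print('thumbnailing', url)  # real code would resize an image here

q = queue.Queue(maxsize=4)
t = threading.Thread(target=producer, args=(q, ['a.png', 'b.png', 'c.png']))
t.start()
consumer(q)
t.join()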
Should you use locks or semaphores (the book benchmarked semaphores as 4x faster)?
Under certain conditions, can you throttle aspects of your program? If you are going to use concurrency, will you take advantage of threads or processes?
When you deploy, which interpreter will you choose?
Finally it ends with some suggested best practices.
Chapter six deals with security, but is a little lacking. This can be forgiven since the book tries to cover a lot and security is its own book.
It first starts with things like what kind of protection you are looking for: confidentiality, integrity, availability?
How is access going to be handled: authentication, authorization, non-repudiation?
There is a warning about how eval is unsafe and that you should never load pickles from unknown sources (use JSON or YAML instead).
They don’t say it but I will: never trust your users. Always validate and sanitize input.
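For instance (my sketch, not the book's):

import json

def load_user_prefs(raw):
    """ Parse untrusted input with json.loads, which cannot execute code,
        unlike eval or pickle.loads """
    prefs = json.loads(raw)
    # validate the shape before trusting it downstream
    if not isinstance(prefs, dict):
        raise ValueError('expected a JSON object')
    return prefs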
Don’t use the old %-style template method; instead use "{}".format().
There is a little on passwords. Basic stuff is covered like: don't store passwords, store hashes, and don't reference passwords in functions. Also, use a password library. There is no need to reinvent the wheel (and yours will probably be poorly implemented).
While I personally like to keep everything in-house to reduce reliance on libraries as much as reasonable, passwords are something you shouldn’t take a risk with.
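If you do keep it in-house, the standard library at least has the right primitive; a minimal sketch (mine, not the book's, and a vetted password library is still the safer bet):

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """ Derive a salted hash; store (salt, digest), never the password """
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return salt, digest

def check_password(password, salt, digest):
    """ Re-derive with the stored salt and compare in constant time """
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return hmac.compare_digest(candidate, digest)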
Keeping with the thin content, chapter seven touches on design patterns. You can read many (better) books on this (like The Gang of Four).
They break the patterns down into: Creational, Structural, and Behavioral patterns.
singleton versus the (superior) Borg. I really like this, even though it all feels like jumping through hoops to get global state
factory
prototype
builder
adapter
facade
proxy
iterator
observer
state - the state machine changes the class; it feels crazy to do it this way (see the sketch after this list)
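That class-changing trick, as a minimal sketch of my own:

class Off:
    def push(self):
        """ Flip on by swapping the instance's class """
        self.__class__ = On
        return 'turning on'

class On:
    def push(self):
        """ Flip off the same way """
        self.__class__ = Off
        return 'turning off'

switch = Off()
print(switch.push())  # turning on
print(switch.push())  # turning off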
Chapter eight is again a bit skimpy with limited coverage of the model/view/controller architecture.
It mentions event-driven architecture and some examples and use cases: chat servers, select, sockets, twisted, eventlet, gevent.
My least favorite thing, microservices, comes next. They are not an architecture for a particular problem and can be used in many places. What they don’t mention is the fragility this can lead to.
Pipe and filter architecture gets a mention too (the generator example excerpted below illustrates it).
It was touched on before, kind of, when they mentioned WSGI, but chapter nine is all about deploying. How are you going to set up your dev/testing/staging/production environment?
You should install/set up Python packages with pip and run in a virtualenv.
There is some discussion on PyPI and packaging for PyPI distribution: structure and imports, setup.py.
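For flavor, a minimal setup.py looks something like this (all names here are placeholders of mine, not the book's):

from setuptools import setup, find_packages

setup(
    name='mypackage',               # placeholder project name
    version='0.1.0',
    packages=find_packages(),
    install_requires=['requests'],  # placeholder dependency
)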
Interestingly they never mention Docker, but do talk about managing with Fabric vs (the better) Ansible, which is idempotent: it can be run multiple times and won't change what doesn't need changing.
A few deployment patterns are covered too (listed in the excerpts below).
The final chapter, ten, is on debugging: from simple print statements peppered throughout, to eliminating blocks of code, to using sys.exit().
It circles back on itself and re-covers mocking, this time with random data generation via the schematics module.
To save some bandwidth (and time and money) you can use caching (so you don't need to hit external APIs or make expensive database calls). Some ways to handle the cache include the file cache and Redis memoize excerpts below.
Finally it looks at more advanced tools:
logging
pdb - the Python debugger (quick reference below) - and more advanced debuggers like ipdb and pdb++
the trace module
lptrace
strace
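The classic pdb entry point, for reference:

# drop into the debugger at this exact line
import pdb; pdb.set_trace()

# or run a whole script under the debugger from the start
# $ python3 -m pdb myscript.py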
Overuse of functional constructs: Python, being a mixed-paradigm language, provides support for functional programming via its lambda keyword and its map(), reduce(), and filter() functions. However, sometimes experienced programmers, or programmers coming to Python from a functional-programming background, overuse these constructs, producing code that is too cryptic and hence unreadable to other programmers.
No mention of the object-oriented hammer being used for everything, even when it is not the right way to go.
The following are some of the most popular tools in the Python ecosystem which can perform such static analysis:
Pylint: Pylint is a static checker for Python code, which can detect a range of coding errors, code smells, and style errors. Pylint uses a style close to PEP-8. The newer versions of Pylint also provide statistics about code complexity, and can print reports. Pylint requires the code to be executed before checking it. You can refer to the http://pylint.org link.
Pyflakes: Pyflakes is a more recent project than Pylint. It differs from Pylint in that it need not execute the code before checking it for errors. Pyflakes does not check for coding style errors, and only performs logic checks in code. You can refer to the https://launchpad.net/pyflakes link.
McCabe: It is a script which checks and prints a report on the McCabe complexity of your code. You can refer to the https://pypi.python.org/pypi/mccabe link.
Pycodestyle: Pycodestyle is a tool which checks your Python code against some of the PEP-8 guidelines. This tool was earlier called pep8. Refer to the https://github.com/PyCQA/pycodestyle link.
Flake8: Flake8 is a wrapper around the Pyflakes, McCabe, and pycodestyle tools, and can perform a number of checks including the ones provided by these tools. Refer to the https://gitlab.com/pycqa/flake8/ link.
$ pip install coverage
$ pip3 install line_profiler
$ pip3 install memory_profiler
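Typical invocations, for reference (my sketch; note the @profile decorator is injected at run time by kernprof and memory_profiler, not imported):

# profile_demo.py
@profile
def build_squares(n):
    """ A toy function to profile """
    return [i * i for i in range(n)]

if __name__ == '__main__':
    build_squares(100000)

# $ coverage run profile_demo.py && coverage report
# $ kernprof -l -v profile_demo.py              # line_profiler, line by line
# $ python3 -m memory_profiler profile_demo.py  # memory use, line by line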
>>> from collections import OrderedDict
>>> cities = ['Jakarta','Delhi','Newyork','Bonn','Kolkata',
'Bangalore','Bonn','Seoul','Delhi','Jakarta','Mumbai']
>>> cities_odict = OrderedDict.fromkeys(cities)
>>> print(cities_odict.keys())
odict_keys(['Jakarta', 'Delhi', 'Newyork', 'Bonn', 'Kolkata',
'Bangalore', 'Seoul', 'Mumbai'])
Here are some guidelines; a short sketch of my own follows the two lists.
Use multithreading in the following cases:
The program needs to maintain a lot of shared states, especially mutable ones. A lot of the standard data structures in Python, such as lists, dictionaries, and others, are thread-safe, so it costs much less to maintain a mutable shared state using threads than via processes.
The program needs to keep a low memory foot-print.
The program spends a lot of time doing I/O. Since the GIL is released by threads doing I/O, it doesn’t affect the time taken by the threads to perform I/O.
The program doesn’t have a lot of data-parallel operations which it can scale across multiple processes.
Use multiprocessing in these scenarios:
The program performs a lot of CPU-bound heavy computing: byte-code operations, number crunching, and the like on reasonably large inputs.
The program has inputs which can be parallelized into chunks and whose results can be combined afterwards – in other words, the input of the program yields well to data-parallel computations.
The program doesn’t have any limitations on memory usage, and you are on a modern machine with a multicore CPU and large enough RAM.
There is not much shared mutable state between processes that need to be synchronized—this can slow down the system, and offset any benefits gained from multiple processes.
Your program is not heavily dependent on I/O—file or disk I/O or socket I/O.
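A quick illustration of the split between the two lists above (my sketch, not from the book):

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

def fetch(url):
    """ I/O-bound: the GIL is released while waiting on the network """
    return len(urllib.request.urlopen(url).read())

def crunch(n):
    """ CPU-bound: needs a core to itself, so use a process """
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    urls = ['https://example.com'] * 4
    with ThreadPoolExecutor() as pool:    # threads for I/O
        sizes = list(pool.map(fetch, urls))
    with ProcessPoolExecutor() as pool:   # processes for CPU
        sums = list(pool.map(crunch, [10 ** 6] * 4))
    print(sizes, sums)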
class Borg(object):
    """ I ain't a Singleton """
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state

class IBorg(Borg):
    """ I am a Borg """
    def __init__(self):
        Borg.__init__(self)
        self.state = 'init'

    def __str__(self):
        return self.state
>>> i1 = IBorg()
>>> i2 = IBorg()
>>> print(i1)
init
>>> print(i2)
init
>>> i1.state='running'
>>> print(i2)
running
>>> print(i1)
running
>>> i1==i2
False
>>> i1.x='test'
>>> i2.x
'test'
>>> class ABorg(Borg):pass
...
>>> class BBorg(Borg):pass
...
>>> class A1Borg(ABorg):pass
...
>>> a = ABorg()
>>> a1 = A1Borg()
>>> b = BBorg()
# Now let's attach a dynamic attribute x to a with value 100:
>>> a.x = 100
>>> a.x
100
>>> a1.x
100
# Let's check if the instance of the sibling class Borg also gets it:
>>> b.x
100
Singletons fail with this example: singletons A and A1 would work, but B would be ‘out of sync’.
This proves that the Borg pattern is much better at state sharing across classes and sub-classes than the Singleton pattern, and it does so without a lot of fuss or the overhead of ensuring a single instance.
# pipe_recent_gen.py
# Using generators, print details of the most recently modified file
# matching a pattern.

import glob
import os
from time import sleep

def watch(pattern):
    """ Watch a folder for modified files matching a pattern """
    while True:
        files = glob.glob(pattern)
        # sort by modified time
        files = sorted(files, key=os.path.getmtime)
        recent = files[-1]
        yield recent
        # Sleep a bit
        sleep(1)

def get(input):
    """ For a given file input, print its meta data """
    for item in input:
        data = os.popen("ls -lh " + item).read()
        # Clear screen
        os.system("clear")
        yield data

if __name__ == "__main__":
    import sys

    # Source + Filter #1
    stream1 = watch('*.' + sys.argv[1])

    while True:
        # Filter #2 + sink
        stream2 = get(stream1)
        print(stream2.__next__())
        sleep(2)
pip, venv, pypi, ansible.
Deployment architectures:
continuous
blue-green
canary
A/B testing
induced chaos
import hashlib
import json
import os

def unique_key(address, site):
    """ Return a unique key for the given arguments """
    return hashlib.md5(''.join((address['name'],
                                address['street'],
                                address['city'],
                                site)).encode('utf-8')).hexdigest()

def filecache(func):
    """ A file caching decorator """
    def wrapper(*args, **kwargs):
        # Construct a unique cache filename
        filename = unique_key(args[0], args[1]) + '.data'
        if os.path.isfile(filename):
            print('from file')
            # Return cached data from file
            return json.load(open(filename))
        # Else compute and write into file
        result = func(*args, **kwargs)
        json.dump(result, open(filename, 'w'))
        return result
    return wrapper
@filecache
def api_search(address, site='yellowpages.com'):
    """ API to search for a given business address
        on a site and return results """
    # requests, search_api, and get_api_key are assumed to be
    # imported/defined elsewhere in the book's example
    req_params = {}
    req_params.update({'key': get_api_key(site),
                       'term': address['name'],
                       'searchloc': '{0}, {1}, {2}'.format(address['street'],
                                                           address['city'],
                                                           address['state'])})
    return requests.post(search_api % locals(), params=req_params)
from redis import StrictRedis

def memoize(func, ttl=86400):
    """ A memory caching decorator """
    # Local redis as in-memory cache
    cache = StrictRedis(host='localhost', port=6379)

    def wrapper(*args, **kwargs):
        # Construct a unique key
        key = unique_key(args[0], args[1])
        # Check if it's in redis
        cached_data = cache.get(key)
        if cached_data is not None:
            print('from cache')
            return json.loads(cached_data)
        # Else calculate and store while putting a TTL
        result = func(*args, **kwargs)
        cache.set(key, json.dumps(result), ttl)
        return result
    return wrapper
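Usage would presumably mirror the file cache (my sketch; the excerpt doesn't show it):

@memoize
def api_search(address, site='yellowpages.com'):
    ...  # same body as above; results now expire from Redis after ttl seconds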