
This tour covers advanced modules that support professional programming needs. These modules are more commonly found in larger applications rather than small scripts.
Output Formatting
The reprlib module provides a version of repr() that creates abbreviated displays of large or deeply nested containers:
import reprlib
reprlib.repr(set('supercalifragilisticexpialidocious'))
# Output: "{'a', 'c', 'd', 'e', 'f', 'g', ...}"
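The abbreviation limits can also be tuned. The following is a minimal sketch, assuming a limit of three list items is wanted; the variable names are illustrative only:

```python
import reprlib

# A Repr instance exposes per-type size limits as plain attributes.
short = reprlib.Repr()
short.maxlist = 3                      # show at most three list items

list_repr = short.repr(list(range(100)))
print(list_repr)                       # [0, 1, 2, ...]
```

Using a separate Repr instance leaves the module-level reprlib.repr() defaults untouched.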
The pprint module offers sophisticated control over printing both built-in and user-defined objects in a way that’s readable by the interpreter. When results span multiple lines, the “pretty printer” adds line breaks and indentation to clearly reveal data structure:
import pprint
t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta', 'yellow'], 'blue']]]
pprint.pprint(t, width=30)
# Output:
# [[[['black', 'cyan'],
#    'white',
#    ['green', 'red']],
#   [['magenta', 'yellow'],
#    'blue']]]
The textwrap module formats paragraphs of text to fit a given screen width:
import textwrap
doc = """The wrap() method is just like fill() except that it returns
a list of strings instead of one big string with newlines to separate
the wrapped lines."""
print(textwrap.fill(doc, width=40))
# Output:
# The wrap() method is just like fill()
# except that it returns a list of strings
# instead of one big string with newlines
# to separate the wrapped lines.
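As the sample text itself says, wrap() returns the lines as a list rather than one joined string. A small sketch of that relationship, reusing the same text:

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to "
       "separate the wrapped lines.")

lines = textwrap.wrap(doc, width=40)      # list of strings, each at most 40 chars
filled = textwrap.fill(doc, width=40)     # one string with embedded newlines

# Joining the wrapped lines with newlines gives the same text as fill().
print("\n".join(lines) == filled)         # True
```

The list form is convenient when each line needs further processing, such as adding a prefix.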
The locale module accesses a database of culture-specific data formats. The grouping argument of locale.format_string() provides a direct way of formatting numbers with group separators:
import locale
locale.setlocale(locale.LC_ALL, 'English_United States.1252') # Windows-style locale name; other platforms use names such as 'en_US.UTF-8'
# Output: 'English_United States.1252'
conv = locale.localeconv() # get a mapping of conventions
x = 1234567.8
locale.format_string("%d", x, grouping=True)
# Output: '1,234,567'
locale.format_string("%s%.*f", (conv['currency_symbol'],
                     conv['frac_digits'], x), grouping=True)
# Output: '$1,234,567.80'
Templating
The string module includes a versatile Template class with a simplified syntax suitable for end-user editing. This allows users to customize applications without altering the code.
The format uses placeholder names formed by $ with valid Python identifiers. Surrounding the placeholder with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing $$ creates a single escaped $:
from string import Template
t = Template('${village}folk send $$10 to $cause.')
t.substitute(village='Nottingham', cause='the ditch fund')
# Output: 'Nottinghamfolk send $10 to the ditch fund.'
The substitute() method raises a KeyError when a placeholder isn’t supplied. For mail-merge style applications, user data may be incomplete, so the safe_substitute() method is often more appropriate; it leaves placeholders unchanged if data is missing:
t = Template('Return the $item to $owner.')
d = dict(item='unladen swallow')
t.substitute(d)
# Raises KeyError: 'owner'
t.safe_substitute(d)
# Output: 'Return the unladen swallow to $owner.'
Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser might use percent signs for placeholders:
import time, os.path
photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
class BatchRename(Template):
    delimiter = '%'
fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ')
# User enters: Ashley_%n%f
t = BatchRename(fmt)
date = time.strftime('%d%b%y')
for i, filename in enumerate(photofiles):
    base, ext = os.path.splitext(filename)
    newname = t.substitute(d=date, n=i, f=ext)
    print('{0} --> {1}'.format(filename, newname))
# Output:
# img_1074.jpg --> Ashley_0.jpg
# img_1076.jpg --> Ashley_1.jpg
# img_1077.jpg --> Ashley_2.jpg
Templating is also useful for separating program logic from output format details, making it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.
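One way this separation might look in practice is sketched below. The two report layouts, the render() helper, and the sample rows are all hypothetical; an application could just as well load the templates from user-editable files:

```python
from string import Template

# Hypothetical output layouts; the program logic never changes between them.
TEXT_REPORT = Template("$name: $count items")
HTML_REPORT = Template("<li><b>$name</b>: $count items</li>")

def render(template, rows):
    """Apply one template to every row of data."""
    return [template.substitute(row) for row in rows]

rows = [{"name": "apples", "count": 3}, {"name": "pears", "count": 7}]
text_lines = render(TEXT_REPORT, rows)
html_lines = render(HTML_REPORT, rows)
print(text_lines)
print(html_lines)
```

Switching the output format only requires passing a different Template object to render().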
Working with Binary Data Record Layouts
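The struct module provides pack() and unpack() functions for working with variable-length binary record formats. The example below shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes "H" and "I" represent two- and four-byte unsigned numbers respectively. The "<" indicates that they are standard size and in little-endian byte order: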
import struct
with open('myfile.zip', 'rb') as f:
    data = f.read()
start = 0
for i in range(3): # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)
    start += extra_size + comp_size # skip to the next header
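To experiment with the same format string without a ZIP file at hand, a record can be packed and unpacked in memory. This is a small sketch; the field values are made up for illustration:

```python
import struct

HEADER = '<IIIHH'    # three 4-byte and two 2-byte unsigned ints, little-endian

# Made-up values standing in for crc32, sizes, and name lengths.
packed = struct.pack(HEADER, 0xDEADBEEF, 1024, 4096, 8, 0)
size = struct.calcsize(HEADER)            # number of bytes the format occupies
fields = struct.unpack(HEADER, packed)    # tuple of the five original values

print(size, fields)
```

calcsize() explains the 16-byte slice used in the ZIP example: 3 * 4 + 2 * 2 bytes.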
Multi-threading
Threading is a technique for decoupling tasks that aren’t sequentially dependent. Threads can improve application responsiveness by accepting user input while other tasks run in the background, or running I/O operations in parallel with computations.
The threading module can run tasks in the background while the main program continues to run:
import threading, zipfile
class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)
background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')
background.join() # Wait for the background task to finish
print('Main program waited until background was done.')
The main challenge of multi-threaded applications is coordinating threads that share data or resources. The threading module provides synchronization primitives including locks, events, condition variables, and semaphores.
Since minor design errors can cause difficult-to-reproduce problems, the preferred approach for task coordination is to concentrate all resource access in a single thread and use the queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication are easier to design, more readable, and more reliable.
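The single-consumer pattern described above could be sketched as follows. The worker() and producer() functions, the squaring "work", and the None sentinel are illustrative assumptions rather than a prescribed design:

```python
import queue
import threading

requests = queue.Queue()
results = []                  # only the worker thread ever touches this list

def worker():
    """Sole owner of the shared resource; handles requests one at a time."""
    while True:
        item = requests.get()
        if item is None:      # sentinel value: no more work will arrive
            requests.task_done()
            break
        results.append(item * item)
        requests.task_done()

def producer(values):
    """Other threads never touch results; they only enqueue requests."""
    for v in values:
        requests.put(v)

consumer = threading.Thread(target=worker)
consumer.start()

producers = [threading.Thread(target=producer, args=(range(i * 10, i * 10 + 10),))
             for i in range(3)]
for p in producers:
    p.start()
for p in producers:
    p.join()

requests.put(None)            # tell the worker to stop once the queue drains
requests.join()               # block until every request has been processed
consumer.join()
print(len(results))
```

Because only one thread modifies results, no explicit lock is needed around it.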
Logging
The logging module offers a flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:
import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning: config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
# Output:
# WARNING:root:Warning: config file server.conf not found
# ERROR:root:Error occurred
# CRITICAL:root:Critical error -- shutting down
By default, informational and debugging messages are suppressed and output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP Server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.
The logging system can be configured directly from Python or loaded from a user-editable configuration file for customized logging without altering the application.
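Configuring logging directly from Python might look like the sketch below. The logger name 'demo' and the in-memory stream are assumptions chosen to keep the example self-contained; a real application would more likely attach a file or network handler:

```python
import io
import logging

stream = io.StringIO()                        # stands in for a file or socket
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)             # this route only takes WARNING and above
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

log = logging.getLogger('demo')               # hypothetical logger name
log.setLevel(logging.DEBUG)                   # the logger itself accepts everything
log.propagate = False                         # keep messages away from the root logger
log.addHandler(handler)

log.info('not routed to this handler')        # below the handler's threshold
log.error('disk almost full')
captured = stream.getvalue()
print(captured, end='')
```

Each handler carries its own level, so several handlers on one logger can route different priorities to different destinations.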
Weak References
Python’s automatic memory management uses reference counting for most objects and garbage collection to eliminate cycles. Memory is freed shortly after the last reference is eliminated.
This works well for most applications, but sometimes you need to track objects only as long as they’re being used elsewhere. The weakref module provides tools for tracking objects without creating a reference that would make them permanent:
import weakref, gc
class A:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return str(self.value)
a = A(10) # create a reference
d = weakref.WeakValueDictionary()
d['primary'] = a # does not create a reference
d['primary'] # fetch the object if it is still alive
# Output: 10
del a # remove the one reference
gc.collect() # run garbage collection right away
# Output: 0
d['primary'] # entry was automatically removed
# Raises KeyError: 'primary'
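For tracking a single object rather than a mapping, the same module offers weakref.ref(). A minimal sketch, using a hypothetical Node class:

```python
import gc
import weakref

class Node:
    """Hypothetical class; plain instances of user classes support weak references."""
    def __init__(self, value):
        self.value = value

n = Node(10)
r = weakref.ref(n)              # does not keep the object alive
alive_before = r() is n         # calling the reference returns the object

del n                           # remove the only strong reference
gc.collect()                    # make sure collection has happened
alive_after = r() is not None   # a dead reference returns None when called
print(alive_before, alive_after)
```

Code that holds a weak reference should always check the result of calling it before use.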
Tools for Working with Lists
While the built-in list type meets many data structure needs, sometimes alternative implementations with different performance trade-offs are necessary.
The array module provides an array object that stores homogeneous data more compactly:
from array import array
a = array('H', [4000, 10, 700, 22222]) # 'H' is for 2-byte unsigned ints
sum(a)
# Output: 26932
a[1:3]
# Output: array('H', [10, 700])
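The compact storage can be observed through the itemsize attribute and the raw byte form. This sketch reuses the values above; the exact item size for a type code is platform dependent, so it is computed rather than assumed:

```python
from array import array

a = array('H', [4000, 10, 700, 22222])

raw = a.tobytes()               # the underlying machine values as bytes
print(a.itemsize, len(raw))     # bytes per item, total bytes

# Round-trip: rebuild an equal array from the raw bytes.
b = array('H')
b.frombytes(raw)
print(b == a)                   # True
```

A list of the same numbers stores a full Python int object per entry, which is considerably larger than itemsize bytes each.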
The collections module provides a deque object with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth-first tree searches:
from collections import deque
d = deque(["task1", "task2", "task3"])
d.append("task4")
print("Handling", d.popleft())
# Output: Handling task1
# starting_node, gen_moves() and is_goal() are placeholders for application-specific code
unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
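A complete, runnable variant of the search is sketched below. The GRAPH dictionary, the path-tracking approach, and the seen set are assumptions added for illustration; they are not part of the outline above:

```python
from collections import deque

# Hypothetical directed graph: each node maps to its neighbors.
GRAPH = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'],
         'D': ['F'], 'E': ['F'], 'F': []}

def breadth_first_search(start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    unsearched = deque([[start]])      # queue of partial paths
    seen = {start}                     # nodes already queued
    while unsearched:
        path = unsearched.popleft()    # oldest path first: breadth-first order
        node = path[-1]
        if node == goal:
            return path
        for neighbor in GRAPH[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                unsearched.append(path + [neighbor])
    return None

path = breadth_first_search('A', 'F')
print(path)                            # ['A', 'B', 'D', 'F']
```

Both popleft() and append() run in constant time on a deque, which is what makes it a good fit for the queue here.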
The bisect module offers functions for manipulating sorted lists:
import bisect
scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
bisect.insort(scores, (300, 'ruby'))
scores
# Output: [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
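Besides insertion, the module can locate positions in a sorted sequence. A common use is a table lookup; the grade boundaries below are illustrative values:

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Map a numeric score to a letter using sorted breakpoints."""
    i = bisect.bisect(breakpoints, score)   # index of the first breakpoint > score
    return grades[i]

letters = [grade(s) for s in [33, 99, 77, 70, 89, 90, 100]]
print(letters)    # ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```

The lookup is a binary search, so it stays fast even with many breakpoints.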
The heapq module provides functions for implementing heaps based on regular lists. The lowest valued entry is always kept at position zero, useful for applications that repeatedly access the smallest element without running a full sort:
from heapq import heapify, heappop, heappush
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapify(data) # rearrange the list into heap order
heappush(data, -5) # add a new entry
[heappop(data) for i in range(3)] # fetch the three smallest entries
# Output: [-5, 0, 1]
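Heaps of tuples make a simple priority queue, since tuples compare by their first element. A brief sketch with made-up task names:

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write tests'))    # (priority, description)
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))

# Entries come out in priority order, lowest number first.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)       # ['fix bug', 'write tests', 'update docs']

# nsmallest() finds the smallest entries without modifying the input.
smallest = heapq.nsmallest(3, [1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
print(smallest)    # [0, 1, 2]
```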
Decimal Floating-Point Arithmetic
The decimal module offers a Decimal datatype for decimal floating-point arithmetic, which is especially helpful for:
- Financial applications requiring exact decimal representation
- Control over precision
- Control over rounding to meet legal or regulatory requirements
- Tracking of significant decimal places
- Applications where users expect results to match hand calculations
For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal vs. binary floating point:
from decimal import *
round(Decimal('0.70') * Decimal('1.05'), 2)
# Output: Decimal('0.74')
round(.70 * 1.05, 2)
# Output: 0.73
Before rounding, the Decimal product is Decimal('0.7350'): the result keeps a trailing zero, automatically inferring four-place significance from multiplicands with two-place significance. Decimal reproduces mathematics as done by hand and avoids issues that arise when binary floating point can’t exactly represent decimal quantities.
Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point:
Decimal('1.00') % Decimal('.10')
# Output: Decimal('0.00')
1.00 % 0.10
# Output: 0.09999999999999995
sum([Decimal('0.1')]*10) == Decimal('1.0')
# Output: True
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 1.0
# Output: False
The decimal module provides arithmetic with as much precision as needed:
getcontext().prec = 36
Decimal(1) / Decimal(7)
# Output: Decimal('0.142857142857142857142857142857142857')
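Control over rounding, mentioned in the list of use cases above, is typically exercised through quantize(). The amount below is a made-up value chosen because it sits exactly halfway between two cents:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

amount = Decimal('0.125')    # illustrative value, exactly halfway between cents
cent = Decimal('0.01')       # the exponent of this value sets the target precision

half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)       # ties round away from zero
half_even = amount.quantize(cent, rounding=ROUND_HALF_EVEN)   # ties round to the even digit
print(half_up, half_even)    # 0.13 0.12
```

Which rounding mode is appropriate depends on the rules the application must follow; the module leaves that choice explicit.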