Tour of the Python Standard Library – Advanced Modules

This tour covers advanced modules that support professional programming needs. These modules appear mostly in larger applications and rarely in small scripts.

Output Formatting

The reprlib module provides a version of repr() that creates abbreviated displays of large or deeply nested containers:

import reprlib
reprlib.repr(set('supercalifragilisticexpialidocious'))
# Output: "{'a', 'c', 'd', 'e', 'f', 'g', ...}"
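
The abbreviation limits can also be adjusted. As a minimal sketch, assuming a separate reprlib.Repr instance is acceptable (the name short_repr is illustrative and not part of the module), the list limit can be lowered like this:

```python
import reprlib

# A Repr instance carries its own size limits; short_repr is an illustrative name.
short_repr = reprlib.Repr()
short_repr.maxlist = 3                    # show at most three list items

numbers = list(range(100))
default_view = reprlib.repr(numbers)      # module-level helper with default limits
short_view = short_repr.repr(numbers)     # custom limit of three items

print(default_view)   # [0, 1, 2, 3, 4, 5, ...]
print(short_view)     # [0, 1, 2, ...]
```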

The pprint module offers sophisticated control over printing both built-in and user-defined objects in a way that’s readable by the interpreter. When results span multiple lines, the “pretty printer” adds line breaks and indentation to clearly reveal data structure:

import pprint
t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta', 'yellow'], 'blue']]]

pprint.pprint(t, width=30)
# Output:
# [[[['black', 'cyan'],
#    'white',
#    ['green', 'red']],
#   [['magenta', 'yellow'],
#    'blue']]]

The textwrap module formats paragraphs of text to fit a given screen width:

import textwrap
doc = """The wrap() method is just like fill() except that it returns
a list of strings instead of one big string with newlines to separate
the wrapped lines."""

print(textwrap.fill(doc, width=40))
# Output:
# The wrap() method is just like fill()
# except that it returns a list of strings
# instead of one big string with newlines
# to separate the wrapped lines.
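
The sample text itself describes the relationship between wrap() and fill(). A short sketch of that relationship, reusing the same sentence as a single-line string:

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to separate "
       "the wrapped lines.")

lines = textwrap.wrap(doc, width=40)   # list of strings, one per output line
joined = "\n".join(lines)              # same text that fill() would return

for line in lines:
    print(repr(line))
```

Working with the list is convenient when each line needs further processing, such as adding a prefix, before the text is joined.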

The locale module accesses a database of culture-specific data formats. The grouping argument of locale.format_string() provides a direct way of formatting numbers with group separators:

import locale
# Windows-style locale name; many Unix-like systems use a name such as 'en_US.UTF-8'
locale.setlocale(locale.LC_ALL, 'English_United States.1252')
# Output: 'English_United States.1252'

conv = locale.localeconv()          # get a mapping of conventions
x = 1234567.8
locale.format_string("%d", x, grouping=True)
# Output: '1,234,567'

locale.format_string("%s%.*f", (conv['currency_symbol'],
                   conv['frac_digits'], x), grouping=True)
# Output: '$1,234,567.80'

Templating

The string module includes a versatile Template class with a simplified syntax suitable for end-user editing. This allows users to customize applications without altering the code.

The format uses placeholder names formed by $ followed by a valid Python identifier (alphanumeric characters and underscores). Surrounding the placeholder with braces allows it to be followed by more alphanumeric characters with no intervening spaces. Writing $$ creates a single escaped $:

from string import Template
t = Template('${village}folk send $$10 to $cause.')
t.substitute(village='Nottingham', cause='the ditch fund')
# Output: 'Nottinghamfolk send $10 to the ditch fund.'

The substitute() method raises a KeyError when a placeholder isn’t supplied in a dictionary or a keyword argument. For mail-merge style applications, user data may be incomplete, so the safe_substitute() method is often more appropriate. It leaves placeholders unchanged when data is missing:

t = Template('Return the $item to $owner.')
d = dict(item='unladen swallow')
t.substitute(d)
# Raises KeyError: 'owner'

t.safe_substitute(d)
# Output: 'Return the unladen swallow to $owner.'

Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser might use percent signs for placeholders:

import time, os.path
from string import Template
photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
class BatchRename(Template):
    delimiter = '%'

fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
# User enters: Ashley_%n%f

t = BatchRename(fmt)
date = time.strftime('%d%b%y')
for i, filename in enumerate(photofiles):
    base, ext = os.path.splitext(filename)
    newname = t.substitute(d=date, n=i, f=ext)
    print('{0} --> {1}'.format(filename, newname))

# Output:
# img_1074.jpg --> Ashley_0.jpg
# img_1076.jpg --> Ashley_1.jpg
# img_1077.jpg --> Ashley_2.jpg

Templating is also useful for separating program logic from output format details, making it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.
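
As a rough sketch of that separation, the layout below lives in a single Template while the function only supplies values. The report wording, the field names, and the render_report() helper are all hypothetical:

```python
from string import Template

# Hypothetical report layout; in practice it could be loaded from a file
# that end users are free to edit.
report_template = Template("Report for $customer\nItems: $count\nTotal: $$$total\n")

def render_report(customer, count, total):
    """Fill in the template; the program logic never touches the layout."""
    return report_template.substitute(
        customer=customer, count=count, total=f"{total:.2f}")

text = render_report("Nottingham Supplies", 3, 41.5)
print(text)
```

Swapping in an HTML or XML layout would only require a different template string, not a change to render_report().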

Working with Binary Data Record Layouts

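The struct module provides pack() and unpack() functions for working with variable-length binary record formats. The example below shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes “H” and “I” represent two- and four-byte unsigned numbers respectively, and “<” indicates standard sizes in little-endian byte order: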

import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header
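
The 16-byte slice read in the loop matches the size of the format string. A small sketch that checks this with struct.calcsize() and round-trips a record (the field values are made up for illustration):

```python
import struct

HEADER_FORMAT = '<IIIHH'   # little-endian: three 4-byte and two 2-byte unsigned ints
header_size = struct.calcsize(HEADER_FORMAT)   # 16 bytes

# Made-up field values, used only to demonstrate the round trip.
packed = struct.pack(HEADER_FORMAT, 0xDEADBEEF, 1024, 4096, 9, 0)
fields = struct.unpack(HEADER_FORMAT, packed)

print(header_size, packed.hex())
print(fields)
```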

Multi-threading

Threading is a technique for decoupling tasks that aren’t sequentially dependent. Threads can improve application responsiveness by accepting user input while other tasks run in the background, or running I/O operations in parallel with computations.

The threading module can run tasks in the background while the main program continues to run:

import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile

    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')

background.join()    # Wait for the background task to finish
print('Main program waited until background was done.')

The main challenge of multi-threaded applications is coordinating threads that share data or resources. The threading module provides synchronization primitives including locks, events, condition variables, and semaphores.

Since minor design errors can cause difficult-to-reproduce problems, the preferred approach for task coordination is to concentrate all resource access in a single thread and use the queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication are easier to design, more readable, and more reliable.
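
One way this pattern might look is sketched below. A single writer thread owns the shared list, and the other threads only put requests on a queue. The names writer, producer, and log_lines are illustrative, and None is used as an assumed stop signal:

```python
import queue
import threading

requests = queue.Queue()
log_lines = []                  # the shared resource, touched only by the writer thread

def writer():
    """Single thread that owns log_lines and serves requests from the queue."""
    while True:
        item = requests.get()
        if item is None:        # sentinel: no more work
            break
        log_lines.append(item)

def producer(name, count):
    """Producer threads never touch log_lines; they only enqueue requests."""
    for i in range(count):
        requests.put(f"{name}-{i}")

writer_thread = threading.Thread(target=writer)
writer_thread.start()

producers = [threading.Thread(target=producer, args=(f"worker{n}", 5))
             for n in range(3)]
for p in producers:
    p.start()
for p in producers:
    p.join()

requests.put(None)              # tell the writer to stop after the queued items
writer_thread.join()
print(len(log_lines), "requests handled")
```

Because only one thread ever modifies log_lines, no explicit lock is needed around it; the Queue handles the synchronization.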

Logging

The logging module offers a flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:

import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning: config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')

# Output:
# WARNING:root:Warning: config file server.conf not found
# ERROR:root:Error occurred
# CRITICAL:root:Critical error -- shutting down

By default, informational and debugging messages are suppressed and output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

The logging system can be configured directly from Python or loaded from a user-editable configuration file for customized logging without altering the application.
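
A minimal sketch of configuring logging directly from Python follows. It assumes a named logger ('tour.example' is a hypothetical name) and writes to an in-memory stream standing in for a file or socket:

```python
import io
import logging

stream = io.StringIO()                       # stands in for a file or socket
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('tour.example')   # hypothetical logger name
logger.setLevel(logging.INFO)                # let INFO through; DEBUG stays suppressed
logger.addHandler(handler)
logger.propagate = False                     # keep the demo output away from the root logger

logger.debug('not shown')
logger.info('service started')
logger.warning('config file %s not found', 'server.conf')

output = stream.getvalue()
print(output, end='')
```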

Weak References

Python’s automatic memory management uses reference counting for most objects and garbage collection to eliminate cycles. Memory is freed shortly after the last reference is eliminated.

This works well for most applications, but sometimes you need to track objects only as long as they’re being used elsewhere. The weakref module provides tools for tracking objects without creating a reference that would make them permanent:

import weakref, gc
class A:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return str(self.value)

a = A(10)                   # create a reference
d = weakref.WeakValueDictionary()
d['primary'] = a            # does not create a reference
d['primary']                # fetch the object if it is still alive
# Output: 10

del a                       # remove the one reference
gc.collect()                # run garbage collection right away
# Output: 0

d['primary']                # entry was automatically removed
# Raises KeyError: 'primary'
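
A single object can also be tracked with weakref.ref(). The sketch below uses a hypothetical Resource class and relies on CPython's reference counting to reclaim the object promptly; other implementations may reclaim it later:

```python
import gc
import weakref

class Resource:
    """Plain class; instances of most user-defined classes can be weakly referenced."""
    def __init__(self, name):
        self.name = name

res = Resource('cache-entry')
ref = weakref.ref(res)          # does not keep res alive

alive_name = ref().name         # calling the weak reference returns the object
del res                         # drop the only strong reference
gc.collect()                    # not required on CPython, shown for completeness
after = ref()                   # None once the object has been reclaimed

print(alive_name, after)
```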

Tools for Working with Lists

While the built-in list type meets many data structure needs, sometimes alternative implementations with different performance trade-offs are necessary.

The array module provides an array object that stores homogeneous data more compactly:

from array import array
a = array('H', [4000, 10, 700, 22222])  # 'H' is for 2-byte unsigned ints
sum(a)
# Output: 26932
a[1:3]
# Output: array('H', [10, 700])
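
The compact storage comes with a fixed value range. A brief sketch, assuming a platform where the 'H' typecode occupies two bytes (true of common platforms):

```python
from array import array

values = [4000, 10, 700, 22222]
a = array('H', values)

raw = a.tobytes()                   # the packed machine representation
print(a.itemsize, len(raw))         # bytes per item, total bytes

try:
    a.append(70000)                 # too large for an unsigned 2-byte slot
except OverflowError as exc:
    print('rejected:', exc)
```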

The collections module provides a deque object with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth-first tree searches:

from collections import deque
d = deque(["task1", "task2", "task3"])
d.append("task4")
print("Handling", d.popleft())
# Output: Handling task1

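def breadth_first_search(starting_node, gen_moves, is_goal):
    # gen_moves(node) yields successor nodes and is_goal(node) tests for
    # success; both are supplied by the application.
    unsearched = deque([starting_node])
    while unsearched:
        node = unsearched.popleft()     # take the oldest unexplored node
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)        # explore this node's moves later
    return None                         # search space exhausted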

The bisect module offers functions for manipulating sorted lists:

import bisect
scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
bisect.insort(scores, (300, 'ruby'))
scores
# Output: [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
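
Besides inserting, the module can look up positions in a sorted sequence. A small sketch of a table lookup with bisect.bisect(), where the breakpoints and letter grades are illustrative values:

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Map a numeric score to a letter using binary search on sorted breakpoints."""
    i = bisect.bisect(breakpoints, score)   # number of breakpoints <= score
    return grades[i]

results = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
print(results)   # ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```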

The heapq module provides functions for implementing heaps based on regular lists. The lowest valued entry is always kept at position zero, useful for applications that repeatedly access the smallest element without running a full sort:

from heapq import heapify, heappop, heappush
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapify(data)                      # rearrange the list into heap order
heappush(data, -5)                 # add a new entry
[heappop(data) for i in range(3)]  # fetch the three smallest entries
# Output: [-5, 0, 1]
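
A common use of this property is a simple priority queue. The sketch below stores (priority, description) tuples, with made-up task names, so that the lowest priority number is always served first:

```python
import heapq

tasks = []   # a plain list used as a heap of (priority, description) tuples
heapq.heappush(tasks, (3, 'write report'))
heapq.heappush(tasks, (1, 'fix outage'))
heapq.heappush(tasks, (2, 'review patch'))

order = []
while tasks:
    priority, description = heapq.heappop(tasks)   # always the smallest priority
    order.append(description)

print(order)   # ['fix outage', 'review patch', 'write report']
```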

Decimal Floating-Point Arithmetic

The decimal module offers a Decimal datatype for decimal floating-point arithmetic, which is especially helpful for:

  • Financial applications requiring exact decimal representation
  • Control over precision
  • Control over rounding to meet legal or regulatory requirements
  • Tracking of significant decimal places
  • Applications where users expect results to match hand calculations

For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal vs. binary floating point:

from decimal import *
round(Decimal('0.70') * Decimal('1.05'), 2)
# Output: Decimal('0.74')

round(.70 * 1.05, 2)
# Output: 0.73

Before rounding, the Decimal product is Decimal('0.7350'): it keeps a trailing zero, automatically inferring four-place significance from multiplicands with two-place significance. Decimal reproduces mathematics as done by hand and avoids issues that arise when binary floating point can’t exactly represent decimal quantities.
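
Where a regulation prescribes a particular rounding rule, one option is to round with quantize() and name the rule explicitly. A sketch using the same tax calculation, with ROUND_HALF_UP chosen purely for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

product = Decimal('0.70') * Decimal('1.05')
print(product)                     # 0.7350

# Round to cents with an explicitly chosen rounding rule.
cents = Decimal('0.01')
charged = product.quantize(cents, rounding=ROUND_HALF_UP)
print(charged)                     # 0.74
```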

Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point:

Decimal('1.00') % Decimal('.10')
# Output: Decimal('0.00')

1.00 % 0.10
# Output: 0.09999999999999995

sum([Decimal('0.1')]*10) == Decimal('1.0')
# Output: True

0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 1.0
# Output: False

The decimal module provides arithmetic with as much precision as needed:

getcontext().prec = 36
Decimal(1) / Decimal(7)
# Output: Decimal('0.142857142857142857142857142857142857')
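
Setting getcontext().prec changes the precision for all later calculations in the thread. When the higher precision is only needed briefly, a sketch using localcontext() keeps the change confined to a block:

```python
from decimal import Decimal, getcontext, localcontext

before = getcontext().prec

with localcontext() as ctx:
    ctx.prec = 36                          # applies only inside this block
    seventh = Decimal(1) / Decimal(7)

after = getcontext().prec                  # unchanged by the block above
print(seventh)                             # 0.142857142857142857142857142857142857
print(before == after)                     # True
```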