10. Brief Tour of the Standard Library
Which module from standard library is used to work with operating system?
The os
module provides dozens of functions for interacting with the
operating system:
Be sure to use the import os
style instead of from os import *
. This will
keep os.open
from shadowing the built-in open
function which operates much
differently.
The built-in dir
and help
functions are useful as interactive
aids for working with large modules like os
. What dir(os)
and help(os)
will return?
For daily file and directory management tasks, the shutil
module provides
a higher level interface that is easier to use, how to copy or move files with
shutil
?
The glob
module provides a function for making file lists from directory
wildcard searches, how to list all files in the current directory with py
extension?
Common utility scripts often need to process command line arguments. These
arguments are stored in the sys
module’s argv attribute as a list. For
instance, let’s take the following demo.py
file:
What this script will print if we run it with this command python demo.py one two three
?
['demo.py', 'one', 'two', 'three']
The argparse
module provides a more sophisticated mechanism to process
command line arguments. The following script extracts one or more filenames
and an optional number of lines to be displayed:
When run at the command line with python top.py --lines=5 alpha.txt beta.txt
,
the script sets args.lines
and args.filenames
to?
args.lines
to 5
and args.filenames
to ['alpha.txt', 'beta.txt']
.
The sys
module also has attributes for stdin, stdout, and stderr.
The latter is useful for emitting warnings and error messages to make them
visible even when stdout has been redirected:
The most direct way to terminate a script (using sys
module) is to use
==sys.exit()
==.
Which python standard library module is used to work with regular expressions?
The re
module provides regular expression tools for advanced string
processing. For complex matching and manipulation, regular expressions offer
succinct, optimized solutions:
When only simple capabilities are needed, string methods are preferred over
re
because they are easier to read and debug:
Which standard library module gives access to the underlying C library functions
for floating point math:
The math
module.
Which standard library module is used to work with random numbers?
The random
module provides tools for making random selections:
Which standard library module is used to work with statistics?
The statistics
module calculates basic statistical properties
(the mean, median, variance, etc.) of numeric data:
Which standard library module is used to work with retrieving data from URL’s
and sending mail?
There are a number of modules for accessing the internet and processing internet
protocols. Two of the simplest are urllib.request
for retrieving data
from URLs and smtplib
for sending mail.
Which standard library module is used to work with dates and times?
The datetime
module supplies classes for manipulating dates and times in
both simple and complex ways. While date and time arithmetic is supported, the
focus of the implementation is on efficient member extraction for output
formatting and manipulation. The module also supports objects that are timezone
aware.
Which Data Compression modules aviable in python standard library? How to use
for example zlib?
Common data archiving and compression formats are directly supported by modules
including: zlib
, gzip
, bz2
, lzma
, zipfile
and tarfile
.
Which performance measurement module is available in python standard library?
Some Python users develop a deep interest in knowing the relative performance of
different approaches to the same problem. Python provides a measurement tool
that answers those questions immediately.
For example, it may be tempting to use the tuple packing and unpacking feature
instead of the traditional approach to swapping arguments. The timeit
module quickly demonstrates a modest performance advantage:
In contrast to timeit
’s fine level of granularity, the profile
and pstats
modules provide tools for identifying time critical sections in larger blocks of
code.
Which quality control module is available in python standard library to provide
automated docstring examples testing?
The doctest
module provides a tool for scanning a module and validating
tests embedded in a program’s docstrings. Test construction is as simple as
cutting-and-pasting a typical call along with its results into the docstring.
This improves the documentation by providing the user with an example and it
allows the doctest module to make sure the code remains true to the
documentation:
Standard library module unittest
is used for what purpose?
The unittest
module is not as effortless as the doctest
module, but it
allows a more comprehensive set of tests to be maintained in a separate file:
Python has a “batteries included” philosophy. This is best seen through the sophisticated and robust capabilities of its larger packages. For example:
-
The
xmlrpc.client
andxmlrpc.server
modules make implementing remote procedure calls into an almost trivial task. Despite the modules’ names, no direct knowledge or handling of XML is needed. -
The
email
package is a library for managing email messages, including MIME and other2822
-based message documents. Unlikesmtplib
andpoplib
which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols. -
The
json
package provides robust support for parsing this (JSON) popular data interchange format. Thecsv
module supports direct reading and writing of files in Comma-Separated Value format, commonly supported by databases and spreadsheets. XML processing is supported by thexml.etree.ElementTree
,xml.dom
andxml.sax
packages. Together, these modules and packages greatly simplify data interchange between Python applications and other tools. -
The
sqlite3
module is a wrapper for the SQLite database library, providing a persistent database that can be updated and accessed using slightly nonstandard SQL syntax. -
Internationalization is supported by a number of modules including ==
gettext
,locale
, and thecodecs
== package.
The reprlib
module provides a version of repr
customized for abbreviated
displays of large or deeply nested containers:
The pprint
module offers more sophisticated control over printing both
built-in and user defined objects in a way that is readable by the interpreter.
When the result is longer than one line, the “pretty printer” adds
line breaks and indentation to more clearly reveal data structure::
The textwrap
module formats paragraphs of text to fit a given screen
width:
The locale
module accesses a database of culture specific data formats.
The grouping attribute of locale’s format function provides a direct way of
formatting numbers with group separators:
The string
module includes a versatile string.Template
class with a
simplified syntax suitable for editing by end-users. This allows users to
customize their applications without having to alter the application.
The format uses placeholder names formed by $
with valid Python identifiers
(alphanumeric characters and underscores). Surrounding the placeholder with
braces allows it to be followed by more alphanumeric letters with no intervening
spaces. Writing $$
creates a single escaped ==$
==:
The string.Template.substitute
method raises a ==KeyError
== when a
placeholder is not supplied in a dictionary or a keyword argument. For
mail-merge style applications, user supplied data may be incomplete and the
string.Template.safe_substitute
method may be more appropriate --- it will
leave placeholders unchanged if data is missing:
Can I use custom delimiters for string.Template
?
Template subclasses can specify a custom delimiter. For example, a batch
renaming utility for a photo browser may elect to use percent signs for
placeholders such as the current date, image sequence number, or file format:
Another application for templating is separating program logic from the details
of multiple output formats. This makes it possible to substitute custom
templates for XML files, plain text reports, and HTML web reports.
Which standard library module is used to work with binary data?
The struct
module provides struct.pack
and struct.unpack
functions for
working with variable length binary record formats. The following example shows
how to loop through header information in a ZIP file without using the zipfile
module. Pack codes "H"
and "I"
represent two and four byte unsigned numbers
respectively. The "<"
indicates that they are standard size and in
little-endian byte order:
When Threading is used in Python?
Threading is a technique for decoupling tasks which are not sequentially
dependent. Threads can be used to improve the responsiveness of applications
that accept user input while other tasks run in the background. A related use
case is running I/O in parallel with computations in another thread.
The following code shows how the high level :mod:threading
module can run
tasks in background while the main program continues to run:
The principal challenge of multi-threaded applications is coordinating threads
that share data or other resources. To that end, the threading module provides
a number of synchronization primitives including
locks, events, condition variables, and semaphores.
While those tools are powerful, minor design errors can result in problems that
are difficult to reproduce. So, the preferred approach to task coordination is
to concentrate all access to a resource in a single thread and then use the
queue
module to feed that thread with requests from other threads.
Applications using queue.Queue
objects for inter-thread communication and
coordination are easier to design, more readable, and more reliable.
The logging
module offers a full featured and flexible logging system.
At its simplest, log messages are sent to a file or to ==sys.stderr
==:
By default, informational and debugging messages are suppressed and the output
is sent to standard error. Other output options include routing messages through
email, datagrams, sockets, or to an HTTP Server. New filters can select
different routing based on message priority: logging.DEBUG
, logging.INFO
,
logging.WARNING
, logging.ERROR
, and ==logging.CRITICAL
==.
The logging system can be configured directly from Python or can be loaded from a user editable configuration file for customized logging without altering the application.
Python does automatic memory management (reference counting for most objects and
==[[garbage_collection]]
== to eliminate cycles). The memory is freed shortly
after the last reference to it has been eliminated.
Automatic memory management works fine for most applications but occasionally
there is a need to track objects only as long as they are being used by
something else. Unfortunately, just tracking them creates a reference that makes
them permanent. The weakref
module provides tools for tracking objects without
creating a reference. When the object is no longer needed, it is automatically
removed from a weakref table and a callback is triggered for weakref
objects. Typical applications include caching objects that are expensive to
create:
Many data structure needs can be met with the built-in list type. However, sometimes there is a need for alternative implementations with different performance trade-offs.
The array
module provides an array.array
object that is like a list that
stores only homogeneous data and stores it more compactly. The following
example shows an array of numbers stored as two byte unsigned binary numbers
(typecode "H"
) rather than the usual 16 bytes per entry for regular lists of
Python int objects:
The collections
module provides a collections.deque
object that is like a
list with faster appends and pops from the left side but slower lookups in the
middle. These objects are well suited for implementing queues and breadth
first tree searches:
In addition to alternative list implementations, the library also offers other
tools such as the bisect
module with functions for manipulating sorted
lists:
The heapq
module provides functions for implementing heaps based on regular
lists. The lowest valued entry is always kept at position zero. This is useful
for applications which repeatedly access the smallest element but do not want to
run a full list sort:
The decimal
module offers a decimal.Decimal
datatype for
decimal floating point arithmetic. Compared to the built-in float
implementation of binary floating point, the class is especially helpful for
- financial applications and other uses which require exact decimal representation,
- control over precision,
- control over rounding to meet legal or regulatory requirements,
- tracking of significant decimal places, or
- applications where the user expects the results to match calculations done by hand.
For example (decimal module), calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent:
Results:
0.74
0.73
\
The decimal.Decimal
result keeps a trailing zero, automatically inferring four
place significance from multiplicands with two place significance. Decimal
reproduces mathematics as done by hand and avoids issues that can arise when
binary floating point cannot exactly represent decimal quantities.
Can you provide some examples of Decimal
numbers expressions differences with
float
numbers?
Exact representation enables the decimal.Decimal
class to perform modulo
calculations and equality tests that are unsuitable for binary floating point:
How to change precision of Decimal
numbers?
The decimal
module provides arithmetic with as much precision as needed, you
can change the precision (number of significant digits) using the
getcontext.prec
function parameter: