Pragmatic Programmer: Finish What You Start

python context manager

Motivation

When I was reading Pragmatic Programmer, i came across this tip,

Finish What You Start

It means that for functions and objects that allocate resources, you should also free them. This is a good practice to avoid memory leaks.

Example

Let's take a look at the class Customer that handles the customer account information.

Coupled code

First, let's take a look at a coupled code that violates the finish what you start principle.

class Customer:
    def __init__(self, name):
        self.name = name
        self.balance = None

    def read_customer(self):
        self.customer_file = open(self.name + ".rec", "r+")
        self.balance = float(self.customer_file.readline().strip())

    def write_customer(self):
        self.customer_file.seek(0)
        self.customer_file.write(str(self.balance))
        self.customer_file.truncate()
        self.customer_file.close()

    def update_customer(self, transaction_amount):
        self.read_customer()
        self.balance += float(transaction_amount)
        self.write_customer()


if __name__ == "__main__":
    # update the customer's balance
    customer = Customer("john_doe")
    customer.update_customer(100)

It has a couple of red flag, the read_customer and write_customer are coupled together because,

  • they share self.customer_file.

  • read_customer open up the file and write_customer close the file

Then, update_customer calls upon read_customer and write_customer to update the customer's balance. The OS resource that has been allocated to self.customer_file is freed by update_customer.

This is a bad practice because it's not clear who is responsible for freeing the OS resource. And as we receive a new ticket, let's say we can only update the balance if the customer transaction amount is positive, then the code will look like this.

class Customer:
    def __init__(self, name):
        self.name = name
        self.balance = None

    def read_customer(self):
        self.customer_file = open(self.name + ".rec", "r+")
        self.balance = float(self.customer_file.readline().strip())

    def write_customer(self):
        self.customer_file.seek(0)
        self.customer_file.write(str(self.balance))
        self.customer_file.truncate()
        self.customer_file.close()

    def update_customer(self, transaction_amount):
        self.read_customer()
        if transaction_amount > 0:
            self.balance += float(transaction_amount)
            self.write_customer()


if __name__ == "__main__":
    # update the customer's balance
    customer = Customer("john_doe")
    customer.update_customer(100)

It will cause the customer file will not be closed if the transaction amount is negative. After a while, the OS resource will be exhausted and the program will crash (imagine many transactions per minute).

Decoupled code

To fix this, we can decouple the read_customer and write_customer by passing a file handle to them.

class Customer:
    """A customer with a name and a balance to handle
    bank balance transactions"""

    def __init__(self, name):
        self.name = name
        self.balance = None

    def read_customer(self, file):
        """read a line from the file and set the balance"""
        self.balance = float(file.readline())

    def write_customer(self, file):
        """write the balance to the file"""
        file.seek(0)
        file.write(str(self.balance))

    def update_customer(self, transaction_amount):
        """update the customer's balance"""
        with open(self.name + ".rec", "r+", encoding="utf-8") as file:
            self.read_customer(file)
            self.balance += transaction_amount
            self.write_customer(file)


if __name__ == "__main__":
    customer = Customer("john_doe")

    # update the customer's balance
    customer.update_customer(50.25)
problemsolution
coupling from self.customer_filerefactor it out and pass it in file handle instead
update_customer may violate the finish where you start principle oncewith open() in Python, the context manager that does resource management for you

What happens with python opens a file?

It goes through a series of steps,

  • request the OS: python needs to talk to the OS by making a system call.

  • check the file mode: check if the file is opened in read, write or append mode to make sure we can operate on the file.

  • file descriptor: the OS will return a file descriptor to Python, which is an integer that represents the file. Recall the unix philosophy, everything is a file or more precisely everything is a file descriptor.

  • file handle (file object) creation: Python creates a file handle that represents the opened file. This file object is used to perform various operations on the file, such as reading, writing, and seeking.

In the meantime, python will be responsible for the resource management for the operation we perform on the file.

Summary

In this section, we covered

  • the finish what you start principle

  • how to decouple the code to avoid the problem with customer class example

  • a bit dig-in on what's behind the scene when Python open a file and the rise of context manager

To reduce the occurrences of the problem for Python developers, context manager has been released in Python 2.5. After that, the developer always uses with open() to open a file, which reduces the occurrences of the problem. So much effort has been put into guiding the developer to do the right thing and we are taking it for granted.

Note: The same principle applies to database connection. If you don't do it properly, it will occupy the connection pool.

Python context manager has more than just with open(). Feel free to explore it from contextlib import contextmanager, I have some links in the reference section.

Reference