Private Name Mangling

Python's approach to private attributes and avoiding name collisions

Introduction

I came across a post about private name mangling in Python on the DE subreddit; the link is here.

Python doesn't support strictly private variables the way languages such as C++ do. However, it offers a way to approximate private variables through name mangling. It all comes down to the number of leading underscores: _ or __. Mangling applies to names that start with two underscores and do not end with two underscores, so special methods such as __init__ are left alone.

| attribute | description | name stored in `__dict__` |
| --- | --- | --- |
| `self.public` | Anyone can access it. | `public` |
| `self._private` | `_` is a friendly hint to other programmers that this variable is private, but nothing enforces the rule. You can still access it. | `_private` |
| `self.__protected` | `__` is very private: name mangling happens. Python stores any variable with two leading underscores, `__variable`, as `_ClassName__variable`, that is, with the prefix `_ClassName`. | `_ClassName__protected` |

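The terms public, private and protected are an analogy borrowed loosely from C++. Note that the more common Python convention maps them the other way round: a single leading underscore is usually described as "protected" and a double leading underscore as "private".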
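As a quick check of which names get mangled, here is a minimal sketch. The class Demo and its attribute names are made up for illustration; only the mangling rule itself comes from Python.

```python
class Demo:
    def __init__(self):
        self.public = 1       # no leading underscore: stored as-is
        self._hint = 2        # one leading underscore: stored as-is
        self.__mangled = 3    # two leading underscores: stored as _Demo__mangled
        self.__also_ = 4      # a single trailing underscore is still mangled
        self.__dunder__ = 5   # ends with two underscores: not mangled


d = Demo()
stored_names = set(vars(d))
print(stored_names)
```

The set printed contains public, _hint, _Demo__mangled, _Demo__also_ and __dunder__ (the print order of a set is not guaranteed).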

Example 1: public, private and protected

Feel free to play with the following code snippet to explore the difference between public, _private and __protected variables and how Python handles them.

class Test:
    def __init__(self) -> None:
        # use of some c++ lingo
        self.public = 11
        self._private = 23
        self.__protected = 42

    def __private_method(self):
        print("private method")

if __name__ == "__main__":
    t = Test()
    print(t.__dict__)
    print(f"_private variable: {t._private}")
    print(f"__protected variable: {t._Test__protected}")
    t._Test__private_method()

The output of the script is

{'public': 11, '_private': 23, '_Test__protected': 42}
_private variable: 23
__protected variable: 42
private method

You can see there is no __protected attribute in the namespace of the instance. However, you can still access it as t._Test__protected.
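To confirm this from outside the class, a small sketch (reusing a trimmed-down version of Test, with a helper variable found that is not in the original) shows that the unmangled name fails while the mangled one succeeds:

```python
class Test:
    def __init__(self) -> None:
        self.__protected = 42  # stored as _Test__protected


t = Test()

try:
    t.__protected  # outside the class body, so no mangling: lookup fails
    found = True
except AttributeError:
    found = False

print(found)                            # False
print(t._Test__protected)               # 42
print(hasattr(t, "_Test__protected"))   # True
```

So the attribute is hidden from casual access, not locked away.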

Motivation

The motivation for this feature is to avoid name collisions under inheritance. As a project grows, or when you build on someone else's codebase, name collisions between parent and child classes become hard to avoid.

Example 2: inspect the __dict__

Let's use a class named Class to illustrate the concept.

class Class:
    def __init__(self) -> None:
        self.__student_count = 0

    def get_student_count(self):
        return self.__student_count

    def set_student_count(self, count):
        self.__student_count = count


if __name__ == "__main__":
    c = Class()
    # snapshot 1
    print(c.__dict__)

    # snapshot 2
    c.set_student_count(23)
    print(c.__dict__)

    # snapshot 3
    c.__student_count = 10
    print(c.get_student_count())
    print(c.__dict__)

The output is

{'_Class__student_count': 0}
{'_Class__student_count': 23}
23
{'_Class__student_count': 23, '__student_count': 10}

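When you set __student_count through the setter method, it works as expected. However, setting it directly from outside the class does not change the value the getter returns. Inside the class body, Python rewrites any name with two leading underscores, __variable, to _ClassName__variable, so the methods read and write _Class__student_count. The assignment c.__student_count = 10 sits outside the class body, is not mangled, and therefore creates a second, unrelated attribute named __student_count. Both entries are visible in the __dict__ of the instance.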
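If you really do want to change the value from outside, the mangled name is what reaches it. The following is a minimal sketch along the lines of Example 2; the variables after_plain and after_mangled are only there to record the results.

```python
class Class:
    def __init__(self) -> None:
        self.__student_count = 0  # stored as _Class__student_count

    def get_student_count(self):
        return self.__student_count


c = Class()

c.__student_count = 10                 # outside the class: a new, separate attribute
after_plain = c.get_student_count()    # the getter still sees 0

c._Class__student_count = 10           # the mangled name reaches the real attribute
after_mangled = c.get_student_count()  # the getter now sees 10

print(after_plain, after_mangled)      # 0 10
```

This works, but reaching for the mangled name from outside defeats the purpose of the double underscore; the setter is the intended route.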

Example 3: class and math class

Let's say we have two classes,

  • Class: a class with a private variable __count, written by author 1, foo. He wants to keep track of the number of students in the class.

  • MathClass: a class that inherits from Class and also has a private variable __count, written by author 2, bar. He wants to keep track of the number of textbooks used for the math class.

Author 1 left the job, and author 2 inherits from Class and names his own class MathClass. He wants to use __count as well, but to count something completely different, and he writes his own getter and setter methods for it. A code snippet is shown below.

class Class:
    def __init__(self) -> None:
        # author 1: foo
        # number of students in the class
        self.__count = 0

    def get_count(self):
        return self.__count

    def set_count(self, count):
        self.__count = count


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        # author 2: bar
        # number of textbooks used for the math class
        self.__count = 10

    def get_count(self):
        return self.__count

    def set_count(self, count):
        self.__count = count


if __name__ == "__main__":
    c = Class()
    math_c = MathClass()

    print(c.__dict__)
    print(math_c.__dict__)

    math_c.set_count(20)
    print(c.__dict__)
    print(math_c.__dict__)

The output is shown below. It works fine: each class keeps its own count.

{'_Class__count': 0}
{'_Class__count': 0, '_MathClass__count': 10}
{'_Class__count': 0}
{'_Class__count': 0, '_MathClass__count': 20}

But imagine Python had no name mangling to turn __count into _<ClassName>__count. The output would be

{'__count': 0}
{'__count': 10}
{'__count': 0}
{'__count': 20}

You would accidentally overwrite the variable __count of the parent class, even though it means something different there. This is the reason why Python has this feature.
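One rough way to see this collision for real, without changing Python itself, is to use a single leading underscore, which is not mangled. The sketch below assumes both authors picked _count instead of __count; the method name get_student_count is hypothetical and only there to make the parent's intent visible.

```python
class Class:
    def __init__(self) -> None:
        # author 1: number of students in the class
        self._count = 0

    def get_student_count(self):
        return self._count


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        # author 2: number of textbooks, silently replaces the parent's value
        self._count = 10


math_c = MathClass()
print(math_c.__dict__)             # {'_count': 10}
print(math_c.get_student_count())  # 10, although no students were ever added
```

The parent's method now reports the textbook count as the student count, which is exactly the accident that the double underscore prevents.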

Summary

In this section, we touched upon

  • private, public and protected variables in Python

  • name mangling in Python, with examples

Private name mangling is a somewhat debatable feature. It is Python's nod to the access control found in other languages such as C++, without actually enforcing it. It's not a perfect solution, but it is a solution: a trade-off between flexibility and safety.

This feature acts as a fail-safe against programmer mistakes. It also argues for better naming, for example if we change

  • self.__count in class Class to self.student_count

  • self.__count in class MathClass to self.textbook_count

That is clearer and less confusing, and it is a reminder to put more thought into naming things. It echoes the saying that there are two hard things in computer science: cache invalidation and naming things.
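As a small sketch of that renaming (getters and setters left out for brevity, which is a simplification of Example 3), the two counts no longer share a name, so there is nothing to collide even without mangling:

```python
class Class:
    def __init__(self) -> None:
        # number of students in the class
        self.student_count = 0


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        # number of textbooks used for the math class
        self.textbook_count = 10


math_c = MathClass()
print(math_c.__dict__)  # {'student_count': 0, 'textbook_count': 10}
```

Note that these are now plain public attributes; whether to keep a leading underscore on them is a separate design choice.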
