Introduction
I came across a post on the DE subreddit about private name mangling in Python; the link is here.
Python doesn't support strictly private variables the way languages such as C++ do. However, Python has a way to approximate private variables through name mangling. It all comes down to the number of leading underscores: _ or __.
| variable | description | name stored in `__dict__` |
| --- | --- | --- |
| self.public | Anyone can access it. | public |
| self._private | A single _ is a friendly hint to other programmers that the variable is private, but nothing enforces it. You can still access it. | _private |
| self.__protected | A double __ is "very private": name mangling happens. Python stores any attribute that starts with two underscores (and does not end with two underscores), __variable, as _ClassName__variable, that is, with the prefix _ClassName. | _ClassName__protected |
The labels public, private and protected are an analogy borrowed from C++.
Example 1: public, private and protected
Feel free to play with the following code snippet to explore the difference between public, _private and __protected variables, and how Python handles each of them.
class Test:
    def __init__(self) -> None:
        # use of some c++ lingo
        self.public = 11
        self._private = 23
        self.__protected = 42

    def __private_method(self):
        print("private method")


if __name__ == "__main__":
    t = Test()
    print(t.__dict__)
    print(f"_private variable: {t._private}")
    print(f"__protected variable: {t._Test__protected}")
    t._Test__private_method()
The output of the script is
{'public': 11, '_private': 23, '_Test__protected': 42}
_private variable: 23
__protected variable: 42
private method
As you can see, there is no __protected attribute in the namespace of the instance. However, you can still access it as t._Test__protected.
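To make this concrete, here is a minimal sketch that probes the instance from outside the class. The class is a trimmed copy of Test from Example 1; the names lookup_failed and mangled_value are made up for illustration.

```python
class Test:
    def __init__(self) -> None:
        self.__protected = 42


t = Test()

# Outside the class body the name is not mangled, so the lookup fails.
try:
    t.__protected
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# The mangled name is an ordinary attribute and is fully accessible.
mangled_value = t._Test__protected

print(lookup_failed)   # True
print(mangled_value)   # 42
```

So mangling makes accidental access harder, but it does not hide anything from someone who knows the mangled name.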
Motivation
The reason behind this feature is to avoid name collisions under inheritance. As a project grows, or when you build on someone else's codebase, name collisions between parent and child classes become hard to avoid.
Example 2: inspect the __dict__
Let's use a class named Class to illustrate the concept.
class Class:
    def __init__(self) -> None:
        self.__student_count = 0

    def get_student_count(self):
        return self.__student_count

    def set_student_count(self, count):
        self.__student_count = count


if __name__ == "__main__":
    c = Class()
    # snapshot 1
    print(c.__dict__)
    # snapshot 2
    c.set_student_count(23)
    print(c.__dict__)
    # snapshot 3
    c.__student_count = 10
    print(c.get_student_count())
    print(c.__dict__)
The output is
{'_Class__student_count': 0}
{'_Class__student_count': 23}
23
{'_Class__student_count': 23, '__student_count': 10}
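When you set __student_count through the setter method, it works as expected. However, when you assign c.__student_count directly from outside the class, the value returned by the getter does not change. That is because name mangling only applies to code written inside the class body: there, Python rewrites any name of the form __variable to _ClassName__variable, with the prefix _ClassName. The assignment outside the class is not rewritten, so it creates a new, unrelated attribute named __student_count. Both attributes are visible in the __dict__ of the instance.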
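The difference between the two kinds of assignment can be sketched as follows. The class is a trimmed copy of the one above; the variables after_plain_assignment and after_mangled_assignment are hypothetical names used only for this illustration.

```python
class Class:
    def __init__(self) -> None:
        self.__student_count = 0

    def get_student_count(self):
        return self.__student_count


c = Class()

# Assignment from outside the class body: the name is stored as written,
# so this creates a second, unrelated attribute.
c.__student_count = 10
after_plain_assignment = c.get_student_count()

# Assignment through the mangled name reaches the attribute the class uses.
c._Class__student_count = 10
after_mangled_assignment = c.get_student_count()

print(after_plain_assignment)    # 0
print(after_mangled_assignment)  # 10
print(sorted(c.__dict__))        # ['_Class__student_count', '__student_count']
```

Writing to the mangled name works, but it defeats the purpose of the double underscore; the setter is the intended route.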
Example 3: class and math class
Let's say we have two classes:
Class: a class with a private variable __count, written by author 1, foo. He wants to keep track of the number of students in the class.
MathClass: a class that inherits from Class and has a private variable __count as well, written by author 2, bar. He wants to keep track of the number of textbooks used for the math class.
Author 1 has left the job, and author 2 inherits from Class and names his own class MathClass. He wants to use __count as well, but to count something completely different. He also writes his own getter and setter methods for __count. A code snippet is shown below.
class Class:
    def __init__(self) -> None:
        # author 1: foo
        # number of students in the class
        self.__count = 0

    def get_count(self):
        return self.__count

    def set_count(self, count):
        self.__count = count


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        # author 2: bar
        # number of textbook used for the math class
        self.__count = 10

    def get_count(self):
        return self.__count

    def set_count(self, count):
        self.__count = count


if __name__ == "__main__":
    c = Class()
    math_c = MathClass()
    print(c.__dict__)
    print(math_c.__dict__)
    math_c.set_count(20)
    print(c.__dict__)
    print(math_c.__dict__)
Here is the output. It works fine: each class keeps its own counter.
{'_Class__count': 0}
{'_Class__count': 0, '_MathClass__count': 10}
{'_Class__count': 0}
{'_Class__count': 0, '_MathClass__count': 20}
But imagine that Python had no name mangling to turn __count into _<ClassName>__count. The output would be:
{'__count': 0}
{'__count': 10}
{'__count': 0}
{'__count': 20}
You would accidentally overwrite the parent class's __count, even though it means something different there. This is the reason why Python has this feature.
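Mangling cannot be switched off, but the collision can be approximated with a single leading underscore, which is never mangled. The sketch below rewrites Example 3 with _count instead of __count; that rename and the method name get_student_count are assumptions made purely for illustration.

```python
class Class:
    def __init__(self) -> None:
        # number of students in the class
        self._count = 0

    def get_student_count(self):
        return self._count


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        # number of textbooks: silently reuses the parent's attribute
        self._count = 10


math_c = MathClass()
print(math_c.__dict__)             # {'_count': 10}
print(math_c.get_student_count())  # 10, although no student was ever counted
```

The parent's method now reports the textbook count as the student count, which is exactly the kind of bug the double underscore prevents.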
Summary
In this section, we touched upon:
private, public and protected variables in Python
name mangling in Python, with examples
Private name mangling is a somewhat debatable feature. It is Python's attempt to offer something like the access control found in other languages. It's not a perfect solution, but it is a solution: a trade-off between flexibility and safety.
This feature acts as a fail-safe against programmer mistakes. It also makes a case for better naming. Suppose we rename:
self.__count in class Class to self.student_count
self.__count in class MathClass to self.textbook_count
The result is clearer and less confusing, and it is a reminder to put more thought into naming things. It echoes the saying that there are two hard things in computer science: cache invalidation and naming things.
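With descriptive names, the two counters from Example 3 can coexist without relying on mangling at all. A minimal sketch, assuming the attributes are simply made public and the getters and setters are dropped:

```python
class Class:
    def __init__(self) -> None:
        self.student_count = 0


class MathClass(Class):
    def __init__(self) -> None:
        super().__init__()
        self.textbook_count = 10


math_c = MathClass()
math_c.textbook_count = 20
print(math_c.__dict__)  # {'student_count': 0, 'textbook_count': 20}
```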