from PIL import Image, ImageFile
from matplotlib.pyplot import imshow
import requests
from io import BytesIO
import numpy as np
import logging

Python Packages Development¶

Introduction¶

Reason for Building a Python Package

Make code easier to reuse
Avoid copying and pasting
Keep functions up to date
Share code with others

Difference between Scripts, Modules and Packages

Script: A Python file that is run directly, e.g. with python script.py.
Package: A directory that holds Python code meant to be imported (e.g. numpy).
Subpackage: A smaller package that is inside a package (e.g. numpy.random).
Module: A Python file located inside a package that stores the package code.
Library: Either a single package or a collection of packages.

Directory tree of a package¶

package1/
|-- module1.py
|-- __init__.py

The directory package1 is a Python package.
All the package code is within module1.py.
The __init__.py file marks this directory as a Python package.
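
The layout above can be tried out with a throwaway package. The following is a minimal sketch, assuming a hypothetical package name demo_package1 and a hypothetical greet() function; it writes the two files to a temporary directory and imports the module from the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_package1/ with __init__.py and module1.py.
# The package name and the greet() function are hypothetical.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_package1"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # an empty file is enough
(pkg_dir / "module1.py").write_text(
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

# Make the parent directory importable, then import the module from the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
module1 = importlib.import_module("demo_package1.module1")

message = module1.greet("package")
print(message)
```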

Subpackages

Directory tree of a package

preprocessing and regression are subpackages. regression.py and standardize.py are Python modules.

Documentation¶

We should document each:

Function
Class
Class Method

Calling help() on an object displays its documentation, as below:

import numpy as np
help(np.sqrt)

Function documentation is as below:

def genearte_number(nrand=100,file='Output.dat'):
    """
    Find odd and even numbers. It generates an output file stating whether each number is odd or even.
    """
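
The docstring that help() prints is stored on the function itself. As a small sketch, using a hypothetical describe_parity() function that is not part of the package, the text can be read back with inspect.getdoc():

```python
import inspect


def describe_parity(number):
    """Return 'even' if number is divisible by 2, otherwise 'odd'."""
    return "even" if number % 2 == 0 else "odd"


# help() prints the docstring; inspect.getdoc() returns it as a cleaned-up string.
doc_text = inspect.getdoc(describe_parity)
print(doc_text)
```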

Documentation style¶

Google documentation style
NumPy style
reStructuredText style
Epytext style

NumPy documentation style is widely used in the scientific Python ecosystem, for example by numpy, pandas, sklearn and matplotlib.

import numpy as np
help(np.median)

Generate Template¶

Docstring templates can be generated with pyment.
It is run from the terminal and can convert docstrings to other documentation styles (Google, Numpydoc...).

The command below should be run in the terminal from the directory that contains the code:

pyment -w -o numpydoc script1.py

-w overwrites the file in place and -o numpydoc sets the output style to Numpydoc. The result is below:

def genearte_number(nrand=100,file='Output.dat'):
    """Find odd and even numbers.

    Parameters
    ----------
    nrand :
         (Default value = 100)
    file :
         (Default value = 'Output.dat')

    Returns
    -------

    """

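A generated template still has to be filled in by hand. The sketch below shows what a completed NumPy-style docstring might look like; count_odd_even() and its parameter are hypothetical and only illustrate the layout of the Parameters and Returns sections.

```python
def count_odd_even(numbers):
    """Count the odd and even values in a sequence of integers.

    Parameters
    ----------
    numbers : list of int
        Values to classify.

    Returns
    -------
    dict
        Counts stored under the keys ``'odd'`` and ``'even'``.
    """
    counts = {"odd": 0, "even": 0}
    for value in numbers:
        counts["even" if value % 2 == 0 else "odd"] += 1
    return counts


result = count_odd_even([1, 2, 3, 4, 5])
print(result)
```
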
Import Structure¶

See directory tree for package and subpackages of directory below:

import sys
sys.path.append(r"D:\Resources\Courses\Developing Python Packages\Tests")  # raw string keeps the backslashes literal

import myclustering

help(myclustering.running)

import myclustering.running

help(myclustering.running)

import myclustering.running.kmeans

help(myclustering.running.kmeans)

Importing subpackages into packages

Absolute import

This is the most commonly used form because it is more explicit.

from myclustering import running

Relative import

This form is used less often. Because it is shorter, it can sometimes be simpler.

A single dot means that the module or package referenced is in the same directory as the current location. Two dots mean that it is in the parent directory of the current location.

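sys.path.append(r"D:\Resources\Courses\Developing Python Packages\Tests")
# A relative import only works in code that is itself inside a package.
# For example, inside myclustering/__init__.py one would write:
# from . import running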
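
The single-dot and two-dot forms can be checked in a throwaway package. This is a minimal sketch under stated assumptions: the package demo_relpkg, its modules and the compute() function are all hypothetical, and the files are written to a temporary directory so that the relative imports run inside a real package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temporary directory:
# demo_relpkg/
# |-- __init__.py
# |-- helpers.py
# |-- sub/
#     |-- __init__.py
#     |-- constants.py
#     |-- worker.py
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_relpkg" / "sub"
sub_dir.mkdir(parents=True)
(root / "demo_relpkg" / "__init__.py").write_text("")
(root / "demo_relpkg" / "helpers.py").write_text("def double(x):\n    return 2 * x\n")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "constants.py").write_text("OFFSET = 1\n")
(sub_dir / "worker.py").write_text(
    "from .constants import OFFSET   # one dot: same directory (sub/)\n"
    "from ..helpers import double    # two dots: parent directory (demo_relpkg/)\n"
    "\n"
    "def compute(x):\n"
    "    return double(x) + OFFSET\n"
)

# Make the temporary directory importable and load the module that uses relative imports.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
worker = importlib.import_module("demo_relpkg.sub.worker")

value = worker.compute(5)
print(value)
```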

import myclustering
import myclustering.monitoring
help(myclustering.monitoring)

We imported monitoring into myclustering, but monitoring has no link to majorityvote yet. See below:

help(myclustering.monitoring.majorityvote)

from myclustering.monitoring import majorityvote
help(myclustering.monitoring.majorityvote)
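
The same behaviour can be reproduced with a throwaway package. In this sketch the package name demo_linkpkg and the vote() function are hypothetical; the monitoring and majorityvote names mirror the ones above. With empty __init__.py files, a subpackage only becomes an attribute of its parent once it has been imported explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create demo_linkpkg/monitoring/majorityvote.py with empty __init__.py files.
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_linkpkg" / "monitoring"
sub_dir.mkdir(parents=True)
(root / "demo_linkpkg" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "majorityvote.py").write_text("def vote():\n    return 'ok'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_linkpkg

# The top-level import alone does not load the subpackage.
linked_before = hasattr(demo_linkpkg, "monitoring")

import demo_linkpkg.monitoring.majorityvote

# Importing the full dotted path links each level to its parent.
linked_after = hasattr(demo_linkpkg.monitoring, "majorityvote")
print(linked_before, linked_after)
```

Putting an import of the subpackage in the parent's __init__.py has the same linking effect for every user of the package.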

Import function into subpackage

from myclustering.monitoring.majorityvote import Flow_old_new_cluster

help(Flow_old_new_cluster)

Install your Package¶

The packages imported above must be in the current directory (or in a directory added to sys.path); otherwise they cannot be imported. Installing a package gives access to it from anywhere. To make a package installable, a setup.py script is added to it. This script is used to install the package and also contains the package metadata. Adding setup.py requires slightly restructuring the directory, because setup.py is part of the package but not part of its source code.

We need to create a new folder, inside the directory that contains setup.py, to keep the source code.

What is in setup.py¶

# Import functions
from setuptools import setup, find_packages  # required to find your packages and sub-packages

# setup function is called
setup(
    author="Mehdi",
    description="Clustering and monitoring.",
    name="myclustering",
    version="0.1.0",
    packages=find_packages(include=["myclustering", "myclustering.*"]),
)

The version number has three parts:

(major number) . (minor number) . (patch number)
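
As a small illustration of the three-part scheme, a version string can be split into integers and compared. The parse_version() helper below is hypothetical and only handles plain major.minor.patch strings; it is not how pip or setuptools parse versions.

```python
def parse_version(version):
    """Split 'major.minor.patch' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


current = parse_version("0.1.0")

# Tuples compare element by element, so a higher minor number counts as newer.
is_newer = parse_version("0.2.0") > current
print(current, is_newer)
```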

Finally, the package can be installed with pip install -e .

. = install the package located in the current directory
-e = install the package in editable mode (changes to the source code take effect without reinstalling, which helps while developing and debugging)

We can uninstall a Python package with pip uninstall <packagename>

import myclustering
import myclustering.monitoring
from myclustering import running

Dependency¶

How to make your package usable for other people.

Other packages are imported inside your package. For example inside kmeans.py, these packages are dependencies:

import pandas as pd

import numpy as np

....

If someone wants to install your package, you need to make sure those packages are installed too. To make sure users get them along with your package, you can list install requirements as below:

setup(
    ...
    install_requires=[
        'matplotlib',
        'pandas',
        'scipy'
    ],
)

When people install your package with pip, pip automatically installs the required packages.

Sometimes your code might depend on a specific version of a package.

setup(
    ...
    install_requires=[
        'matplotlib>=2.3.1,<3',
        'pandas>=2',
        'scipy==2.1'
    ],
)

We can also specify the required Python version:

from setuptools import setup, find_packages

setup(
    ...
    python_requires='>=2.7, !=3.0.*, !=3.1.*',
)

In the example above, the Python version must be 2.7 or higher, but cannot be any 3.0 or 3.1 release.
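
To make the meaning of that specifier concrete, here is a hand-written sketch of the same rule. It is a simplification for illustration only: python_version_allowed() is a hypothetical helper that looks at (major, minor) tuples, whereas pip evaluates the specifier string with its own version-parsing logic.

```python
def python_version_allowed(version):
    """Check a (major, minor) tuple against '>=2.7, !=3.0.*, !=3.1.*'."""
    at_least_2_7 = version >= (2, 7)         # >=2.7
    excluded = version in {(3, 0), (3, 1)}   # !=3.0.* and !=3.1.*
    return at_least_2_7 and not excluded


candidates = [(2, 6), (2, 7), (3, 0), (3, 1), (3, 9)]
checks = {version: python_version_allowed(version) for version in candidates}
for version, allowed in checks.items():
    print(version, allowed)
```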

"""# Import functions
from setuptools import setup, find_packages # this is required to find your packages and sub-packages

# setup function is called 
setup(
    author="Mehdi",
    description="Clustering and monitoring.",
    name="myclustering",
    version="0.1.0",
    packages=find_packages(include=["myclustering","myclustering.*"]),
    install_requires=['matplotlib','pandas', 
    'scipy', 'scikit-learn==0.24.2'],
    python_requires='>=3.9',

)
"""

Making an environment for developers¶

Reproducibility is key in software development. Whoever uses your software should have all of its dependencies, and developers need to know exactly which version of each package is being used.

You can show the versions of all installed packages with the pip freeze command:

absl-py==1.0.0

anaconda-navigator==2.1.4

astor==0.8.1

astunparse==1.6.3

atomicwrites==1.4.0

autograd-gamma==0.5.0

You should export this information to a text file that you include with your package: pip freeze > requirements.txt

This allows anyone to install all of the packages with pip install -r requirements.txt. Having the exact same set of packages makes it easier to hunt down any bugs.
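
The pinned name==version lines are easy to inspect programmatically, for example to compare two environments. The sketch below uses a hypothetical parse_pinned() helper and a few of the lines shown above as sample input; it ignores anything that is not an exact pin.

```python
def parse_pinned(requirements_text):
    """Map package names to pinned versions from 'name==version' lines."""
    pinned = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, version = line.partition("==")
        if separator:  # keep only exact pins
            pinned[name] = version
    return pinned


sample = """absl-py==1.0.0
astor==0.8.1
astunparse==1.6.3
"""
pins = parse_pinned(sample)
print(pins)
```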

import myclustering
from myclustering.running import processing as eda

help(eda.EDA_plot.histplt)

Help on function histplt in module myclustering.running.processing:

histplt(val: list, bins: int, title: str, xlabl: str, ylabl: str, xlimt: list, ylimt: list = False, loc: int = 1, legend: int = 1, axt=None, days: int = False, class_: int = False, scale: int = 1, int_: int = 0, nsplit: int = 1, font: int = 5, color: str = 'b') -> None
    Make histogram of data

import pandas as pd
import matplotlib.pyplot as plt

font = {'size'   : 7}
plt.rc('font', **font)
fig, ax1 = plt.subplots(figsize=(10, 7), dpi= 100, facecolor='w', edgecolor='k')


url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)

ax1=plt.subplot(2,2,1 ) 
val=df['region-code']
eda.EDA_plot.histplt(val,bins=100,title='region-code',xlabl='region-code',
         ylabl='Percentage',xlimt=(0,40),ylimt=(0,0.5),axt=ax1,
         scale=1.15,loc=1,font=10,color='#8DC63F')
plt.show()

Add Licences and READMEs¶

License

We need a license to give others permission to use our code. An open source license allows users to:

Use our package

Modify our package

Distribute modified versions of our package

README.md

README is the "front page" of the package, displayed on GitHub or PyPI. Typical sections in a README are:

Title of package

Description and applicability

How to install

Examples showing how to use the package

Contribution

License

The most common README format is Markdown (CommonMark), stored in a README.md file. It is much simpler than reStructuredText.

MANIFEST.in

MANIFEST.in lists all the extra files to include in the package distribution. It contains:

include LICENSE
include README.md

The updated directory tree for the package and subpackages is:

Publish Packages Online by PyPI¶

When you install packages with pip, you normally download them from the Python Package Index, abbreviated PyPI. It is an online code repository, and anyone from anywhere can upload packages to it.

Only a free registration is required. It can be tempting to wait until a package is complete and polished before releasing it. However, releasing early can bring feedback that helps improve your package.

When you upload your package to PyPI, you actually upload package distributions.

Distribution Package¶

A distribution package is a bundled version of the code that is ready to install. There are two types of distribution in Python:

Source distribution: basically the Python source code files you have written. The files must be downloaded and the setup.py script must be run to install the package.
Wheel distribution: a version of the package in a slightly processed format that makes it faster to install. It can be installed without running the setup.py script, and it is usually smaller, so it is faster to download. Wheel is the preferred Python distribution format. When publishing to PyPI, we should always upload both the wheel and the source distribution.

We can build both source and wheel distribution from terminal by the command below:

python setup.py sdist bdist_wheel

where sdist is source distribution and bdist_wheel is wheel distribution.

Running this command creates a dist directory containing the wheel and source distributions. It also creates build and egg-info directories (we may not need those files).
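
The two kinds of file in dist can be told apart by their extension. The sketch below uses a hypothetical distribution_kind() helper, and the file names are assumptions about what a pure-Python build of myclustering 0.1.0 would typically be called, not output copied from a real build.

```python
from pathlib import PurePosixPath


def distribution_kind(filename):
    """Classify a file in dist/ as a wheel, a source distribution, or other."""
    name = PurePosixPath(filename).name
    if name.endswith(".whl"):
        return "wheel"
    if name.endswith((".tar.gz", ".zip")):
        return "source"
    return "other"


# Assumed file names for version 0.1.0 of a pure-Python myclustering package.
files = [
    "dist/myclustering-0.1.0-py3-none-any.whl",
    "dist/myclustering-0.1.0.tar.gz",
]
kinds = {path: distribution_kind(path) for path in files}
print(kinds)
```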

Upload Package¶

The last step is to upload the created distribution files to PyPI so that the package can be accessed from anywhere. This can be done from the terminal with twine:

twine upload dist/*

We have to register at PyPI (https://pypi.org/) before uploading the package. The upload results in a live package that anyone can access and install with:

pip install myclustering

Alternatively, we can upload distributions to TestPyPI instead of PyPI. This can be done from the terminal with twine:

twine upload -r testpypi dist/*

We need to register at TestPyPI first. The package can then be installed with:

pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple myclustering