from PIL import Image, ImageFile
from matplotlib.pyplot import imshow
import requests
from io import BytesIO
import numpy as np
import logging
Python Packages Development
Introduction
Reason for Building a Python Package
- Make code easier to reuse
- Avoid copying and pasting
- Keep functions up to date
- Share code with others
Difference between Scripts, Modules and Packages
- Script: a Python file that is run directly, e.g. `python script.py`.
- Package: a directory that holds Python code files (e.g. `numpy`).
- Subpackage: a smaller package inside a package (e.g. `numpy.random`).
- Module: a Python file inside a package that stores part of the package code.
- Library: either a package or a collection of packages.
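The package/module distinction can be checked at runtime: a package is a module object that also has a `__path__` attribute (the directory it was loaded from), while a plain module has none. The minimal sketch below uses standard-library names (`xml`, `xml.etree`, `json.decoder`) as stand-ins, since `numpy` may not be installed; `describe` is a hypothetical helper, not part of any library.

```python
import importlib


def describe(name):
    """Return 'package' or 'module' for an importable dotted name.

    A package is a module object that also has a __path__ attribute
    (the directory it was loaded from); a plain module has none.
    """
    obj = importlib.import_module(name)
    return "package" if hasattr(obj, "__path__") else "module"


for name in ["xml", "xml.etree", "xml.etree.ElementTree", "json.decoder"]:
    print(f"{name:<24}{describe(name)}")
```

Here `xml` plays the role of a package like `numpy`, `xml.etree` that of a subpackage like `numpy.random`, and the last two names are modules (single `.py` files).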
Directory tree of a package
package1/
|-- module1.py
|-- __init__.py
The directory `package1` is a Python package, and all the package code is in `module1.py`. The `__init__.py` file is what turns the directory into a (regular) Python package.
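To see this in action, the sketch below builds the `package1/` tree in a temporary directory and imports it. The `greet` function inside `module1.py` is a hypothetical placeholder; the point is only that the directory becomes importable once it contains `__init__.py` and its parent directory is on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package1/ tree from above inside a throwaway directory.
root = Path(tempfile.mkdtemp())
pkg = root / "package1"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # empty file: marks package1 as a regular package
(pkg / "module1.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Python can find package1 because its parent directory is on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

module1 = importlib.import_module("package1.module1")
print(module1.greet("package"))  # Hello, package!
```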
Subpackages
Directory tree of a package with subpackages:
me_sklearn/
|-- __init__.py
|-- regression
| |-- __init__.py
| |-- regression.py
|-- preprocessing
| |-- __init__.py
| |-- normalize.py
| |-- standardize.py
|-- utils.py
`preprocessing` and `regression` are subpackages, while `regression.py`, `normalize.py`, `standardize.py` and `utils.py` are Python modules.
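Tools that discover subpackages walk this tree. As a minimal sketch, the code below recreates the `me_sklearn/` layout with empty files in a temporary directory and uses the standard library's `pkgutil.walk_packages` to label each entry as a subpackage or a module. Empty files are an assumption made only so that the tree can be discovered; the real modules would contain code.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
files = [
    "me_sklearn/__init__.py",
    "me_sklearn/regression/__init__.py",
    "me_sklearn/regression/regression.py",
    "me_sklearn/preprocessing/__init__.py",
    "me_sklearn/preprocessing/normalize.py",
    "me_sklearn/preprocessing/standardize.py",
    "me_sklearn/utils.py",
]
for rel in files:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("")  # empty files are enough for discovery

sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module("me_sklearn")
# walk_packages recurses into every directory that has an __init__.py;
# ispkg is True for subpackages and False for plain modules.
found = {info.name: info.ispkg
         for info in pkgutil.walk_packages(pkg.__path__, prefix="me_sklearn.")}
for name, ispkg in sorted(found.items()):
    print(f"{name:<40}{'subpackage' if ispkg else 'module'}")
```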
Documentation
We should document each:
- Function
- Class
- Class method
Calling `help()` displays the documentation, as below:
import numpy as np
help(np.sqrt)
A function is documented with a docstring placed right after the `def` line:
def genearte_number(nrand=100, file='Output.dat'):
    """Find odd and even numbers.
    It generates an output file stating whether each number is odd or even.
    """
Documentation style
Common docstring styles are:
- Google style
- NumPy style
- reStructuredText style
- Epytext style
The NumPy documentation style is the most common one in scientific Python; it is used by `numpy`, `pandas`, `sklearn`, `matplotlib` and many other packages. For example:
import numpy as np
help(np.median)
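A filled-in NumPy-style docstring has a one-line summary followed by sections such as `Parameters`, `Returns` and `Examples`, each underlined with dashes. The function below, `label_parity`, is a hypothetical example in the spirit of `genearte_number`; the `Examples` section can double as a test through the standard-library `doctest` module.

```python
def label_parity(numbers):
    """Label each number as odd or even.

    Parameters
    ----------
    numbers : iterable of int
        The integers to classify.

    Returns
    -------
    list of str
        ``"odd"`` or ``"even"`` for each input, in the same order.

    Examples
    --------
    >>> label_parity([1, 2, 3])
    ['odd', 'even', 'odd']
    """
    return ["even" if n % 2 == 0 else "odd" for n in numbers]


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the example in the docstring and reports failures
```

Calling `help(label_parity)` then prints this docstring in the same layout as `help(np.median)`.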
Generate Template
Docstring templates can be generated with `pyment`. It is run from the terminal and can also convert docstrings to other documentation styles (Google, Numpydoc, ...).
Run the command below in the terminal, in the same directory as the code:
pyment -w -o numpydoc script1.py
Here `-w` overwrites the file in place (instead of writing a patch file) and `-o numpydoc` sets the output docstring style to NumPy. The result is:
def genearte_number(nrand=100, file='Output.dat'):
    """Find odd and even numbers.
    Parameters
    ----------
    nrand :
         (Default value = 100)
    file :
         (Default value = 'Output.dat')
    Returns
    -------
    """
Import Structure
Consider the package `myclustering`, whose subpackages and modules are shown in the directory tree below:
myclustering/
|-- __init__.py
|-- running
| |-- __init__.py
| |-- kmeans.py
| |-- processing.py
|-- monitoring
| |-- __init__.py
| |-- majorityvote.py
| |-- performance.py
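import sys
# Add the parent directory of myclustering to the import path (raw string keeps the backslashes literal)
sys.path.append(r"D:\Resources\Courses\Developing Python Packages\Tests")
import myclustering
help(myclustering.running)  # AttributeError if myclustering/__init__.py does not import running
import myclustering.running
help(myclustering.running)
import myclustering.running.kmeans
help(myclustering.running.kmeans)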
Importing subpackages into packages
- Absolute import
This is the most common style, since it is more explicit. For example, in `myclustering/__init__.py`:
from myclustering import running
- Relative import
This is used less often, but since it is shorter it can sometimes be simpler. A single dot means that the referenced module or package is in the same directory as the current location; two dots mean that it is in the parent directory of the current location. For example, in `myclustering/__init__.py`:
from . import running
import myclustering
import myclustering.monitoring
help(myclustering.monitoring)
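We imported `monitoring` into `myclustering`, but `monitoring` has no link to `majorityvote` (its `__init__.py` does not import it), so accessing it as an attribute fails until it is imported explicitly:
help(myclustering.monitoring.majorityvote)  # AttributeError at this point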
from myclustering.monitoring import majorityvote
help(myclustering.monitoring.majorityvote)
- Import function into subpackage
from myclustering.monitoring.majorityvote import Flow_old_new_cluster
help(Flow_old_new_cluster)
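The behavior described in this section can be reproduced end to end. The sketch below writes a hypothetical stand-in package, `clusterdemo`, to a temporary directory: its `__init__.py` uses the relative import `from . import running`, while `monitoring/__init__.py` is left empty. The `vote` function is an illustrative placeholder, not the real `Flow_old_new_cluster`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the myclustering/ tree, written to a temporary directory.
files = {
    # Relative import: the single dot refers to the current package (clusterdemo).
    "clusterdemo/__init__.py": "from . import running\n",
    "clusterdemo/running/__init__.py": "",
    # Empty __init__.py: monitoring does not import its own modules.
    "clusterdemo/monitoring/__init__.py": "",
    "clusterdemo/monitoring/majorityvote.py": (
        "def vote(labels):\n"
        "    return max(set(labels), key=labels.count)\n"
    ),
}
root = Path(tempfile.mkdtemp())
for rel, source in files.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)
sys.path.insert(0, str(root))  # same role as sys.path.append(...) above
importlib.invalidate_caches()

import clusterdemo  # runs clusterdemo/__init__.py, which imports running

print(hasattr(clusterdemo, "running"))     # True: bound by the relative import
print(hasattr(clusterdemo, "monitoring"))  # False: nothing has imported it yet

import clusterdemo.monitoring
print(hasattr(clusterdemo.monitoring, "majorityvote"))  # False: its __init__.py is empty

from clusterdemo.monitoring import majorityvote  # importing a submodule binds it on its parent
print(hasattr(clusterdemo.monitoring, "majorityvote"))  # True

from clusterdemo.monitoring.majorityvote import vote  # import a single function
print(vote([1, 2, 2, 3]))  # 2
```

The design choice here is the one the section describes: whatever `__init__.py` imports is available as soon as the package is imported, and everything else has to be imported explicitly.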
Install your Package
So far, the package could only be imported because its parent directory was added to `sys.path`. Installing the package makes it accessible from anywhere. To install it, a `setup.py` script is added to the package; it is used to install the package and also contains the package's metadata.
`setup.py` is part of the package but not part of the source code, so the directory needs a slight restructure: a new inner folder keeps the source code, and `setup.py` goes in the outer directory.
myclustering/ <-- outer directory
|-- myclustering/ <-- inner source code directory
| |-- __init__.py
| |-- running
| | |-- __init__.py
| | |-- kmeans.py
| | |-- processing.py
| |-- monitoring
| | |-- __init__.py
| | |-- majorityvote.py
| | |-- performance.py
|-- setup.py <-- set up script in outer directory
What is in setup.py
# Import functions
from setuptools import setup, find_packages  # this is required to find your packages and sub-packages
# setup function is called
setup(
    author="Mehdi",
    description="Clustering and monitoring.",
    name="myclustering",
    version="0.1.0",
    packages=find_packages(include=["myclustering", "myclustering.*"]),
)
The version number has three parts: (major number).(minor number).(patch number), e.g. `0.1.0`.
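Which part to increase follows a convention known as semantic versioning: the major number for incompatible API changes, the minor number for backwards-compatible new features, and the patch number for backwards-compatible bug fixes. A minimal sketch with a hypothetical helper, `bump_version`; note that increasing one part resets the parts to its right to zero.

```python
def bump_version(version, part):
    """Return `version` (major.minor.patch) with one part incremented.

    Incrementing a part resets every part to its right to zero.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor' or 'patch', not {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```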
Finally, the package can be installed by running this command in the outer directory:
pip install -e .
where `.` means install the package in the current directory, and `-e` installs it in editable mode, so changes to the source code (while developing, debugging, ...) take effect without reinstalling.
A Python package can be uninstalled with pip uninstall <packagename>
import myclustering
import myclustering.monitoring
from myclustering import running
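To confirm that the installation worked, the standard library's `importlib.metadata` (Python 3.8+) can report the installed version of a distribution. A minimal sketch; `installed_version` is a hypothetical helper, and `myclustering` is the distribution name given in `setup.py`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# After `pip install -e .` this reports the version from setup.py (0.1.0);
# before installation (or after `pip uninstall myclustering`) it prints None.
print(installed_version("myclustering"))
```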
Dependency
How to make your package usable by other people:
- Your package probably imports other packages. For example, inside `kmeans.py`, these packages are dependencies:
import pandas as pd
import numpy as np
...
Anyone who installs your package also needs these dependencies. To make sure they are installed, list them as install requirements in `setup.py`:
setup(
    ...
    install_requires=[
        'matplotlib', 'pandas', 'scipy'
    ],
)
When people install your package with pip, pip automatically installs the required packages.
Sometimes your code depends on specific versions of a package, which can be given as version specifiers:
setup(
    ...
    install_requires=[
        'matplotlib>=2.3.1,<3', 'pandas>=2', 'scipy==2.1'
    ],
)
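We can also restrict the supported Python versions with `python_requires`:
from setuptools import setup, find_packages
setup(
    ...
    python_requires='>=2.7, !=3.0.*, !=3.1.*',
)
In the example above, the Python version must be 2.7 or later, but not any 3.0.x or 3.1.x release.
Putting it all together, the complete `setup.py` looks like this: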
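To make the meaning of these specifiers concrete, here is a deliberately simplified checker. It is only a sketch: it assumes plain numeric versions such as `2.3.1` and handles `==`, `!=`, `>=`, `<=`, `>` and `<` (plus the `.*` wildcard with `==` and `!=`). pip itself follows the full PEP 440 rules (pre-releases, `~=`, and more), as implemented in the `packaging` library; `satisfies` is a hypothetical helper.

```python
import operator

# Operators this simplified checker understands; two-character ones come first
# so that ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
    ("!=", operator.ne), (">", operator.gt), ("<", operator.lt),
]


def _parse(version):
    """Turn '2.3.1' into (2, 3, 1). Only plain numeric versions are supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    v = _parse(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                target = clause[len(symbol):].strip()
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if target.endswith(".*"):
            # Wildcard such as '!=3.0.*': compare only the leading components.
            if symbol not in ("==", "!="):
                raise ValueError(f"wildcard not allowed with {symbol!r}")
            prefix = _parse(target[:-2])
            ok = compare(v[:len(prefix)], prefix)
        else:
            t = _parse(target)
            width = max(len(v), len(t))
            # Pad with zeros so that '3' compares like '3.0.0'.
            ok = compare(v + (0,) * (width - len(v)), t + (0,) * (width - len(t)))
        if not ok:
            return False
    return True


print(satisfies("2.5.0", ">=2.3.1,<3"))               # True
print(satisfies("3.0.0", ">=2.3.1,<3"))               # False
print(satisfies("3.1.4", ">=2.7, !=3.0.*, !=3.1.*"))  # False
print(satisfies("3.9.1", ">=2.7, !=3.0.*, !=3.1.*"))  # True
```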
"""# Import functions
from setuptools import setup, find_packages  # this is required to find your packages and sub-packages
# setup function is called
setup(
    author="Mehdi",
    description="Clustering and monitoring.",
    name="myclustering",
    version="0.1.0",
    packages=find_packages(include=["myclustering", "myclustering.*"]),
    install_requires=['matplotlib', 'pandas',
                      'scipy', 'scikit-learn==0.24.2'],
    python_requires='>=3.9',
)
"""
Making an environment for developers
Reproducibility is key in all software development. Developers working on your package should have the same dependencies as you, so you need to know the exact version of every package you use.
The `pip freeze` command lists the version of every installed package:
absl-py==1.0.0
anaconda-navigator==2.1.4
astor==0.8.1
astunparse==1.6.3
atomicwrites==1.4.0
autograd-gamma==0.5.0
Export this information to a text file and include it with your package:
pip freeze > requirements.txt
The directory tree then looks like this:
myclustering/ <-- outer directory
|-- myclustering/ <-- inner source code directory
| |-- __init__.py
| |-- running
| | |-- __init__.py
| | |-- kmeans.py
| | |-- processing.py
| |-- monitoring
| | |-- __init__.py
| | |-- majorityvote.py
| | |-- performance.py
|-- setup.py <-- set up script in outer directory
|-- requirements.txt <-- developer environment
This allows anyone to install exactly the same packages with:
pip install -r requirements.txt
Having exactly the same set of packages makes it easier to hunt down bugs.
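What `pip freeze` produces and what `pip install -r` consumes are simple `name==version` lines. The sketch below approximates the first with `importlib.metadata` and parses such a file back into a dictionary. It is an approximation under stated assumptions: `freeze_lines` and `parse_pins` are hypothetical helpers, and the real `pip freeze` also handles editable and VCS installs, which this ignores. The sample lines are taken from the `pip freeze` output above.

```python
from importlib import metadata


def freeze_lines():
    """Approximate `pip freeze`: one 'name==version' line per installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


def parse_pins(text):
    """Map package name -> pinned version for lines of the form 'name==version'."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


sample = """absl-py==1.0.0
astor==0.8.1
# comment lines are ignored
atomicwrites==1.4.0
"""
print(parse_pins(sample))  # {'absl-py': '1.0.0', 'astor': '0.8.1', 'atomicwrites': '1.4.0'}
```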
import myclustering
from myclustering.running import processing as eda
help(eda.EDA_plot.histplt)
Help on function histplt in module myclustering.running.processing:
histplt(val: list, bins: int, title: str, xlabl: str, ylabl: str, xlimt: list, ylimt: list = False, loc: int = 1, legend: int = 1, axt=None, days: int = False, class_: int = False, scale: int = 1, int_: int = 0, nsplit: int = 1, font: int = 5, color: str = 'b') -> None
    Make histogram of data
import pandas as pd
import matplotlib.pyplot as plt
font = {'size' : 7}
plt.rc('font', **font)
fig, ax1 = plt.subplots(figsize=(10, 7), dpi= 100, facecolor='w', edgecolor='k')
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
ax1=plt.subplot(2,2,1 )
val=df['region-code']
eda.EDA_plot.histplt(val,bins=100,title='region-code',xlabl='region-code',
ylabl='Percentage',xlimt=(0,40),ylimt=(0,0.5),axt=ax1,
scale=1.15,loc=1,font=10,color='#8DC63F')
plt.show()
Add Licenses and READMEs
License
We need a license to give others permission to use our code. An open-source license allows users to:
- use our package
- modify our package
- distribute updated versions of our package
README.md
The README is the "front page" of the package, displayed on GitHub or PyPI. Typical sections in a README are:
- Title of package
- Description and applicability
- How to install
- Examples showing how to use the package
- Contribution
- License
READMEs are commonly written in Markdown (CommonMark) as a `README.md` file, which is much simpler than reStructuredText.
MANIFEST.in
`MANIFEST.in` lists all the extra files to include in the package's source distribution. For example:
include LICENSE
include README.md
The updated directory tree is:
myclustering/ <-- outer directory
|-- myclustering/ <-- inner source code directory
| |-- __init__.py
| |-- running
| | |-- __init__.py
| | |-- kmeans.py
| | |-- processing.py
| |-- monitoring
| | |-- __init__.py
| | |-- majorityvote.py
| | |-- performance.py
|-- setup.py
|-- requirements.txt
|-- LICENSE <---
|-- README.md <---
|-- MANIFEST.in <---
Publish Packages Online on PyPI
When you install packages with pip, you normally download them from the Python Package Index, abbreviated PyPI. It is an online code repository, and anyone from anywhere can upload packages to it; only a free registration is required.
It can be tempting to wait until the package is complete and polished before releasing it. However, releasing early lets you get feedback to improve your package.
When you upload your package to PyPI, you actually upload distribution packages.
Distribution Package
A distribution package is a bundled version of the code that is ready to install. There are two types of Python distributions:
- Source distribution: essentially the Python source code you have written. The files must be downloaded and the `setup.py` script run to install it.
- Wheel distribution: a slightly processed format that is faster to install. It can be installed without running the `setup.py` script, and it is smaller, so it is faster to download. The wheel is the preferred Python distribution.
When uploading a package to PyPI, we should always upload both the wheel and the source distribution.
Both distributions can be built from the terminal with:
python setup.py sdist bdist_wheel
where `sdist` builds the source distribution and `bdist_wheel` builds the wheel distribution. (Recent setuptools versions deprecate calling `setup.py` directly; `python -m build`, from the `build` package, produces the same two files.)
Running this command creates a `dist` directory containing the wheel and the source distribution. It also creates `build` and `myclustering.egg-info` directories, which we usually do not need.
myclustering/
|-- myclustering/
|-- setup.py
|-- requirements.txt
|-- LICENSE
|-- README.md
|-- MANIFEST.in
|-- dist <---
| |-- myclustering-0.1.0-py3-none-any.whl
| |-- myclustering-0.1.0.tar.gz
|-- build
|-- myclustering.egg-info
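The wheel's file name encodes where it can be installed, following the convention `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl` from the wheel format specification (PEP 427). A minimal sketch that splits the file name from the `dist` directory above; `parse_wheel_name` is a hypothetical helper.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its parts (PEP 427 naming convention).

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return {"name": name, "version": version, "build": build,
            "python": python, "abi": abi, "platform": platform}


print(parse_wheel_name("myclustering-0.1.0-py3-none-any.whl"))
# {'name': 'myclustering', 'version': '0.1.0', 'build': None, 'python': 'py3', 'abi': 'none', 'platform': 'any'}
```

Here `py3-none-any` marks a pure-Python wheel: any Python 3 interpreter, no compiled ABI, any platform.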
Upload Package
The last step is to upload the distribution files to PyPI, so that the package can be installed from anywhere. This is done from the terminal with `twine`:
twine upload dist/*
We have to register on [PyPI](https://pypi.org/) before uploading the package. The package is then live, and anyone can install it with:
pip install myclustering
Alternatively, we can upload the distributions to TestPyPI instead of PyPI, again from the terminal with `twine`:
twine upload -r testpypi dist/*
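TestPyPI needs its own registration. The package can then be installed with:
pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple myclustering
Here `--extra-index-url` lets pip fetch dependencies that are not on TestPyPI from the regular PyPI.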