This lesson is being piloted (Beta version)

Organize and distribute a Python package

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Which license should you choose for a Python project?

  • How do you organize and distribute a Python package on PyPI?

  • How do you distribute a Python package with conda?

  • How do Python projects deploy their documentation?

  • What is DevOps?

Objectives

Making sense of Software licensing

Software licensing is a complicated topic, but it is crucial when you want to share your code.

A software license is an agreement between users and the owners of a software program that allows users to do certain things that would otherwise be an infringement of copyright law.

The software license usually answers questions such as:

For this section, we will use the CodeRefinery slides.

Social coding and open software

Software packaging

This section is from the Python Packaging User Guide.

Directory structure for projects

A project directory can look something like this:

project_name/
├── README.md               # overview of the project
├── data/                   # data files used in the project
│   ├── README.md           # describes where data came from
│   └── sub-folder/         # may contain subdirectories
├── processed_data/         # intermediate files from the analysis
├── manuscript/             # manuscript describing the results
├── results/                # results of the analysis (data, tables, figures)
├── source/                 # contains all code in the project
│   ├── LICENSE             # license for your code
│   ├── requirements.txt    # software requirements and dependencies
│   ├── README.md           # overview of the source folder
│   └── ...
└── doc/                    # documentation for your project
    ├── mydocs.rst
    └── ...
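A layout like this can be created by hand, but a short script makes it repeatable. The following is a minimal sketch using only the standard library; the function name create_project_skeleton and the choice of which placeholder files to create are assumptions made for illustration.

```python
from pathlib import Path

# Directories and placeholder files taken from the example layout.
DIRECTORIES = [
    "data/sub-folder",
    "processed_data",
    "manuscript",
    "results",
    "source",
    "doc",
]
EMPTY_FILES = [
    "README.md",
    "data/README.md",
    "source/LICENSE",
    "source/requirements.txt",
    "source/README.md",
]


def create_project_skeleton(root):
    """Create the project layout under `root`, keeping any existing content."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in EMPTY_FILES:
        # touch() creates an empty file; an existing file keeps its content.
        (root / name).touch(exist_ok=True)
    return root
```

Calling create_project_skeleton("project_name") in an empty working directory produces the directories and empty placeholder files, which you can then fill in.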

Tracking source code, data and results

Create a Python package

The Python source files live in the source directory. We will now learn how to organize that directory to create a Python package. We will discuss how to write the software documentation later.

mkdir deep_project
mkdir -p deep_project/source
mkdir -p deep_project/source/mypackage
cd deep_project/source/mypackage

In this directory, create a file named __init__.py with the following content:

name = "mypackage"

The __init__.py files are required (in each folder with Python code) to make Python treat the directories as containing packages; this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later (deeper) on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
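To see what __init__.py does at import time, the sketch below builds a throwaway package in a temporary directory and imports it. The package name demo_pkg, the module helpers and the function greet are hypothetical names chosen for this illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_and_import_demo_package():
    """Create a tiny package on disk, import it, and return the module."""
    tmp = Path(tempfile.mkdtemp())  # left on disk so the import keeps working
    pkg = tmp / "demo_pkg"
    pkg.mkdir()

    # A submodule with one public and one private function.
    (pkg / "helpers.py").write_text(
        "def greet():\n    return 'hello'\n\n"
        "def _internal():\n    return 'hidden'\n"
    )
    # __init__.py runs on import: it re-exports greet, sets a package
    # attribute, and lists the public names in __all__.
    (pkg / "__init__.py").write_text(
        "from .helpers import greet\n"
        "name = 'demo_pkg'\n"
        "__all__ = ['greet', 'name']\n"
    )

    sys.path.insert(0, str(tmp))  # make the temporary directory importable
    importlib.invalidate_caches()
    return importlib.import_module("demo_pkg")


demo = build_and_import_demo_package()
print(demo.name, demo.greet(), demo.__all__)
```

The names listed in __all__ are the ones exported by "from demo_pkg import *"; the private helper _internal stays reachable only through demo_pkg.helpers.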

setup.py is the build script for setuptools. It tells setuptools about your package (such as the name and version) as well as which code files to include.

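Create setup.py in deep_project/source (next to the mypackage directory, so that find_packages() can discover it), open it and enter the following content.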

import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="mypackage-your-username",
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)

You should update the package name to include your username (for example, mypackage-annefou). You can personalize the other values; for instance, set url to the GitHub repository for mypackage (we strongly suggest you use version control to store your Python code).

setup() takes several arguments. This example package uses a relatively minimal set:

There are many more than the ones mentioned here. See Packaging and distributing projects for more details.

The next step is to generate distribution packages for the package. These are archives that are uploaded to the Package Index and can be installed by pip.

Make sure you have the latest versions of setuptools and wheel installed:

python3 -m pip install --user --upgrade setuptools wheel

Tip

You can also install these two packages using conda

Now run this command from the same directory where setup.py is located:

python3 setup.py sdist bdist_wheel

This command should output a lot of text and once completed should generate two files in the dist directory:

dist/
  mypackage_your_username-0.0.1-py3-none-any.whl
  mypackage_your_username-0.0.1.tar.gz

The tar.gz file is a source archive whereas the .whl file is a built distribution. Newer pip versions preferentially install built distributions, but will fall back to source archives if needed. You should always upload a source archive and provide built archives for the platforms your project is compatible with. In this case, our example package is compatible with Python on any platform so only one built distribution is needed.
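The file name of a wheel encodes what the built distribution is compatible with. The sketch below splits a wheel file name into its fields, following the naming convention of the wheel format; the helper parse_wheel_filename and the WheelTags type are hypothetical names, and the code does no validation beyond counting fields.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split '<name>-<version>[-<build>]-<python>-<abi>-<platform>.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(name, version, build, python_tag, abi_tag, platform_tag)


tags = parse_wheel_filename("mypackage_your_username-0.0.1-py3-none-any.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

Here py3 means any Python 3 interpreter, none means no specific ABI is required, and any means any platform, which is why a single built distribution is enough for this package. Because the dash separates fields, dashes in the project name appear as underscores in the file name.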

Finally, it’s time to upload your package to the Python Package Index!

The first thing you’ll need to do is register an account on Test PyPI. Test PyPI is a separate instance of the package index intended for testing and experimentation. It’s great for things like workshops where we don’t necessarily want to upload to the real index. To register an account, go to https://test.pypi.org/account/register/ and complete the steps on that page.

You will also need to verify your email address before you’re able to upload any packages. For more details on Test PyPI, see Using TestPyPI.

Now that you are registered, you can use twine to upload the distribution packages. You’ll need to install Twine:

python3 -m pip install --user --upgrade twine

As before, you can also use conda to install it.

Once installed, run Twine to upload all of the archives under dist:

python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*

You will be prompted for the username and password you registered with Test PyPI. After the command completes, you should see output similar to this:

Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading mypackage_your_username-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading mypackage_your_username-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]

Once uploaded, your package should be viewable on TestPyPI, for example at https://test.pypi.org/project/mypackage-your-username

You can use pip to install your package from TestPyPI and verify that it works:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps mypackage-your-username

Make sure to specify your username in the package name!

pip should install the package from Test PyPI and the output should look something like this:

Collecting mypackage-your-username
  Downloading https://test-files.pythonhosted.org/packages/.../mypackage-your-username-0.0.1-py3-none-any.whl
Installing collected packages: mypackage-your-username
Successfully installed mypackage-your-username-0.0.1

Note: This example uses the --index-url flag to specify TestPyPI instead of the live PyPI. Additionally, it specifies --no-deps. Since TestPyPI doesn’t have the same packages as the live PyPI, it’s possible that attempting to install dependencies may fail or install something unexpected. While our example package doesn’t have any dependencies, it’s a good practice to avoid installing dependencies when using TestPyPI.

You can test that it was installed correctly by importing the module and referencing the name property you put in __init__.py earlier.

Run the Python interpreter (make sure you’re still in your virtualenv):

python

And then import the module and print out the name property. This should be the same regardless of what name you gave your distribution package in setup.py (in this case, mypackage-your-username) because your import package is mypackage.

import mypackage

mypackage.name
'mypackage'
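The difference between the distribution name (what pip installs) and the import name (what you write after import) can also be checked from Python. A minimal sketch, assuming Python 3.8 or newer, where importlib.metadata is part of the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution name is the value passed as name= in setup.py,
# not the name used in "import mypackage".
print(installed_version("mypackage-your-username"))
```

If the package was installed as described, this should print the version from setup.py; if it prints None, the distribution is not installed in the environment you are running.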

Conda

This section is from the CodeRefinery lesson on Reproducible Research.

Conda as a package manager

With conda it is easy to list, search for, install, remove and update packages. We can list all currently installed packages:

$ conda list

Let’s say we want to install Snakemake. We begin by searching for it:

$ conda search snakemake

Loading channels: done
No match found for: snakemake. Search: *snakemake*

PackagesNotFoundError: The following packages are not available from current channels:

  - snakemake
...

Hmm, it’s not available from our current channels. What are those? Let’s have a look at the configured channels:

$ conda config --get channels

--add channels 'defaults'   # lowest priority
--add channels 'conda-forge'   # highest priority

Ok, so we might need to look into other conda channels. This we can do either via Anaconda Cloud or through the anaconda command:

$ anaconda search snakemake

Using Anaconda API: https://api.anaconda.org
Packages:
   Name                      |  Version | Package Types   | Platforms       | Builds
   ------------------------- |   ------ | --------------- | --------------- | ----------
     bioconda/snakemake        |    5.4.3 | conda           | linux-64, noarch, osx-64 | py34_1, py34_0, py36_1, py36_0, py36_2, 0, 2, py35_2, py35_0, py35_1
...
     bioconda/snakemake-minimal |    5.4.3 | conda           | linux-64, noarch, osx-64 | py36_1, py36_0, py_0, py_1, py_2, py35_0, py35_1
...

We see that Snakemake is available in the bioconda channel. But we also see that there’s an alternative package called snakemake-minimal. What’s the difference? Let’s search for snakemake-minimal in the bioconda channel, display its information, and compare it to the full snakemake package. We’ll also limit ourselves to version 5.4.3:

$ conda search -c bioconda snakemake-minimal=5.4.3 --info

snakemake-minimal 5.4.3 py_0
----------------------------
...
dependencies:
  - appdirs
  - configargparse
  - datrie
  - docutils
  - gitpython
  - jsonschema
  - psutil
  - python >=3.5
  - pyyaml
  - ratelimiter
  - requests >=2.8.1
  - setuptools
  - wrapt

What about the full package?

$ conda search -c bioconda snakemake=5.4.3 --info

snakemake 5.4.3 0
-----------------
...
dependencies:
  - aioeasywebdav
  - boto3
  - dropbox >=7.2.1
  - filechunkio >=1.6
  - ftputil >=3.2
  - google-cloud-storage
  - jinja2
  - jsonschema
  - networkx >=2.0
  - pandas
  - psutil
  - pygraphviz
  - pysftp >=0.2.8
  - python-irodsclient
  - snakemake-minimal 5.4.3.*

So we see that snakemake depends on snakemake-minimal plus several additional packages.

We can now install snakemake-minimal via:

$ conda install -c bioconda snakemake-minimal

If we want to update the package to the latest version:

$ conda update snakemake-minimal

and if we later want to remove it:

$ conda remove snakemake-minimal

Conda as an environment manager

Conda allows us to create isolated environments for different software projects. For simplicity’s sake, let’s say our colleague is using pandas version 0.20.3, while we have pandas 0.24.1. We create a new conda environment and specify the version of pandas:

$ conda create -n pd20 pandas=0.20

## Package Plan ##

  environment location: /Users/ktw/anaconda3/envs/pd20

  added / updated specs:
    - pandas=0.20

The following packages will be downloaded:
...

# To activate this environment, use
#
#     $ conda activate pd20
...

We activate the environment, and double-check that we have the correct versions:

$ conda activate pd20

(pd20)$ python -c "import pandas ; print(pandas.__version__)"
0.20.3

To list all environments, use the info subcommand:

$ conda info -e

base                     /Users/ktw/anaconda3
pd20                  *  /Users/ktw/anaconda3/envs/pd20
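The active environment can also be inspected from inside Python. This is a minimal sketch, assuming that conda sets the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables on activation (both come back as None outside an activated conda environment); describe_environment is a hypothetical helper name.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is currently running."""
    return {
        # Name of the active conda environment, if conda set this variable.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory of the active conda environment, if any.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Installation prefix of the running interpreter.
        "sys_prefix": sys.prefix,
        # Version of the running interpreter, for example "3.11.4".
        "python_version": sys.version.split()[0],
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this before and after conda activate is a quick way to confirm which interpreter a script will actually use.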

Reproducibility

Specifying a single version number of a package is simple, but for increased control, portability and reproducibility, we should use a file (in yaml or txt format) specifying packages, versions and channels needed to create the environment for a project.

Conda can generate this file for you, in one of two ways:

$ conda env export > environment.yml      # exports in yaml format
$ conda list --export > requirements.txt  # exports in simple text
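The plain-text export lists one package per line as name=version=build, preceded by comment lines starting with #. The sketch below reads such a file into a dictionary; parse_conda_export is a hypothetical helper, and the sample text (including the build strings) is made up for illustration rather than copied from a real environment.

```python
def parse_conda_export(text):
    """Parse 'name=version=build' lines into a {name: version} dictionary."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split("=")
        name = fields[0]
        version = fields[1] if len(fields) > 1 else None
        packages[name] = version
    return packages


# Made-up sample in the style of `conda list --export` output.
sample = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
pandas=0.20.3=py36_0
numpy=1.13.1=py36_0
"""
print(parse_conda_export(sample))
```

A dictionary like this makes it easy to compare two exported environments and spot packages whose versions differ.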

In the word-count project used in earlier episodes there is a simple requirements file, and we can create a new conda environment based on it:

$ conda create -n word-count --file requirements.txt
...
$ conda activate word-count

Using conda to share a package

Conda packages can be built from a recipe and shared on anaconda.org via your own private or public channel, or via conda-forge.

A step-by-step guide on how to contribute packages can be found in the conda-forge documentation.

To get an idea of what’s needed, let’s have a look at the boost feedstock (a set of C++ libraries). We see that:


Conda vs pip vs virtualenv vs pipenv vs poetry…

Tool       | Purpose                                                               | Comments
---------- | --------------------------------------------------------------------- | ----------
pip        | Python package installer                                              | Can be used with conda.
virtualenv | Tool to create isolated Python environments                           | Partly integrated into standard library under venv module.
pipenv     | Python package and virtualenv management                              | Official PyPA recommendation, combines functionality of pip and virtualenv.
poetry     | Handle dependency installation, building/packaging of Python packages | Competitor to pipenv.
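As an illustration of the standard library venv module mentioned in the table, the sketch below creates an isolated environment from Python instead of the command line. The helper name create_isolated_environment and the environment name demo-env are assumptions for this example; pip is deliberately left out of the new environment to keep creation fast.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_environment(parent_dir, name="demo-env"):
    """Create a virtual environment under `parent_dir` and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips bootstrapping pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_isolated_environment(tmp)
    # pyvenv.cfg records which base interpreter the environment was built from.
    print((env_dir / "pyvenv.cfg").read_text())
```

Unlike a conda environment, an environment created this way reuses the Python interpreter that ran the script and can only hold Python packages.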

Learn to write Software documentation with CodeRefinery

We use the CodeRefinery lesson on code documentation.

Questions

Why is project documentation important?

Software documentation in practice

Good resources

DevOps and automation

In this section we will be using the CodeRefinery lesson on DevOps and Automation.

Key Points

  • software license

  • PyPI

  • conda

  • Software documentation

  • DevOps and automation