Creating Packages in Python – Python Modules

Today I’ve decided to write a post explaining the process of creating packages in Python. I will cover uploading to PyPI and hosting a package for easy installation via pip in another post. In this one we will build a sample project containing two packages that are physically separated into their own folders, but installable from the same setup.py.

So in short, we will build two packages, adder and multiplier. Here’s the catch: both of them will reside in the same repository but in different folders, and the multiplier package will depend upon the adder package. A single setup.py will install both adder and multiplier. This tutorial is best suited to the creation of libraries, and bootstrapping them in particular.

For a normal package, the official documentation covers the procedure from start to end very well. I’d highly recommend going through https://packaging.python.org/tutorials/packaging-projects/ if that is exactly what you want. I will still start from the basics, so this post is approachable without reading any other material.

Testing such a package is not as easy as just running a module locally without installing it, since the modules import from each other and those imports are usually not available until the package is installed. I will point out the pitfalls at the appropriate places.

Let’s start with an overview of the code. We will have a directory structure as follows:

.
├── adder
│   ├── add.py
│   ├── _cmdline.py
│   └── __init__.py
├── LICENSE
├── multiplier
│   ├── _cmdline.py
│   ├── __init__.py
│   └── multiply.py
├── README.md
└── setup.py

(PS: to generate such an ASCII directory representation, you can use the tree program, which is available on both Windows and Linux.)

Why this structure?
Both adder and multiplier are in their own folders, which makes sense since they are logically different operations. You can segregate your own code into folders in the same way. The rest is pretty much standard.
The other approach would be to put “add.py” and “multiply.py” inside a single folder called “BasicMath”. That works for an example this small, but the layout used here extends to larger libraries with ease, which is harder with the single-folder model.

The __init__.py file can be left empty, or some initialisation can be done in there, but remember that there are several traps associated with __init__.py. It is best not to rely on initialisation done there, but in case you need to, go through this documentation about the traps. For our purposes, the __init__.py file is there for compatibility with both Python 2 and Python 3: Python 2 only treats a folder as a package if it contains an __init__.py.

Our sample add.py file :

#!/usr/bin/env python

def add(a,b):
    return a+b

Nothing interesting happening here, just a simple add function.

Now for the interesting part, the multiply function (inside multiply.py):

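#!/usr/bin/env python
from adder.add import add

def multiply(a,b):
    # Negative operands are not supported; return 0 for them
    if a<0 or b<0:
        return 0
    ans = 0
    # Multiply by adding a to the running total b times
    for i in range(0,b):
        ans = add(ans,a)
    return ans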
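As a quick sanity check of the repeated-addition logic, here is a minimal sketch with add inlined, so that it runs without the adder package being installed. The sample inputs are my own and are not part of the repository.

```python
def add(a, b):
    # Inlined copy of adder/add.py so the snippet is self-contained
    return a + b


def multiply(a, b):
    # Same logic as multiply.py: negative operands short-circuit to 0
    if a < 0 or b < 0:
        return 0
    ans = 0
    for _ in range(b):
        ans = add(ans, a)
    return ans


results = [multiply(3, 4), multiply(5, 0), multiply(-2, 3)]
print(results)  # [12, 0, 0]
```

Note that the negative case returns 0 rather than raising an error, which is a property of this sample code and worth keeping in mind if you reuse it.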

Note that in the above function, we do an import via

from adder.add import add

Note the directory structure once again:

.
├── adder
│   ├── add.py
│   ├── _cmdline.py
│   └── __init__.py

Here adder.add refers to the add.py file inside the adder folder. That file defines the function we want to use, which is why we import it as shown above.

There’s a catch here. If you think you can do

from adder import *

and then use add.add in the multiply function, you will get a NameError.
Why so?
Because our __init__.py is empty. An import * only brings in the names that __init__.py defines or lists in __all__; submodules such as add are not imported automatically. More on this here.
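The behaviour described above can be reproduced with a small sketch that builds two throwaway packages in a temporary folder. The package names adder_empty and adder_all and the helper functions are hypothetical and exist only for this demonstration.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, init_source):
    """Create a throwaway package containing an add.py submodule."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, "add.py"), "w") as f:
        f.write("def add(a, b):\n    return a + b\n")


def star_import_names(name):
    """Run 'from <name> import *' in a fresh namespace and list what it bound."""
    namespace = {}
    exec("from {} import *".format(name), namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)


root = tempfile.mkdtemp()
make_package(root, "adder_empty", "")                   # empty __init__.py
make_package(root, "adder_all", "__all__ = ['add']\n")  # __init__.py lists the submodule
sys.path.insert(0, root)
importlib.invalidate_caches()

empty_names = star_import_names("adder_empty")
all_names = star_import_names("adder_all")
print(empty_names)
print(all_names)
```

With the empty __init__.py nothing is bound, so the first list should come out empty. Once __all__ names the submodule, the star import makes add available.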

Okay, with the preliminaries out of the way, let’s look at setup.py itself. This is the main file that installs our Python package and defines its important attributes.

#!/usr/bin/env python

from setuptools import setup, find_packages

with open('README.md') as f:
	ldescrip = f.read()

if __name__ == "__main__":
    setup(name='BasicMath',
          version='1.0',
          description='Command line adder and multiplier',
          author='Viresh Gupta',
          url='https://github.com/virresh/python-packaging',
          platforms='any',
          packages=find_packages(),
          license='MIT',
          long_description=ldescrip,
          long_description_content_type="text/markdown",
          entry_points={
              "console_scripts": [
                  "adder = adder._cmdline:main",
                  "multiplier = multiplier._cmdline:main",
              ]
          },
          # from http://pypi.python.org/pypi?%3Aaction=list_classifiers
          classifiers=[
              'Environment :: Console',
              'Intended Audience :: Developers',
              'Operating System :: OS Independent',
          ],
     )

That is our setup.py.
It is pretty much self-explanatory except for the following few things:

  • We are reading the text file using read() and not via something like readlines(). This is something to watch out for. If you use readlines(), you are most likely to hit an AttributeError complaining that a list object cannot be split, from code you never wrote. Replace it with .read() and it will work like a charm ;). (I leave it to the reader to work out the reason :p).
  • The find_packages() function. This one does what its name suggests, but if we are installing a single BasicMath project, why do we need it? The reason is that we have segregated BasicMath into two folders, which means two separate packages. We need to list all of these packages in our setup.py so that it installs both of them, and find_packages() collects them for us.
  • The entry_points dictionary. This is where the magic happens. If you want your package to be callable from the command line, you can say so here under “console_scripts”. GUI scripts can similarly be put in a separate “gui_scripts” list inside the dictionary. Inside each list you name the functions that you want to be called.
    For our use, we defined a command line script for each of the adder and multiplier packages, and pointed each command at that script’s main() function so that it runs whenever the command is called directly.
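To make the package discovery in the second bullet concrete, here is a simplified stand-in for what find_packages() does. It is not setuptools' implementation, only a sketch that treats every folder containing an __init__.py as a package. The function name find_packages_sketch and the temporary project layout are assumptions made for this example.

```python
import os
import tempfile


def find_packages_sketch(where):
    """Return dotted names of folders under `where` that contain __init__.py."""
    found = []
    for dirpath, dirnames, filenames in os.walk(where):
        if dirpath == where:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend further
            continue
        rel = os.path.relpath(dirpath, where)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


# Build a throwaway project that mirrors the layout used in this post
root = tempfile.mkdtemp()
for folder in ("adder", "multiplier"):
    os.makedirs(os.path.join(root, folder))
    open(os.path.join(root, folder, "__init__.py"), "w").close()
os.makedirs(os.path.join(root, "docs"))  # no __init__.py, so it is ignored
open(os.path.join(root, "README.md"), "w").close()

packages = find_packages_sketch(root)
print(packages)  # ['adder', 'multiplier']
```

This also shows why the __init__.py files matter here: without them, the folders would not be picked up by a discovery step of this kind.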

How does the entry point actually work?
Consider the code for adder/_cmdline.py:

#!/usr/bin/env python
from adder import add
import sys


def main():
	if len(sys.argv) != 3:
		print("Usage : adder <number_1> <number_2>")
		return
	numOne = int(sys.argv[1])
	numTwo = int(sys.argv[2])
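	if numOne < 0 or numTwo < 0:
		print("Only non-negative integers allowed")
		return
	sumR = add.add(numOne, numTwo)
	print("{} + {} = {}".format(numOne,numTwo,sumR))
	# Note: the generated console script passes this return value to sys.exit(),
	# so a non-zero sum also becomes the exit status of the command
	return (sumR)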

if __name__ == '__main__':
	main()

Here, we simply take the arguments from the command line’s argv, process them, and print the result. Of course, we could do more complicated tasks here, or use a sophisticated command line argument parser like click.
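As an illustration of handing the argument handling to a parser, here is a minimal sketch that uses argparse from the standard library instead of click, so that it runs without extra dependencies. The add function is inlined, and build_parser and the argv parameter of main are my own additions, not part of the repository.

```python
import argparse


def add(a, b):
    # Inlined copy of adder/add.py so the snippet is self-contained
    return a + b


def build_parser():
    parser = argparse.ArgumentParser(
        prog="adder", description="Add two non-negative integers")
    parser.add_argument("number_1", type=int)
    parser.add_argument("number_2", type=int)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    if args.number_1 < 0 or args.number_2 < 0:
        print("Only non-negative integers allowed")
        return 1
    total = add(args.number_1, args.number_2)
    print("{} + {} = {}".format(args.number_1, args.number_2, total))
    return 0


exit_code = main(["2", "3"])  # prints "2 + 3 = 5"
```

The parser takes over the usage message and the integer conversion, and main returns 0 or 1 so that the value is a sensible exit status.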

So what’s happening is this: when you install the package, the installer creates an executable command named “adder” that points to the main() function of adder/_cmdline.py. Whenever you run adder on the command line after installing BasicMath, that function gets called, and it sees the command line arguments you passed to adder in sys.argv.
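Roughly, the generated command has to turn a string such as "adder = adder._cmdline:main" into a callable and run it. The sketch below shows that idea in simplified form. It is not the code setuptools generates, and parse_entry_point and resolve are hypothetical helper names. A standard library function is used as the target so that the snippet runs without BasicMath installed.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:function' into its three parts."""
    name, _, target = spec.partition("=")
    module_name, _, func_name = target.strip().partition(":")
    return name.strip(), module_name, func_name


def resolve(spec):
    """Import the module named in the spec and return the function it points to."""
    _, module_name, func_name = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


parsed = parse_entry_point("adder = adder._cmdline:main")
print(parsed)  # ('adder', 'adder._cmdline', 'main')

# Stand-in target from the standard library, since adder is not installed here
func = resolve("basename = os.path:basename")
print(func("/tmp/setup.py"))  # setup.py
```

The real wrapper script then calls the resolved function and passes its return value to sys.exit().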

Finally, we have completed our BasicMath package and made it installable. I will be using the same trick to create the API required for my GSoC project, so you can expect to see a larger and more complicated example of this method soon.

Happy Packaging !
Find the complete code along with some install instructions here.

Author:

Code Lover