Posted in GSoC, Python

coantlib: coala meets ANTLR

This post comes in the last week of my GSoC’18 journey!
At the time of writing, I have completed the goals mentioned in my project proposal, and I am quite satisfied with the outcome. I hope my evaluators find it up to the mark too :D.

This post summarizes what coantlib is, how it is useful, and what features it provides, along with some relevant links.

My project, aptly named coala-antlr, is about integrating ANTLRv4 with coala so that bear writers can use ANTLR-based parse trees.
If you aren’t familiar with ANTLR, I have written an introductory post about it; feel free to read through.

coantlib is the core library package within coala-antlr; it provides utilities for writing bears based on ANTLR parse trees.
coantbears is the package within coala-antlr that contains bears written with the help of coantlib. These bears use parse trees on top of coala’s existing infrastructure, which makes code analysis routines easier to write.

Earlier, coala could only provide source lines to a bear, and it was difficult to write complicated code-analysis routines from these lines alone. With coantlib, bear writers can turn these source lines into a parse tree, traverse it, and extract the relevant information, which lets them focus on the code analysis itself.

Why parse trees?

Parse trees are more powerful than a simple line-by-line traversal of the source, and they enable advanced analysis, such as branch analysis of conditional statements.
Thus they are a powerful tool to have when designing or writing linters.
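To illustrate what a parse tree buys you, here is a small sketch that uses Python’s built-in ast module as a stand-in for an ANTLR parse tree (coala-antlr itself uses ANTLR). It groups each if/elif/else chain into one statement and counts its branches, which is awkward to do reliably by scanning lines. The function name if_chains is purely illustrative.

```python
import ast


def _has_elif(node):
    """True if the else-part of `node` is a single nested `if` (how `elif` is stored)."""
    return len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If)


def if_chains(source):
    """Return (line, number_of_branches) for each if/elif/else chain in `source`.

    Note: `else:` followed by a lone `if` is stored exactly like `elif`,
    so the two are counted the same way.
    """
    tree = ast.parse(source)
    # `elif` parts are nested ast.If nodes; remember them so they are not
    # reported again as chains of their own.
    nested = {id(n.orelse[0]) for n in ast.walk(tree)
              if isinstance(n, ast.If) and _has_elif(n)}
    chains = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.If) or id(node) in nested:
            continue
        branches, current = 1, node
        while _has_elif(current):      # follow the elif links
            current = current.orelse[0]
            branches += 1
        if current.orelse:             # a trailing plain `else`
            branches += 1
        chains.append((node.lineno, branches))
    return chains


SOURCE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"
'''

print(if_chains(SOURCE))  # [(3, 3)]
```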

How does coantlib work?

coantlib has a concept of walkers. Walkers are grammar-dependent pieces of code that need to be written once for a given grammar. Once a walker for a language exists, a bear writer can simply use it to get important information without worrying about how that information is extracted from the parse tree.
Over time, a walker for a given language accumulates several useful functions that new bear writers can reuse, which drastically reduces the effort required of them.
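A hypothetical sketch of the walker idea: PythonWalker and the two check functions are invented for illustration and are not coantlib’s actual API, and the tree comes from Python’s ast module rather than ANTLR. The point is the split: the walker hides tree traversal behind reusable queries, and each "bear" only calls those queries.

```python
import ast


class PythonWalker:
    """Hypothetical walker: owns one parse tree and exposes reusable queries."""

    def __init__(self, source):
        self.tree = ast.parse(source)

    def get_class_names(self):
        return [n.name for n in ast.walk(self.tree) if isinstance(n, ast.ClassDef)]

    def get_function_names(self):
        return [n.name for n in ast.walk(self.tree)
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]


def plural_class_names(source):
    """Toy 'bear': flag class names ending in 's' (deliberately naive)."""
    return [name for name in PythonWalker(source).get_class_names()
            if name.endswith("s")]


def long_function_names(source, limit=20):
    """Another toy 'bear' reusing the same walker, with no tree code of its own."""
    return [name for name in PythonWalker(source).get_function_names()
            if len(name) > limit]


src = "class Users:\n    def get_all_active_users_from_db(self):\n        pass\n\nclass Item:\n    pass\n"
print(plural_class_names(src))   # ['Users']
print(long_function_names(src))  # ['get_all_active_users_from_db']
```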
A more complete design of coantlib can be found here

What is possible with coantlib?

With coantlib, one can take any grammar, create a walker for it, and immediately use it to write bears for coala.
This opens up a vast array of languages to coala, since ANTLR already provides many grammars in a GitHub repository: https://github.com/antlr/grammars-v4 .

Examples?

As part of this project, I’ve created walkers for two languages, Python and XML. Find them at: Python3Walker and XMLWalker.
Using these walkers, I have created three bears to demonstrate how they can be used:
PyPluralNamingBear
PyQuoteSpacingBear
XMLIndentBear

Of these, PyPluralNamingBear and XMLIndentBear can also fix the errors they report, and the fixes suggested by XMLIndentBear are really good.

I encourage readers to hack around in the library (and possibly create a bear as well), which is now available at https://gitlab.com/coala/bears/coala-antlr .

The docs for this library are currently available at https://virresh.gitlab.io/coala-antlr/ . Consider this a mirror, though; they will soon be hosted officially under coala’s namespace :D.

A big thanks to coala for providing me the opportunity to create this library as their GSoC’18 student, and also a big thanks to my mentors Dong-hee Na for providing guidance and help in shaping this library.

Thanks for reading!

Posted in GSoC, Python

Documenting with sphinx

Every project needs documentation, and without proper documentation, a project is unlikely to survive the test of time. That’s why there are so many tools to help us document things. One very widely used tool is Sphinx.

Sphinx is highly configurable, so I will go through the basic configuration parameters I had to deal with while setting up Sphinx for my GSoC project. Please be aware that this is not a Sphinx tutorial, just a record of a few peculiar things I came across while working with Sphinx.

For a Java project, documentation can easily be generated from Javadoc comments in the code, and the same is achievable in Python via triple-quoted strings (docstrings).
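As a minimal sketch, here is a docstring written with reStructuredText field lists (:param:, :return:, :rtype:), a format Sphinx’s autodoc extension (sphinx.ext.autodoc) can render as API documentation. The area function is hypothetical, purely for illustration.

```python
def area(width, height):
    """Return the area of a rectangle.

    :param width: Width of the rectangle.
    :param height: Height of the rectangle.
    :return: The product ``width * height``.
    :rtype: float
    """
    return width * height


# The docstring is an ordinary attribute at runtime, which is what autodoc reads.
print(area.__doc__.splitlines()[0])  # Return the area of a rectangle.
```

With autodoc enabled in conf.py (extensions = ['sphinx.ext.autodoc']), a directive such as `.. autofunction:: mymodule.area` in an .rst file would pull this docstring in; mymodule is a placeholder module name.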
Continue reading “Documenting with sphinx”

Posted in C++, GSoC, Python

Diamond inheritance: Python and C++ perspective

This post is a follow up from . I’ll expand on the other common and deadly inheritance issue, i.e. the diamond inheritance problem. This problem is rarely seen in small programs, but it is easily encountered when working in an OOP environment, and it is not easily detected. If you’re seeing weird or unexpected results in an OOP program, this should be one of the highest-priority checks on your checklist!

Alright, so before explaining what diamond inheritance is, I would like to explain what multiple inheritance is. No multiple inheritance == no diamond inheritance issues, ever!

Multiple inheritance is when one class (say, Child) inherits from two parent classes (say, Mother and Father).

Now, to put the issue into perspective, let’s additionally say that both Mother and Father inherit from a class Human, so both of them derive some common methods from it. Let’s say it’s eye_colour() (and only eye_colour(), for simplicity), and that Mother and Father each override it with their own version.

What about the Child class, then?
What should eye_colour() return for Child? Should it be taken from Mother or from Father?

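The answer is: it’s language and implementation dependent.
In C++, a call such as child.eye_colour() is ambiguous and will not compile; you have to resolve it explicitly, for example by qualifying the call (Mother::eye_colour()) or by overriding eye_colour() in Child. If Child should contain only a single shared Human part, Mother and Father must also inherit from Human using the virtual keyword.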

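For Python, the Method Resolution Order (MRO) has changed between versions. Python 3 uses the C3 linearization algorithm: roughly, it searches the base classes from left to right, in the order they are listed, but a class always comes before its own base classes, so a shared base such as Human is only checked after both Mother and Father. So suppose we use multiple inheritance in Python like this: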

class Child(Mother, Father):
    pass

Then Child gets Mother’s eye_colour(), since Mother is listed before Father.
Similarly, if we declare the Child class as:

class Child(Father, Mother):
    pass

Then Child gets Father’s eye_colour(), since Father is listed first.

Kindly note that Python has used different method resolution algorithms in the past, so if you try this on older Python versions, be sure to check which method resolution order they use!
The explanation above refers to the MRO used in Python 3.

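Note that Java doesn’t support multiple inheritance of classes, so this issue never arises between classes for Java programmers 😉. (Since Java 8, default methods in interfaces can cause a similar clash, but the compiler forces you to resolve it explicitly by overriding the method.)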

Complete code for testing Python’s method resolution order:

class Human(object):
    def eye_colour(self):
        print('Human eye')

class Mother(Human):
    def eye_colour(self):
        print('Mother eye')

class Father(Human):
    def eye_colour(self):
        print('Father eye')

class Child(Mother, Father):
    pass

newborn = Child()
newborn.eye_colour()

Try changing the order of Mother and Father in the class definition and see the difference in the output!
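A quick way to see the order Python 3 actually uses is to inspect a class’s __mro__. The variant below is a modification of the code above in which Mother no longer overrides eye_colour(), and it returns strings instead of printing them. It shows why C3 is not a plain depth-first search: depth-first would reach Human through Mother before ever looking at Father.

```python
class Human:
    def eye_colour(self):
        return 'Human eye'


class Mother(Human):
    pass  # does not override eye_colour() in this variant


class Father(Human):
    def eye_colour(self):
        return 'Father eye'


class Child(Mother, Father):
    pass


# C3 places Human after *both* parents, because every class must come
# before its own base classes.
print([cls.__name__ for cls in Child.__mro__])
# ['Child', 'Mother', 'Father', 'Human', 'object']

# Depth-first (Child, Mother, Human, ...) would have returned 'Human eye'.
print(Child().eye_colour())  # Father eye
```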

PS: If it isn’t evident from the name, the problem is called the diamond problem because of the shape formed when you draw the inheritance tree on paper:

     Human
    /      \
Mother    Father
    \      /
     Child

It looks like a diamond (if you’ve ever seen the diamond suit in a deck of cards :p).

Thanks for reading!

Posted in GSoC, Python

Console endpoints in Python: setup tools magic !

Every Python package needs some sort of public interface to userspace, and Python’s setuptools provides a perfect way to offer one. Setuptools provides two ways of direct interaction:
1) Console scripts: these let the user interact with the package directly from the console. Every console-script entry point is installed as an executable on the PATH that invokes the defined function; the command-line arguments are available to that function via sys.argv.

2) Entry points: these are the plugin features that setuptools provides by default. They let us create an extension for any package that supports extension via entry points.
The way this works is that the main package discovers entry points using the function iter_entry_points from pkg_resources (a module that ships with setuptools).

In reality, console scripts are just a special case of entry points in general. Instead of defining a unique group name, console-script entry points always use the fixed group name console_scripts in setup.py. (Example)

A sample snippet in a library that supports plugins would look somewhat like this:

import pkg_resources

for ep in pkg_resources.iter_entry_points('The_unique_entrypoint_a_plugin_must_define'):
    registered_package = ep.load()
    # Do something to take the plugin entry point loaded above into account
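pkg_resources is deprecated in recent setuptools releases; on Python 3.8+ the standard library’s importlib.metadata offers the same discovery. The sketch below is one way to write the loop above with it. The helper names iter_plugins and load_plugins, and the group names, are placeholders of my own.

```python
from importlib.metadata import EntryPoint, entry_points


def iter_plugins(group):
    """Return the entry points registered under `group` by installed packages."""
    eps = entry_points()
    if hasattr(eps, "select"):        # Python 3.10+ selection API
        return list(eps.select(group=group))
    return list(eps.get(group, []))   # Python 3.8/3.9: a dict of group -> entry points


def load_plugins(group):
    """Import every plugin in `group`; map entry-point name -> loaded object."""
    return {ep.name: ep.load() for ep in iter_plugins(group)}


# An entry point value has the form 'module:attribute'; load() imports the
# module and returns the attribute. Here it resolves to json.dumps.
demo = EntryPoint(name="dumps", value="json:dumps", group="demo.plugins")
dumps = demo.load()
print(dumps({"a": 1}))  # {"a": 1}
```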

Now a plugin for a library that uses the above format can announce its presence at install time via:

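from setuptools import setup

if __name__ == '__main__':
    setup(
        name='...',
        # ...
        entry_points={
            # Each entry is 'name = module:attribute' (placeholder names here)
            'The_unique_entrypoint_a_plugin_must_define': [
                'plugin_name = plugin_package.module:main_function',
            ],
        },
        # ...
    )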

Thus setuptools entry points provide a complete, out-of-the-box way of creating plugins for Python packages.
I found this great tutorial with an example of how entry points work. It covers both console scripts and plugin entry points, in a very humorous way.

Instead of repeating what that tutorial describes, I will focus on how I stumbled across entry points.

coala has two main repositories: coala and coala-bears.
coala contains the core library code, while coala-bears contains all the analysis routines (bears) that coala provides.
The quest started when I wanted to make another library, coala-antlr, which is supposed to provide even more bears and extend the functionality of coala with additional analysis routines.

The answer came in the form of entry points. coala uses the coalabears entry point group to discover its bears. Every package that defines this entry point must make it point to the Python module that contains its bears. This is a very neat mechanism that allows us to extend coala without disturbing anything else.

On to console_scripts!
coala also defines a console script named coala. In fact, coala currently provides many more console scripts than just that one. Here are all the entry points that coala registers as console scripts and that can be invoked.

This is how, once the coala package is installed, we can simply type coala on the command line and the right code gets imported and run.

Interestingly, the function named in a console_scripts entry point is called without any arguments; it sees the same environment variables (os.environ) and command-line arguments (sys.argv) as the command itself, so it can read and parse them directly.
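A minimal sketch of a function suited to a console_scripts entry point follows. The command greet, the module greet_pkg.cli and the function main are hypothetical. The installed wrapper calls main() with no arguments and passes its return value to sys.exit(), so the function reads sys.argv itself (argparse does this when parse_args receives None). Accepting an optional argv list also makes the function easy to test.

```python
import argparse


def main(argv=None):
    """Entry point for a hypothetical `greet` command.

    Registered, for example, with
    entry_points={'console_scripts': ['greet = greet_pkg.cli:main']}.
    """
    parser = argparse.ArgumentParser(prog="greet")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    # argv=None makes argparse fall back to sys.argv[1:], which is what
    # happens when the installed console script calls main().
    args = parser.parse_args(argv)
    message = f"Hello, {args.name}!"
    print(message.upper() if args.shout else message)
    return 0  # becomes the process exit status via sys.exit(main())


main(["coala", "--shout"])  # prints: HELLO, COALA!
```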

In short, if you just want your package to be imported by other packages, you don’t need entry points.

If you also want your package to be callable from the command line, you need to add console_scripts entry points.

If you want your package to be extensible, so that other packages can enhance its functionality, you definitely want entry points.

Thanks for reading!

Posted in GSoC, Python

Deeper into Visitor Patterns – The flexible design for any meta-program

This is the third blog post on ANTLR, and in this one I will walk through how to manipulate ANTLR’s parse trees to do advanced stuff.
More specifically, I will talk about the visitor pattern for traversing the parse tree, and explain it alongside a complicated use case that I have been working on this summer with coala for my GSoC!
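To make the pattern concrete, here is a toy, self-contained sketch of a visitor. It loosely mirrors how ANTLR’s generated Python visitors behave (a node’s accept() calls the matching visitXxx method if the visitor defines one, and falls back to visitChildren otherwise), but the classes here are illustrative and not part of the ANTLR runtime.

```python
class Node:
    def __init__(self, *children):
        self.children = list(children)

    def accept(self, visitor):
        # Double dispatch: pick the visitor method from the node's own type,
        # falling back to the generic visitChildren if it is not defined.
        method = getattr(visitor, "visit" + type(self).__name__, None)
        return method(self) if method else visitor.visitChildren(self)


class Num(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value


class Add(Node):
    pass


class Mul(Node):
    pass


class Visitor:
    def visitChildren(self, node):
        # Default behaviour: visit every child, keep the last result.
        result = None
        for child in node.children:
            result = child.accept(self)
        return result


class Evaluator(Visitor):
    def visitNum(self, node):
        return node.value

    def visitAdd(self, node):
        return sum(child.accept(self) for child in node.children)

    def visitMul(self, node):
        product = 1
        for child in node.children:
            product *= child.accept(self)
        return product


class NumberCollector(Visitor):
    """Overrides a single method; everything else uses the default traversal."""

    def __init__(self):
        self.found = []

    def visitNum(self, node):
        self.found.append(node.value)


tree = Add(Num(2), Mul(Num(3), Num(4)))  # 2 + 3 * 4
print(tree.accept(Evaluator()))          # 14
collector = NumberCollector()
tree.accept(collector)
print(collector.found)                   # [2, 3, 4]
```

NumberCollector shows the flexibility: one overridden method is enough, because the default visitChildren keeps walking the rest of the tree.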

In the last two posts, I covered setting up ANTLR to work with Python and using the Python runtime for parse tree traversal, and explained its usage with a grammar taken from the official grammars repository. You can have a look at them here (Part 1) and here (Part 2).

In the last post, we constructed a meta-program that could read Python source code and report all the class methods in a nice dictionary format. Notice that we had to override the visitFuncdef method from the Python3Visitor class. This was a relatively simple use case that followed the defaults very closely and required overriding only one function. Continue reading “Deeper into Visitor Patterns – The flexible design for any meta-program”

Posted in GSoC, Python

Dive deeper into ANTLR – Write a python meta-program !

In the previous post on ANTLR, I introduced how to set up ANTLR locally and use the Python runtime to hack on a customised grammar (and made a simple calculator with it).

In this post we will take it forward from there and write a program that parses actual Python source code!
Just for completeness, this kind of programming is called meta-programming: a program that takes the source code of another program as its input.

Let’s first get a conceptual overview of what we want to achieve:
We want to write a program that reads Python source code and reports all the functions that are defined as methods of a class.

For example, let’s take the following code:

def func2():
    pass

class aSampleClass(object):
    def func(self):
        pass

From the above piece of code, we can easily see that func2 is a plain function, while func is a method of the class. We want to identify all methods of the latter kind!
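For comparison, the same goal can be sketched with Python’s built-in ast module instead of an ANTLR grammar. This is not the ANTLR-based program from the post: here ast.NodeVisitor plays the role of the generated visitor, and visit_ClassDef is overridden much like a visitXxx method would be. The class name MethodFinder is illustrative.

```python
import ast

SOURCE = '''
def func2():
    pass

class aSampleClass(object):
    def func(self):
        pass
'''


class MethodFinder(ast.NodeVisitor):
    """Collect {class name: [method names]} from a module's source."""

    def __init__(self):
        self.methods = {}

    def visit_ClassDef(self, node):
        # Only functions defined directly in the class body count as methods.
        self.methods[node.name] = [
            item.name for item in node.body
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]
        self.generic_visit(node)  # keep walking to find nested classes


finder = MethodFinder()
finder.visit(ast.parse(SOURCE))
print(finder.methods)  # {'aSampleClass': ['func']}
```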

Continue reading “Dive deeper into ANTLR – Write a python meta-program !”

Posted in GSoC, Python

moban: Templating using jinja !

While I have been working with coala over the past few weeks, there has been a movement to standardise the code of the various projects under the same organisation.
This is being done by templating the common parts of the code, like the setup.py files, CI files, etc.

The tool of choice is a templating engine, Jinja2, and the tool we use to render the templates is moban. Those familiar with Flask (or Django, whose template language is very similar) will already know Jinja2-style templates, which are very useful for HTML, and those familiar with Jekyll will know Liquid, so an analogy for them would be liquid:ruby::jinja:python.

Even if you didn’t get any of the jargon in the previous paragraph, no worries, we will come back to it in a while. Consider the following example:

You want to create several HTML files that share the same navigation-bar code. Would you copy-paste that code into each file manually?
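Hopefully not. As a minimal sketch of the templating idea, the example below uses the standard library’s string.Template as a stand-in for Jinja2 (with Jinja2 you would typically reach for {% include %} or {% extends %}). The page names and navbar markup are made up, and moban itself is not shown.

```python
from string import Template

# Written once and shared by every page.
NAVBAR = '<nav><a href="index.html">Home</a> | <a href="about.html">About</a></nav>'

PAGE = Template("""<html>
<body>
$navbar
<h1>$title</h1>
</body>
</html>""")


def render_pages(titles):
    """Return {filename: html} for each page, all sharing one navbar."""
    return {
        f"{name}.html": PAGE.substitute(navbar=NAVBAR, title=title)
        for name, title in titles.items()
    }


pages = render_pages({"index": "Home", "about": "About us"})
print(pages["about.html"])
```

Changing NAVBAR in one place updates every generated page, which is exactly the duplication that templating removes.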

Continue reading “moban: Templating using jinja !”