Posted in GSoC, Non Tech

GSoC’18: Passed!

Although I’m probably a bit late (by a week), I passed my GSoC’18 successfully with @coala.io, and I’m so happy and excited about it!
I’ve already written a lot about the technical aspects of my project in my previous blogs, so for anything technical, feel free to look through any of the past GSoC-tagged posts. Also, in case you’re trying to follow along and get stuck with coantlib, feel free to drop by at https://gitter.im/coala/ast, we will try our best to help 😀. This post is about my experience through all of it 😀.

Continue reading “GSoC’18: Passed!”

Posted in GSoC, Python

coantlib: coala meets ANTLR

This post comes in the last week of my GSoC’18 journey!
At the time of writing this post, I have completed the goals mentioned in my project proposal, and I am quite satisfied with the outcome as well. I hope my evaluators find it up to the mark too :D.

This post will summarize what coantlib is, how it is useful, and what features it provides, along with some relevant links.

My project, aptly named coala-antlr, is about integrating ANTLRv4 with coala so that bear writers can utilise ANTLR-based parse trees.
If you aren’t familiar with ANTLR, or have no idea what it is, I have written an introductory post about ANTLR; feel free to read through it.

coantlib is the core library package within coala-antlr; it is responsible for providing useful utilities for writing bears based on ANTLR parse trees.
coantbears is the package within coala-antlr that comprises bears written with the help of coantlib. These bears leverage the power of parse trees on top of coala, which makes it easier to write code-analysis routines.

Earlier, coala could only provide source lines to a bear, and it was difficult to use these lines to write complicated code-analysis routines. With coantlib, bear writers can now turn these source lines into a parse tree, traverse it, and get back the relevant information, making it easy to focus on writing the actual analysis.

Why parse trees?

Parse trees are more powerful than a simple linear, line-by-line traversal of the source, and they help with advanced analysis, such as branch analysis of conditional statements and much more.
Thus they are a powerful tool to have when designing or writing linters.

How does coantlib work?

coantlib has a concept of walkers. These walkers are grammar-dependent pieces of code, which need to be written for a given grammar. Once a walker for a language is written, a bear writer can simply use it to get important information without worrying about how it is extracted from the parse tree.
Thus, over time, a walker for a given language will accumulate several useful functions, and new bear writers can re-use these functions, which drastically reduces the effort required on the part of a bear writer.
A more complete design of coantlib can be found here.

What is possible after the addition of coantlib?

Since the addition of coantlib, one can take any arbitrary grammar, create a walker for it, and it is ready to be used to write bears for coala.
This opens up a vast array of languages that can be used with coala, since ANTLR already provides lots of grammars, located in a GitHub repository: https://github.com/antlr/grammars-v4 .

Examples?

As a part of this project, I’ve already created walkers for two languages – Python and XML. Find them at: Python3Walker and XMLWalker.
Using these walkers, I have created three bears to demonstrate how they can be used:
PyPluralNamingBear
PyQuoteSpacingBear
XMLIndentBear

Of these, PyPluralNamingBear and XMLIndentBear are capable of fixing the errors that they report, and the fixes suggested by XMLIndentBear are really good.

I encourage the reader to try and hack around (and possibly create a bear as well) in the library, which is now available at https://gitlab.com/coala/bears/coala-antlr .

The docs for this library are currently available at https://virresh.gitlab.io/coala-antlr/ . Consider this a mirror; they will soon be hosted officially under coala’s namespace :D.

A big thanks to coala for providing me the opportunity to create this library as their GSoC’18 student, and also a big thanks to my mentor Dong-hee Na for providing guidance and help in shaping this library.

Thanks for reading!

Posted in GSoC, Python

Documenting with sphinx

Every project needs documentation, and without proper documentation, a project is unlikely to survive the test of time. That’s why we have so many tools to help us with documenting things. One such very widely used tool is Sphinx.

Sphinx is a highly configurable tool, so I will go through the basic configuration parameters that I had to deal with while setting up Sphinx for my GSoC project. Please be aware that this is not a Sphinx tutorial, just a record of a few peculiar things I came across while working with Sphinx.
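For context, the heart of a Sphinx setup is the conf.py file. A minimal sketch (not my exact configuration; the theme is just the default) looks roughly like this:

# conf.py - minimal Sphinx configuration sketch
project = 'coala-antlr'
extensions = [
    'sphinx.ext.autodoc',    # pull API documentation out of docstrings
    'sphinx.ext.viewcode',   # link the rendered docs back to the source code
]
html_theme = 'alabaster'     # the default theme; any installed theme works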

Documenting the code itself is easily achieved in a Java project using Javadoc comments, and something similar is achievable in Python via triple-quoted strings (docstrings).
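As a quick illustration (a made-up function, with reStructuredText-style fields that Sphinx’s autodoc extension understands):

def add(a, b):
    """Return the sum of two numbers.

    :param a: the first number
    :param b: the second number
    :return: the sum of a and b
    """
    return a + b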
Continue reading “Documenting with sphinx”

Posted in C++, GSoC, Python

Diamond inheritance: Python and C++ perspective

This post is a follow-up to my previous one. I’ll expand on the other common and deadly inheritance issue, i.e. the diamond inheritance problem. This problem is often not seen in small programs, but is easily encountered when working in an OOP environment, and it is also not very easily detected. If you’re seeing weird or unexpected results in an OOP program, this should be one of the highest-priority checks on your checklist!

Alright, so before explaining what diamond inheritance is, I would like to explain what multiple inheritance is. No multiple inheritance == no diamond inheritance issues, ever!

Multiple inheritance is when one class (say, Child) inherits from two parent classes (say, Mother and Father).

Now, to put the issue into perspective, let’s additionally say that both the Mother and Father classes inherit from a class Human. Thus both Mother and Father will derive some common methods from the Human class. Let’s say it’s eye_colour() (and only eye_colour(), for simplicity).

What about the Child class then?
What should be the value of eye_colour() for the Child class? Should it be taken from the Mother class, or should it be taken from the Father class?

The answer is: it’s language- and implementation-dependent.
For C++, an unqualified call to the ambiguous method will not even compile, and the compiler will force you to qualify the call or use some form of the virtual keyword (virtual inheritance) to resolve the problem.

For Python, different versions have defined different Method Resolution Orders (MROs). In Python 3 it is the C3 linearisation, which for a simple case like this behaves like a left-to-right, depth-first search in which each class appears only once. So suppose we use multiple inheritance in Python like this:

class Child(Mother, Father):
    pass

Then the Child class will get the Mother class’s eye_colour function, since Mother was inherited first and Father second.
Similarly, if we declare the Child class as:

class Child(Father, Mother):
    pass

Then the Child class will get the Father class’s eye_colour function, since Father was inherited first.

Kindly note that Python has had several different algorithms for deciding the method resolution order in the past (old-style classes used a plain left-to-right, depth-first search), so if you try this on older Python versions, be sure to check which method resolution order they use!
The above explanation is with reference to the MRO used in Python 3.

Note that Java doesn’t support multiple inheritance of classes, and so this will never arise as an issue for Java programmers 😉.

Complete code for testing Python’s method resolution order:

class Human(object):
    def eye_colour(self):
        print('Human eye')

class Mother(Human):
    def eye_colour(self):
        print('Mother eye')

class Father(Human):
    def eye_colour(self):
        print('Father eye')

class Child(Mother, Father):
    pass

newborn = Child()
newborn.eye_colour()

Try changing the order of Mother and Father while inheriting and see the difference in output!
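If you want to see the full resolution order explicitly, Python 3 exposes it through the __mro__ attribute (or the mro() method) of a class:

# Inspect the method resolution order of the Child class defined above
print(Child.__mro__)
# (<class '__main__.Child'>, <class '__main__.Mother'>,
#  <class '__main__.Father'>, <class '__main__.Human'>, <class 'object'>)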

PS: If it isn’t evident from the name, it is called the diamond problem simply because of the shape that’s formed when you draw the inheritance tree on paper.

     Human
    /      \
Mother    Father
    \      /
      Child

Similar to a diamond (if you’ve ever seen the diamond suit in a deck of cards :p).

Thanks for reading!

Posted in GSoC, Python

Console entry points in Python: setuptools magic!

Every Python package requires some sort of public interaction with the userspace. Python’s setuptools provides a neat way to do this. Setuptools provides two kinds of direct interaction:
1) Console scripts - These provide a means for the user to interact directly via the console. Each console-script entry point is installed as an executable on the PATH, and invoking it calls the defined function with whatever command-line arguments were passed.

2) Entry points - These are the plugin features provided by setuptools by default. They help us create an extension for any given package which supports extension via entry points.
The way this works is that the main package supports discovery of entry points using the function iter_entry_points from pkg_resources (which ships with setuptools).

In reality, console_scripts are really just a special case of entry points in general. Instead of having to define a unique group name for the entry point, its group name is fixed to console_scripts in setup.py. (Example)
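For illustration, a console script declaration in setup.py looks roughly like this (the package, module, and function names here are hypothetical):

from setuptools import setup

setup(
    name='mytool',
    # 'console_scripts' is the fixed group name; each entry installs an
    # executable on the PATH that calls the given module:function
    entry_points={
        'console_scripts': [
            'mytool = mytool.cli:main',
        ],
    },
)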

A sample snippet in a library that allows plugins to be used would look somewhat as follows:

import pkg_resources

# iterate over every installed package that registered this entry point group
for ep in pkg_resources.iter_entry_points('The_unique_entrypoint_a_plugin_must_define'):
    registered_package = ep.load()
    # Do something to take the plugin entry point loaded above into account

Now, a plugin for a library that uses the above format can indicate its presence at install time via:

from setuptools import setup

if __name__ == '__main__':
    setup(
        name='...',
        ...
        entry_points={
            # each entry is 'name = dotted.module.path:function'
            'The_unique_entrypoint_a_plugin_must_define': [
                'plugin_name = path.to.plugin.module:main_function',
            ],
        },
        ...
    )

Thus setuptools entry points provide a really complete, out-of-the-box way of creating plugins for Python packages.
I found a great tutorial with an example of how entry points work. This tutorial goes over both console scripts and plugin entry points, in a very humorous way.

Instead of describing the same things the above blog does, I will rather focus on how I stumbled across all of this.

coala has two main repositories: coala and coala-bears.
coala contains the code for the core library, while coala-bears contains all the analysis routines that coala provides.
The quest started when I wanted to make another library, coala-antlr, which is supposed to provide even more bears and extend the functionality of coala with additional analysis routines.

The answer came in the form of entry points. coala uses the coalabears entry point, which helps coala uniquely discover its bears. All packages that define this entry point must make it point to the Python module that contains bears. This is a very nice and neat mechanism that allows us to extend coala without having to disturb anything else.

On to the console_scripts!
coala also has a console script defined, which goes by the name coala. In fact, coala currently provides a lot more console scripts than just that. Here are all the entry points registered as console scripts by coala, which can be invoked.

This is why, when we install the coala package, we can simply type coala on the command line and the right code gets imported and run.

Interestingly, the environment variables and command-line arguments are also available as usual (for example, via sys.argv) to the function that is called through the console_scripts entry point.
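For instance, a minimal sketch of such an entry-point function (hypothetical names) could be:

import sys

def main():
    # sys.argv holds the command name followed by whatever arguments the user typed
    print('arguments passed:', sys.argv[1:])
    return 0   # the generated wrapper script passes this to sys.exit()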

In short, if you just want your package to be imported by other packages, you don’t need entry-points.

If you want your package to also be callable from the command line, in addition to the above, you need to add console_scripts entry points.

If you want your package to have extendable features and would like to enable other packages to enhance your package’s functionality, you definitely want entry-points.

Thanks for reading!

Posted in GSoC, Python

The shortcomings of ANTLR

An important aspect of using any tool is knowing when not to use it, and the same goes for ANTLR as well.
This post is dedicated to the shortcomings that I faced while using the Python runtime, as of July 2018. The ANTLR team is continuously working on improvements, and it is possible that these issues will get resolved in the future.

The first and foremost shortcoming is modifying a parse tree in memory. Once source code has been read, ANTLR does a very poor job of providing an interface for rewriting the parse tree. Here is a PR that attempts to improve that, so possibly this will change soon.

The next shortcoming, in the Python runtime specifically (I’m not sure about the Java runtime; it would be great if someone could confirm), is retrieval of source code. Although ANTLR does very well at giving us the positions of tokens (their line numbers and column numbers in the original source code), it didn’t work out well when I tried converting the parse tree back into source code.

For example, consider Python source code such as:

var_x = 5
if var_x:
    print('Variable x is', var_x)
else:
    print('Variable x is not set')

If I pass the above code to ANTLR’s Python runtime and then use this code to regenerate the output, it looks as follows:

var_x=5
ifvar_x:
print('Variable x is',var_x)
else:
print('Variable x is not set')

The source of inspiration for the round-tripping code is this stack overflow answer.

Essentially, ANTLR has the concept of a hidden token channel, into which all the unused tokens (such as whitespace) are dumped. During my attempt to retrieve the original source code from a given parse tree, the Python runtime could not put the spaces back together; it missed out on all the whitespace. A possible solution would be to add whitespace according to the column numbers, but that doesn’t work very well if we don’t know what the original whitespace was: we might very well replace a tab with a space, or vice versa, inadvertently.
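To make this concrete, here is a minimal sketch (assuming Python3Lexer and Python3Parser have been generated from the grammars-v4 Python3 grammar with the Python target) showing that reading the text straight off the tree loses everything that went to the hidden channel:

from antlr4 import InputStream, CommonTokenStream
# Assumed to be generated beforehand from the grammars-v4 Python3 grammar
from Python3Lexer import Python3Lexer
from Python3Parser import Python3Parser

code = "var_x = 5\nif var_x:\n    print('Variable x is', var_x)\n"
lexer = Python3Lexer(InputStream(code))
tokens = CommonTokenStream(lexer)
parser = Python3Parser(tokens)
tree = parser.file_input()
# getText() concatenates only the tokens that are part of the tree,
# so the inline whitespace is missing, as in the regenerated output above
print(tree.getText())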

The above two issues are major blockers to creating an API that can suggest complicated fixes for the anomalies detected by the linting logic.

Also, ANTLR is a parser generator, and I’ve seen many people avoid parser generators because they are quite slow, whereas writing a custom parser by hand gives you the extra drop of optimisation that’s needed in a typical compilation process.

Many articles explain the reasons behind the better performance of a parser written by hand, such as this one. I didn’t run into this issue myself, but it is mentioned here for completeness’ sake.

With this, I conclude my post for the second phase. Hoping for a positive outcome of the evaluations 🙂.

Thanks for reading!

Posted in GSoC, Python

Deeper into Visitor Patterns – The flexible design for any meta-program

This is the third blog on ANTLR, and in this one I will walk through how to manipulate ANTLR’s parse trees to do more advanced things.
More specifically, I will talk about the visitor pattern for traversing the parse tree, and will explain alongside it a complicated use case that I have been working on this summer with coala for my GSoC!

In the last two blogs, I covered setting up ANTLR to work with Python and using the Python runtime for parse tree traversal, and also explained its usage with one grammar taken from the official grammars repository. You can have a look at them here (Part 1) and here (Part 2).

In the last blog, we constructed a meta-program that could read Python source code and report all the class methods in a nice dictionary format. Notice that we had to override the visitFuncdef method from the Python3Visitor class. This was a relatively simple use case that followed the defaults very closely and required only one function to be overridden. Continue reading “Deeper into Visitor Patterns – The flexible design for any meta-program”