Posted in artificial intelligence

ML101: The what and why

With this series of posts, I’ll dive into the currently hot topic of machine learning and log my explorations along the way. This post acts as an introduction and outlines the direction the series will take.

When I first heard about machine learning, it sounded like magic. It seemed we could simply take data, feed it to a computer like food, and just like that the computer would start doing whatever we wanted!
Of course this was a very naive view of ML, but the core idea holds. We choose a function (say f) with adjustable parameters, optimise those parameters by feeding the function data, and the resulting approximation should do what f was supposed to do.

If that isn’t very clear, there’s nothing much to worry about. I mentioned it briefly in my previous post on AI and will elaborate on the mathematical portions in upcoming posts. What’s more important here is that our job is not done once we have data. We *need* to define the function f and we *need* to define how the optimisation will happen. These definitions come from mathematics, and require some probability and statistics to appreciate fully. In terms of coding, though, we don’t really need to know all this: several libraries work very hard to hide these details, and we just need to provide them our data.
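To make the "define f, then define the optimisation" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the choice of a straight line for f, and gradient descent on the mean squared error as the optimiser. Real libraries hide exactly these steps.

```python
# Toy data generated from y = 2x + 1 (an assumption made for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def f(x, w, b):
    """The function we chose to fit: a straight line with parameters w, b."""
    return w * x + b

def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Optimise w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [f(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit(xs, ys)
print(round(w, 2), round(b, 2))
```

The fitted parameters should land close to w = 2 and b = 1, which is the line the data was generated from.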

The remainder of this post will focus on what machine learning can do, and hopefully give the reader an insight into the field from my perspective and the possible areas they might want to explore. Essentially, the what and the why of ML.

Posted in artificial intelligence

Artificial Intelligence and the era of Data

I’ve been working on artificial intelligence (including an AI course at my university!) and its branches over the past months, and I’ll write a series of posts about my journey exploring modern aspects such as Machine Learning (ML) and Deep Learning (DL). But this post is not about that. This post is about what we used to do before we had enough computational power to build ML/DL models. It is about the other portions of AI that have been buried under the pile of data science these days. I hope that it will give readers a wider perspective on AI and inform them about the possibilities outside machine learning.

Artificial Intelligence has intrigued humans for a long time. AI is about making machines do what humans can. It doesn’t necessarily involve making a system that learns over time, though. Traditionally, AI has been about designing algorithms. For example, making a machine play TicTacToe is AI.
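As an illustration of AI without any learning, here is a minimal sketch of a TicTacToe player built on plain minimax search. The board representation (a 9-character string) and the function names are my own choices for this sketch, not something from a particular library.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Score a position for X: +1 if X wins, -1 if O wins, 0 for a draw."""
    won = winner(board)
    if won is not None:
        return 1 if won == 'X' else -1
    if ' ' not in board:
        return 0
    other = 'O' if player == 'X' else 'X'
    scores = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    return max(scores) if player == 'X' else min(scores)

def best_move(board, player):
    """Pick the empty cell with the best minimax score for `player`."""
    other = 'O' if player == 'X' else 'X'
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    score = lambda i: minimax(board[:i] + player + board[i + 1:], other)
    return max(moves, key=score) if player == 'X' else min(moves, key=score)

# X has two in the top row; the search finds the winning cell (index 2).
print(best_move('XX OO    ', 'X'))
```

Nothing here improves with experience: the program simply searches every possible continuation, which is feasible because the game is tiny.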

Posted in GSoC, Non Tech

GSoC’18: Passed !

I’m probably a bit late (by a week), but I passed my GSoC’18 successfully with @coala.io and I’m so happy and excited about it !
I’ve already written a lot about the technical aspects of my project in my previous blogs, so for those, feel free to see any of the past GSoC tagged posts. Also, in case you’re trying to follow along and get stuck with coantlib, feel free to drop by at https://gitter.im/coala/ast, and we will try our best to help 😀 . This post is about my experience through all of it 😀 .


Posted in GSoC, Python

coantlib: coala meets ANTLR

This post comes in the last week of my GSoC’18 journey !
At the time of writing this post, I have completed the goals mentioned in my project proposal, and am quite satisfied with the outcome as well. I hope my evaluators find it up to the mark too :D.

This post will summarize what coantlib is, how it is useful, and what features it provides, along with some relevant links.

My project, aptly named coala-antlr, is about integrating ANTLRv4 with coala, so that bear writers can utilise ANTLR-based parse trees.
If you aren’t familiar with ANTLR, I have written an introductory post about it; feel free to read through.

coantlib is the core library package within coala-antlr, responsible for providing useful utilities for writing bears based on ANTLR parse trees.
coantbears is the library package within coala-antlr that comprises bears written with the help of coantlib. These bears leverage the power of parse trees in addition to coala, which makes it easier to write code analysis routines.

Earlier, coala could only provide source lines to a bear, and it was difficult to use these lines to write complicated code-analysis routines. With coantlib, bear writers can now turn these source lines into a parse tree, which can be traversed to return the relevant information, letting them focus on writing the code analysis itself.

Why parse trees ?

Parse trees are more powerful than a simple linear, line-by-line traversal of the source, and help with advanced analysis, such as branch analysis of conditional statements and much more.
Thus they are a powerful tool to have when designing / writing linters.
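To give a feel for the kind of branch analysis a tree makes easy, here is a small sketch. Note that it uses Python’s built-in ast module as a stand-in, not ANTLR or coantlib, and the sample source and function name are made up for illustration. It reports conditionals that have no else branch, something that is awkward to do reliably by scanning lines.

```python
import ast

SOURCE = '''
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1

def check(x):
    if x:
        return "yes"
    else:
        return "no"
'''

def ifs_without_else(source):
    """Return line numbers of if/elif statements that have no else branch."""
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.If) and not node.orelse]

print(ifs_without_else(SOURCE))
```

The first if in sign() is not reported because its elif counts as its else branch in the tree; only the trailing elif, which really has nothing after it, is flagged.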

How does coantlib work ?

coantlib has a concept of walkers. These walkers are grammar-dependent pieces of code, which need to be written for a given grammar. Once a walker for a language is written, a bear writer can simply use this walker to get important information without worrying about how it is extracted from the parse tree.
Thus over time, a walker for a given language will accumulate several useful functions that a new bear writer can re-use, which drastically reduces the effort required from a bear writer.
A more complete design of coantlib can be found here
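The walker idea can be sketched roughly as follows. This is a toy under stated assumptions: ToyPythonWalker and short_name_check are hypothetical names, the tree comes from Python’s built-in ast module rather than ANTLR, and none of this is coantlib’s actual API. It only shows the division of labour between a walker and the check that uses it.

```python
import ast

class ToyPythonWalker:
    """Hypothetical stand-in for a walker; not coantlib's real interface."""

    def __init__(self, source):
        # The walker owns the tree and knows how to dig through it.
        self.tree = ast.parse(source)

    def function_names(self):
        """A reusable helper that hides the tree traversal details."""
        return [node.name for node in ast.walk(self.tree)
                if isinstance(node, ast.FunctionDef)]

def short_name_check(source, min_length=3):
    """A bear-like check that talks only to the walker, never to the tree."""
    walker = ToyPythonWalker(source)
    return [name for name in walker.function_names()
            if len(name) < min_length]

code = "def f():\n    pass\n\ndef compute():\n    pass\n"
print(short_name_check(code))
```

Any later check that needs function names can call the same helper, which is the re-use the walkers are meant to enable.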

What is possible after the addition of coantlib ?

Since the addition of coantlib, one can take any arbitrary grammar, create a walker for it, and it’s ready to be used to write bears for coala.
This opens up a vast array of languages that can be used with coala, since ANTLR already provides lots of grammars, located in a GitHub repository : https://github.com/antlr/grammars-v4 .

Examples ?

As a part of this project, I’ve already created walkers for two languages – Python and XML. Find them at: Python3Walker and XMLWalker.
Using these walkers, I have created three bears to demonstrate how they can be used:
PyPluralNamingBear
PyQuoteSpacingBear
XMLIndentBear

Out of these, PyPluralNamingBear and XMLIndentBear are capable of fixing the errors that they report, and the fixes suggested by XMLIndentBear are really good.

I encourage the reader to try and hack around (and possibly create a bear as well) in the library which is now available at https://gitlab.com/coala/bears/coala-antlr .

Docs for this library are also currently available at https://virresh.gitlab.io/coala-antlr/ . Consider this a mirror, though; these will soon be hosted officially under coala’s namespace :D.

A big thanks to coala for providing me the opportunity to create this library as their GSoC’18 student, and also a big thanks to my mentors Dong-hee Na for providing guidance and help in shaping this library.

Thanks for reading !

Posted in GSoC, Python

Documenting with sphinx

Every project needs documentation, and without proper documentation, a project is unlikely to survive the test of time. That’s why we have so many tools to help us with documenting things. One such tool very widely used is sphinx.

Sphinx is a highly configurable tool, so I will go through the basic configuration parameters that I had to deal with while setting up sphinx on my GSoC project. Please be aware that this is not a sphinx tutorial, just a record of a few peculiar things I came across while working with sphinx.

For a Java project, this is easily achieved using Javadoc comments, and something similar is achievable in Python via triple-quoted strings (docstrings).
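For readers who haven’t met docstrings, here is a minimal sketch. The function is made up for illustration, and the :param: / :return: field style is one common convention that sphinx understands; it is an assumption that a project would use this style.

```python
import inspect

def area(width, height):
    """Return the area of a rectangle.

    :param width: width of the rectangle
    :param height: height of the rectangle
    :return: width multiplied by height
    """
    return width * height

# The docstring travels with the function object, which is what
# documentation tools read when they import the module.
print(inspect.getdoc(area))
```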

Posted in C++, GSoC, Python

Diamond inheritance: Python and C++ perspective

This post is a follow-up to an earlier post. I’ll expand on the other common and deadly inheritance issue, i.e. the diamond inheritance problem. This problem is rarely seen in small programs, but is easily encountered when working in an OOP environment, and is not easily detected. If you’re seeing weird / unexpected results in an OOP program, this should be one of the highest-priority checks on your checklist !

Alright, so before explaining what diamond inheritance is, I would like to explain what multiple inheritance is. No multiple inheritance == No diamond inheritance issues ever encountered !

Multiple inheritance is when you have one class (say child) which inherits from two parent classes (say mother and father).

Now to put the issue into perspective, let’s additionally say that both the mother and father classes inherit from a class human. Thus both mother and father will derive some common methods from the human class. Let’s say it’s eye_colour() (and only eye_colour() for simplicity).

What about the child class then ?
What should be the value of eye_colour() for child class ? Should it be taken from mother class or should it be taken from father class ?

The answer is – It’s language and implementation dependent.
For C++, a call like this will not even compile because it is ambiguous; you have to resolve it explicitly, for example with virtual inheritance or by qualifying which parent’s method you mean.

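For Python, different versions have used different Method Resolution Orders (MROs). Python 3 uses C3 linearization: roughly a left-to-right, depth-first search in which a shared base class is only visited after all the classes that inherit from it. So suppose we use multiple inheritance in Python like this: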

class Child(Mother, Father):
    pass

Then the Child class will get Mother class’s function eye_colour, since Mother was inherited first and then Father was inherited.
Similarly if we have declared Child class as:

class Child(Father, Mother):
    pass

Then the Child class will get Father class’s function eye_colour since Father was inherited first.

Kindly note that Python has used several different algorithms for method resolution in the past, so if you try this on older Python versions, be sure to check which method resolution order they use !
The above explanation is with reference to the MRO used in Python 3.
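One case worth trying is where only one parent overrides the method. The sketch below (a variation on the classes in this post, with Mother deliberately left empty) shows that Python 3 does not simply walk depth-first up through Mother to Human: the shared base class is consulted only after both parents.

```python
class Human:
    def eye_colour(self):
        return 'Human eye'

class Mother(Human):
    pass  # inherits eye_colour from Human unchanged

class Father(Human):
    def eye_colour(self):
        return 'Father eye'

class Child(Mother, Father):
    pass

# A plain depth-first walk (Child, Mother, Human, ...) would find Human's
# version first. Python 3 keeps Human after both parents, so Father's wins.
print(Child().eye_colour())
```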

Note that Java doesn’t support multiple inheritance of classes, so this will never arise as an issue for Java programmers 😉 .

Complete code for testing Python’s method resolution order:

class Human(object):
    def eye_colour(self):
        print('Human eye')

class Mother(Human):
    def eye_colour(self):
        print('Mother eye')

class Father(Human):
    def eye_colour(self):
        print('Father eye')

class Child(Mother, Father):
    pass

newborn = Child()
newborn.eye_colour()

Try changing the order of Mother and Father while inheriting and see the difference in output !
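If you would rather see the order directly than infer it from the output, every class exposes it through its __mro__ attribute. A small sketch (ChildA, ChildB and mro_names are names made up here so both orders can sit side by side):

```python
class Human:
    def eye_colour(self):
        return 'Human eye'

class Mother(Human):
    def eye_colour(self):
        return 'Mother eye'

class Father(Human):
    def eye_colour(self):
        return 'Father eye'

class ChildA(Mother, Father):
    pass

class ChildB(Father, Mother):
    pass

def mro_names(cls):
    """Names of the classes Python searches, in lookup order."""
    return [c.__name__ for c in cls.__mro__]

print(mro_names(ChildA))
print(mro_names(ChildB))
```

The first entry after the class itself is the parent whose eye_colour() gets picked.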

PS: If it isn’t evident from the name, the problem is called the diamond problem simply because of the shape that’s formed when you draw the inheritance tree on paper.

     Human
    /     \
Mother   Father
    \     /
     Child

Similar to a diamond (if you’ve ever seen the diamond suit in a deck of cards :p ).

Thanks for reading !

Posted in GSoC, Python

Console endpoints in Python: setup tools magic !

Every Python package requires some sort of public interaction with the userspace. Python’s setuptools provides a perfect way for this. Setuptools provides two ways of direct interaction:
1) Console scripts- These provide a means for the user to interact directly via the console. Each console script entry point is installed as an executable on the PATH and invokes the defined function when run from the command line.

2) Entry points- These are the plugin features provided by setuptools by default. They help us create an extension to any package which supports extension by entry points.
The way this works is that the main package discovers entry points using the function iter_entry_points from pkg_resources, which ships with setuptools.

In reality, console_scripts are just a special case of entry points in general. Instead of having to define a unique group name for the function, the group name is fixed to console_scripts in setup.py. (Example)

A sample snippet in a library that allows plugins would look somewhat as follows:

import pkg_resources

for ep in pkg_resources.iter_entry_points('The_unique_entrypoint_a_plugin_must_define'):
    registered_package = ep.load()
    # Do something to take the plugin entrypoint loaded above into account

Now a plugin for a library that uses the above format can indicate its presence at install time via

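from setuptools import setup

if __name__ == '__main__':
    setup(
        name='...',
        ...
        # The keyword is entry_points; each entry has the form 'name = target'
        # (plugin_name is just a placeholder here).
        entry_points={
            'The_unique_entrypoint_a_plugin_must_define': [
                'plugin_name = path to the main function of plugin',
            ],
        },
        ...
    )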
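For completeness, the discovery side can also be sketched with the standard library’s importlib.metadata (available since Python 3.8), which reads the same entry point metadata. This is a sketch under assumptions: load_plugins is a helper name made up here, and the two branches exist only because the return type of entry_points() differs between Python versions.

```python
from importlib.metadata import entry_points

def load_plugins(group):
    """Load every installed entry point registered under `group`."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Newer Pythons return an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Older Pythons (3.8 / 3.9) return a plain dict keyed by group name.
        selected = eps.get(group, [])
    # ep.load() imports the module (and attribute, if any) the plugin named.
    return {ep.name: ep.load() for ep in selected}

# A group that nobody registers yields an empty mapping.
print(load_plugins('The_unique_entrypoint_a_plugin_must_define'))
```

A host library would call something like this once at start-up and then iterate over the loaded objects.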

Thus setuptools entry points provide a really complete, out-of-the-box way of creating plugins for Python packages.
I found this great tutorial for an example of how entry points work. It goes over both console scripts and plugin entry points, in a very humorous way.

Instead of describing once again what the above blog does, I will rather focus on how I stumbled across all this.

coala has two main repositories: coala and coala-bears.
coala contains the code for the core library. coala-bears contains all the analysis routines that coala provides.
The quest started when I wanted to make another library, coala-antlr, which is supposed to provide additional bears and extend the functionality of coala with more analysis routines.

The answer came as entry points. coala uses the coalabears entry point, which helps coala identify its bears uniquely. All packages that define this entry point must make it point to the Python module that contains the bears. This is a very nice and neat mechanism that allows us to extend coala without having to disturb anything else.

Onto the console_scripts !
coala also has a console script defined, which goes by the name coala. In fact, coala currently provides a lot more console scripts than just that. Here are all the entry points registered as console scripts by coala that can be invoked.

This is how, when we install the coala package, we can simply type coala on the command line and the corresponding function gets called.

Interestingly, the function named in the console_scripts entry point runs in the same process as the generated command, so all environment variables and command-line arguments are available to it as usual (for example via os.environ and sys.argv).
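Here is a minimal sketch of what such a function might look like. The names mytool, mypackage.cli and MYTOOL_VERBOSE are hypothetical, chosen only for this example; they are not coala’s.

```python
import os
import sys

def main(argv=None):
    """Function a console_scripts entry point could point at."""
    # The installed command calls main() with no arguments, so fall back
    # to the process-wide sys.argv (minus the program name).
    args = sys.argv[1:] if argv is None else list(argv)
    verbose = os.environ.get('MYTOOL_VERBOSE') == '1'  # hypothetical variable
    if verbose:
        print('arguments received:', args)
    print('running with %d argument(s)' % len(args))
    return 0  # the generated wrapper uses this as the exit status

# Hypothetical registration in setup.py:
#   entry_points={'console_scripts': ['mytool = mypackage.cli:main']}
```

Accepting an optional argv also makes the function easy to call from tests, e.g. main(['--help']), without touching sys.argv.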

In short, if you just want your package to be imported by other packages, you don’t need entry-points.

If you want your package to be callable from the command line in addition to the above, you need to add console_scripts entry points.

If you want your package to have extendable features and would like to enable other packages to enhance your package’s functionality, you definitely want entry-points.

Thanks for reading !