Python isn't the only programming language you can use for a data science / machine learning project. But it is the most versatile.
The company I work for has some clever machine learning (ML) algorithms. But they live in MATLAB, the development environment for scientists and engineers.
Turning those algorithms into a customer-facing product took some decision making.
The biggest decision: how were we getting them out of MATLAB?
The short answer was to compile them into dynamic C libraries. There are other options, but remember, the plan was to build a software product around them.
Choosing to export the clever stuff as C meant we had options when it came to connecting to and using it.
The big question was: what should we use?
The intention was to allow other software vendors to feed the algorithms with data via an API. And then to use the trained output via another set of APIs.
We've got APIs...
The web service got built outside of our company using PHP and the Laravel framework.
So now there was a way to pass data in ready for processing through the ML algorithms.
That's only half the story. Once the data is in our databases, it needs feeding into the algorithms so that learning/training can take place.
The output from that training needs capturing. It also needs a way to work in a user interface (UI) of some kind - we'll get to that.
...But we need more APIs
Great, we can get data in. But we need to send it through the algorithms and catch the trained/learned output.
Remember at this point we have a bunch of C libraries.
Those functions do the job of running the training process. But how do we call them?
What is there that can call C functions, run the processing, and catch the output?
Oh and also provide APIs for the whole process?
Python has a foreign function library called ctypes. It provides C-compatible data types and can call functions in DLLs or shared libraries.
Spot on. Problem solved.
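To make the ctypes idea concrete, here is a minimal sketch. It assumes a Unix-like system and uses the standard C math library as a stand-in for a compiled algorithm library, since our real library and function names aren't shown in this post. The pattern is the same either way: load the library, declare the C signature, call the function.

```python
import ctypes
import ctypes.util

# Stand-in for a compiled algorithm library: the C math library.
# find_library may return None on some platforms; on Linux, CDLL(None)
# then falls back to the symbols already loaded into the Python process.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double pow(double x, double y).
# Without this, ctypes assumes int arguments and an int return value.
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

result = libm.pow(2.0, 10.0)
print(result)
```

Declaring argtypes and restype up front is worth the two extra lines. ctypes defaults to C int for return values, so a function returning a double would otherwise hand back garbage.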
Here's what we did:
- First, compile the C code from MATLAB into a shared library.
- Next, use ctypes to load the shared library into a Python module that exposes the ML functions.
- Make sure that the stored data gets processed and delivers the expected output.
- Use the Python Flask micro-framework to create a set of APIs that call the ML functions. Scheduled jobs created with Python crontab look after new data updates.
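The last two steps might look roughly like the sketch below. Treat everything in it as an assumption made for illustration: the `/process` route, the `run_algorithm` wrapper, and the JSON shape are hypothetical, and the C math library's `fabs` stands in for one of our compiled ML functions. It assumes Flask is installed and a Unix-like system.

```python
import ctypes
import ctypes.util

from flask import Flask, jsonify, request

# Stand-in for the compiled algorithm library (hypothetical substitute).
_lib = ctypes.CDLL(ctypes.util.find_library("m"))

# C signature: double fabs(double x).
_lib.fabs.argtypes = [ctypes.c_double]
_lib.fabs.restype = ctypes.c_double


def run_algorithm(values):
    """Hypothetical wrapper: pass each value through the C function."""
    return [_lib.fabs(float(v)) for v in values]


app = Flask(__name__)


@app.route("/process", methods=["POST"])
def process():
    """Accept {"values": [numbers]} and return the C function's output."""
    payload = request.get_json(silent=True)
    values = payload.get("values") if isinstance(payload, dict) else None
    is_valid = isinstance(values, list) and all(
        isinstance(v, (int, float)) for v in values
    )
    if not is_valid:
        return jsonify(error="expected a JSON body with a numeric 'values' list"), 400
    return jsonify(results=run_algorithm(values))
```

Keeping the ctypes calls behind a plain Python function means the Flask route never touches C types directly. Flask's built-in test client can exercise the endpoint without starting a server, for example `app.test_client().post("/process", json={"values": [-1.5, 2.0]})`.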
Python did the gluing
While Python is capable of ML work, in our case that was already completed with MATLAB.
Python has provided us with the means to do everything else, including bringing together PHP web services, C shared libraries, and a web UI.
Our product wouldn't exist without it. Pure and simple.
So, data scientists working on ML projects could save themselves some pain. By coding the core algorithms in Python, you have a head start.
But even if you don't, Python is an obvious choice to build a product around the clever stuff.