DataRobot acquires Data Processing Startup, Paxata.

Table of Contents

  1. Introduction
  2. Machine learning flow
  3. Background of DataRobot
  4. How the acquisition of Paxata will help DataRobot
  5. Concluding Remarks


  1. Introduction

As artificial intelligence matures and spreads into more industries, more data scientists and machine learning engineers are being trained and hired to work on these problems. It takes many years of training to develop the analytical skill set needed to understand and apply the models. A LinkedIn Workforce Report published in 2018 found that there is also a huge shortage of data scientists, with more than 151,000 data scientist vacancies. Tools and libraries are being developed to help fill the gap, such as Google’s AutoML, which automates deep learning pipelines. Others, such as Amazon’s SageMaker Ground Truth, are close behind in offering automated machine learning solutions.

  2. Machine learning flow

Partly because talent is both expensive and scarce, machine learning process flows are becoming more and more automated. Programs such as AutoML take care of optimizing machine learning models. The end-to-end process automation space is developing rapidly, with DataRobot being one of the most notable names.

A paper called “A Survey on Data Collection for Machine Learning” highlights how important good data collection methods are to machine learning models. Data must be collected with solid practices to ensure that no bias leaks in from the sample group. Even so, in almost all cases some transformation and cleaning of the data occurs before it is fed into a machine learning model. Problems such as missing values or unscaled features can throw off the algorithms and reduce prediction accuracy.
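To make the cleaning step concrete, here is a minimal sketch of two common fixes: filling missing values with the column mean and rescaling features to a common scale. The function names and the toy data are hypothetical, chosen only for illustration, and real pipelines often use more careful strategies than mean imputation.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit variance (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread, so map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical toy data: ages with one missing entry, and incomes
# that live on a much larger numeric scale than ages.
ages = [25, None, 35, 45]
incomes = [30_000, 50_000, 70_000, 90_000]

clean_ages = impute_mean(ages)
scaled_ages = standardize(clean_ages)
scaled_incomes = standardize(incomes)
```

After these two steps, both columns are complete and sit on a comparable scale, so a distance-based or gradient-based model would not be dominated by the larger income values.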

Data cleaning and pre-processing are hugely important. Forbes states that data scientists spend 60% of their time cleaning data before feeding it into a machine learning model.

Adopting AI within companies is another challenge that must be overcome. Many companies know only that they should be more involved in AI, yet have no clear roadmap for how to achieve it. Twenty-three percent of workers say that company culture holds back AI adoption. As a result, companies that are working to automate the machine learning workflow are taking off like rocket ships.


  3. Background of DataRobot

DataRobot is one of the hottest machine learning startups in this area and has received over $400M in funding according to Crunchbase. The company has made five acquisitions since its inception: Nexosis, Nutonian, Cursor, ParallelM, and Paxata.

Their latest acquisition, Paxata, and what it brings to the table are discussed further below. These acquisitions reflect a history of strategically adding valuable products to the core machine learning platform. Putting everything together results in a platform that is a force to be reckoned with.

The DataRobot process flow described on their website consists of ingesting data, selecting the target variable (the thing you’re trying to predict), and then creating hundreds of variations of different models with a single click. The platform allows you to look through the top-performing models and gain an understanding of your data and model. Lastly, it allows you to deploy the highest-performing model. Parallelization allows DataRobot to crank out models at high speed. According to their promotional materials, users can also create API calls that run predictions in minutes instead of days.
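The general idea behind that flow (fit many candidate models on the same data, score each on held-out data, and rank them) can be sketched in a few lines of plain Python. This is not DataRobot's API or method; the model functions, the synthetic data, and the `leaderboard` name are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(k):
    """Return a fitter for k-nearest-neighbour regression."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)

        return predict
    return fit


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic data (hypothetical): a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points for scoring.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Candidate models, keyed by an illustrative name.
candidates = {
    "mean_baseline": fit_mean,
    "linear": fit_linear,
    "knn_1": fit_knn(1),
    "knn_5": fit_knn(5),
}

# Fit every candidate, score it on held-out data, and sort best-first.
leaderboard = sorted(
    ((name, mse(fit(train_x, train_y), test_x, test_y))
     for name, fit in candidates.items()),
    key=lambda row: row[1],
)
best_name, best_error = leaderboard[0]
```

An automated platform would run a far larger set of candidates in parallel and add tuning and explanation steps, but the rank-by-held-out-score loop is the core of the leaderboard idea.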

There are many case studies on their website that show how the technology can be used and deployed in industrial settings.

One case study involves the French company Carrefour, which sells food goods. They use DataRobot to optimize store expansion efforts by identifying the best locations in which to expand. They also relied on DataRobot University to get up to speed quickly with the software and worked with customer-facing data scientists. The company claims that they can test 5 or 10 new ideas each day with their modelling efforts.

Another company using DataRobot is StoryFit. They work with publishers and studios to find unique content that will sell to an audience, and machine learning is one of their primary tools for doing so. StoryFit models what type of audience will respond to certain stories and pieces. For StoryFit, DataRobot is hosted on AWS to provide higher levels of scalability. DataRobot interviewed the CEO of StoryFit, Monica Landers, in a YouTube video about how beneficial the technology is for them.

She says: “We help on acquisitions on discovering stories and pinpointing who is the audience and who will connect with a story. These are both industries that pride themselves on going with their gut, but part of what is buried in their gut is actually the human ability to analyze a lot of data.”

The last case study that I will cover is that of LendingTree. One of the selling points that Akshay Tandon, VP of Analytics at LendingTree, points out is that internal business users of the models can finally begin to understand how the models work. Adoption of analytics within LendingTree has increased thanks to DataRobot, driven by improved accuracy and a growing hunger for more analytics. A full case study video is also available.

The DataRobot platform seems to be in use everywhere, with customers such as BlueCross BlueShield, BASF, Deloitte, Panasonic, and Tableau, to name just a few. Clearly, DataRobot is a dominant force in this space. Recently, however, they moved to acquire another specialist in the space, Paxata, to help with the data cleaning step of the machine learning workflow.

  4. How the acquisition of Paxata will help DataRobot

Paxata provides data preparation for data that will subsequently be used in analysis or machine learning. Their primary product is a self-service data preparation tool. The self-service portal does not require code; instead, it lets you format data visually. Its processing engine is Apache Spark, which uses distributed computing to process the data more quickly.

Paxata should prove to be an invaluable acquisition for DataRobot as they build out their end-to-end machine learning platform even further. As mentioned earlier, data cleaning is perhaps the most time-intensive part of a data scientist’s job, so automating it should substantially increase the time the platform saves. The acquisition also marks the start of a true end-to-end solution that combines expertise in both the data cleaning and machine learning parts of the process flow described above.

  5. Concluding Remarks

DataRobot continues to be a force to be reckoned with in the data modelling automation sector. The acquisition of Paxata shows that they intend to stay competitive not only in data modelling but also in data preparation.


Nick Allyn

Hello, my name is Nick Allyn. I am extremely passionate about the field of artificial intelligence. I believe that artificial intelligence will save millions of lives in the coming years through higher cancer survival rates, cleaner air, and autonomous cars.
