In this interview, we asked Ruban for his perspective on a range of issues affecting Big Data, including smart cities and Asian businesses, and in particular on Smart Machines, the Smart Lake concept, and his platform, which aims to revolutionize the Big Data and insights business.
1. What is Smart Machine Analytics?
Smart Machines are systems that use machine learning to perform procedural, repetitive, time-consuming and complex tasks, traditionally done by humans, in an effort to boost speed, efficiency and productivity. Virtual assistants like Siri, Nest’s energy-efficient automatic thermostat, and Google’s self-driving cars are all examples of Smart Machines that we have seen in the consumer world.
Smart Machine Analytics brings the same machine intelligence to the enterprise data world. It automates data analysis using statistical and machine learning approaches and leads users directly to actionable insights on their data! Smart Machine Analytics enables users to communicate with their data in a natural language, enabling anyone, irrespective of their data science knowledge, to get value from their data immediately by directly asking questions and getting analyzed answers instantly from the Smart Machine.
2. How and what value does it provide to users? What is the biggest challenge that it addresses?
Finding actionable insights from data is like finding a needle in a haystack. It requires significant time, effort, statistical or computational expertise, and quite naturally considerable cost, to be able to look into every direction and depth in data to discover the latent patterns, anomalies or variations of significance that can reveal stories of business interest.
It is almost humanly impossible to analyze data from all angles in the big data world, where the data sources are too many and the number of dimensions keeps growing exponentially, where the data volumes are so large that even a sizable sample often doesn’t fit on a single machine for analysis, or where the data moves so fast that the window of opportunity to analyze it is often minutes and not days or months!
IDC reported that only 0.5% of all data is ever analyzed! This is a huge loss when billions of dollars are spent every year on data analysis efforts and yet people are still not able to unlock the value hidden in their entire data. To add to that, there is a huge shortage of manpower – data scientists – who can analyze even the tiny fraction of the data that is humanly possible to look into. McKinsey reported that by 2018 the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data! This is a huge problem that cannot be addressed by building more data analysis tools that can only be operated by expert users.
Smart Machine Analytics addresses this very problem by using machine intelligence to automate analysis of data from disparate sources, identify entities and relationships, and reveal insights hidden deep in the data directly to users without requiring them to know anything about the technical aspects of data science. We call this Smart Machine Insights and this addresses the huge data analysis challenge that exists in today’s Big Data world.
3. Do you think Smart Machine Analytics can add value to the Smart Cities initiative in India? How?
The biggest strengths of Smart Machine Analytics are the ability to analyze data at machine scale from many different sources, to identify cross-data relationships, to find insights of significant value, and to communicate with users in a natural language. Applying smart machines to a wide variety of public data can surface undiscovered patterns, connections, or even anomalies that can help find answers to complex problems such as optimizing the consumption of scarce resources like electricity and water, solving traffic problems, optimizing fuel consumption, enabling proactive disease control, implementing better crime management, achieving efficient human resource management, and many more. And the best part is that the solutions to these problems can be found by the citizens themselves, who can be empowered to interact with the Smart Machines and make inferences from the publicly available data. Everyone can participate in making his or her own locality, town and city a truly “smart” one.
4. Any recommendations to the Government of India on how they should go about building smart cities?
As W. Edwards Deming said, “In God we trust, all others must bring data”. If we make data the central point of all decisions that are made, it can deliver unprecedented value. All smart systems rely primarily on data, and smart cities can be no different. If we look at how cities have become “smart” in other parts of the world, the common underlying factor is their ability to leverage data well.
And to leverage data to the fullest we need to ensure the following:
– We have the right system in place to collect, store and archive as much data as possible from everywhere.
– We have the right system in place to analyze all the data that we capture and get to insights fast.
– We have the right system in place to take action on the insights, test our hypotheses and solutions, and measure the results.
And we need to be able to do this at scale. Smart Machine Analytics can help scale easily on the technology front. But the only way that we can scale expertise and manpower is by crowdsourcing. Educate, enable and empower every individual to collect, analyze and act on data and to collaborate with the government. Smart Cities can only be created by Smart Citizens!
5. Which verticals/sectors can benefit most?
Any vertical can leverage Smart Machine Analytics and get huge benefits that can transform its business. But we have seen that verticals that have access to a lot of data, or for whom data is at the core of their business, adopt it faster. These include Banking & Financial Services, Retail & Ecommerce, High Tech, Telecommunication and others.
6. Any use cases?
The approach of Smart Machine-led insights opens up a wide variety of use cases for most enterprises, as they can discover a lot of things about their business that are otherwise not obvious. Some of the common use cases that we have seen include: Customer Analysis – from understanding customer behavior to preventing potential fraud to growing revenue channels; Operational Analysis – to identify optimizations and increase efficiency and productivity; and Machine Data Analysis – to proactively handle problems in industrial equipment and prevent breakdowns or accidents.
7. How do you turn Big Data into Smart Data?
As IBM’s Ginni Rometty said, “Big Data is the world’s natural resource for the next century”. And like any natural resource, it needs to be processed and refined to make its value shine. So the only way that Big Data gets transformed into “smart data” is by processing and analyzing it in its entirety, and extracting the gold nuggets of insight hidden deep within the big data lakes.
8. What is the concept of Smart Lakes? How does it help in better data discovery? Is it complicating data governance?
The Big Data Lake is just a huge dump of data collected from various data sources into something like a Hadoop Distributed File System (HDFS). However this does not contain any structural entity or relational information, which is essential to be able to perform quality data analysis. “Smart Lake” brings in that missing layer of intelligence over the raw Hadoop Data Lake where, by using statistical and machine learning algorithms, one can classify the data dump into entities, and identify relationships between the entities. This enables better data discovery by generating meta-data, which helps in better interpretation of the data.
Raw data lakes are a governance nightmare because they are essentially dumping grounds for data. A Smart Lake, on the other hand, introduces the possibility of better governance because it generates and maintains an understanding of the type of data and creates a meta-data layer on top of it that can define rules of accessibility and change.
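To make the idea of a meta-data layer concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not DataRPM's actual method: it replaces the statistical and machine learning algorithms mentioned above with two simple heuristics (coarse type inference and value containment), and the table names, column names and function names are all hypothetical.

```python
from itertools import permutations


def infer_type(values):
    """Guess a coarse type for a column from its raw string values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


def build_metadata(tables):
    """Record inferred type, uniqueness and distinct values for each column."""
    meta = {}
    for name, rows in tables.items():
        columns = rows[0].keys() if rows else []
        for col in columns:
            values = [row[col] for row in rows]
            meta[(name, col)] = {
                "type": infer_type(values),
                "unique": len(set(values)) == len(values),
                "values": set(values),
            }
    return meta


def find_relationships(meta):
    """Flag (child, parent) column pairs across tables where every child
    value also appears in a unique parent column of the same type."""
    links = []
    for child, parent in permutations(meta, 2):
        if child[0] == parent[0]:
            continue  # only look for relationships between different tables
        c, p = meta[child], meta[parent]
        if p["unique"] and c["type"] == p["type"] and c["values"] <= p["values"]:
            links.append((child, parent))
    return links


# Hypothetical raw "dump": every value arrives as an untyped string.
tables = {
    "customers": [
        {"customer_id": "1", "city": "Pune"},
        {"customer_id": "2", "city": "Delhi"},
    ],
    "orders": [
        {"order_id": "10", "customer_id": "1", "amount": "25.5"},
        {"order_id": "11", "customer_id": "1", "amount": "40.0"},
        {"order_id": "12", "customer_id": "2", "amount": "12.0"},
    ],
}

meta = build_metadata(tables)
links = find_relationships(meta)
print(links)
```

On this toy data the only relationship flagged is from the orders table's customer_id column to the customers table's customer_id column. A real system would need sampling, fuzzy matching and statistical confidence rather than exact containment, but the output has the same shape: typed entities plus candidate relationships that discovery and governance rules can build on.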
9. Do you have a specific product roadmap/business roadmap for Asia? Please elaborate.
We are a US-based company, but our product is geography-agnostic. We invite businesses in Asia to talk to us about how Smart Machine Analytics can help transform their business.
10. Do you think real-time Big Data Analytics can change the online search scenario in the world? Any idea how companies like Google are reacting to this new reality?
Real time big data analytics is already changing search, by making search results more accurate, context-aware, and personalized. Companies like Google are already driving innovations in the consumer search space using big data analytics.
11. How do you position DataRPM: as a comprehensive platform, or as an integrated suite of different products that can be plugged in individually?
We position DataRPM as a comprehensive platform that delivers Smart Machine Insights for Hadoop. We use the Smart Machine Analytics approach to help enterprises discover the deepest insights from their data in an automated way, giving them a jump-start that speeds up business decision-making and results in a significant competitive advantage and ROI.
Ruban Phukan, Co-founder and Chief Product Officer of DataRPM.
Ruban is a serial entrepreneur and technologist with rich and diverse experience in data science, product, technology and business. As a data scientist at Yahoo, Ruban mined and analyzed several of Yahoo’s big data sets to produce strategic business insights. His projects influenced several products and business strategies and led to tens of millions of dollars of positive revenue impact. He co-founded Bixee, a leading vertical search company in India, where his patent-pending CrawlX technology revolutionized vertical-specific information retrieval from unstructured data. He also created Pixrat, a social photo sharing destination. Ruban sold Bixee and Pixrat to Ibibo, a Naspers Group company, where he became the Vice President of Social Media and helped grow Ibibo into one of the leading social media destinations in India, with several million active users, prior to co-founding DataRPM.