We use the terms “Big Data” and “Data Science” for use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like Distributed Systems, Machine Learning, Statistics, and Internet of Things together. It is a multi-billion-dollar industry including use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like Internet of Things (IoT), these use cases are expanding to scenarios like Smart Cities, Smart health, and Smart Agriculture.
These usecases use basic analytics, advanced statistical methods, and predictive technologies like Machine Learning. However, it is not just about crunching the data. Some usecases like Urban Planning can be slow, and there is enough time to process the data. However, with use cases like traffic, patient monitoring, surveillance the the value of results degrades much faster with time and needs results within milliseconds to seconds. Collecting data from many sources, cleaning them up, processing them using computation clusters, and doing all these fast is a major challenge.
This talk will discuss motivation behind big data and data science and how it can make a difference. Then it will discuss the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.