Data science offers opportunities, challenges and potential future strategies for mathematics, along with added value that the mathematical sciences can bring to industry. There has been an upsurge of commercial and academic interest in data science. It has become a crucial tool for handling, manipulating and analyzing the data on which many important decisions are based. Across all sectors of industry and academia, it is recognized that adding new streams of data or finding patterns in existing data can add value to a business. With the ever-increasing amount of data available, the role of data science becomes ever more important, and the mathematical sciences are a key element of its success.
There is a feeling that many industries are suffering from a data deluge. The data collected are often complex and the information to be extracted multifaceted; simple counting methods either cannot assemble the required information or produce only simple answers. There is a shift towards more advanced and adaptable methods that can process structured or unstructured data of changing sizes and dimensions, whether discrete or continuous, and with varying levels of complexity.
There is a wide variety of mathematical and statistical techniques and complex analytics that can extract value from complicated, multifaceted data. There was much emphasis on the successful use of probabilistic methods and the need to disseminate them more widely whilst encouraging understanding by a wider community.
Many mathematical techniques are emerging which, on paper, provide numerous benefits to data science. Because these methods are new to the data science scene, however, there are few applications and case studies of their use.
Topological Data Analysis (TDA), for example, is a new area of study with intended applications in areas such as data mining. TDA represents data using topological networks and uses data sampled from an idealised space or shape to infer information about that space. It essentially allows algorithms to analyse sets of data to reveal the inherent patterns within, rather than showing correlations between preselected variables.
This method could be valuable to industry because it enables the user to gain insight into the data without having to ask the correct question of the data in advance. By working with industrial partners to gain access to real data sets, researchers can publish case studies that demonstrate value and promote new techniques to industry.
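As a rough illustration of inferring the shape of a space from sampled data, the sketch below draws noisy points from two separate circles and counts the connected components of their neighbourhood graph at a chosen scale. This covers only the simplest topological feature (the number of connected pieces); it is not a full TDA pipeline. The function names `sample_circle` and `count_components`, the scale parameter `epsilon`, and the sample sizes are all assumptions made for this example.

```python
import math
import random


def sample_circle(center, radius, n, noise, rng):
    """Draw n noisy points from a circle (standing in for the idealised shape)."""
    cx, cy = center
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = radius + rng.uniform(-noise, noise)
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return points


def count_components(points, epsilon):
    """Count connected components of the graph that links points within epsilon."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= epsilon:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j  # merge the two clusters

    return len({find(i) for i in range(len(points))})


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    cloud = (sample_circle((0.0, 0.0), 1.0, 200, 0.05, rng)
             + sample_circle((5.0, 0.0), 1.0, 200, 0.05, rng))
    for epsilon in (0.01, 0.5, 4.0):
        print(epsilon, count_components(cloud, epsilon))
```

At a very small scale most points stand alone, and at a very large scale everything merges into one piece. At an intermediate scale the count would be expected to settle at two, matching the two circles, without the user having specified in advance how many groups to look for.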
The area of data science is constantly evolving, and new methods or techniques are frequently required to adapt to changing conditions. It therefore becomes harder to validate and verify the methods and results. Mathematicians and statisticians are expected to draw sound, defensible and auditable conclusions from the data. This is difficult if there are no case studies or examples against which to compare ideas, and no processes in place to judge the outcomes. It is important to build a knowledge base that contains case studies and examples, and that provides testing information and evidence to justify results.
In addition, there are many mathematical methods which have not been widely applied in the area of data science but could have the potential to bring considerable advantage. These include:
- Neural computing
- Tropical geometry
- Topological data analysis
- Pattern theory
- Algebraic statistics
The demand for data scientists has risen dramatically in recent years and many companies are finding it hard to hire people with the relevant skills. This is partly due to the creation of new roles and skill sets in industry that did not exist before.
It is uncommon to find experienced individuals with strong sector-specific skills who can also apply cutting-edge mathematical methods of data analysis. There is great value in attracting mathematicians into industrial areas to gain sector-specific knowledge, or in introducing sector-specific experts to data science techniques.
The industrialists at the workshop recognized the importance of the mathematical sciences when drawing information from data, and highlighted their frustration at not having (or not knowing how to gain) access to mathematicians. The discussion revealed that implementing tools was not a problem within industry; what industry needs is access to people who can provide insight. It also highlighted internal challenges within organizations, such as the need to break down silos between different teams.
For mathematicians, the challenges are how to take the mathematical sciences to industry and which mechanisms are best for commercializing their knowledge.
There appears to be great demand in industry for the provision of advice on what the appropriate mathematical and statistical techniques are, and for mathematicians to engage with the domain experts to solve problems together. New collaborations can avoid non-experts taking on specialist work, and instead create and support an environment where each expert contributes from their own specialism.
This section identifies activities that can bring added value to industry through the use of mathematics and statistics within data science. The recommended priority areas are to:
- help coordinate leadership for this area;
- improve access to high-quality data;
- incentivize collaborations between industry and academia;
- encourage open research, where research results are published alongside the data on which they are based.
Competitions that pose a data science problem and require mathematical techniques to solve it (similar to the Kaggle competitions for machine learning) can attract interest to the area, introduce new ideas and form relationships between relevant sectors. Moreover, different approaches can be directly compared.
Mahesh Kumar CV is a Big Data entrepreneur, Chief Executive Officer and Founder at Big Data Force Pvt Ltd. I have about 14 years of experience in architecting and developing distributed and real-time data-driven systems. Currently my focus is on ensuring that my customers are happy, by addressing their business problems through robust data platforms that are fuelled by advances in Big Data technologies and algorithms. I have provided practical, cutting-edge solutions in the areas of BI, Big Data Analytics and In-Memory Computing to help Fortune 500 clients deliver higher performance, and have achieved results for many clients through consulting in complex decision-making environments. Specialties: translating big data into action, Big Data Trainings, Product Engineering Services, Building Big Data CoE & Big Data Incubators, Big Data Transformations (Strategy, Value Articulation, Architecture, Assessments, Portfolio analysis and rationalization), Information Management, Innovation.