
Guest Article: Big Data Buzzwords For 2015 & What They Mean

It’s the end of the year and everywhere you turn there are predictions for 2015. The predictions for the future of data are some of the toughest to decipher. Analytics and big data easily have the most obscure vernacular, with new terms and acronyms being tossed around all the time. If you’re looking for some clarity, you’re not alone. I’ve compiled a list of the big data buzzwords for the coming year. Let me know in the comments which ones you want more clarity on and I’ll add them.

IoT (Internet of Things) & IoE (Internet of Everything)

The Internet of Things is the term used for giving electronics and machines access to the internet and the ability to gather data.

Thousands of startups and the top names in tech are jumping on this trend now and into 2015. This one’s not a fad, and it will have a BIG impact on 2015 and beyond. It’s starting in homes, offices and manufacturing plants. Devices ranging from thermostats to refrigerators to projectors to assembly-line robots are gathering information and then transmitting that data over the internet for analysis.
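To make the gather-and-transmit pattern concrete, here is a minimal Python sketch of the kind of message a connected thermostat might send. The device ID, field names and JSON layout are assumptions for illustration, not any vendor’s protocol, and the actual network transmission (over HTTP, MQTT or similar) is left out.

```python
import json
from datetime import datetime, timezone


def build_reading(device_id: str, metric: str, value: float, unit: str) -> str:
    """Package one sensor reading as a JSON message ready to send over the network.

    The field names here are hypothetical; real devices follow their vendor's schema.
    """
    message = {
        "device_id": device_id,
        "metric": metric,
        "value": value,
        "unit": unit,
        # UTC timestamp so readings from different devices line up during analysis
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(message)


payload = build_reading("thermostat-livingroom", "temperature", 21.5, "celsius")
print(payload)
```

On the receiving end, an analytics service would collect many such messages and analyze them together.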

What can you get from a refrigerator? Grocery-buying and diet habits are among the data points businesses would be very interested in. From a feature standpoint, a consumer could auto-generate a shopping list from that data. The hypothetical applications are endless, which is why it is generating so much buzz.
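Here is a rough sketch of the shopping-list feature. It assumes the fridge can report a quantity for each item and that the household has set a minimum for each one. The item names, quantities and thresholds are hypothetical.

```python
# Hypothetical fridge inventory: item -> quantity currently detected by sensors
inventory = {"milk": 0.2, "eggs": 2, "butter": 1, "spinach": 0}

# Hypothetical per-item minimums the household wants to keep on hand
minimums = {"milk": 1.0, "eggs": 6, "butter": 1, "spinach": 1}


def shopping_list(inventory, minimums):
    """Return (sorted) items whose detected quantity has fallen below the desired minimum.

    Items the fridge doesn't detect at all are treated as quantity 0.
    """
    return sorted(
        item for item, minimum in minimums.items()
        if inventory.get(item, 0) < minimum
    )


print(shopping_list(inventory, minimums))  # ['eggs', 'milk', 'spinach']
```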

The Internet of Everything is the next step after the Internet of Things: connecting people, data, cities, coffee mugs… literally everything together.

Although companies like Cisco are preparing for this now, the Internet of Everything is probably a few years out. The infrastructure and protocols for it are being built now because the ramp-up needed will be huge. Everything from internet connection speeds to the architecture of the web needs to be modified to accommodate that many connections. Some are speculating that the Internet of Everything will mean the end of the web as we know it, arguing that the World Wide Web in its current form cannot support the Internet of Everything and its potential uses. Large infrastructure companies are investing heavily in preparing for the transition. Why? Cisco estimates the Internet of Everything to be “a $19 trillion global opportunity over the next decade.” This trend will be much more transformative than the Internet of Things but is also farther out in the future.

Quantified Self

Quantified Self is the term used to describe the personal, consumer use of data about ourselves and our habits.

Athletes are among the most active participants in Quantified Self. They track data about their diet, exercise regimen, weight, body fat, hydration levels, oxygen levels and more to maximize their progress toward specific fitness goals. The same techniques that are being automated for business optimization are also being automated for consumer use. In 2015, apps that let people track data points about themselves for fitness, general health, stress management, career planning, education and much more will take off. Sensors and wearables will enable the collection of data for these apps, and analytics will help people optimize their daily lives to achieve their goals.
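As a small illustration of the analytics side, the sketch below smooths a series of hypothetical daily weight readings with a 7-day rolling average and reports progress toward a goal. The data, window size and goal are assumptions. Real apps use more sophisticated models, but turning raw self-tracking data into a trend is the core idea.

```python
from statistics import mean

# Hypothetical daily body-weight log in kilograms, one entry per day
weights = [82.0, 81.6, 81.9, 81.2, 81.0, 80.8, 80.9, 80.4, 80.1]


def rolling_average(values, window=7):
    """Average each run of `window` consecutive days to smooth out day-to-day noise."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def progress(start, current, goal):
    """Fraction of the distance from the starting weight to the goal already covered."""
    return (start - current) / (start - goal)


trend = rolling_average(weights)
goal = 78.0  # hypothetical target weight in kg
print([round(t, 2) for t in trend])
print(f"{progress(weights[0], trend[-1], goal):.0%} of the way to the goal")
```

Using the smoothed value rather than the latest single reading keeps one unusually high or low day from distorting the progress figure.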

Artificial Intelligence, Neural Networks & Machine Learning

If you’re a data scientist, these terms are part of your job. In 2015, they’ll enter the vocabulary of nearly everyone who consumes analytics as well. It’s good to understand them at a high level so that conversations about software that uses them make more sense.

Machine Learning is just what it sounds like: teaching a machine to learn something specific, then apply that knowledge in a useful way.

Machine learning is used to predict an unknown based on what we do know. If I have some demographic information about a person and I want to predict which political party they associate with, I would use a machine learning algorithm (algorithm is a $10 word for a step-by-step procedure; here it is the procedure that fits a model, an equation or set of equations, to the data). You’ll hear about two subcategories of machine learning: supervised and unsupervised learning. Supervised machine learning uses large datasets to train and hone the model. These datasets already have the correct answer filled in, so the model can be trained to fit the data. Unsupervised learning uses data as well, but the answers aren’t known in those datasets. Instead, we use a number of techniques to help a machine spot meaningful patterns in large datasets, such as natural groupings of similar customers, that can point to a specific trait or to the steps that lead to a desired outcome.
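To see what labeled training data looks like in practice, here is a toy supervised example in Python. It uses k-nearest neighbors, one simple algorithm among many and not necessarily what a production model would use, on a handful of made-up people described by age and income. The data and the party names are hypothetical. A real model would need far more data and would scale the features so that income doesn’t dominate age.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (age, annual income in $1000s) -> party.
# Every example has its answer filled in, which is what makes this supervised.
training = [
    ((25, 40), "Party A"),
    ((30, 45), "Party A"),
    ((28, 38), "Party A"),
    ((55, 90), "Party B"),
    ((60, 85), "Party B"),
    ((52, 95), "Party B"),
]


def predict(person, training, k=3):
    """k-nearest neighbors: let the k labeled examples closest to `person` vote."""
    nearest = sorted(training, key=lambda example: math.dist(person, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((27, 42), training))  # Party A
print(predict((58, 88), training))  # Party B
```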

Think of machine learning in terms of your own learning.  In school we learn by seeing example after example of a concept until we can recognize the pattern behind it and apply the concept to novel situations.  Later in life we don’t always have a teacher and we learn concepts through experience.  That’s, more or less, how machine learning works too.
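Learning without a teacher corresponds to unsupervised learning. The sketch below runs plain k-means, a common clustering technique, on hypothetical customer spending figures that carry no labels. The algorithm works out on its own that the customers fall into a low-spend group and a high-spend group. The data, the choice of two clusters and the starting centroids are assumptions for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical monthly spend per customer, with no labels attached
spend = [12, 15, 11, 14, 90, 95, 88, 102, 13, 97]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)  # [13.0, 94.4]
print(clusters)   # [[12, 15, 11, 14, 13], [90, 95, 88, 102, 97]]
```

Nothing in the input says which customers are big spenders. The grouping emerges from the data alone, which is what distinguishes this from the supervised case.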

Neural Networks are a machine learning tool that works well for highly complex problems.

Neural Networks’ architectures are inspired by how biological neural networks function. It’s a true example of machines emulating humans, even if the resemblance to real brains is loose. They’re very good at solving problems that other machine learning models struggle with. It’s a complex topic, but that’s really all you need to know to understand it at a high level.
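A classic tiny example of a problem that simpler models struggle with is XOR, which outputs 1 when exactly one of its two inputs is 1. No single linear threshold unit can compute it, but two layers of artificial neurons can. The Python sketch below hand-picks the weights to show the idea. A real neural network would learn its weights from data, typically with backpropagation, rather than having them set by hand.

```python
def step(x):
    """Threshold activation: the neuron 'fires' (1) when its input is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like AND
    h_or = neuron([x1, x2], [1, 1], -0.5)
    h_and = neuron([x1, x2], [1, 1], -1.5)
    # Output neuron: "OR but not AND", which is exactly XOR
    return neuron([h_or, h_and], [1, -1], -0.5)


# Prints the XOR truth table: 0 0 0, 0 1 1, 1 0 1, 1 1 0
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The hidden layer is what makes the difference: it re-represents the inputs so that the output neuron can separate them with a single threshold.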

Machine Learning and Neural Networks are what run under the hood of data science applications to turn data into insights. Many players are working on reducing the amount of effort involved in data science. Applications that use Neural Networks and Machine Learning will automate much of what data scientists now build by hand. That will bring advanced analytics within reach of a lot more businesses in 2015 while driving costs down significantly.

Artificial Intelligence is a hotly contested term. When does a neural network or machine learning algorithm start to qualify as Artificial Intelligence? That’s under debate. Artificial Intelligence in the full sense is still quite a ways off from practical business applications. Even so, you’ll hear it thrown around a lot in 2015, especially in ethical discussions. It’s also going to be big at the box office, with over a half-dozen AI movies coming out next year. Stephen Hawking and Elon Musk believe that AI poses a threat to humanity, while others present more measured views on its impact. It’s a compelling subject with a deep connection to our understanding of consciousness and of what it means to be intelligent. This term has legs, and the machine learning behind it already has practical applications. That combination will make for colorful conversations about AI in 2015.

Data Wrangling

Data Wrangling is what data scientists have to do with raw data to make it manageable and useful.

Data typically starts out in a variety of forms. With sources ranging from spreadsheets to tweets to emails to third-party providers, data frequently reaches a data scientist in a form that is unusable without a significant amount of work. That work is what we refer to as Data Wrangling. Some estimates say that Data Wrangling takes up about 70% to 80% of a data scientist’s time. I can say from personal experience that’s not far off, and I deeply hate Data Wrangling.
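For a flavor of what wrangling involves, the Python sketch below takes a few hypothetical records from different sources and normalizes them into one consistent shape. The records show inconsistent capitalization, stray whitespace, currency symbols, two date formats and a missing value. The field names and accepted date formats are assumptions; real pipelines deal with far messier inputs.

```python
from datetime import datetime

# Hypothetical raw records pulled from different sources, each with its own quirks
raw = [
    {"customer": "  Alice Smith ", "amount": "$1,200.50", "date": "2014-12-01"},
    {"customer": "BOB JONES", "amount": "300", "date": "12/03/2014"},
    {"customer": "carol white", "amount": "", "date": "2014-12-05"},
]

# Assumed date formats seen across the sources; anything else is left unparsed
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format in turn; return None so unrecognized dates can be reviewed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip currency symbols and thousands separators; an empty field becomes None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def clean(record):
    """Normalize one raw record: tidy the name, parse the amount and the date."""
    return {
        "customer": " ".join(record["customer"].split()).title(),
        "amount": parse_amount(record["amount"]),
        "date": parse_date(record["date"]),
    }


cleaned = [clean(r) for r in raw]
for row in cleaned:
    print(row)
```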

It’s an expensive and painful problem, which means a number of companies are working on a solution. In 2015, these apps will save time and sanity for data science teams.

NoSQL

NoSQL is a type of database that can handle data that isn’t strictly structured.

The variety of data sources requires new types of databases to handle unstructured data. That can be free-text data like tweets or emails. It can also be data that defies traditional relational definitions, where the relationship between one data point and another isn’t a straight line. These relationships are significant because they help establish patterns for machine learning, so having a database that preserves them is a big help.
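To illustrate the difference, here is a rough Python sketch of the document model used by document-oriented NoSQL stores such as MongoDB and Couchbase. Records in the same collection don’t have to share a schema, and nested data can be stored as-is instead of being split across tables. This uses plain Python dictionaries and a hypothetical `find` helper, not any database’s actual API.

```python
import json

# A hypothetical "collection" of documents. Unlike rows in a relational table,
# each document carries its own fields, so a tweet, an email and a sensor
# reading can sit side by side without a shared schema.
collection = [
    {"_id": 1, "type": "tweet", "user": "@alice",
     "text": "Loving the new thermostat!", "hashtags": ["iot"]},
    {"_id": 2, "type": "email", "from": "bob@example.com",
     "subject": "Order issue", "body": "My fridge stopped reporting."},
    {"_id": 3, "type": "reading", "device": "fridge-42",
     "metrics": {"temp_c": 3.9, "door_open": False}},  # nested data stored as-is
]


def find(collection, **criteria):
    """Return documents whose fields match every criterion; missing fields simply don't match."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in criteria.items())]


print(json.dumps(find(collection, type="reading"), indent=2))
```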

As the IoT ramps up, the number of data sources will increase in 2015, making applications that can handle diverse types of data highly useful to data scientists. Cassandra, MongoDB, Couchbase, HBase and many others fall into this category.

Sentiment Analysis & Intent Analysis

Sentiment Analysis mines what people say on social sites, comments on articles, surveys and reviews as well as behavioral data to determine how they feel about a product, company or policy.

Sentiment Analysis has a lot of practical business applications.  For marketing departments it’s a window into how customers really feel about products, marketing campaigns and brands.  For HR departments it helps build a picture of employee engagement.  It can be used to gauge how investors feel about a business.  2015 will be filled with new uses for sentiment analysis as well as new tools to help businesses get it done.
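One of the simplest approaches to sentiment analysis is lexicon-based scoring: count the positive and negative words and compare. The Python sketch below does exactly that with tiny hypothetical word lists. It only illustrates the idea. Commercial tools rely on much larger lexicons or trained machine learning models and handle negation, sarcasm and context, which this sketch does not.

```python
import re

# Tiny hypothetical word lists; real tools use large lexicons or trained models
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "broken", "slow", "disappointed"}


def sentiment(text):
    """Score = (# positive words) - (# negative words); the sign gives the overall label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


reviews = [
    "I love this blender, great value and I would recommend it",
    "Terrible support, the app is slow and I'm disappointed",
    "It arrived on Tuesday",
]
for review in reviews:
    print(sentiment(review), review)
```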

Intent Analysis predicts what people are likely to do based on what they’re saying and patterns of activity.

An exciting field of predictive analytics is Intent Analysis. By looking at how groups behave and what they’re saying, it’s possible to paint a picture of what they’re most likely to do next. It’s also possible to model how they will respond to an ad campaign, a product or a new company policy. In 2015, intent analysis will be used by IT, marketing, HR, compliance, product management and many other business groups. Businesses looking for a competitive edge are quickly adopting these technologies and hiring the people who can bring the benefits of intent analysis to life. That trend will continue into 2015 and beyond as the technology becomes more widely available and more accurate.
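A very simple way to model what someone will probably do next is to count, across past activity, how often each action is followed by each other action (a first-order Markov chain), then predict the most frequent follow-up. The Python sketch below does this on hypothetical website clickstreams. It is a deliberately basic stand-in; real intent analysis also draws on what people say and uses richer models.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream sessions: the sequence of actions each visitor took
sessions = [
    ["home", "product", "cart", "checkout"],
    ["home", "product", "product", "cart"],
    ["search", "product", "cart", "checkout"],
    ["home", "product", "exit"],
]


def fit_transitions(sessions):
    """Count how often each action is immediately followed by each other action."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, action):
    """Return the most frequent follow-up to `action` and its estimated probability."""
    followers = counts.get(action)
    if not followers:
        return None, 0.0  # never seen this action followed by anything
    nxt, n = followers.most_common(1)[0]
    return nxt, n / sum(followers.values())


model = fit_transitions(sessions)
print(predict_next(model, "product"))  # ('cart', 0.6)
print(predict_next(model, "cart"))     # ('checkout', 1.0)
```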

What did I miss? If you have big data terms you want defined, drop me a comment and I’ll add them to the list.

Author Bio: Vineet Vashishta is the founder of V-Squared Consulting, a leading-edge data science services company. He has spent the last 20 years in retail/eComm, gaming, hospitality, and finance building the teams, infrastructure and capabilities behind some of the most advanced analytics companies in the US.
You can follow him on Twitter:  @V_Vashishta.
