
Learn to Implement Data Science Using Python, HADOOP & SAP HANA

By Lubhawna soni

Posted on May 18, 2015

Learn to Lead Big Data Analysis

Learn Beyond Concepts and Sandbox Environment.

___________________________________________________________________

Let’s Set Expectations:

The following diagram explains our “customer analytics-data science” solution, implemented with Python programming constructs on a very large integrated data set provisioned from SAP HANA (structured data) and HADOOP (unstructured and machine data from channels).
This post does not include the entire Python code; it presents only a high-level view. On request, we can provide the full code and the associated system artifacts developed for our customer analytics.
You may have seen many general Python programming constructs, but here is a real opportunity to review the power of Python for implementing data science with SAP HANA and HADOOP as data sources.

_________________________________________________________________

We Are a Global Leader in Solving Data Science Skill Gaps

___________________________________________________________________

Learning Objectives:

Review the various components (building blocks) of Python programming used to design and develop data science
Better understand the overall Python program constructs required to implement data science
Step through the data science implementation process using HADOOP and SAP HANA as data sources
Gain practical technical skills to implement data science
We also answer general questions such as “What should I learn in Python to develop data science?” and “How deeply should I learn Python to implement big data analysis?”

Industry (Real) Use Case:

To achieve an unmatched learning experience, we developed this material around a real business problem. We chose “unmatched customer experience – customer analytics” as the industry use case for this article.

Customer Analytics Business Challenges:

Providing a consistent customer experience across all channels
Technology to create better customer experiences – Digital revolution, Big data
Systemizing a customer feedback process
Skilled professionals who can lead a strategy to gain great customer experience
Lack of proven processes, methodologies and cost-effective options for identifying and developing customer experience through customer analytics

Customer Analytics Business Needs:

Effective feedback system
Collect customer feedback and sentiment across all channels
Analyze feedback regardless of feedback format – electronic, manual/paper process, voice, image, video, etc.
Provide continuous, timely responses to customer feedback
A follow-up process to ensure customer satisfaction
Real data, real responses with real people – social media (such as Facebook, Twitter and Yelp); write responses when your customers post on your forum or blog
Global strategy to look at every touch point of customer experience in the customer lifecycle management
Level of services to create customer satisfaction
Increase customer retention, and turn customers into advocates

Customer Analytics Business Requirements:

Text Analytics – Ex: Understand customer feedback and quantify customer rating
Social Media Analytics – Ex: Customer experience – what they like, what they don’t like and why
Comparison Analysis – Ex: Brand awareness and brand recognition
Trend Analysis – Ex: Purchase pattern and spending trends
Sentiment Analysis – Ex: Summarize surveys, customer reviews
Predictive Analytics – Ex: Predict and reduce churn
Advanced Analytics – Ex: Next best offer strategies
Data Mining – Ex: Identify and reduce return fraud
Performance & Effectiveness Reporting – Ex: Identify product issues

Our Customer Analytics Unlocks Customer Insights:

Our customer analytics unlocks the customer insights needed to develop a customer strategy that wins customers and grows the business. It empowers the business by answering the following questions in real time:

Who are your customers?
What are your customers’ needs?
How should you reach your customers, and when should you contact them?
How should you engage with customers – right channel, right message, right time?
What are your customers’ expectations, and which customers are at risk of churning and why?
What brands do your customers choose to buy?
What are your customers’ sentiments and trends?

… much more

Key Takeaways:

Python and Data Science Learning Tips and Techniques:

After 17 (seventeen) years of management consulting experience with Deloitte, E&Y and KPMG, I strongly believe technology plays a key role in implementing data science. No matter who we are or what we have done in the past, we cannot develop a big data or data science strategy, roadmap, business case, use cases or other non-technical data science deliverables without adequate technical implementation skills and knowledge.
If you want to become an “Enterprise Data Scientist” (not just a data scientist), you should consider learning both Python and “R”. There is much to recommend each, and each has its own strengths and weaknesses.
In my experience, a data scientist must develop expertise in both Python and “R”. If your focus is enterprise data science, you must also be able to demonstrate how to process large data sets using industry-leading big data appliances such as SAP HANA and Oracle.
Don’t just learn general Python. Learn Python on a real big data system landscape (such as SAP HANA, HADOOP and Oracle) and develop small, simple industry use cases (e.g. Customer Analytics, Marketing Analytics, Workforce Analytics) using data science techniques such as models, algorithms, statistical computing, mathematical modeling and machine learning.
In this post, we cover the core building blocks of Python for developing data science. We believe that practicing each of the topics listed here will make you a strong and successful data scientist.
To learn beyond general Python programming constructs (e.g. data science techniques), you need a true development environment. First establish a development environment (e.g. SAP HANA, HADOOP and/or Oracle), then learn Python to develop enterprise data science.
For each topic listed in this article, we have proven live system examples and demos. Please feel free to call or email me for further assistance with your Python and/or enterprise data science learning.

I wish you all the best and good luck with your Python and data science learning!

____________________________________________________________________

Python Learning Roadmap: Summary

Learn Python to implement Big Data Analysis and Data Science

___________________________________________________________________

•This course is designed for anyone interested in learning how to write Python programs to implement data science and big data analysis

Fundamentals:

Getting Started
Simple Functions and Test Driven Labs
Types and Variables
Simple Expressions
Advanced Types: Containers
A Bit More Iteration
Functions
Exceptions
Code Organization
Working with Files
Functional Programming
Advanced Iteration
Debugging Tools
Object-Oriented Programming
UnitTest

Data Science:

Introduction and Setting Up Your Integrated Analysis Environment
Using Python to Control and Document Your Data Science Processes
Accessing and Preparing Data
Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays & Matplotlib
Exploring Data with Pandas and scipy.stats
Machine Learning with scikit-learn

Data Management

HADOOP as a Data Source
SAP HANA as a Data Source
Oracle as a Data Source
Other RDBMS as a Data Source
- MS SQL
- IBM DB2
- MySQL
- PostgreSQL
Web/Log as a Data Source

________________________________________

Python Learning Roadmap: Details

Learn Python to implement statistical computing, mathematical modeling, machine learning and complex large data processing

______________________________________________________

Fundamentals:

Getting Started:

Lesson Goals
Setting up Development Environment
The Interactive Interpreter
Follow up of Getting Started
Python Versions
Installing Python
Environment Variables
Executing Python from the Command Line
IDLE
Editing Python Files
Getting Help
Dynamic Types
Python Reserved Words
Naming Conventions

Simple Functions and Test Driven Labs:

Lesson Goals
Writing simple functions
Completing Test Driven Labs
Test Driven Labs Follow up

Types and Variables:

Lesson Goals
Types & Variables
Strings
Integers
Floats
Complex
Variables
Defining a Function
Dynamic Typing
Static Typing
Managing your Types
Internals

Simple Expressions:

Lesson Goals
Boolean Evaluation
Truthiness
Branching (if)
Branching (if else)
Branching (elif)
Block Structure & Whitespace
Regular Expressions

Advanced Types: Containers:

Lesson Goals
Lists
Strings Revisited
Tuples
Dictionaries
Sets
Collection Transitions
Advanced Types Follow-up

A Bit More Iteration:

Lesson Goals
Break & Continue
Loop-Else
More Iteration

Functions:

Lesson Goals
Defining
Arguments
Mutable Arguments & Binding of Default Values
Accepting Variable Arguments
Unpacking Argument Lists
Scope
Functions
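
As a small illustration of the “Mutable Arguments & Binding of Default Values” topic above, here is a minimal sketch. The function names and ticket IDs are hypothetical and chosen only for this example; they are not part of our customer analytics code.

```python
def add_ticket_shared(ticket, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` appends to the same list.
    bucket.append(ticket)
    return bucket


def add_ticket_fresh(ticket, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(ticket)
    return bucket


first = add_ticket_shared("T-1")
second = add_ticket_shared("T-2")
print(first is second)           # both names refer to the one shared default list
print(second)

print(add_ticket_fresh("T-1"))   # a new list on every call
print(add_ticket_fresh("T-2"))
```

The second form is the usual way to avoid state leaking between calls.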

Exceptions:

Lesson Goals
Basic Error Handling
Raising & Re-Raising Exceptions
Exceptions

Code Organization:

Lesson Goals
Namespaces
Importing Modules
Creating Modules
Preventing Execution on Import
Code Organization

Working with Files:

Lesson Goals
File I/O
File Data Manipulation

Functional Programming:

Lesson Goals
Functions as Objects
Higher-Order Functions
Sorting: An Example of Higher-Order Functions
Anonymous Functions
Nested Functions
Closures
Lexical Scoping
Useful Function Objects: Operator
Decorators

Advanced Iteration:

Lesson Goals
List Comprehensions
Generator Expressions
Generator Functions
Iteration Helpers: itertools
chain()
izip()
Advanced Iterations
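
The iteration helpers listed above can be tried with a short sketch like the one below. Note that izip() belongs to Python 2's itertools; in Python 3 the built-in zip() is already lazy and takes its place. The ID lists are made-up values used only for illustration.

```python
from itertools import chain

# Hypothetical customer IDs from two sources.
hana_ids = [101, 102]
hive_ids = [201, 202, 203]

# chain() walks several iterables one after another without copying them.
all_ids = list(chain(hana_ids, hive_ids))

# zip() pairs items position by position and stops at the shortest input.
pairs = list(zip(hana_ids, hive_ids))

print(all_ids)
print(pairs)
```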

Debugging Tools:

Lesson Goals
logging
pprint
pdb
Debugging Tools Follow-up

Object-Oriented Programming:

Lesson Goals
Classes
Emulation
classmethod and staticmethod
Inheritance
Encapsulation

UnitTest:

Lesson Goals
UnitTest

Data Science:

Introduction and Setting Up Your Integrated Analysis Environment:

Lesson Goals
IPython Shell
Custom environment settings
IPython Notebooks
Script editor
Basic Packages:
NumPy, SciPy, Matplotlib, IPython, etc.
Data Structures & Analysis Packages:
Pandas
Machine Learning Packages:
scikit-learn
Networks Packages:
networkx
Statistical Packages:
PyMC, Statsmodels, PyMVPA
Live Data Packages:
twython
Visualization Packages:
matplotlib
Orange

Using Python to Control and Document Your Data Science Processes:

Lesson Goals
Data types and objects
Loading packages, namespaces
Reading and writing data
Simple plotting
Control flow
Debugging
Code profiling

Accessing and Preparing Data:

Lesson Goals
Loading from CSV files
Accessing SQL databases
Stripping out extraneous information
Normalizing data
Formatting data
HADOOP Data Access
SAP HANA Data Access
Oracle Data Access
Web/Log Data Access
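
To make the data preparation topics above more concrete (loading from CSV, stripping out extraneous information, normalizing data), here is a minimal sketch using only the Python standard library. The CSV content, column names and helper functions are assumptions made for this example; in practice you would more likely use Pandas for the same steps.

```python
import csv
import io

# Hypothetical CSV extract with stray spaces around some values.
RAW_CSV = """customer_id, name ,sales
1, Nicholas ,100
2, Nasser ,300
3, Elizabeth ,200
"""


def load_rows(text):
    """Read CSV text into a list of dicts, stripping extra whitespace."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader]


def min_max(values):
    """Rescale numbers to the 0..1 range (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


rows = load_rows(RAW_CSV)
sales = [float(row["sales"]) for row in rows]
scaled = min_max(sales)
print(rows[0])
print(scaled)
```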

Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays & Matplotlib:

Lesson Goals
The NumPy array
2D plotting with Matplotlib
N-dimensional array operations and manipulations
Memory mapped files

Exploring Data with Pandas and scipy.stats:

Lesson Goals
Data manipulation with Pandas
Statistical analysis with Pandas
Time series analysis with Pandas
Overview of statistical tools in scipy.stats

Machine Learning with scikit-learn:

Lesson Goals
Input: 2D, samples, and features
Estimator, predictor, transformer interfaces
Pre-processing data
Regression
Classification
Model selection

Data Management

Lesson Goals
HADOOP as a Data Source
SAP HANA as a Data Source
Oracle as a Data Source
Other RDBMS as a Data Source
Web/Log as a Data Source

_________________________________________________________________

Actual Python Program Constructs

Customer Analytics Implementation Steps
—————————————————————————————-

#!/usr/bin/python
# -*- coding: utf-8 -*-
# **************************************************************************************
# Data Access from SAP HANA & HADOOP
# **************************************************************************************
import pypyodbc as pyodbc                                       # Import the pypyodbc Driver
CSLAB_CONN = pyodbc.connect('DSN=DS;UID=DURGA;PWD=Delhi123')    # Connection to SAP HANA using ODBC Connection
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SELECT * FROM DURGA.TICKETS'
CSLAB_CURSOR.execute(CSLAB_STMNT)
CSLAB_RESULT = CSLAB_CURSOR.fetchall()                          # Fetch every row of the TICKETS table
print(CSLAB_RESULT)

# **************************************************************************************

import pypyodbc as pyodbc
CSLAB_CONN = pyodbc.connect('DSN=Hive;UID=hive;PWD=hive', autocommit=True)   # Connection to Hive (HADOOP) using ODBC Connection
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SHOW TABLES'
CSLAB_CURSOR.execute(CSLAB_STMNT)
CSLAB_RESULT = CSLAB_CURSOR.fetchall()                                        # List of tables visible to this user
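
Both connections above follow the same Python DB-API pattern: connect, open a cursor, execute a statement, fetch the rows. As a hedged sketch of how the fetched rows can be paired with their column names, the example below uses an in-memory sqlite3 database as a stand-in for the SAP HANA and Hive data sources, so it runs without any ODBC setup. The tickets table, its columns and the fetch_as_dicts helper are assumptions for illustration only.

```python
import sqlite3


def fetch_as_dicts(cursor, statement):
    """Run a query and return each row as a dict keyed by column name."""
    cursor.execute(statement)
    # cursor.description holds one entry per result column; item 0 is its name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database with a hypothetical tickets table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tickets (ticket_id INTEGER, status TEXT)")
cur.executemany("INSERT INTO tickets VALUES (?, ?)", [(1, "open"), (2, "closed")])

tickets = fetch_as_dicts(cur, "SELECT ticket_id, status FROM tickets ORDER BY ticket_id")
conn.close()
print(tickets)
```

The same helper should work with any DB-API cursor, although the exact case of the column names depends on the driver.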

# **************************************************************************************

import pandas as pd                                         # Import the Pandas Library
import numpy as np                                          # Import the Numpy Library
import matplotlib.pyplot as plt                             # Import the Matplotlib plotting interface
import pypyodbc as pyodbc                                   # Import the pypyodbc Driver
CSLAB_CONN = pyodbc.connect('DSN=DS;UID=DURGA;PWD=Delhi123') # Connection to SAP HANA using ODBC Connection
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SELECT * FROM DURGA.CSLAB_CUSTOMER'            # Fetching the Customer Table Data into Python
CSLAB_CURSOR.execute(CSLAB_STMNT)
# Assumed step (not shown in the original listing): build the DataFrame used below
CSLAB_COLUMNS = [COLUMN[0] for COLUMN in CSLAB_CURSOR.description]
CSLAB_DF = pd.DataFrame([tuple(ROW) for ROW in CSLAB_CURSOR.fetchall()], columns=CSLAB_COLUMNS)

————————————————————————————————————–
This Block of Code Imports the Libraries Needed for Customer Analytics. The Customer Information is Loaded from SAP HANA into Python.
————————————————————————————————————–

CSLAB_FIGURE = plt.figure()
CSLAB_AX = CSLAB_FIGURE.add_subplot(111)                     # Sales Analysis
CSLAB_DF.boxplot(column='Sales', ax=CSLAB_AX)                # 'Sales' must match the column name returned by the driver
plt.xlabel('Customer_ID')
plt.ylabel('Amount')
plt.show()                                                   # Display the Sales Analysis Plot

————————————————————————————————————–
This Block of Code Shows the Sales Analysis as a Box Plot. X-Axis and Y-Axis Labels are Provided.
————————————————————————————————————–
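
To show what a sales box plot summarizes, the sketch below computes the quartiles and the usual 1.5 × IQR outlier fences with the standard library. The sales amounts are hypothetical values, not data from the CSLAB_CUSTOMER table, and the choice of the “inclusive” quantile method is an assumption made for this example.

```python
from statistics import quantiles

# Hypothetical sales amounts for six customers.
sales = [100, 200, 300, 400, 500, 2000]

# Quartiles using linear interpolation between data points.
q1, median, q3 = quantiles(sales, n=4, method="inclusive")
quartiles = [q1, median, q3]

# Points beyond 1.5 * IQR from the box are commonly drawn as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [amount for amount in sales if amount < lower_fence or amount > upper_fence]

print(quartiles)
print(outliers)
```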

import matplotlib.pyplot as plt
plt.figure(1, figsize=(6, 6))
CSLAB_AX = plt.axes([0.1, 0.1, 0.8, 0.8])
CSLAB_labels = 'Nicholas', 'Nasser', 'Elizabeth', 'William', 'Johan'
CSLAB_fracs = [10, 20, 25, 25, 20]
CSLAB_explode = (0, 0.05, 0, 0, 0)                           # One offset per slice
plt.pie(CSLAB_fracs, explode=CSLAB_explode, labels=CSLAB_labels, autopct='%1.1f%%', shadow=True, startangle=90) # Pie Chart of Customer Churn %
plt.title('Customer Churn %', bbox={'facecolor': '0.8', 'pad': 5})
plt.show()

————————————————————————————————————–
This Block of Code Shows the Prediction of Customer Churn % in a Pie Chart. Suitable Labels are Provided.
————————————————————————————————————–

import matplotlib.pyplot as plt
plt.plot([1124,1958,857,2465,3042,3245,4544,4956,3896,4948,4565]) # Line Chart of YTD Collection from the Customer
plt.ylabel('Amount($)')
plt.show()

————————————————————————————————————–
This Block of Code Shows the YTD Collection from the Customer in a Line Chart. Suitable Y-Axis Labels are Provided.
————————————————————————————————————–
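
If the amounts in the line chart example above are read as per-period collections (an assumption; the listing does not say so), a running year-to-date total can be derived with itertools.accumulate. This is only a sketch, using the first four values from that example.

```python
from itertools import accumulate

# First four amounts from the line chart example, treated here as
# per-period collections (an assumption made for this sketch).
collections = [1124, 1958, 857, 2465]

# accumulate() yields the running sum after each period.
ytd_totals = list(accumulate(collections))
print(ytd_totals)
```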

_____________________________________________________

Community Service:

_________________________________________________________________

Grow Data Science Competency:

•Our effort is to address the global skill gap in data science. This is a free community service to help individuals learn enterprise data science using analytics techniques.

•This is not sales material or a marketing tool kit for promoting what we do for the data science learning community.

Our Value To Data Science Community:

As you may have realized, we start our posts with measurable business value by presenting real analytics. You may have seen hundreds of presentations on big data, analytics, data science and predictive analytics, but have you seen a single presentation that explains the practical process and steps required to implement predictive analytics at an enterprise level? The answer is “NO”.

With us, you now have an opportunity to understand the steps to implement big data, and a true learning platform for learning to implement predictive analytics.

Learn beyond concepts and presentation

Community Contributor:

Jothi Periasamy
Durga Prasad
Karthikeyan Rajamanickam
Uday Bhoomagoud
Gourav Reddy

____________________________________________________________

Learn the roots of data science with real use cases on a real data platform. Not just HADOOP.

Author Bio:

Jothi Periasamy is an author, speaker, thought leader and community contributor with seventeen years of experience in management consulting, entrepreneurship and process excellence with Deloitte, E&Y and KPMG. Deeply “hands-on” with SAP HANA, HADOOP and BI tools.
