Learn to Implement Data Science Using Python, HADOOP & SAP HANA

Learn to Lead Big Data Analysis

Learn Beyond Concepts and Sandbox Environment.

___________________________________________________________________

Let’s Set Expectations:  

  • The following diagram explains our “customer analytics” data science solution, implemented using Python programming constructs on a very large integrated data set provisioned from SAP HANA (structured data) and HADOOP (unstructured and machine data from channels).
  • In this posting, we don’t provide the entire Python code block; we present only a high-level view of the Python code. Upon request, we can provide the entire code block and its associated system artifacts developed for our customer analytics.
  • You may have seen many general Python programming constructs, but here is a real opportunity to see the power of Python for implementing data science with SAP HANA and HADOOP as data sources.

_________________________________________________________________

We Are a Global Leader in Solving Data Science Skill Gaps

___________________________________________________________________

Learning Objectives:

  • Review the components (building blocks) of Python programming used to design and develop data science solutions
  • Better understand the overall Python program constructs required to implement data science
  • Step through the data science implementation process using HADOOP and SAP HANA as data sources
  • Gain practical technical skills to implement data science
  • We also answer general questions such as “What should I learn in Python to develop data science?” or “How deeply should I learn Python to implement big data analysis?”

Industry (Real) Use Case:

To achieve an unmatched learning experience, we decided to develop this material around a real business problem. We chose “unmatched customer experience – customer analytics” as the industry use case for this article.

Customer Analytics Business Challenges:

  • Providing a consistent customer experience across all channels
  • Technology to create better customer experiences – Digital revolution, Big data
  • Systemizing a customer feedback process
  • Skilled professionals who can lead a strategy to deliver a great customer experience
  • Lack of a proven process, methodology and cost-effective options for identifying and developing customer experience through customer analytics

Customer Analytics Business Needs:

  • Effective feedback system
  • Collect customer feedback and sentiment across all channels
  • Analyze feedback regardless of format – electronic, manual/paper, voice, image, video, etc.
  • Provide continuous, timely responses to customer feedback
  • A process to follow up and ensure customer satisfaction
  • Real data, real responses, real people – social media (such as Facebook, Twitter and Yelp); write responses when your customers post on your forum or blog
  • A global strategy that looks at every customer touch point in customer lifecycle management
  • Service levels that create customer satisfaction
  • Increase customer retention, and turn customers into advocates

Customer Analytics Business Requirements:

  • Text Analytics - Ex: Understand customer feedback and quantify customer rating
  • Social Media Analytics - Ex: Customer experience – what they like, what they don’t like and why
  • Comparison Analysis - Ex: Brand awareness and brand recognition
  • Trend Analysis - Ex: Purchase pattern and spending trends
  • Sentiment Analysis - Ex: Summarize surveys, customer reviews
  • Predictive Analytics - Ex: Predict and reduce churn
  • Advanced Analytics - Ex: Next best offer strategies
  • Data Mining - Ex: Identify and reduce return fraud
  • Performance & Effectiveness Reporting - Ex: Identify product issues
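To make the sentiment analysis requirement concrete, here is a minimal lexicon-based sketch in plain Python. It only shows the idea of scoring and summarizing reviews. The tiny POSITIVE and NEGATIVE word sets are hypothetical placeholders, not a real sentiment lexicon. A production system would more likely use a trained model or an established lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "love", "fast", "friendly", "excellent"}
NEGATIVE = {"slow", "rude", "broken", "refund", "terrible"}


def sentiment_score(review):
    """Return (#positive words - #negative words) for one review."""
    words = re.findall(r"[a-z']+", review.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def summarize(reviews):
    """Count how many reviews are positive, negative or neutral."""
    labels = Counter()
    for review in reviews:
        score = sentiment_score(review)
        if score > 0:
            labels["positive"] += 1
        elif score < 0:
            labels["negative"] += 1
        else:
            labels["neutral"] += 1
    return labels


if __name__ == "__main__":
    reviews = [
        "Great service and friendly staff",
        "Delivery was slow and the box arrived broken",
        "Ordered a blue shirt",
    ]
    print(summarize(reviews))
```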

Our Customer Analytics Unlocks Customer Insights:

Our customer analytics unlocks customer insights for developing a customer strategy that wins customers and grows the business. It empowers the business by answering the following questions in real time:

  • Who are your customers?
  • What are your customers’ needs?
  • How and when should you reach your customers?
  • How should you engage with customers – right channel, right message, right time?
  • What are your customers’ expectations, and which customers are at risk of churning and why?
  • Which brands do your customers choose?
  • What are your customers’ sentiments and trends?

    And much more.

Key Takeaways:

Python and Data Science Learning Tips and Techniques:

  • After 17 (seventeen) years of management consulting experience with Deloitte, E&Y and KPMG, I strongly believe technology plays a key role in implementing data science. No matter who we are or what we have done in the past, “WE” can’t develop a big data or data science strategy, roadmap, business case, use cases or other non-technical data science deliverables without adequate technical implementation skills and knowledge.
  • If you want to become an “Enterprise Data Scientist” (not just a Data Scientist), you should consider learning both Python and “R”. There is a lot to say in favor of each, and each has its own strengths and weaknesses.
  • In my experience, a data scientist must develop expertise in both Python and “R”. If your focus is enterprise data science, you must also be able to demonstrate how to process large data sets using industry-leading big data appliances such as SAP HANA and Oracle.
  • Don’t just learn general Python. Learn Python on a real big data system landscape (such as SAP HANA, HADOOP and Oracle) and develop small, simple industry use cases (ex: Customer Analytics, Marketing Analytics, Workforce Analytics) using data science techniques such as models, algorithms, statistical computing, mathematical modeling and machine learning.
  • In this posting, we cover the key building blocks of Python for developing data science. We believe that practicing each topic listed in this posting will help you become a strong and successful data scientist.
  • To learn beyond general Python programming constructs (ex: data science techniques), you need a true development environment. First establish a development environment (ex: SAP HANA, HADOOP and/or Oracle) in which to learn Python for enterprise data science.
  • For each topic listed in this article, we have proven live-system examples and demos. Please feel free to call or email me for further assistance with your Python and/or enterprise data science learning.

I wish you all the best and good luck with your Python and data science learning!

____________________________________________________________________

Python Learning Roadmap: Summary

Learn Python to implement Big Data Analysis and Data Science

___________________________________________________________________

• This course is designed for anyone interested in learning how to write Python programs to implement data science and big data analysis.

Fundamentals:  

  • Getting Started
  • Simple Functions and Test Driven Labs
  • Types and Variables
  • Simple Expressions
  • Advanced Types: Containers
  • A Bit More Iteration
  • Functions
  • Exceptions
  • Code Organization
  • Working with Files
  • Functional Programming
  • Advanced Iteration
  • Debugging Tools
  • Object-Oriented Programming
  • UnitTest

Data Science:    

  • Introduction and Setting Up Your Integrated Analysis Environment
  • Using Python to Control and Document Your Data Science Processes
  • Accessing and Preparing Data
  • Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays & Matplotlib
  • Exploring Data with Pandas and scipy.stats
  • Machine Learning with scikit-learn

Data Management

  • HADOOP as a Data Source
  • SAP HANA as a Data Source
  • Oracle as a Data Source
  • Other RDBMS as a Data Source
    • MS SQL
    • IBM DB2
    • MySQL
    • PostgreSQL
  • Web/Log as a Data Source

________________________________________

Python Learning Roadmap: Details

Learn Python to implement statistical computing, mathematical modeling, machine learning and complex large data processing

______________________________________________________

Fundamentals:   

Getting Started: 

      • Lesson Goals
      • Setting up Development Environment
      • The Interactive Interpreter
      • Getting Started Follow-up
      • Python Versions
      • Installing Python
      • Environment Variables
      • Executing Python from the Command Line
      • IDLE
      • Editing Python Files
      • Getting Help
      • Dynamic Types
      • Python Reserved Words
      • Naming Conventions

Simple Functions and Test Driven Labs:

      • Lesson Goals
      • Writing simple functions
      • Completing Test Driven Labs
      • Test Driven Labs Follow-up
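A test driven lab usually pairs a small function with assertions that must pass. The assertions are written first, and the function is then implemented until every check passes. The sketch below is a hypothetical lab. The function `average_rating` and its rules are illustrative and are not taken from the course material.

```python
def average_rating(ratings):
    """Return the mean of 1-5 star ratings, or None for an empty list.

    Raises ValueError if any rating is outside the 1-5 range.
    """
    if not ratings:
        return None
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return sum(ratings) / len(ratings)


# The "lab": these checks are written first, then the function is
# implemented until they all pass.
assert average_rating([4, 5, 3]) == 4
assert average_rating([]) is None
try:
    average_rating([6])
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for a rating of 6")
print("All lab checks passed")
```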

Types and Variables:  

      • Lesson Goals
      • Types & Variables
      • Strings
      • Integers
      • Floats
      • Complex
      • Variables
      • Defining a Function
      • Dynamic Typing
      • Static Typing
      • Managing your Types
      • Internals

Simple Expressions:    

      • Lesson Goals
      • Boolean Evaluation
      • Truthiness
      • Branching (if)
      • Branching (if else)
      • Branching (elif)
      • Block Structure & Whitespace
      • Regular Expressions
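Regular expressions fit the text analytics use case well, for example pulling a numeric rating out of free-form feedback. The pattern below assumes that ratings are written like "4/5", "4 / 5" or "4 out of 5". That format is an assumption made for illustration.

```python
import re

# Assumed rating formats: "4/5", "4 / 5" or "4 out of 5" (case-insensitive).
RATING_PATTERN = re.compile(r"\b([1-5])\s*(?:/|out of)\s*5\b", re.IGNORECASE)


def extract_rating(feedback):
    """Return the first 1-5 rating found in the text, or None."""
    match = RATING_PATTERN.search(feedback)
    if match:
        return int(match.group(1))
    return None


for text in ["Service was ok, 3/5", "I'd give it 5 out of 5!", "No score here"]:
    print(repr(text), "->", extract_rating(text))
```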

Advanced Types: Containers:       

      • Lesson Goals
      • Lists
      • Strings Revisited
      • Tuples
      • Dictionaries
      • Sets
      • Collection Transitions
      • Advanced Types Follow-up
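"Collection transitions" means converting between lists, tuples, sets and dictionaries as an analysis needs change. In the small sketch below, made-up purchase records pass through each container type in turn.

```python
# Made-up (customer_id, product) purchase records stored as a list of tuples.
purchases = [
    (101, "laptop"),
    (102, "phone"),
    (101, "mouse"),
    (103, "phone"),
    (101, "laptop"),
]

# list -> set: unique customers (sorted back into a list for stable output).
customers = sorted({customer for customer, _ in purchases})

# list -> dict of sets: distinct products bought by each customer.
products_by_customer = {}
for customer, product in purchases:
    products_by_customer.setdefault(customer, set()).add(product)

# dict -> list of tuples: customers ranked by number of distinct products.
ranking = sorted(
    ((customer, len(products)) for customer, products in products_by_customer.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

print(customers)   # [101, 102, 103]
print(ranking)     # [(101, 2), (102, 1), (103, 1)]
```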

A Bit More Iteration:   

      • Lesson Goals
      • Break & Continue
      • Loop-Else
      • More Iteration
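In Python, the `else` block of a `for ... else` loop runs only when the loop finishes without hitting `break`. The sketch below combines `continue`, `break` and loop-else to find the first large transaction in a list of made-up amounts. The 1,000 threshold is arbitrary.

```python
def first_large_transaction(amounts, threshold=1000):
    """Return the index of the first amount above threshold, or -1."""
    for index, amount in enumerate(amounts):
        if amount < 0:
            continue            # skip refunds; they are not candidates
        if amount > threshold:
            break               # found one: leave the loop early
    else:
        return -1               # loop ended without break: nothing found
    return index


print(first_large_transaction([120, -50, 980, 1500, 2000]))  # 3
print(first_large_transaction([10, 20, 30]))                 # -1
```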

Functions: 

      • Lesson Goals
      • Defining
      • Arguments
      • Mutable Arguments & Binding of Default Values
      • Accepting Variable Arguments
      • Unpacking Argument Lists
      • Scope
      • Functions
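The topic "Mutable Arguments & Binding of Default Values" covers a classic Python pitfall. A default value is evaluated once, when the function is defined, not on every call. The sketch below shows that pitfall, the usual `None`-sentinel fix, variable arguments and argument-list unpacking. All function names are illustrative.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at definition time, so every call
    # that omits `tags` shares (and mutates) the same list object.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Idiomatic fix: use None as a sentinel and build a new list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


def total_spend(*amounts, discount=0.0):
    # *amounts collects any number of positional arguments into a tuple;
    # discount is keyword-only because it follows *amounts.
    return sum(amounts) * (1 - discount)


# Both calls return the same list, so this prints ['vip', 'churn-risk'] twice.
print(add_tag_buggy("vip"), add_tag_buggy("churn-risk"))
# Independent lists: ['vip'] ['churn-risk']
print(add_tag("vip"), add_tag("churn-risk"))

monthly = [100, 250, 50]
print(total_spend(*monthly, discount=0.1))   # *monthly unpacks the list into *amounts
```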

Exceptions:

      • Lesson Goals
      • Basic Error Handling
      • Raising & Re-Raising Exceptions
      • Exceptions
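The sketch below shows basic error handling together with raising and re-raising exceptions. A low-level `ValueError` is re-raised as a domain-specific error, with the original chained via `raise ... from`. The caller then collects bad records instead of stopping. `InvalidFeedbackError` is a hypothetical name chosen for this example.

```python
class InvalidFeedbackError(ValueError):
    """Raised when a feedback record cannot be parsed (hypothetical type)."""


def parse_rating(raw):
    """Convert a raw rating string such as ' 4 ' into an int in 1..5."""
    try:
        rating = int(raw)
    except ValueError as exc:
        # Re-raise as a domain-specific error, chaining the original cause.
        raise InvalidFeedbackError(f"not a number: {raw!r}") from exc
    if not 1 <= rating <= 5:
        raise InvalidFeedbackError(f"rating out of range: {rating}")
    return rating


def parse_all(raw_ratings):
    """Parse what we can and collect the bad records instead of stopping."""
    good, bad = [], []
    for raw in raw_ratings:
        try:
            good.append(parse_rating(raw))
        except InvalidFeedbackError as exc:
            bad.append((raw, str(exc)))
    return good, bad


good, bad = parse_all([" 4 ", "five", "9", "2"])
print(good)  # [4, 2]
print(bad)
```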

Code Organization:

      • Lesson Goals
      • Namespaces
      • Importing Modules
      • Creating Modules
      • Preventing Execution on Import
      • Code Organization
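"Preventing Execution on Import" refers to the `if __name__ == "__main__":` guard. Below is a sketch of a hypothetical module, `churn_utils.py`. Other code can import its function safely, and its demo code runs only when the file is executed directly.

```python
"""churn_utils.py - a hypothetical module for the customer analytics project."""


def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def main():
    # Demo code that should run only when the file is executed directly.
    print(f"Churn rate: {churn_rate(2000, 150):.1%}")   # Churn rate: 7.5%


# `import churn_utils` sets __name__ to "churn_utils", so main() is skipped;
# `python churn_utils.py` sets __name__ to "__main__", so main() runs.
if __name__ == "__main__":
    main()
```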

Working with Files:

      • Lesson Goals
      • File I/O
      • File Data Manipulation
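The sketch below covers file I/O and file data manipulation with the standard `csv` module. It writes a few made-up sales rows to a temporary CSV file, reads them back, and totals the amounts per channel. In the article's setting, the rows would come from SAP HANA or HADOOP instead.

```python
import csv
import os
import tempfile

# Made-up sample rows (CSV values are text, so amounts are strings here).
rows = [
    {"customer_id": "101", "channel": "web", "amount": "250.00"},
    {"customer_id": "102", "channel": "store", "amount": "99.50"},
    {"customer_id": "101", "channel": "store", "amount": "40.25"},
]

path = os.path.join(tempfile.mkdtemp(), "sales.csv")

# Write: newline="" lets the csv module control line endings.
with open(path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["customer_id", "channel", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# Read back and manipulate: total spend per channel.
totals = {}
with open(path, newline="", encoding="utf-8") as handle:
    for record in csv.DictReader(handle):
        channel = record["channel"]
        totals[channel] = totals.get(channel, 0.0) + float(record["amount"])

print(totals)  # {'web': 250.0, 'store': 139.75}
```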

Functional Programming:

      • Lesson Goals
      • Functions as Objects
      • Higher-Order Functions
      • Sorting: An Example of Higher-Order Functions
      • Anonymous Functions
      • Nested Functions
      • Closures
      • Lexical Scoping
      • Useful Function Objects: Operator
      • Decorators
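Several functional-programming topics from the list above fit into one small, illustrative sketch:

- higher-order sorting with `operator.itemgetter`
- a lambda passed to `filter()`
- a closure that captures a discount rate
- a decorator that records call duration

The customer names and amounts are made up.

```python
import functools
import operator
import time

customers = [("Nicholas", 420.0), ("Nasser", 1250.5), ("Elizabeth", 980.0)]

# Higher-order function + operator.itemgetter: sort by spend, highest first.
by_spend = sorted(customers, key=operator.itemgetter(1), reverse=True)

# Anonymous function (lambda) passed to filter().
big_spenders = list(filter(lambda c: c[1] > 900, customers))


# Closure: apply() "remembers" rate from the enclosing (lexical) scope.
def make_discount(rate):
    def apply(amount):
        return round(amount * (1 - rate), 2)
    return apply


ten_percent_off = make_discount(0.10)


# Decorator: wraps a function and records how long its last call took.
def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_seconds = time.perf_counter() - start
    wrapper.last_seconds = None
    return wrapper


@timed
def total_spend(rows):
    return sum(amount for _, amount in rows)


print([name for name, _ in by_spend])      # ['Nasser', 'Elizabeth', 'Nicholas']
print([name for name, _ in big_spenders])  # ['Nasser', 'Elizabeth']
print(ten_percent_off(200))                # 180.0
print(total_spend(customers), total_spend.__name__)
```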

Advanced Iteration:    

      • Lesson Goals
      • List Comprehensions
      • Generator Expressions
      • Generator Functions
      • Iteration Helpers: itertools
      • chain()
      • zip() (izip() in Python 2)
      • Advanced Iterations
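The sketch below shows generator functions, generator expressions and list comprehensions together with the `itertools` helpers listed above. Note that `itertools.izip()` exists only in Python 2. In Python 3 the built-in `zip()` is already lazy and takes its place. The event feeds are made-up stand-ins for two customer channels.

```python
import itertools


def read_events(source_name, lines):
    """Generator function: lazily yield (source, event) pairs, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield source_name, line


# Made-up event feeds standing in for two channels (e.g. web log and call centre).
web = ["login", "", "view_product", "checkout"]
calls = ["complaint", "refund_request"]

# chain() concatenates iterables without building an intermediate list.
all_events = itertools.chain(read_events("web", web), read_events("call", calls))

# List comprehension (builds a list) vs generator expression (consumed lazily).
sources = [source for source, _ in read_events("web", web)]
event_count = sum(1 for _ in read_events("call", calls))

# zip() pairs items positionally (it replaces Python 2's itertools.izip()).
months = ["Jan", "Feb", "Mar"]
amounts = [1124, 1958, 857]
paired = list(zip(months, amounts))

print(list(all_events))
print(sources, event_count, paired)
```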

Debugging Tools:

      • Lesson Goals
      • logging
      • pprint
      • pdb
      • Debugging Tools Follow-up
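Here is a short sketch of two of the debugging tools listed above, `logging` and `pprint`. The scoring rule (visits × 2 − complaints × 5) is a hypothetical formula invented only to produce some log output. For interactive stepping, a `breakpoint()` call (Python 3.7+) drops into `pdb`.

```python
import logging
import pprint

# Configure once at program start; INFO hides debug-level detail.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("customer_analytics")


def score_customer(record):
    log.debug("scoring %s", record)   # suppressed at INFO level
    # Hypothetical scoring rule, for illustration only.
    score = record["visits"] * 2 - record["complaints"] * 5
    if score < 0:
        log.warning("customer %s has a negative score (%d)", record["id"], score)
    return score


records = [
    {"id": 101, "visits": 12, "complaints": 1},
    {"id": 102, "visits": 1, "complaints": 3},
]
scores = {r["id"]: score_customer(r) for r in records}

# pprint lays out nested structures readably; width forces wrapping here.
pprint.pprint({"records": records, "scores": scores}, width=60)
```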

Object-Oriented Programming:   

      • Lesson Goals
      • Classes
      • Emulation
      • classmethod and staticmethod
      • Inheritance
      • Encapsulation
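The object-oriented topics above can be sketched with a hypothetical customer model:

- an alternative constructor via `classmethod`
- a utility via `staticmethod`
- emulation of a built-in type via `__len__`
- encapsulation by convention (a leading underscore plus a read-only property)
- inheritance with `super()`

The class names and fields are illustrative only.

```python
class Customer:
    """A customer with a list of purchase amounts (hypothetical model)."""

    def __init__(self, customer_id, purchases=None):
        self.customer_id = customer_id
        self._purchases = list(purchases or [])   # leading underscore: internal by convention

    @classmethod
    def from_csv_row(cls, row):
        # Alternative constructor from text like "101,19.99;5.01".
        customer_id, amounts = row.split(",", 1)
        return cls(int(customer_id), [float(a) for a in amounts.split(";") if a])

    @staticmethod
    def is_valid_amount(amount):
        # Utility that needs neither the instance nor the class.
        return amount >= 0

    def add_purchase(self, amount):
        if not self.is_valid_amount(amount):
            raise ValueError("purchase amount must be non-negative")
        self._purchases.append(amount)

    @property
    def total(self):
        return sum(self._purchases)

    # Emulation: special methods let instances behave like built-in types.
    def __len__(self):
        return len(self._purchases)

    def __repr__(self):
        return f"{type(self).__name__}({self.customer_id!r}, total={self.total:.2f})"


class LoyaltyCustomer(Customer):
    """Inheritance: reuse Customer and add loyalty points."""

    def __init__(self, customer_id, purchases=None, points_per_dollar=1):
        super().__init__(customer_id, purchases)
        self.points_per_dollar = points_per_dollar

    @property
    def points(self):
        return int(self.total * self.points_per_dollar)


c101 = Customer.from_csv_row("101,19.99;5.01")
c102 = LoyaltyCustomer(102, [100.0], points_per_dollar=2)
c102.add_purchase(25.5)
print(c101, len(c101))     # Customer(101, total=25.00) 2
print(c102, c102.points)   # LoyaltyCustomer(102, total=125.50) 251
```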

UnitTest:    

      • Lesson Goals
      • UnitTest
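Below is a small `unittest` sketch. The function under test computes a customer retention rate with the commonly used formula (E − N) / S, where S is customers at the start, E customers at the end and N new customers. The function name and test cases are illustrative. The suite is run through a `TextTestRunner`, so the script does not call `sys.exit()`.

```python
import unittest


def retention_rate(start, end, new):
    """Customer retention rate (E - N) / S as a fraction."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - new) / start


class RetentionRateTest(unittest.TestCase):
    def test_typical_period(self):
        self.assertAlmostEqual(retention_rate(200, 210, 30), 0.9)

    def test_no_change(self):
        self.assertEqual(retention_rate(100, 100, 0), 1.0)

    def test_rejects_empty_base(self):
        with self.assertRaises(ValueError):
            retention_rate(0, 10, 5)


def run_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RetentionRateTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```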

Data Science:    

Introduction and Setting Up Your Integrated Analysis Environment:

      • Lesson Goals
      • IPython Shell
      • Custom environment settings
      • IPython Notebooks
      • Script editor
      • Basic Packages:
          • NumPy, SciPy, Matplotlib, IPython, etc.
      • Data Structures & Analysis Packages:
          • Pandas
      • Machine Learning Packages:
          • scikit-learn
      • Network Packages:
          • networkx
      • Statistical Packages:
          • PyMC, Statsmodels, PyMVPA
      • Live Data Packages:
          • twython
      • Visualization Packages:
          • matplotlib
          • Orange

Using Python to Control and Document Your Data Science Processes: 

      • Lesson Goals
      • Data types and objects
      • Loading packages, namespaces
      • Reading and writing data
      • Simple plotting
      • Control flow
      • Debugging
      • Code profiling

Accessing and Preparing Data:

      • Lesson Goals
      • Loading from CSV files
      • Accessing SQL databases
      • Stripping out extraneous information
      • Normalizing data
      • Formatting data
      • HADOOP Data Access
      • SAP HANA Data Access
      • Oracle Data Access
      • Web/Log Data Access
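The SAP HANA and Hive code later in this article follows the Python DB-API pattern: connect, cursor, execute, fetchall. Any DB-API driver works the same way. To keep the sketch runnable anywhere, it uses the standard-library `sqlite3` module as a stand-in for an ODBC connection to SAP HANA. The table and its rows are made up. The sketch also shows two preparation steps from the list above: stripping extraneous whitespace and min-max normalizing a numeric column.

```python
import sqlite3

# In-memory SQLite database standing in for SAP HANA / Hive (assumption for the demo).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, sales REAL)")
cursor.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(101, "  Nicholas ", 420.0), (102, "Nasser", 1250.0), (103, " Elizabeth", 830.0)],
)

# Parameterized query: the driver handles quoting instead of string formatting.
cursor.execute(
    "SELECT customer_id, name, sales FROM customer WHERE sales >= ? ORDER BY customer_id",
    (0,),
)
rows = cursor.fetchall()
columns = [col[0] for col in cursor.description]
conn.close()

# Prepare: strip extraneous whitespace, then min-max normalize sales to [0, 1].
records = [dict(zip(columns, row)) for row in rows]
low = min(r["sales"] for r in records)
high = max(r["sales"] for r in records)
for r in records:
    r["name"] = r["name"].strip()
    r["sales_scaled"] = (r["sales"] - low) / (high - low) if high > low else 0.0

for r in records:
    print(r["customer_id"], r["name"], round(r["sales_scaled"], 3))
```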

Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays & Matplotlib:

      • Lesson Goals
      • The NumPy array
      • 2D plotting with Matplotlib
      • N-dimensional array operations and manipulations
      • Memory mapped files

Exploring Data with Pandas and scipy.stats:  

      • Lesson Goals
      • Data manipulation with Pandas
      • Statistical analysis with Pandas
      • Time series analysis with Pandas
      • Overview of statistical tools in scipy.stats

Machine Learning with scikit-learn:

      • Lesson Goals
      • Input: 2D, samples, and features
      • Estimator, predictor, transformer interfaces
      • Pre-processing data
      • Regression
      • Classification
      • Model selection
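scikit-learn estimators share a simple convention:

- `fit(X, y)` learns from a 2D samples-by-features input and returns `self`.
- Predictors add `predict(X)`.
- Transformers add `transform(X)`.

To show the shape of that interface without requiring scikit-learn, here is a pure-Python min-max scaler and nearest-centroid classifier written in the same style. This is a teaching sketch, not scikit-learn's own `MinMaxScaler` or `NearestCentroid`. The churn features and labels are made up.

```python
class MinMaxScalerSketch:
    """Transformer: learns per-feature min/max in fit(), rescales in transform()."""

    def fit(self, X, y=None):
        self.min_ = [min(col) for col in zip(*X)]
        self.max_ = [max(col) for col in zip(*X)]
        return self

    def transform(self, X):
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, self.min_, self.max_)]
            for row in X
        ]


class NearestCentroidSketch:
    """Predictor: stores one mean vector per class, predicts the closest one."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


# Made-up features per customer: [months_as_customer, support_calls_last_quarter]
X_train = [[48, 0], [36, 1], [60, 0], [3, 5], [6, 4], [2, 6]]
y_train = ["stays", "stays", "stays", "churns", "churns", "churns"]

scaler = MinMaxScalerSketch().fit(X_train)
model = NearestCentroidSketch().fit(scaler.transform(X_train), y_train)
print(model.predict(scaler.transform([[40, 1], [4, 5]])))  # ['stays', 'churns']
```

Because `fit()` returns `self`, calls can be chained as in `MinMaxScalerSketch().fit(X_train)`. That is the same idiom scikit-learn uses.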

Data Management     

      • Lesson Goals
      • HADOOP as a Data Source
      • SAP HANA as a Data Source
      • Oracle as a Data Source
      • Other RDBMS as a Data Source
      • Web/Log as a Data Source

_________________________________________________________________

Actual Python Program Constructs

Customer Analytics Implementation Steps
—————————————————————————————-

/**************************************************************************************
Data Access from SAP HANA & HADOOP
/**************************************************************************************
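#!/usr/bin/python
# -*- coding: utf-8 -*-
import pypyodbc as pyodbc                                      # Pure-Python ODBC driver, aliased as pyodbc
CSLAB_CONN = pyodbc.connect('DSN=DS;UID=DURGA;PWD=Delhi123')   # SAP HANA through the ODBC data source "DS"
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SELECT * FROM DURGA.TICKETS'                    # Structured ticket data stored in SAP HANA
CSLAB_CURSOR.execute(CSLAB_STMNT)
CSLAB_RESULT = CSLAB_CURSOR.fetchall()                         # List of result rows
print(CSLAB_RESULT)
CSLAB_CURSOR.close()
CSLAB_CONN.close()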

/**************************************************************************************

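import pypyodbc as pyodbc
CSLAB_CONN = pyodbc.connect('DSN=Hive;UID=hive;PWD=hive', autocommit=True)   # HADOOP (Hive) through ODBC; Hive ODBC drivers typically do not support manual transactions
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SHOW TABLES'                                    # List the Hive tables visible to this user
CSLAB_CURSOR.execute(CSLAB_STMNT)
CSLAB_RESULT = CSLAB_CURSOR.fetchall()
print(CSLAB_RESULT)
CSLAB_CURSOR.close()
CSLAB_CONN.close()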
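The snippets in this section embed user names and passwords directly in the ODBC connection string. A common alternative is to read credentials from environment variables at run time. The sketch below only builds the connection string, using the standard library. The variable names `CSLAB_DSN`, `CSLAB_UID` and `CSLAB_PWD` are hypothetical. The resulting string would then be passed to `pyodbc.connect()` as in the code above.

```python
import os


def odbc_connection_string(prefix="CSLAB", env=None):
    """Build 'DSN=...;UID=...;PWD=...' from <prefix>_DSN/_UID/_PWD variables.

    The variable names are hypothetical; `env` defaults to os.environ and can
    be any mapping, which makes the function easy to test.
    """
    env = os.environ if env is None else env
    parts = []
    for key in ("DSN", "UID", "PWD"):
        value = env.get(f"{prefix}_{key}")
        if not value:
            raise KeyError(f"missing environment variable {prefix}_{key}")
        parts.append(f"{key}={value}")
    return ";".join(parts)


if __name__ == "__main__":
    demo_env = {"CSLAB_DSN": "DS", "CSLAB_UID": "analyst", "CSLAB_PWD": "secret"}
    print(odbc_connection_string(env=demo_env))   # DSN=DS;UID=analyst;PWD=secret
    # Real use, with pypyodbc imported as pyodbc:
    # CSLAB_CONN = pyodbc.connect(odbc_connection_string())
```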

/**************************************************************************************

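import pandas as pd                                            # Import the Pandas library
import numpy as np                                             # Import the NumPy library
import matplotlib.pyplot as plt                                # Import Matplotlib's pyplot plotting interface
import pypyodbc as pyodbc                                      # Import the pypyodbc driver
pyodbc.lowercase = False                                       # Keep column names as returned by the database (pypyodbc lower-cases them by default)
CSLAB_CONN = pyodbc.connect('DSN=DS;UID=DURGA;PWD=Delhi123')   # Connection to SAP HANA using ODBC Connection
CSLAB_CURSOR = CSLAB_CONN.cursor()
CSLAB_STMNT = 'SELECT * FROM DURGA.CSLAB_CUSTOMER'             # Fetching the Customer Table Data into Python
CSLAB_CURSOR.execute(CSLAB_STMNT)
CSLAB_COLUMNS = [col[0] for col in CSLAB_CURSOR.description]   # Column names of the result set
CSLAB_DF = pd.DataFrame.from_records(CSLAB_CURSOR.fetchall(), columns=CSLAB_COLUMNS)   # Customer data as a DataFrame
CSLAB_CONN.close()

————————————————————————————————————–
This Block of Code Imports the Libraries Needed for Customer Analytics and Loads the Customer Information from SAP HANA into a Pandas DataFrame.
————————————————————————————————————–

CSLAB_FIGURE = plt.figure()
CSLAB_AX = CSLAB_FIGURE.add_subplot(111)                       # Sales Analysis
CSLAB_DF.plot(kind='bar', x='Customer_ID', y='Sales', ax=CSLAB_AX, legend=False)   # Assumes CSLAB_CUSTOMER has Customer_ID and Sales columns
CSLAB_AX.set_xlabel('Customer_ID')
CSLAB_AX.set_ylabel('Amount')
CSLAB_AX.set_title('Sales Analysis')
plt.show()                                                     # Display the Bar Chart of Sales Analysis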

————————————————————————————————————–
This Block of Code Shows the Sales Analysis Plot in a Bar Chart. Suitable X-Axis and Y-Axis Labels and Titles are Provided.
————————————————————————————————————–

import matplotlib.pyplot as plt
CSLAB_FIGURE = plt.figure(figsize=(6, 6))                      # Square figure so the pie is round
CSLAB_AX = CSLAB_FIGURE.add_axes([0.1, 0.1, 0.8, 0.8])
CSLAB_labels = ['Nicholas', 'Nasser', 'Elizabeth', 'William', 'Johan']
CSLAB_fracs = [10, 20, 25, 25, 20]
CSLAB_explode = (0, 0.05, 0, 0, 0)                             # One offset per slice; pulls out 'Nasser'
CSLAB_AX.pie(CSLAB_fracs, explode=CSLAB_explode, labels=CSLAB_labels, autopct='%1.1f%%', shadow=True, startangle=90)   # Pie Chart of Customer Churn %
CSLAB_AX.set_title('Customer Churn %', bbox={'facecolor': '0.8', 'pad': 5})
plt.show()

————————————————————————————————————–
This Block of Code Shows the Prediction of Customer Churn % in a Pie Chart. Suitable Labels are Provided.
————————————————————————————————————–

import matplotlib.pyplot as plt
plt.plot([1124, 1958, 857, 2465, 3042, 3245, 4544, 4956, 3896, 4948, 4565])   # Line Chart of YTD Collection from the Customer
plt.ylabel('Amount ($)')
plt.show()

————————————————————————————————————–
This Block of Code Shows the YTD Collection from the Customer in a Line Chart. Suitable Y-Axis Labels are Provided.
————————————————————————————————————–
__________________________________________________________________
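The line chart plots the eleven collection amounts exactly as given. The article does not say whether those values are monthly collections or already cumulative. If they are monthly, a year-to-date running total and the month-over-month change can be computed before plotting. Here is a sketch with `itertools.accumulate` that uses the same numbers and assumes one value per month:

```python
from itertools import accumulate

# Collection amounts from the line chart, assumed to be one value per month.
monthly = [1124, 1958, 857, 2465, 3042, 3245, 4544, 4956, 3896, 4948, 4565]

ytd = list(accumulate(monthly))                              # running (year-to-date) total
mom_change = [b - a for a, b in zip(monthly, monthly[1:])]   # month-over-month change

print(ytd[-1])         # 35600
print(ytd[:3])         # [1124, 3082, 3939]
print(mom_change[:3])  # [834, -1101, 1608]
```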

_____________________________________________________

Community Service:

_________________________________________________________________

Grow Data Science Competency:

• Our aim is to address the global data science skill gap. This is a free community service to help individuals learn enterprise data science using analytics techniques.

• This is not sales or marketing material for what we do for the data science learning community.

Our Value To Data Science Community:

As you may have realized, we start our posting with measurable business value by presenting real analytics. You may have seen hundreds of presentations on big data, analytics, data science and predictive analytics. But have you seen a single presentation that explains the practical process and steps required to implement predictive analytics at an enterprise level? In our experience, the answer is “NO”.

With us, you now have an opportunity to understand the steps to implement big data, along with a true learning platform for learning to implement predictive analytics.

Learn beyond concepts and presentations.

Community Contributors:

  • Jothi Periasamy
  • Durga Prasad
  • Karthikeyan Rajamanickam
  • Uday Bhoomagoud
  • Gourav Reddy

____________________________________________________________

Learn the roots of data science with real use cases on a real data platform, not just HADOOP.

Author Bio:

Jothi Periasamy is an author, speaker, thought leader and community contributor with seventeen years of experience in management consulting, entrepreneurship and process excellence with Deloitte, E&Y and KPMG, and deep hands-on experience with SAP HANA, HADOOP and BI tools.
