As data analytics shapes the future of harvesting information and transforming it into business intelligence, the role of the data scientist is to provide our communities with non-random results based on causality and hypothesis testing, using the scientific method. No, big data does not represent the entire population, and making that assumption places us on a slippery slope. The risk is that our clients lose faith in our profession when there is no difference between what we offer and black-box methodology. The higher calling for scientists is to "truthify" data to the maximum extent possible and deliver results grounded in theoretical constructs, not mere correlations. Our clients view what we do with varying degrees of ignorance and may even believe we hold some secret. The fact is that we don't, and this should motivate us to move as close to the truth as theoretically possible. The good news is that we can get closer by treating our data conditionally to illuminate its hidden truth statistically. To that end, classical statistical methods have some utility but cannot harvest that information in its entirety. An alternative statistical approach is the Bayesian Belief Network (BBN), which is built on the constructs of Bayes' Theorem.
Let's review Bayes' Theorem. This simple yet dynamic equation contains the DNA that chains across a countable universe of conditional data to give us a greater view of the truth. In its simplest form: P(B|A) = P(AB) / P(A). This says that the probability of an event B (our hypothesis) given an event A equals the probability of the two events occurring together divided by the probability of event A. Using the chain rule of probability, this expands to P(B|A) = P(A|B) * P(B) / P(A). The classical beauty of this equation is that we can chain in new events, say an event C. We specify this as P(C|BA) = P(ABC) / P(BA), which by the chain rule expands to P(C|BA) = P(A|BC) * P(B|C) * P(C) / P(BA), and so on. For example, if we are diagnosed with a disease and the test is 90 percent accurate, the question remains: do we actually have the disease? When we include the prevalence rate in the population, we begin to question the initial diagnosis as the truth. When we include an additional event, like family history, the chaining process continues. As we add events, we conditionally refine our hypothesis to obtain greater fidelity in representing the truth: data truthification. Just a thought.
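The disease-test example above can be sketched in a few lines of Python. The numbers here are assumptions for illustration only (the article states only that the test is 90 percent accurate): a sensitivity and specificity of 0.90 and a population prevalence of 1 percent. Chaining a second event is modeled, simplistically, by reusing the updated belief as the new prior.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive evidence) via Bayes' Theorem.

    P(D|+) = P(+|D) * P(D) / P(+), where the denominator sums over
    both ways a positive result can occur: a true positive on the
    diseased and a false positive on the healthy.
    """
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

prior = 0.01                              # assumed 1% prevalence
p1 = posterior(prior, 0.90, 0.90)        # belief after one positive test
# Chain a second, conditionally independent piece of evidence
# (e.g., family history, assumed equally informative) by treating
# the updated belief as the new prior.
p2 = posterior(p1, 0.90, 0.90)
print(round(p1, 4), round(p2, 4))        # prints: 0.0833 0.45
```

Note how a "90 percent accurate" positive test still leaves only about an 8 percent chance of disease once the low prevalence is conditioned in, and how each chained event refines that belief, which is exactly the truthification process described above.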
Dr. Jeff Grover is founder and Chief Data Scientist of Grover Group, Inc. He specializes in Bayes' Theorem and its application through Bayesian Belief Networks (BBN) to strategic economic decision-making. He has conducted research with the US Army Research Institute for the Behavioral and Social Sciences and the US Army Recruiting Command in the areas of Soldier attrition and special operations and medical recruiting. He has also recently conducted BBN research in the agricultural industry. Jeff received his Doctor of Business Administration in Finance from Nova Southeastern University, a Master of Business Administration in Aviation from Embry-Riddle Aeronautical University, and a Bachelor of Science in Mathematics from Mobile College.
Jeff has published a book, Strategic Economic Decision-Making: Using Bayesian Belief Networks to Make Complex Decisions, with SpringerBriefs. He has operationalized the content of the book into a Bayesian big data algorithm at BayeSniffer.com. He has also published in the Journal of Wealth Management; the Journal of Business and Leadership: Research, Practice, and Teaching; and the Journal of Business Economics Research. He was a guest speaker at the December 2014 MORS Conference in Washington, DC. Dr. Grover is also a retired US Army Special Forces officer.