Our panel today included Dan Dunn, Product Operations, HubSpot; Michael Kane, PhD, Associate Research Scientist, Yale Center for Analytical Sciences, Yale University; Mike Keohane, VP of Software Engineering, OwnerIQ; and Ian Stokes-Rees, PhD, NEBioGrid, Harvard Medical School.
The first question put to the panelists laid the foundation by establishing the types of work they do, which is primarily exploratory data analysis using simple statistical methods. Michael Kane pointed out that as the methods become more complex, the mathematics becomes more predictive and harder to prove. Dan Dunn said that ultimately the most trustworthy outcomes rely on a combination of data variety and data integrity. This combination allows more dimensions to be added, which creates more links among the data points.
Another key theme of the day was how crucial visualization is, not only to the end user of the report but also to the data scientist, who needs to review results along the way. According to Ian Stokes-Rees, an interactive scatter plot with clear axis points is necessary to get complete buy-in from your stakeholders.
While there was some lively debate on the best technologies available, particularly between our academics, the panel was in complete agreement that all of them are very difficult to use, which brought us to the skills-gap portion of the day. So, what makes a great data scientist? Someone with experience in mathematics/statistics and computers/programming, plus a grasp of business environments. Thus, a true rarity. According to Mike Keohane, data scientists may have depth in only one of those areas, but if they have at least an understanding of the others, they are still a valuable asset.
In the end, data science is a team effort. Generally, the person performing the statistical analysis is not the same person who administers the data or programs the technologies that process it. In addition, having more than one data scientist allows for checks and balances in the analysis. With so many tools and algorithms that can be applied, it is crucial to have a keen understanding of which questions should be asked and which are actually being asked.
The final question of the morning was: what does the future hold? All panel members agreed that the volume of data will continue to increase, along with its importance in business intelligence.