
Big Data Seminar: Life Sciences and the Big Data Challenge

Big Data Seminar: February 15, 2013

The challenges of big data go beyond storage; the true challenges are handling and analyzing the data.

Kris Joshi, Global Vice President of Healthcare at Oracle Health Sciences, opened the panel with a big data challenge from a sector far from life sciences: the US Border Patrol, which tracks and analyzes approximately 50 billion transactions per day in real time. The good: if cross-correlation flags a risk, the appropriate personnel can be notified. The bad: if a center fails, every border must shut down within 30 minutes.

He then asked the panelists, Peter Bergethon, PhD (Head of Neuroinformatics and Computational Neurology, Pfizer), Michele Clamp (Interim Director of Research Computing and Director of Informatics & Scientific Applications, Harvard University), and Matthew Trunnell (CIO, Broad Institute), what big data challenges they face in their respective roles.

There was a tremendous amount of consensus that data is arriving too fast to scale the full cycle of algorithmic work: interacting with the data, analyzing it, and then repeating the process to get a better output. At the Broad and Harvard, infrastructure has been built for general-purpose use, and that is no longer a scalable model. Further, data sets in the life sciences are immensely varied, and often it is not the volume but the variety that causes issues. Michele and Matthew felt the best solution would be to move from relational databases to fuzzier, more flexible data models. Kris countered that such models, while flexible, lose the traceability of the data that many federal agencies require.
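
To make that tradeoff concrete, here is a minimal sketch in plain Python (all field names are hypothetical) of how a schema-flexible, document-style record can carry explicit provenance metadata, recovering some of the traceability a relational schema would otherwise enforce:

```python
import json
from datetime import datetime, timezone

# A schema-flexible record for a sequencing result. Unlike a fixed
# relational row, each record can carry different fields without a
# schema migration.
record = {
    "sample_id": "S-0042",                    # hypothetical identifier
    "assay": "RNA-seq",
    "metrics": {"reads": 51234567, "q30_pct": 91.4},
    # Explicit provenance fields restore some of the traceability that
    # regulators expect but that loose data models do not enforce.
    "provenance": {
        "source_file": "run_2013_02_15.bam",  # assumed upstream artifact
        "pipeline_version": "1.3.0",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    },
}

print(json.dumps(record, indent=2))
```

The flexibility comes at a cost: nothing in such a model forces every record to include the provenance block, which is exactly the traceability concern Kris raised.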

Peter cited the communication barrier between analysts and biologists as another major challenge. As a neurologist, his focus is on using big data to help constrain data sets so they can be placed in an environment where they become predictive.

Asked about the best opportunities to improve alignment, the panel pointed to obtaining more accurate raw data that analysts can trust. They also felt that visualization software is not yet where it needs to be for biologists to make sense of the data, which reinforces the communication barrier between analysts, statisticians, and scientists.

On the volume issue, many of the panelists are moving, or plan to move, some of their data into the cloud so that data sources can be shared and collaborated on. Once taboo, the cloud is beginning to be embraced by life science specialists.

Not surprisingly, the discussion moved to the ever-growing talent shortage. Not only is there a small pool of people who truly understand Hadoop; within the life sciences industry, those engineers also need to be able to communicate with the scientists. What is the solution?

Peter explained that Pfizer offers onsite training, but costs are rising so quickly, and by so much, that the company has begun outsourcing, which he believes is the future. Matthew added that there is a huge need for statistical geneticists, who at present simply don’t exist.

There is also a cultural component to the skills gap: people see little value in sharing and explaining data to someone else. Data and algorithms are proprietary, and that limits collaboration. There is also a need for transparency on the application side, which is why many organizations, including the Broad and Harvard, use openly licensed software; they need to understand how to manipulate the tools in order to integrate their data sets properly.

It seems, in the end, that with money, resources, and improving technology, it is not the infrastructure that poses the true challenges of big data, but the disparate data sets, scientists’ ability to understand what the data is saying, and a major skills gap between analysts and scientists.
