Four Big Data Use Cases from the Media and Publishing Industry: BBC, Healthcare.gov and More

The media and publishing industry traditionally collects and generates vast amounts of data. With the advent of e-books, e-journals and e-magazines this has only increased. All this data offers the media and publishing industry the possibility to reinvent itself. Big Data offers ample new possibilities for this industry and, luckily, we are already seeing interesting use cases.

Besides being the founder of this platform, I am also the co-founder of a bi-monthly meetup on Big Data in Amsterdam. ‘Data Donderdag’, or in English ‘Data Thursday’, focuses on the impact of Big Data and helps organisations better understand it: without all the technical details, but with concrete examples of how Big Data will change the way organisations work and do business. From Bytes to Business Case. The latest edition was about Big Data in the media and publishing industry, and I would like to share some of the interesting best practices that were presented during the meetup.

LexisNexis Combines Thousands of Data Sources for Insights

LexisNexis is a corporation providing computer-assisted legal research as well as business research and risk solution services. LexisNexis BIS is a subsidiary that provides business support by delivering the right information via the right channel at the right moment. They collect raw data and make this data easily accessible for customers and end-users within the financial industry, business services industry and the government.

One typical use case for LexisNexis is providing background checks for the financial services industry. They are capable of combining tens of thousands of data sources and millions of company profiles to perform a background check on a bank's new customer. The data, and the insights derived from it, are richer than those of traditional background checks done via Google.

To be sure that the data is correct, LexisNexis also validates the source of the data. They are quite reluctant to incorporate new sources and only work with sources of high quality, such as The New York Times.

But LexisNexis goes further than that. Their R&D department is currently working on making video content (in this specific case, news bulletins) searchable. This will enable end-users to search for specific scenes within the video content using a traditional search query.
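LexisNexis has not published how this works under the hood, but a common approach is to index time-coded transcript segments (from subtitles or speech-to-text) so that an ordinary keyword query maps back to the moment in the video where a term is spoken. Here is a minimal sketch in Python, with a purely hypothetical transcript:

from dataclasses import dataclass

# One common approach to video search: index time-coded transcript
# segments so a keyword query returns the scenes where a term occurs.
# The transcript below is hypothetical, for illustration only.

@dataclass
class Segment:
    start_seconds: float
    text: str

segments = [
    Segment(0.0, "Good evening, here is the news"),
    Segment(12.5, "Floods have hit the east coast overnight"),
    Segment(47.0, "In sports, the marathon results are in"),
]

def search(query: str, transcript: list[Segment]) -> list[Segment]:
    """Return the segments whose text contains the query term."""
    q = query.lower()
    return [s for s in transcript if q in s.text.lower()]

for hit in search("floods", segments):
    print(f"{hit.start_seconds:>6.1f}s  {hit.text}")

A production system would replace the naive substring match with a full-text index, but the principle stays the same: search the transcript, return the timestamps, jump to the scene.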

Springer Makes Scientific Journals Easily Accessible

The company Springer is a leading publisher of scientific journals and books. They have over 6,000 employees in 25 different countries, publish over 2,000 journals and 123,000 e-books, and have an annual revenue of €1 billion. Springer also noticed a remarkable shift from print to digital: a survey carried out by Springer found that scientific journals would become mostly digital (88%), with only a small share remaining in print (12%). They needed to act fast.

They changed their offering and developed Springer Link. With Springer Link they make 95% of their revenue online, with over 225 million downloads per year by 16 million monthly visitors. All content is accessible on all devices and easily searchable. All data, structured and unstructured, is searchable via different channels. AuthorMapper, for example, provides search results based on geographical location, and Realtime Springer provides insight into which scientific articles and journals are being read in real time and which topics are trending.

The BBC Generated >2 Petabytes of Data during the 2012 Olympics

The BBC is the largest media company in the United Kingdom, with an impressive reach of 96% of English households. During the 2010 World Cup they discovered they would face a big challenge dealing with all the data that would be created during the 2012 Olympics. The World Cup ‘only’ required thousands of individual pages: 32 teams, 8 groups and 700+ player pages. The Olympics, however, would require 10,000+ individual pages covering over 10,000 athletes, 200+ teams and 500 disciplines. Without Big Data this would have been impossible.

In the end, the BBC opted to generate these pages automatically and to enrich them with metadata. This metadata was also used to automatically generate more new pages by linking facts to Open Data sources such as DBpedia (the structured version of Wikipedia). This approach worked: during the two weeks of the Games, the BBC handled impressive amounts of data. They had approximately 9.5 million visitors per day, and on the busiest day they generated 2.8 petabytes of data with an average of 25,000 transactions per second.
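The BBC's exact pipeline was not detailed at the meetup, but the linking step can be illustrated with a small sketch that pulls facts from DBpedia's public SPARQL endpoint. The chosen resource (Usain_Bolt) and properties are illustrative assumptions, not the BBC's actual queries:

import requests

# Minimal sketch: fetch structured facts about an athlete from DBpedia's
# public SPARQL endpoint. Resource and properties are illustrative.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?birthDate ?abstract WHERE {
  dbr:Usain_Bolt dbo:birthDate ?birthDate ;
                 dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print("Born:", row["birthDate"]["value"])
    print("Bio :", row["abstract"]["value"][:200], "...")

Facts retrieved this way can be injected into page templates, which is what makes hand-writing 10,000+ athlete pages unnecessary.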

Healthcare.gov Required a New NoSQL Approach

The introduction of health insurance for all Americans was a typical Big Data challenge. The personal details of 300 million Americans from 50 states had to be combined with data from dozens of insurers, the tax authority, social security records, employer data and more. They first tried a traditional RDBMS approach, which failed, so they switched to NoSQL in order to load the data without defining a schema upfront.

Thanks to this approach, the team working on Healthcare.gov was able to start working on the data at short notice, which enabled rapid prototyping. They used MarkLogic to unify and store all of the data, and they simplified the legacy architecture. They were capable of loading the data ‘as is’, removing the burden from states, providers and payers. It also provided a ‘future proof’ system, as yet-unidentified data sources can be added later on without any problem. In the end, this approach enabled Healthcare.gov to launch on time and serve millions of Americans.
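MarkLogic exposes a REST API for document operations, and loading data ‘as is’ can be sketched as below. This is a minimal illustration, not the Healthcare.gov implementation: the host, port, credentials, document URIs and record contents are all placeholder assumptions about a local REST instance.

import json
import requests
from requests.auth import HTTPDigestAuth

# Minimal sketch of loading documents 'as is' via MarkLogic's REST API.
# Host, port, credentials and URIs are placeholders for a local setup.
BASE = "http://localhost:8000/v1/documents"
AUTH = HTTPDigestAuth("admin", "admin")

# Two records from different sources with different shapes: no shared
# schema has to be defined before loading either of them.
records = [
    ("/insurers/acme-plan-123.json", {"planId": "123", "premium": 240.0}),
    ("/states/ca/applicant-42.json", {"ssnLast4": "6789", "household": 3}),
]

for uri, doc in records:
    resp = requests.put(
        BASE,
        params={"uri": uri},
        data=json.dumps(doc),
        headers={"Content-Type": "application/json"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Loaded {uri}: HTTP {resp.status_code}")

Because no schema is declared upfront, a new data source with yet another shape can be loaded the same way later, which is what made the system ‘future proof’.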

These four examples of Big Data in the media and publishing industry are only the beginning. With so much data already available and more on the way, Big Data could truly revolutionize this industry.

Image: Flickr user brianjmatis
Image Credit: Fer+Gregory/Shutterstock