Kayak Uses Big Data To Predict The Best Day To Book Your Travel Journey
Kayak.com is a travel meta search engine that lets users find hotels, flights, vacations and rental cars across hundreds of different booking websites. It was acquired by Priceline.com in 2012. Time named Kayak to its list of the 50 best websites of 2009. Kayak handles over a billion searches a year and maintains advertising agreements with over 4,000 travel suppliers and travel agencies, including most global hotel and car rental operators, nearly every leading airline and the world’s leading travel agencies, so it is no surprise that the company is heavily involved with big data.
Kayak is a meta search engine for the travel industry, doing for the large travel platforms (Orbitz, Expedia, etc.) what those platforms did for the individual airline websites. It aggregates the aggregators and, in the meantime, adds new layers of information on top of the basics to give a rich user experience. It handles not only flight searches, where generally only a few variables affect the search results (price and duration or stopovers), but also hotel searches, where many variables affect the results: think of facilities, quality, pricing, distance to certain areas of interest, etc. This requires substantially more analytics.
For flight search, however, Kayak has moved into predictive analytics: its flights module forecasts whether the price will go up or down in the next seven days. This is a major improvement over traditional flight booking websites, which generally only show a matrix of prices for the week, with Bing Travel being perhaps the exception.
Kayak developed the predictive search engine by combining historical data from search queries over the past years with mathematical models, resulting in an algorithm that can predict the price. Of course, the forecast remains a prediction, and therefore the system shows the visitor the confidence level of the statistical analysis. In order to improve the algorithm, Kayak tracks the flights in the background throughout the seven days of the forecast and uses that data to determine whether the predictions were actually correct.
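The feedback loop described above, comparing each forecast with what the fare actually did seven days later, can be illustrated with a minimal Python sketch. This is not Kayak's implementation: the `Forecast` record, its field names, the routes and all the numbers are hypothetical and only show how stated confidence can be compared with observed accuracy.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """One hypothetical price forecast and its observed outcome."""
    route: str
    predicted_direction: str   # "up" or "down"
    confidence: float          # stated confidence of the forecast, 0..1
    price_at_forecast: float
    price_after_7_days: float


def was_correct(f: Forecast) -> bool:
    # Assumption: an unchanged price counts as "down" (i.e. it did not rise).
    actual = "up" if f.price_after_7_days > f.price_at_forecast else "down"
    return actual == f.predicted_direction


def accuracy(forecasts: list[Forecast]) -> float:
    """Fraction of forecasts whose direction matched the observed price move."""
    if not forecasts:
        return 0.0
    return sum(was_correct(f) for f in forecasts) / len(forecasts)


def mean_confidence(forecasts: list[Forecast]) -> float:
    """Average stated confidence, to compare against the observed accuracy."""
    if not forecasts:
        return 0.0
    return sum(f.confidence for f in forecasts) / len(forecasts)


# Made-up sample data, for illustration only.
forecasts = [
    Forecast("JFK-LAX", "up", 0.8, 300.0, 340.0),
    Forecast("BOS-SFO", "down", 0.7, 420.0, 450.0),
    Forecast("ORD-MIA", "down", 0.6, 210.0, 190.0),
    Forecast("SEA-DEN", "up", 0.9, 150.0, 175.0),
]

print(f"observed accuracy: {accuracy(forecasts):.2f}")
print(f"mean stated confidence: {mean_confidence(forecasts):.2f}")
```

If the observed accuracy is consistently lower than the stated confidence, the model is overconfident and needs recalibrating, which is the kind of signal that background tracking can provide.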
While working on the predictive model, Kayak analysed 1 billion search queries to discover the cheapest flights, the busiest airports, the most popular destinations and the destinations that offered great value. It seems that for domestic flights in the USA, September is the cheapest month, while for international flights February and March are the cheapest months. Apparently, to get the cheapest fares, travellers should book between 21 and 3 days before departure.
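As a rough illustration of how a "cheapest month" can be derived from logged fares, the Python sketch below groups observations by departure month and compares the median fare. The data is made up, the function name `cheapest_month` is hypothetical, and the use of the median is an assumption; the article does not say which statistic Kayak used.

```python
from collections import defaultdict
from statistics import median


def cheapest_month(observations):
    """Return (month, medians) where month has the lowest median fare.

    observations: iterable of (departure_month, fare) pairs.
    """
    by_month = defaultdict(list)
    for month, fare in observations:
        by_month[month].append(fare)
    medians = {month: median(fares) for month, fares in by_month.items()}
    return min(medians, key=medians.get), medians


# Made-up fares, for illustration only: (departure month, fare in USD).
observations = [
    (9, 210.0), (9, 190.0),
    (7, 320.0), (7, 300.0),
    (12, 280.0),
]

month, medians = cheapest_month(observations)
print(f"cheapest month: {month}, median fares: {medians}")
```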
In addition to the predictive modelling, Kayak performs a lot of A/B testing to improve its website and the user experience. Every day, between 30% and 50% of all visitors participate, without knowing it, in some sort of test. The tests are used to establish cause-and-effect relationships, showing which features provide the best results and the highest conversion.
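A minimal Python sketch of what such a test can look like: visitors are assigned deterministically to a variant, and the conversion rates of the two variants are then compared with a two-proportion z-test. This is an assumed, textbook approach and not Kayak's actual tooling; the function names, the 40% participation share and the conversion numbers are all hypothetical.

```python
import hashlib
import math


def assign_variant(visitor_id: str, experiment: str, participation: float = 0.4):
    """Deterministically map a visitor to "A", "B" or None (not in the test).

    participation is the share of visitors enrolled (assumed 40% here).
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if bucket >= participation:
        return None
    return "A" if bucket < participation / 2 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (0% or 100% conversion): no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up results: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Hashing the visitor id together with the experiment name keeps a visitor in the same variant on every visit while keeping assignments independent across experiments, which is what allows a difference in conversion to be attributed to the feature under test.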
Of course, Kayak relies heavily on a large Hadoop cluster, but it uses Hadoop for data analytics and not to produce core business metrics, according to a Reverse Engineer at Kayak. Its data warehouse, including the accompanying ETL processes, loads over 40 million rows per day into 43 fact tables. Kayak’s data warehouse contains 18 billion rows and several terabytes of data. In addition, Kayak uses TokuDB from Tokutek, a storage engine that scales up MySQL while maintaining ACID compliance.
With more and more travel being booked online, it is to be expected that Kayak will keep improving its predictive models and, who knows, one day also offer forecasting for hotels or cars. For now at least, the company has adopted big data throughout the organisation, so it will be interesting to see where it is heading.