Kayak Uses Big Data To Predict The Best Day To Book Your Travel Journey

👋 Hi, I am Mark. I am a strategic futurist and innovation keynote speaker. I advise governments and enterprises on emerging technologies such as AI or the metaverse. My subscribers receive a free weekly newsletter on cutting-edge technology.

Kayak.com is a travel meta search engine that lets users search for hotels, flights, vacations and rental cars across hundreds of different booking websites. It was acquired by Priceline.com in 2012, and Time named it one of the 50 Best Websites of 2009. Kayak handles over a billion searches a year and maintains advertising agreements with more than 4,000 travel suppliers and travel agencies, including most global hotel and car rental operators, nearly every leading airline and the world's leading travel agencies. At that scale, big data is central to how the company operates.
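
Kayak's internal architecture isn't described here, but the core idea of a meta search engine, merging offers from many booking sites into one deduplicated, price-sorted list, can be sketched in a few lines of Python. The `Offer` class, the provider names and the itinerary keys are hypothetical; a real system would also need to normalize itineraries across providers and query them in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str   # booking site that returned the offer (hypothetical)
    flight_id: str  # normalized itinerary key (hypothetical), e.g. "UA123"
    price: float    # total fare in USD


def aggregate(results_by_provider: dict[str, list[Offer]]) -> list[Offer]:
    """Merge offers from many booking sites, keeping the cheapest per itinerary."""
    best: dict[str, Offer] = {}
    for offers in results_by_provider.values():
        for offer in offers:
            current = best.get(offer.flight_id)
            if current is None or offer.price < current.price:
                best[offer.flight_id] = offer
    # Deduplicated itineraries, cheapest first
    return sorted(best.values(), key=lambda o: o.price)


if __name__ == "__main__":
    results = {
        "siteA": [Offer("siteA", "UA123", 320.0), Offer("siteA", "DL45", 290.0)],
        "siteB": [Offer("siteB", "UA123", 305.0)],
    }
    for offer in aggregate(results):
        print(offer.flight_id, offer.provider, offer.price)
```

Here the same itinerary UA123 is offered by two sites, and only the cheaper siteB offer survives the merge.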

Kayak is a meta search engine for the travel industry: it does for the large travel platforms, such as Orbitz and Expedia, what those platforms did for individual airline websites. In other words, Kayak aggregates the aggregators, and along the way it adds new layers of information on top of the basic results to create a rich user experience. Besides flight searches, where generally only a few variables affect the results (price, duration and stopovers), it also handles hotel searches, where many more variables come into play: facilities, quality, pricing, distance to areas of interest and so on. This requires substantially more analytics.
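
One common way to rank results that depend on many variables is a weighted score over normalized attributes. The sketch below is only an illustration of that technique, not Kayak's ranking: the `Hotel` fields and the `WEIGHTS` values are hypothetical, and min-max normalization is used so that prices, ratings and distances, which have different units, can be combined on a common 0-1 scale.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    name: str
    price: float         # nightly rate in USD
    rating: float        # guest rating on a 0-10 scale
    km_to_center: float  # distance to the user's area of interest
    has_pool: bool       # stand-in for a facility filter


# Hypothetical weights; in practice these would be tuned on user behaviour.
WEIGHTS = {"price": 0.4, "rating": 0.35, "distance": 0.2, "pool": 0.05}


def _normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the most desirable value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_hotels(hotels: list[Hotel]) -> list[Hotel]:
    """Order hotels by a weighted score across several variables."""
    if not hotels:
        return []
    price = _normalize([h.price for h in hotels], higher_is_better=False)
    rating = _normalize([h.rating for h in hotels], higher_is_better=True)
    distance = _normalize([h.km_to_center for h in hotels], higher_is_better=False)
    scores = [
        WEIGHTS["price"] * price[i]
        + WEIGHTS["rating"] * rating[i]
        + WEIGHTS["distance"] * distance[i]
        + WEIGHTS["pool"] * (1.0 if h.has_pool else 0.0)
        for i, h in enumerate(hotels)
    ]
    order = sorted(range(len(hotels)), key=lambda i: scores[i], reverse=True)
    return [hotels[i] for i in order]
```

Adding a new variable means adding one normalized column and one weight, which is why this style of scoring scales to the many attributes hotels have.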

For flight search, Kayak has moved into predictive analytics: its flights module predicts whether the price of a fare will go up or down within the next seven days. This is a major improvement over traditional flight booking websites, which, with the possible exception of Bing Travel, generally show only a matrix of prices for the week.
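
Kayak's actual model is not described here. As a minimal sketch of the idea, assuming a log of historical observations in a hypothetical `(days_to_departure, recent_price_change, rose_within_7_days)` layout, a simple frequency-based estimator can group similar situations and report how often the fare rose in comparable past cases. This is a naive baseline swapped in for illustration, not Kayak's algorithm; the bucketing by week-to-departure and trend sign is an assumption.

```python
from collections import defaultdict

# Hypothetical historical record:
# (days_to_departure, price_change_over_last_7_days, rose_within_next_7_days)
Observation = tuple[int, float, bool]
Bucket = tuple[int, int]


def bucket(days_to_departure: int, recent_change: float) -> Bucket:
    """Group similar situations by week-to-departure and recent trend sign."""
    trend = (recent_change > 0) - (recent_change < 0)  # -1, 0 or +1
    return days_to_departure // 7, trend


def fit(history: list[Observation]) -> dict[Bucket, tuple[int, int]]:
    """Count, per bucket, how often the fare rose and how many cases were seen."""
    counts: dict[Bucket, list[int]] = defaultdict(lambda: [0, 0])
    for days, change, rose in history:
        c = counts[bucket(days, change)]
        c[0] += int(rose)
        c[1] += 1
    return {b: (rose, total) for b, (rose, total) in counts.items()}


def predict(model: dict[Bucket, tuple[int, int]],
            days_to_departure: int, recent_change: float) -> tuple[str, float]:
    """Return ("up" or "down", confidence) for the next seven days."""
    rose, total = model.get(bucket(days_to_departure, recent_change), (0, 0))
    p_up = (rose + 1) / (total + 2)  # Laplace smoothing: 0.5 when there is no data
    return ("up", p_up) if p_up >= 0.5 else ("down", 1.0 - p_up)


if __name__ == "__main__":
    history = [(10, 5.0, True)] * 8 + [(10, 5.0, False)] * 2
    model = fit(history)
    print(predict(model, 12, 3.0))  # ('up', 0.75)
```

The returned confidence is simply the smoothed share of matching historical cases, which is one plausible way to attach a confidence figure to each forecast.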

Kayak built this predictive search engine by applying mathematical models to historical search-query data from past years, producing an algorithm that forecasts the price. Because a forecast remains a prediction, the system also shows the visitor how confident the statistical analysis is. To improve the algorithm, Kayak tracks the flights in the background throughout the seven days of the forecast and uses that data to determine whether the predictions were actually correct.
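
Checking forecasts against fares tracked afterwards is essentially a backtest. A minimal sketch, assuming each forecast is stored together with the fares observed during its seven-day window; the field names and the rule of judging the outcome by the last tracked fare are assumptions. Comparing accuracy above a confidence threshold also indicates whether the confidence shown to visitors is meaningful.

```python
from dataclasses import dataclass


@dataclass
class TrackedForecast:
    route: str                    # hypothetical route key, e.g. "JFK-LHR"
    price_at_forecast: float      # fare when the forecast was shown
    predicted: str                # "up" or "down"
    confidence: float             # confidence shown to the visitor, 0.5-1.0
    observed_prices: list[float]  # fares re-checked during the seven-day window (non-empty)

    def actual_direction(self) -> str:
        """Assumption: judge the outcome by the last fare tracked in the window."""
        return "up" if self.observed_prices[-1] > self.price_at_forecast else "down"


def accuracy(forecasts: list[TrackedForecast], min_confidence: float = 0.0) -> float:
    """Share of forecasts (at or above a confidence level) that proved correct."""
    selected = [f for f in forecasts if f.confidence >= min_confidence]
    if not selected:
        raise ValueError("no forecasts at or above this confidence level")
    hits = sum(f.predicted == f.actual_direction() for f in selected)
    return hits / len(selected)


if __name__ == "__main__":
    tracked = [
        TrackedForecast("BOS-SFO", 100.0, "up", 0.8, [101.0, 105.0]),
        TrackedForecast("JFK-LHR", 200.0, "up", 0.6, [190.0]),
        TrackedForecast("ORD-LAX", 150.0, "down", 0.9, [140.0, 145.0]),
    ]
    print(accuracy(tracked))                       # all forecasts
    print(accuracy(tracked, min_confidence=0.75))  # high-confidence forecasts only
```

In this sample, two of the three forecasts are correct overall, and both high-confidence forecasts are correct.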

While working on the predictive model, Kayak analysed 1 billion search queries to discover the cheapest flights, the busiest airports, the most popular destinations and the destinations that offered great value. It seems that for domestic flights in the USA, September is the cheapest month, while for international flights February and March are the cheapest. To get the cheapest fares, travellers should apparently book between 3 and 21 days before departure.
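
Findings like the cheapest month or the best booking window come from straightforward aggregation over search logs. A minimal sketch, assuming each logged search records the date it was made, the departure date and the quoted fare (a hypothetical layout). The median is used so that a few extreme fares don't skew the comparison, and the 3- and 21-day bucket edges mirror the booking window mentioned above.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical log record: (search_date, departure_date, quoted_fare_usd)
Search = tuple[date, date, float]


def median_fare_by_month(searches: list[Search]) -> dict[int, float]:
    """Median quoted fare per departure month (1-12)."""
    by_month: dict[int, list[float]] = defaultdict(list)
    for _, departure, fare in searches:
        by_month[departure.month].append(fare)
    return {month: median(fares) for month, fares in by_month.items()}


def cheapest_month(searches: list[Search]) -> int:
    medians = median_fare_by_month(searches)
    return min(medians, key=medians.get)


def median_fare_by_lead_time(searches: list[Search], edges=(3, 21)) -> dict[str, float]:
    """Median fare for searches <3, 3-21 and >21 days before departure."""
    lo, hi = edges
    buckets: dict[str, list[float]] = defaultdict(list)
    for searched, departure, fare in searches:
        lead = (departure - searched).days
        key = "<3" if lead < lo else ("3-21" if lead <= hi else ">21")
        buckets[key].append(fare)
    return {key: median(fares) for key, fares in buckets.items()}


if __name__ == "__main__":
    log = [
        (date(2013, 8, 1), date(2013, 9, 10), 200.0),
        (date(2013, 9, 1), date(2013, 9, 10), 150.0),
        (date(2013, 9, 9), date(2013, 9, 10), 400.0),
        (date(2013, 11, 1), date(2013, 12, 1), 350.0),
    ]
    print(cheapest_month(log), median_fare_by_lead_time(log))
```

With a billion queries the same logic would run as a distributed job rather than in a Python loop, but the grouping is identical.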

Alongside predictive modelling, Kayak runs a lot of A/B tests to improve its website and user experience. On any given day, 30-50% of all visitors take part in some sort of test, of course without knowing it. The tests establish cause-and-effect relationships: which features deliver the best results and the highest conversion.
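
Kayak's testing infrastructure isn't described, but two building blocks are standard in A/B testing: assigning visitors to variants deterministically, so a returning visitor keeps seeing the same version, and testing whether a difference in conversion is larger than chance would explain. The sketch below hashes a visitor ID to decide whether the visitor joins an experiment (the 40% exposure is a hypothetical value within the 30-50% range) and applies a two-proportion z-test. The function names are illustrative.

```python
import hashlib
import math
from statistics import NormalDist
from typing import Optional


def assign(visitor_id: str, experiment: str, exposure: float = 0.4) -> Optional[str]:
    """Deterministically place a visitor in variant "A", "B" or no test (None)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    u = int.from_bytes(digest[:6], "big") / 2**48  # pseudo-uniform in [0, 1)
    if u >= exposure:
        return None  # visitor sees the default site
    return "A" if u < exposure / 2 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes at least one conversion and one non-conversion overall, so the
    pooled standard error is non-zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    print(assign("visitor-42", "new-search-button"))
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Including the experiment name in the hash keeps assignments independent across concurrent tests, so the same visitor can be in variant A of one test and outside another.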

Of course, Kayak relies heavily on a large Hadoop cluster, but according to a Reverse Engineer at Kayak, it uses Hadoop for data analytics rather than to produce core business metrics. Its data warehouse, including the accompanying ETL processes, loads over 40 million rows per day into 43 fact tables, and it contains 18 billion rows and several terabytes of data. In addition, Kayak uses TokuDB from Tokutek, a storage engine that lets MySQL scale up while maintaining ACID compliance.
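
To make the warehouse terms concrete: a fact table stores one row per event with keys into smaller dimension tables, and the ETL step transforms raw logs into that shape before bulk loading them. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL with TokuDB; this is not Kayak's schema, and the table and column names are hypothetical.

```python
import sqlite3
from datetime import date

# sqlite3 stands in for MySQL/TokuDB; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE dim_airport (airport_id INTEGER PRIMARY KEY, iata TEXT UNIQUE NOT NULL);
CREATE TABLE fact_search (
    search_date TEXT NOT NULL,
    origin_id INTEGER NOT NULL REFERENCES dim_airport(airport_id),
    destination_id INTEGER NOT NULL REFERENCES dim_airport(airport_id),
    quoted_fare REAL NOT NULL
);
"""


def airport_key(conn: sqlite3.Connection, iata: str) -> int:
    """Look up (or create) the surrogate key for an airport dimension row."""
    conn.execute("INSERT OR IGNORE INTO dim_airport (iata) VALUES (?)", (iata,))
    row = conn.execute("SELECT airport_id FROM dim_airport WHERE iata = ?", (iata,)).fetchone()
    return row[0]


def load_searches(conn: sqlite3.Connection, raw_rows) -> int:
    """Transform raw (date, origin, destination, fare) rows and bulk-insert them."""
    cleaned = []
    for day, origin, dest, fare in raw_rows:
        if fare is None or fare <= 0:
            continue  # drop malformed quotes during the transform step
        cleaned.append((day.isoformat(), airport_key(conn, origin.upper()),
                        airport_key(conn, dest.upper()), float(fare)))
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT INTO fact_search VALUES (?, ?, ?, ?)", cleaned)
    return len(cleaned)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    raw = [
        (date(2013, 5, 1), "bos", "sfo", 320.0),
        (date(2013, 5, 1), "BOS", "JFK", None),  # malformed: no fare
        (date(2013, 5, 2), "sfo", "bos", 298.5),
    ]
    print(load_searches(conn, raw))
    query = ("SELECT search_date, COUNT(*), AVG(quoted_fare) "
             "FROM fact_search GROUP BY search_date ORDER BY search_date")
    for row in conn.execute(query):
        print(row)
```

Keeping airport codes in a dimension table means each of the millions of daily fact rows stores only small integer keys, which is the usual reason warehouses use this layout.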

With more and more travel booked online, Kayak can be expected to keep improving its predictive models, and perhaps one day it will add forecasting for hotels or rental cars. For now, it has adopted big data throughout the organisation, so it will be interesting to see where it heads next.

Image Credit: Kunst+Bilder/Shutterstock
Dr Mark van Rijmenam

Dr. Mark van Rijmenam is a strategic futurist known as The Digital Speaker. He stands at the forefront of the digital age and lives and breathes cutting-edge technologies to inspire Fortune 500 companies and governments worldwide. As an optimistic dystopian, he has a deep understanding of AI, blockchain, the metaverse, and other emerging technologies, blending academic rigor with technological innovation.

His pioneering efforts include the world’s first TEDx Talk in VR in 2020. In 2023, he further pushed boundaries when he delivered a TEDx talk in Athens with his digital twin, delving into the complex interplay of AI and our perception of reality. In 2024, he launched a digital twin of himself, offering interactive, on-demand conversations via text, audio, or video in 29 languages, thereby bridging the gap between the digital and physical worlds – another world’s first.

Dr. Van Rijmenam is a prolific author and has written more than 1,200 articles and five books in his career. As a corporate educator, he is celebrated for his candid, independent, and balanced insights. He is also the founder of Futurwise, which focuses on elevating global knowledge on crucial topics like technology, healthcare, and climate change by providing high-quality, hyper-personalized, and easily digestible insights from trusted sources.
