Federating analytics workloads at HRS with AWS Orbit Workbench

HRS Technology Blog
5 min read · Mar 18, 2022
Source: AWS Orbit Workbench Website

HRS Group data teams and business experts collaborate to incorporate data and artificial intelligence to transform how corporations STAY, WORK and PAY. HRS Group needed a data analytics workbench that simplified the development of analytics workloads such as traveler insights, volume forecasting, hotel portfolio optimizations, recommendation engines, and many more. A massively scalable solution was required to analyze terabytes of data from internal and external sources. Such tooling also needed to reduce the cognitive work required to get started with analytics, based on the personas in our data and product teams.

In this blog post, we provide a brief introduction to the HRS data analytics platform built on AWS Orbit Workbench: its purpose, how it is being used, and our collaboration with the AWS Professional Services team. We also cover the benefits of our solution and the next steps for the HRS data platform team as we continue to innovate in analytics for our customers and how they STAY, WORK, and PAY.

Analytics Workbench Requirements

Our data platform team is tasked with scaling analytics workloads beyond a centralized team of data experts. This means that software engineers, data engineers, data analysts, data scientists and technical product managers across HRS should be able to work with data without infrastructure concerns, focusing on innovating with data on behalf of our customers.

To enable these groups of technical data practitioners, the data platform team set out to deploy a scalable, self-service solution that allowed its users to analyze massive amounts of data. In addition, the solution needed to mirror our data lake setup, which is managed by AWS Lake Formation. Furthermore, because not all data practitioners are familiar with the AWS Console, the solution had to ensure that they could still access the environment successfully. Hence, integration with the authentication and authorization tools managed by our internal infrastructure and security teams was a priority.

The evolution of our exploratory analytics workbench

Before we used AWS resources for analytic workloads, a central BI team was responsible for managing analytic workloads for the entire company. As a result, the team gradually became a bottleneck. Aside from this organizational challenge, on-premises limitations also constrained the infrastructure resources available.

Leveraging AWS resources for analytics, we were able to set up and manage our data lake on Amazon S3 with AWS Lake Formation and make it accessible via Amazon Athena. With Amazon Athena as the query layer of the data lake, we soon observed that more than 90% of our data practitioners either had no access to the AWS Console or could not cope with the cognitive overload of its many services. To address this, the data platform team provisioned a web-based SQL client, Redash, connected to our central single sign-on. This raised the bar for data access, enabling everyone with an HRS identity to access and explore the data lake without downloading any software to their computers. Two highlights of this approach were:

  1. Our Product Operations team reduced the lead time to deliver a report shared with our Sales Enablement team by 99.359% through self-service.
  2. Our Reporting and Analytics team extended Redash as a data exchange API layer on the data lake for our CRM team, completely removing the previous ‘Central BI team’ bottleneck.

But our data practitioners wanted more than SQL. They wanted to perform complex transformations, joins, analytic functions, and more. Some ran complex SQL queries against the data lake that took about 10 minutes to complete.

Redash UI showing execution time for saved queries

This told us that there were needs our data platform features were not serving. Our data practitioners shared feedback with us, we reached out to AWS, and we got to work.

Data Analytics Environment with AWS Orbit Workbench

We learned from initial interviews that data experts were already familiar with Jupyter Notebooks. We also learned that data experts did not appreciate sending their analytics workloads over to engineers to schedule. They wanted shorter lead-time-to-analytics as well as a unified experience, from data exploration to operationalized analytics, whilst not being overloaded with tons of services available via the AWS Console.

Unified interface and common data access pattern

In collaboration with the AWS Professional Services Data & Analytics team, we rolled out AWS Orbit Workbench, an open-source framework running on Kubernetes that provides a single, unified experience for data, analytics and machine learning projects.

Source: AWS Orbit Workbench Website

The AWS ProServe team enabled our data platform team to manage AWS Orbit Workbench on Amazon EKS. Because AWS Orbit Workbench integrates with the AWS Glue Data Catalog and Amazon Athena, data practitioners can explore data in the data lake with SQL and Python. When they are ready to operationalize their workloads, they can schedule Jupyter Notebooks, write new data to the data lake and create dashboards in our BI tool, MicroStrategy.
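As a rough illustration of what exploring the data lake with SQL and Python can look like from a notebook, the sketch below submits a query to Amazon Athena and polls until it finishes. It is not taken from the HRS setup or from the Orbit Workbench SDK: the function name `run_athena_query` and its parameters are hypothetical, and `client` is assumed to be a boto3 Athena client created by the caller with `boto3.client("athena")`.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Run `sql` on Athena and return the result rows as lists of strings.

    `client` is assumed to be a boto3 Athena client; the database and
    S3 output location are placeholders supplied by the caller.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state or we give up.
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"query {query_id} did not finish in time")

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    # Only the first page of results is read here. For SELECT queries,
    # the first row contains the column names.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

Passing the client in as an argument keeps credentials and region handling outside the helper and makes it easy to exercise with a stub object in tests.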

Since the rollout, we have seen rapid adoption across the company, including teams outside the broader Data & Analytics organization. We use several Amazon Elastic Kubernetes Service (Amazon EKS) metrics available via Amazon CloudWatch Container Insights as proxies for measuring adoption. Two important metrics that we pay attention to are the number of running pods by team and cluster CPU utilization by team. These give us insight into how teams consume cluster resources and help us understand usage patterns to identify possible improvements.
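To make the first of these metrics concrete, here is a hedged sketch of how the number of running pods for one team could be read from CloudWatch Container Insights. It assumes that each team maps to its own Kubernetes namespace, which this post does not state, and that the cluster publishes the Container Insights metric `namespace_number_of_running_pods`. The function names, cluster name and namespace are hypothetical, and `client` is assumed to be a boto3 CloudWatch client created with `boto3.client("cloudwatch")`.

```python
from datetime import datetime, timedelta, timezone


def running_pods_request(cluster_name, team_namespace,
                         hours=24, period_seconds=3600):
    """Build get_metric_statistics arguments for one team's running pods."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "ContainerInsights",
        "MetricName": "namespace_number_of_running_pods",
        # Assumption: one Kubernetes namespace per team.
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "Namespace", "Value": team_namespace},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


def average_running_pods(client, cluster_name, team_namespace, **kwargs):
    """Return (timestamp, average pod count) pairs, oldest first."""
    request = running_pods_request(cluster_name, team_namespace, **kwargs)
    response = client.get_metric_statistics(**request)
    # CloudWatch does not guarantee datapoint order, so sort by time.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

Calling `average_running_pods` once per team namespace would give one time series per team, which could then be compared or plotted side by side.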

Amazon CloudWatch metric showing the total number of running pods

Several analytic workloads important for making daily decisions are now powered by AWS Orbit Workbench, with many more to come. This is because, with AWS Orbit Workbench, the data platform team has lowered the barrier to entry for analytics across HRS. As the need for analytics increases, the data platform team continues to work on improvements such as integrating AWS Glue Development Endpoints with AWS Orbit Workbench, enabling data practitioners to wield the power of Apache Spark on AWS Glue Jobs for processing large amounts of data.

Conclusion

At HRS Group, data practitioners now have a clearer picture of where our data resides and of the scale we need for today and for future growth. Our data platform team is also working on solutions that will improve data practitioners’ experience of working with data, covering topics such as data curation and discoverability, data quality, and data ingestion based on self-service tooling. As a result, our data platform team is free to focus on the platform features that most improve the experience of developing analytic workloads, while data practitioners focus on innovating our customer and supplier experiences with data and analytics without thinking about infrastructure details.

While HRSTech technologies and challenges will likely change, our mission and culture of overcoming them will last. Want to be a part of it?
