Brands around the globe, big and small, are giving it their all, going above and beyond for their customers. They know that a business is only as successful as its least satisfied customer.
When your business serves millions of riders in 6,000 cities across 64 countries, you have to live and breathe customer satisfaction. Uber, the global tech leader in ride-sharing, can teach us a thing or two about how it manages to keep its riders and drivers happy.
We’ve previously featured a portion of Uber’s case study in an article about the transformation of customer experience in the digital era and the methods global brands use to put a smile on their customers’ faces.
The digital era brings technology that lets us collect vast amounts of customer data about preferences, behaviours and pain points, analyse it with AI and machine learning, and derive specific insights into how to mould our products and services accordingly.
And Uber knows how to put machine learning to work to delight their customers.
In an interview for Hyperight, Ritesh Agrawal, Tech Lead Manager at Uber, highlighted Uber’s ML-based features that help to enhance riders’ experience:
- Personalised destination suggestions based on ride history and frequently travelled destinations.
- One-Click Chat – a smart reply system which allows riders and drivers to communicate easily through in-app messaging. The system uses machine learning and NLP to anticipate responses to riders’ frequent questions, so drivers can reply with just one click of a button.
- Bridging the supply-demand gap – Uber’s system predicts which time periods and areas will see increased demand and alerts drivers accordingly. Meeting demand during peak hours helps Uber keep customers happy and increase its customer retention rate.
But one of the crucial applications of AI and machine learning at Uber is detecting and resolving user experience incidents in order to make sure their app is up-and-running and reliable at all times.
Rolling out changes like the “Fast and Furious”
Ritesh Agrawal and Anando Sen shared the story of how Uber rolls out changes fast and furious, but safely, across a highly distributed marketplace, the challenges they come across, and the constant changes they make to keep both drivers and riders happy.
Every day, hundreds of thousands of people rely on Uber to get to work and commute, or earn their livelihood by driving for Uber, highlights Anando. This is why Uber takes its job seriously: detecting user issues and fixing them as fast as possible, while doing it in the safest possible way.
Every month Uber makes 1,200 code changes and 22,000 configuration changes, which allow the app to be tuned for every city it runs in. However, rolling out changes to the app, distributed systems and infrastructure at such velocity can cause reliability issues.
To ensure the app is reliable at all times, Uber tracks three KPIs:
- Availability – were riders able to hail a ride and complete a trip?
- Latency – the time needed to go through the entire process of calling a ride: the system flow, the screen transitions, the tab functionality.
- Accuracy – of the information shown: the map, the price, the discount.
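As a rough illustration, the three KPIs above could be computed from per-trip logs along these lines. The field names, tolerance and sample data are invented for this sketch and do not reflect Uber’s actual schema:

```python
# Hypothetical sketch: computing the three reliability KPIs from trip logs.
# All field names and values are illustrative assumptions, not Uber's data model.
from dataclasses import dataclass

@dataclass
class TripAttempt:
    completed: bool        # did the rider hail and complete a trip?
    latency_ms: float      # time through the full hail-to-confirmation flow
    price_shown: float     # price displayed to the rider
    price_charged: float   # price actually charged

def reliability_kpis(attempts: list[TripAttempt]) -> dict:
    n = len(attempts)
    availability = sum(a.completed for a in attempts) / n
    avg_latency = sum(a.latency_ms for a in attempts) / n
    # Accuracy here: fraction of trips where the shown price matched the charge.
    accuracy = sum(abs(a.price_shown - a.price_charged) < 0.01 for a in attempts) / n
    return {"availability": availability, "latency_ms": avg_latency, "accuracy": accuracy}

trips = [
    TripAttempt(True, 820.0, 12.50, 12.50),
    TripAttempt(True, 950.0, 8.00, 8.00),
    TripAttempt(False, 2400.0, 15.00, 16.20),
    TripAttempt(True, 700.0, 9.75, 9.75),
]
print(reliability_kpis(trips))
```

In practice each KPI would be aggregated per city and per release, so a regression can be traced back to a specific change.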
Anando states that there’s a strong correlation between the speed at which changes are delivered to the users and their overall service reliability.
Tackling user-facing incidents
The most important takeaway from this correlation is Uber’s ability to detect user-facing incidents, attribute them back to their code or configuration, get people to fix them, and roll out mitigations fast and furious.
“The time to detect and time to resolve an incident should be among the company’s main KPIs”, advises Anando. However, merely recognising the issue is not enough.
If they detect issues at the beginning of their data pipeline, the cost is nearly non-existent, he explains. When issues are detected in the rollout phase, the costs start rising. This is why it’s crucial to detect and resolve them fast, reducing the blast radius of a bad user experience – affecting 10,000 users instead of 1 million.
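The blast-radius arithmetic Anando describes can be sketched with a hypothetical staged rollout, where an incident caught at an early stage touches far fewer users. The stage fractions below are assumptions for illustration, not Uber’s actual rollout process:

```python
# Illustrative arithmetic only: how staged rollouts shrink the blast radius.
# Stage fractions and user counts are hypothetical, not Uber's real process.
TOTAL_USERS = 1_000_000
STAGES = [0.01, 0.10, 0.50, 1.00]   # fraction of users exposed at each stage

def blast_radius(detected_at_stage: int) -> int:
    """Users affected if the incident is caught at the given stage (0-indexed)."""
    return int(TOTAL_USERS * STAGES[detected_at_stage])

print(blast_radius(0))  # caught at the 1% canary stage -> 10,000 users
print(blast_radius(3))  # caught only at full rollout -> 1,000,000 users
```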
From a data scientist’s point of view, Ritesh focuses on transforming a vague problem such as an incident into a concrete, actionable item.
When they looked closer at incident reports to understand the sources of error, they discovered that 40% of all incidents stem from code changes and 20% from configuration changes. The problem arises in the disconnect between the people who build the code and configurations and the city operations teams that apply the configuration changes.
Delivering Changes to a Globally Distributed Marketplace – Ritesh Agrawal, Uber and Anando Sen, Uber
As it operates in more than 64 cities, the Uber app is customised with different product choices depending on the location. For example, San Francisco offers an e-bike option, whereas in Delhi the app offers autos (auto-rickshaws). The product choices that appear in each city are driven by configurations, as are each city’s promotions and maps.
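A minimal sketch of how per-city configuration overrides might layer on top of a global default follows; all keys and values here are hypothetical and do not represent Uber’s real configuration system:

```python
# Hypothetical per-city configuration overrides; keys and values are invented
# for illustration and do not reflect Uber's actual config schema.
DEFAULT_CONFIG = {"vehicle_options": ["uberx"], "map_provider": "default", "promo": None}

CITY_OVERRIDES = {
    "san_francisco": {"vehicle_options": ["uberx", "e-bike"]},
    "delhi": {"vehicle_options": ["uberx", "auto"], "promo": "festival_discount"},
}

def config_for(city: str) -> dict:
    """Merge city-specific overrides onto the global defaults."""
    return {**DEFAULT_CONFIG, **CITY_OVERRIDES.get(city, {})}

print(config_for("delhi")["vehicle_options"])   # ['uberx', 'auto']
```

The design point is the one Ritesh makes: a single global app, specialised per city purely through data, which is also why a bad configuration value can propagate everywhere at once.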
As Ritesh explains, the biggest benefit of configurations is that they allow a single global app to be optimised for local city performance. On top of that, they provide experimentation capabilities.
However, it doesn’t come free of challenges. Configurations are hard to review, states Ritesh, because they are rolled out immediately. Each configuration change is applied to all app users around the world, so if a configuration contains an error, that error ships with it.
To take control, they started building an ML solution that automatically detects whether there is an issue with the app by examining signals after a configuration change, comparing them to signals from before the change, and running A/B tests to confirm whether an error has occurred.
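As a simple stand-in for that before/after comparison (the talk does not detail Uber’s actual models), a two-proportion z-test on an error rate before and after a configuration change captures the core idea; all numbers and thresholds are illustrative:

```python
# Hedged sketch: flag a config change when a monitored error rate shifts
# significantly versus the pre-change baseline (a two-proportion z-test).
# This is an illustrative stand-in for Uber's ML-based detection, not their system.
import math

def config_change_suspect(err_before, n_before, err_after, n_after, alpha=0.01):
    p1, p2 = err_before / n_before, err_after / n_after
    pooled = (err_before + err_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return p_value < alpha and p2 > p1           # only flag regressions

# Baseline: 50 errors in 100,000 requests; after rollout: 500 in 100,000.
print(config_change_suspect(50, 100_000, 500, 100_000))   # True -> investigate
```

A real system would track many signals per city and feed flagged changes back to the engineer who made them, which is the attribution step the talk emphasises.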
Learnings from delivering a fast and furious user experience with ML and data science
For a widely used global app like Uber, customisation is necessary, but at the same time they have to focus on reliability, emphasises Ritesh.
Incidents will always occur, but the critical point is to detect them as soon as possible. The time to detect and time to resolve KPIs help Uber tackle the incidents.
What’s more, when deploying an ML solution at the infrastructure level, Ritesh advises not starting from scratch with a completely new system. Instead, they leverage the systems their developers and engineers already use and apply automation on top. The engineers should be able to understand the ML models and tie errors back to the actual code.
Ultimately, from a business point of view, Anando concludes that ROI really matters. The investment in ML solutions, infrastructure, training and deployment should be weighed against the number of detected incidents and the blast-radius reduction. Every successful data science and machine learning effort should be founded on a two-way street where investment is tied back to ROI, so it can be sustained in the long run.