Who would have thought that creating toys would require machine learning models? But The LEGO Group has taken machine learning seriously and has gone to great lengths to integrate it into its processes.
Francesc Joan Riera, Applied Machine Learning Engineer, Data Engineering at The LEGO Group, is coming to the Data Innovation Summit 2020 to demonstrate how they train, maintain and update deployed machine learning models for LEGO’s Moderation Service.
Ahead of the summit, Francesc told us his story with machine learning and how he found his place in the field, and he revealed how The LEGO Group puts ML to work.
Hyperight: Hello Francesc, we are glad to have you as one of the speakers at Data Innovation Summit. To begin with, please tell us a bit about yourself and your role in The LEGO Group.
Francesc Joan Riera: Hi, it is my pleasure to have been invited to such a wonderful event here in Sweden! About myself: I started my journey with The LEGO Group in June 2019, when I joined the Big Data Engineering team as an Applied Machine Learning Engineer. Even though such a long title might frighten you, my work in the team has been to implement and optimize a cloud-based, AI-driven moderation service. This service receives the content generated by our users on different applications (images and/or text) and filters out anything that does not meet the high quality and safety standards that define The LEGO Group.
Hyperight: As 2020 is the year in which the Data Innovation Summit turns 5, could you tell us what, in your view, have been the most important breakthroughs in AI and machine learning over the last 5 years?
Francesc Joan Riera: As it happens, it was also 5 years ago that I had my first contact with machine learning, so my experience has grown alongside the field as it became what it is today.
To put this into perspective, 5 years ago I was in the last year of my bachelor's degree at Universidad de Oviedo (Spain). I took a course called "Introduction to Computer Vision for Robotics" and thought to myself, "this is the coolest subject I've ever studied!" The next day, I went to my professor's office and asked whether I could write my bachelor's thesis in this field. It was an unusual path, because I was an electronics engineering student and the projects there tend to be heavily industry-oriented. Nonetheless, he supported the idea, and I got to work on my first research project: computer vision algorithms for automated video surveillance, covering the detection and tracking of people in different scenes and face recognition from close-up webcam images. It all felt like sci-fi to me, and the deeper I got into the different approaches, the more I loved working on the topic.
At that point in my life, I was amazed by all these computer vision techniques. Honestly, I was also quite "intimidated" by neural network-based algorithms, because I did not get the chance to dive deep into them until my master's degree. Once that fear was gone and I felt confident in the machine learning world, I would say the first breakthrough that opened my eyes was models beating human accuracy on image classification tasks (like ImageNet). This was achieved primarily through the amazing research carried out by the big players in the AI industry. It also depended on projects like ImageNet compiling a huge labelled dataset, which allowed these models to improve until they could pick up even the smallest details. I must admit, I've scrolled through that site many times, and I still can't recognize half of the dog breeds in there.
Many other breakthroughs have happened in these last 5 years. They all relate to how far we have taken machine learning model design and implementation, and to the lessons we've learned from our own mistakes. It is only fair to mention the ones that immediately come to mind:
- A post by NVIDIA presented a documentary about a Chinese university that joined forces with a hospital to develop an image-based deep learning model to detect and classify lung cancer cells in MRI images. The documentary explained how the network was trained and then used under doctors' supervision. This allowed more data to be accurately labelled, which in turn kept improving the method. At one point, a doctor explains that the algorithm identified a tiny region as cancerous that the doctor had not initially spotted!
- Self-driving technology has also benefited in recent years from advances in machine learning (and compute power!). Self-driving drones are already being tested in the U.S., where Amazon is trialling them to deliver packages. We are not quite there yet with fully autonomous vehicles, but many companies have made incredible progress on driver assistance (human-machine interaction, rather than full automation).
- Finally, with great power comes great responsibility… Some of the great breakthroughs in the AI field have also created problems that did not exist before. I am talking about deepfake videos, which have gained popularity in the last 2 years thanks to huge advances in Generative Adversarial Networks. Similar techniques have also produced algorithms that can "hack" face and/or fingerprint recognition systems, simply by "reverse-engineering" how those systems were built in the first place. Still, from a purely technical point of view, even this is a great accomplishment!
Hyperight: You are going to present on the Data Engineering Stage, where you will discuss Continuous Evaluation and Improvement of Deployed Models in Production at The LEGO Group. I'm sure our readers are curious to know how and where exactly LEGO uses machine learning.
Francesc Joan Riera: In my opinion, there is a global trend of companies investing more effort and resources in their digital areas, both in hardware production and in the digital customer experience. For the last 5 years, we have been investing in improving our customers' digital experience by means of AI techniques to provide better and safer play experiences for children all around the globe.
To give a couple of examples, we at The LEGO Group have developed the following products that use or are based on machine learning techniques:
- A recommender engine for our online shop, which learns from previous customers' purchase behaviour in order to recommend items you might be interested in buying when you have product X in your cart.
- A cloud-based moderation system that receives all user-generated content from our online platforms (webpage, phone applications…) and automatically removes any content we have specified as "not suitable" from the platform. The system can receive images, text and videos. Using various pre-trained machine learning models, it returns a decision to the moderation team, who then review the edge cases to provide an even safer play experience.
- In LEGO® Hidden Side, the customer buys and builds a LEGO® set, like any other you would find in the shop. When used with its own phone application, however, the set will "come to life" on the device by means of augmented reality technologies. Moreover, the different minifigures from the set are detected and identified using machine learning techniques, which allows customers to play with them in different ways!
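The interview does not say how LEGO's recommender works internally. The sketch below is a rough illustration of the "customers who bought X also bought Y" idea described above, using simple item co-occurrence counting. The basket data and product names are hypothetical, and a production engine would typically rely on far richer signals and models.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items appears in the same past basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ensures an item bought twice in one basket is only counted once.
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, cart_item, k=3):
    """Return up to k items most often bought together with cart_item (ties broken alphabetically)."""
    ranked = sorted(counts.get(cart_item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase history: one list of product IDs per past order.
past_baskets = [
    ["castle", "knight_minifig", "dragon"],
    ["castle", "knight_minifig"],
    ["castle", "dragon"],
    ["castle", "knight_minifig", "baseplate"],
]
counts = build_cooccurrence(past_baskets)
print(recommend(counts, "castle", k=2))  # ['knight_minifig', 'dragon']
```

Products never seen in past orders simply get no recommendations. That cold-start gap is one reason real recommenders usually combine co-occurrence with other signals.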
Hyperight: What are the challenges an Applied Machine Learning Engineer faces on a daily basis working on deploying machine learning models at LEGO and how do you overcome them?
Francesc Joan Riera: Deploying the model itself to the running service might be the easiest part of our daily work. However, it is only that simple because a great deal of work and effort has previously gone into designing, deploying and maintaining the infrastructure that holds the whole service in place.
In our specific case, I can directly relate to the vision Ole Kirk Kristiansen (the founder of The LEGO Group back in 1932) had for his company and the products he designed: Det bedste er ikke for godt (“Only the best is good enough”). How does this apply to our daily work?
First, we need to make sure the task at hand can BENEFIT from introducing machine learning into its normal operations. For example, if we need to moderate user-generated images to filter out those that contain people, a good machine learning approach would be to use a pre-trained image classification or object detection algorithm, depending on the requirements.
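To make the image example concrete, here is a minimal sketch of the decision step that could follow a pre-trained object detector. It assumes the detector (not shown) returns a list of (label, confidence) pairs, and both thresholds are hypothetical values. Scores in the uncertain band are routed to human moderators, mirroring the edge-case review mentioned earlier in the interview.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# (class label, confidence in [0, 1]) as produced by some pre-trained detector.
Detection = Tuple[str, float]


@dataclass(frozen=True)
class PersonFilter:
    reject_above: float = 0.9  # hypothetical: confident enough to remove automatically
    review_above: float = 0.4  # hypothetical: uncertain band goes to human moderators

    def __post_init__(self):
        if not 0.0 <= self.review_above <= self.reject_above <= 1.0:
            raise ValueError("expected 0 <= review_above <= reject_above <= 1")

    def decide(self, detections: Iterable[Detection]) -> str:
        # Highest confidence that any detected object is a person (0.0 if none).
        person_score = max(
            (score for label, score in detections if label == "person"),
            default=0.0,
        )
        if person_score >= self.reject_above:
            return "reject"
        if person_score >= self.review_above:
            return "human_review"
        return "approve"


person_filter = PersonFilter()
print(person_filter.decide([("person", 0.97), ("brick", 0.80)]))  # reject
print(person_filter.decide([("person", 0.55)]))                   # human_review
print(person_filter.decide([("brick", 0.99)]))                    # approve
```

Keeping the thresholds as configuration rather than hard-coding them makes it possible to tune the size of the human-review band without retraining the model.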
Next, the integration of the machine learning component into the whole process needs to be designed, implemented, validated, accepted and deployed. It is not enough to have an ML model floating around that works "most of the time". Following the same example, the moderation service is built on a cloud solution with a set of requirements (monthly costs, I/O formats, processing time per image…) and some moderation models already running. The new model has to fit into the current deployment. That means adding whatever resources are necessary and updating existing ones only where needed, without breaking anything else. Once this is set up, the whole infrastructure, including the new model, needs to be tested. Testing validates that the service can still perform all the previous operations plus the new one, and the result must also be accepted by the end customer of the moderation service.
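The validation step can be sketched as an automated acceptance check run before deployment. In this minimal Python example, `moderate` is a hypothetical function standing in for the updated service: it takes one item and returns a decision per moderation model. The check confirms three things: the existing models still return their previously expected decisions, the new model appears in the output, and median processing time stays within a per-item budget. The names and the choice of median latency are assumptions, not LEGO's actual test suite.

```python
import time
from statistics import median
from typing import Callable, Dict, List, Tuple

# (input item, decision each *existing* model returned before the change)
Case = Tuple[object, Dict[str, str]]


def acceptance_check(
    moderate: Callable[[object], Dict[str, str]],
    regression_cases: List[Case],
    new_model: str,
    max_seconds_per_item: float,
) -> List[str]:
    """Return failure messages; an empty list means the candidate deployment passes."""
    failures: List[str] = []
    latencies: List[float] = []
    for i, (item, expected) in enumerate(regression_cases):
        start = time.perf_counter()
        decisions = moderate(item)
        latencies.append(time.perf_counter() - start)
        # Existing models must keep returning the same decisions as before.
        for model, decision in expected.items():
            got = decisions.get(model)
            if got != decision:
                failures.append(f"case {i}: {model} returned {got!r}, expected {decision!r}")
        # The new model must actually appear in the service's output.
        if new_model not in decisions:
            failures.append(f"case {i}: no decision from {new_model}")
    # Processing time per item must stay within the agreed budget.
    if latencies and median(latencies) > max_seconds_per_item:
        failures.append(f"median latency {median(latencies):.3f}s exceeds {max_seconds_per_item}s")
    return failures


# Usage with a stand-in service: text_filter was already live, person_filter is new.
def candidate_service(item):
    return {"text_filter": "approve", "person_filter": "approve"}


cases = [("Look at my new castle!", {"text_filter": "approve"})]
print(acceptance_check(candidate_service, cases, "person_filter", max_seconds_per_item=1.0))  # []
```

Returning a list of messages rather than a single boolean makes it easier to report exactly which existing behaviour a candidate change would break.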
Then, the updated service needs to be deployed to the live environment. Even though everything has been tested beforehand, this step always requires careful attention, as no failures are allowed on a running service. A deployment that causes 5 minutes of downtime will always have people worried, asking why the service is not working. One that causes an hour of downtime will put you under the microscope!
Finally, all these operations need to be monitored constantly, because our products cannot just be "fine". We are expected to deliver the best, for both our internal and external customers. Even though this makes every task more demanding, we strive for perfection, and together this helps maintain the great reputation The LEGO Group has.
The Data Innovation Summit has gone 100% online and become a global event!
You can now join the summit from the comfort of your home or office and enjoy the unparalleled content shared through the program. The entire program will be streamed LIVE through the event platform Agorify from 18 to 21 August 2020.
With an online ticket, you can listen to more than 300 sessions delivered by the leading data-driven companies in the world!