Data Innovation Summit turns five next March. Along the way, we have had fantastic speakers unselfishly sharing their knowledge on stage with their peers. Without them, this journey would be impossible.
This interview is part of a series dedicated to humanising Data and AI innovation and celebrating speakers who have presented at the Data Innovation Summit. The emphasis lies on the Data and AI practitioners, their professional journeys and their stories.
Data has become the most valuable resource on earth. No wonder, then, that organisations fret about giving up ownership of their data, even when the goal is to collaborate on building machine learning models.
Daniel Zakrisson talked about federated machine learning as a solution to this predicament at the Data Innovation Summit 2019. We recently reached out to him about the state of federated machine learning today and his outlook on the future of machine learning.
Hyperight: Hi Daniel, we are glad to have you with us today and have the chance to catch up. You were a speaker at Data Innovation Summit 2019. To refresh our memories and introduce yourself to our readers, please tell us a bit about yourself and the company you are coming from.
Daniel Zakrisson: My name is Daniel Zakrisson and I’m the CEO and co-founder of Scaleout. We’re building a platform that enables privacy in machine learning, removing the need to move, disclose or pool valuable, sensitive or regulated data. In daily operations, we also help organizations put advanced machine learning and DevOps technologies into production, by sharing best practices and working on machine learning projects together with teams on the customer side.
Hyperight: Next year we are celebrating our 5th anniversary. A lot has changed with data and AI during these 5 years. From your point of view, where have we seen the biggest changes and advancements?
Daniel Zakrisson: First of all, over the last 5 years data has become the most valuable resource on earth. At the same time, machine learning has become the most important tool to extract value out of this data. Machine learning systems also reach better-than-human performance in more and more areas, such as complex board games, computer games, vision and classification.
However, developing these systems is also a very costly process largely driven by large tech companies that can harness value at scale.
Front runners in the industry are now becoming data-driven in order to utilise the value in their data, build on the advancements of others, and put machine learning into production where it directly impacts their efficiency.
Hyperight: In your presentation at Data Innovation Summit 2019, you talked about federated machine learning as a solution for collaboration without giving up ownership of the data. As you said, federated learning was still in the research phase then. Can we expect to see it in the operational phase soon?
Daniel Zakrisson: Yes. The key question with federated learning is how much you trust the different data sources that work together to train a machine learning model. In simpler cases, where the data sources are within one company, the technology is ready to use. In more complex cases, such as data collaboration in a partnership or even with competitors, there is still work to be done in order to create a system that everyone can fully trust.
Hyperight: What are some challenges with federated machine learning?
Daniel Zakrisson: The main challenge with federated learning is to make everyone fully trust the trained machine learning models without being able to directly see all data and follow the process that results in the model. Today this is done with trusted setups, where a third-party trust provider will perform the central processing and orchestration. One way to remove the need for this trusted third party is the use of decentralised computation, which we are working on.
Another general challenge is around adversarial machine learning – how can we make the system robust to dishonest members and external threats? This is a wide topic that must be dealt with step by step.
Finally, the third main challenge we see is around scalability and performance as the datasets and the number of members get larger and larger. We need to work on ways to scale out the algorithms and frameworks to the fog and edge.
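To make the ideas in this answer concrete, here is a minimal sketch of a federated training loop. It is not Scaleout's platform or API; every name in it (`MEMBERS`, `local_update`, `average`, `coordinate_median`, `train`) and the toy data are assumptions made for illustration. Each member fits a tiny linear model on its own private data and shares only the resulting weights, and a central orchestrator combines them. Plain unweighted averaging is used for brevity (federated averaging usually weights each member by its sample count), and a coordinate-wise median stands in as one simple, well-known way to limit the influence of a single dishonest member.

```python
from statistics import mean, median

# Hypothetical toy data: three members, each holding private (x, y)
# samples that follow the same relationship y = 2x + 1.
MEMBERS = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(-2.0, -3.0), (-1.0, -1.0), (1.0, 3.0), (2.0, 5.0)],
    [(-1.0, -1.0), (1.0, 3.0)],
]


def local_update(weights, data, lr=0.1, epochs=5):
    """Train y = w*x + b on one member's private data.

    Only the updated weights are returned; the data never leaves the member.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def average(updates):
    """Unweighted mean of each weight across all submitted updates."""
    return tuple(mean(column) for column in zip(*updates))


def coordinate_median(updates):
    """Median of each weight across updates; one outlier cannot dominate it."""
    return tuple(median(column) for column in zip(*updates))


def train(members, aggregate, rounds=20, malicious_update=None):
    """Run the orchestrator: broadcast weights, collect updates, aggregate."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in members]
        if malicious_update is not None:
            # A dishonest member submits arbitrary weights every round.
            updates.append(malicious_update)
        weights = aggregate(updates)
    return weights


if __name__ == "__main__":
    attack = (1000.0, -1000.0)
    print("honest members, averaging:", train(MEMBERS, average))
    print("with attacker, averaging: ", train(MEMBERS, average, malicious_update=attack))
    print("with attacker, median:    ", train(MEMBERS, coordinate_median, malicious_update=attack))
```

In this toy setup, averaging recovers the shared model when everyone is honest, while a single member sending extreme weights pulls the average far away; the median-based aggregate stays close to the honest result. The orchestrator in `train` plays the role of the trusted third party described above. Replacing it with decentralised computation, and scaling to many members, are beyond what this sketch covers.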
Hyperight: Talking about the decade to come, where do you see machine learning and federated learning for that matter in 2030?
Daniel Zakrisson: Machine learning has made incredible advances over the last 5 years. Looking 10 years into the future is impossible. However, I know what I wish to see happen, and at Scaleout we are doing everything we can to steer development this way.
Ten years from now, data is considered a very valuable resource by most companies and individuals. There is a separation of data and machine learning models that allows individuals and enterprises to contribute to machine learning models without disclosing or giving up ownership of their data. Schemes exist that allow collaboration and fair compensation for those who contribute their data to create powerful machine learning models.
As an example, joint machine learning models have enabled solutions to some of the really hard machine learning problems we face today (such as self-driving cars, affordable drug development, the energy shift to renewables and improved farming practices), made possible by collaboration between many different data owners who bring both size and diversity in their data sets.
The most powerful machine learning models will not be proprietary assets of individual companies anymore, but jointly owned by all those who contributed to their creation. This creates a new dynamic for data owners, where network effects around particular joint data sets and models will make it impossible for individual companies to corner a specific market.
The Data Innovation Summit has gone 100% Online and become a Global event!
You can now join the summit from the comfort of your home or office, and enjoy the unparalleled content shared through the program. The entire program will be streamed LIVE through the event platform Agorify from 18 to 21 August 2020.
Register on the link below to get your online ticket and listen to more than 300 sessions delivered by the leading data-driven companies in the world!