The advanced analytics and data warehousing scene is changing at the speed of light. With the advent of big data, streaming data, IoT, and the cloud, the changes never cease. It may seem like a very different world, with different concepts, terms, and techniques. Or is it?
While lots of people still talk about data warehouses and data marts, the conversation now includes data science, machine learning, and AI! How do these things bring value and data-based decisions to our organizations? Is there a coherent architecture we can define?
Kent Graziano, Chief Technical Evangelist for Snowflake, will explore these and other topics to portray what the data analytics platform of the future might look like on the Analytics and Visualisation stage at the Data Innovation Summit.
Hyperight: Hi Kent, we are very excited to have you at the Data Innovation Summit 2020. Could you please give us a short introduction about yourself and your role at Snowflake?
Kent Graziano: I am the Chief Technical Evangelist for Snowflake. I have been with the company for a little over 4 years now, starting when we were only about 100 people! I am an author, speaker, and trainer, with expertise in the areas of data modelling, data architecture, and agile data warehousing. I am an Oracle ACE Director (Alumni), Knight of the OakTable Network, one of the first certified Data Vault Masters and a certified Data Vault 2.0 Practitioner (CDVP2).
I have been a data architect for well over 30 years, including more than 25 years designing data warehousing and business intelligence solutions. I have written about and been recognized as an expert in Cloud and Agile Data Warehousing. I have written numerous articles and blogs on these topics and give talks about them globally. I have authored three Kindle books (available on Amazon.com) and co-authored four books (including the 1st Edition of The Data Model Resource Book). I am a co-author of the first book on Data Vault, and the technical editor for Super Charge Your Data Warehouse. You can follow me on Twitter @KentGraziano or on my blog, The Data Warrior.
Hyperight: You are going to deliver a presentation on the Analytics and Visualisation stage on the subject of The Future of Analytics in the Cloud. Indeed, the advanced analytics scene is changing at the speed of light. What did data analytics look like 5 years ago, and what does it look like now that the cloud plays a central role?
Kent Graziano: Five years ago, folks were just starting to think about the cloud. Many were honestly scared and worried about things like the security of their data in the cloud. Many were also struggling with what to do with Hadoop on-premises. Data volumes and types were growing rapidly, and folks were really struggling to get value from all that data. The on-prem systems, RDBMS and Hadoop, just were not meeting the business need to adapt quickly and deliver timely insights from all this data.
Today, with the advent and maturation of the cloud and cloud data platforms (like Snowflake), businesses are finding it much easier to get all their data, structured and semi-structured, into one secure and governed location, not only in a timely manner but also at a much lower TCO than they could ever have dreamed possible with legacy on-prem databases and appliances. This has allowed many organizations to greatly reduce their time to value and really start leveraging their data for better business outcomes.
Hyperight: There’s a growing business need for data processing and real-time data insights which fuels increasing investment into enterprise data warehouse development. Could you share some best practices for data warehouse development?
Kent Graziano: Happy to. Last year I even wrote an ebook on that very topic – 5 Best Practices for Developing a Data Warehouse.
In summary, they are:
- Create a data model – This is key to understanding the true business semantics of the data and helps provide a catalogue of the data in your warehouse in business terms.
- Adopt an agile data warehouse methodology – time to value must be reduced to stay competitive. The software development world created agile methods and practices almost two decades ago to enable faster delivery of web-based systems. It is time for us to take similar approaches in the data world if we are going to keep up with the rapidly changing data landscape and deliver this data to the business in a form they can consume, in a timely manner.
- Favour ELT over ETL – with a cloud data platform, it just makes sense to keep the data as close to the repository as possible. Why pull the data out of the platform to change it, just to push it back in again? Instead, rely on the power and security you get by using the platform.
- Adopt a data warehouse automation tool – if you want to be agile, reduce time to value, and integrate new sources quickly, the best way to do that is with automation. Automate the build of the data warehouse and the generation of the ELT pipeline. Then automate the testing too! There is a growing number of tools and frameworks on the market that do this.
- Train your staff on new approaches – this is a critical success factor. If we are going to adopt new platforms (cloud), new methods (agile) and new ways of working (ELT and automation), it is not just new skills, but a new mindset as well. The data team needs to be trained, coached, and mentored in all of this to really become successful and achieve true velocity in delivery. And you need to allow time for this change and adoption to really take hold.
Hyperight: According to Gartner, one of the top strategic technology trends for 2020 is distributed cloud, which they describe as the new era of cloud computing. What is your take on this? Is distributed cloud the future of cloud computing?
Kent Graziano: At Snowflake, we are seeing this already. Global organizations need their data available in many locations for a variety of reasons. Some have to do with privacy and regulatory compliance, others have to do with performance and latency when running analytics. That is why we introduced cross-region replication to allow our customers to replicate their data to wherever it makes sense for the business, in a simple, automated fashion. In fact, we are also seeing a trend for multi-cloud – needing the data to be not only on AWS, but Azure and GCP as well. For that reason, we also provide our customers with the ability to implement cross-cloud replication so they can build a truly globally distributed, multi-cloud, data ecosystem.
Hyperight: And lastly, what is your future outlook on data analytics? What are the changes that companies can expect?
Kent Graziano: The future for analytics is global and multi-cloud for sure. To be successful in the modern era, organizations must adopt a cloud-based data architecture that allows them to ingest, share, and act on data in a secured and governed manner. To that end, I see the evolution now towards architectures and platforms that allow companies to put all their data in one place, on a single platform, and easily share it with others. The concepts of data lake, data warehouse, and data marts are converging. The technology in the cloud gives us the opportunity to put all these conceptual frameworks together in one place so we can truly democratize all of a company’s data, making it available not only to the C-suite and analysts, but also to data scientists and every person in the company who can benefit from access to data in order to drive better business outcomes.
The Data Innovation Summit has gone 100% Online and become a Global event!
You can now join the summit from the comfort of your home or office and enjoy the unparalleled content shared through the program. The entire program will be streamed LIVE through the event platform Agorify from 18 to 21 August 2020.
Register on the link below to get your online ticket and listen to more than 300 sessions delivered by the leading data-driven companies in the world!