Data Exploration Made Easy with Open Source

Tools-in-Action

Room: AmigaOS

Scheduled at: Wednesday 14:15-14:40

Abstract

Exploring data is HARD! I bet you’ve often asked yourself: how can I possibly run SQL queries on Parquet and other tabular formats inside my own data lake?

While running a modern data lake certainly has a lot of upside, it’s pretty obvious that object stores (the bedrock of every cloud data lake) were never designed to be data warehouses. Sometimes all we need is to explore our data: look at its schema, compare versions, and more, ideally without having to install any additional tools or components.

This is where we can leverage the power of open source. In this talk I’ll share how the open source lakeFS project embedded DuckDB to enable just this kind of experience, natively from within the lakeFS UI. With DuckDB, data engineers get a simple, performant way to explore data without having to run expensive and complex distributed systems, all within their existing workflow. You’ll learn about the benefits of DuckDB within lakeFS, what this looks like in practice, and why you should try this at home.

Oz Katz

Oz Katz is the CTO and co-founder of Treeverse, the company behind lakeFS, an open source platform that delivers resilience and manageability to object-storage-based data lakes. Oz engineered and maintained petabyte-scale data infrastructure at analytics giant SimilarWeb, which he joined after the acquisition of Swayy.

Presentation type: Tools in Action, 25 min
Track: Big Data, Machine Learning, AI & Analytics
Presentation level: Intermediate
Keywords: Big Data exploration open source
