dltHub October ’24: User meetups, towards iceberg support
1. Community section
Community Milestones
We are thrilled to announce that over 3,000 private sources were built last month alone! In 2025, we want to enable you to share these sources so others can benefit from your work too.
Upcoming Events:
dltHub Portable data lake launch Berlin: 26 Nov 2024. Sign up here!
Community organised Melbourne Meetup!
Past events:
We are running a series of community meetups where users demo their work and we demo ours. So far we have been to San Francisco, New York and Paris, and Berlin is next week.
These meetups featured showcases from our users, with demos from
Alex Butler from Harness,
Josh Wills from Datalogy,
Wang Chen, Joan Andre, Sebastien Clouet from Stellantis,
Christophe Blefari CTO/Co-Founder of nao & writer of the blef newsletter,
Nicolas Estrada, 42,
Matthieu Rousseau, Modeo.
The community vibes were extraordinary. We will publish the recordings as they become ready.
Josh’s talk from San Francisco is already uploaded. Watch Josh Wills, “Shift Yourself Left”: Integration testing for data engineers
Do you want to talk about how you use dlt? Let us know (reach out to Adrian or Alena) so we can share your message with the dlt community.
Community contributions
• Fix BigQueryLoadJob hiding root cause exception by @xneg in #1992
• fix: if name of distribution is None by @senickel in #2024
• Fix pagination issue in JSONResponseCursorPaginator with empty string cursor value by @kang8 in #2016
• fix typo by @mariarice15 in #1995
• simplify advanced section by @kning in #2037
• Update url in deploy-with-airflow-composer.md by @FriedrichtenHagen in #1942
Selected community play projects
Find more here: https://github.com/dlt-hub/dlt/network/dependents
2. What we recently did
Feature developments
Added incremental lag, for use cases like re-acquiring the last X days of data from reporting APIs that restate recent figures.
Preparations for a new interface to the loaded data, coming in the next release. Read the early details here.
Preparations for the OSS segment of the portable data lake.
Various improvements and bugfixes.
Read the detailed commit/release logs here: dlt, Sources.
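To make the incremental lag idea above concrete, here is a minimal plain-Python sketch. It is not dlt's API: the helper names (`lagged_start`, `fetch_since`, `merge_by_key`) and the sample report rows are made up for illustration. It assumes a reporting API that restates recent days, so each run moves the stored cursor back by a lag window and merges the re-fetched rows over what was already loaded.

```python
from datetime import date, timedelta


def lagged_start(last_value, lag):
    """Move the stored cursor back by `lag` so recent rows are fetched again."""
    return last_value - lag


def fetch_since(api_rows, since):
    """Stand-in for an API call: return rows on or after `since`."""
    return [row for row in api_rows if row["day"] >= since]


def merge_by_key(existing, fetched, key="day"):
    """Upsert fetched rows over existing ones, so restated rows win."""
    merged = {row[key]: row for row in existing}
    for row in fetched:
        merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


# Rows loaded by the previous run (hypothetical data).
already_loaded = [
    {"day": date(2024, 10, 1), "clicks": 10},
    {"day": date(2024, 10, 2), "clicks": 5},
]

# What the API reports now: 2 October was restated, 3 October is new.
api_rows = [
    {"day": date(2024, 10, 1), "clicks": 10},
    {"day": date(2024, 10, 2), "clicks": 8},
    {"day": date(2024, 10, 3), "clicks": 4},
]

last_value = max(row["day"] for row in already_loaded)
start = lagged_start(last_value, timedelta(days=1))
result = merge_by_key(already_loaded, fetch_since(api_rows, start))
```

Re-fetching a window only helps if the load step upserts on a key, as `merge_by_key` does here. With a plain append, the lagged rows would show up as duplicates.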
Early design partnerships and user research
We have been working on the portable data lake, taking feedback from some early design partners. Since it’s a new product with many facets, there’s a lot to learn about how to materialise our ideas into something you will enjoy using.
If you are looking for
a way to manage dlt for teams
a way to extend dlt into your data platform
a way to do proactive governance for data lake development
a way to get python+iceberg into production
a way to cut large scale compute costs
a way to package your “data product” into pip installable portable data lakes
then the portable data lake might be for you, and you should get in touch for a design partnership for early access or sign up to the waiting list.
3. Coming up next
Meet us in Berlin at our paid product launch.
Listen to talks from Personio, Forto, Flatiron Health, Bayer, Taktile, Tower and Data Talks Club. Join here!
Our Iceberg roadmap
We are preparing our iceberg roadmap for the next half year. It aims to bring pythonic iceberg to production. Part of this effort will go into dlt OSS to help you create iceberg pipelines, while another part will go towards our paid offering, which aims to help you create iceberg or agnostic data platforms.
A new interface to filesystem data querying
We call it datasets, and it will provide a uniform client for accessing data loaded to filesystems.
💡 Get involved: See our short-term roadmap here and tell us what you need.
Stay tuned for the next edition!