dltHub August '24: Towards Version 1.0 - trust and stability
1. Community
The dltHub community is building pipelines faster than ever and has now passed 8k private sources. In other words, we see about 10x more usage now than we did at the beginning of the year. We are seeing all kinds of interesting work among our dependents, from government ministries and other vendors using dlt under the hood to various data platform experiments.
dlt in social media - give back in a simple way.
If you've benefited from dlt and appreciate our small, community-driven approach, consider giving us a shout-out. We're still a small company, and your contribution in this area enables us to allocate our resources where you need them most. The data engineering community is large and fragmented and most of it can only be reached by your word of mouth.
First certifications, and the second edition of the ELT with dlt workshop: US timezone.
First run was great!
600 of you signed up to attend the ELT with dlt workshop. Of those, half showed up live, 60 of you submitted homework for Part 1 or 2, and 13 of you have already passed. The homework deadline is the end of this month, so better hurry!
Watch the first recordings here, or join the next cohort (see below).
…and then we made it better
Since it was so well received, we are applying our learnings and your feedback to bring you an improved version that will be more engaging. This one’s running in US timezones.
The next run will be on the 19th and 26th of September at 9 AM PDT // 12 noon EDT // 6 PM CEST.
Join here.
GDPR and HIPAA got you down? Lawyer up!
Governance and compliance are topics we often do not fully understand: what exactly must we do, and what should it look like? In practice, there is a lot we can do to comply, apply best practices, and demonstrate effort to protect private user data. To that end, we hired a specialist in the legal and applied aspects of data protection to give webinars on GDPR and HIPAA. She comes highly recommended by data professionals in Berlin.
GDPR: 2024-09-26 7:30 AM PDT / 8:30 AM MDT / 9:30 AM CDT / 10:30 AM EDT / 3:30 PM BST / 4:30 PM CEST
HIPAA: 2024-10-02 9:00 AM PDT / 10:00 AM MDT / 11:00 AM CDT / 12:00 PM EDT / 5:00 PM BST / 6:00 PM CEST
Join here.
Community contributions
Update salesforce.md by @makies in #1665
Correct the library name for mem stats to psutil by @deepyaman in #1733
2. What we recently did
Trust & Stability: Version 1.0 of dlt is coming
dltHub's CTO and co-founder, Marcin, recently announced that we are about to release dlt 1.0. Read Marcin’s note on GitHub (https://github.com/dlt-hub/dlt/issues/1778).
A history of reliability and efficiency
dlt has been powering enterprise-grade mission-critical components in large deployments for over 2 years without maintenance. We publicly launched just over 1 year ago, and we are starting to hear from you that your pipelines are running for over 1 year with no maintenance.
A commitment to enterprise-grade stability
This release is specifically designed to meet the high expectations for trust and stability in enterprise environments. As part of this release we will move some commonly used sources such as SQL, REST API, and Filesystem into the core, signifying our commitment to their maintenance.
💡 Interested in our SMB or Enterprise support? Contact our solutions engineering team.
A standard is more embeddable when its API is stable
It’s no secret that multiple data-tool companies are using dlt under the hood. For example, the PostHog Data Warehouse product runs over 20,000 dlt jobs daily for their customers from common sources to common destinations. Some tools like Ingestr (which is a light wrapper on our SQL source) even have more GitHub stars than we do. We can see a few other vendors in our dependents, and we often hear from agencies that build themed connector packages for the verticals they serve.
In a commitment to being a predictable, reliable open source partner, v1 will also bring stability to dlt’s API for your embedded usage.
Feature developments
Core
BigQuery: Can now load to a different project than specified in credentials
Delta Tables: Enabled schema evolution and partitioning. Added storage options and fixed bugs related to delta tables.
Performance enhancements and improvements around metadata.
Support for external locations and staging configurations for Databricks and Azure.
Enhanced SCD2 support with record reinsertion capabilities and customizable "valid from" / "valid to" values
Staging can now be configured to be truncated or left intact.
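To make the SCD2 item above more concrete, here is a minimal pure-Python sketch of what slowly changing dimension type 2 with record reinsertion and custom validity column names means. It is an illustration of the general technique, not dlt's implementation or API: the function `scd2_merge`, its parameters, and the column names `from_ts` / `to_ts` are hypothetical, and the sketch assumes every load is a full snapshot of the source.

```python
from datetime import datetime, timezone


def scd2_merge(history, snapshot, key, load_ts,
               valid_from="valid_from", valid_to="valid_to"):
    """Apply one full snapshot to an SCD2 history kept as a list of dicts."""

    def payload(row):
        # Compare business columns only, ignoring the validity columns.
        return {k: v for k, v in row.items() if k not in (valid_from, valid_to)}

    incoming = {row[key]: row for row in snapshot}
    active = {row[key]: row for row in history if row[valid_to] is None}

    # Retire active versions that changed or vanished from the snapshot.
    for k, row in active.items():
        if k not in incoming or payload(row) != incoming[k]:
            row[valid_to] = load_ts

    # Add a version for every new, changed, or reappearing record.
    for k, row in incoming.items():
        if k not in active or payload(active[k]) != row:
            history.append({**row, valid_from: load_ts, valid_to: None})
    return history


# Hypothetical timestamps and column names, for illustration only.
t1, t2, t3 = (datetime(2024, 8, d, tzinfo=timezone.utc) for d in (1, 2, 3))
names = {"valid_from": "from_ts", "valid_to": "to_ts"}  # custom column names

history = []
scd2_merge(history, [{"id": 1, "plan": "free"}], "id", t1, **names)
scd2_merge(history, [], "id", t2, **names)  # record deleted upstream
scd2_merge(history, [{"id": 1, "plan": "free"}], "id", t3, **names)  # reinserted
```

After the three loads, `history` holds two versions of record 1: one closed at the second load and one open since the third, which is the reinsertion behaviour in miniature.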
Connectors
Column selector for SQL source
REST API pagination and docs were improved
Read the detailed commit/release logs here: dlt, Sources.
3. Coming up next
Meet us at events
London, September 18th-19th
Big Data London Conference - September 18th to 19th
If you attend Big Data London, meet us at booth X215 in the Discovery Zone. We are right next to our friends at Tobiko Data/sqlmesh and Euno.
Dig Ventures + dltHub networking breakfast - September 18th
On the morning of September 18th we are co-organising a networking breakfast for data professionals. The event will be attended by, amongst others, Ross Mason, the CTO/co-founder of Mulesoft.
Join us by filling out this form.
San Francisco, September 23rd-24th
We are one of the sponsors of the Small Data SF conference. Come by our booth. If you are a user of dlt, let us know and let us help you with free/discounted tickets.
Upcoming development
While we are working on launching v1, there’s still room for some other upcoming things:
We are working on a sqlite-powered SQL destination to bridge the other SQL dialects, and an iceberg-native destination
Quality of life improvements for developer experience: Improvements in dropping columns, handling state and schemas
Improvements to sources and destinations for finer grained control and extension.
💡 Get involved: See our short-term roadmap here and tell us what you need.
Thanks for being a part of the dltHub community! Together, we are building a foundational standard for data ingestion. Help us accelerate our mission by telling a friend about dlt.