dlt 02/24 update: Community Growth, Async generators, towards source generation
1. Something about you
The Slack community grew by 25% in the last month thanks to the influx of DataTalksClub data engineering zoomcamp students, where dlt was introduced. This attracted a mix of personas into our group, from Python laypeople to senior data people in non-engineering roles, putting dlt’s simplicity to the test.
Highlighted contributions
Evan Phillips and swishbi.com contributed code, time, and testing for the Databricks destination.
willi-mueller contributed the Bing Webmaster source to our sources.
glebzhidkov contributed a fix to the Notion pipeline.
salomartin offered two gists: an SDMX source (Eurostat, OECD, ECB, most national statistics agencies, etc.) and an async Postgres source.
mhauzirek improved our blog article that compares semantic modelling capabilities.
taljaards and snehangsude fixed some issues with our docs; we got the hint and will run a GPT spell checker :)
And many of you contributed by asking questions, effectively highlighting unclear docs, gotchas or issues.
💡 Do you want to contribute but are unsure how? Ask for help in our Slack channel #2-sharing-and-contributing.
2. dlt in recent weeks
Doubling down on user understanding - New Solutions Engineering Team
We have started a solutions engineering team to better support your onboarding and to learn how we can improve our product for your needs.
You will notice small changes to our processes around capturing your Slack questions and offering onboarding support.

The data normie generation steps out of the boundaries of classical data warehousing.
At dlthub, we want to be grounded in the current reality of your work and build solutions for your use cases and needs.
💡 Do you need help onboarding? Book an exploration call or fill out a dlt support program form!
Constant docs improvements
Thanks to DataTalksClub, we recently had a large influx of users and a lot more engagement. We are listening to all the input and using it as an opportunity to keep improving our docs:
Using your questions in Slack to improve our docs
Using GPT to fix spelling errors and grammar
We are working towards more automation, such as self-documenting pipelines.
Async generators are now supported
This enables performance boosts and better streaming support. You can find more info under performance docs.
Community member Martin Salo shared an example of using this feature to create a Postgres copy pipeline that is 15x faster than Fivetran. He also tweeted about using it in an ERP pipeline for 182x cost and 10x speed gains.
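To give a feel for the feature, here is a minimal sketch of an async generator used as a resource; the resource name, the sleep stand-in for an API call, and the duckdb destination are illustrative assumptions, not a prescribed setup.

```py
import asyncio

import dlt

@dlt.resource(name="async_items")
async def async_items():
    # stand-in for paginated async API calls
    for page in range(3):
        await asyncio.sleep(0.1)  # simulate awaiting an HTTP response
        yield [{"page": page, "value": page * 10}]

# load the async resource like any other; dlt consumes the async generator during extraction
pipeline = dlt.pipeline(pipeline_name="async_demo", destination="duckdb", dataset_name="demo")
print(pipeline.run(async_items()))
```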
Moving towards CDC (change data capture) support, enabling near-real time clones of databases and other systems.
We added the hard_delete and dedup_sort column hints for merge: PR, docs.
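As a rough illustration, here is a sketch of how these hints can be declared on a merge resource; the table and column names are made up for the example.

```py
import dlt

@dlt.resource(
    primary_key="id",
    write_disposition="merge",
    columns={
        "deleted": {"hard_delete": True},      # rows flagged as deleted are removed from the destination
        "updated_at": {"dedup_sort": "desc"},  # keep the newest record when deduplicating within a load
    },
)
def users():
    yield [
        {"id": 1, "deleted": False, "updated_at": "2024-02-02T10:00:00Z"},
        {"id": 1, "deleted": False, "updated_at": "2024-02-01T10:00:00Z"},  # older duplicate, dropped by dedup_sort
        {"id": 2, "deleted": True, "updated_at": "2024-02-02T11:00:00Z"},   # removed via hard_delete
    ]

pipeline = dlt.pipeline(pipeline_name="merge_demo", destination="duckdb", dataset_name="demo")
pipeline.run(users())
```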
Additionally, we prioritised a few usability improvements, such as more control over destinations, above bigger milestones like the bad data sink. We still plan to release sink destinations for reverse ETL in the coming weeks.
New destinations
Databricks destination. Big thanks to Evan Phillips and swishbi.com for contributing code, time, and a test environment.
Azure Synapse destination.
Google Drive destination enabled via filesystem; read more in the updated docs.
New sources:
Google Drive source enabled via filesystem; read more in the updated docs.
Bing Webmaster source added by community contributor Willi.
CSV reader with DuckDB added to the filesystem source: PR.
Kafka source.
3. Coming up next
Short-term vision: make sure the library is not only liked by you, but “loved”
We did a lot of user interviews recently to deeply understand what you like about dlt and what you want improved. Many users pick dlt because it is the fastest and most lightweight way to create a dataset. As we frequently hear from you: “dlt is pip install and go”.
In Q1 a big focus of ours is to invest even more into what the community appreciates about dlt. Every week we automate small things in your workflows and integrate the requirements you offer.
In the spirit of our value “Multiply, don’t add” we are doing things that compound to make dlt the data loading tool we all wish we had.
Build sources fast
Run pipelines fast
Document pipelines fast
Build destinations fast
Build transformations fast
Any related feedback or suggestions? Let me (Adrian) know. We will present some of these projects in the upcoming product updates.
The HTTP API source is a stepping stone to source generation
The upcoming HTTP API source will essentially be a “building kit” for HTTP API sources, reducing the effort of building one to merely declaring endpoints and authentication. This makes it very simple to build sources, but it also means that we can build sources from information that can be obtained automatically. This brings us closer to being able to point at an API and generate the entire pipeline.
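To make the idea concrete, here is a purely hypothetical sketch of what declaring such a source could look like; the keys, defaults, and auth handling below are working assumptions for illustration, not the final API.

```py
# Hypothetical declarative spec for an HTTP API source, "building kit" style.
source_spec = {
    "client": {
        "base_url": "https://api.example.com/v1/",
        "auth": {"token": "<api token>"},  # placeholder secret
    },
    "resources": [
        "customers",  # plain endpoint, defaults applied
        {
            "name": "orders",
            "endpoint": {"path": "orders", "params": {"status": "open"}},
        },
    ],
}
```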
On our radar
You can find our current issue board here. Noteworthy are better Athena support, SCD2, bad data sinks and a simplified destination for reverse ETL.
dlt verified sources/destinations in the works
We are working more closely with some of you to explore various ways of onboarding. A handful of sources will come out of this, which we will add in the next weeks: Scrapy for web scraping, Betteruptime, AWS Cost Explorer, and possibly others.
If you need a source added, please request it or offer it by opening a GitHub issue or asking for it in Slack.
4. Community queries + Opportunities
Call for Snowflake + dlt customers
We are in the process of becoming a Snowflake-ready technology partner. If you are a Snowflake customer who uses dlt, we would love to get on a 30-minute call to learn about your use case and see if/how we can additionally help.
💡 If you are interested, please reach out to Adrian in our Slack!
Are you writing or talking about dlt? Do you want to present dlt at local meetups?
Friends don’t let friends build crappy pipelines. We want to empower you to be a voice for dlt.
If you are happy with dlt and would like to tell others, please consider:
If you want to write about dlt, let us know in #2-sharing-and-contributing. We will support you with resources, feedback, and visibility.
If you want to present at local meetups on behalf of dlt, let us know in #2-sharing-and-contributing.
Closing words
Thank you for being part of this journey with us, and thank you for your continued support.
As a data engineer, I am excited to see such technical excellence, and as a professional, I am deeply humbled by the quality of the professionals who work with us towards creating better, simpler, saner solutions.
The passion, professionalism, dedication, open-mindedness, and inquisitive mindset of our community members echo my own feelings about the project.
Adrian Brudaru
Data Engineer & dlt Cofounder