bradleybuda 2 hours ago

I really wish data engineers didn't have to hand-roll incremental materialization in 2024. This is really hard stuff to get right (as the post outlines) but it is absolutely critical to keeping latency and costs down if you're going to go all in on deep, layered, fine-grained transformations (which still seems to me to be the best way to scale a large / complex analytics stack).

My prediction a few years back was that Materialize (or similar tech) would magically solve this - data teams could operate in terms of pure views and let the database engine differentiate their SQL and determine how to apply incremental (ideally streaming) updates through the view stack. While I'm in an adjacent space, I don't do this day-to-day so I'm not quite sure what's holding back adoption here - maybe in a few years more we'll get there.
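To illustrate the "pure views" workflow described above: in a Materialize-style system you declare the end state as an ordinary view and the engine maintains it incrementally as upstream data changes. A minimal sketch (table and column names are hypothetical):

```sql
-- Hypothetical example: declare what the result should be,
-- and the engine keeps it up to date incrementally.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    order_date,
    sum(amount) AS revenue
FROM orders            -- assumed upstream table or stream
GROUP BY order_date;

-- No merge logic, watermarks, or backfill bookkeeping in user code:
-- the dataflow engine propagates streaming updates through the view.
```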

0cf8612b2e1e 4 hours ago

Is anyone using SQLMesh in production? I love “lessons learned” tools which have the opportunity to improve core design after seeing the weak points of the initial product in the space. That being said, I hate being an early adopter, so will let others determine if the new tool has an entirely novel set of shortcomings vs dbt.

  • captaintobs 4 hours ago

    There are many teams using SQLMesh in production. Fivetran, Harness, Hopper, Pitchbook to name a few.

    You can read some case studies here https://tobikodata.com/harness.html or join Slack to meet with folks to learn more about their experiences.

pdr94 6 hours ago

Great to see dbt finally rolling out microbatch incremental models! It's a much-needed feature and a step forward for data transformation. Excited to see how this evolves and complements tools like SQLMesh. Keep up the good work!
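For readers who haven't seen the feature yet, a microbatch incremental model looks roughly like this — the model and column names are made up, and the config keys follow dbt's microbatch incremental strategy:

```sql
-- models/sessions.sql (hypothetical model)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='occurred_at',   -- column dbt uses to slice batches
    batch_size='day',           -- one batch per day
    begin='2024-01-01',         -- earliest batch for backfills
    lookback=3                  -- reprocess recent batches for late-arriving data
) }}

select user_id, occurred_at, page
from {{ ref('page_views') }}   -- dbt filters this input to each batch's time window
```

The point of the strategy is that dbt, not the model author, handles slicing the input by time window and replacing each batch idempotently.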

  • captaintobs 6 hours ago

    Thanks! Yes, it's a much requested feature but it's difficult to get right!

whinvik 2 hours ago

Can someone who understands it explain what dbt is and how it is used? I hear a lot about it, but I just haven't figured out what it is useful for.

  • gkapur an hour ago

    Basically, people are constantly calculating metrics based on existing tables. Think something as simple as a moving average, or the sum of two separate columns in a table. Once upon a time you would set up a cron job and populate these every day with a SQL query in some Python or Perl script.

    dbt introduced a language for managing these “metrics” at scale, including the ability to use variables and more complex templates (Jinja).

    Then you run dbt run (https://docs.getdbt.com/reference/commands/run) and, kapow, the metric is populated in your database.

    More broadly, dbt did two other things: 1. It pushed the paradigm from ETL to ELT (so stick all the data in your warehouse and then transform it there, rather than transforming it at extraction time). 2. It created the concept of an “analytics engineer” (previously known as the guy who knows SQL, or business analyst).
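    Concretely, the moving-average example above might look like this as a dbt model (the model, table, and column names here are hypothetical):

    ```sql
    -- models/revenue_7d.sql (hypothetical)
    {{ config(materialized='table') }}

    select
        order_date,
        sum(amount) as daily_revenue,
        -- 7-day moving average of daily revenue
        avg(sum(amount)) over (
            order by order_date
            rows between 6 preceding and current row
        ) as revenue_7d_avg
    from {{ ref('orders') }}  -- ref() resolves to the upstream model/table
    group by order_date
    ```

    dbt run compiles the Jinja, executes the SQL against your warehouse, and materializes the result as a table; ref() is also how dbt infers the dependency graph between models.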

  • tiew9Vii 2 hours ago

    Some opinionated conventions around defining templated SQL queries (configured via YAML files) for ETL.

    Then it provides additional tooling around that: GUIs, governance, everything your average large corporate asks for.

  • bitlad 2 hours ago

I am not sure it is that popular these days. A couple of years ago it was pretty popular.