Summary of Rethinking Orchestration as Reconciliation: Software Defined Assets in Dagster | Elementl

This is an AI-generated summary. There may be inaccuracies.

00:00:00 - 00:25:00

The talk argues that software-defined assets are a better way to manage data than the traditional imperative approach. Assets should be the primary abstraction in an orchestration system, because the orchestrator is in the best position to act as the source of truth for assets. Additionally, the talk suggests that software-defined assets allow Python to become a first-class citizen in the modern data stack.

  • 00:00:00 This talk is about the move from an imperative to declarative paradigm in various software domains, and how this can be applied to data processing. The declarative paradigm is better at managing complexity and change, and the talk gives an example of how this can be applied to a real-world scenario.
  • 00:05:00 In the video, the speaker discusses the idea of "orchestration as reconciliation" and how it can be used to manage data more effectively. He introduces the concept of a "software-defined asset" and explains how it can be used to declaratively specify the contents of an asset and how it should be computed.
  • 00:10:00 In the video, the speaker discusses the concept of software-defined assets, which are assets that are defined in code and exist independently of any physical storage. He demonstrates how these assets can be managed and tracked using the open-source tool Dagster, and how they can be used to power data-driven applications.
  • 00:15:00 In this video, the speaker discusses the idea of using an asset orchestrator to manage assets instead of tasks. He argues that this would allow for better visibility into what has happened with assets and make it easier to answer questions about them. He demonstrates how this would work with a few examples of discrepancies that need to be reconciled.
  • 00:20:00 In the "pre-modern" world, data processing was done in an imperative way, with an orchestrator like Airflow invoking computations across a variety of frameworks. This had many drawbacks, but one advantage was that it was easy to understand everything that was running in one place, and to make sure that everything was running in the right order. In the "modern" data stack, each tool has its own way to schedule work and view the assets that are defined inside of it, which has led to a loss of a central control plane and has made it difficult to understand what is happening with data processing.
  • 00:25:00 This talk examines the idea of software-defined assets, or assets that are declared in code, as a way to manage data more effectively. The talk argues that assets should be the primary abstraction in an orchestration system, because the orchestrator is in the best position to act as the source of truth for assets. Additionally, the talk suggests that software-defined assets allow Python to become a first-class citizen in the modern data stack.
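
The idea of "orchestration as reconciliation" summarized above can be illustrated with a small, self-contained Python sketch. This is not Dagster's API and not code from the talk: the names `AssetDef`, `find_stale`, and `reconcile`, the `code_version` field, and the example assets are all hypothetical, chosen only to show the pattern. Each asset is declared along with its upstream dependencies and the function that computes it; the reconciler compares those declarations with what has actually been materialized and recomputes only what is out of date.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AssetDef:
    """Hypothetical declaration of an asset: its name, inputs, and computation."""
    name: str
    deps: Tuple[str, ...]
    compute: Callable[..., object]
    code_version: str


def topo_order(defs: Dict[str, AssetDef]) -> List[str]:
    """Order asset names so each one comes after its dependencies.

    Assumes the dependency graph has no cycles (no cycle detection here).
    """
    ordered: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in defs[name].deps:
            visit(dep)
        ordered.append(name)

    for name in sorted(defs):
        visit(name)
    return ordered


def find_stale(defs: Dict[str, AssetDef], materialized: Dict[str, str]) -> List[str]:
    """List assets whose recorded state disagrees with their declaration.

    An asset is stale if it was never materialized, was materialized with a
    different code version, or has a stale upstream asset.
    """
    stale: List[str] = []
    for name in topo_order(defs):
        d = defs[name]
        if materialized.get(name) != d.code_version or any(dep in stale for dep in d.deps):
            stale.append(name)
    return stale


def reconcile(defs: Dict[str, AssetDef], materialized: Dict[str, str],
              store: Dict[str, object]) -> List[str]:
    """Recompute every stale asset and record the version it was built with."""
    stale = find_stale(defs, materialized)
    for name in stale:
        d = defs[name]
        store[name] = d.compute(*(store[dep] for dep in d.deps))
        materialized[name] = d.code_version
    return stale


# Made-up example assets: a raw table and a value derived from it.
defs = {
    "raw_orders": AssetDef("raw_orders", (), lambda: [12, 30, 8], "v1"),
    "order_total": AssetDef("order_total", ("raw_orders",), lambda orders: sum(orders), "v1"),
}
materialized: Dict[str, str] = {}  # asset name -> code version last materialized
store: Dict[str, object] = {}      # stand-in for physical storage

first = reconcile(defs, materialized, store)   # nothing exists yet, so both are built
second = reconcile(defs, materialized, store)  # state matches declarations, nothing to do
```

In this sketch the user never says "run task A, then task B". They only declare what should exist, and the reconciler derives the work from the discrepancy between the declared and the actual state. A real asset orchestrator would track far more than a version string (for example run history and data freshness), so this should be read as a toy model of the idea only.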