- Added a `report_asset_check` REST API endpoint for runless external asset check evaluation events. This is available in cloud as well.
- The `config` argument is now supported on `@graph_multi_asset`.
- [ui] Improved performance for global search UI, especially for deployments with very large numbers of jobs or assets.
- [dagster-pipes] Add S3 context injector/reader.
- [dagster-dbt] When an exception is raised while running a dbt command, error messages from the underlying dbt invocation are now properly surfaced in the Dagster exception.
- [dagster-dbt] The path to the dbt executable is now configurable in `DbtCliResource`.
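As an illustration of the `report_asset_check` endpoint mentioned above, the sketch below builds (but does not send) a request using only the Python standard library. The base URL, the asset key `my_asset`, the check name `no_null_ids`, and the query parameter names `check_name` and `passed` are assumptions made for this example; confirm the exact path and parameters against the REST API docs for your Dagster version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_report_asset_check_request(
    base_url: str, asset_key: str, check_name: str, passed: bool
) -> Request:
    """Build a POST request that reports one asset check evaluation.

    Assumed URL shape (not taken from the changelog):
    {base_url}/report_asset_check/{asset_key}?check_name=...&passed=...
    """
    # Booleans are sent as lowercase strings in the query string.
    query = urlencode({"check_name": check_name, "passed": str(passed).lower()})
    path = quote(asset_key, safe="/")
    url = f"{base_url.rstrip('/')}/report_asset_check/{path}?{query}"
    return Request(url, method="POST")


# Hypothetical local webserver address and names, for illustration only.
request = build_report_asset_check_request(
    "http://localhost:3000", "my_asset", "no_null_ids", passed=True
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` while a webserver is running at the chosen address.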
- Fixed a bug introduced in 1.5.3 that caused errors when launching specific Ops in a Job.
- Fixed a bug introduced in 1.5.0 that prevented the `AssetExecutionContext` type annotation from being used for the `context` parameter in `@asset_check` functions.
- Fixed an issue where the Dagster scheduler would sometimes fail to retry a tick if there was an error reloading a code location in the middle of the tick.
- [dagster-dbt] Fixed an issue where explicitly passing `profiles_dir=None` into `DbtCliResource` would cause incorrect validation.
- [dagster-dbt] Fixed an issue where partial parsing was not working when reusing existing target paths in subsequent dbt invocations.
- [ui] Fixed an issue where the job partitions UI would show “0 total partitions” if the job consisted of more than 100 assets.
- [dagster-duckdb] The `DuckDBResource` and `DuckDBIOManager` accept a `connection_config` configuration that will be passed as `config` to the DuckDB connection. Thanks @xjhc!
- Added events in the run log when a step is blocked by a global op concurrency limit.
- Added a backoff for steps querying for open concurrency slots.
- The auto-materialize logic that skips materialization when (1) a backfill is in progress or (2) parent partitions are required but nonexistent has been refactored into skip rules.
- [ui] Added 2 new asset graph layout algorithms under user settings that are significantly faster for large graphs (1000+ assets).
- Running multiple agents is no longer considered experimental.
- When the agent spins up a new code server while updating a code location, it will now wait until the new code location uploads any changes to Dagster Cloud before allowing the new server to serve requests.
- Alert policies can now be set on assets + asset checks (currently experimental). Check out the alerting docs for more information.
- Added a new flag `--live-data-poll-rate` that allows configuring how often the UI polls for new asset data when viewing the asset graph, asset catalog, or overview assets page. It defaults to 2000 ms.
- Added back the ability to materialize changed and missing assets from the global asset-graph. A dialog will open allowing you to preview and select which assets to materialize.
- Added an experimental AMP Timeline page to give more visibility into the automaterialization daemon. You can enable it under user settings.
- Added a `report_asset_materialization` REST API endpoint for creating external asset materialization events. This is available in cloud as well.
- [dbt] The `@dbt_assets` decorator now accepts a `backfill_policy` argument, for controlling how the assets are backfilled.
- [dbt] The `@dbt_assets` decorator now accepts an `op_tags` argument, for passing tags to the op underlying the produced `AssetsDefinition`.
- [pipes] Added `get_materialize_result` and `get_asset_check_result` to `PipesClientCompletedInvocation`.
- [dagster-datahub] The `acryl-datahub` pin in the `dagster-datahub` package has been removed.
- [dagster-databricks] The `PipesDatabricksClient` now performs stdout/stderr forwarding from the Databricks master node to Dagster.
- [dagster-dbt] The hostname of the dbt API can now be configured when executing the `dagster-dbt-cloud` CLI.
- [dagster-k8s] Added the ability to customize how raw k8s config tags set on an individual Dagster job are merged with raw k8s config set on the `K8sRunLauncher`. See the docs for more information.
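For the `report_asset_materialization` endpoint listed earlier in this group, a request with a JSON body might look like the following sketch. It only constructs the request. The base URL, the asset key `raw_orders`, and the `metadata` body field are assumptions for illustration and should be checked against the REST API docs before use.

```python
import json
from urllib.request import Request


def build_report_materialization_request(
    base_url: str, asset_key: str, metadata: dict
) -> Request:
    """Build a POST request that reports an external asset materialization.

    Assumed URL shape: {base_url}/report_asset_materialization/{asset_key}
    Assumed body: a JSON object with a "metadata" field.
    """
    url = f"{base_url.rstrip('/')}/report_asset_materialization/{asset_key}"
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address, asset key, and metadata, for illustration only.
materialization_request = build_report_materialization_request(
    "http://localhost:3000", "raw_orders", {"row_count": 3}
)
print(materialization_request.full_url)
```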
- Previously, the asset backfill page would display negative counts if failed partitions were manually re-executed. This has been fixed.
- Fixed an issue where the run list dialog for viewing the runs occupying global op concurrency slots did not expand to fit the content size.
- Fixed an issue where selecting a partition would clear the launchpad and typing in the launchpad would clear the partition selection.
- Fixed various issues with the asset-graph displaying the wrong graph.
- The IO manager’s `handle_output` method is no longer invoked when observing an observable source asset.
- [ui] Fixed an issue where the run config dialog could not be scrolled.
- [pipes] Fixed an issue in the `PipesDockerClient` with parsing logs fetched via the docker client.
- [external assets] Fixed an issue in `external_assets_from_specs` where providing multiple specs would error.
- [external assets] Corrected the tooltip copy that explains why the Materialize button is disabled on an external asset.
- [pipes] A change has been made to the environment variables used to detect if the external process has been launched with pipes. Update the `dagster-pipes` version used in the external process.
- [pipes] The top level function `is_dagster_pipes_process` has been removed from the `dagster-pipes` package.
- Override a method in the azure data lake IO manager (thanks @0xfabioo)!
- Add support of external launch types in ECS run launcher (thanks @cuttius)!
- The Python GraphQL client is considered stable and is no longer marked as experimental.
- Previously, asset backfills targeting assets with multi-run backfill policies would raise a "did not submit all run requests" error. This has been fixed.
- The experimental dagster-insights package has received some API surface area updates and bugfixes.
- Dagster now automatically infers a dependency relationship between a time-partitioned asset and a multi-partitioned asset with a time dimension. Previously, this was only inferred when the time dimension was the same in each asset.
- The `EnvVar` utility will now raise an exception if it is used outside of the context of a Dagster resource or config class. The `get_value()` utility will retrieve the value outside of this context.
- [ui] The runs page now displays a “terminate all” button at the top, to bulk terminate in-progress runs.
- [ui] Asset Graph - Various performance improvements that make navigating large asset graphs smooth.
- [ui] Asset Graph - The graph now only fetches data for assets within the viewport, solving timeout issues with large asset graphs.
- [ui] Asset Graph Sidebar - The sidebar now shows asset status.
- [dagster-dbt] When executing dbt invocations using `DbtCliResource`, an explicit `target_path` can now be specified.
- [dagster-dbt] Asset checks can now be enabled by using `DagsterDbtTranslator` and `DagsterDbtTranslatorSettings`: see the docs for more information.
- [dagster-embedded-elt] Dagster library for embedded ELT
- [ui] Fixed various issues on the asset details page where partition names would overflow outside their containers
- [ui] Backfill notification - Fixed an issue where the backfill link didn’t take the `--path-prefix` option into account.
- [ui] Fixed an issue where the instance configuration yaml would persist rendering even after navigating away from the page.
- [ui] Fixed issues where config yaml displays could not be scrolled.
- [dagster-webserver] Fixed a performance issue that caused the UI to load slowly
- [dagster-dbt] Enabling asset checks using dbt project metadata has been deprecated.
Improved ergonomics for execution dependencies in assets - We introduced a set of APIs to simplify working with Dagster for assets that don't use the I/O manager system to handle data between assets. I/O manager workflows will not be affected.
- The `AssetDep` type allows you to specify upstream dependencies with partition mappings when using the `deps` parameter of `@asset` and `AssetSpec`.
- `MaterializeResult` can be optionally returned from an asset to report metadata about the asset when the asset handles any storage requirements within the function body and does not use an I/O manager.
- `AssetSpec` has been added as a new way to declare the assets produced by `@multi_asset`. When using `AssetSpec`, the multi_asset does not need to return any values to be stored by the I/O manager. Instead, the multi_asset should handle any storage requirements in the body of the function.
Asset checks (experimental) - You can now define, execute, and monitor data quality checks in Dagster [docs].
- The `@asset_check` decorator, as well as the `check_specs` argument to `@asset` and `@multi_asset`, enable defining asset checks.
- Materializing assets from the UI will default to executing their asset checks. You can also execute individual checks.
- When viewing an asset in the asset graph or the asset details page, you can see whether its checks have passed, failed, or haven’t run successfully.
Auto materialize customization (experimental) - `AutoMaterializePolicies` can now be customized [docs].
- All policies are composed of a set of `AutoMaterializeRule`s which determine if an asset should be materialized or skipped.
- To modify the default behavior, rules can be added to or removed from a policy to change the conditions under which assets will be materialized.
- Dagster pipes is a new library that implements a protocol for launching compute into external execution environments and consuming streaming logs and Dagster metadata from those environments. See https://github.com/dagster-io/dagster/discussions/16319 for more details on the motivation and vision behind Pipes.
- Out-of-the-box integrations
  - Clients: local subprocess, Docker containers, Kubernetes, and Databricks
    - `PipesSubprocessClient`, `PipesDockerClient`, `PipesK8sClient`, `PipesDatabricksClient`
  - Transport: Unix pipes, Filesystem, s3, dbfs
  - Languages: Python
- Dagster pipes is composable with existing launching infrastructure via `open_pipes_session`. One can augment existing invocations rather than replacing them wholesale.
- [ui] Global Asset Graph performance improvement - the first time you load the graph it will be cached to disk and any subsequent load of the graph should load instantly.
- Fixed a bug where deleted runs could retain instance-wide op concurrency slots.
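`AssetExecutionContext` is now a subclass of `OpExecutionContext`, not a type alias. The code

```python
from dagster import AssetExecutionContext, OpExecutionContext, op

def my_helper_function(context: AssetExecutionContext):
    ...

@op
def my_op(context: OpExecutionContext):
    # Type checkers flag this: an OpExecutionContext is not an AssetExecutionContext.
    my_helper_function(context)
```

will cause type checking errors. To migrate, update type hints to respect the new subclassing.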
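`AssetExecutionContext` cannot be used as the type annotation for `@op`s run in `@job`s. To migrate, update the type hint in `@op` to `OpExecutionContext`. `@op`s that are used in `@graph_asset`s may still use the `AssetExecutionContext` type hint.

```python
from dagster import AssetExecutionContext, OpExecutionContext, op

# Before: no longer allowed for an op that runs in a job.
@op
def my_op(context: AssetExecutionContext):
    ...

# After: annotate the op's context as OpExecutionContext.
@op
def my_op(context: OpExecutionContext):
    ...
```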
- [ui] We have removed the option to launch an asset backfill as a single run. To achieve this behavior, add `backfill_policy=BackfillPolicy.single_run()` to your assets.
- The `has_dynamic_partition` implementation has been optimized. Thanks @edvardlindelof!
- [dagster-airbyte] Added an optional `stream_to_asset_map` argument to `build_airbyte_assets` to support the Airbyte prefix setting with special characters. Thanks @chollinger93!
- [dagster-k8s] Moved “labels” to a lower precedence. Thanks @jrouly!
- [dagster-k8s] Improved handling of failed jobs. Thanks @Milias!
- [dagster-databricks] Fixed an issue where `DatabricksPysparkStepLauncher` fails to get logs when `job_run` doesn’t have `cluster_id` at root level. Thanks @PadenZach!
- Docs type fix from @sethusabarish, thank you!
- Our Partitions documentation has gotten a facelift! We’ve split the original page into several smaller pages, as follows:
- New dagster-insights sub-module - We have released an experimental `dagster_cloud.dagster_insights` module that contains utilities for capturing and submitting external metrics about data operations to Dagster Cloud via an API. Dagster Cloud Insights is a soon-to-be released feature that improves visibility into usage and cost metrics such as run duration and Snowflake credits in the Cloud UI.