Welcome to the last part of this article series. In this article we want to have a look at unit testing in Airflow, Prefect and Dagster. It can be helpful to read the other parts of this article series for more context, here you go: Part 1, Part 2 and Part 3.

In Software Engineering, testing is so incredibly important, one cannot stress enough HOW important testing is. There is nothing worse than deploying your code into production without testing it beforehand. Nevertheless, in the space of data orchestration, testing is often neglected. This can have several reasons: often it is tedious to write tests, and it is also not always easy to write tests for pipelines. Sometimes there is simply not enough time, and the developers are under pressure to deliver pipelines on time in order to meet deadlines. Additionally, developers are not always in possession of quality test data due to PII or other reasons, and creating test datasets can be very time-consuming. In Software Engineering, writing tests is natural, and this has to establish itself in the data orchestration space too. In the following, I want to show you how one could start out writing unit tests for tasks and, respectively, assets in Prefect, Airflow and Dagster.

Mocking can be quite difficult in Airflow, but it does not have to be. In this case, we are lucky, and mocking is quite easy:

    import airflow.providers.postgres.hooks.postgres
    import pandas as pd
    import pytest
    from unittest.mock import MagicMock, patch

    @pytest.fixture
    def mock_postgres_connection():
        # Mock the PostgresHook so the task never talks to a real database
        mock_postgres_hook = MagicMock(name="postgres_hook")
        mock_connection = MagicMock(name="connection")
        mock_postgres_hook.return_value.get_conn.return_value = mock_connection
        mock_value = None
        return mock_connection

    def test_ingestion_from_postgres(mock_postgres_connection: MagicMock):
        # ingest_store_data_from_psql is the task under test, target_file the file it writes to
        with patch.object(airflow.providers.postgres.hooks.postgres.PostgresHook, "get_conn", return_value=mock_postgres_connection):
            ingest_store_data_from_psql.function(target_file)

        # act_df holds the data read back from target_file, exp_df the expected result
        pd.testing.assert_frame_equal(left=act_df, right=exp_df)

First of all, we create a fixture mock_postgres_connection which will mock the PostgresHook. Do not forget to use the function method on your task, otherwise your test will not work. Afterward, we can test whether the target file exists and whether the data that has been written to it is as we expect it to be. Then you can clean up the test files, and we have tested our task!

But testing in Airflow is not always easy; a lot of users complain that testing in Airflow can become quite difficult, and most of the time this is true. In order to perform well in unit testing, you have to understand a lot of internal details of how Airflow works. Otherwise, the testing experience can become quite frustrating. This is suboptimal, since testing should be easy: the more difficult testing is for a developer, the more they will tend to leave tests out.

What are such cases where testing in Airflow can become bothersome or confusing? One confusion may arise when testing operators in Airflow. Normally, you will use the execute method on them and check if the result matches your expectations. But if you use templated arguments instead, then you cannot use execute anymore and you have to use the run method. This has to do with Airflow's internal structure and the point at which Airflow inserts template arguments. You will also need a metastore in this case.

Regarding the metastore, this can also be quite frustrating if you do not know about it. The metastore is needed because it contains information about your DAGs, environment configuration, metadata and so on. By default, this is a simple SQLite database. You can either create the metastore manually via airflow db init, or you can set up a conftest.py file which could contain code like this:
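A minimal sketch of what such a conftest.py could look like, assuming Airflow 2.x; the fixture name airflow_metastore and the temporary AIRFLOW_HOME handling are illustrative choices, and airflow.utils.db.resetdb() is used here as the programmatic counterpart of the airflow db CLI commands:

    # conftest.py
    import os
    import shutil
    import tempfile

    import pytest

    @pytest.fixture(scope="session", autouse=True)
    def airflow_metastore():
        # Point AIRFLOW_HOME at a throwaway directory so the test metastore
        # (a SQLite file by default) does not touch your real installation.
        airflow_home = tempfile.mkdtemp(prefix="airflow_test_")
        os.environ["AIRFLOW_HOME"] = airflow_home

        # Import Airflow only after AIRFLOW_HOME is set so it picks up the temporary home
        from airflow.utils import db

        db.resetdb()  # (re)create a fresh metastore for this test session

        yield

        # Clean-up logic after the yield statement: remove the temporary files
        shutil.rmtree(airflow_home, ignore_errors=True)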
This will do the job for you, resetting the metastore for every test session. Do not forget to clean up temporary files after running your tests, so you probably want to implement the clean-up logic after the yield statement.
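To make the point about templated arguments from above more concrete, here is a hedged sketch of what such an operator test could look like, again assuming Airflow 2.x; the BashOperator, the chosen dates and the asserted file content are purely illustrative, and the test relies on the metastore fixture described above:

    import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    def test_templated_bash_command(tmp_path):
        out_file = tmp_path / "out.txt"
        dag = DAG(dag_id="test_dag", start_date=datetime.datetime(2023, 1, 1), schedule_interval="@daily")
        task = BashOperator(
            task_id="print_ds",
            # {{ ds }} is a templated argument, so calling execute() directly would not render it
            bash_command=f"echo {{{{ ds }}}} > {out_file}",
            dag=dag,
        )

        # run() creates a task instance in the metastore and renders the template before executing
        task.run(start_date=dag.start_date, end_date=dag.start_date, ignore_ti_state=True)

        # {{ ds }} should have been rendered to the run's logical date, here 2023-01-01
        assert out_file.read_text().strip() == "2023-01-01"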