
Four jets photo (Photo by Chandler Cruttenden on Unsplash)

I have been using Airflow for a long time. Airflow has always been my top favorite scheduler in our workflow management system; whenever I discuss "building a scheduler", the word "Airflow" immediately pops into my head. At first, our Airflow ran in a Docker container with the CeleryExecutor.

It worked fine as a scheduler for our Data Engineering team, but after a few months our scheduler needed to serve more users and handle heavier workloads. So I had to look for something that could meet our requirements: stability, scalability, and multi-tenant user support. For this reason, Airflow on Kubernetes is our final solution. Airflow is a "MUST HAVE" piece of software for a Data Platform. I have to repeat it!
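The practical difference between those two setups comes down to the executor. As a rough sketch (assuming a stock Airflow 2.x install, not our exact deployment), moving from Celery workers to per-task Kubernetes pods is essentially a one-line change in airflow.cfg:

```ini
# airflow.cfg -- executor selection (illustrative sketch, not our production config)
[core]
# Before: tasks are dispatched to a pool of long-running Celery workers,
# so worker resources stay allocated even when no pipeline is running.
# executor = CeleryExecutor

# After: each task runs in its own Kubernetes pod and releases its
# resources as soon as the task finishes.
executor = KubernetesExecutor
```

The same setting can also be supplied through the AIRFLOW__CORE__EXECUTOR environment variable, which is convenient for Docker and Kubernetes deployments.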
# Airflow
Airflow allows users to launch a multi-step pipeline using a simple Python object, the DAG (Directed Acyclic Graph). Not only Data Engineers but also Data Scientists and Analysts are starting to pick it up to schedule their transformation pipelines or model training. There are many different ways to deploy an Airflow cluster, from a simple installation with the CeleryExecutor to a Dockerized deployment. By default, though, Airflow allocates worker resources all the time; these resources are expensive, and we don't need to run the pipelines 24/7 at all. Airflow users are also confined to the frameworks and clients that exist on the Airflow worker at the moment of execution.
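To make that concrete, here is a minimal sketch of such a DAG (Airflow 2.x style; the dag_id, schedule, and bash commands are placeholders, not taken from our pipelines):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# The DAG object is the whole pipeline definition: each operator is a node,
# and >> draws the directed edges between tasks.
with DAG(
    dag_id="example_transform_pipeline",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # A three-step, acyclic chain: extract -> transform -> load
    extract >> transform >> load
```

Once a file like this lands in the DAGs folder, the scheduler picks it up, and each task is handed to whatever executor the cluster is configured with.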