Key Takeaway
Integrated dispersed data workflows into an Airflow DAG to create a unified analytics environment
Restructured BigQuery-based analytical queries and Databricks workflows into an Airflow DAG, and improved execution efficiency, reusability, and maintainability through code refactoring.
Client: Fashion e-commerce (M Company)
Industry: Retail / Software
Service Area: Data & AI
Applied Solution: AIR
1. Overview (Project Background)
This project was initiated to migrate data analysis workloads from BigQuery to the Databricks platform
and to consolidate dispersed data processing workflows into a single Airflow-based operational system.
Previously, the system ran on a mix of BigQuery Scheduled Queries and Airflow,
while the Databricks workflows were built around sequential execution or a single notebook,
which limited scalability and maintainability.
In particular, the workflows included complex logic in which the reference date for data processing varies with a classification value,
so workflow readability and reusability needed to improve.
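To illustrate the kind of logic described above, here is a minimal Python sketch of a reference date that depends on a classification value. The classification names ("daily", "weekly", "monthly") and their offsets are assumptions made for illustration; the source does not state the actual values or rules used in the project.

```python
from datetime import date, timedelta

# Hypothetical classification values and offsets; the real ones are not given in the source.
REFERENCE_OFFSETS = {
    "daily": timedelta(days=1),   # use the previous day
    "weekly": timedelta(days=7),  # use the same weekday one week earlier
}


def reference_date(classification: str, run_date: date) -> date:
    """Return the date a processing step should use for the given classification."""
    if classification == "monthly":
        # Last day of the month before run_date.
        return run_date.replace(day=1) - timedelta(days=1)
    try:
        return run_date - REFERENCE_OFFSETS[classification]
    except KeyError:
        raise ValueError(f"unknown classification: {classification!r}") from None
```

Keeping this rule in one function, rather than repeating date arithmetic in each query, is one way such logic could be made easier to read and reuse.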
2. Solution (Resolution Approach)
The solution was organized around two key validation tasks.
Validation Task 1
Converted the existing BigQuery SQL to Databricks SQL,
and restructured some repetitive logic as Databricks UDFs to improve execution efficiency and ease of management.
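As a rough illustration of what such a conversion involves, the sketch below renames a few BigQuery tokens to their Databricks SQL counterparts (SAFE_CAST to TRY_CAST, INT64 to BIGINT, FLOAT64 to DOUBLE). The helper name `convert_query` and the sample query are hypothetical, and the mapping is deliberately incomplete. It is not the project's actual conversion method, and a real migration would normally rely on a SQL parser or manual review.

```python
import re

# Deliberately small, illustrative mapping of BigQuery tokens to Databricks SQL.
TOKEN_RENAMES = {
    "SAFE_CAST": "TRY_CAST",
    "INT64": "BIGINT",
    "FLOAT64": "DOUBLE",
}


def convert_query(bigquery_sql: str) -> str:
    """Rename known BigQuery tokens; everything else is left untouched.

    Limitation: this is plain text substitution, so tokens inside string
    literals or comments would be renamed as well.
    """
    converted = bigquery_sql
    for old, new in TOKEN_RENAMES.items():
        # \b keeps the match to whole words, e.g. INT64 but not MY_INT64_COL.
        converted = re.sub(rf"\b{old}\b", new, converted, flags=re.IGNORECASE)
    return converted


print(convert_query("SELECT SAFE_CAST(price AS INT64) FROM orders"))
```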
Validation Task 2
Analyzed the workflows running in the Databricks environment
and redesigned them as a target (To-Be) Airflow DAG structure to standardize workflow execution and operations.
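The redesign from sequential notebook steps to a DAG can be pictured with a plain-Python sketch of the dependency graph. It uses the standard library rather than Airflow itself, and the task names are hypothetical. In an Airflow DAG the same edges would be declared between operators, for example with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
DEPENDENCIES = {
    "load_orders": set(),
    "load_products": set(),
    "join_sales": {"load_orders", "load_products"},
    "aggregate_daily": {"join_sales"},
}


def execution_batches(dependencies):
    """Group tasks into batches; tasks within one batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(DEPENDENCIES):
    print(batch)
```

Making the dependencies explicit shows which steps could run in parallel (here the two load tasks), which a strictly sequential notebook does not expose.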
3. Result (Achievements)
Based on the existing processing logic, each step was restructured into an Airflow Task,
and logic that required refactoring was split out into separate modules to improve the structure.
This function-level modularization improved code reusability and maintainability,
and analyzing the logic before the Airflow implementation minimized risk during the workflow transition.
In addition, existing workflows composed of multiple SQL queries and individual function logic
were analyzed with a focus on structure and execution flow, so that they could be organized in a form suited to future expansion and operations.
Expected Benefits
By converting the Databricks workflows to an Airflow DAG and refactoring the code in parallel,
the overall workflow execution time is shortened, and a foundation for eliminating unnecessary computations has been established.
Furthermore, query structure optimization is expected to improve data processing efficiency and operational stability.







