etl: Add a base job to fetch basic GitLab data asynchronously
This merge request introduces significant updates to support asynchronous ETL operations for GitLab project data. The primary changes include:
- CI Pipeline Update: Adjusted the `.gitlab-ci.yml` to include a new job for collecting basic statistics and modified the distribution tag to incorporate asynchronous libraries.
- Asynchronous Utilities: Added utility functions to handle asynchronous tasks and retries, ensuring robust and efficient execution of ETL processes.
- Base ETL Job: Created a foundational ETL job that fetches and processes the main data from the GitLab `mesa/mesa` project, including:
  - pipelines (`/pipelines` and `/pipelines/<id>` endpoints)
  - merge requests (`/merge_requests` and `/merge_requests/<id>` endpoints)
  - merge request notes
  - jobs
- Data Models and Helper Functions: Defined data models and helper functions to streamline operations with InfluxDB, facilitating efficient data extraction and formatting.
- Dependencies Update: Updated the `requirements.txt` to include new dependencies necessary for asynchronous operations.
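For reviewers unfamiliar with the pattern, the asynchronous retry utilities mentioned above could look roughly like the following. This is a minimal sketch using only the standard library, not the code in this merge request: the names `retry_async` and `fetch_all`, the backoff policy, and the concurrency limit are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await func(), retrying on any exception with exponential backoff.

    Hypothetical helper: the real utilities may use a different policy.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


async def fetch_all(
    ids: Iterable[int],
    fetch_one: Callable[[int], Awaitable[T]],
    max_concurrency: int = 5,
    base_delay: float = 0.5,
) -> List[T]:
    """Fetch every id concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item_id: int) -> T:
        async with semaphore:
            return await retry_async(
                lambda: fetch_one(item_id), base_delay=base_delay
            )

    # gather() returns results in the same order as the input ids.
    return await asyncio.gather(*(guarded(i) for i in ids))
```

Here `fetch_one` stands in for whatever coroutine requests a single `/pipelines/<id>` or `/merge_requests/<id>` resource. Bounding concurrency with a semaphore is one common way to avoid overloading the API while still overlapping requests.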
Edited by Guilherme Gallo