etl: Add a base job to fetch basic GitLab data asynchronously

This merge request adds support for asynchronous ETL operations on GitLab project data. The primary changes are:

  • CI Pipeline Update: Adjusted .gitlab-ci.yml to add a new job that collects basic statistics, and modified the distribution tag to incorporate the asynchronous libraries.
  • Asynchronous Utilities: Added utility functions to handle asynchronous tasks and retries, ensuring robust and efficient execution of ETL processes.
  • Base ETL Job: Created a foundational ETL job that fetches and processes the main data from the GitLab mesa/mesa project, including:
    • pipelines (/pipelines and /pipelines/<id> endpoints)
    • merge requests (/merge_requests and /merge_requests/<id> endpoints)
    • merge request notes
    • jobs
  • Data Models and Helper Functions: Defined data models and helper functions to streamline operations with InfluxDB, facilitating efficient data extraction and formatting.
  • Dependencies Update: Updated the requirements.txt to include new dependencies necessary for asynchronous operations.
Edited by Guilherme Gallo
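
As an illustration of the kind of asynchronous retry utility described above, here is a minimal sketch using only the standard library. It is not the code from this merge request: the names retry_async and gather_limited, the exponential backoff policy, and the concurrency limit are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await func(), retrying on failure with exponential backoff.

    Hypothetical helper: the name and the backoff policy are assumptions,
    not taken from the merge request.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


async def gather_limited(
    factories: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 5,
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Each factory goes through retry_async with its default settings.
    Results are returned in the same order as the input factories.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await retry_async(factory)

    return list(await asyncio.gather(*(run(f) for f in factories)))
```

A job could wrap each endpoint request (for example, one per pipeline id) in a zero-argument coroutine factory and pass the list to gather_limited. Bounding concurrency with a semaphore keeps the number of in-flight requests predictable, and the retry wrapper absorbs transient failures.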
