- Design, build, and maintain scalable ETL/ELT data pipelines using Azure Data Factory, Databricks, and Spark.
- Develop and optimize data workflows using SQL and Python or Scala for large-scale data processing and transformation.
- Implement performance tuning and optimization strategies for data pipelines and Spark jobs to ensure efficient data handling.
- Collaborate with data engineers to support feature engineering, model deployment, and end-to-end data engineering workflows.
- Ensure data quality and integrity by implementing validation, error-handling, and monitoring mechanisms.
- Work with structured and unstructured data using technologies such as Delta Lake and Parquet within a Big Data ecosystem.
- Contribute to MLOps practices, including integrating ML pipelines, managing model versioning, and supporting CI/CD processes.
Must-Have Skills:
- Data Engineering & Cloud:
- Proficiency in Azure Data Platform (Data Factory, Databricks).
- Strong skills in SQL and Python or Scala for data manipulation.
- Experience with ETL/ELT pipelines and data transformations.
- Familiarity with Big Data technologies (Spark, Delta Lake, Parquet).
- Data Optimization & Performance:
- Expertise in data pipeline optimization and performance tuning.
- Experience with feature engineering and model deployment.
- Analytical & Problem-Solving:
- Strong troubleshooting and problem-solving skills.
- Experience with data quality checks and validation.
Nice-to-Have Skills:
- Exposure to NLP, time-series forecasting, and anomaly detection.
- Familiarity with data governance frameworks and compliance practices.
- ML & MLOps Integration:
- Basics of AI/ML concepts.
- Experience supporting ML pipelines with efficient data workflows.
- Knowledge of MLOps practices (CI/CD, model monitoring, versioning).