Key Responsibilities
- Design and implement scalable batch processing systems using Python and big data technologies
- Optimize database performance, focusing on slow-running queries and latency improvements
- Use Python profilers and performance monitoring tools to identify bottlenecks
- Reduce P95 and P99 latency metrics across our data platform
- Build efficient ETL pipelines that can handle large-scale data processing
- Collaborate with data scientists and product teams to understand data requirements
- Monitor and troubleshoot data pipeline issues in production
- Implement data quality checks and validation mechanisms
- Document data architecture and engineering processes
- Stay current with emerging big data technologies and best practices
Qualifications
Required
- Bachelor's degree in Computer Science, Engineering, or related technical field
- 4+ years of experience in data engineering roles
- Strong Python programming skills with a focus on data processing libraries
- Experience with big data technologies (Spark, Hadoop, etc.)
- Proven experience optimizing database performance (SQL or NoSQL)
- Knowledge of data pipeline orchestration tools (Airflow, Luigi, etc.)
- Understanding of performance optimization techniques and profiling tools
Preferred
- Master's degree in Computer Science or related field
- Experience with SEO data or web crawling systems
- Experience with ClickHouse
- Knowledge of distributed systems and microservices architecture
- Familiarity with containerization and orchestration tools (Docker, Kubernetes)
- Experience with real-time data processing
- Contributions to open-source projects
- Experience with machine learning operations (MLOps)