As a Data Engineer, you will be a key member of our data engineering team, responsible for building and maintaining large-scale data products and infrastructure. You will shape the next generation of our data analytics tech stack by leveraging modern big data technologies. This role involves working closely with business stakeholders, product managers, and engineering teams to meet diverse data requirements that drive business insights and product innovation.

Objectives
- Design, build, and maintain scalable data infrastructure for collection, storage, and processing.
- Enable easy access to reliable data for data scientists, analysts, and business users.
- Support data-driven decision-making and improve organizational efficiency through high-quality data products.
Responsibilities
- Build large-scale batch and real-time data pipelines using frameworks such as Apache Spark on AWS or GCP.
- Design, manage, and automate data flows between multiple data sources.
- Implement best practices for continuous integration, testing, and data quality assurance.
- Maintain data documentation, definitions, and governance practices.
- Optimize performance, scalability, and cost-effectiveness of data systems.
- Collaborate with stakeholders to translate business needs into data-driven solutions.
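The responsibilities above center on batch and real-time ETL pipelines. As an illustration only, here is a minimal sketch of the batch shape in plain Python (standing in for a Spark job; every dataset, field, and function name is hypothetical):

```python
# Minimal batch-ETL sketch (plain Python standing in for a Spark job).
# All dataset names and field names here are hypothetical.

def extract(rows):
    """Parse raw CSV-style rows into dicts (the 'collection' step)."""
    return [dict(zip(("user", "amount"), r.split(","))) for r in rows]

def transform(records):
    """Cast types and drop malformed records (the 'processing' step)."""
    out = []
    for rec in records:
        try:
            out.append({"user": rec["user"], "amount": float(rec["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would route these to a dead-letter sink
    return out

def load(records):
    """Aggregate per user (the 'storage'/serving step)."""
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + rec["amount"]
    return totals

raw = ["alice,10.5", "bob,3", "alice,2.5", "bad-row"]
print(load(transform(extract(raw))))  # {'alice': 13.0, 'bob': 3.0}
```

In a real Spark job the same three stages would be DataFrame reads, transformations, and writes, orchestrated by a scheduler such as Airflow.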
Required Skills and Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field (exceptional coding performance on platforms such as LeetCode or HackerRank may substitute).
- 2+ years of experience working on full-lifecycle big data projects.
- Strong foundation in data structures, algorithms, and software design principles.
- Proficiency in at least two programming languages; Python and Scala preferred.
- Experience with AWS services such as EMR, Lambda, S3, DynamoDB (GCP equivalents also relevant).
- Hands-on experience with Databricks Notebooks and Jobs API.
- Strong expertise in big data frameworks and the Hadoop ecosystem: Spark, MapReduce, HDFS, Hive, Sqoop, and ZooKeeper.
- Familiarity with containerization (Docker) and workflow management tools (Apache Airflow).
- Intermediate to advanced SQL skills, with experience across relational databases (Postgres, MySQL, Redshift) and NoSQL stores (Redis).
- Experience with SQL tuning, schema design, and analytical programming.
- Proficient with Git and collaborative version-control workflows.
- Comfortable working across diverse technologies in a fast-paced, results-oriented environment.