The Role
- Implement back-end, cloud-based data lake / data warehouse solutions following best practices
- Work closely with the development team to build and deliver back-end data pipeline components, following industry best practices and adhering to architectural principles
- Design, develop, and maintain scalable, efficient ETL/ELT data pipelines from various internal and external data sources
- Identify and gather business requirements to design data models that ensure data quality, integrity, and performance
- Conduct thorough testing and validation of data pipelines to ensure accuracy, reliability, and data consistency
High-Quality Analytics Solutions
- Collaborate closely with data scientists and analysts to understand their needs and develop solutions that meet those requirements
- Troubleshoot and resolve data-related issues, perform root cause analysis, and implement preventive measures
- Create clear documentation for architectures, data dictionaries, data mappings, and any other relevant technical information
- Stay up to date with emerging technologies, tools, and best practices in the field of data engineering and apply them to improve existing processes and systems
- Work collaboratively with the rest of the IT team – based onshore and offshore – to ensure that delivered solutions are high-quality and easy to support
- Communicate clearly and concisely across all levels, facilitating design decisions with other IT stakeholders in simple terms
We are looking for:
- 3+ years' experience in a data engineering role using SQL and Python
- Strong understanding of data lake / data warehouse design best practices and principles
- Practical, hands-on experience with cloud-based data services for ETL, e.g. AWS EMR, Redshift, Glue, and Airflow
- Experience deploying and managing MLOps frameworks, e.g. AWS SageMaker, ECR
- Experience with distributed computing systems such as Spark, Hive, and Hadoop
- Experience with databases such as Postgres, MySQL, and Oracle
- Strong communication skills in English, both spoken and written
- Experience with other cloud platforms and hybrid cloud infrastructure, e.g. GCP, Azure
- Understanding of Machine Learning / Deep Learning
- Proficiency in real-time and near-real-time data streaming, e.g. Kafka, Spark Streaming, Pub/Sub