Lifesaving Nutrients. For Everyone. Forever.
The Data Engineer role at Sanku Project Healthy Children is vital to the orchestration and optimization of the organization’s data pipeline. The role centers on integrating diverse data sources into a unified, secure, and fault-tolerant system, with an emphasis on robust and reliable data flows. The Data Engineer is responsible for monitoring cloud data systems, overseeing database administration, administering data governance, and ensuring compliance with local and international regulations. Collaborating closely with the Data Scientist and Senior Data Analyst, the Data Engineer ensures the timely, secure, and efficient processing of data, empowering the organization to fulfil its mission effectively.
Data Pipeline Administration:
- Monitor the data pipeline on AWS, ensuring its robustness and reliability.
- Build, maintain, and optimize data models, data structures, and ETL processes using Apache Airflow and Python.
- Regularly monitor pipeline performance using CloudWatch and make necessary pipeline modifications.
- Debug complex data pipeline issues to ensure continuous data flow.
- Oversee and optimize databases such as MySQL for efficient data handling and querying.
- Generate, document, and test various scripts essential for operational metrics and reports.
Data Validation and Cleaning:
- Validate data extracted from the pipeline against other relevant data sources.
- Automate processes using AWS Lambda to ensure consistent and accurate data extraction.
- Develop and implement algorithms to clean and validate data.
- Work closely with the Data Scientist and Senior Data Analyst to refine data-driven strategies.
- Assist teams with data-related technical issues and fulfil their data pipeline needs.
- Identify system enhancements and recommend changes.
Data Governance Administration:
- Implement and enforce standards and guidelines across the ETL and data pipeline processes.
- Work collaboratively with stakeholders to define and maintain metadata standards, ensuring consistent data definitions and clarity.
- Oversee data quality assurance processes, ensuring data integrity and reliability throughout the data pipeline.
- Advocate for data privacy and security protocols, ensuring compliance with relevant local and international regulations.
Data Engineer Stack:
- Data Warehousing: Amazon S3, Amazon RDS
- Data Integration & Processing: Apache Airflow, Python, AWS Lambda
- Monitoring & Alerts: CloudWatch
- Database Management: MySQL
- Data Visualization & Reporting: Power BI
- Data Exchange: REST APIs, JSON, NetSuite REST API, Postman
- Infrastructure & Networking: Amazon ECS, Amazon EC2, Amazon VPC (subnets, route tables, security groups)
- Automation & Scripting: IaC automation using Terraform, SQL, Python, Bash and Linux Scripting
- Version Control: Git, AWS CodeCommit, GitHub Actions, AWS CodePipeline
Key Performance Indicators (KPIs):
- Data Pipeline Efficiency: Measure the performance, speed, reliability, and cost-effectiveness of data pipelines, ensuring data is available for analysis in a timely manner.
- Data Accuracy and Integrity: Monitor the accuracy of data ingested into systems and ensure that data cleaning processes are effectively maintaining the quality of data.
- System Uptime and Resilience: Ensure that data pipelines, including databases and ETL processes, are consistently available with minimal downtime.
Qualifications & Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
- Minimum of 5 years of progressive experience in data engineering, especially in large-scale and complex FMCG data environments.
- Proficiency in the aforementioned data stack.
- Demonstrated ability to build scalable data models and data pipelines.
- Familiarity with big data tools and environments.
- Strong problem-solving skills and analytical thinking.
- Ability to work in a team-oriented environment and communicate effectively.