Outcomes :

Primary Outcomes :
1. Build data pipelines (ETL) and data APIs : Design and develop data pipelines to sink data into GCP services, and data APIs to collect or serve that data.
2. Design and build data modeling and data warehouse solutions.
3. Improve data reliability, efficiency, and quality : Recommend ways to achieve this by implementing processes and developing solutions.
4. Discover opportunities for data acquisition : Explore data integration opportunities to generate actionable insights for the customer.
5. Learning & development : Contribute to knowledge-asset building by conducting technical trainings and writing best-practices documents; develop internal team members to build and run (big) data processing systems.

Job Description :
1. Analyze client requirements :
- Gather detailed requirements from the given brief and document them (if required).
- Create a technical specification document covering the data transformation logic and analysis recipes.
2. Data exploration & transformation :
- Explore the different data sources to uncover new insights that help fulfill the business requirements.
- Design the data pre-processing steps to clean and transform the data sets.
3. Design & develop data processing systems :
- Develop, construct, test, and maintain architectures for large-scale data processing systems, such as data warehouses, to meet business requirements.
- Discover opportunities for data acquisition by identifying hidden patterns in the data sets.
- Learn and use Google Cloud / AWS products for data processing systems, and deploy them on the cloud platform.
- Develop data-set processes for data modeling, mining, and production.
- Perform data migration, transformation, and scripting.
- Rapidly develop PoCs / MVPs for business requirements.
- Employ a variety of tools and technologies to integrate data processing systems.
4. Conduct research to answer industry and business questions.
5. Prepare data for use in predictive and prescriptive modeling using data analytics programs.
6. Maintain coding standards and build reusable code and libraries for future use.
7. Automate the data analysis workflows.
8. Data security and data protection : Configure and set up systems / services to store (big) data securely, implementing data encryption techniques.

Technical Competencies :
- Data structures & algorithms : Must have knowledge of Python data structures (data manipulation).
- Libraries / packages : numpy / pandas.
- Big data technologies : MapReduce / Hadoop / Apache Spark.
- Programming language (Python / Java) : Python.
- Cloud technology : AWS / GCP / Azure.
- Testing, code debugging & error logging.
- APIs & web services.
- Databases : SQL queries, MongoDB, Cassandra.
- Version control systems : GitLab / GitHub / Bitbucket.

(ref : hirist.com)
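The core of the role described above is the ETL pattern: extract raw records, clean and transform them, and load them into a warehouse-style store. The following is a minimal, illustrative sketch of that pattern in Python, not part of the posting; the standard library's sqlite3 stands in for a real GCP/AWS warehouse, and all record fields and table names are hypothetical.

```python
import sqlite3

# Extract: raw records as they might arrive from a source system
# (fields and values are hypothetical examples).
raw_rows = [
    {"id": "1", "city": " Mumbai ", "revenue": "1200.50"},
    {"id": "2", "city": "Delhi", "revenue": ""},            # missing value
    {"id": "1", "city": " Mumbai ", "revenue": "1200.50"},  # duplicate record
]

def transform(rows):
    """Clean and transform: drop incomplete rows, de-duplicate, trim text, cast types."""
    seen, out = set(), []
    for r in rows:
        if not r["revenue"]:   # skip records with a missing revenue field
            continue
        if r["id"] in seen:    # de-duplicate on the primary key
            continue
        seen.add(r["id"])
        out.append((int(r["id"]), r["city"].strip(), float(r["revenue"])))
    return out

def load(rows, conn):
    """Load into a warehouse-style table (sqlite3 here stands in for a cloud warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, city TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

conn = load(transform(raw_rows), sqlite3.connect(":memory:"))
print(conn.execute("SELECT id, city, revenue FROM sales").fetchall())
# → [(1, 'Mumbai', 1200.5)]
```

In a production pipeline the same three stages would typically be split across scheduled jobs (e.g. an orchestrator triggering extraction, a Spark or pandas transform step, and a bulk load into the warehouse), but the shape of the logic stays the same.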