We need someone with 3-5 years of extensive experience in Data Warehousing, ETL, and Big Data technologies (Hadoop, Hive, Sqoop, etc.), and 2+ years of mandatory experience in Spark with Python or Scala, with more than one end-to-end implementation.
Roles and Responsibilities
Develop Scala or Python scripts and UDFs using DataFrames, Spark SQL, Datasets, and RDDs in Spark 2.3+ for data aggregation and queries, and write data back into the OLTP system through Sqoop (see the aggregation sketch below).
Have a strong understanding of partitioning and bucketing concepts, and design both managed and external tables with ORC files in Hive to optimize performance (see the Hive DDL sketch below).
Write and implement Spark and Scala scripts to load data from, and store data into, Cassandra, HBase, or any other NoSQL store (see the Cassandra write sketch below).
Implement SCD Type 1 and Type 2 models using Spark (see the SCD Type 2 sketch below).
Develop Oozie workflows for scheduling and orchestrating the ETL process.
Tune the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings (see the tuning sketch below).
Stream data into Elasticsearch for visualization in Kibana (see the streaming sketch below).
Implement mapping parameters and variables at the mapping and session level to increase code reusability and parameterize hardcoded values (see the parameterization sketch below).
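A minimal PySpark sketch of the aggregation item above. The orders table, its columns, and the output path are hypothetical, and the Sqoop export appears only in a comment because it runs as a separate command-line job.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("daily-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # DataFrame API: aggregate order amounts per customer and day.
    orders = spark.table("orders")  # hypothetical Hive table
    daily = (orders.groupBy("customer_id", "order_date")
                   .agg(F.sum("amount").alias("total_amount"),
                        F.count(F.lit(1)).alias("order_count")))

    # The same aggregation expressed on the RDD API, for comparison.
    totals_rdd = (orders.rdd
                  .map(lambda r: ((r["customer_id"], r["order_date"]), r["amount"]))
                  .reduceByKey(lambda a, b: a + b))

    # Write the result to HDFS; a separate Sqoop export job then pushes it into
    # the OLTP database, e.g.:
    #   sqoop export --connect jdbc:mysql://oltp-host/sales --table daily_totals \
    #                --export-dir /data/out/daily_totals --input-fields-terminated-by ','
    daily.write.mode("overwrite").csv("/data/out/daily_totals")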
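A sketch of the Hive table design item, issued through Spark SQL with Hive support enabled; the table names, columns, bucket count, and external location are illustrative.

    from pyspark.sql import SparkSession

    # Hive support is needed so the DDL below lands in the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-table-design")
             .enableHiveSupport()
             .getOrCreate())

    # Managed table: partitioned by load date, bucketed by customer_id, stored as ORC.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_managed (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # External table: Hive tracks only the metadata; the ORC files stay at the given path.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS ORC
        LOCATION '/data/warehouse/sales_external'
    """)

Partitioning prunes whole directories at query time, while bucketing clusters rows by key within each partition, which helps joins and sampling.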
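A sketch of loading curated data into Cassandra from Spark. It assumes the DataStax spark-cassandra-connector package is on the classpath; the host, keyspace, and table names are placeholders (HBase or another NoSQL store would use its own connector in the same way).

    from pyspark.sql import SparkSession

    # Assumes the connector is supplied at submit time, e.g.
    #   --packages com.datastax.spark:spark-cassandra-connector_2.12:<version>
    spark = (SparkSession.builder
             .appName("load-to-cassandra")
             .config("spark.cassandra.connection.host", "cassandra-host")  # placeholder host
             .getOrCreate())

    # Read curated data from Hive and append it to a Cassandra table.
    df = spark.table("sales_managed")  # hypothetical source table

    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="analytics", table="sales_by_customer")  # placeholder names
       .mode("append")
       .save())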
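A compact DataFrame sketch of SCD Type 2 handling (Type 1 would simply overwrite the changed attribute instead of closing out the old row). The toy data and column names are illustrative.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("scd-type2-sketch").getOrCreate()

    # Current dimension state (toy data): one open row per business key.
    dim = spark.createDataFrame(
        [(1, "alice", "NY", "2023-01-01", None, True)],
        "cust_id BIGINT, name STRING, city STRING, "
        "start_date STRING, end_date STRING, is_current BOOLEAN")

    # Incoming batch: cust_id 1 changed city, cust_id 2 is a brand-new key.
    src = spark.createDataFrame([(1, "alice", "SF"), (2, "bob", "LA")],
                                "cust_id BIGINT, name STRING, city STRING")

    today = F.current_date().cast("string")

    # Join incoming rows to the open dimension rows to find changed and new keys.
    open_rows = dim.filter("is_current")
    history   = dim.filter("NOT is_current")   # already-closed rows pass through untouched
    old_city  = open_rows.select("cust_id", F.col("city").alias("old_city"))
    delta = (src.join(old_city, "cust_id", "left")
                .where(F.col("old_city").isNull() | (F.col("old_city") != F.col("city"))))
    changed_keys = delta.select("cust_id")

    # Type 2: close out the superseded open rows instead of overwriting them.
    expired = (open_rows.join(changed_keys, "cust_id", "left_semi")
                        .withColumn("end_date", today)
                        .withColumn("is_current", F.lit(False)))
    carried = open_rows.join(changed_keys, "cust_id", "left_anti")

    # New open versions for changed keys and first versions for brand-new keys.
    fresh = (delta.select("cust_id", "name", "city")
                  .withColumn("start_date", today)
                  .withColumn("end_date", F.lit(None).cast("string"))
                  .withColumn("is_current", F.lit(True)))

    dim_out = history.unionByName(carried).unionByName(expired).unionByName(fresh)
    dim_out.show()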
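A sketch of the tuning knobs named above: batch interval, level of parallelism, and executor memory. The values are illustrative; the right settings depend on cluster size and data volume.

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = (SparkConf()
            .setAppName("tuned-streaming-job")
            .set("spark.executor.memory", "4g")         # memory tuning
            .set("spark.executor.cores", "4")
            .set("spark.default.parallelism", "64")     # parallelism for RDD shuffles
            .set("spark.sql.shuffle.partitions", "64")  # parallelism for DataFrame shuffles
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))

    sc = SparkContext(conf=conf)

    # Batch interval: each micro-batch must finish within this window (here 10 seconds).
    ssc = StreamingContext(sc, batchDuration=10)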
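A Structured Streaming sketch of indexing a stream into Elasticsearch so Kibana can visualize it. It assumes the spark-sql-kafka and elasticsearch-spark connector packages are on the classpath; hosts, the Kafka topic, and the index name are placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("stream-to-elasticsearch")
             .config("es.nodes", "es-host:9200")  # placeholder Elasticsearch host
             .getOrCreate())

    # Read a stream of events from Kafka (placeholder broker and topic).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "kafka-host:9092")
              .option("subscribe", "clickstream")
              .load()
              .select(F.col("value").cast("string").alias("event_json"),
                      F.col("timestamp")))

    # Each micro-batch is indexed into Elasticsearch, where Kibana dashboards read it.
    query = (events.writeStream
                   .format("es")
                   .option("checkpointLocation", "/tmp/checkpoints/clickstream")
                   .start("clickstream-events"))

    query.awaitTermination()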
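The mapping parameter/variable item uses Informatica-style terminology; the analogous idea in a Spark job is to read run-specific values from arguments or configuration rather than hardcoding them. The parameter names and paths below are illustrative.

    import argparse

    from pyspark.sql import SparkSession

    # Run-specific values (dates, table names, paths) arrive as arguments, so the
    # same script can be reused across loads without code changes.
    parser = argparse.ArgumentParser()
    parser.add_argument("--load-date", required=True)
    parser.add_argument("--source-table", default="sales_managed")
    parser.add_argument("--target-path", default="/data/out/sales")
    args = parser.parse_args()

    spark = (SparkSession.builder
             .appName("parameterized-load")
             .enableHiveSupport()
             .getOrCreate())

    (spark.table(args.source_table)
          .where(f"load_date = '{args.load_date}'")
          .write.mode("overwrite")
          .parquet(f"{args.target_path}/load_date={args.load_date}"))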
Additional skills:
Knowledge of the AWS stack: AWS Glue, S3, SQS
Exposure to Elasticsearch or Solr is a plus
Exposure to NoSQL databases such as Cassandra and MongoDB
Exposure to serverless computing