The candidate will join the organization's team of analytics experts, helping to expand and optimize our data and data structures and supporting data initiatives.
The candidate is expected to have a good command of technologies and languages such as Python, Hadoop, Apache Spark (SparkR, sparklyr), and C, C++, or Java.
Experience with repository maintenance in Git/GitHub, maintaining security policies on Hadoop, performance tuning in Spark, and establishing best practices in Hadoop- and Spark-based projects is also expected.
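As a rough sketch of the kind of Spark performance tuning involved (the configuration values below are hypothetical placeholders, not recommendations; real values depend on cluster size and workload):

    from pyspark.sql import SparkSession

    # All tuning values here are illustrative placeholders.
    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
        .config("spark.executor.memory", "4g")          # per-executor heap
        .config("spark.executor.cores", "2")            # cores per executor
        .config("spark.serializer",
                "org.apache.spark.serializer.KryoSerializer")  # faster serialization
        .getOrCreate()
    )

Settings like these are typically iterated against the Spark UI and job metrics rather than fixed once.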
Responsibilities and Duties
The candidate will have to work with data warehousing and ETL tools.
The candidate will have to work on Hadoop-based analysis (HBase, Hive, MapReduce, etc.)
Processing structured and unstructured data (see the sketch after this list)
Deploying and maintaining a Hadoop cluster, and adding and removing nodes using cluster monitoring tools like Ganglia or Cloudera Manager
Configuring NameNode high availability and keeping track of all running Hadoop jobs
Resolving cluster connectivity issues and tracking, monitoring, and improving performance
Coding in the R programming language
Development in Jupyter Notebook and Zeppelin
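The sketch below illustrates the structured/unstructured processing responsibility in PySpark; the HDFS paths and column names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("processing-sketch").getOrCreate()

    # Structured data: a CSV with a header, loaded into a DataFrame.
    orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)
    orders.groupBy("order_date").count().show()

    # Unstructured data: raw text files reduced to a word count.
    lines = spark.read.text("hdfs:///data/logs/")
    words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
    words.groupBy("word").count().orderBy(F.desc("count")).show()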
Required Experience, Skills and Qualifications
Essential Qualifications:
In-depth knowledge of SQL and other database solutions (a short example appears after this list)
A minimum of 2-4 years of experience with Hadoop technologies
Sound knowledge of Apache Spark
Sound knowledge of Python programming and Jupyter Notebook
Excellent knowledge of UNIX/Linux operating systems
Knowledge of cluster monitoring tools like Ganglia, Ambari, or Nagios
Knowledge of networking, CPU, memory, and storage
Knowledge of virtual machines and their capabilities
Excellent troubleshooting skills in data integration
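For illustration, a minimal example of SQL run through Spark from Python, as it might appear in a Jupyter notebook (the table name and data are made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

    # Hypothetical sample data standing in for a real table.
    df = spark.createDataFrame(
        [("alice", 120), ("bob", 80), ("alice", 50)],
        ["user", "amount"],
    )
    df.createOrReplaceTempView("payments")

    # Plain SQL executed by Spark's SQL engine.
    spark.sql("""
        SELECT user, SUM(amount) AS total
        FROM payments
        GROUP BY user
        ORDER BY total DESC
    """).show()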
Desirable Qualifications:
Knowledge of Informatica is an added advantage
Familiarity with GitHub
Knowledge of programming tools like R and SAS
Knowledge of visualization tools like Power BI and Tableau