This position requires hands-on design & implementation expertise in Spark and Python (PySpark), along with other Hadoop ecosystem components such as HDFS, Hive, Hue, Impala and Zeppelin. The responsibilities of the position include:
- Analysis, design and implementation of business requirements using Spark & Python.
- Cloudera Hadoop development around Big Data.
- Solid SQL experience.
- Development experience with PySpark & Spark SQL, with good analytical & debugging skills.
- Development work for building new solutions around Hadoop and automation of operational tasks.
- Assisting the team and troubleshooting issues.
- Design and development around Apache Spark, Python and the Hadoop framework.
- Extensive usage of and experience with RDDs and DataFrames within Spark.
- Extensive experience with data analytics, and working knowledge of big data infrastructure, including Hadoop ecosystem components such as HDFS, Hive and Spark.
- Work with gigabytes to terabytes of data and understand the challenges of transforming and enriching such large datasets.
- Provide effective solutions to both strategic and tactical business problems.
- Collaboration with team members, project managers, business analysts and business users in conceptualizing, estimating and developing new solutions and enhancements.
- Work closely with stakeholders to define and refine the big data platform to achieve company product and business objectives.
- Collaborate with other technology teams and architects to define and develop cross-functional technology stack interactions.
- Read, extract, transform, stage and load data to multiple targets, including Hadoop and Oracle.
- Develop scripts around the Hadoop framework to automate processes and existing flows.
- Modify existing programs/code for new requirements.
- Unit testing and debugging. Perform root cause analysis (RCA) for any failed processes.
- Document existing processes and analyze them for potential automation and performance improvements.
- Convert business requirements into technical design specifications and execute on them.
- Execute new development as per design specifications and business rules/requirements.
- Participate in code reviews and keep applications/code base in sync with version control.
- Effective communicator, self-motivated and able to work independently but fully aligned within a team environment.
Bachelor’s degree in computer science (or equivalent) or master’s degree, with 5+ years of big data experience in ingestion, transformation and staging using the following technologies/principles/methodologies:
- Design and solution capabilities.
- Rich experience with Hadoop distributed frameworks, handling large volumes of data using Apache Spark and the Hadoop ecosystem.
- Python & Spark (Spark SQL, PySpark), HDFS, Hive, Impala, Hue, Cloudera Hadoop, Zeppelin.
- Proficient knowledge of SQL with any RDBMS.
- Knowledge of Oracle databases and PL/SQL.
- Working knowledge of and good experience in a Unix environment, with the ability to write Unix shell scripts (ksh, bash).
- Basic Hadoop administration knowledge.
- DevOps knowledge is an added advantage.
- Ability to work within deadlines and effectively prioritize and execute on tasks.
- Strong communication skills (verbal and written), with the ability to communicate across internal and external teams at all levels.
- Working knowledge of Oracle databases and PL/SQL.
- Hadoop administration & DevOps.
- Good analytical thinking and problem-solving skills.
- Ability to diagnose and troubleshoot problems quickly.
- Motivated to learn new technologies, applications and domains.
- Possess an appetite for learning through exploration and reverse engineering.
- Strong time management skills.
- Ability to take full ownership of tasks and projects.
- Team player with excellent interpersonal skills.
- Good verbal and written communication.
- Possess a can-do attitude to overcome any kind of challenge.
Any one of these:
1. CCA Spark and Hadoop Developer.
2. MapR Certified Spark Developer (MCSD).
3. MapR Certified Hadoop Developer (MCHD).
4. HDP Certified Apache Spark Developer.
5. HDP Certified Developer.