Education / qualification : IIT, IIIT, NIT
Must have experience in GPU, CUDA
Must have experience in Image Processing
Must have experience in Signal Processing, Deep Learning, Parallel Programming
STRICT NO NOS :
Key Responsibilities :
As part of the KLA algorithms team, the job entails understanding core algorithms that have to be expressed in various parallel computing constructs, particularly on HPC accelerators such as GPUs.
The first step in optimization will be to build a theoretical model that breaks down our AI algorithms in terms of available bandwidth, computational FLOPS, etc.
The implementation steps will include CUDA-level programming along with performance tuning to ensure that we come close to the performance predicted by the theoretical model.
The developer will be exposed to a variety of image processing, signal processing and deep learning workloads that have to be optimized.
A complementary stage of optimization includes exploring existing libraries and programming in higher-level constructs such as C++ parallel programming.
While the initial focus of the team will be on NVIDIA GPUs, the R&D team will also be looking at GPU accelerators from other vendors, as well as FPGA acceleration.
You will collaborate with peer researchers in parallel computing areas and with algorithm teams in product groups.
Minimum Qualifications :
New or recent college graduates with a Ph.D. (preferred) or Dual Degree MS in EE, CS or CSE. Bachelor's graduates will also be considered.
A researcher with a strong foundation in computer architecture, in particular with a focus on high-performance parallel processing at the device level (GPUs, CPUs / SIMD, or FPGAs).
The researcher should have a strong mental model of computational loads and of how to map different algorithms to parallel architectures.
Proficiency in programming in C / modern C++ and Python.
Experience in analyzing and tuning applications using profiling tools such as NVIDIA Nsight or Intel VTune.
Good understanding of, and exposure to, the Linux operating system at the user level.
Exposure to multiprocessor and multithreading concepts.
Some familiarity with GPU programming models such as CUDA, OpenCL or SYCL.
The position also requires strong communication skills, initiative, and the ability to navigate from relatively high-level requirements to low-level computational models.