Fault Management Engineer
GS Global Services
Noida, Uttar Pradesh, India
4d ago

Technical skills

  • Identifying relevant specific components and interfaces needed for a cloud infrastructure & virtualization platform.
  • Cloud Management Platform (VIM controller)
  • End-to-end architectural know-how on OpenStack components (RHEL host OS and Linux components such as OVS)
  • System administration of Linux (Ubuntu) / VMware environments
  • Knowledge of Nova (compute) and Cinder (storage), and of Ericsson hyperscale datacenter hardware such as HDS 8000
  • Basic knowledge of handling physical connectivity / switching
  • NFV and MANO architecture
  • Neutron (network), SDN, VLAN and overlay networks (VXLAN, MPLSGRE, etc.)
  • DC network virtualization based on SDN controllers and virtual switches
  • Tenant virtual network implementation stretching across multiple NFVI
  • Demonstrated experience in cloud administration in datacenter.
  • Red Hat 7 and any one cloud technology (OpenStack, Azure, AWS)
  • Backup / restore, upgrade rollback and creation of new VMs.
  • Cinder, Ceilometer / Gnocchi / Aodh, Glance, Heat, Horizon, Keystone, Neutron, Nova, Libvirt / KVM, Pacemaker, RabbitMQ
  • Linux distributions: Ubuntu, CentOS, Red Hat (new installation and troubleshooting).
  • Thorough understanding of Cloud technologies and ecosystem.
  • Analyze the issue and provide the root cause.
  • In-depth understanding of OpenStack architecture and its components, and of IaaS (Infrastructure as a Service) deployments.
  • Live experience deploying / managing cloud infrastructure using OpenStack.
  • Senior-level OpenStack experience (minimum 2 years). Must know the architecture and operations, and be able to troubleshoot bugs within OpenStack through to root cause analysis.
  • Troubleshooting OS memory and CPU utilization breaches and providing RCA.
  • Expert-level Linux OS troubleshooting. Ability to troubleshoot issues with the underlying components of OpenStack when investigating incidents or testing new features and projects.
  • Demonstrated ability to use automation tools and scripting languages such as Ansible / Bash to create automations and manage systems.
  • Understands service deployment using the virtualization, orchestration, image and bare-metal provisioning services, etc.
  • Hands-on shell scripting skills.
  • Any experience with cloud technologies such as OpenStack / AWS is highly desirable.
  • Strong in fundamentals including Networking, Security, OS concepts, Virtualization etc.
  • Design, Implement and Maintain Cloud infrastructure and services
  • Server Hardware and Software integration & testing, installation of virtualization services, Cloud, server OS, applications.
  • Configuring storage, networking and security functionalities.
  • Storage knowledge, such as LUN creation and allocation on HP, EMC and IBM systems.
  • Network knowledge, including creating and assigning VLANs.
  • Knowledge of ITSM ticketing tools and of incident, problem and change management.
Experience and Education

  • Experience on SDM platforms: 7-9 years
  • Educational qualification: B.E / B.Tech
The role mainly requires the tasks listed below (but is not limited to them)

  • Provide 24 / 7 L2 support on relevant elements
  • Perform L2 fault and service restoration
  • Perform L2 fault investigations and root cause analysis to resolve issues within SLAs.
  • Input into Incident Reports and Problem Management
  • Update TT with L2 fault information, impact analysis and rectification.
  • During critical and escalated major fault rectification, participate in fault management as requested by the Incident Manager and attend the bridge conferences as per the WLA.
  • Escalation to Vendor for L3 support in the rectification of service impacting faults and other system faults as necessary
  • Collect and analyze the required logs and traces during and after fault investigations
  • Support other L2 domain teams in identifying faults
  • Develop, maintain and perform L2 Health Checks and scripts
  • Develop and maintain L2 technical processes and tools
  • Ensure backup of all elements as per Back-Up policy
  • Assist performance team in troubleshooting of Network performance issues and customer faults, CPDs.
  • Review plans and results of IOT, SW release and feature testing
  • Review regression test plan, specifications and results for SW updates
  • Implementation of SW updates, including review of release notes, impact / risk assessments, FAR (participate in FAR to up-skill for GAR rollout), GAR, and post-change performance analysis
  • Maintain the network baseline document set, e.g. the life cycle database for all elements
  • Review technical documentation / notes and fault descriptions from the Vendor and implement workarounds / fixes as required
  • Maintain known issue list / Risk Register
  • Improve optimization of alarms, counters and thresholds.
  • Provide recommendation to the local Engineering teams on configuration, software updates and general element management as appropriate.
  • Review, stay aware of and approve CR activity, and participate in daily CR review meetings
  • Usage data collection and control
  • Perform System Performance monitoring and analysis
  • Provide Operational Acceptance for new elements being introduced into the network
  • Provide data for pre-sales support
Experience and Education

  • Experience on Core platforms: 4-6 years
  • Educational qualification: B.E / B.Tech
