- Entry level
- No education requirement
- Salary negotiable
The Incident Commander acts as the lead technical problem solver during critical incidents, driving each incident to the shortest possible resolution time for our impacted customers. They also act as tiebreaker and decision maker during outages and crisis events across the Cisco WebEx Teams platform. The Incident Commander is the single point of control and the ultimate authority during the incident. While the Incident Commander does not need to be a subject matter expert on the services being restored, they must excel at problem analysis, troubleshooting methodologies, and situational appraisal. They are expected to have the authority and confidence to demand actions from SRE Teams and Service Owners, and to communicate excellently while doing so. You must possess strong emotional intelligence to lead teams in potentially stressful situations. The three essential qualities of the Incident Commander are (1) commanding the action plan and the technical resources that support it; (2) communicating the action plan and other pertinent information to stakeholders and interested parties; and (3) coordinating the parties executing the action plan. Finally, you will work in a team with other Incident Commanders.
Your key responsibilities will be:
Take command of incidents by setting up or taking over a cross-functional technical bridge call composed of Senior Engineers and SRE team members
Create an effective plan to restore service as quickly as possible in order to minimize MTTR (Mean Time To Repair)
Ensure the team is able to execute the plan and overcome any identified blockers
Communicate across multiple audiences including extended stakeholders and executives
Be responsible for the incident management process, prioritizing incidents and ensuring the most critical issues are addressed promptly and successfully
Identify any action items that require follow-up to improve service availability
Conduct post-mortem reviews
Ensure the highest operational readiness by leading training sessions, simulations, and drills
Drive the technical root cause analysis process by assessing issues accurately and promptly, then assembling the correct technical teams to execute the remediation plan
Be on call for incidents
Skills and experience:
High level of understanding of the nature of distributed systems and cloud providers
Deep understanding of RESTful APIs
Consistent track record of resolving customer incidents, escalations, and crises
Ability to interact with senior executives, customers, and developers at the appropriate level
Excellent project management skills
Strong communication skills
Cloud providers such as AWS, GCP, and Azure
Familiarity with operating systems such as Linux and UNIX
Familiarity with network programming concepts and protocols
At ease with at least one scripting language
Cloud infrastructure and hybrid solutions
Modern toolsets such as Jira, PagerDuty, Kibana, and Grafana
About the company
At Cisco, transforming the way people work, live, play and learn also includes being a great place to work. For 17 years, we’ve been named a Fortune 100 Best Place to Work, one of a handful of companies included on the list since its inception. We are also among 25 companies acknowledged as one of the world’s best multinational workplaces.
We’re proud of this recognition because it means that, at Cisco, you will enjoy working flexibly, using our own market-leading collaboration technology to drive innovation; taking advantage of numerous health and wellness resources; pursuing exciting career opportunities; and participating in programs to help your local communities or fulfill philanthropic interests. You’ll be doing all of this while being part of a global team that is making a positive difference in the world.
Learn why Cisco is a great place to work and what we offer you.