Description and Requirements
Job Title: Director of Engineering – AI/ML Ops, Cloud Operations and SRE
Overview: Lenovo is expanding its Cloud DevOps team and seeking a Director of Engineering specializing in Artificial Intelligence (AI). This pivotal role will oversee three critical groups: DevOps for AI/ML, Site Reliability Engineering (SRE), and CloudOps for Platform Engineering, each led by a manager responsible for that focus area. The successful candidate will lead the design, implementation, and optimization of cloud infrastructure and platforms, ensuring seamless support for AI development, data pipelines, and SaaS solutions. With a strong emphasis on DevOps practices and automation, especially in AI/ML operations, the Director will drive innovation and efficiency across the DevOps organization.
Responsibilities:
- Leadership & Management: Manage and mentor DevOps and Engineering teams responsible for continuous integration/deployment pipelines, cloud infrastructure, uptime SLAs, and automation of security tools. Provide strategic direction and support to engineering managers, project managers, and scrum masters, fostering a culture of innovation and collaboration.
- Technical Strategy: Set the long-term technical direction and strategy for cloud infrastructure and platforms, contributing to broader engineering processes. Champion the reliability, performance, and scalability of core infrastructure services (AWS, Azure), with a focus on metrics-driven decision-making and data analysis.
- Budget & Resource Planning: Own the cloud and infrastructure budgets, resource planning, and operational expense execution. Develop financial models in collaboration with Engineering Leadership and Finance Ops teams, ensuring alignment with organizational objectives.
- Stakeholder Collaboration: Collaborate with business and engineering stakeholders to define and support the Cloud Infrastructure roadmap. Co-develop infrastructure standards, documentation, and processes, establishing infrastructure-wide SLOs, KPIs, and metrics.
- Security & Reliability: Infuse site reliability and security into all aspects of Lenovo's cloud infrastructure, ensuring stability, reliability, and high performance. Implement modern software security practices and secure software systems within cloud-based infrastructure.
- Team Empowerment & Development: Empower teams to continuously improve processes and culture, supporting engineers' well-being and development. Mentor Engineering Managers and individual contributors to drive professional growth and excellence.
- Commitment & Support: Ensure teams consistently deliver value by actively removing roadblocks and supporting dependent teams. Maintain a focus on meeting commitments and driving results aligned with organizational goals.
Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
- 12+ years of experience in a software engineering environment, including 6+ years in engineering management roles.
- Prior experience supporting AI solutions and using AI tools to improve engineering efficiency.
- Demonstrated leadership in managing global DevOps teams and offshore squads.
- Strong technical background in container platforms and orchestration frameworks (e.g., Kubernetes), production database engines, and metrics tooling.
- Experience with cloud platforms (e.g., AWS, Azure) and CI/CD pipeline tooling (e.g., Jenkins, GitLab).
Preferred Qualifications:
- 15+ years of experience in a software engineering environment, with 10+ years in engineering management roles.
- Extensive experience in DevOps or Cloud Engineering management, including managing large teams and overseeing significant initiatives.
- Proven experience as a DevOps Engineer or SRE, with a strong software development and automation background.
- Expertise in deploying and managing LLMs, including techniques such as retrieval-augmented generation (RAG).
- Proficiency in CI/CD tools (Jenkins, GitLab CI, CircleCI) and infrastructure as code (Terraform, Ansible).
- Solid knowledge of containerization and container orchestration technologies (Docker, Kubernetes).
- Familiarity with MLOps tools and practices that support machine learning lifecycle management.
- Experience with cloud services (AWS, GCP, Azure), particularly for AI/ML deployments.
- Background in monitoring tools such as Prometheus, Grafana, and the ELK stack.
- Understanding of Python, particularly in data science and machine learning contexts.
- Proven track record of executing customer-facing infrastructure initiatives across diverse product teams.
- Excellent leadership, influencing, and communication skills, with a focus on driving continuous performance improvement.
- Proficiency in managing Azure, AWS, or distributed VMware environments, along with expertise in container technologies and code management tooling.
This role offers an exciting opportunity to lead and shape the future of Lenovo's cloud infrastructure, driving innovation and efficiency with a focus on AI technologies. We'd love to hear from you if you're a seasoned engineering leader passionate about DevOps and AI!