General Information

Req #: WD00067560
Career area: Research/Development
Country/Region: China
State: Beijing
City: 北京(Beijing)
Date: Thursday, August 1, 2024
Working time: Full-time
Additional Locations
* China - Beijing - 北京(Beijing)

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers. 

Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY). 

This transformation, together with Lenovo’s world-changing innovation, is building a more inclusive, trustworthy, and smarter future for everyone, everywhere. To find out more, visit www.lenovo.com and read the latest news via our StoryHub.

Description and Requirements

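Responsibilities:

1. Design a highly available fault-tolerance system for large-model training that supports pre-training of models with hundreds of billions of parameters.

2. Optimize fault-tolerance checkpointing for large-model training, improving checkpoint read/write and recovery performance.

3. Develop an elastic training framework for large models.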

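Requirements:

1. Full-time master's degree or above in computer science and technology, artificial intelligence, or a related field;

2. Proficiency in C++/Python, data structures, and computer architecture, with experience in AI model performance tuning and strong engineering implementation skills;

3. Familiarity with common distributed training techniques in AI, including but not limited to data parallelism, pipeline parallelism, and tensor parallelism, with relevant project experience;

4. Familiarity with at least one AI framework (PyTorch/TensorFlow/Paddle/DeepSpeed, etc.), and the ability to use and debug it proficiently;

5. Familiarity with GPU hardware architecture and CUDA computing principles, experience developing and debugging CUDA operators, and some knowledge of NCCL/cuDNN and similar libraries;

6. A good understanding of large-scale pre-trained models, including the architectures, training methods, and optimization techniques of common pre-trained models (e.g., GPT, BERT);

7. Excellent problem-solving ability and innovative thinking, with the ability to analyze and solve complex training problems and to propose improvements and optimizations;

8. Strong teamwork skills, with the ability to collaborate closely with cross-functional teams to drive project success.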

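Preferred qualifications:

1. Experience in large-model development and distributed training;

2. Familiarity with Kubernetes architecture and fault-tolerance systems for large-model training;

3. High-quality publications in the AI or HPC field.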
