技术 | 在阿里云 PAI 上一键部署和使用 NVIDIA Cosmos Reason-1 模型

NVIDIA 近期发布了 Cosmos Reason-1 的 7B 和 56B 两款多模态大语言模型 (MLLM)，它们经过了“物理 AI 监督微调”和“物理 AI 强化学习”两个阶段的训练。其中，Cosmos Reason-1-7B 已经开源，其基于 Qwen2.5-VL 使用物理常识和具身推理数据进行了后训练。

阿里云 PAI-Model Gallery 现已集成 Cosmos Reason-1-7B 模型并提供企业级部署方案，本文介绍如何在阿里云人工智能平台 PAI (Platform of AI) 上快速部署和使用该模型。

NVIDIA Cosmos 平台

NVIDIA Cosmos™ 是一个世界基础模型 (WFM) 的开发平台，整合了先进的分词器、护栏以及用于加速数据处理和管理的工作流，它为世界模型的训练提供支持，并加速智能驾驶汽车 (AV) 和机器人的物理 AI 开发。

Cosmos 提供了一系列预训练多模态模型，开发者可以开箱即用，包括用于世界生成和后训练的 Cosmos Predict、生成大规模可控且高保真合成数据的 Cosmos Transfer、物理 AI 推理的 Cosmos Reason 以及过滤不安全输入并确保输出一致性和安全性的 Cosmos Guardrail 等。

其中，NVIDIA Cosmos Reason-1 是一款可完全定制的多模态 AI 推理模型，它专门为理解运动、物体交互以及时空关系而构建。基于思维链 (Chain-of-thought, CoT) 推理，Cosmos Reason-1 模型可以解读视觉输入、根据给定的提示词预测结果、并基于推理给出优化分析和决策。

该模型基于真实世界的物理规律实现推理，从而生成清晰且能够感知上下文环境的自然语言回复。Cosmos Reason-1 既可以作为其他物理 AI 模型的数据清洗和质量过滤工具，也可以作为规划模型用于推理具身智能体下一步的行为。

阿里云 PAI-Model Gallery 集成的 Cosmos Reason-1-7B

PAI-Model Gallery 是阿里云人工智能平台 PAI 产品组件，集成了众多 AI 开源社区中优质的预训练模型，并且基于开源模型支持零代码实现模型训练（微调）、压缩、评测、部署和推理的全部过程，提供更快、更高效、更便捷的 AI 应用体验。此外，它还提供开箱即用的 API，并且支持企业级数据安全。

Cosmos Reason-1-7B 集成到阿里云 PAI-Model Gallery，标志着用户能够在“AI + 云”的范式下，通过预训练模型的即用性与模块化设计，显著降低多模态技术栈选型的复杂度及模型适配成本。

企业和开发者现在可以基于领先的云原生平台，实现从原始视觉数据输入到物理约束驱动的高级推理输出的端到端开发，从而加速物理 AI 从原型到生产级部署的全生命周期。

在阿里云 PAI 上一键部署 Cosmos Reason-1-7B

本节介绍 Cosmos Reason-1-7B 模型在阿里云人工智能平台 PAI 上进行一键部署，以及模型验证的实践。

1. 在 PAI-Model Gallery 模型广场找到 Cosmos Reason-1-7B 模型。以阿里云新加坡和中国大陆的节点为例，模型直达链接如下：

中国大陆:

https://pai.console.aliyun.com/?regionId=cn-beijing#/quick-start/models/Cosmos-Reason1-7B/intro

新加坡:

https://pai.console.aliyun.com/?regionId=ap-southeast-1#/quick-start/models/Cosmos-Reason1-7B/intro

2. 在模型详情页右上角点击「部署」，在选择计算资源后，即可一键完成模型的云上部署。

3. 部署成功后，在服务页面可以点击“查看调用信息”获取调用的 Endpoint 和 Token，想了解服务调用方式可以点击“预训练模型”链接，返回模型介绍页查看调用方式说明。

4. 使用推理服务：您可以使用 API 调用模型服务，也可以使用 PAI 提供的 WebUI 界面与模型交互。

模型实测

实测使用 NVIDIA Research 官网上的视频案例作为模型输入。视频中一个人正在往杯子中倒牛奶，我们给模型的问题是：“视频中的人将牛奶倒入杯子以后，下一步最合理的动作是什么？”模型的回答展示出模型考虑到了人物下一步动作的各种可能性，甚至考虑到了倒完牛奶放好奶瓶后，用勺子搅拌牛奶和咖啡，最后模型在进行了一番深入思考后得出了结论，下一步是“将奶瓶放回到台面上”。

从这个例子可以看出，模型准确地“理解”了视频中人的动作并给出了正确的预测，这种对运动、物体交互以及时空关系的理解及预测能力一般难以做到，而且模型输出了完整的思维链，让我们能清楚地知道模型的整个思考过程。

下面附上完整的视频、输入问题、模型输出结果，供大家参考。

Question：The overall goal is "The agent pours milk into the cup". The agent in the video is currently performing one subtask out of many to complete this instruction. For the agent in the video, what is the most plausible next immediate subtask?

Model outputs：The agent is currently pouring milk into the cup. The next immediate subtask after pouring would logically be to stop pouring once the desired amount is achieved. Then, the agent would need to place the milk bottle back on the countertop. After that, they might close the bottle's cap to prevent spills. Finally, they could stir the mixture with the spoon to mix the milk and coffee thoroughly. However, since the current action is pouring, the very next step after pouring would be to stop pouring and set the bottle down. But the question asks for the most plausible next immediate subtask. Since the pouring is ongoing, the next step after completing the pour would be to place the milk bottle back on the countertop. Alternatively, if the pouring is still in progress, maybe adjusting the angle to ensure all milk is poured. But given the context, the agent is actively pouring, so once that's done, placing the bottle down makes sense. However, the answer might be to stop pouring and set the bottle aside. But the exact next step after pouring would be to place the bottle back. Alternatively, maybe the agent needs to check if the cup is full enough. But the most logical next step after pouring is to stop and place the bottle down. So the answer is likely "Place the milk bottle back on the countertop."

即将到来：基于 NVIDIA Cosmos Reason-1 tools 的模型后训练最佳实践

NVIDIA 也开源了 Cosmos Reason-1 tools，它是用于在 Cosmos Reason-1 基准测试上训练和评估模型的脚本。它提供的模型后训练脚本 (SFT + RL) 将加速企业和开发者基于自有数据定制 Cosmos Reason-1 模型，从而创建专属的 physical AI 模型。

阿里云 PAI 团队测试的 Cosmos Reason-1 tools 的性能表现如下：在 Qwen2.5-32B-Instruct 模型和 gsm8k 数据集 (Batch size = 2,048) 组合上进行后训练测试，相比其他开源框架，Cosmos Reason-1 tools 在小规模集群上实测有 1-2 倍的性能加速。PAI 将在近期集成 Cosmos Reason-1 tools 的模型后训练能力。

通过阿里云 PAI 上手实践 NVIDIA Cosmos Reason-1-7B 模型

您可以根据所在区域，在阿里云国际站的新加坡或中国大陆节点，通过阿里云 PAI 使用 Cosmos Reason-1-7B 模型，更多 Cosmos 相关资源请查：

Cosmos 开发者官网：

https://www.nvidia.cn/ai/cosmos/

Cosmos Reason-1 GitHub:

github.com/nvidia-cosmos/cosmos-reason1

Cosmos 开发文档：

https://docs.nvidia.com/cosmos/

NVIDIA Research 论文: Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning:

https://arxiv.org/abs/2503.15558