Running NemoClaw on JarvisLabs with Your Own Model (No NVIDIA API Key Required)

NemoClaw is NVIDIA's open-source framework for running AI agents inside secure sandboxes. By default it routes inference through NVIDIA's cloud API, but you can run it entirely self-hosted on a JarvisLabs GPU instance with any open-source model, no NVIDIA API key needed.

This tutorial walks through setting up NemoClaw on a JarvisLabs VM with Qwen 2.5 7B served via vLLM.

What You'll Get

  • A sandboxed AI agent running on your own GPU
  • Local LLM inference via vLLM (no API costs, no data leaving your machine)
  • Landlock + seccomp + network namespace isolation for security
  • Full control over which model powers the agent

Prerequisites

  • A JarvisLabs account with the jl CLI installed and authenticated
  • An SSH key registered with JarvisLabs (jl ssh-key list to verify)

Architecture Overview

┌──────────────────────────────────────────────────────┐
│ JarvisLabs VM (A100 80GB)                            │
│                                                      │
│ ┌──────────────┐      ┌────────────────────────────┐ │
│ │ vLLM         │      │ OpenShell Gateway (k3s)    │ │
│ │ Qwen 2.5     │◄─────│                            │ │
│ │ Port 8000    │      │  ┌──────────────────────┐  │ │
│ └──────────────┘      │  │ NemoClaw Sandbox     │  │ │
│                       │  │ (Landlock + seccomp) │  │ │
│                       │  │                      │  │ │
│                       │  │ AI Agent (OpenClaw)  │  │ │
│                       │  └──────────────────────┘  │ │
│                       └────────────────────────────┘ │
└──────────────────────────────────────────────────────┘

The AI agent runs inside a sandboxed container managed by OpenShell. When the agent needs to think, it calls out to vLLM running on the same machine — inference never leaves the VM.

Step 1: Launch a JarvisLabs VM

Check GPU availability and launch a VM with an A100 80GB:

# Check what's available
jl gpus --json

# Create a VM (not a container — NemoClaw needs Docker inside the instance)
jl create --gpu A100-80GB --vm --storage 100 --region IN2 --yes --json

Note the machine_id and ssh_command from the output. The VM template gives you full root access with Docker pre-installed.
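
If you're scripting the launch, you can pull machine_id straight out of the JSON output. A minimal sketch — the sample JSON below is illustrative, but machine_id and ssh_command are the fields noted above:

```shell
# Simulated `jl create ... --json` output (illustrative values);
# in practice: OUT=$(jl create --gpu A100-80GB --vm --storage 100 --region IN2 --yes --json)
OUT='{"machine_id": "vm-12345", "ssh_command": "ssh ubuntu@203.0.113.10"}'

# Extract machine_id with python3 so there's no jq dependency
MACHINE_ID=$(echo "$OUT" | python3 -c 'import json, sys; print(json.load(sys.stdin)["machine_id"])')
echo "$MACHINE_ID"
```

Storing the ID in a variable lets you reuse it in every jl exec call below.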

Why a VM? NemoClaw runs Docker containers (the OpenShell gateway and the sandbox). JarvisLabs VMs give you full root with Docker on a real kernel, while container instances don't handle the nested Docker-in-Docker setup this requires.

# Rename for easy identification
jl rename <machine_id> --name "nemoclaw-tutorial" --yes --json

Wait about 30 seconds for SSH to become available, then verify:

jl exec <machine_id> -- nvidia-smi

You should see your A100 80GB GPU.

Step 2: Check What's Pre-Installed

JarvisLabs VMs come with most of what we need:

jl exec <machine_id> -- sh -lc '
echo "=== Docker ===" && docker --version
echo "=== NVIDIA Container Toolkit ===" && dpkg -l | grep nvidia-container
echo "=== Python ===" && python3 --version
echo "=== OS ===" && cat /etc/os-release | head -2
'

What's already there:

  • Docker v29+
  • NVIDIA Container Toolkit v1.18+
  • Python 3.10
  • Ubuntu 22.04 LTS

What we need to add:

  • Node.js 22 (NemoClaw requirement)
  • Docker group access for the ubuntu user
  • A cgroup configuration fix

Step 3: Install Node.js and Fix Permissions

# Install Node.js 22
jl exec <machine_id> -- sh -lc '
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash - \
&& sudo apt-get install -y nodejs \
&& node --version \
&& npm --version
'

# Add ubuntu user to docker group (avoids needing sudo for docker)
jl exec <machine_id> -- sh -lc '
sudo usermod -aG docker ubuntu
'

Step 4: Fix cgroup Configuration

NemoClaw's OpenShell gateway runs k3s inside Docker, which requires cgroupns=host:

# Check current Docker daemon config
jl exec <machine_id> -- sh -lc 'sudo cat /etc/docker/daemon.json'

Add the cgroup setting and restart Docker:

jl exec <machine_id> -- sh -lc '
sudo python3 -c "
import json

with open(\"/etc/docker/daemon.json\") as f:
    cfg = json.load(f)
cfg[\"default-cgroupns-mode\"] = \"host\"
with open(\"/etc/docker/daemon.json\", \"w\") as f:
    json.dump(cfg, f, indent=4)
" && sudo systemctl restart docker
'

Your /etc/docker/daemon.json should now look like:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-cgroupns-mode": "host"
}
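
To double-check that the file is still valid JSON and the key actually landed, here's a small validation sketch. It's shown against a copy in /tmp so it can be dry-run anywhere; substitute /etc/docker/daemon.json on the VM:

```shell
# Dry-run copy of the expected config; on the VM, read /etc/docker/daemon.json instead
cat > /tmp/daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-cgroupns-mode": "host"
}
EOF

# Prints "host" when the setting is present, "unset" otherwise
MODE=$(python3 -c 'import json; print(json.load(open("/tmp/daemon.json")).get("default-cgroupns-mode", "unset"))')
echo "$MODE"
```

A malformed daemon.json will prevent Docker from restarting, so a JSON parse check before `systemctl restart docker` is cheap insurance.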

Step 5: Start vLLM with an Open-Source Model

Launch vLLM serving Qwen 2.5 7B Instruct. This model doesn't require a HuggingFace token, uses only ~15 GB of VRAM, and performs well for agent tasks:

jl exec <machine_id> -- sh -lc '
sg docker -c "docker run -d \
--gpus all \
--name vllm \
-p 8000:8000 \
--shm-size 16g \
vllm/vllm-openai:latest \
--model Qwen/Qwen2.5-7B-Instruct"
'

Wait about 60 seconds for the model to download and load, then verify:

# Check vLLM logs (look for "Application startup complete")
jl exec <machine_id> -- sh -lc '
sg docker -c "docker logs vllm 2>&1 | tail -5"
'

# Test inference
jl exec <machine_id> -- sh -lc '
curl -s http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
\"model\": \"Qwen/Qwen2.5-7B-Instruct\",
\"messages\": [{\"role\": \"user\", \"content\": \"Hello, who are you?\"}],
\"max_tokens\": 50
}"
'

You should get a JSON response with the model's reply.
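
The raw response is verbose; to pull out just the reply text, pipe it through a one-liner. The choices[0].message.content path is the standard OpenAI-compatible response shape; the sample response below is illustrative:

```shell
# Sample chat-completion response; in practice, pipe the curl output from the step above
RESP='{"choices": [{"message": {"role": "assistant", "content": "Hello! I am Qwen."}}]}'

REPLY=$(echo "$RESP" | python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$REPLY"
```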

Alternative Models

You can swap Qwen/Qwen2.5-7B-Instruct for any HuggingFace model that fits your GPU. Options for A100 80GB:

Model                                       VRAM     Notes
Qwen/Qwen2.5-7B-Instruct                    ~15 GB   Great balance of speed and quality
Qwen/Qwen2.5-32B-Instruct                   ~40 GB   Stronger reasoning
meta-llama/Llama-3.1-8B-Instruct            ~16 GB   Requires HF token
mistralai/Mistral-Small-24B-Instruct-2501   ~30 GB   Strong for its size
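
As a rough sanity check on VRAM figures: fp16/bf16 weights take about 2 bytes per parameter, and the KV cache and activations add more on top. For the 7B model (a back-of-envelope estimate, not a guarantee):

```shell
# ~2 bytes per parameter for bf16 weights; KV cache and activations come on top
WEIGHTS_GB=$(python3 -c 'print(round(7e9 * 2 / 1e9))')
echo "Qwen2.5-7B: ~${WEIGHTS_GB} GB of bf16 weights"
```

That lines up with the ~15 GB figure above once runtime overhead is included.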

Step 6: Install NemoClaw

Clone from GitHub, build the CLI, and install globally:

jl exec <machine_id> -- sh -lc '
cd /home/ubuntu \
&& git clone https://github.com/NVIDIA/NemoClaw.git \
&& cd NemoClaw \
&& npm install typescript \
&& npx tsc -p tsconfig.src.json \
&& cd nemoclaw \
&& npm install --ignore-scripts \
&& ./node_modules/.bin/tsc \
&& cd .. \
&& sudo npm install -g .
'

Note: NemoClaw requires two TypeScript build steps — one at the repo root (tsconfig.src.json) and one in the nemoclaw/ subdirectory. Both must complete before the CLI will work.

Verify the installation:

jl exec <machine_id> -- sh -lc 'nemoclaw help'

Step 7: Run NemoClaw Onboarding

The key trick: setting NEMOCLAW_EXPERIMENTAL=1 makes NemoClaw auto-detect your running vLLM instance and skip the NVIDIA API key requirement.

Since the onboarding wizard is interactive, SSH into the VM directly:

jl ssh <machine_id>

Then run:

NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard

The wizard will:

  1. Preflight checks — verifies Docker, OpenShell, GPU, cgroups
  2. Start gateway — deploys OpenShell (k3s cluster inside Docker)
  3. Create sandbox — builds and launches the agent container (takes a few minutes on first run)
  4. Configure inference — auto-detects vLLM on port 8000, selects it as the provider
  5. Set up inference route — configures OpenShell to route LLM calls to local vLLM
  6. OpenClaw setup — launches the agent framework inside the sandbox
  7. Policy presets — apply security policies (pypi, npm suggested by default)

When prompted:

  • Sandbox name: press Enter for default (my-assistant)
  • Policy presets: press Enter to accept suggestions

At the end you'll see a dashboard confirming the setup:

──────────────────────────────────────────────────
Sandbox  my-assistant (Landlock + seccomp + netns)
Model    vllm-local (Local vLLM)
NIM      not running
──────────────────────────────────────────────────
Run:     nemoclaw my-assistant connect
Status:  nemoclaw my-assistant status
Logs:    nemoclaw my-assistant logs --follow
──────────────────────────────────────────────────

Step 8: Verify Everything Works

# Check sandbox status
nemoclaw my-assistant status

# Check running containers
docker ps

You should see two containers:

  • openshell-cluster-nemoclaw — the OpenShell gateway
  • vllm — your local LLM server

Step 9: Connect to the Agent

nemoclaw my-assistant connect

This drops you into the sandboxed agent environment where OpenClaw is running with your local model.

Useful Commands

# View sandbox status
nemoclaw my-assistant status

# View logs
nemoclaw my-assistant logs --follow

# List all sandboxes
nemoclaw list

# Add policy presets (e.g., allow PyPI, npm, GitHub access)
nemoclaw my-assistant policy-add

# List available policy presets
nemoclaw my-assistant policy-list

# Stop everything
nemoclaw stop

# Destroy sandbox
nemoclaw my-assistant destroy

Cost Management

The JarvisLabs VM costs $1.49/hr for A100 80GB. To save money:

# Pause when not using (keeps storage, stops billing for GPU)
jl pause <machine_id> --yes --json

# Resume when needed
jl resume <machine_id> --yes --json

# Destroy when done (deletes everything)
jl destroy <machine_id> --yes --json
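
To put the rate in perspective, a quick back-of-envelope calculation (using the $1.49/hr figure above; your region's rate may differ):

```shell
RATE=1.49
# Cost of leaving the VM running for a full day instead of pausing it
DAY_COST=$(python3 -c "print(f'{$RATE * 24:.2f}')")
echo "24h running: \$${DAY_COST}"
```

Pausing overnight and on weekends cuts the bill substantially while keeping your storage intact.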

When you resume, you'll need to:

  1. Restart vLLM: docker start vllm
  2. Restart NemoClaw: nemoclaw start, or re-run the onboarding wizard

Troubleshooting

"cgroup v2 detected but Docker is not configured"

Run the cgroup fix from Step 4. This is required for OpenShell's k3s to work.

vLLM container won't start after VM resume

Containers don't come back automatically after a pause/resume cycle. Start vLLM again:

docker start vllm
# Wait 60 seconds for model to load
docker logs vllm 2>&1 | tail -5

NemoClaw onboarding asks for NVIDIA API key

Make sure you set NEMOCLAW_EXPERIMENTAL=1 before running nemoclaw onboard, and that vLLM is running and healthy on port 8000:

curl -s http://localhost:8000/v1/models

Sandbox creation fails with permission errors

The installer script may try to npm install -g NemoClaw again. If you see EACCES errors, they're non-fatal — the tool is already installed from Step 6.

Want a different model?

Stop vLLM, remove the container, and start a new one with a different model:

docker stop vllm && docker rm vllm
docker run -d --gpus all --name vllm -p 8000:8000 --shm-size 16g \
vllm/vllm-openai:latest --model <new-model-name>

Then re-run NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard (it will detect the new model).

What's Next

  • Automate this — Create a JarvisLabs startup script that runs Steps 3-7 automatically
  • Try larger models — Swap in Qwen 2.5 32B or 72B for better agent reasoning
  • Add integrations — Configure Telegram or Slack bridges for remote agent access
  • Multi-GPU — Use 2x A100 80GB for models like Llama 3.3 70B with tensor parallelism

Summary

What                 Details
Platform             JarvisLabs VM
GPU                  A100 80GB ($1.49/hr)
Model                Qwen 2.5 7B Instruct (any HF model that fits works)
Inference            vLLM (OpenAI-compatible API)
Agent Runtime        NemoClaw + OpenShell + OpenClaw
NVIDIA API Key       Not required
HuggingFace Token    Not required for Qwen (some models need one)
Security             Landlock + seccomp + network namespace
Total Setup Time     ~10 minutes