JarvisLabs Blog

Engineering

Speculative Decoding in vLLM: Complete Guide to Faster LLM Inference

Learn how to speed up LLM inference by 1.4-1.6x using speculative decoding in vLLM. This guide covers draft models, n-gram matching, suffix decoding, MLP speculators, and EAGLE-3, with real benchmarks on Llama-3.1-8B and Llama-3.3-70B.

NemoClaw

How to Deploy NVIDIA NemoClaw on JarvisLabs

Team JarvisLabs · March 2026 · 10 min read

Deploy NVIDIA NemoClaw with local Nemotron 3 Nano 30B inference on a JarvisLabs A100 VM. Kernel-level sandboxing for AI agents with zero cloud API dependency.

PrismAudio

How to Run PrismAudio on JarvisLabs

Vishnu Subramanian · March 2026 · 8 min read

Run PrismAudio, the 518M parameter Video-to-Audio model from ICLR 2026, on a JarvisLabs A100 GPU. From setup to Gradio web UI in under 15 minutes.

ComfyUI

AI Videos, Music, and 3D Models from Your Terminal — ComfyUI on Cloud GPUs with Claude Code

Vishnu Subramanian · March 2026 · 9 min read

A step-by-step guide to running ComfyUI workflows on cloud GPUs using the JarvisLabs CLI and Claude Code. Generate videos from photos, music from text, and 3D models from images across multiple GPUs in parallel, all without leaving your terminal.

CLI

Introducing the JarvisLabs CLI: Let Your Agents Run the GPUs

Atharva Ingle · March 2026 · 15 min read

Introducing jl, a command-line interface for the JarvisLabs GPU cloud built for both humans and coding agents. Provision GPUs, run training jobs, monitor experiments, and let your agents handle the infrastructure.

Engineering

How We Made GPU Instance Launch 4x Faster

Vishnu Subramanian · March 2026 · 14 min read

From 8 seconds to 1.8 seconds: how we tore apart every layer of our instance creation pipeline in three days to make GPU launches feel instant.

LLM

Scaling LLM Inference: Data, Pipeline & Tensor Parallelism in vLLM

Jaydev Tonde · March 2026 · 54 min read

Learn how to scale LLM inference using data parallelism, pipeline parallelism, and tensor parallelism in vLLM. Practical guide with A100 GPU benchmarks comparing DP vs PP vs TP.

LLM

vLLM Optimization Techniques: 5 Practical Methods to Improve Performance

Jaydev Tonde · February 2026 · 26 min read

Learn 5 practical vLLM optimization methods: prefix caching, FP8 KV-cache, CPU offloading, disaggregated prefill/decode, and zero-reload sleep mode, with benchmark-backed guidance.

LLM

Disaggregated Prefill-Decode: The Architecture Behind Meta's LLM Serving

Vishnu Subramanian · January 2026 · 11 min read

Part 1 of my LLM optimization research series. Exploring how Meta's disaggregated prefill-decode strategy separates prompt processing from token generation, and what it means for JarvisLabs.

LLM

The Complete Guide to LLM Quantization with vLLM: Benchmarks & Best Practices

Jaydev Tonde · January 2026 · 47 min read

Complete guide to LLM quantization with vLLM. Compare AWQ, GPTQ, Marlin, GGUF, and bitsandbytes with real benchmarks on Qwen2.5-32B using an H200 GPU: 4-bit quantization tested for perplexity, HumanEval accuracy, and inference speed.

LLM

Deploying MiniMax M2.1 with vLLM: Complete Guide for Agentic Workloads

Atharva Ingle · December 2025 · 10 min read

Learn how to deploy MiniMax M2.1 with vLLM for agentic workloads and coding assistants. Covers hardware requirements, tensor/expert parallelism, benchmarking on InstructCoder, tool calling with interleaved thinking, and integration with Claude Code, Cline, and Cursor.

Engineering

CUDA Cores Explained

December 2024 · 7 min read

A deep dive into CUDA cores, Tensor Cores, precision modes, and other specialized GPU features that impact performance.

comfyui

ComfyUI Prompt Enhancement Guide: Using Ollama and LLMs for Better AI Image Generation

Thamim · November 2024 · 3 min read

Learn how to improve your Stable Diffusion prompts using Ollama and LLMs in ComfyUI. Step-by-step guide to setup, workflow, and best practices for enhanced AI image generation.

PyTorch

ML Experiment Tracking: Complete Guide to W&B and Hydra

Atharva Ingle · November 2024 · 22 min read

Learn how to effectively track and manage ML experiments using Weights & Biases (W&B) and Hydra. A comprehensive guide for machine learning practitioners and researchers.

comfyui

How to Run FLUX AI Image Generator with ComfyUI: Complete Setup Guide

Thamim · November 2024 · 4 min read

Step-by-step guide to set up and run FLUX.1 Schnell for AI image generation using ComfyUI on cloud GPUs. Includes workflows, LoRA integration, and practical examples.

LLM

Uncensored LLM Models: A Complete Guide to Unfiltered AI Language Models

Vishnu Subramanian · November 2024 · 4 min read

Explore uncensored LLM models, their differences from ChatGPT, and how they're built. Learn about foundation models, fine-tuning, and running unfiltered AI models locally.

vision

Flux AI Image Generator Tutorial: Setup Guide for Cloud GPU (2024)

Vishnu Subramanian · August 2024 · 4 min read

Step-by-step guide to install and run Flux.1 AI image generator on cloud GPU. Learn how to generate high-quality AI images using Flux's open-source model with detailed setup instructions and examples.

finetuning

Create AI Training Datasets with Fooocus: Face Swap and Pose Matching Guide

Praveen · March 2024 · 3 min read

Step-by-step guide to creating custom AI training datasets using Fooocus's face swap and pose matching features for Stable Diffusion model fine-tuning.

LLM

How to Deploy and Connect with Ollama LLM Models: A Comprehensive Guide

Vishnu Subramanian · March 2024 · 3 min read

Learn how to effectively deploy and interact with Ollama LLM models using terminal commands, local clients, and REST APIs. Discover tips for choosing the right GPU, managing storage, and troubleshooting common issues.

GPU

Boost PyTorch Performance with Hugging Face Accelerate: Multi-GPU & Mixed Precision Training

Vishnu Subramanian · February 2024 · 5 min read

Discover how to enhance your PyTorch scripts using Hugging Face Accelerate for efficient multi-GPU and mixed precision training. Learn setup, configuration, and code adaptation for faster deep learning model training.

NLP

How to Train Billion-Parameter NLP Models on One GPU with DeepSpeed and HuggingFace

Tanul Singh · February 2022 · 5 min read

Learn how to train large language models efficiently using DeepSpeed and HuggingFace Trainer. This step-by-step guide shows you how to optimize GPU memory and train 10B+ parameter models on a single GPU using ZeRO-Offload.

NLP

Build a Toxic Comment Classifier with RoBERTa and PyTorch Lightning | Complete Tutorial

Ishan Dutta · December 2021 · 15 min read

Learn how to build a toxic comment classifier using RoBERTa and PyTorch Lightning. This step-by-step tutorial covers mixed precision training, multi-GPU setup, and Weights & Biases integration for ML model tracking.

Computer Vision

ResNet-50 Performance Optimization: Modern Training Techniques to Achieve 80.4% Accuracy

Vishnu Subramanian · October 2021 · 5 min read

Learn how to boost ResNet-50 accuracy from 75.3% to 80.4% using advanced training techniques, including BCE loss, data augmentation, and optimization strategies. A comprehensive guide to modern CNN training best practices.
