From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, kdevops@lists.linux.dev
Cc: Devasena Inupakutika, DongjooSeo, Joel Fernandes, Luis Chamberlain
Subject: [PATCH v2 0/4] vLLM and the vLLM production stack
Date: Sat, 4 Oct 2025 09:38:10 -0700
Message-ID: <20251004163816.3303237-1-mcgrof@kernel.org>
X-Mailer: git-send-email 2.51.0
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds initial vLLM and vLLM production stack support to kdevops.
This v2 series extends vLLM support to real CPUs on bare metal using
DECLARE_HOSTS, and it has also been tested against a real GPU in the
cloud. That shows essentially anyone can now use the vLLM production
stack on any cloud provider we support in a flash. All we need is for
the instances which have GPUs to be added, and we expect that to grow
soon using dynamic kconfig support.

Demo results of the temporary quick benchmark for all cases (GPUs, CPUs,
and VMs) are here:

https://github.com/mcgrof/demo-vllm-benchmark

We will soon expand support to synthetic engines, so we can stress test
vLLM routing without the overhead of any real hardware. We then need to
expand the scope of testing by using the vLLM benchmarks and graphing
the results.

One of the benefits of all this is that we can support *upstream kernel*
changes and automatic testing of vLLM for compute in any complex way we
can think of. Upstream kernels are not a requirement, we just support
them. We also support A/B testing since kdevops already provides that,
meaning folks can do A/B testing with two different kernels.

This should be enough to kick the tires and scale real production AI
workloads on kdevops.

Since the first v1 patch already passed CI testing on kdevops, I'm
posting this just as a formality and will be merging it soon.

Luis Chamberlain (4):
  workflows: Add vLLM workflow for LLM inference and production
    deployment
  vllm: Add DECLARE_HOSTS support for bare metal and existing
    infrastructure
  vllm: Add GPU-enabled defconfig with compatibility documentation
  defconfigs: Add composable fragments for Lambda Labs vLLM deployment

 .gitignore                                    |   1 +
 PROMPTS.md                                    |  31 +
 README.md                                     |  26 +-
 .../configs/lambdalabs-gpu-1x-a10.config      |   8 +
 .../configs/vllm-production-stack-gpu.config  |  61 ++
 defconfigs/lambdalabs-vllm-gpu-1x-a10         | 103 +++
 defconfigs/vllm                               |  40 +
 defconfigs/vllm-declared-hosts                |  53 ++
 defconfigs/vllm-production-stack-cpu          |  45 ++
 .../vllm-production-stack-declared-hosts      |  66 ++
 .../vllm-production-stack-declared-hosts-gpu  | 118 +++
 defconfigs/vllm-quick-test                    |  42 ++
 kconfigs/Kconfig.libvirt                      |   3 +
 kconfigs/workflows/Kconfig                    |  28 +
 playbooks/roles/gen_hosts/defaults/main.yml   |   1 +
 playbooks/roles/gen_hosts/tasks/main.yml      |  15 +
 .../gen_hosts/templates/workflows/vllm.j2     |  65 ++
 playbooks/roles/gen_nodes/defaults/main.yml   |   1 +
 playbooks/roles/gen_nodes/tasks/main.yml      |  36 +
 playbooks/roles/linux-mirror/tasks/main.yml   |   1 +
 playbooks/roles/vllm/defaults/main.yml        |  17 +
 .../roles/vllm/tasks/cleanup-bare-metal.yml   | 110 +++
 .../vllm/tasks/configure-docker-data.yml      | 187 +++++
 .../roles/vllm/tasks/deploy-bare-metal.yml    | 281 +++++++
 playbooks/roles/vllm/tasks/deploy-docker.yml  | 105 +++
 .../vllm/tasks/deploy-production-stack.yml    | 252 +++++++
 .../vllm/tasks/install-deps/debian/main.yml   | 101 +++
 .../roles/vllm/tasks/install-deps/main.yml    |  12 +
 .../vllm/tasks/install-deps/redhat/main.yml   | 108 +++
 .../vllm/tasks/install-deps/suse/main.yml     |  50 ++
 playbooks/roles/vllm/tasks/main.yml           | 362 +++++++++
 playbooks/roles/vllm/tasks/setup-helm.yml     |  33 +
 .../roles/vllm/tasks/setup-kubernetes.yml     | 307 ++++++++
 .../roles/vllm/templates/vllm-benchmark.py.j2 | 152 ++++
 .../vllm/templates/vllm-container.service.j2  |  80 ++
 .../vllm/templates/vllm-deployment.yaml.j2    |  94 +++
 .../vllm/templates/vllm-helm-values.yaml.j2   |  63 ++
 .../vllm-prod-stack-official-values.yaml.j2   | 154 ++++
 .../templates/vllm-upstream-values.yaml.j2    | 151 ++++
 .../roles/vllm/templates/vllm-visualize.py.j2 | 434 +++++++++++
 playbooks/vllm.yml                            |  12 +
 scripts/vllm-quick-test.sh                    | 191 +++++
 scripts/vllm-status-summary.py                | 404 ++++++++++
 workflows/Makefile                            |   4 +
 workflows/vllm/Kconfig                        | 699 ++++++++++++++++++
 workflows/vllm/Makefile                       | 136 ++++
 workflows/vllm/README.md                      | 522 +++++++++++++
 47 files changed, 5763 insertions(+), 2 deletions(-)
 create mode 100644 defconfigs/configs/lambdalabs-gpu-1x-a10.config
 create mode 100644 defconfigs/configs/vllm-production-stack-gpu.config
 create mode 100644 defconfigs/lambdalabs-vllm-gpu-1x-a10
 create mode 100644 defconfigs/vllm
 create mode 100644 defconfigs/vllm-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-cpu
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts-gpu
 create mode 100644 defconfigs/vllm-quick-test
 create mode 100644 playbooks/roles/gen_hosts/templates/workflows/vllm.j2
 create mode 100644 playbooks/roles/vllm/defaults/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/configure-docker-data.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-docker.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-production-stack.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/debian/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/redhat/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/suse/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-helm.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-kubernetes.yml
 create mode 100644 playbooks/roles/vllm/templates/vllm-benchmark.py.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-container.service.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-deployment.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-helm-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-prod-stack-official-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-upstream-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-visualize.py.j2
 create mode 100644 playbooks/vllm.yml
 create mode 100755 scripts/vllm-quick-test.sh
 create mode 100755 scripts/vllm-status-summary.py
 create mode 100644 workflows/vllm/Kconfig
 create mode 100644 workflows/vllm/Makefile
 create mode 100644 workflows/vllm/README.md

-- 
2.51.0