From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 333E627F00E
	for <kdevops@lists.linux.dev>; Sat,  4 Oct 2025 16:38:19 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1759595901; cv=none; b=BpHmlqEr+KFl3PoiIQc9aayvvJ3vdZSVLgJSB3VvCwj0SUIXJDUoDKtde+ApkFJFm8nthBGkpQtrq5ON/6plW57gXYxOBOr8IoO6aMChei89P/Mk1eUbRc1Gu8krVMiu8m/2Ruf8q7lpEXrLK0S7IEU4iSv7xIKGMFv2uvppTK4=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1759595901; c=relaxed/simple;
	bh=+uh6o7kvy+9CcrajMWefwEZpbwtucvgotNpOcbokxjI=;
	h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:Content-Type; b=LdoQcfKlnx+3YBWXsHZe5zM3sKv+ecmeJaI3rVmZBaqSyEAPWLnQYpDh8jrj15vl8mqgNAKXwarsCem6WLpuN9F6it1k508gLaXUUJXKP5Vi4EAMDfu8cY1bK3MJC5H2bMOxphgCGCHIvkbC2bThjVGs0zY7F7zppBsPUjBjBvA=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=FSc9/Ltv; arc=none smtp.client-ip=198.137.202.133
Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="FSc9/Ltv"
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding:
	Content-Type:MIME-Version:Message-ID:Date:Subject:Cc:To:From:Reply-To:
	Content-ID:Content-Description:In-Reply-To:References;
	bh=5ej4t4SpZiKaudHz57Clle2hesm5xdINYtmlNl2EYeM=; b=FSc9/LtvQ+guSgicCOb0jG3UNo
	wWvubmjXMS9pVJUpWGNLW8lWQ7MJYYCvih/xnpKD71mXdwD6gE4o2WNN+X5QVg3DxzhLJVu32KhSE
	LkHNRNBrrqq5cpSOEHUe25FjdZgLeYQrYjwdl9WByow75QyH5xQ1AYAIwfRPPWJ9Pt3gPJY46l5Zk
	OomadwtuZZYeHJL9zdcpyjxN50L+oJYLmoMyDcG0bIO4JhY8bfVmgXncijR1jLAx2vtbfwhMql+n8
	q4mwmn5p8zaGCsYtPtDzSuKDBV/eStfjsDNA7Rr1iSpqxbTn7TiUrl0bPJXJX0x1atSgJd/kh9IWy
	zxRbjYKw==;
Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux))
	id 1v55Gz-0000000DrK8-3e7K;
	Sat, 04 Oct 2025 16:38:17 +0000
From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>,
	Daniel Gomez <da.gomez@kruces.com>,
	kdevops@lists.linux.dev
Cc: Devasena Inupakutika <devasena.i@samsung.com>,
	DongjooSeo <dongjoo.seo1@samsung.com>,
	Joel Fernandes <Joelagnelf@nvidia.com>,
	Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH v2 0/4] vLLM and the vLLM production stack
Date: Sat,  4 Oct 2025 09:38:10 -0700
Message-ID: <20251004163816.3303237-1-mcgrof@kernel.org>
X-Mailer: git-send-email 2.51.0
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
List-Id: <kdevops.lists.linux.dev>
List-Subscribe: <mailto:kdevops+subscribe@lists.linux.dev>
List-Unsubscribe: <mailto:kdevops+unsubscribe@lists.linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain <mcgrof@infradead.org>

This adds initial vLLM and vLLM production stack support on kdevops.

This v2 series extends vLLM support to real CPUs on bare metal using
DECLARE_HOSTS, and it has also been tested against a real GPU in the
cloud. This shows that essentially anyone can now quickly use the vLLM
production stack on any cloud provider we support. All we need is for
instances with GPUs to be added, and we expect that list to grow soon
through dynamic kconfig support.

Demo results of the temporary quick benchmark for all cases (GPUs, CPUs,
and VMs) are here:

https://github.com/mcgrof/demo-vllm-benchmark

We will soon expand support for synthetic engines, so we can stress test
vLLM routing without the overhead of any real hardware. We then need to
expand the scope of testing by using the vLLM benchmarks and graphing
their results.

One of the benefits of all this is that we can support *upstream kernel*
changes and automatic testing of vLLM for compute in any complex way
we can think of. Upstream kernels are not a requirement; we just support
them. Since kdevops already provides A/B testing, folks can also do A/B
testing with two different kernels.

This should be enough to kick the tires and scale real production AI
workloads on kdevops.

Since the v1 patch already passed CI testing on kdevops, I'm posting
this just as a formality and will merge it soon.

Luis Chamberlain (4):
  workflows: Add vLLM workflow for LLM inference and production
    deployment
  vllm: Add DECLARE_HOSTS support for bare metal and existing
    infrastructure
  vllm: Add GPU-enabled defconfig with compatibility documentation
  defconfigs: Add composable fragments for Lambda Labs vLLM deployment

 .gitignore                                    |   1 +
 PROMPTS.md                                    |  31 +
 README.md                                     |  26 +-
 .../configs/lambdalabs-gpu-1x-a10.config      |   8 +
 .../configs/vllm-production-stack-gpu.config  |  61 ++
 defconfigs/lambdalabs-vllm-gpu-1x-a10         | 103 +++
 defconfigs/vllm                               |  40 +
 defconfigs/vllm-declared-hosts                |  53 ++
 defconfigs/vllm-production-stack-cpu          |  45 ++
 .../vllm-production-stack-declared-hosts      |  66 ++
 .../vllm-production-stack-declared-hosts-gpu  | 118 +++
 defconfigs/vllm-quick-test                    |  42 ++
 kconfigs/Kconfig.libvirt                      |   3 +
 kconfigs/workflows/Kconfig                    |  28 +
 playbooks/roles/gen_hosts/defaults/main.yml   |   1 +
 playbooks/roles/gen_hosts/tasks/main.yml      |  15 +
 .../gen_hosts/templates/workflows/vllm.j2     |  65 ++
 playbooks/roles/gen_nodes/defaults/main.yml   |   1 +
 playbooks/roles/gen_nodes/tasks/main.yml      |  36 +
 playbooks/roles/linux-mirror/tasks/main.yml   |   1 +
 playbooks/roles/vllm/defaults/main.yml        |  17 +
 .../roles/vllm/tasks/cleanup-bare-metal.yml   | 110 +++
 .../vllm/tasks/configure-docker-data.yml      | 187 +++++
 .../roles/vllm/tasks/deploy-bare-metal.yml    | 281 +++++++
 playbooks/roles/vllm/tasks/deploy-docker.yml  | 105 +++
 .../vllm/tasks/deploy-production-stack.yml    | 252 +++++++
 .../vllm/tasks/install-deps/debian/main.yml   | 101 +++
 .../roles/vllm/tasks/install-deps/main.yml    |  12 +
 .../vllm/tasks/install-deps/redhat/main.yml   | 108 +++
 .../vllm/tasks/install-deps/suse/main.yml     |  50 ++
 playbooks/roles/vllm/tasks/main.yml           | 362 +++++++++
 playbooks/roles/vllm/tasks/setup-helm.yml     |  33 +
 .../roles/vllm/tasks/setup-kubernetes.yml     | 307 ++++++++
 .../roles/vllm/templates/vllm-benchmark.py.j2 | 152 ++++
 .../vllm/templates/vllm-container.service.j2  |  80 ++
 .../vllm/templates/vllm-deployment.yaml.j2    |  94 +++
 .../vllm/templates/vllm-helm-values.yaml.j2   |  63 ++
 .../vllm-prod-stack-official-values.yaml.j2   | 154 ++++
 .../templates/vllm-upstream-values.yaml.j2    | 151 ++++
 .../roles/vllm/templates/vllm-visualize.py.j2 | 434 +++++++++++
 playbooks/vllm.yml                            |  12 +
 scripts/vllm-quick-test.sh                    | 191 +++++
 scripts/vllm-status-summary.py                | 404 ++++++++++
 workflows/Makefile                            |   4 +
 workflows/vllm/Kconfig                        | 699 ++++++++++++++++++
 workflows/vllm/Makefile                       | 136 ++++
 workflows/vllm/README.md                      | 522 +++++++++++++
 47 files changed, 5763 insertions(+), 2 deletions(-)
 create mode 100644 defconfigs/configs/lambdalabs-gpu-1x-a10.config
 create mode 100644 defconfigs/configs/vllm-production-stack-gpu.config
 create mode 100644 defconfigs/lambdalabs-vllm-gpu-1x-a10
 create mode 100644 defconfigs/vllm
 create mode 100644 defconfigs/vllm-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-cpu
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts-gpu
 create mode 100644 defconfigs/vllm-quick-test
 create mode 100644 playbooks/roles/gen_hosts/templates/workflows/vllm.j2
 create mode 100644 playbooks/roles/vllm/defaults/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/configure-docker-data.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-docker.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-production-stack.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/debian/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/redhat/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/suse/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-helm.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-kubernetes.yml
 create mode 100644 playbooks/roles/vllm/templates/vllm-benchmark.py.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-container.service.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-deployment.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-helm-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-prod-stack-official-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-upstream-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-visualize.py.j2
 create mode 100644 playbooks/vllm.yml
 create mode 100755 scripts/vllm-quick-test.sh
 create mode 100755 scripts/vllm-status-summary.py
 create mode 100644 workflows/vllm/Kconfig
 create mode 100644 workflows/vllm/Makefile
 create mode 100644 workflows/vllm/README.md

-- 
2.51.0