From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>, Daniel Gomez <da.gomez@kruces.com>,
kdevops@lists.linux.dev
Cc: Devasena Inupakutika <devasena.i@samsung.com>,
DongjooSeo <dongjoo.seo1@samsung.com>,
Joel Fernandes <Joelagnelf@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH v2 0/4] vLLM and the vLLM production stack
Date: Sat, 4 Oct 2025 09:38:10 -0700 [thread overview]
Message-ID: <20251004163816.3303237-1-mcgrof@kernel.org> (raw)
This adds initial vLLM and vLLM production stack support on kdevops.
This v2 series extends vLLM support to real CPUs on bare metal via
DECLARE_HOSTS, and has also been tested against a real GPU in the cloud,
showing that essentially anyone can now bring up the vLLM production
stack on any cloud provider we support in a flash. All we need is for
instances with GPUs to be added, and we expect that to grow soon thanks
to dynamic kconfig support.
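For reference, a DECLARE_HOSTS-style deployment is driven by a small
defconfig fragment. The sketch below is illustrative only; the exact
option names are assumptions, not copied from the series (see the
defconfigs/vllm-declared-hosts file in the diffstat for the real thing):

```
# Illustrative defconfig fragment: declare pre-existing bare metal
# hosts instead of provisioning VMs or cloud instances.
CONFIG_DECLARE_HOSTS=y
CONFIG_KDEVOPS_WORKFLOW_ENABLE_VLLM=y
```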
Demo results of the temporary quick benchmark for all cases (GPUs, CPUs,
and VMs) are here:
https://github.com/mcgrof/demo-vllm-benchmark
We will soon add support for synthetic engines, so we can stress-test
vLLM routing without the overhead of any real hardware. The next step is
to expand the scope of testing with the vLLM benchmarks and graph the
results.
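Graphing benchmark results typically starts by reducing raw per-request
latencies to summary statistics. A minimal, self-contained sketch of that
reduction step (illustrative only, not code from this series):

```python
# Reduce raw per-request latencies (in seconds) to the summary
# statistics a graphing step would plot. Pure stdlib; in practice the
# input would come from a benchmark results file.
def latency_summary(latencies):
    xs = sorted(latencies)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        k = max(0, min(len(xs) - 1, round(p / 100 * (len(xs) - 1))))
        return xs[k]

    return {
        "p50": pct(50),
        "p99": pct(99),
        "mean": sum(xs) / len(xs),
        "max": xs[-1],
    }
```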
One of the benefits of all this is that we can test *upstream kernel*
changes against vLLM compute workloads automatically, in any complex way
we can think of. Upstream kernels are not a requirement, we just support
them. Since kdevops already provides A/B testing, folks can also run A/B
comparisons with two different kernels.
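The A/B comparison step between two kernels boils down to flagging
metrics where the candidate kernel regresses past some threshold. A
hedged sketch of such a check (the function, metric names, and tolerance
are illustrative, not part of this series):

```python
def ab_compare(baseline, candidate, tolerance=0.05):
    """Compare per-metric results from kernel A (baseline) against
    kernel B (candidate). Metrics are higher-is-better (e.g.
    throughput); flag any metric where the candidate is more than
    `tolerance` (fractional) worse than the baseline."""
    regressions = {}
    for name, base in baseline.items():
        cand = candidate.get(name)
        if cand is None:
            continue  # metric missing from the candidate run
        delta = (cand - base) / base
        if delta < -tolerance:
            regressions[name] = delta
    return regressions
```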
This should be enough to kick the tires, and scale real production AI
workloads on kdevops.
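To kick the tires, the usual kdevops flow applies; a sketch under the
assumption that the targets follow the defconfigs added in this series
(the vllm-test target name is a guess):

```
make defconfig-vllm-quick-test   # pick one of the new vLLM defconfigs
make bringup                     # provision nodes (cloud, libvirt, or declared hosts)
make vllm                        # deploy vLLM / the production stack
make vllm-test                   # run the quick benchmark
```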
Since v1 already passed CI testing on kdevops, I'm posting this just as
a formality and will merge it soon.
Luis Chamberlain (4):
workflows: Add vLLM workflow for LLM inference and production
deployment
vllm: Add DECLARE_HOSTS support for bare metal and existing
infrastructure
vllm: Add GPU-enabled defconfig with compatibility documentation
defconfigs: Add composable fragments for Lambda Labs vLLM deployment
.gitignore | 1 +
PROMPTS.md | 31 +
README.md | 26 +-
.../configs/lambdalabs-gpu-1x-a10.config | 8 +
.../configs/vllm-production-stack-gpu.config | 61 ++
defconfigs/lambdalabs-vllm-gpu-1x-a10 | 103 +++
defconfigs/vllm | 40 +
defconfigs/vllm-declared-hosts | 53 ++
defconfigs/vllm-production-stack-cpu | 45 ++
.../vllm-production-stack-declared-hosts | 66 ++
.../vllm-production-stack-declared-hosts-gpu | 118 +++
defconfigs/vllm-quick-test | 42 ++
kconfigs/Kconfig.libvirt | 3 +
kconfigs/workflows/Kconfig | 28 +
playbooks/roles/gen_hosts/defaults/main.yml | 1 +
playbooks/roles/gen_hosts/tasks/main.yml | 15 +
.../gen_hosts/templates/workflows/vllm.j2 | 65 ++
playbooks/roles/gen_nodes/defaults/main.yml | 1 +
playbooks/roles/gen_nodes/tasks/main.yml | 36 +
playbooks/roles/linux-mirror/tasks/main.yml | 1 +
playbooks/roles/vllm/defaults/main.yml | 17 +
.../roles/vllm/tasks/cleanup-bare-metal.yml | 110 +++
.../vllm/tasks/configure-docker-data.yml | 187 +++++
.../roles/vllm/tasks/deploy-bare-metal.yml | 281 +++++++
playbooks/roles/vllm/tasks/deploy-docker.yml | 105 +++
.../vllm/tasks/deploy-production-stack.yml | 252 +++++++
.../vllm/tasks/install-deps/debian/main.yml | 101 +++
.../roles/vllm/tasks/install-deps/main.yml | 12 +
.../vllm/tasks/install-deps/redhat/main.yml | 108 +++
.../vllm/tasks/install-deps/suse/main.yml | 50 ++
playbooks/roles/vllm/tasks/main.yml | 362 +++++++++
playbooks/roles/vllm/tasks/setup-helm.yml | 33 +
.../roles/vllm/tasks/setup-kubernetes.yml | 307 ++++++++
.../roles/vllm/templates/vllm-benchmark.py.j2 | 152 ++++
.../vllm/templates/vllm-container.service.j2 | 80 ++
.../vllm/templates/vllm-deployment.yaml.j2 | 94 +++
.../vllm/templates/vllm-helm-values.yaml.j2 | 63 ++
.../vllm-prod-stack-official-values.yaml.j2 | 154 ++++
.../templates/vllm-upstream-values.yaml.j2 | 151 ++++
.../roles/vllm/templates/vllm-visualize.py.j2 | 434 +++++++++++
playbooks/vllm.yml | 12 +
scripts/vllm-quick-test.sh | 191 +++++
scripts/vllm-status-summary.py | 404 ++++++++++
workflows/Makefile | 4 +
workflows/vllm/Kconfig | 699 ++++++++++++++++++
workflows/vllm/Makefile | 136 ++++
workflows/vllm/README.md | 522 +++++++++++++
47 files changed, 5763 insertions(+), 2 deletions(-)
create mode 100644 defconfigs/configs/lambdalabs-gpu-1x-a10.config
create mode 100644 defconfigs/configs/vllm-production-stack-gpu.config
create mode 100644 defconfigs/lambdalabs-vllm-gpu-1x-a10
create mode 100644 defconfigs/vllm
create mode 100644 defconfigs/vllm-declared-hosts
create mode 100644 defconfigs/vllm-production-stack-cpu
create mode 100644 defconfigs/vllm-production-stack-declared-hosts
create mode 100644 defconfigs/vllm-production-stack-declared-hosts-gpu
create mode 100644 defconfigs/vllm-quick-test
create mode 100644 playbooks/roles/gen_hosts/templates/workflows/vllm.j2
create mode 100644 playbooks/roles/vllm/defaults/main.yml
create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
create mode 100644 playbooks/roles/vllm/tasks/configure-docker-data.yml
create mode 100644 playbooks/roles/vllm/tasks/deploy-bare-metal.yml
create mode 100644 playbooks/roles/vllm/tasks/deploy-docker.yml
create mode 100644 playbooks/roles/vllm/tasks/deploy-production-stack.yml
create mode 100644 playbooks/roles/vllm/tasks/install-deps/debian/main.yml
create mode 100644 playbooks/roles/vllm/tasks/install-deps/main.yml
create mode 100644 playbooks/roles/vllm/tasks/install-deps/redhat/main.yml
create mode 100644 playbooks/roles/vllm/tasks/install-deps/suse/main.yml
create mode 100644 playbooks/roles/vllm/tasks/main.yml
create mode 100644 playbooks/roles/vllm/tasks/setup-helm.yml
create mode 100644 playbooks/roles/vllm/tasks/setup-kubernetes.yml
create mode 100644 playbooks/roles/vllm/templates/vllm-benchmark.py.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-container.service.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-deployment.yaml.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-helm-values.yaml.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-prod-stack-official-values.yaml.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-upstream-values.yaml.j2
create mode 100644 playbooks/roles/vllm/templates/vllm-visualize.py.j2
create mode 100644 playbooks/vllm.yml
create mode 100755 scripts/vllm-quick-test.sh
create mode 100755 scripts/vllm-status-summary.py
create mode 100644 workflows/vllm/Kconfig
create mode 100644 workflows/vllm/Makefile
create mode 100644 workflows/vllm/README.md
--
2.51.0
Thread overview: 13+ messages
2025-10-04 16:38 Luis Chamberlain [this message]
2025-10-04 16:38 ` [PATCH v2 1/4] workflows: Add vLLM workflow for LLM inference and production deployment Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 2/4] vllm: Add DECLARE_HOSTS support for bare metal and existing infrastructure Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 3/4] vllm: Add GPU-enabled defconfig with compatibility documentation Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 4/4] defconfigs: Add composable fragments for Lambda Labs vLLM deployment Luis Chamberlain
2025-10-04 16:39 ` [PATCH v2 0/4] vLLM and the vLLM production stack Luis Chamberlain
2025-10-04 16:55 ` Chuck Lever
2025-10-04 17:03 ` Luis Chamberlain
2025-10-04 17:14 ` Chuck Lever
2025-10-08 17:46 ` Chuck Lever
2025-10-10 0:55 ` Luis Chamberlain
2025-10-10 12:38 ` Chuck Lever
2025-10-10 16:20 ` Chuck Lever