public inbox for kdevops@lists.linux.dev
* [PATCH v2 0/4] vLLM and the vLLM production stack
@ 2025-10-04 16:38 Luis Chamberlain
  2025-10-04 16:38 ` [PATCH v2 1/4] workflows: Add vLLM workflow for LLM inference and production deployment Luis Chamberlain
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-10-04 16:38 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops
  Cc: Devasena Inupakutika, DongjooSeo, Joel Fernandes,
	Luis Chamberlain

This adds initial vLLM and vLLM production stack support on kdevops.

This v2 series extends vLLM support to real CPUs on bare metal using
DECLARE_HOSTS, and has also been tested against a real GPU in the cloud,
showing that essentially anyone can now bring up the vLLM production
stack on any cloud provider we support in a flash. All we need is to add
instance types which have GPUs, and we expect that list to grow soon
using dynamic kconfig support.
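As a sketch, the end-to-end cloud flow would look roughly like the
following. The make target names are assumptions inferred from the
defconfig file names in this series and kdevops' usual defconfig-*
conventions, so treat them as illustrative rather than exact:

```shell
# Hedged sketch: bring up a Lambda Labs GPU instance and deploy the
# vLLM production stack. Target names are assumptions based on the
# defconfigs added in this series and kdevops' defconfig conventions.
make defconfig-lambdalabs-vllm-gpu-1x-a10
make bringup      # provision the cloud instance
make vllm         # deploy vLLM / the production stack
```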

Demo results of the temporary quick benchmark for all cases, GPUs, CPUs,
and VMs are here:

https://github.com/mcgrof/demo-vllm-benchmark

We will expand support soon for synthetic engines, so we can stress test
vLLM routing without the overhead of any real hardware. We then need to
expand the scope of testing using the vLLM benchmarks and graphing them.
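For instance, once a deployment is up, the helper scripts added in this
series could presumably be driven directly from the kdevops tree; the
invocations below are assumptions (no flags are documented in this cover
letter):

```shell
# Hedged sketch: exercise a running vLLM endpoint with the quick test
# script added by this series, then summarize deployment status.
# Both paths exist in the diffstat; arguments are assumptions.
./scripts/vllm-quick-test.sh
./scripts/vllm-status-summary.py
```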

One of the benefits of all this is that we can support *upstream kernel*
changes and automatic testing of vLLM for compute in any complex way
we can think of. Upstream kernels are not a requirement, we just support
them. Since kdevops already provides A/B testing support, folks can also
A/B test vLLM with two different kernels.
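A hedged sketch of what an A/B .config fragment might look like; the
option names below are assumptions modeled on kdevops' existing A/B
support and the Kconfig files added in this series, not confirmed
symbols:

```
# Assumed option names, for illustration only
CONFIG_KDEVOPS_BASELINE_AND_DEV=y   # spawn baseline and dev nodes
CONFIG_KDEVOPS_WORKFLOW_ENABLE_VLLM=y
```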

This should be enough to kick the tires, and scale real production AI
workloads on kdevops.

Since the v1 series already passed CI testing on kdevops, I'm posting
this just as a formality and will soon be merging it.

Luis Chamberlain (4):
  workflows: Add vLLM workflow for LLM inference and production
    deployment
  vllm: Add DECLARE_HOSTS support for bare metal and existing
    infrastructure
  vllm: Add GPU-enabled defconfig with compatibility documentation
  defconfigs: Add composable fragments for Lambda Labs vLLM deployment

 .gitignore                                    |   1 +
 PROMPTS.md                                    |  31 +
 README.md                                     |  26 +-
 .../configs/lambdalabs-gpu-1x-a10.config      |   8 +
 .../configs/vllm-production-stack-gpu.config  |  61 ++
 defconfigs/lambdalabs-vllm-gpu-1x-a10         | 103 +++
 defconfigs/vllm                               |  40 +
 defconfigs/vllm-declared-hosts                |  53 ++
 defconfigs/vllm-production-stack-cpu          |  45 ++
 .../vllm-production-stack-declared-hosts      |  66 ++
 .../vllm-production-stack-declared-hosts-gpu  | 118 +++
 defconfigs/vllm-quick-test                    |  42 ++
 kconfigs/Kconfig.libvirt                      |   3 +
 kconfigs/workflows/Kconfig                    |  28 +
 playbooks/roles/gen_hosts/defaults/main.yml   |   1 +
 playbooks/roles/gen_hosts/tasks/main.yml      |  15 +
 .../gen_hosts/templates/workflows/vllm.j2     |  65 ++
 playbooks/roles/gen_nodes/defaults/main.yml   |   1 +
 playbooks/roles/gen_nodes/tasks/main.yml      |  36 +
 playbooks/roles/linux-mirror/tasks/main.yml   |   1 +
 playbooks/roles/vllm/defaults/main.yml        |  17 +
 .../roles/vllm/tasks/cleanup-bare-metal.yml   | 110 +++
 .../vllm/tasks/configure-docker-data.yml      | 187 +++++
 .../roles/vllm/tasks/deploy-bare-metal.yml    | 281 +++++++
 playbooks/roles/vllm/tasks/deploy-docker.yml  | 105 +++
 .../vllm/tasks/deploy-production-stack.yml    | 252 +++++++
 .../vllm/tasks/install-deps/debian/main.yml   | 101 +++
 .../roles/vllm/tasks/install-deps/main.yml    |  12 +
 .../vllm/tasks/install-deps/redhat/main.yml   | 108 +++
 .../vllm/tasks/install-deps/suse/main.yml     |  50 ++
 playbooks/roles/vllm/tasks/main.yml           | 362 +++++++++
 playbooks/roles/vllm/tasks/setup-helm.yml     |  33 +
 .../roles/vllm/tasks/setup-kubernetes.yml     | 307 ++++++++
 .../roles/vllm/templates/vllm-benchmark.py.j2 | 152 ++++
 .../vllm/templates/vllm-container.service.j2  |  80 ++
 .../vllm/templates/vllm-deployment.yaml.j2    |  94 +++
 .../vllm/templates/vllm-helm-values.yaml.j2   |  63 ++
 .../vllm-prod-stack-official-values.yaml.j2   | 154 ++++
 .../templates/vllm-upstream-values.yaml.j2    | 151 ++++
 .../roles/vllm/templates/vllm-visualize.py.j2 | 434 +++++++++++
 playbooks/vllm.yml                            |  12 +
 scripts/vllm-quick-test.sh                    | 191 +++++
 scripts/vllm-status-summary.py                | 404 ++++++++++
 workflows/Makefile                            |   4 +
 workflows/vllm/Kconfig                        | 699 ++++++++++++++++++
 workflows/vllm/Makefile                       | 136 ++++
 workflows/vllm/README.md                      | 522 +++++++++++++
 47 files changed, 5763 insertions(+), 2 deletions(-)
 create mode 100644 defconfigs/configs/lambdalabs-gpu-1x-a10.config
 create mode 100644 defconfigs/configs/vllm-production-stack-gpu.config
 create mode 100644 defconfigs/lambdalabs-vllm-gpu-1x-a10
 create mode 100644 defconfigs/vllm
 create mode 100644 defconfigs/vllm-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-cpu
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts-gpu
 create mode 100644 defconfigs/vllm-quick-test
 create mode 100644 playbooks/roles/gen_hosts/templates/workflows/vllm.j2
 create mode 100644 playbooks/roles/vllm/defaults/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/configure-docker-data.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-bare-metal.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-docker.yml
 create mode 100644 playbooks/roles/vllm/tasks/deploy-production-stack.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/debian/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/redhat/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/install-deps/suse/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/main.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-helm.yml
 create mode 100644 playbooks/roles/vllm/tasks/setup-kubernetes.yml
 create mode 100644 playbooks/roles/vllm/templates/vllm-benchmark.py.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-container.service.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-deployment.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-helm-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-prod-stack-official-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-upstream-values.yaml.j2
 create mode 100644 playbooks/roles/vllm/templates/vllm-visualize.py.j2
 create mode 100644 playbooks/vllm.yml
 create mode 100755 scripts/vllm-quick-test.sh
 create mode 100755 scripts/vllm-status-summary.py
 create mode 100644 workflows/vllm/Kconfig
 create mode 100644 workflows/vllm/Makefile
 create mode 100644 workflows/vllm/README.md

-- 
2.51.0



end of thread, other threads:[~2025-10-10 16:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-04 16:38 [PATCH v2 0/4] vLLM and the vLLM production stack Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 1/4] workflows: Add vLLM workflow for LLM inference and production deployment Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 2/4] vllm: Add DECLARE_HOSTS support for bare metal and existing infrastructure Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 3/4] vllm: Add GPU-enabled defconfig with compatibility documentation Luis Chamberlain
2025-10-04 16:38 ` [PATCH v2 4/4] defconfigs: Add composable fragments for Lambda Labs vLLM deployment Luis Chamberlain
2025-10-04 16:39 ` [PATCH v2 0/4] vLLM and the vLLM production stack Luis Chamberlain
2025-10-04 16:55 ` Chuck Lever
2025-10-04 17:03   ` Luis Chamberlain
2025-10-04 17:14     ` Chuck Lever
2025-10-08 17:46       ` Chuck Lever
2025-10-10  0:55         ` Luis Chamberlain
2025-10-10 12:38           ` Chuck Lever
2025-10-10 16:20             ` Chuck Lever
