From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>, Daniel Gomez <da.gomez@kruces.com>,
kdevops@lists.linux.dev
Cc: Devasena Inupakutika <devasena.i@samsung.com>,
Dongjoo Seo <dongjoo.seo1@samsung.com>,
Joel Fernandes <Joelagnelf@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH v2 2/4] vllm: Add DECLARE_HOSTS support for bare metal and existing infrastructure
Date: Sat, 4 Oct 2025 09:38:12 -0700 [thread overview]
Message-ID: <20251004163816.3303237-3-mcgrof@kernel.org> (raw)
In-Reply-To: <20251004163816.3303237-1-mcgrof@kernel.org>
Following the pattern established by the MinIO workflow in commit
533be4c716d1 ("minio: add MinIO Warp S3 benchmarking with declared
hosts support"), add DECLARE_HOSTS support to the vLLM workflow to
enable testing on pre-existing infrastructure including bare metal
servers with GPUs.
This enables users to leverage existing GPU infrastructure without
requiring kdevops to provision new systems. Two new defconfigs are
provided for different deployment scenarios.
New defconfigs:
1. defconfig-vllm-declared-hosts
- Bare metal deployment using Docker containers
- Targets single-node GPU servers
- Uses systemd service management for vLLM
- Configurable GPU type (nvidia-a100, etc.) and count
- Direct port 8000 access without Kubernetes overhead
- Suitable for direct hardware access scenarios
2. defconfig-vllm-production-stack-declared-hosts
- Production Stack deployment on existing Kubernetes clusters
- Full Production Stack with router and monitoring
- Autoscaling support (2-5 replicas)
- Grafana/Prometheus observability stack
- Suitable for production GPU clusters
Both configurations automatically:
- Set CONFIG_SKIP_BRINGUP=y to skip infrastructure provisioning
- Set CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y for pre-existing systems
- Enable benchmarking for performance validation
- Support HuggingFace model deployment (default: facebook/opt-125m)
Implementation changes:
Bare-metal deployment improvements (deploy-bare-metal.yml):
- Remove legacy kubectl/minikube/helm installation for bare-metal
- Implement Docker image mirror fallback (try mirror, fall back to public)
- Replace handler-based service restart with direct systemd management
- Make config file creation optional (template may not exist)
- Support both Docker and Podman container runtimes
- Automatic GPU detection and container image selection (CPU vs GPU)
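The mirror-then-public pull flow can be condensed into a pair of guarded
tasks (a simplified sketch of the role's actual tasks, which additionally
skip the pull when the image is already present locally):

```yaml
# Sketch: try the local Docker mirror first, then fall back to the
# public registry. Variable names follow the role's conventions.
- name: Try pulling from the Docker mirror
  ansible.builtin.command:
    cmd: "docker pull {{ vllm_bare_metal_image_mirror }}"
  register: pull_mirror
  failed_when: false
  when: use_docker_mirror | default(false) | bool

- name: Fall back to the public registry if the mirror failed
  ansible.builtin.command:
    cmd: "docker pull {{ vllm_bare_metal_image_public }}"
  when: >-
    (not (use_docker_mirror | default(false) | bool)) or
    (pull_mirror.rc | default(1) != 0)
```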
Main task routing (main.yml):
- Remove the 260+ line legacy deployment block, which caused unwanted
  Kubernetes installation even with when: false because tags on
  import_tasks are matched statically at parse time
- Change configure-docker-data.yml to only run for Kubernetes deployments
- Convert deployment method routing from import_tasks to include_tasks
with apply parameter to properly respect when conditions and tags
- Add bare-metal specific conditions to benchmark tasks (skip kubectl
port-forward, connect directly to port 8000)
- Add bare-metal specific conditions to monitoring tasks (skip Kubernetes
service queries, show systemd journal instructions instead)
- Add new bare-metal monitoring info task with journalctl instructions
Cleanup support (cleanup-bare-metal.yml - NEW):
- Add vllm-cleanup target: Remove containers and systemd services
- Add vllm-cleanup-full target: Also remove kubectl/helm/minikube binaries
- Add vllm-cleanup-purge target: Complete purge including data directories
- Essential for declared hosts since 'make destroy' doesn't apply
Testing improvements (vllm-quick-test.sh):
- Detect CONFIG_KDEVOPS_USE_DECLARED_HOSTS for declared hosts mode
- Detect CONFIG_VLLM_BARE_METAL for deployment type
- Read actual hostnames from kdevops_declared_hosts in extra_vars.yaml
- Support comma-separated host lists for multiple declared hosts
- Skip kubectl port-forward setup for bare-metal deployments
- Direct connection to port 8000 for bare-metal API access
- Maintain backward compatibility with provisioned VMs
Makefile additions (workflows/vllm/Makefile):
- Add vllm-cleanup target for basic cleanup
- Add vllm-cleanup-full target for complete cleanup with binaries
- Add vllm-cleanup-purge target for purging all data
- Update help text for new cleanup targets
Dependency fixes (install-deps/debian/main.yml):
- Make kubectl/minikube installation conditional on deployment type
- Skip Kubernetes tools for bare-metal deployments
Example usage for bare metal GPU server:
make defconfig-vllm-declared-hosts DECLARE_HOSTS=gpu-server-01
make
make vllm # Deploy vLLM as systemd service
make vllm-quick-test # Verify API endpoint
make vllm-benchmark # Run performance benchmarks
make vllm-cleanup # Clean up when done
Example usage for existing Kubernetes cluster:
make defconfig-vllm-production-stack-declared-hosts DECLARE_HOSTS=k8s-cluster
make
make vllm # Deploy via Helm
make vllm-status # Check deployment status
make vllm-monitor # Access Grafana/Prometheus
make vllm-cleanup # Clean up namespace
Key architectural decisions:
1. Avoid fragile hostvars access patterns - use configuration variables
that are globally accessible across execution contexts (localhost vs
target nodes)
2. Use include_tasks instead of import_tasks for conditional execution
since import_tasks is static and evaluated at parse time, while
include_tasks is dynamic and respects when conditions
3. Apply tags properly to included tasks using the apply parameter,
otherwise tags only apply to the include statement itself
4. Implement graceful fallbacks for infrastructure dependencies (Docker
mirror → public registry, Kubernetes → bare-metal)
5. Provide cleanup targets for declared hosts since standard 'make destroy'
only applies to provisioned infrastructure
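Decisions 2 and 3 can be illustrated with a minimal sketch (the task file
name here is hypothetical):

```yaml
# import_tasks is static: its tasks are inlined at parse time, so
# "ansible-playbook --tags vllm-deploy" can match tags on the imported
# tasks even when the when: condition would be false at run time.
- name: Deploy (static; problematic with tag selection)
  ansible.builtin.import_tasks: tasks/deploy-example.yml
  when: deployment_type == 'bare-metal'

# include_tasks is dynamic: when: is evaluated at run time, and the
# apply keyword propagates tags to the included tasks instead of only
# tagging the include statement itself.
- name: Deploy (dynamic; respects when and tags)
  ansible.builtin.include_tasks:
    file: tasks/deploy-example.yml
    apply:
      tags: ["vllm-deploy"]
  when: deployment_type == 'bare-metal'
  tags: ["vllm-deploy"]
```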
This implementation mirrors the approach used for MinIO declared hosts
support and enables vLLM testing on any infrastructure where GPUs are
available, whether bare metal servers or existing Kubernetes clusters.
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
defconfigs/vllm-declared-hosts | 53 +++
.../vllm-production-stack-declared-hosts | 66 ++++
.../roles/vllm/tasks/cleanup-bare-metal.yml | 110 +++++++
.../roles/vllm/tasks/deploy-bare-metal.yml | 116 +++++--
.../vllm/tasks/install-deps/debian/main.yml | 4 +-
playbooks/roles/vllm/tasks/main.yml | 311 +++---------------
playbooks/vllm.yml | 1 +
scripts/vllm-quick-test.sh | 58 +++-
workflows/vllm/Makefile | 26 +-
9 files changed, 421 insertions(+), 324 deletions(-)
create mode 100644 defconfigs/vllm-declared-hosts
create mode 100644 defconfigs/vllm-production-stack-declared-hosts
create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
diff --git a/defconfigs/vllm-declared-hosts b/defconfigs/vllm-declared-hosts
new file mode 100644
index 00000000..bd475e9f
--- /dev/null
+++ b/defconfigs/vllm-declared-hosts
@@ -0,0 +1,53 @@
+#
+# vLLM with declared hosts (bare metal or pre-existing infrastructure)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM specific configuration - Using bare metal deployment for declared hosts
+CONFIG_VLLM_BARE_METAL=y
+CONFIG_VLLM_BARE_METAL_USE_CONTAINER=y
+CONFIG_VLLM_BARE_METAL_DOCKER=y
+CONFIG_VLLM_BARE_METAL_SERVICE_NAME="vllm"
+CONFIG_VLLM_BARE_METAL_DATA_DIR="/var/lib/vllm"
+CONFIG_VLLM_BARE_METAL_LOG_DIR="/var/log/vllm"
+
+# GPU configuration for declared hosts
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_TYPE="nvidia-a100"
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_COUNT=1
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
diff --git a/defconfigs/vllm-production-stack-declared-hosts b/defconfigs/vllm-production-stack-declared-hosts
new file mode 100644
index 00000000..b9406807
--- /dev/null
+++ b/defconfigs/vllm-production-stack-declared-hosts
@@ -0,0 +1,66 @@
+#
+# vLLM Production Stack with declared hosts (bare metal with GPU)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM Production Stack with Kubernetes on declared hosts
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_EXISTING=y
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+
+# Production Stack components
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=y
+CONFIG_VLLM_PROD_STACK_MIN_REPLICAS=2
+CONFIG_VLLM_PROD_STACK_MAX_REPLICAS=5
+CONFIG_VLLM_PROD_STACK_TARGET_GPU_UTILIZATION=80
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration for GPU
+CONFIG_VLLM_REPLICA_COUNT=2
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# Router and observability
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
diff --git a/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml b/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
new file mode 100644
index 00000000..ca6d48be
--- /dev/null
+++ b/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
@@ -0,0 +1,110 @@
+---
+# Cleanup tasks for bare metal vLLM deployment
+# Removes all installed components and data
+
+- name: Stop and remove vLLM systemd service
+ ansible.builtin.systemd:
+ name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+ state: stopped
+ enabled: no
+ become: yes
+ ignore_errors: yes
+
+- name: Remove vLLM systemd service file
+ ansible.builtin.file:
+ path: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
+ state: absent
+ become: yes
+
+- name: Reload systemd daemon
+ ansible.builtin.systemd:
+ daemon_reload: yes
+ become: yes
+
+- name: Stop all vLLM Docker containers
+ ansible.builtin.shell:
+ cmd: docker stop $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+ ignore_errors: yes
+ changed_when: false
+
+- name: Remove all vLLM Docker containers
+ ansible.builtin.shell:
+ cmd: docker rm $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+ ignore_errors: yes
+ changed_when: false
+
+- name: Remove vLLM Docker images
+ ansible.builtin.command:
+ cmd: docker rmi {{ vllm_bare_metal_image_final }}
+ ignore_errors: yes
+ changed_when: false
+
+- name: Stop minikube if running
+ ansible.builtin.command:
+ cmd: minikube stop
+ ignore_errors: yes
+ changed_when: false
+ become: no
+
+- name: Delete minikube cluster
+ ansible.builtin.command:
+ cmd: minikube delete
+ ignore_errors: yes
+ changed_when: false
+ become: no
+
+- name: Remove kubectl binary
+ ansible.builtin.file:
+ path: /usr/local/bin/kubectl
+ state: absent
+ become: yes
+ when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove minikube binary
+ ansible.builtin.file:
+ path: /usr/local/bin/minikube
+ state: absent
+ become: yes
+ when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove helm binary
+ ansible.builtin.file:
+ path: /usr/local/bin/helm
+ state: absent
+ become: yes
+ when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove vLLM data directories
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: absent
+ become: yes
+ loop:
+ - "{{ vllm_bare_metal_data_dir | default('/var/lib/vllm') }}"
+ - "{{ vllm_bare_metal_log_dir | default('/var/log/vllm') }}"
+ - "{{ vllm_local_path | default('/data/vllm') }}"
+ - "{{ vllm_results_dir | default('/data/vllm/results') }}"
+ when: vllm_cleanup_remove_data | default(false)
+
+- name: Remove /data/minikube directory
+ ansible.builtin.file:
+ path: /data/minikube
+ state: absent
+ become: yes
+ when: vllm_cleanup_remove_data | default(false)
+
+- name: Display cleanup completion message
+ debug:
+ msg: |
+ vLLM bare metal cleanup completed.
+
+ Removed:
+ - vLLM systemd service
+ - vLLM Docker containers and images
+ - Minikube cluster
+
+ To also remove binaries (kubectl, minikube, helm), run:
+ make vllm-cleanup-full
+
+ To remove all data directories, run:
+ make vllm-cleanup-purge
diff --git a/playbooks/roles/vllm/tasks/deploy-bare-metal.yml b/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
index 0aaea73d..425ffb04 100644
--- a/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
+++ b/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
@@ -47,11 +47,24 @@
set_fact:
container_runtime: "{{ 'docker' if vllm_bare_metal_docker | default(true) else 'podman' }}"
- - name: Ensure container runtime is installed
- package:
- name: "{{ container_runtime }}"
- state: present
+ - name: Ensure Docker service is started and enabled
+ ansible.builtin.systemd:
+ name: docker
+ state: started
+ enabled: yes
become: yes
+ when: container_runtime == 'docker'
+
+ - name: Add current user to docker group
+ ansible.builtin.user:
+ name: "{{ ansible_user_id }}"
+ groups: docker
+ append: yes
+ become: yes
+ when: container_runtime == 'docker'
+
+ - name: Reset connection to apply docker group membership
+ meta: reset_connection
- name: Install nvidia-container-toolkit for GPU support
when: has_nvidia_gpu
@@ -75,27 +88,57 @@
state: restarted
become: yes
- - name: Set vLLM bare metal container image with Docker mirror if enabled
+ - name: Set vLLM bare metal container images
ansible.builtin.set_fact:
- vllm_bare_metal_image_final: >-
- {%- if use_docker_mirror | default(false) | bool -%}
- {%- if not has_nvidia_gpu -%}
- localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
- {%- else -%}
- localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
- {%- endif -%}
+ vllm_bare_metal_image_mirror: >-
+ {%- if not has_nvidia_gpu -%}
+ localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
{%- else -%}
- {%- if not has_nvidia_gpu -%}
- substratusai/vllm:v0.6.3-cpu
- {%- else -%}
- vllm/vllm-openai:latest
- {%- endif -%}
+ localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
{%- endif -%}
+ vllm_bare_metal_image_public: >-
+ {%- if not has_nvidia_gpu -%}
+ substratusai/vllm:v0.6.3-cpu
+ {%- else -%}
+ vllm/vllm-openai:latest
+ {%- endif -%}
+
+ - name: Set initial image to try (mirror if enabled, otherwise public)
+ ansible.builtin.set_fact:
+ vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_mirror if (use_docker_mirror | default(false) | bool) else vllm_bare_metal_image_public }}"
+
+ - name: Check if vLLM container image already exists
+ ansible.builtin.command:
+ cmd: "docker images -q {{ vllm_bare_metal_image_final }}"
+ register: image_exists
+ changed_when: false
+ failed_when: false
- - name: Pull vLLM container image
- community.docker.docker_image:
- name: "{{ vllm_bare_metal_image_final }}"
- source: pull
+ - name: Try pulling from Docker mirror first (if configured)
+ ansible.builtin.command:
+ cmd: "docker pull {{ vllm_bare_metal_image_mirror }}"
+ register: docker_pull_mirror
+ when:
+ - use_docker_mirror | default(false) | bool
+ - image_exists.stdout == ""
+ failed_when: false
+ changed_when: "'Downloaded' in docker_pull_mirror.stdout or 'Pull complete' in docker_pull_mirror.stdout"
+
+ - name: Fall back to public registry if mirror failed
+ ansible.builtin.command:
+ cmd: "docker pull {{ vllm_bare_metal_image_public }}"
+ register: docker_pull_public
+ when:
+ - image_exists.stdout == ""
+ - (not (use_docker_mirror | default(false) | bool)) or (docker_pull_mirror is defined and docker_pull_mirror.rc != 0)
+ changed_when: "'Downloaded' in docker_pull_public.stdout or 'Pull complete' in docker_pull_public.stdout"
+
+ - name: Update final image name if we used public registry
+ ansible.builtin.set_fact:
+ vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_public }}"
+ when:
+ - docker_pull_public is defined
+ - docker_pull_public.rc == 0
- name: Create vLLM systemd service for container
template:
@@ -103,7 +146,7 @@
dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
mode: '0644'
become: yes
- notify: restart vllm
+ register: systemd_service_container
# Direct installation (pip/source)
- name: Deploy vLLM with direct installation
@@ -155,7 +198,13 @@
dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
mode: '0644'
become: yes
- notify: restart vllm
+ register: systemd_service_direct
+
+ - name: Check if vLLM configuration template exists
+ stat:
+ path: "{{ role_path }}/templates/vllm.conf.j2"
+ register: vllm_conf_template
+ delegate_to: localhost
- name: Create vLLM configuration file
template:
@@ -163,13 +212,25 @@
dest: /etc/vllm/vllm.conf
mode: '0644'
become: yes
- notify: restart vllm
+ register: vllm_config
+ when: vllm_conf_template.stat.exists
- name: Reload systemd daemon
systemd:
daemon_reload: yes
become: yes
+ - name: Restart vLLM service if configuration changed
+ systemd:
+ name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+ state: restarted
+ daemon_reload: yes
+ become: yes
+ when: >-
+ (systemd_service_container is defined and systemd_service_container.changed) or
+ (systemd_service_direct is defined and systemd_service_direct.changed) or
+ (vllm_config is defined and vllm_config.changed)
+
- name: Start and enable vLLM service
systemd:
name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
@@ -218,10 +279,3 @@
- Stop: sudo systemctl stop {{ vllm_bare_metal_service_name | default('vllm') }}
- Status: sudo systemctl status {{ vllm_bare_metal_service_name | default('vllm') }}
- Logs: sudo journalctl -u {{ vllm_bare_metal_service_name | default('vllm') }} -f
-
-# Handler for restarting vLLM
-- name: restart vllm
- systemd:
- name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
- state: restarted
- become: yes
diff --git a/playbooks/roles/vllm/tasks/install-deps/debian/main.yml b/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
index 12a8a8e3..a7a82193 100644
--- a/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
+++ b/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
@@ -60,11 +60,11 @@
state: present
tags: ["vllm", "deps"]
-- name: Add kdevops user to docker group
+- name: Add current user to docker group
become: true
become_method: sudo
ansible.builtin.user:
- name: kdevops
+ name: "{{ ansible_user_id }}"
groups: docker
append: yes
tags: ["vllm", "deps", "docker-config"]
diff --git a/playbooks/roles/vllm/tasks/main.yml b/playbooks/roles/vllm/tasks/main.yml
index d6b239f4..d799cf8c 100644
--- a/playbooks/roles/vllm/tasks/main.yml
+++ b/playbooks/roles/vllm/tasks/main.yml
@@ -19,288 +19,37 @@
tags: ["vllm", "deps"]
# Configure Docker and storage to use /data partition BEFORE starting any containers
+# Only needed for Kubernetes-based deployments (docker and production-stack)
- name: Configure Docker to use /data for storage
- ansible.builtin.import_tasks: tasks/configure-docker-data.yml
+ ansible.builtin.include_tasks: tasks/configure-docker-data.yml
+ when: vllm_deployment_type | default('docker') in ['docker', 'production-stack']
tags: ["deps", "docker-config", "storage", "vllm-deploy"]
# Route to appropriate deployment method based on configuration
- name: Deploy vLLM using latest Docker images
- ansible.builtin.import_tasks: tasks/deploy-docker.yml
+ ansible.builtin.include_tasks:
+ file: tasks/deploy-docker.yml
+ apply:
+ tags: ["vllm-deploy"]
when: vllm_deployment_type | default('docker') == 'docker'
tags: ["vllm-deploy"]
- name: Deploy vLLM Production Stack with Helm
- ansible.builtin.import_tasks: tasks/deploy-production-stack.yml
+ ansible.builtin.include_tasks:
+ file: tasks/deploy-production-stack.yml
+ apply:
+ tags: ["vllm-deploy"]
when: vllm_deployment_type | default('docker') == 'production-stack'
tags: ["vllm-deploy"]
- name: Deploy vLLM on bare metal
- ansible.builtin.import_tasks: tasks/deploy-bare-metal.yml
+ ansible.builtin.include_tasks:
+ file: tasks/deploy-bare-metal.yml
+ apply:
+ tags: ["vllm-deploy"]
when: vllm_deployment_type | default('docker') == 'bare-metal'
tags: ["vllm-deploy"]
-# Legacy deployment block - will be moved to deploy-docker.yml
-- name: vLLM deployment tasks (legacy)
- tags: vllm-deploy
- when: vllm_deployment_type | default('docker') != 'production-stack'
- block:
- - name: Ensure Docker service is started and enabled
- ansible.builtin.systemd:
- name: docker
- state: started
- enabled: yes
- become: yes
-
- - name: Add current user to docker group
- ansible.builtin.user:
- name: "{{ ansible_user_id }}"
- groups: docker
- append: yes
- become: yes
-
- - name: Ensure docker socket has correct permissions
- ansible.builtin.file:
- path: /var/run/docker.sock
- mode: '0666'
- become: yes
-
- - name: Reset connection to apply docker group membership
- meta: reset_connection
-
- - name: Wait for Docker to be accessible
- ansible.builtin.wait_for:
- path: /var/run/docker.sock
- state: present
- timeout: 30
-
- - name: Test Docker access
- ansible.builtin.command:
- cmd: docker version
- register: docker_test
- become: no
- failed_when: false
- changed_when: false
- retries: 3
- delay: 2
- until: docker_test.rc == 0
-
- - name: Check if kubectl exists
- ansible.builtin.stat:
- path: /usr/local/bin/kubectl
- register: kubectl_stat
-
- - name: Get latest kubectl version
- when: not kubectl_stat.stat.exists
- ansible.builtin.uri:
- url: https://dl.k8s.io/release/stable.txt
- return_content: yes
- register: kubectl_version
-
- - name: Download kubectl
- when: not kubectl_stat.stat.exists
- ansible.builtin.get_url:
- url: "https://dl.k8s.io/release/{{ kubectl_version.content | trim }}/bin/linux/amd64/kubectl"
- dest: /tmp/kubectl
- mode: '0755'
-
- - name: Install kubectl
- when: not kubectl_stat.stat.exists
- ansible.builtin.copy:
- src: /tmp/kubectl
- dest: /usr/local/bin/kubectl
- mode: '0755'
- remote_src: yes
-
- - name: Check if helm exists
- ansible.builtin.stat:
- path: /usr/local/bin/helm
- register: helm_stat
-
- - name: Download Helm installer script
- when: not helm_stat.stat.exists
- ansible.builtin.get_url:
- url: https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
- dest: /tmp/get-helm-3.sh
- mode: '0755'
-
- - name: Install Helm
- when: not helm_stat.stat.exists
- ansible.builtin.command:
- cmd: /tmp/get-helm-3.sh
- environment:
- HELM_INSTALL_DIR: /usr/local/bin
-
- - name: Check if minikube exists
- when: vllm_k8s_type | default('minikube') == 'minikube'
- ansible.builtin.stat:
- path: /usr/local/bin/minikube
- register: minikube_stat
-
- - name: Download Minikube
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - not minikube_stat.stat.exists
- ansible.builtin.get_url:
- url: https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
- dest: /tmp/minikube-linux-amd64
- mode: '0755'
-
- - name: Install Minikube
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - not minikube_stat.stat.exists
- ansible.builtin.copy:
- src: /tmp/minikube-linux-amd64
- dest: /usr/local/bin/minikube
- mode: '0755'
- remote_src: yes
-
- - name: Get available system memory
- ansible.builtin.command:
- cmd: free -m
- register: memory_info
- changed_when: false
-
- - name: Calculate minikube memory allocation
- set_fact:
- minikube_memory_mb: >-
- {%- set total_mem = memory_info.stdout_lines[1].split()[1] | int -%}
- {%- set requested_mem = (vllm_request_memory | default('16Gi') | regex_replace('Gi', '') | int) * 1024 -%}
- {%- set available_mem = (total_mem * 0.8) | int -%}
- {{ [requested_mem, available_mem, 3072] | min }}
-
- - name: Calculate minikube CPU allocation
- set_fact:
- minikube_cpus: >-
- {%- set requested_cpus = vllm_request_cpu | default(4) | int -%}
- {%- set available_cpus = ansible_processor_vcpus | default(4) | int -%}
- {{ [requested_cpus, available_cpus] | min }}
-
- - name: Check if Minikube is already running
- when: vllm_k8s_type | default('minikube') == 'minikube'
- become: no
- ansible.builtin.command:
- cmd: minikube status --format={{ "{{.Host}}" }}
- register: minikube_status
- changed_when: false
- failed_when: false
-
- - name: Ensure Minikube directory permissions
- when: vllm_k8s_type | default('minikube') == 'minikube'
- ansible.builtin.file:
- path: /data/minikube
- state: directory
- owner: "kdevops"
- group: "kdevops"
- mode: '0755'
- recurse: yes
- become: yes
-
- - name: Display minikube start parameters
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - minikube_status.stdout != 'Running'
- debug:
- msg: "Starting minikube with {{ minikube_cpus }} CPUs, {{ minikube_memory_mb }}MB RAM, 50GB disk. This may take 5-10 minutes on first run..."
-
- - name: Start Minikube cluster
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - minikube_status.stdout != 'Running'
- become: no
- ansible.builtin.command:
- cmd: minikube start --driver=docker --cpus={{ minikube_cpus }} --memory={{ minikube_memory_mb }} --disk-size=50g --insecure-registry="{{ ansible_default_ipv4.gateway }}:5000"
- environment:
- MINIKUBE_HOME: /data/minikube
- register: minikube_start
- changed_when: "'Done!' in minikube_start.stdout"
- async: 600 # Allow up to 10 minutes
- poll: 30 # Check every 30 seconds
-
- - name: Enable GPU support in Minikube (if available)
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - not (vllm_use_cpu_inference | default(false))
- - vllm_request_gpu | default(1) | int > 0
- become: no
- ansible.builtin.command:
- cmd: minikube addons enable nvidia-gpu-device-plugin
- ignore_errors: yes
-
- - name: Disable GPU support in Minikube for CPU inference
- when:
- - vllm_k8s_type | default('minikube') == 'minikube'
- - vllm_use_cpu_inference | default(false)
- - minikube_status.stdout == 'Running'
- become: no
- ansible.builtin.command:
- cmd: minikube addons disable nvidia-gpu-device-plugin
- ignore_errors: yes
-
- - name: Clone vLLM production stack repository
- git:
- repo: "{{ vllm_production_stack_repo }}"
- dest: "{{ vllm_local_path }}/production-stack-repo"
- version: "{{ vllm_production_stack_version }}"
- update: yes
- force: yes
- when: false # Not needed for production-stack deployment type which uses Helm
-
- - name: Create results directory
- file:
- path: "{{ vllm_results_dir }}"
- state: directory
- mode: '0755'
-
- - name: Generate vLLM deployment manifest
- template:
- src: vllm-deployment.yaml.j2
- dest: "{{ vllm_local_path }}/vllm-deployment.yaml"
- mode: '0644'
- when: vllm_deployment_type != "production-stack"
-
- - name: Deploy vLLM using kubectl
- become: no
- ansible.builtin.command:
- cmd: kubectl apply -f {{ vllm_local_path }}/vllm-deployment.yaml
- register: kubectl_apply
- changed_when: "'created' in kubectl_apply.stdout or 'configured' in kubectl_apply.stdout"
- when: vllm_deployment_type != "production-stack"
-
- - name: Wait for vLLM pods to be ready
- become: no
- kubernetes.core.k8s_info:
- api_version: v1
- kind: Pod
- namespace: "{{ vllm_helm_namespace | default('vllm-system') }}"
- label_selectors:
- - app=vllm-server
- register: pod_list
- until: pod_list.resources | length > 0 and pod_list.resources | selectattr('status.phase', 'equalto', 'Running') | list | length == pod_list.resources | length
- retries: 30
- delay: 10
- when: vllm_deployment_type != "production-stack"
-
- - name: Get vLLM service endpoint
- become: no
- kubernetes.core.k8s_info:
- api_version: v1
- kind: Service
- namespace: "{{ vllm_helm_namespace | default('vllm-system') }}"
- name: vllm-service
- register: vllm_service
-
- - name: Display vLLM endpoint information
- debug:
- msg: |
- vLLM deployed successfully!
- {% if vllm_k8s_type | default('minikube') == 'minikube' %}
- To access the API, run: kubectl port-forward -n {{ vllm_helm_namespace | default('vllm-system') }} svc/vllm-service {{ vllm_api_port | default(8000) }}:8000
- Then access: http://localhost:{{ vllm_api_port | default(8000) }}/v1/models
- {% else %}
- API endpoint: {{ vllm_service.resources[0].status.loadBalancer.ingress[0].ip | default('pending') }}:{{ vllm_api_port | default(8000) }}
- {% endif %}
-
- name: vLLM benchmark tasks
tags: vllm-benchmark
when: vllm_benchmark_enabled | default(true)
@@ -321,7 +70,9 @@
group: "{{ ansible_user | default('ubuntu') }}"
- name: Set up port forwarding for benchmarking
- when: vllm_k8s_type | default('minikube') == 'minikube'
+ when:
+ - vllm_deployment_type | default('docker') != 'bare-metal'
+ - vllm_k8s_type | default('minikube') == 'minikube'
become: no
ansible.builtin.command:
cmd: kubectl port-forward -n {{ vllm_helm_namespace | default('vllm-system') }} svc/vllm-service {{ vllm_api_port | default(8000) }}:8000
@@ -330,7 +81,9 @@
register: port_forward_task
- name: Wait for port forwarding to be ready
- when: vllm_k8s_type | default('minikube') == 'minikube'
+ when:
+ - vllm_deployment_type | default('docker') != 'bare-metal'
+ - vllm_k8s_type | default('minikube') == 'minikube'
ansible.builtin.wait_for:
port: "{{ vllm_api_port | default(8000) }}"
host: localhost
@@ -347,6 +100,7 @@
- name: Stop port forwarding
when:
+ - vllm_deployment_type | default('docker') != 'bare-metal'
- vllm_k8s_type | default('minikube') == 'minikube'
- port_forward_task is defined
become: no
@@ -357,6 +111,7 @@
- name: Kill port forwarding if still running
when:
+ - vllm_deployment_type | default('docker') != 'bare-metal'
- vllm_k8s_type | default('minikube') == 'minikube'
- port_forward_task is defined
- job_result.finished is defined
@@ -412,7 +167,9 @@
- name: vLLM monitoring tasks
tags: vllm-monitor
- when: vllm_observability_enabled | default(true)
+ when:
+ - vllm_observability_enabled | default(true)
+ - vllm_deployment_type | default('docker') != 'bare-metal'
block:
- name: Get Grafana service information
become: no
@@ -444,8 +201,22 @@
Prometheus: http://{{ prometheus_service.resources[0].status.loadBalancer.ingress[0].ip | default('pending') }}:{{ vllm_prometheus_port | default(9090) }}
{% endif %}
-- name: vLLM cleanup tasks
+- name: vLLM bare-metal monitoring info
+ tags: vllm-monitor
+ when: vllm_deployment_type | default('docker') == 'bare-metal'
+ debug:
+ msg: |
+ Bare-metal deployment does not include Grafana/Prometheus monitoring stack.
+ vLLM service logs available via: sudo journalctl -u {{ vllm_bare_metal_service_name | default('vllm') }} -f
+
+- name: vLLM cleanup for bare metal
+ ansible.builtin.include_tasks: tasks/cleanup-bare-metal.yml
+ when: vllm_deployment_type | default('docker') == 'bare-metal'
+ tags: ["vllm-cleanup"]
+
+- name: vLLM cleanup tasks (Kubernetes)
tags: vllm-cleanup
+ when: vllm_deployment_type | default('docker') != 'bare-metal'
block:
- name: Delete all resources in vLLM namespace
become: no
diff --git a/playbooks/vllm.yml b/playbooks/vllm.yml
index 2aad56a8..51151afe 100644
--- a/playbooks/vllm.yml
+++ b/playbooks/vllm.yml
@@ -8,4 +8,5 @@
roles:
- role: create_data_partition
tags: ["data_partition"]
+ when: data_device is defined and data_device != None and data_device | length > 0
- role: vllm
diff --git a/scripts/vllm-quick-test.sh b/scripts/vllm-quick-test.sh
index c68de2c8..30ecf355 100755
--- a/scripts/vllm-quick-test.sh
+++ b/scripts/vllm-quick-test.sh
@@ -27,22 +27,41 @@ fi
# Check if baseline and dev are enabled
BASELINE_AND_DEV=$(grep "^CONFIG_KDEVOPS_BASELINE_AND_DEV=y" "${TOPDIR}/.config" || true)
+# Check if using declared hosts (bare metal or existing infrastructure)
+USE_DECLARED_HOSTS=$(grep "^CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y" "${TOPDIR}/.config" || true)
+
+# Check deployment type
+VLLM_BARE_METAL=$(grep "^CONFIG_VLLM_BARE_METAL=y" "${TOPDIR}/.config" || true)
+
# Get node names from extra_vars.yaml
if [[ ! -f "${TOPDIR}/extra_vars.yaml" ]]; then
echo -e "${RED}Error: extra_vars.yaml not found. Run 'make' first.${NC}"
exit 1
fi
-KDEVOPS_HOST_PREFIX=$(grep "^kdevops_host_prefix:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"')
-if [[ -z "$KDEVOPS_HOST_PREFIX" ]]; then
- echo -e "${RED}Error: Could not determine host prefix from extra_vars.yaml${NC}"
- exit 1
-fi
+# Determine nodes to test based on deployment type
+if [[ -n "$USE_DECLARED_HOSTS" ]]; then
+ # For declared hosts, get the actual hostnames from extra_vars.yaml
+ DECLARED_HOSTS=$(grep "^kdevops_declared_hosts:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"')
+ if [[ -z "$DECLARED_HOSTS" ]]; then
+ echo -e "${RED}Error: Declared hosts enabled but no hosts specified in extra_vars.yaml${NC}"
+ exit 1
+ fi
+ # Split comma-separated hosts into array
+ IFS=',' read -ra NODES <<< "$DECLARED_HOSTS"
+else
+ # For provisioned VMs, use the host prefix
+ KDEVOPS_HOST_PREFIX=$(grep "^kdevops_host_prefix:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"')
+ if [[ -z "$KDEVOPS_HOST_PREFIX" ]]; then
+ echo -e "${RED}Error: Could not determine host prefix from extra_vars.yaml${NC}"
+ exit 1
+ fi
-# Determine nodes to test
-NODES=("${KDEVOPS_HOST_PREFIX}-vllm")
-if [[ -n "$BASELINE_AND_DEV" ]]; then
- NODES+=("${KDEVOPS_HOST_PREFIX}-vllm-dev")
+ # Determine nodes to test
+ NODES=("${KDEVOPS_HOST_PREFIX}-vllm")
+ if [[ -n "$BASELINE_AND_DEV" ]]; then
+ NODES+=("${KDEVOPS_HOST_PREFIX}-vllm-dev")
+ fi
fi
# Function to test a single node
@@ -65,15 +84,20 @@ test_node() {
echo "Node IP: ${node_ip}"
- # Check if port-forward is running
- local pf_running=$(ssh "${node}" "ps aux | grep 'kubectl port-forward' | grep 8000 | grep -v grep" 2>/dev/null || true)
-
- if [[ -z "$pf_running" ]]; then
- echo "Starting kubectl port-forward..."
- ssh "${node}" "sudo nohup kubectl --kubeconfig=/root/.kube/config port-forward -n vllm-system svc/vllm-prod-${node}-router-service 8000:80 --address=0.0.0.0 > /tmp/pf.log 2>&1 &" 2>/dev/null || true
- sleep 2
+ # Only setup kubectl port-forward for Kubernetes deployments
+ if [[ -z "$VLLM_BARE_METAL" ]]; then
+ # Check if port-forward is running
+ local pf_running=$(ssh "${node}" "ps aux | grep 'kubectl port-forward' | grep 8000 | grep -v grep" 2>/dev/null || true)
+
+ if [[ -z "$pf_running" ]]; then
+ echo "Starting kubectl port-forward..."
+ ssh "${node}" "sudo nohup kubectl --kubeconfig=/root/.kube/config port-forward -n vllm-system svc/vllm-prod-${node}-router-service 8000:80 --address=0.0.0.0 > /tmp/pf.log 2>&1 &" 2>/dev/null || true
+ sleep 2
+ else
+ echo "kubectl port-forward already running"
+ fi
else
- echo "kubectl port-forward already running"
+ echo "Deployment type: Bare metal (direct connection to port 8000)"
fi
# Test the endpoint with timing
diff --git a/workflows/vllm/Makefile b/workflows/vllm/Makefile
index 91966b28..caed2e8a 100644
--- a/workflows/vllm/Makefile
+++ b/workflows/vllm/Makefile
@@ -44,6 +44,22 @@ vllm-cleanup:
--tags vars,vllm-cleanup \
--extra-vars=@./extra_vars.yaml
+vllm-cleanup-full:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ --limit 'baseline:dev' \
+ playbooks/vllm.yml \
+ --tags vars,vllm-cleanup \
+ --extra-vars=@./extra_vars.yaml \
+ --extra-vars='{"vllm_cleanup_remove_binaries": true}'
+
+vllm-cleanup-purge:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ --limit 'baseline:dev' \
+ playbooks/vllm.yml \
+ --tags vars,vllm-cleanup \
+ --extra-vars=@./extra_vars.yaml \
+ --extra-vars='{"vllm_cleanup_remove_binaries": true, "vllm_cleanup_remove_data": true}'
+
vllm-results:
$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
--limit 'baseline:dev' \
@@ -102,17 +118,19 @@ vllm-quick-test:
vllm-help-menu:
@echo "vLLM Production Stack options:"
- @echo "vllm - Deploy vLLM stack to Kubernetes"
- @echo "vllm-deploy - Deploy vLLM stack to Kubernetes (same as vllm)"
+ @echo "vllm - Deploy vLLM stack"
+ @echo "vllm-deploy - Deploy vLLM stack (same as vllm)"
@echo "vllm-benchmark - Run performance benchmarks and collect results"
@echo "vllm-monitor - Display monitoring dashboard URLs"
@echo "vllm-status - Check detailed deployment status (verbose)"
@echo "vllm-status-simplified - Check deployment status (clean summary)"
@echo "vllm-quick-test - Quick API test (baseline + dev if enabled)"
@echo "vllm-teardown - Gracefully remove vLLM deployment"
- @echo "vllm-cleanup - Force delete all vLLM resources (use when stuck)"
+ @echo "vllm-cleanup - Remove vLLM containers/services (keep binaries & data)"
+ @echo "vllm-cleanup-full - Remove everything including binaries (kubectl, helm, minikube)"
+ @echo "vllm-cleanup-purge - PURGE ALL: Remove binaries + all data directories"
@echo "vllm-results - Collect and visualize benchmark results"
@echo "vllm-visualize-results - Generate HTML visualization of benchmark results"
@echo ""
-.PHONY: vllm vllm-deploy vllm-benchmark vllm-monitor vllm-status vllm-status-simplified vllm-quick-test vllm-teardown vllm-cleanup vllm-results vllm-visualize-results vllm-help-menu
+.PHONY: vllm vllm-deploy vllm-benchmark vllm-monitor vllm-status vllm-status-simplified vllm-quick-test vllm-teardown vllm-cleanup vllm-cleanup-full vllm-cleanup-purge vllm-results vllm-visualize-results vllm-help-menu
--
2.51.0