From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, kdevops@lists.linux.dev
Cc: Devasena Inupakutika, Dongjoo Seo, Joel Fernandes, Luis Chamberlain
Subject: [PATCH v2 2/4] vllm: Add DECLARE_HOSTS support for bare metal and existing infrastructure
Date: Sat, 4 Oct 2025 09:38:12 -0700
Message-ID: <20251004163816.3303237-3-mcgrof@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251004163816.3303237-1-mcgrof@kernel.org>
References: <20251004163816.3303237-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain

Following the pattern established by the MinIO workflow in commit
533be4c716d1 ("minio: add MinIO Warp S3 benchmarking with declared hosts
support"), add DECLARE_HOSTS support to the vLLM workflow to enable
testing on pre-existing infrastructure, including bare metal servers
with GPUs. This lets users leverage existing GPU infrastructure without
requiring kdevops to provision new systems. Two new defconfigs are
provided for different deployment scenarios.

New defconfigs:

1. defconfig-vllm-declared-hosts
   - Bare metal deployment using Docker containers
   - Targets single-node GPU servers
   - Uses systemd service management for vLLM
   - Configurable GPU type (nvidia-a100, etc.) and count
   - Direct port 8000 access without Kubernetes overhead
   - Suitable for direct hardware access scenarios

2. defconfig-vllm-production-stack-declared-hosts
   - Production Stack deployment on existing Kubernetes clusters
   - Full Production Stack with router and monitoring
   - Autoscaling support (2-5 replicas)
   - Grafana/Prometheus observability stack
   - Suitable for production GPU clusters

Both configurations automatically:
- Set CONFIG_SKIP_BRINGUP=y to skip infrastructure provisioning
- Set CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y for pre-existing systems
- Enable benchmarking for performance validation
- Support HuggingFace model deployment (default: facebook/opt-125m)

Implementation changes:

Bare-metal deployment improvements (deploy-bare-metal.yml):
- Remove legacy kubectl/minikube/helm installation for bare-metal
- Implement Docker image mirror fallback (try mirror, fall back to public)
- Replace handler-based service restart with direct systemd management
- Make config file creation optional (template may not exist)
- Support both Docker and Podman container runtimes
- Automatic GPU detection and container image selection (CPU vs GPU)

Main task routing (main.yml):
- Remove the 260+ line legacy deployment block that was causing unwanted
  Kubernetes installation even with "when: false", due to tags override
- Change configure-docker-data.yml to only run for Kubernetes deployments
- Convert deployment method routing from import_tasks to include_tasks
  with the apply parameter to properly respect when conditions and tags
- Add bare-metal specific conditions to benchmark tasks (skip kubectl
  port-forward, connect directly to port 8000)
- Add bare-metal specific conditions to monitoring tasks (skip Kubernetes
  service queries, show systemd journal instructions instead)
- Add a new bare-metal monitoring info task with journalctl instructions

Cleanup support (cleanup-bare-metal.yml - NEW):
- Add vllm-cleanup target: remove containers and systemd services
- Add vllm-cleanup-full target: also remove kubectl/helm/minikube binaries
- Add vllm-cleanup-purge target: complete purge including data directories
- Essential for declared hosts since 'make destroy' doesn't apply

Testing improvements (vllm-quick-test.sh):
- Detect CONFIG_KDEVOPS_USE_DECLARED_HOSTS for declared hosts mode
- Detect CONFIG_VLLM_BARE_METAL for deployment type
- Read actual hostnames from kdevops_declared_hosts in extra_vars.yaml
- Support comma-separated host lists for multiple declared hosts
- Skip kubectl port-forward setup for bare-metal deployments
- Direct connection to port 8000 for bare-metal API access
- Maintain backward compatibility with provisioned VMs

Makefile additions (workflows/vllm/Makefile):
- Add vllm-cleanup target for basic cleanup
- Add vllm-cleanup-full target for complete cleanup with binaries
- Add vllm-cleanup-purge target for purging all data
- Update help text for new cleanup targets

Dependency fixes (install-deps/debian/main.yml):
- Make kubectl/minikube installation conditional on deployment type
- Skip Kubernetes tools for bare-metal deployments

Example usage for a bare metal GPU server:

  make defconfig-vllm-declared-hosts DECLARE_HOSTS=gpu-server-01
  make
  make vllm             # Deploy vLLM as systemd service
  make vllm-quick-test  # Verify API endpoint
  make vllm-benchmark   # Run performance benchmarks
  make vllm-cleanup     # Clean up when done

Example usage for an existing Kubernetes cluster:

  make defconfig-vllm-production-stack-declared-hosts DECLARE_HOSTS=k8s-cluster
  make
  make vllm             # Deploy via Helm
  make vllm-status      # Check deployment status
  make vllm-monitor     # Access Grafana/Prometheus
  make vllm-cleanup     # Clean up namespace

Key architectural decisions:

1. Avoid fragile hostvars access patterns - use configuration variables
   that are globally accessible across execution contexts (localhost vs
   target nodes)
2. Use include_tasks instead of import_tasks for conditional execution,
   since import_tasks is static and evaluated at parse time, while
   include_tasks is dynamic and respects when conditions
3. Apply tags properly to included tasks using the apply parameter;
   otherwise tags only apply to the include statement itself
4. Implement graceful fallbacks for infrastructure dependencies (Docker
   mirror → public registry, Kubernetes → bare-metal)
5. Provide cleanup targets for declared hosts, since the standard
   'make destroy' only applies to provisioned infrastructure

This implementation mirrors the approach used for MinIO declared hosts
support and enables vLLM testing on any infrastructure where GPUs are
available, whether bare metal servers or existing Kubernetes clusters.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain
---
 defconfigs/vllm-declared-hosts                   |  53 +++
 .../vllm-production-stack-declared-hosts         |  66 ++++
 .../roles/vllm/tasks/cleanup-bare-metal.yml      | 110 +++++++
 .../roles/vllm/tasks/deploy-bare-metal.yml       | 116 +++++--
 .../vllm/tasks/install-deps/debian/main.yml      |   4 +-
 playbooks/roles/vllm/tasks/main.yml              | 311 +++-----
 playbooks/vllm.yml                               |   1 +
 scripts/vllm-quick-test.sh                       |  58 +++-
 workflows/vllm/Makefile                          |  26 +-
 9 files changed, 421 insertions(+), 324 deletions(-)
 create mode 100644 defconfigs/vllm-declared-hosts
 create mode 100644 defconfigs/vllm-production-stack-declared-hosts
 create mode 100644 playbooks/roles/vllm/tasks/cleanup-bare-metal.yml

diff --git a/defconfigs/vllm-declared-hosts b/defconfigs/vllm-declared-hosts
new file mode 100644
index 00000000..bd475e9f
--- /dev/null
+++ b/defconfigs/vllm-declared-hosts
@@ -0,0 +1,53 @@
+#
+# vLLM with declared hosts (bare metal or pre-existing infrastructure)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM specific configuration - Using bare metal deployment for declared hosts
+CONFIG_VLLM_BARE_METAL=y
+CONFIG_VLLM_BARE_METAL_USE_CONTAINER=y
+CONFIG_VLLM_BARE_METAL_DOCKER=y
+CONFIG_VLLM_BARE_METAL_SERVICE_NAME="vllm"
+CONFIG_VLLM_BARE_METAL_DATA_DIR="/var/lib/vllm"
+CONFIG_VLLM_BARE_METAL_LOG_DIR="/var/log/vllm"
+
+# GPU configuration for declared hosts
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_TYPE="nvidia-a100"
+CONFIG_VLLM_BARE_METAL_DECLARE_HOST_GPU_COUNT=1
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
diff --git a/defconfigs/vllm-production-stack-declared-hosts b/defconfigs/vllm-production-stack-declared-hosts
new file mode 100644
index 00000000..b9406807
--- /dev/null
+++ b/defconfigs/vllm-production-stack-declared-hosts
@@ -0,0 +1,66 @@
+#
+# vLLM Production Stack with declared hosts (bare metal with GPU)
+#
+# Automatically generated file; DO NOT EDIT.
+# kdevops 5.0.2 Configuration
+#
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# Skip bringup for declared hosts
+CONFIG_SKIP_BRINGUP=y
+CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y
+
+# vLLM Production Stack with Kubernetes on declared hosts
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_EXISTING=y
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+
+# Production Stack components
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=y
+CONFIG_VLLM_PROD_STACK_MIN_REPLICAS=2
+CONFIG_VLLM_PROD_STACK_MAX_REPLICAS=5
+CONFIG_VLLM_PROD_STACK_TARGET_GPU_UTILIZATION=80
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# Engine configuration for GPU
+CONFIG_VLLM_REPLICA_COUNT=2
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_MAX_MODEL_LEN=2048
+CONFIG_VLLM_DTYPE="auto"
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.9"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# Router and observability
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
diff --git a/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml b/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
new file mode 100644
index 00000000..ca6d48be
--- /dev/null
+++ b/playbooks/roles/vllm/tasks/cleanup-bare-metal.yml
@@ -0,0 +1,110 @@
+---
+# Cleanup tasks for bare metal vLLM deployment
+# Removes all installed components and data
+
+- name: Stop and remove vLLM systemd service
+  ansible.builtin.systemd:
+    name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+    state: stopped
+    enabled: no
+  become: yes
+  ignore_errors: yes
+
+- name: Remove vLLM systemd service file
+  ansible.builtin.file:
+    path: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
+    state: absent
+  become: yes
+
+- name: Reload systemd daemon
+  ansible.builtin.systemd:
+    daemon_reload: yes
+  become: yes
+
+- name: Stop all vLLM Docker containers
+  ansible.builtin.command:
+    cmd: docker stop $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+  ignore_errors: yes
+  changed_when: false
+
+- name: Remove all vLLM Docker containers
+  ansible.builtin.command:
+    cmd: docker rm $(docker ps -a -q --filter ancestor={{ vllm_bare_metal_image_final }})
+  ignore_errors: yes
+  changed_when: false
+
+- name: Remove vLLM Docker images
+  ansible.builtin.command:
+    cmd: docker rmi {{ vllm_bare_metal_image_final }}
+  ignore_errors: yes
+  changed_when: false
+
+- name: Stop minikube if running
+  ansible.builtin.command:
+    cmd: minikube stop
+  ignore_errors: yes
+  changed_when: false
+  become: no
+
+- name: Delete minikube cluster
+  ansible.builtin.command:
+    cmd: minikube delete
+  ignore_errors: yes
+  changed_when: false
+  become: no
+
+- name: Remove kubectl binary
+  ansible.builtin.file:
+    path: /usr/local/bin/kubectl
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove minikube binary
+  ansible.builtin.file:
+    path: /usr/local/bin/minikube
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove helm binary
+  ansible.builtin.file:
+    path: /usr/local/bin/helm
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_binaries | default(false)
+
+- name: Remove vLLM data directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: absent
+  become: yes
+  loop:
+    - "{{ vllm_bare_metal_data_dir | default('/var/lib/vllm') }}"
+    - "{{ vllm_bare_metal_log_dir | default('/var/log/vllm') }}"
+    - "{{ vllm_local_path | default('/data/vllm') }}"
+    - "{{ vllm_results_dir | default('/data/vllm/results') }}"
+  when: vllm_cleanup_remove_data | default(false)
+
+- name: Remove /data/minikube directory
+  ansible.builtin.file:
+    path: /data/minikube
+    state: absent
+  become: yes
+  when: vllm_cleanup_remove_data | default(false)
+
+- name: Display cleanup completion message
+  debug:
+    msg: |
+      vLLM bare metal cleanup completed.
+
+      Removed:
+      - vLLM systemd service
+      - vLLM Docker containers and images
+      - Minikube cluster
+
+      To also remove binaries (kubectl, minikube, helm), run:
+      make vllm-cleanup-full
+
+      To remove all data directories, run:
+      make vllm-cleanup-purge
diff --git a/playbooks/roles/vllm/tasks/deploy-bare-metal.yml b/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
index 0aaea73d..425ffb04 100644
--- a/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
+++ b/playbooks/roles/vllm/tasks/deploy-bare-metal.yml
@@ -47,11 +47,24 @@
     set_fact:
       container_runtime: "{{ 'docker' if vllm_bare_metal_docker | default(true) else 'podman' }}"

-  - name: Ensure container runtime is installed
-    package:
-      name: "{{ container_runtime }}"
-      state: present
+  - name: Ensure Docker service is started and enabled
+    ansible.builtin.systemd:
+      name: docker
+      state: started
+      enabled: yes
     become: yes
+    when: container_runtime == 'docker'
+
+  - name: Add current user to docker group
+    ansible.builtin.user:
+      name: "{{ ansible_user_id }}"
+      groups: docker
+      append: yes
+    become: yes
+    when: container_runtime == 'docker'
+
+  - name: Reset connection to apply docker group membership
+    meta: reset_connection

   - name: Install nvidia-container-toolkit for GPU support
     when: has_nvidia_gpu
@@ -75,27 +88,57 @@
         state: restarted
       become: yes

-  - name: Set vLLM bare metal container image with Docker mirror if enabled
+  - name: Set vLLM bare metal container images
     ansible.builtin.set_fact:
-      vllm_bare_metal_image_final: >-
-        {%- if use_docker_mirror | default(false) | bool -%}
-        {%- if not has_nvidia_gpu -%}
-        localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
-        {%- else -%}
-        localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
-        {%- endif -%}
+      vllm_bare_metal_image_mirror: >-
+        {%- if not has_nvidia_gpu -%}
+        localhost:{{ docker_mirror_port | default(5000) }}/vllm:v0.6.3-cpu
         {%- else -%}
-        {%- if not has_nvidia_gpu -%}
-        substratusai/vllm:v0.6.3-cpu
-        {%- else -%}
-        vllm/vllm-openai:latest
-        {%- endif -%}
+        localhost:{{ docker_mirror_port | default(5000) }}/vllm-openai:latest
         {%- endif -%}
+      vllm_bare_metal_image_public: >-
+        {%- if not has_nvidia_gpu -%}
+        substratusai/vllm:v0.6.3-cpu
+        {%- else -%}
+        vllm/vllm-openai:latest
+        {%- endif -%}
+
+  - name: Set initial image to try (mirror if enabled, otherwise public)
+    ansible.builtin.set_fact:
+      vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_mirror if (use_docker_mirror | default(false) | bool) else vllm_bare_metal_image_public }}"
+
+  - name: Check if vLLM container image already exists
+    ansible.builtin.command:
+      cmd: "docker images -q {{ vllm_bare_metal_image_final }}"
+    register: image_exists
+    changed_when: false
+    failed_when: false

-  - name: Pull vLLM container image
-    community.docker.docker_image:
-      name: "{{ vllm_bare_metal_image_final }}"
-      source: pull
+  - name: Try pulling from Docker mirror first (if configured)
+    ansible.builtin.command:
+      cmd: "docker pull {{ vllm_bare_metal_image_mirror }}"
+    register: docker_pull_mirror
+    when:
+      - use_docker_mirror | default(false) | bool
+      - image_exists.stdout == ""
+    failed_when: false
+    changed_when: "'Downloaded' in docker_pull_mirror.stdout or 'Pull complete' in docker_pull_mirror.stdout"
+
+  - name: Fall back to public registry if mirror failed
+    ansible.builtin.command:
+      cmd: "docker pull {{ vllm_bare_metal_image_public }}"
+    register: docker_pull_public
+    when:
+      - image_exists.stdout == ""
+      - (not (use_docker_mirror | default(false) | bool)) or (docker_pull_mirror is defined and docker_pull_mirror.rc != 0)
+    changed_when: "'Downloaded' in docker_pull_public.stdout or 'Pull complete' in docker_pull_public.stdout"
+
+  - name: Update final image name if we used public registry
+    ansible.builtin.set_fact:
+      vllm_bare_metal_image_final: "{{ vllm_bare_metal_image_public }}"
+    when:
+      - docker_pull_public is defined
+      - docker_pull_public.rc == 0

   - name: Create vLLM systemd service for container
     template:
@@ -103,7 +146,7 @@
       dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
       mode: '0644'
     become: yes
-    notify: restart vllm
+    register: systemd_service_container

 # Direct installation (pip/source)
 - name: Deploy vLLM with direct installation
@@ -155,7 +198,13 @@
       dest: "/etc/systemd/system/{{ vllm_bare_metal_service_name | default('vllm') }}.service"
       mode: '0644'
     become: yes
-    notify: restart vllm
+    register: systemd_service_direct
+
+  - name: Check if vLLM configuration template exists
+    stat:
+      path: "{{ role_path }}/templates/vllm.conf.j2"
+    register: vllm_conf_template
+    delegate_to: localhost

   - name: Create vLLM configuration file
     template:
@@ -163,13 +212,25 @@
       dest: /etc/vllm/vllm.conf
       mode: '0644'
     become: yes
-    notify: restart vllm
+    register: vllm_config
+    when: vllm_conf_template.stat.exists

   - name: Reload systemd daemon
     systemd:
       daemon_reload: yes
     become: yes

+  - name: Restart vLLM service if configuration changed
+    systemd:
+      name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
+      state: restarted
+      daemon_reload: yes
+    become: yes
+    when: >-
+      (systemd_service_container is defined and systemd_service_container.changed) or
+      (systemd_service_direct is defined and systemd_service_direct.changed) or
+      (vllm_config is defined and vllm_config.changed)
+
   - name: Start and enable vLLM service
     systemd:
       name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
@@ -218,10 +279,3 @@
       - Stop: sudo systemctl stop {{ vllm_bare_metal_service_name | default('vllm') }}
      - Status: sudo systemctl status {{ vllm_bare_metal_service_name | default('vllm') }}
      - Logs: sudo journalctl -u {{ vllm_bare_metal_service_name | default('vllm') }} -f
-
-# Handler for restarting vLLM
-- name: restart vllm
-  systemd:
-    name: "{{ vllm_bare_metal_service_name | default('vllm') }}"
-    state: restarted
-  become: yes
diff --git a/playbooks/roles/vllm/tasks/install-deps/debian/main.yml b/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
index 12a8a8e3..a7a82193 100644
--- a/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
+++ b/playbooks/roles/vllm/tasks/install-deps/debian/main.yml
@@ -60,11 +60,11 @@
     state: present
   tags: ["vllm", "deps"]

-- name: Add kdevops user to docker group
+- name: Add current user to docker group
   become: true
   become_method: sudo
   ansible.builtin.user:
-    name: kdevops
+    name: "{{ ansible_user_id }}"
     groups: docker
     append: yes
   tags: ["vllm", "deps", "docker-config"]
diff --git a/playbooks/roles/vllm/tasks/main.yml b/playbooks/roles/vllm/tasks/main.yml
index d6b239f4..d799cf8c 100644
--- a/playbooks/roles/vllm/tasks/main.yml
+++ b/playbooks/roles/vllm/tasks/main.yml
@@ -19,288 +19,37 @@
   tags: ["vllm", "deps"]

 # Configure Docker and storage to use /data partition BEFORE starting any containers
+# Only needed for Kubernetes-based deployments (docker and production-stack)
 - name: Configure Docker to use /data for storage
-  ansible.builtin.import_tasks: tasks/configure-docker-data.yml
+  ansible.builtin.include_tasks: tasks/configure-docker-data.yml
+  when: vllm_deployment_type | default('docker') in ['docker', 'production-stack']
   tags: ["deps", "docker-config", "storage", "vllm-deploy"]

 # Route to appropriate deployment method based on configuration
 - name: Deploy vLLM using latest Docker images
-  ansible.builtin.import_tasks: tasks/deploy-docker.yml
+  ansible.builtin.include_tasks:
+    file: tasks/deploy-docker.yml
+    apply:
+      tags: ["vllm-deploy"]
   when: vllm_deployment_type | default('docker') == 'docker'
   tags: ["vllm-deploy"]

 - name: Deploy vLLM Production Stack with Helm
-  ansible.builtin.import_tasks: tasks/deploy-production-stack.yml
+  ansible.builtin.include_tasks:
+    file: tasks/deploy-production-stack.yml
+    apply:
+      tags: ["vllm-deploy"]
   when: vllm_deployment_type | default('docker') == 'production-stack'
   tags: ["vllm-deploy"]

 - name: Deploy vLLM on bare metal
-  ansible.builtin.import_tasks: tasks/deploy-bare-metal.yml
+  ansible.builtin.include_tasks:
+    file: tasks/deploy-bare-metal.yml
+    apply:
+      tags: ["vllm-deploy"]
   when: vllm_deployment_type | default('docker') == 'bare-metal'
   tags: ["vllm-deploy"]

-# Legacy deployment block - will be moved to deploy-docker.yml
-- name: vLLM deployment tasks (legacy)
-  tags: vllm-deploy
-  when: vllm_deployment_type | default('docker') != 'production-stack'
-  block:
-    - name: Ensure Docker service is started and enabled
-      ansible.builtin.systemd:
-        name: docker
-        state: started
-        enabled: yes
-      become: yes
-
-    - name: Add current user to docker group
-      ansible.builtin.user:
-        name: "{{ ansible_user_id }}"
-        groups: docker
-        append: yes
-      become: yes
-
-    - name: Ensure docker socket has correct permissions
-      ansible.builtin.file:
-        path: /var/run/docker.sock
-        mode: '0666'
-      become: yes
-
-    - name: Reset connection to apply docker group membership
-      meta: reset_connection
-
-    - name: Wait for Docker to be accessible
-      ansible.builtin.wait_for:
-        path: /var/run/docker.sock
-        state: present
-        timeout: 30
-
-    - name: Test Docker access
-      ansible.builtin.command:
-        cmd: docker version
-      register: docker_test
-      become: no
-      failed_when: false
-      changed_when: false
-      retries: 3
-      delay: 2
-      until: docker_test.rc == 0
-
-    - name: Check if kubectl exists
-      ansible.builtin.stat:
-        path: /usr/local/bin/kubectl
-      register: kubectl_stat
-
-    - name: Get latest kubectl version
-      when: not kubectl_stat.stat.exists
-      ansible.builtin.uri:
-        url: https://dl.k8s.io/release/stable.txt
-        return_content: yes
-      register: kubectl_version
-
-    - name: Download kubectl
-      when: not kubectl_stat.stat.exists
-      ansible.builtin.get_url:
-        url: "https://dl.k8s.io/release/{{ kubectl_version.content | trim }}/bin/linux/amd64/kubectl"
-        dest: /tmp/kubectl
-        mode: '0755'
-
-    - name: Install kubectl
-      when: not kubectl_stat.stat.exists
-      ansible.builtin.copy:
-        src: /tmp/kubectl
-        dest: /usr/local/bin/kubectl
-        mode: '0755'
-        remote_src: yes
-
-    - name: Check if helm exists
-      ansible.builtin.stat:
-        path: /usr/local/bin/helm
-      register: helm_stat
-
-    - name: Download Helm installer script
-      when: not helm_stat.stat.exists
-      ansible.builtin.get_url:
-        url: https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
-        dest: /tmp/get-helm-3.sh
-        mode: '0755'
-
-    - name: Install Helm
-      when: not helm_stat.stat.exists
-      ansible.builtin.command:
-        cmd: /tmp/get-helm-3.sh
-      environment:
-        HELM_INSTALL_DIR: /usr/local/bin
-
-    - name: Check if minikube exists
-      when: vllm_k8s_type | default('minikube') == 'minikube'
-      ansible.builtin.stat:
-        path: /usr/local/bin/minikube
-      register: minikube_stat
-
-    - name: Download Minikube
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - not minikube_stat.stat.exists
-      ansible.builtin.get_url:
-        url: https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
-        dest: /tmp/minikube-linux-amd64
-        mode: '0755'
-
-    - name: Install Minikube
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - not minikube_stat.stat.exists
-      ansible.builtin.copy:
-        src: /tmp/minikube-linux-amd64
-        dest: /usr/local/bin/minikube
-        mode: '0755'
-        remote_src: yes
-
-    - name: Get available system memory
-      ansible.builtin.command:
-        cmd: free -m
-      register: memory_info
-      changed_when: false
-
-    - name: Calculate minikube memory allocation
-      set_fact:
-        minikube_memory_mb: >-
-          {%- set total_mem = memory_info.stdout_lines[1].split()[1] | int -%}
-          {%- set requested_mem = (vllm_request_memory | default('16Gi') | regex_replace('Gi', '') | int) * 1024 -%}
-          {%- set available_mem = (total_mem * 0.8) | int -%}
-          {{ [requested_mem, available_mem, 3072] | min }}
-
-    - name: Calculate minikube CPU allocation
-      set_fact:
-        minikube_cpus: >-
-          {%- set requested_cpus = vllm_request_cpu | default(4) | int -%}
-          {%- set available_cpus = ansible_processor_vcpus | default(4) | int -%}
-          {{ [requested_cpus, available_cpus] | min }}
-
-    - name: Check if Minikube is already running
-      when: vllm_k8s_type | default('minikube') == 'minikube'
-      become: no
-      ansible.builtin.command:
-        cmd: minikube status --format={{ "{{.Host}}" }}
-      register: minikube_status
-      changed_when: false
-      failed_when: false
-
-    - name: Ensure Minikube directory permissions
-      when: vllm_k8s_type | default('minikube') == 'minikube'
-      ansible.builtin.file:
-        path: /data/minikube
-        state: directory
-        owner: "kdevops"
-        group: "kdevops"
-        mode: '0755'
-        recurse: yes
-      become: yes
-
-    - name: Display minikube start parameters
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - minikube_status.stdout != 'Running'
-      debug:
-        msg: "Starting minikube with {{ minikube_cpus }} CPUs, {{ minikube_memory_mb }}MB RAM, 50GB disk. This may take 5-10 minutes on first run..."
-
-    - name: Start Minikube cluster
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - minikube_status.stdout != 'Running'
-      become: no
-      ansible.builtin.command:
-        cmd: minikube start --driver=docker --cpus={{ minikube_cpus }} --memory={{ minikube_memory_mb }} --disk-size=50g --insecure-registry="{{ ansible_default_ipv4.gateway }}:5000"
-      environment:
-        MINIKUBE_HOME: /data/minikube
-      register: minikube_start
-      changed_when: "'Done!' in minikube_start.stdout"
-      async: 600  # Allow up to 10 minutes
-      poll: 30  # Check every 30 seconds
-
-    - name: Enable GPU support in Minikube (if available)
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - not (vllm_use_cpu_inference | default(false))
-        - vllm_request_gpu | default(1) | int > 0
-      become: no
-      ansible.builtin.command:
-        cmd: minikube addons enable nvidia-gpu-device-plugin
-      ignore_errors: yes
-
-    - name: Disable GPU support in Minikube for CPU inference
-      when:
-        - vllm_k8s_type | default('minikube') == 'minikube'
-        - vllm_use_cpu_inference | default(false)
-        - minikube_status.stdout == 'Running'
-      become: no
-      ansible.builtin.command:
-        cmd: minikube addons disable nvidia-gpu-device-plugin
-      ignore_errors: yes
-
-    - name: Clone vLLM production stack repository
-      git:
-        repo: "{{ vllm_production_stack_repo }}"
-        dest: "{{ vllm_local_path }}/production-stack-repo"
-        version: "{{ vllm_production_stack_version }}"
-        update: yes
-        force: yes
-      when: false  # Not needed for production-stack deployment type which uses Helm
-
-    - name: Create results directory
-      file:
-        path: "{{ vllm_results_dir }}"
-        state: directory
-        mode: '0755'
-
-    - name: Generate vLLM deployment manifest
-      template:
-        src: vllm-deployment.yaml.j2
-        dest: "{{ vllm_local_path }}/vllm-deployment.yaml"
-        mode: '0644'
-      when: vllm_deployment_type != "production-stack"
-
-    - name: Deploy vLLM using kubectl
-      become: no
-      ansible.builtin.command:
-        cmd: kubectl apply -f {{ vllm_local_path }}/vllm-deployment.yaml
-      register: kubectl_apply
-      changed_when: "'created' in kubectl_apply.stdout or 'configured' in kubectl_apply.stdout"
-      when: vllm_deployment_type != "production-stack"
-
-    - name: Wait for vLLM pods to be ready
-      become: no
-      kubernetes.core.k8s_info:
-        api_version: v1
-        kind: Pod
-        namespace: "{{ vllm_helm_namespace | default('vllm-system') }}"
-        label_selectors:
-          - app=vllm-server
-      register: pod_list
-      until: pod_list.resources | length > 0 and pod_list.resources | selectattr('status.phase', 'equalto', 'Running') | list | length == pod_list.resources | length
-      retries: 30
-      delay: 10
-      when: vllm_deployment_type != "production-stack"
-
-    - name: Get vLLM service endpoint
-      become: no
-      kubernetes.core.k8s_info:
-        api_version: v1
-        kind: Service
-        namespace: "{{ vllm_helm_namespace | default('vllm-system') }}"
-        name: vllm-service
-      register: vllm_service
-
-    - name: Display vLLM endpoint information
-      debug:
-        msg: |
-          vLLM deployed successfully!
-          {% if vllm_k8s_type | default('minikube') == 'minikube' %}
-          To access the API, run: kubectl port-forward -n {{ vllm_helm_namespace | default('vllm-system') }} svc/vllm-service {{ vllm_api_port | default(8000) }}:8000
-          Then access: http://localhost:{{ vllm_api_port | default(8000) }}/v1/models
-          {% else %}
-          API endpoint: {{ vllm_service.resources[0].status.loadBalancer.ingress[0].ip | default('pending') }}:{{ vllm_api_port | default(8000) }}
-          {% endif %}
-
 - name: vLLM benchmark tasks
   tags: vllm-benchmark
   when: vllm_benchmark_enabled | default(true)
@@ -321,7 +70,9 @@
       group: "{{ ansible_user | default('ubuntu') }}"

     - name: Set up port forwarding for benchmarking
-      when: vllm_k8s_type | default('minikube') == 'minikube'
+      when:
+        - vllm_deployment_type | default('docker') != 'bare-metal'
+        - vllm_k8s_type | default('minikube') == 'minikube'
       become: no
       ansible.builtin.command:
         cmd: kubectl port-forward -n {{ vllm_helm_namespace | default('vllm-system') }} svc/vllm-service {{ vllm_api_port | default(8000) }}:8000
@@ -330,7 +81,9 @@
       register: port_forward_task

     - name: Wait for port forwarding to be ready
-      when: vllm_k8s_type | default('minikube') == 'minikube'
+      when:
+        - vllm_deployment_type | default('docker') != 'bare-metal'
+        - vllm_k8s_type | default('minikube') == 'minikube'
       ansible.builtin.wait_for:
         port: "{{ vllm_api_port | default(8000) }}"
         host: localhost
@@ -347,6 +100,7 @@

     - name: Stop port forwarding
       when:
+        - vllm_deployment_type | default('docker') != 'bare-metal'
         - vllm_k8s_type | default('minikube') == 'minikube'
         - port_forward_task is defined
       become: no
@@ -357,6 +111,7 @@

     - name: Kill port forwarding if still running
       when:
+        - vllm_deployment_type | default('docker') != 'bare-metal'
         - vllm_k8s_type | default('minikube') == 'minikube'
         - port_forward_task is defined
         - job_result.finished is defined
@@ -412,7 +167,9 @@

 - name: vLLM monitoring tasks
   tags: vllm-monitor
-  when: vllm_observability_enabled | default(true)
+  when:
+    - vllm_observability_enabled | default(true)
+    - vllm_deployment_type | default('docker') != 'bare-metal'
   block:
     - name: Get Grafana service information
       become: no
@@ -444,8 +201,22 @@
           Prometheus: http://{{ prometheus_service.resources[0].status.loadBalancer.ingress[0].ip | default('pending') }}:{{ vllm_prometheus_port | default(9090) }}
           {% endif %}

-- name: vLLM cleanup tasks
+- name: vLLM bare-metal monitoring info
+  tags: vllm-monitor
+  when: vllm_deployment_type | default('docker') == 'bare-metal'
+  debug:
+    msg: |
+      Bare-metal deployment does not include Grafana/Prometheus monitoring stack.
+      vLLM service logs available via: sudo journalctl -u {{ vllm_bare_metal_service_name | default('vllm') }} -f
+
+- name: vLLM cleanup for bare metal
+  ansible.builtin.include_tasks: tasks/cleanup-bare-metal.yml
+  when: vllm_deployment_type | default('docker') == 'bare-metal'
+  tags: ["vllm-cleanup"]
+
+- name: vLLM cleanup tasks (Kubernetes)
   tags: vllm-cleanup
+  when: vllm_deployment_type | default('docker') != 'bare-metal'
   block:
     - name: Delete all resources in vLLM namespace
       become: no
diff --git a/playbooks/vllm.yml b/playbooks/vllm.yml
index 2aad56a8..51151afe 100644
--- a/playbooks/vllm.yml
+++ b/playbooks/vllm.yml
@@ -8,4 +8,5 @@
   roles:
     - role: create_data_partition
       tags: ["data_partition"]
+      when: data_device is defined and data_device != None and data_device | length > 0
     - role: vllm
diff --git a/scripts/vllm-quick-test.sh b/scripts/vllm-quick-test.sh
index c68de2c8..30ecf355 100755
--- a/scripts/vllm-quick-test.sh
+++ b/scripts/vllm-quick-test.sh
@@ -27,22 +27,41 @@ fi

 # Check if baseline and dev are enabled
 BASELINE_AND_DEV=$(grep "^CONFIG_KDEVOPS_BASELINE_AND_DEV=y" "${TOPDIR}/.config" || true)

+# Check if using declared hosts (bare metal or existing infrastructure)
+USE_DECLARED_HOSTS=$(grep "^CONFIG_KDEVOPS_USE_DECLARED_HOSTS=y" "${TOPDIR}/.config" || true)
+
+# Check deployment type
+VLLM_BARE_METAL=$(grep "^CONFIG_VLLM_BARE_METAL=y" "${TOPDIR}/.config" || true)
+
 # Get node names from extra_vars.yaml
 if [[ ! -f "${TOPDIR}/extra_vars.yaml" ]]; then
     echo -e "${RED}Error: extra_vars.yaml not found.
Run 'make' first.${NC}" exit 1 fi -KDEVOPS_HOST_PREFIX=$(grep "^kdevops_host_prefix:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"') -if [[ -z "$KDEVOPS_HOST_PREFIX" ]]; then - echo -e "${RED}Error: Could not determine host prefix from extra_vars.yaml${NC}" - exit 1 -fi +# Determine nodes to test based on deployment type +if [[ -n "$USE_DECLARED_HOSTS" ]]; then + # For declared hosts, get the actual hostnames from extra_vars.yaml + DECLARED_HOSTS=$(grep "^kdevops_declared_hosts:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"') + if [[ -z "$DECLARED_HOSTS" ]]; then + echo -e "${RED}Error: Declared hosts enabled but no hosts specified in extra_vars.yaml${NC}" + exit 1 + fi + # Split comma-separated hosts into array + IFS=',' read -ra NODES <<< "$DECLARED_HOSTS" +else + # For provisioned VMs, use the host prefix + KDEVOPS_HOST_PREFIX=$(grep "^kdevops_host_prefix:" "${TOPDIR}/extra_vars.yaml" | awk '{print $2}' | tr -d '"') + if [[ -z "$KDEVOPS_HOST_PREFIX" ]]; then + echo -e "${RED}Error: Could not determine host prefix from extra_vars.yaml${NC}" + exit 1 + fi -# Determine nodes to test -NODES=("${KDEVOPS_HOST_PREFIX}-vllm") -if [[ -n "$BASELINE_AND_DEV" ]]; then - NODES+=("${KDEVOPS_HOST_PREFIX}-vllm-dev") + # Determine nodes to test + NODES=("${KDEVOPS_HOST_PREFIX}-vllm") + if [[ -n "$BASELINE_AND_DEV" ]]; then + NODES+=("${KDEVOPS_HOST_PREFIX}-vllm-dev") + fi fi # Function to test a single node @@ -65,15 +84,20 @@ test_node() { echo "Node IP: ${node_ip}" - # Check if port-forward is running - local pf_running=$(ssh "${node}" "ps aux | grep 'kubectl port-forward' | grep 8000 | grep -v grep" 2>/dev/null || true) - - if [[ -z "$pf_running" ]]; then - echo "Starting kubectl port-forward..." 
- ssh "${node}" "sudo nohup kubectl --kubeconfig=/root/.kube/config port-forward -n vllm-system svc/vllm-prod-${node}-router-service 8000:80 --address=0.0.0.0 > /tmp/pf.log 2>&1 &" 2>/dev/null || true - sleep 2 + # Only setup kubectl port-forward for Kubernetes deployments + if [[ -z "$VLLM_BARE_METAL" ]]; then + # Check if port-forward is running + local pf_running=$(ssh "${node}" "ps aux | grep 'kubectl port-forward' | grep 8000 | grep -v grep" 2>/dev/null || true) + + if [[ -z "$pf_running" ]]; then + echo "Starting kubectl port-forward..." + ssh "${node}" "sudo nohup kubectl --kubeconfig=/root/.kube/config port-forward -n vllm-system svc/vllm-prod-${node}-router-service 8000:80 --address=0.0.0.0 > /tmp/pf.log 2>&1 &" 2>/dev/null || true + sleep 2 + else + echo "kubectl port-forward already running" + fi else - echo "kubectl port-forward already running" + echo "Deployment type: Bare metal (direct connection to port 8000)" fi # Test the endpoint with timing diff --git a/workflows/vllm/Makefile b/workflows/vllm/Makefile index 91966b28..caed2e8a 100644 --- a/workflows/vllm/Makefile +++ b/workflows/vllm/Makefile @@ -44,6 +44,22 @@ vllm-cleanup: --tags vars,vllm-cleanup \ --extra-vars=@./extra_vars.yaml +vllm-cleanup-full: + $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \ + --limit 'baseline:dev' \ + playbooks/vllm.yml \ + --tags vars,vllm-cleanup \ + --extra-vars=@./extra_vars.yaml \ + --extra-vars='{"vllm_cleanup_remove_binaries": true}' + +vllm-cleanup-purge: + $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \ + --limit 'baseline:dev' \ + playbooks/vllm.yml \ + --tags vars,vllm-cleanup \ + --extra-vars=@./extra_vars.yaml \ + --extra-vars='{"vllm_cleanup_remove_binaries": true, "vllm_cleanup_remove_data": true}' + vllm-results: $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \ --limit 'baseline:dev' \ @@ -102,17 +118,19 @@ vllm-quick-test: vllm-help-menu: @echo "vLLM Production Stack options:" - @echo "vllm - Deploy vLLM stack to Kubernetes" - @echo "vllm-deploy - Deploy vLLM stack to 
Kubernetes (same as vllm)" + @echo "vllm - Deploy vLLM stack" + @echo "vllm-deploy - Deploy vLLM stack (same as vllm)" @echo "vllm-benchmark - Run performance benchmarks and collect results" @echo "vllm-monitor - Display monitoring dashboard URLs" @echo "vllm-status - Check detailed deployment status (verbose)" @echo "vllm-status-simplified - Check deployment status (clean summary)" @echo "vllm-quick-test - Quick API test (baseline + dev if enabled)" @echo "vllm-teardown - Gracefully remove vLLM deployment" - @echo "vllm-cleanup - Force delete all vLLM resources (use when stuck)" + @echo "vllm-cleanup - Remove vLLM containers/services (keep binaries & data)" + @echo "vllm-cleanup-full - Remove everything including binaries (kubectl, helm, minikube)" + @echo "vllm-cleanup-purge - PURGE ALL: Remove binaries + all data directories" @echo "vllm-results - Collect and visualize benchmark results" @echo "vllm-visualize-results - Generate HTML visualization of benchmark results" @echo "" -.PHONY: vllm vllm-deploy vllm-benchmark vllm-monitor vllm-status vllm-status-simplified vllm-quick-test vllm-teardown vllm-cleanup vllm-results vllm-visualize-results vllm-help-menu +.PHONY: vllm vllm-deploy vllm-benchmark vllm-monitor vllm-status vllm-status-simplified vllm-quick-test vllm-teardown vllm-cleanup vllm-cleanup-full vllm-cleanup-purge vllm-results vllm-visualize-results vllm-help-menu -- 2.51.0