From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, kdevops@lists.linux.dev
Cc: Luis Chamberlain, Your Name
Subject: [PATCH v3 08/10] terraform/lambdalabs: add terraform provider implementation
Date: Sat, 30 Aug 2025 21:00:02 -0700
Message-ID: <20250831040004.2159779-9-mcgrof@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250831040004.2159779-1-mcgrof@kernel.org>
References: <20250831040004.2159779-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain

Add complete Terraform implementation for Lambda Labs cloud provider:

- main.tf: Core instance and SSH key resource definitions
- provider.tf: Lambda Labs provider configuration
- vars.tf: Input variables with validation and descriptions
- output.tf: Instance information outputs for Ansible integration
- shared.tf: Shared resource management
- ansible_provision_cmd.tpl: Ansible provisioning command template
- README.md: Comprehensive setup and usage documentation
- SET_API_KEY.sh: API key configuration helper script
- extract_api_key.py: API key extraction utility

Features:

- Full instance lifecycle management
- SSH key provisioning and management
- Ansible integration with dynamic inventory
- Comprehensive error handling and validation
- Provider limitation documentation and
  workarounds

This provides complete infrastructure-as-code support for Lambda Labs GPU
instances with seamless kdevops integration.

Generated-by: Claude AI
Signed-off-by: Your Name
---
 terraform/lambdalabs/README.md                | 349 ++++++++++++++++++
 terraform/lambdalabs/SET_API_KEY.sh           |  20 +
 .../lambdalabs/ansible_provision_cmd.tpl      |   1 +
 terraform/lambdalabs/extract_api_key.py       |  40 ++
 terraform/lambdalabs/main.tf                  | 154 ++++++++
 terraform/lambdalabs/output.tf                |  51 +++
 terraform/lambdalabs/provider.tf              |  19 +
 terraform/lambdalabs/shared.tf                |   1 +
 terraform/lambdalabs/vars.tf                  |  65 ++++
 9 files changed, 700 insertions(+)
 create mode 100644 terraform/lambdalabs/README.md
 create mode 100644 terraform/lambdalabs/SET_API_KEY.sh
 create mode 120000 terraform/lambdalabs/ansible_provision_cmd.tpl
 create mode 100755 terraform/lambdalabs/extract_api_key.py
 create mode 100644 terraform/lambdalabs/main.tf
 create mode 100644 terraform/lambdalabs/output.tf
 create mode 100644 terraform/lambdalabs/provider.tf
 create mode 120000 terraform/lambdalabs/shared.tf
 create mode 100644 terraform/lambdalabs/vars.tf

diff --git a/terraform/lambdalabs/README.md b/terraform/lambdalabs/README.md
new file mode 100644
index 0000000..4ec1ac4
--- /dev/null
+++ b/terraform/lambdalabs/README.md
@@ -0,0 +1,349 @@
+# Lambda Labs Terraform Provider for kdevops
+
+This directory contains the Terraform configuration for deploying kdevops infrastructure on Lambda Labs cloud GPU platform.
+
+> **Architecture Note**: Lambda Labs serves as the reference implementation for kdevops' dynamic cloud configuration system. For details on how the dynamic Kconfig generation works, see [Dynamic Cloud Kconfig Documentation](../../docs/dynamic-cloud-kconfig.md).
+
+## Table of Contents
+- [Prerequisites](#prerequisites)
+- [Quick Start](#quick-start)
+- [Dynamic Configuration](#dynamic-configuration)
+- [SSH Key Security](#ssh-key-security)
+- [Configuration Options](#configuration-options)
+- [Provider Limitations](#provider-limitations)
+- [Troubleshooting](#troubleshooting)
+- [API Reference](#api-reference)
+
+## Prerequisites
+
+1. **Lambda Labs Account**: Sign up at https://cloud.lambdalabs.com
+2. **API Key**: Generate at https://cloud.lambdalabs.com/api-keys
+3. **Terraform**: Version 1.0 or higher
+
+### API Key Setup
+
+Configure your Lambda Labs API key using the credentials file method:
+
+**Credentials File Configuration (Required)**
+```bash
+# Using the helper script:
+python3 scripts/lambdalabs_credentials.py set "your-api-key-here"
+
+# Or manually:
+mkdir -p ~/.lambdalabs
+cat > ~/.lambdalabs/credentials << EOF
+[default]
+lambdalabs_api_key = your-api-key-here
+EOF
+chmod 600 ~/.lambdalabs/credentials
+```
+
+The system uses file-based authentication for consistency with other cloud providers.
+Environment variables are NOT supported to avoid configuration complexity.
+
+## Quick Start
+
+```bash
+# Step 1: Configure API credentials
+python3 scripts/lambdalabs_credentials.py set "your-api-key"
+
+# Step 2: Generate cloud configuration (queries available instances)
+make cloud-config
+
+# Step 3: Configure for Lambda Labs with smart defaults
+make defconfig-lambdalabs
+
+# Step 4: Deploy infrastructure (SSH keys handled automatically)
+make bringup
+
+# Step 5: When done, clean up everything
+make destroy
+```
+
+## Dynamic Configuration
+
+Lambda Labs support includes dynamic configuration generation that queries the Lambda Labs API to provide:
+
+- **Real-time availability**: Only shows instance types with current capacity
+- **Smart defaults**: Automatically selects cheapest available instances
+- **Regional awareness**: Shows which regions have capacity for each instance type
+- **Current pricing**: Displays up-to-date pricing information
+
+### How It Works
+
+1. **API Query**: When you run `make cloud-config` or `make menuconfig`, the system uses `lambda-cli` to query Lambda Labs API
+2. **Kconfig Generation**: Available instances and regions are written to `.generated` files
+3. **Menu Integration**: Generated files are included in the Kconfig menu system
+4. **Smart Selection**: The system can automatically choose optimal configurations
+
+### lambda-cli Tool
+
+The `lambda-cli` tool is the central interface for Lambda Labs operations:
+
+```bash
+# List available instances
+scripts/lambda-cli instance-types list --available-only
+
+# Get pricing information
+scripts/lambda-cli pricing list
+
+# Smart selection (finds cheapest available)
+scripts/lambda-cli smart-select --mode cheapest
+
+# Generate Kconfig files
+scripts/lambda-cli generate-kconfig
+```
+
+### Manual API Queries
+
+You can manually query Lambda Labs availability:
+
+```bash
+# Check what's available right now
+scripts/lambda-cli --output json instance-types list --available-only
+
+# Check specific region
+scripts/lambda-cli --output json instance-types list --region us-west-1
+
+# Get current pricing
+scripts/lambda-cli --output json pricing list
+```
+
+For more details on the dynamic configuration system, see [Dynamic Cloud Kconfig Documentation](../../docs/dynamic-cloud-kconfig.md).
+
+## SSH Key Security
+
+### Automatic Unique Keys (Default - Recommended)
+
+Each kdevops project directory automatically gets its own unique SSH key:
+
+- **Key Format**: `kdevops--` (e.g., `kdevops-lambda-kdevops-611374da`)
+- **Automatic Creation**: Keys are created and uploaded on first `make bringup`
+- **Automatic Cleanup**: Keys are removed when you run `make destroy`
+- **No Manual Setup**: Everything is handled automatically
+
+### Legacy Shared Key Mode
+
+For backwards compatibility, you can use a shared key across projects:
+
+```bash
+# Use the shared key configuration
+make defconfig-lambdalabs-shared-key
+
+# Manually add your key to Lambda Labs console
+# https://cloud.lambdalabs.com/ssh-keys
+```
+
+### SSH Key Management Commands
+
+```bash
+# List all SSH keys in your account
+make lambdalabs-ssh-list
+
+# Manually setup project SSH key
+make lambdalabs-ssh-setup
+
+# Remove project SSH key
+make lambdalabs-ssh-clean
+
+# Direct CLI usage
+python3 scripts/lambdalabs_ssh_keys.py list
+python3 scripts/lambdalabs_ssh_keys.py add
+python3 scripts/lambdalabs_ssh_keys.py delete
+```
+
+## Configuration Options
+
+### Smart Instance Selection
+
+The default configuration automatically:
+1. Detects your geographic location from your public IP
+2. Queries Lambda Labs API for available instances
+3. Finds the cheapest available GPU instance
+4. Deploys to the closest region with that instance
+
+### Available Defconfigs
+
+| Config | Description | Use Case |
+|--------|-------------|----------|
+| `defconfig-lambdalabs` | Smart instance + unique SSH keys | Production (recommended) |
+| `defconfig-lambdalabs-shared-key` | Smart instance + shared SSH key | Legacy/testing |
+
+### Manual Configuration
+
+```bash
+# Configure specific options
+make menuconfig
+
+# Navigate to:
+# → Bring up methods → Terraform → Lambda Labs
+```
+
+Configuration options:
+- **Instance Type**: Choose specific GPU (or use smart selection)
+- **Region**: Choose specific region (or use smart selection)
+- **SSH Key Strategy**: Unique per-project or shared
+
+## Provider Limitations
+
+The Lambda Labs Terraform provider (elct9620/lambdalabs v0.3.0) has significant limitations:
+
+| Feature | Supported | Notes |
+|---------|-----------|-------|
+| Instance Creation | ✅ Yes | Basic instance provisioning |
+| GPU Selection | ✅ Yes | All Lambda Labs GPU types |
+| Region Selection | ✅ Yes | With availability checking |
+| SSH Key Reference | ✅ Yes | By name only |
+| OS Image Selection | ❌ No | Always Ubuntu 22.04 |
+| Custom User Creation | ❌ No | Always uses 'ubuntu' user |
+| Storage Volumes | ❌ No | Cannot attach additional storage |
+| User Data/Cloud-Init | ❌ No | No initialization scripts |
+| Network Configuration | ❌ No | Basic networking only |
+| SSH Key Creation | ✅ Yes | Via `lambdalabs_ssh_key` when `ssh_config_genkey` is true |
+
+## Troubleshooting
+
+### SSH Authentication Failures
+
+**Problem**: `Permission denied (publickey)` when connecting
+
+**Solutions**:
+1. Verify SSH key exists in Lambda Labs:
+   ```bash
+   make lambdalabs-ssh-list
+   ```
+
+2. Check key name matches configuration:
+   ```bash
+   grep TERRAFORM_LAMBDALABS_SSH_KEY_NAME .config
+   ```
+
+3. Ensure you are using the correct private key:
+   ```bash
+   ssh -i ~/.ssh/kdevops_terraform ubuntu@
+   ```
+
+### No Capacity Available
+
+**Problem**: `No capacity available for instance type`
+
+**Solutions**:
+1. Smart inference automatically finds available regions
+2. Regenerate configs to check current availability:
+   ```bash
+   make cloud-config
+   cat terraform/lambdalabs/kconfigs/Kconfig.compute.generated | grep "✓"
+   ```
+3. Try different instance type or wait for capacity
+
+### API Key Issues
+
+**Problem**: `Invalid API key` or 403 errors
+
+**Solutions**:
+1. Verify credentials:
+   ```bash
+   cat ~/.lambdalabs/credentials
+   ```
+
+2. Test API access:
+   ```bash
+   python3 scripts/lambdalabs_list_instances.py
+   ```
+
+3. Generate new API key at https://cloud.lambdalabs.com/api-keys
+
+### Instance Creation Fails
+
+**Problem**: `Bad Request` when creating instances
+
+**Solutions**:
+1. Ensure SSH key exists with exact name
+2. Verify instance type is available in region
+3. Check terraform output:
+   ```bash
+   cd terraform/lambdalabs
+   terraform plan
+   ```
+
+## API Reference
+
+### Scripts
+
+| Script | Purpose |
+|--------|---------|
+| `lambdalabs_api.py` | Main API integration, generates Kconfig |
+| `lambdalabs_smart_inference.py` | Smart instance/region selection |
+| `lambdalabs_ssh_keys.py` | SSH key management |
+| `lambdalabs_list_instances.py` | List running instances |
+| `lambdalabs_credentials.py` | Manage API credentials |
+| `lambdalabs_ssh_key_name.py` | Generate unique key names |
+| `generate_cloud_configs.py` | Update all cloud configurations |
+
+### Make Targets
+
+| Target | Description |
+|--------|-------------|
+| `cloud-config` | Generate/update cloud configurations |
+| `defconfig-lambdalabs` | Configure with smart defaults |
+| `bringup` | Deploy infrastructure |
+| `destroy` | Destroy infrastructure and cleanup |
+| `lambdalabs-ssh-list` | List SSH keys |
+| `lambdalabs-ssh-setup` | Setup SSH key |
+| `lambdalabs-ssh-clean` | Remove SSH key |
+
+### Authentication Architecture
+
+The Lambda Labs provider uses file-based authentication exclusively:
+
+1. **Credentials File**: `~/.lambdalabs/credentials` contains the API key
+2. **Extraction Script**: `extract_api_key.py` reads and validates the key
+3. **Terraform Integration**: External data source provides the key to the provider
+4. **No Environment Variables**: Consistent with AWS/GCE authentication patterns
+
+## Files
+
+```
+terraform/lambdalabs/
+├── README.md            # This file
+├── main.tf              # Instance configuration
+├── provider.tf          # Provider setup
+├── vars.tf              # Variable definitions
+├── output.tf            # Output definitions
+└── kconfigs/            # Kconfig integration
+    ├── Kconfig          # Main configuration
+    ├── Kconfig.compute  # Instance selection
+    ├── Kconfig.identity # SSH key configuration
+    ├── Kconfig.location # Region selection
+    ├── Kconfig.storage  # Storage placeholder
+    └── *.generated      # Dynamic configs from API
+```
+
+## Testing Your Setup
+
+```bash
+# 1. Test API connectivity
+python3 scripts/lambdalabs_list_instances.py
+
+# 2. Test smart inference
+python3 scripts/lambdalabs_smart_inference.py
+
+# 3. Validate terraform
+cd terraform/lambdalabs
+terraform init
+terraform validate
+terraform plan
+
+# 4. Test SSH key management
+make lambdalabs-ssh-list
+```
+
+## Support
+
+- **kdevops Issues**: https://github.com/linux-kdevops/kdevops/issues
+- **Lambda Labs Support**: support@lambdalabs.com
+- **Lambda Labs Status**: https://status.lambdalabs.com
+
+---
+
+*Generated for kdevops v5.0.2 with Lambda Labs provider v0.3.0*
diff --git a/terraform/lambdalabs/SET_API_KEY.sh b/terraform/lambdalabs/SET_API_KEY.sh
new file mode 100644
index 0000000..bac441a
--- /dev/null
+++ b/terraform/lambdalabs/SET_API_KEY.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+echo "=========================================="
+echo "CRITICAL: Set your Lambda Labs API Key"
+echo "=========================================="
+echo ""
+echo "Your Lambda Labs API key file is not set up."
+echo ""
+echo "To fix this:"
+echo "1. Get your API key from: https://cloud.lambdalabs.com"
+echo "2. Create the directory and file:"
+echo ""
+echo "   mkdir -p ~/.lambdalabs"
+echo "   python3 scripts/lambdalabs_credentials.py set 'your-actual-api-key-here'"
+echo "   chmod 600 ~/.lambdalabs/credentials"
+echo ""
+echo "Then run: make bringup"
+echo ""
+echo "=========================================="
diff --git a/terraform/lambdalabs/ansible_provision_cmd.tpl b/terraform/lambdalabs/ansible_provision_cmd.tpl
new file mode 120000
index 0000000..5c92657
--- /dev/null
+++ b/terraform/lambdalabs/ansible_provision_cmd.tpl
@@ -0,0 +1 @@
+../ansible_provision_cmd.tpl
\ No newline at end of file
diff --git a/terraform/lambdalabs/extract_api_key.py b/terraform/lambdalabs/extract_api_key.py
new file mode 100755
index 0000000..10c9599
--- /dev/null
+++ b/terraform/lambdalabs/extract_api_key.py
@@ -0,0 +1,40 @@
+#!/usr/bin/env python3
+# Extract API key from Lambda Labs credentials file
+import configparser
+import json
+import sys
+from pathlib import Path
+
+
+def extract_api_key(creds_file="~/.lambdalabs/credentials"):
+    """Extract just the API key value from credentials file."""
+    try:
+        path = Path(creds_file).expanduser()
+        if not path.exists():
+            sys.stderr.write(f"Credentials file not found: {path}\n")
+            sys.exit(1)
+
+        config = configparser.ConfigParser()
+        config.read(path)
+
+        # Try default section first
+        if "default" in config and "lambdalabs_api_key" in config["default"]:
+            return config["default"]["lambdalabs_api_key"].strip()
+
+        # Try DEFAULT section
+        if "DEFAULT" in config and "lambdalabs_api_key" in config["DEFAULT"]:
+            return config["DEFAULT"]["lambdalabs_api_key"].strip()
+
+        sys.stderr.write("API key not found in credentials file\n")
+        sys.exit(1)
+
+    except Exception as e:
+        sys.stderr.write(f"Error reading credentials: {e}\n")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    creds_file = sys.argv[1] if len(sys.argv) > 1 else "~/.lambdalabs/credentials"
+    api_key = extract_api_key(creds_file)
+    # Output JSON format required by terraform external data source
+    print(json.dumps({"api_key": api_key}))
diff --git a/terraform/lambdalabs/main.tf b/terraform/lambdalabs/main.tf
new file mode 100644
index 0000000..a78866c
--- /dev/null
+++ b/terraform/lambdalabs/main.tf
@@ -0,0 +1,154 @@
+# Create SSH key if configured to do so
+resource "lambdalabs_ssh_key" "kdevops" {
+  count = var.ssh_config_genkey ? 1 : 0
+  name  = var.lambdalabs_ssh_key_name
+
+  # If we have an existing public key file, use it (trimming whitespace)
+  # Otherwise the provider will generate a new key pair
+  public_key = fileexists(pathexpand(var.ssh_config_pubkey_file)) ? trimspace(file(pathexpand(var.ssh_config_pubkey_file))) : null
+
+  lifecycle {
+    # Ignore changes to public_key to work around provider bug with whitespace
+    ignore_changes = [public_key]
+  }
+}
+
+# Save the generated SSH key to files if it was created
+resource "null_resource" "save_ssh_key" {
+  count = var.ssh_config_genkey && !fileexists(pathexpand(var.ssh_config_pubkey_file)) ? 1 : 0
+
+  provisioner "local-exec" {
+    command = <<-EOT
+      # Save private key
+      echo "${lambdalabs_ssh_key.kdevops[0].private_key}" > ${pathexpand(var.ssh_config_privkey_file)}
+      chmod 600 ${pathexpand(var.ssh_config_privkey_file)}
+
+      # Extract and save public key
+      ssh-keygen -y -f ${pathexpand(var.ssh_config_privkey_file)} > ${pathexpand(var.ssh_config_pubkey_file)}
+      chmod 644 ${pathexpand(var.ssh_config_pubkey_file)}
+    EOT
+  }
+
+  depends_on = [
+    lambdalabs_ssh_key.kdevops
+  ]
+}
+
+# Local variables for SSH user mapping based on OS
+locals {
+  # Map OS images to their default SSH users
+  # Lambda Labs typically uses Ubuntu, but this allows for flexibility
+  ssh_user_map = {
+    "ubuntu-22.04" = "ubuntu"
+    "ubuntu-20.04" = "ubuntu"
+    "ubuntu-24.04" = "ubuntu"
+    "ubuntu-18.04" = "ubuntu"
+    "debian-11"    = "debian"
+    "debian-12"    = "debian"
+    "debian-10"    = "debian"
+    "rocky-8"      = "rocky"
+    "rocky-9"      = "rocky"
+    "centos-7"     = "centos"
+    "centos-8"     = "centos"
+    "alma-8"       = "almalinux"
+    "alma-9"       = "almalinux"
+  }
+
+  # Determine SSH user - Lambda Labs doesn't support OS selection
+  # All instances use Ubuntu 22.04, so we always use "ubuntu" user
+  # The ssh_user_map is kept for potential future provider updates
+  ssh_user = "ubuntu"
+}
+
+# Create instances
+resource "lambdalabs_instance" "kdevops" {
+  for_each           = toset(var.kdevops_nodes)
+  name               = each.value
+  region_name        = var.lambdalabs_region
+  instance_type_name = var.lambdalabs_instance_type
+  ssh_key_names      = var.ssh_config_genkey ? [lambdalabs_ssh_key.kdevops[0].name] : [var.lambdalabs_ssh_key_name]
+  # Note: Lambda Labs provider doesn't currently support specifying the OS image
+  # The provider uses a default image (typically Ubuntu 22.04)
+
+  lifecycle {
+    ignore_changes = [ssh_key_names]
+  }
+
+  depends_on = [
+    lambdalabs_ssh_key.kdevops
+  ]
+}
+
+# Note: Lambda Labs provider doesn't currently support persistent storage resources
+# This would need to be managed through the Lambda Labs console or API directly
+# Keeping this comment for future implementation when the provider supports it
+
+# SSH config update
+resource "null_resource" "ansible_update_ssh_config_hosts" {
+  for_each = var.ssh_config_update ? toset(var.kdevops_nodes) : []
+
+  provisioner "local-exec" {
+    command = "python3 ${path.module}/../../scripts/update_ssh_config_lambdalabs.py update ${each.key} ${lambdalabs_instance.kdevops[each.key].ip} ${local.ssh_user} ${var.ssh_config_name} ${var.ssh_config_privkey_file} 'Lambda Labs'"
+  }
+
+  triggers = {
+    instance_id = lambdalabs_instance.kdevops[each.key].id
+  }
+}
+
+# Remove SSH config entries on destroy
+resource "null_resource" "remove_ssh_config" {
+  for_each = var.ssh_config_update ? toset(var.kdevops_nodes) : []
+
+  provisioner "local-exec" {
+    when    = destroy
+    command = "python3 ${self.triggers.ssh_config_script} remove ${self.triggers.hostname} '' '' ${self.triggers.ssh_config_name} '' 'Lambda Labs'"
+  }
+
+  triggers = {
+    instance_id       = lambdalabs_instance.kdevops[each.key].id
+    ssh_config_script = "${path.module}/../../scripts/update_ssh_config_lambdalabs.py"
+    ssh_config_name   = var.ssh_config_name
+    hostname          = each.key
+  }
+}
+
+# Ansible provisioning
+resource "null_resource" "ansible_provision" {
+  for_each = toset(var.kdevops_nodes)
+
+  connection {
+    type        = "ssh"
+    host        = lambdalabs_instance.kdevops[each.key].ip
+    user        = local.ssh_user
+    private_key = file(pathexpand(var.ssh_config_privkey_file))
+  }
+
+  provisioner "remote-exec" {
+    inline = [
+      "echo 'Waiting for system to be ready...'",
+      "sudo cloud-init status --wait || true",
+      "echo 'System is ready for provisioning'"
+    ]
+  }
+
+  provisioner "local-exec" {
+    command = templatefile("${path.module}/ansible_provision_cmd.tpl", {
+      inventory          = "../../hosts",
+      limit              = each.key,
+      extra_vars         = "../../extra_vars.yaml",
+      playbook_dir       = "../../playbooks",
+      provision_playbook = "devconfig.yml",
+      extra_args         = "--limit ${each.key} --extra-vars @../../extra_vars.yaml"
+    })
+  }
+
+  depends_on = [
+    lambdalabs_instance.kdevops,
+    null_resource.ansible_update_ssh_config_hosts
+  ]
+
+  triggers = {
+    instance_id = lambdalabs_instance.kdevops[each.key].id
+  }
+}
diff --git a/terraform/lambdalabs/output.tf b/terraform/lambdalabs/output.tf
new file mode 100644
index 0000000..347d032
--- /dev/null
+++ b/terraform/lambdalabs/output.tf
@@ -0,0 +1,51 @@
+output "instance_ids" {
+  description = "The IDs of the Lambda Labs instances"
+  value       = { for k, v in lambdalabs_instance.kdevops : k => v.id }
+}
+
+output "instance_ips" {
+  description = "The IP addresses of the Lambda Labs instances"
+  value       = { for k, v in lambdalabs_instance.kdevops : k => v.ip }
+}
+
+output "instance_names" {
+  description = "The names of the Lambda Labs instances"
+  value       = { for k, v in lambdalabs_instance.kdevops : k => v.name }
+}
+
+output "instance_regions" {
+  description = "The regions of the Lambda Labs instances"
+  value       = { for k, v in lambdalabs_instance.kdevops : k => v.region_name }
+}
+
+# Storage management is not supported by Lambda Labs provider
+# output "storage_enabled" {
+#   description = "Whether persistent storage is enabled"
+#   value       = var.extra_storage_enable
+# }
+
+output "ssh_key_name" {
+  description = "The name of the SSH key used"
+  value       = var.lambdalabs_ssh_key_name
+}
+
+output "ssh_key_generated" {
+  description = "Whether an SSH key was generated"
+  value       = var.ssh_config_genkey
+}
+
+output "generated_private_key" {
+  description = "The generated private SSH key (if created)"
+  value       = var.ssh_config_genkey && length(lambdalabs_ssh_key.kdevops) > 0 ? lambdalabs_ssh_key.kdevops[0].private_key : null
+  sensitive   = true
+}
+
+output "controller_ip_map" {
+  description = "Map of instance names to IP addresses for Ansible"
+  value       = { for k, v in lambdalabs_instance.kdevops : k => v.ip }
+}
+
+output "ssh_user" {
+  description = "SSH user for connecting to instances based on OS image"
+  value       = local.ssh_user
+}
diff --git a/terraform/lambdalabs/provider.tf b/terraform/lambdalabs/provider.tf
new file mode 100644
index 0000000..a49500c
--- /dev/null
+++ b/terraform/lambdalabs/provider.tf
@@ -0,0 +1,19 @@
+terraform {
+  required_version = ">= 1.0"
+  required_providers {
+    lambdalabs = {
+      source  = "elct9620/lambdalabs"
+      version = "~> 0.3.0"
+    }
+  }
+}
+
+# Extract API key from credentials file
+data "external" "lambdalabs_api_key" {
+  program = ["python3", "${path.module}/extract_api_key.py", var.lambdalabs_api_key_file]
+}
+
+provider "lambdalabs" {
+  # API key extracted from credentials file
+  api_key = data.external.lambdalabs_api_key.result["api_key"]
+}
diff --git a/terraform/lambdalabs/shared.tf b/terraform/lambdalabs/shared.tf
new file mode 120000
index 0000000..c10b610
--- /dev/null
+++ b/terraform/lambdalabs/shared.tf
@@ -0,0 +1 @@
+../shared.tf
\ No newline at end of file
diff --git a/terraform/lambdalabs/vars.tf b/terraform/lambdalabs/vars.tf
new file mode 100644
index 0000000..a11d043
--- /dev/null
+++ b/terraform/lambdalabs/vars.tf
@@ -0,0 +1,65 @@
+variable "lambdalabs_api_key_file" {
+  description = "Path to file containing Lambda Labs API key"
+  type        = string
+  default     = "~/.lambdalabs/credentials"
+}
+
+variable "lambdalabs_region" {
+  description = "Lambda Labs region to deploy resources"
+  type        = string
+  default     = "us-tx-1"
+}
+
+variable "lambdalabs_instance_type" {
+  description = "Lambda Labs instance type"
+  type        = string
+  default     = "gpu_1x_a10"
+}
+
+variable "lambdalabs_ssh_key_name" {
+  description = "Name of the existing SSH key in Lambda Labs to use for instances"
+  type        = string
+}
+
+# NOTE: Lambda Labs provider doesn't support OS image selection
+# All instances use Ubuntu 22.04 by default
+# This variable is kept for compatibility but has no effect
+#variable "image_name" {
+#  description = "OS image to use for instances"
+#  type        = string
+#  default     = "ubuntu-22.04"
+#}
+
+
+variable "ssh_config_name" {
+  description = "The name of your ssh_config file"
+  type        = string
+  default     = "../.ssh/config"
+}
+
+variable "ssh_config_use" {
+  description = "Set this to false to disable the use of the ssh config file"
+  type        = bool
+  default     = true
+}
+
+variable "ssh_config_genkey" {
+  description = "Set this to true to enable regenerating an ssh key"
+  type        = bool
+  default     = false
+}
+
+# NOTE: Lambda Labs provider doesn't support storage volume management
+# Instances come with their default storage only
+# These variables are kept for compatibility but have no effect
+#variable "extra_storage_size" {
+#  description = "Size of extra storage volume in GB"
+#  type        = number
+#  default     = 0
+#}
+#
+#variable "extra_storage_enable" {
+#  description = "Enable extra storage volume"
+#  type        = bool
+#  default     = false
+#}
-- 
2.50.1
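
A minimal sketch, not part of the patch, of the two contracts that provider.tf and extract_api_key.py rely on: the credentials file is an INI file with a `[default]` section holding `lambdalabs_api_key`, and the program behind a Terraform `external` data source prints a single JSON object whose values are strings. The helper names `read_api_key`, `external_result` and `demo` are hypothetical, and `example-key` is a placeholder value.

```python
import configparser
import json
import tempfile
from pathlib import Path


def read_api_key(creds_file):
    """Return the API key from an INI-style credentials file, or None."""
    config = configparser.ConfigParser()
    try:
        config.read(Path(creds_file).expanduser())
    except configparser.Error:
        # Raised, for example, when the file holds only a bare key and
        # therefore has no section header for configparser to attach it to.
        return None
    if config.has_option("default", "lambdalabs_api_key"):
        return config.get("default", "lambdalabs_api_key").strip()
    return None


def external_result(api_key):
    """Serialize the flat JSON object an external data source reads."""
    return json.dumps({"api_key": api_key})


def demo():
    """Parse one well-formed file and one bare-key file (hypothetical data)."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good"
        good.write_text("[default]\nlambdalabs_api_key = example-key\n")
        bare = Path(tmp) / "bare"
        bare.write_text("example-key\n")
        return read_api_key(good), read_api_key(bare)
```

`ConfigParser.read()` silently skips files it cannot open, so a missing path also yields `None` here; a caller would still need to turn `None` into a non-zero exit status for Terraform to report the failure.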