Date: Thu, 4 Sep 2025 09:55:46 -0400
From: Chuck Lever
Organization: kernel.org
To: Luis Chamberlain, Daniel Gomez, kdevops@lists.linux.dev
Subject: Re: [PATCH 1/3] aws: add dynamic cloud configuration support using AWS CLI
References: <20250904090030.2481840-1-mcgrof@kernel.org> <20250904090030.2481840-2-mcgrof@kernel.org>
In-Reply-To: <20250904090030.2481840-2-mcgrof@kernel.org>
X-Mailing-List: kdevops@lists.linux.dev

On 9/4/25 5:00 AM, Luis Chamberlain wrote:
> Add support for dynamically generating AWS instance types and regions
> configuration using the AWS CLI, similar to the Lambda Labs implementation.
>
> This allows users to:
> - Query real-time AWS instance availability
> - Generate Kconfig files with current instance families and regions
> - Choose between dynamic and static configuration modes
> - See pricing estimates and resource summaries
>
> Key components:
> - scripts/aws-cli: AWS CLI wrapper tool for kdevops
> - scripts/aws_api.py: Low-level AWS API functions
> - Updated generate_cloud_configs.py to support AWS
> - Makefile integration for AWS Kconfig generation
> - Option to use dynamic or static AWS configuration
>
> Usage: Run 'make cloud-config' to generate dynamic configuration.

I'd like to see more documentation for this make target. Is this a
target to be run as part of every workflow, or is it one that developers
run every once in a while? (I think the latter, based on the next patch
in this series, but it would be nice to put that in docs somewhere).

For example, ISTR a docs file that describes "make refs-default".

> This also parallelize cloud provider operations to significantly improve
> generation.
>
> $ time make cloud-config
> Cloud Provider Configuration Summary
> ============================================================
>
> ✓ Lambda Labs: 14/20 instances available, 14 regions, $0.50-$10.32/hr
> Kconfig files generated successfully
>
> ✓ AWS: 979 instance types available, 17 regions, ~$0.05-$3.60/hr
> Kconfig files generated successfully
>
> ⚠ Azure: Dynamic configuration not yet implemented
> ⚠ GCE: Dynamic configuration not yet implemented
>
> Note: Dynamic configurations query real-time availability
> Run 'make menuconfig' to configure your cloud provider
>
> real 6m51.859s
> user 37m16.347s
> sys 3m8.130s

I spent a little time yesterday adding new instance families by hand,
after asking the AWS instance type assistant to recommend appropriate
families for CI/CD. I added: t3, t3a, m7i-flex, and the two g4 families.
Other families looked too expensive or might be more than is needed for
CI/CD (like who needs 16 GPUs, 192 vCPUs, and 8 200GbE adapters for a
development system? ;-)

What I'd like to do is prevent an overload of choices in the instance
menus -- perhaps we can maintain a list somewhere of the families we'd
like to add to the menu and let the scripting consult that list as it
constructs the menu.

(And, I can drop my by-hand patches... I didn't realize you were ready
to post your automation that does the same thing).

> This also adds support for GPU AMIs:

IMHO the new GPU AMI support is a good idea, but should be split into
a separate patch in this series.

> - AWS Deep Learning AMI with pre-installed NVIDIA drivers, CUDA, and
> ML frameworks
> - NVIDIA Deep Learning AMI option for NGC containers
> - Custom GPU AMI support for specialized images
> - Automatic detection of GPU instance types
> - We conditionally display of GPU AMI options only for GPU instances

(I think OCI also provides distinct OS images with GPU user space
tools, fwiw).

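To make the family-list idea above a bit more concrete, here is a rough
sketch of how the generator might consult a curated list before building
the menu. Everything in it is an assumption for illustration only: the
helper names (parse_family_allowlist, filter_families), the one-name-per-line
file format, and the example contents are hypothetical and not part of
the posted patch. The families listed are just the ones mentioned in this
thread, with "the two g4 families" assumed to mean g4dn and g4ad.

```python
from typing import Iterable, List, Set


def parse_family_allowlist(text: str) -> Set[str]:
    """Parse one family name per line; '#' starts a comment."""
    families = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip().lower()
        if name:
            families.add(name)
    return families


def filter_families(discovered: Iterable[str], allowlist: Set[str]) -> List[str]:
    """Keep only the discovered families that appear on the allowlist.

    An empty allowlist means "no filtering", so the menu falls back to
    every family that was discovered.
    """
    if not allowlist:
        return sorted(discovered)
    return sorted(f for f in discovered if f in allowlist)


# Hypothetical allowlist contents (assumption, not an agreed-upon set).
EXAMPLE_ALLOWLIST = """\
# families suggested for CI/CD
t3
t3a
m7i-flex
g4dn   # GPU
g4ad
"""

if __name__ == "__main__":
    # Stand-in for whatever the live AWS query discovers.
    discovered = {"t3", "t3a", "m5", "p5", "g4dn", "m7i-flex", "u-24tb1"}
    allow = parse_family_allowlist(EXAMPLE_ALLOWLIST)
    print(filter_families(discovered, allow))
```

In the posted code, a filter like this could presumably sit between
get_generated_instance_families() and the per-family generation loop,
so an empty or missing list keeps today's "everything" behavior.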
> We automatically detects when you select a GPU instance family (like G6E) and
> provides appropriate GPU-optimized AMI options including the AWS Deep
> Learning AMI with all necessary drivers and frameworks pre-installed.
>
> Generated-by: Claude AI
> Signed-off-by: Luis Chamberlain
> ---
> .gitignore | 3 +
> defconfigs/aws-gpu-g6e-ai | 53 +
> .../templates/aws/terraform.tfvars.j2 | 5 +
> scripts/aws-cli | 436 +++++++
> scripts/aws_api.py | 1135 +++++++++++++++++

Note there is also an Ansible collection that provides the AWS API to
playbooks: amazon.aws. This is not a recommendation to reimplement this
patch, just pointing out there are other ways to skin the cat.

> scripts/dynamic-cloud-kconfig.Makefile | 88 +-
> scripts/generate_cloud_configs.py | 198 ++-
> terraform/aws/kconfigs/Kconfig.compute | 109 +-
> 8 files changed, 1937 insertions(+), 90 deletions(-)
> create mode 100644 defconfigs/aws-gpu-g6e-ai
> create mode 100755 scripts/aws-cli
> create mode 100755 scripts/aws_api.py
>
> diff --git a/.gitignore b/.gitignore
> index 09d2ae33..30337add 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -115,3 +115,6 @@ terraform/lambdalabs/.terraform_api_key
> .cloud.initialized
>
> scripts/__pycache__/
> +.aws_cloud_config_generated
> +terraform/aws/kconfigs/*.generated
> +terraform/aws/kconfigs/instance-types/*.generated
> diff --git a/defconfigs/aws-gpu-g6e-ai b/defconfigs/aws-gpu-g6e-ai
> new file mode 100644
> index 00000000..affc7a98
> --- /dev/null
> +++ b/defconfigs/aws-gpu-g6e-ai
> @@ -0,0 +1,53 @@
> +# AWS G6e.2xlarge GPU instance with Deep Learning AMI for AI/ML workloads
> +# This configuration sets up an AWS G6e.2xlarge instance with NVIDIA L40S GPU
> +# optimized for machine learning, AI inference, and GPU-accelerated workloads
> +
> +# Cloud provider configuration
> +CONFIG_KDEVOPS_ENABLE_TERRAFORM=y
> +CONFIG_TERRAFORM=y
> +CONFIG_TERRAFORM_AWS=y
> +
> +# AWS Dynamic configuration (required for G6E instance family and GPU AMIs)
> 
+CONFIG_TERRAFORM_AWS_USE_DYNAMIC_CONFIG=y > + > +# AWS Instance configuration - G6E family with NVIDIA L40S GPU > +# G6E.2XLARGE specifications: > +# - 8 vCPUs (3rd Gen AMD EPYC processors) > +# - 32 GB system RAM > +# - 1x NVIDIA L40S Tensor Core GPU > +# - 48 GB GPU memory > +# - Up to 15 Gbps network performance > +# - Up to 10 Gbps EBS bandwidth > +CONFIG_TERRAFORM_AWS_INSTANCE_TYPE_G6E=y > +CONFIG_TERRAFORM_AWS_INSTANCE_G6E_2XLARGE=y > + > +# AWS Region - US East (N. Virginia) - primary availability for G6E > +CONFIG_TERRAFORM_AWS_REGION_US_EAST_1=y > + > +# GPU-optimized Deep Learning AMI > +# Includes: NVIDIA drivers 535+, CUDA 12.x, cuDNN, TensorFlow, PyTorch, MXNet > +CONFIG_TERRAFORM_AWS_USE_GPU_AMI=y > +CONFIG_TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING=y > +CONFIG_TERRAFORM_AWS_GPU_AMI_NAME="Deep Learning AMI GPU TensorFlow*" > +CONFIG_TERRAFORM_AWS_GPU_AMI_OWNER="amazon" > + > +# Storage configuration optimized for ML workloads > +# 200 GB for datasets, models, and experiment artifacts > +CONFIG_TERRAFORM_AWS_DATA_VOLUME_SIZE=200 > + > +# Basic workflow configuration for kernel development > +CONFIG_WORKFLOWS=y > +CONFIG_WORKFLOW_LINUX_CUSTOM=y > +CONFIG_BOOTLINUX=y > + > +# Skip testing workflows for pure AI/ML setup > +CONFIG_WORKFLOWS_TESTS=n > + > +# Enable systemd journal remote for debugging > +CONFIG_DEVCONFIG_ENABLE_SYSTEMD_JOURNAL_REMOTE=y > + > +# Note: After provisioning, the instance will have: > +# - Jupyter notebook server ready for ML experiments > +# - Pre-installed deep learning frameworks > +# - NVIDIA GPU drivers and CUDA toolkit > +# - Docker with NVIDIA Container Toolkit for containerized ML workloads > \ No newline at end of file > diff --git a/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2 b/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2 > index d880254b..f8f4c842 100644 > --- a/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2 > +++ b/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2 > 
@@ -1,8 +1,13 @@ > aws_profile = "{{ terraform_aws_profile }}" > aws_region = "{{ terraform_aws_region }}" > aws_availability_zone = "{{ terraform_aws_av_zone }}" > +{% if terraform_aws_use_gpu_ami is defined and terraform_aws_use_gpu_ami %} > +aws_name_search = "{{ terraform_aws_gpu_ami_name }}" > +aws_ami_owner = "{{ terraform_aws_gpu_ami_owner }}" > +{% else %} > aws_name_search = "{{ terraform_aws_ns }}" > aws_ami_owner = "{{ terraform_aws_ami_owner }}" > +{% endif %} > aws_instance_type = "{{ terraform_aws_instance_type }}" > aws_ebs_volumes_per_instance = "{{ terraform_aws_ebs_volumes_per_instance }}" > aws_ebs_volume_size = {{ terraform_aws_ebs_volume_size }} > diff --git a/scripts/aws-cli b/scripts/aws-cli > new file mode 100755 > index 00000000..6cacce8b > --- /dev/null > +++ b/scripts/aws-cli > @@ -0,0 +1,436 @@ > +#!/usr/bin/env python3 > +# SPDX-License-Identifier: MIT > +""" > +AWS CLI tool for kdevops > + > +A structured CLI tool that wraps AWS CLI commands and provides access to > +AWS cloud provider functionality for dynamic configuration generation > +and resource management. 
> +""" > + > +import argparse > +import json > +import sys > +import os > +from typing import Dict, List, Any, Optional, Tuple > +from pathlib import Path > + > +# Import the AWS API functions > +try: > + from aws_api import ( > + check_aws_cli, > + get_instance_types, > + get_regions, > + get_availability_zones, > + get_pricing_info, > + generate_instance_types_kconfig, > + generate_regions_kconfig, > + generate_instance_families_kconfig, > + generate_gpu_amis_kconfig, > + ) > +except ImportError: > + # Try to import from scripts directory if not in path > + sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) > + from aws_api import ( > + check_aws_cli, > + get_instance_types, > + get_regions, > + get_availability_zones, > + get_pricing_info, > + generate_instance_types_kconfig, > + generate_regions_kconfig, > + generate_instance_families_kconfig, > + generate_gpu_amis_kconfig, > + ) > + > + > +class AWSCLI: > + """AWS CLI interface for kdevops""" > + > + def __init__(self, output_format: str = "json"): > + """ > + Initialize the CLI with specified output format > + > + Args: > + output_format: 'json' or 'text' for output formatting > + """ > + self.output_format = output_format > + self.aws_available = check_aws_cli() > + > + def output(self, data: Any, headers: Optional[List[str]] = None): > + """ > + Output data in the specified format > + > + Args: > + data: Data to output (dict, list, or primitive) > + headers: Column headers for text format (optional) > + """ > + if self.output_format == "json": > + print(json.dumps(data, indent=2)) > + else: > + # Human-readable text format > + if isinstance(data, list): > + if data and isinstance(data[0], dict): > + # Table format for list of dicts > + if not headers: > + headers = list(data[0].keys()) if data else [] > + > + if headers: > + # Calculate column widths > + widths = {h: len(h) for h in headers} > + for item in data: > + for h in headers: > + val = str(item.get(h, "")) > + widths[h] = max(widths[h], 
len(val)) > + > + # Print header > + header_line = " | ".join(h.ljust(widths[h]) for h in headers) > + print(header_line) > + print("-" * len(header_line)) > + > + # Print rows > + for item in data: > + row = " | ".join( > + str(item.get(h, "")).ljust(widths[h]) for h in headers > + ) > + print(row) > + else: > + # Simple list > + for item in data: > + print(item) > + elif isinstance(data, dict): > + # Key-value format > + max_key_len = max(len(k) for k in data.keys()) if data else 0 > + for key, value in data.items(): > + print(f"{key.ljust(max_key_len)} : {value}") > + else: > + # Simple value > + print(data) > + > + def list_instance_types( > + self, > + family: Optional[str] = None, > + region: Optional[str] = None, > + max_results: int = 100, > + ) -> List[Dict[str, Any]]: > + """ > + List instance types > + > + Args: > + family: Filter by instance family (e.g., 'm5', 't3') > + region: AWS region to query > + max_results: Maximum number of results to return > + > + Returns: > + List of instance type information > + """ > + if not self.aws_available: > + return [ > + { > + "error": "AWS CLI not found. Please install AWS CLI and configure credentials." 
> + } > + ] > + > + instances = get_instance_types( > + family=family, region=region, max_results=max_results > + ) > + > + # Format the results > + result = [] > + for instance in instances: > + item = { > + "name": instance.get("InstanceType", ""), > + "vcpu": instance.get("VCpuInfo", {}).get("DefaultVCpus", 0), > + "memory_gb": instance.get("MemoryInfo", {}).get("SizeInMiB", 0) / 1024, > + "instance_storage": instance.get("InstanceStorageSupported", False), > + "network_performance": instance.get("NetworkInfo", {}).get( > + "NetworkPerformance", "" > + ), > + "architecture": ", ".join( > + instance.get("ProcessorInfo", {}).get("SupportedArchitectures", []) > + ), > + } > + result.append(item) > + > + # Sort by name > + result.sort(key=lambda x: x["name"]) > + > + return result > + > + def list_regions(self, include_zones: bool = False) -> List[Dict[str, Any]]: > + """ > + List regions > + > + Args: > + include_zones: Include availability zones for each region > + > + Returns: > + List of region information > + """ > + if not self.aws_available: > + return [ > + { > + "error": "AWS CLI not found. Please install AWS CLI and configure credentials." 
> + } > + ] > + > + regions = get_regions() > + > + result = [] > + for region in regions: > + item = { > + "name": region.get("RegionName", ""), > + "endpoint": region.get("Endpoint", ""), > + "opt_in_status": region.get("OptInStatus", ""), > + } > + > + if include_zones: > + # Get availability zones for this region > + zones = get_availability_zones(region["RegionName"]) > + item["zones"] = len(zones) > + item["zone_names"] = ", ".join([z["ZoneName"] for z in zones]) > + > + result.append(item) > + > + return result > + > + def get_cheapest_instance( > + self, > + region: Optional[str] = None, > + family: Optional[str] = None, > + min_vcpus: int = 2, > + ) -> Dict[str, Any]: > + """ > + Get the cheapest instance meeting criteria > + > + Args: > + region: AWS region > + family: Instance family filter > + min_vcpus: Minimum number of vCPUs required > + > + Returns: > + Dictionary with instance information > + """ > + if not self.aws_available: > + return {"error": "AWS CLI not available"} > + > + instances = get_instance_types(family=family, region=region) > + > + # Filter by minimum vCPUs > + eligible = [] > + for instance in instances: > + vcpus = instance.get("VCpuInfo", {}).get("DefaultVCpus", 0) > + if vcpus >= min_vcpus: > + eligible.append(instance) > + > + if not eligible: > + return {"error": "No instances found matching criteria"} > + > + # Get pricing for eligible instances > + pricing = get_pricing_info(region=region or "us-east-1") > + > + # Find cheapest > + cheapest = None > + cheapest_price = float("inf") > + > + for instance in eligible: > + instance_type = instance.get("InstanceType") > + price = pricing.get(instance_type, {}).get("on_demand", float("inf")) > + if price < cheapest_price: > + cheapest_price = price > + cheapest = instance > + > + if cheapest: > + return { > + "instance_type": cheapest.get("InstanceType"), > + "vcpus": cheapest.get("VCpuInfo", {}).get("DefaultVCpus", 0), > + "memory_gb": cheapest.get("MemoryInfo", 
{}).get("SizeInMiB", 0) / 1024, > + "price_per_hour": f"${cheapest_price:.3f}", > + } > + > + return {"error": "Could not determine cheapest instance"} > + > + def generate_kconfig(self) -> bool: > + """ > + Generate Kconfig files for AWS > + > + Returns: > + True on success, False on failure > + """ > + if not self.aws_available: > + print("AWS CLI not available, cannot generate Kconfig", file=sys.stderr) > + return False > + > + output_dir = Path("terraform/aws/kconfigs") > + > + # Create directory if it doesn't exist > + output_dir.mkdir(parents=True, exist_ok=True) > + > + try: > + from concurrent.futures import ThreadPoolExecutor, as_completed > + > + # Generate files in parallel > + instance_types_dir = output_dir / "instance-types" > + instance_types_dir.mkdir(exist_ok=True) > + > + def generate_family_file(family): > + """Generate Kconfig for a single family.""" > + types_kconfig = generate_instance_types_kconfig(family) > + if types_kconfig: > + types_file = instance_types_dir / f"Kconfig.{family}.generated" > + types_file.write_text(types_kconfig) > + return f"Generated {types_file}" > + return None > + > + with ThreadPoolExecutor(max_workers=10) as executor: > + # Submit all generation tasks > + futures = [] > + > + # Generate instance families Kconfig > + futures.append(executor.submit(generate_instance_families_kconfig)) > + > + # Generate regions Kconfig > + futures.append(executor.submit(generate_regions_kconfig)) > + > + # Generate GPU AMIs Kconfig > + futures.append(executor.submit(generate_gpu_amis_kconfig)) > + > + # Generate instance types for each family > + # Get all families dynamically from AWS > + from aws_api import get_generated_instance_families > + > + families = get_generated_instance_families() > + > + family_futures = [] > + for family in sorted(families): > + family_futures.append(executor.submit(generate_family_file, family)) > + > + # Process main config results > + families_kconfig = futures[0].result() > + regions_kconfig = 
futures[1].result() > + gpu_amis_kconfig = futures[2].result() > + > + # Write main configs > + families_file = output_dir / "Kconfig.compute.generated" > + families_file.write_text(families_kconfig) > + print(f"Generated {families_file}") > + > + regions_file = output_dir / "Kconfig.location.generated" > + regions_file.write_text(regions_kconfig) > + print(f"Generated {regions_file}") > + > + gpu_amis_file = output_dir / "Kconfig.gpu-amis.generated" > + gpu_amis_file.write_text(gpu_amis_kconfig) > + print(f"Generated {gpu_amis_file}") > + > + # Process family results > + for future in family_futures: > + result = future.result() > + if result: > + print(result) > + > + return True > + > + except Exception as e: > + print(f"Error generating Kconfig: {e}", file=sys.stderr) > + return False > + > + > +def main(): > + """Main entry point""" > + parser = argparse.ArgumentParser( > + description="AWS CLI tool for kdevops", > + formatter_class=argparse.RawDescriptionHelpFormatter, > + ) > + > + parser.add_argument( > + "--output", > + choices=["json", "text"], > + default="json", > + help="Output format (default: json)", > + ) > + > + subparsers = parser.add_subparsers(dest="command", help="Available commands") > + > + # Generate Kconfig command > + kconfig_parser = subparsers.add_parser( > + "generate-kconfig", help="Generate Kconfig files for AWS" > + ) > + > + # Instance types command > + instances_parser = subparsers.add_parser( > + "instance-types", help="Manage instance types" > + ) > + instances_subparsers = instances_parser.add_subparsers( > + dest="subcommand", help="Instance type operations" > + ) > + > + # Instance types list > + list_instances = instances_subparsers.add_parser("list", help="List instance types") > + list_instances.add_argument("--family", help="Filter by instance family") > + list_instances.add_argument("--region", help="AWS region") > + list_instances.add_argument( > + "--max-results", type=int, default=100, help="Maximum results (default: 
100)" > + ) > + > + # Regions command > + regions_parser = subparsers.add_parser("regions", help="Manage regions") > + regions_subparsers = regions_parser.add_subparsers( > + dest="subcommand", help="Region operations" > + ) > + > + # Regions list > + list_regions = regions_subparsers.add_parser("list", help="List regions") > + list_regions.add_argument( > + "--include-zones", > + action="store_true", > + help="Include availability zones", > + ) > + > + # Cheapest instance command > + cheapest_parser = subparsers.add_parser( > + "cheapest", help="Find cheapest instance meeting criteria" > + ) > + cheapest_parser.add_argument("--region", help="AWS region") > + cheapest_parser.add_argument("--family", help="Instance family") > + cheapest_parser.add_argument( > + "--min-vcpus", type=int, default=2, help="Minimum vCPUs (default: 2)" > + ) > + > + args = parser.parse_args() > + > + cli = AWSCLI(output_format=args.output) > + > + if args.command == "generate-kconfig": > + success = cli.generate_kconfig() > + sys.exit(0 if success else 1) > + > + elif args.command == "instance-types": > + if args.subcommand == "list": > + instances = cli.list_instance_types( > + family=args.family, > + region=args.region, > + max_results=args.max_results, > + ) > + cli.output(instances) > + > + elif args.command == "regions": > + if args.subcommand == "list": > + regions = cli.list_regions(include_zones=args.include_zones) > + cli.output(regions) > + > + elif args.command == "cheapest": > + result = cli.get_cheapest_instance( > + region=args.region, > + family=args.family, > + min_vcpus=args.min_vcpus, > + ) > + cli.output(result) > + > + else: > + parser.print_help() > + sys.exit(1) > + > + > +if __name__ == "__main__": > + main() > diff --git a/scripts/aws_api.py b/scripts/aws_api.py > new file mode 100755 > index 00000000..1cf42f39 > --- /dev/null > +++ b/scripts/aws_api.py > @@ -0,0 +1,1135 @@ > +#!/usr/bin/env python3 > +# SPDX-License-Identifier: MIT > +""" > +AWS API library for 
kdevops. > + > +Provides AWS CLI wrapper functions for dynamic configuration generation. > +Used by aws-cli and other kdevops components. > +""" > + > +import json > +import os > +import re > +import subprocess > +import sys > +from typing import Dict, List, Optional, Any > + > + > +def check_aws_cli() -> bool: > + """Check if AWS CLI is installed and configured.""" > + try: > + # Check if AWS CLI is installed > + result = subprocess.run( > + ["aws", "--version"], > + capture_output=True, > + text=True, > + check=False, > + ) > + if result.returncode != 0: > + return False > + > + # Check if credentials are configured > + result = subprocess.run( > + ["aws", "sts", "get-caller-identity"], > + capture_output=True, > + text=True, > + check=False, > + ) > + return result.returncode == 0 > + except FileNotFoundError: > + return False > + > + > +def get_default_region() -> str: > + """Get the default AWS region from configuration or environment.""" > + # Try to get from environment > + region = os.environ.get("AWS_DEFAULT_REGION") > + if region: > + return region > + > + # Try to get from AWS config > + try: > + result = subprocess.run( > + ["aws", "configure", "get", "region"], > + capture_output=True, > + text=True, > + check=False, > + ) > + if result.returncode == 0 and result.stdout.strip(): > + return result.stdout.strip() > + except: > + pass > + > + # Default to us-east-1 > + return "us-east-1" > + > + > +def run_aws_command(command: List[str], region: Optional[str] = None) -> Optional[Dict]: > + """ > + Run an AWS CLI command and return the JSON output. 
> + > + Args: > + command: AWS CLI command as a list > + region: Optional AWS region > + > + Returns: > + Parsed JSON output or None on error > + """ > + cmd = ["aws"] + command + ["--output", "json"] > + > + # Always specify a region (use default if not provided) > + if not region: > + region = get_default_region() > + cmd.extend(["--region", region]) > + > + try: > + result = subprocess.run( > + cmd, > + capture_output=True, > + text=True, > + check=False, > + ) > + if result.returncode == 0: > + return json.loads(result.stdout) if result.stdout else {} > + else: > + print(f"AWS command failed: {result.stderr}", file=sys.stderr) > + return None > + except (subprocess.SubprocessError, json.JSONDecodeError) as e: > + print(f"Error running AWS command: {e}", file=sys.stderr) > + return None > + > + > +def get_regions() -> List[Dict[str, Any]]: > + """Get available AWS regions.""" > + response = run_aws_command(["ec2", "describe-regions"]) > + if response and "Regions" in response: > + return response["Regions"] > + return [] > + > + > +def get_availability_zones(region: str) -> List[Dict[str, Any]]: > + """Get availability zones for a specific region.""" > + response = run_aws_command( > + ["ec2", "describe-availability-zones"], > + region=region, > + ) > + if response and "AvailabilityZones" in response: > + return response["AvailabilityZones"] > + return [] > + > + > +def get_instance_types( > + family: Optional[str] = None, > + region: Optional[str] = None, > + max_results: int = 100, > + fetch_all: bool = True, > +) -> List[Dict[str, Any]]: > + """ > + Get available instance types. 
> + > + Args: > + family: Instance family filter (e.g., 'm5', 't3') > + region: AWS region > + max_results: Maximum number of results per API call (max 100) > + fetch_all: If True, fetch all pages using NextToken pagination > + > + Returns: > + List of instance type information > + """ > + all_instances = [] > + next_token = None > + page_count = 0 > + > + # Ensure max_results doesn't exceed AWS limit > + max_results = min(max_results, 100) > + > + while True: > + cmd = ["ec2", "describe-instance-types"] > + > + filters = [] > + if family: > + # Filter by instance type pattern > + filters.append(f"Name=instance-type,Values={family}*") > + > + if filters: > + cmd.append("--filters") > + cmd.extend(filters) > + > + cmd.extend(["--max-results", str(max_results)]) > + > + if next_token: > + cmd.extend(["--next-token", next_token]) > + > + response = run_aws_command(cmd, region=region) > + if response and "InstanceTypes" in response: > + batch_size = len(response["InstanceTypes"]) > + all_instances.extend(response["InstanceTypes"]) > + page_count += 1 > + > + if fetch_all and not family: > + # Only show progress for full fetches (not family-specific) > + print( > + f" Fetched page {page_count}: {batch_size} instance types (total: {len(all_instances)})", > + file=sys.stderr, > + ) > + > + # Check if there are more results > + if fetch_all and "NextToken" in response: > + next_token = response["NextToken"] > + else: > + break > + else: > + break > + > + if fetch_all and page_count > 1: > + filter_desc = f" for family '{family}'" if family else "" > + print( > + f" Total: {len(all_instances)} instance types fetched{filter_desc}", > + file=sys.stderr, > + ) > + > + return all_instances > + > + > +def get_pricing_info(region: str = "us-east-1") -> Dict[str, Dict[str, float]]: > + """ > + Get pricing information for instance types. > + > + Note: AWS Pricing API requires us-east-1 region. > + Returns a simplified pricing structure. 
> + > + Args: > + region: AWS region for pricing > + > + Returns: > + Dictionary mapping instance types to pricing info > + """ > + # For simplicity, we'll use hardcoded common instance prices > + # In production, you'd query the AWS Pricing API > + pricing = { > + # T3 family (burstable) > + "t3.nano": {"on_demand": 0.0052}, > + "t3.micro": {"on_demand": 0.0104}, > + "t3.small": {"on_demand": 0.0208}, > + "t3.medium": {"on_demand": 0.0416}, > + "t3.large": {"on_demand": 0.0832}, > + "t3.xlarge": {"on_demand": 0.1664}, > + "t3.2xlarge": {"on_demand": 0.3328}, > + # T3a family (AMD) > + "t3a.nano": {"on_demand": 0.0047}, > + "t3a.micro": {"on_demand": 0.0094}, > + "t3a.small": {"on_demand": 0.0188}, > + "t3a.medium": {"on_demand": 0.0376}, > + "t3a.large": {"on_demand": 0.0752}, > + "t3a.xlarge": {"on_demand": 0.1504}, > + "t3a.2xlarge": {"on_demand": 0.3008}, > + # M5 family (general purpose Intel) > + "m5.large": {"on_demand": 0.096}, > + "m5.xlarge": {"on_demand": 0.192}, > + "m5.2xlarge": {"on_demand": 0.384}, > + "m5.4xlarge": {"on_demand": 0.768}, > + "m5.8xlarge": {"on_demand": 1.536}, > + "m5.12xlarge": {"on_demand": 2.304}, > + "m5.16xlarge": {"on_demand": 3.072}, > + "m5.24xlarge": {"on_demand": 4.608}, > + # M7a family (general purpose AMD) > + "m7a.medium": {"on_demand": 0.0464}, > + "m7a.large": {"on_demand": 0.0928}, > + "m7a.xlarge": {"on_demand": 0.1856}, > + "m7a.2xlarge": {"on_demand": 0.3712}, > + "m7a.4xlarge": {"on_demand": 0.7424}, > + "m7a.8xlarge": {"on_demand": 1.4848}, > + "m7a.12xlarge": {"on_demand": 2.2272}, > + "m7a.16xlarge": {"on_demand": 2.9696}, > + "m7a.24xlarge": {"on_demand": 4.4544}, > + "m7a.32xlarge": {"on_demand": 5.9392}, > + "m7a.48xlarge": {"on_demand": 8.9088}, > + # C5 family (compute optimized) > + "c5.large": {"on_demand": 0.085}, > + "c5.xlarge": {"on_demand": 0.17}, > + "c5.2xlarge": {"on_demand": 0.34}, > + "c5.4xlarge": {"on_demand": 0.68}, > + "c5.9xlarge": {"on_demand": 1.53}, > + "c5.12xlarge": {"on_demand": 
2.04}, > + "c5.18xlarge": {"on_demand": 3.06}, > + "c5.24xlarge": {"on_demand": 4.08}, > + # C7a family (compute optimized AMD) > + "c7a.medium": {"on_demand": 0.0387}, > + "c7a.large": {"on_demand": 0.0774}, > + "c7a.xlarge": {"on_demand": 0.1548}, > + "c7a.2xlarge": {"on_demand": 0.3096}, > + "c7a.4xlarge": {"on_demand": 0.6192}, > + "c7a.8xlarge": {"on_demand": 1.2384}, > + "c7a.12xlarge": {"on_demand": 1.8576}, > + "c7a.16xlarge": {"on_demand": 2.4768}, > + "c7a.24xlarge": {"on_demand": 3.7152}, > + "c7a.32xlarge": {"on_demand": 4.9536}, > + "c7a.48xlarge": {"on_demand": 7.4304}, > + # I4i family (storage optimized) > + "i4i.large": {"on_demand": 0.117}, > + "i4i.xlarge": {"on_demand": 0.234}, > + "i4i.2xlarge": {"on_demand": 0.468}, > + "i4i.4xlarge": {"on_demand": 0.936}, > + "i4i.8xlarge": {"on_demand": 1.872}, > + "i4i.16xlarge": {"on_demand": 3.744}, > + "i4i.32xlarge": {"on_demand": 7.488}, > + } > + > + # Adjust pricing based on region (simplified) > + # Some regions are more expensive than others > + region_multipliers = { > + "us-east-1": 1.0, > + "us-east-2": 1.0, > + "us-west-1": 1.08, > + "us-west-2": 1.0, > + "eu-west-1": 1.1, > + "eu-central-1": 1.15, > + "ap-southeast-1": 1.2, > + "ap-northeast-1": 1.25, > + } > + > + multiplier = region_multipliers.get(region, 1.1) > + if multiplier != 1.0: > + adjusted_pricing = {} > + for instance_type, prices in pricing.items(): > + adjusted_pricing[instance_type] = { > + "on_demand": prices["on_demand"] * multiplier > + } > + return adjusted_pricing > + > + return pricing > + > + > +def sanitize_kconfig_name(name: str) -> str: > + """Convert a name to a valid Kconfig symbol.""" > + # Replace special characters with underscores > + name = name.replace("-", "_").replace(".", "_").replace(" ", "_") > + # Convert to uppercase > + name = name.upper() > + # Remove any non-alphanumeric characters (except underscore) > + name = "".join(c for c in name if c.isalnum() or c == "_") > + # Ensure it doesn't start with a 
number > + if name and name[0].isdigit(): > + name = "_" + name > + return name > + > + > +# Cache for instance families to avoid redundant API calls > +_cached_families = None > + > + > +def get_generated_instance_families() -> set: > + """Get the set of instance families that will have generated Kconfig files.""" > + global _cached_families > + > + # Return cached result if available > + if _cached_families is not None: > + return _cached_families > + > + # Return all families - we'll generate Kconfig files for all of them > + # This function will be called by the aws-cli tool to determine which files to generate > + if not check_aws_cli(): > + # Return a minimal set if AWS CLI is not available > + _cached_families = {"m5", "t3", "c5"} > + return _cached_families > + > + # Get all available instance types > + print(" Discovering available instance families...", file=sys.stderr) > + instance_types = get_instance_types(fetch_all=True) > + > + # Extract unique families > + families = set() > + for instance_type in instance_types: > + type_name = instance_type.get("InstanceType", "") > + # Extract family prefix (e.g., "m5" from "m5.large") > + if "." in type_name: > + family = type_name.split(".")[0] > + families.add(family) > + > + print(f" Found {len(families)} instance families", file=sys.stderr) > + _cached_families = families > + return families > + > + > +def generate_instance_families_kconfig() -> str: > + """Generate Kconfig content for AWS instance families.""" > + # Check if AWS CLI is available > + if not check_aws_cli(): > + return generate_default_instance_families_kconfig() > + > + # Get all available instance types (with pagination) > + instance_types = get_instance_types(fetch_all=True) > + > + # Extract unique families > + families = set() > + family_info = {} > + for instance in instance_types: > + instance_type = instance.get("InstanceType", "") > + if "." 
in instance_type: > + family = instance_type.split(".")[0] > + families.add(family) > + if family not in family_info: > + family_info[family] = { > + "architectures": set(), > + "count": 0, > + } > + family_info[family]["count"] += 1 > + for arch in instance.get("ProcessorInfo", {}).get( > + "SupportedArchitectures", [] > + ): > + family_info[family]["architectures"].add(arch) > + > + if not families: > + return generate_default_instance_families_kconfig() > + > + # Group families by category - use prefix patterns to catch all variants > + def categorize_family(family_name): > + """Categorize a family based on its prefix.""" > + if family_name.startswith(("m", "t")): > + return "general_purpose" > + elif family_name.startswith("c"): > + return "compute_optimized" > + elif family_name.startswith(("r", "x", "z")): > + return "memory_optimized" > + elif family_name.startswith(("i", "d", "h")): > + return "storage_optimized" > + elif family_name.startswith(("p", "g", "dl", "trn", "inf", "vt", "f")): > + return "accelerated" > + elif family_name.startswith(("mac", "hpc")): > + return "specialized" > + else: > + return "other" > + > + # Organize families by category > + categorized_families = { > + "general_purpose": [], > + "compute_optimized": [], > + "memory_optimized": [], > + "storage_optimized": [], > + "accelerated": [], > + "specialized": [], > + "other": [], > + } > + > + for family in sorted(families): > + category = categorize_family(family) > + categorized_families[category].append(family) > + > + kconfig = """# AWS instance families (dynamically generated) > +# Generated by aws-cli from live AWS data > + > +choice > + prompt "AWS instance family" > + default TERRAFORM_AWS_INSTANCE_TYPE_M5 > + help > + Select the AWS instance family for your deployment. > + Different families are optimized for different workloads. 
> + > +""" > + > + # Category headers > + category_headers = { > + "general_purpose": "# General Purpose - balanced compute, memory, and networking\n", > + "compute_optimized": "# Compute Optimized - ideal for CPU-intensive applications\n", > + "memory_optimized": "# Memory Optimized - for memory-intensive applications\n", > + "storage_optimized": "# Storage Optimized - for high sequential read/write workloads\n", > + "accelerated": "# Accelerated Computing - GPU and other accelerators\n", > + "specialized": "# Specialized - for specific use cases\n", > + "other": "# Other instance families\n", > + } > + > + # Add each category of families > + for category in [ > + "general_purpose", > + "compute_optimized", > + "memory_optimized", > + "storage_optimized", > + "accelerated", > + "specialized", > + "other", > + ]: > + if categorized_families[category]: > + kconfig += category_headers[category] > + for family in categorized_families[category]: > + kconfig += generate_family_config(family, family_info.get(family, {})) > + if category != "other": # Don't add extra newline after the last category > + kconfig += "\n" > + > + kconfig += "\nendchoice\n" > + > + # Add instance type source includes for each family > + # Only include families that we actually generate files for > + generated_families = get_generated_instance_families() > + kconfig += "\n# Include instance-specific configurations\n" > + for family in sorted(families): > + # Only add source statement if we generate a file for this family > + if family in generated_families: > + safe_name = sanitize_kconfig_name(family) > + kconfig += f"""if TERRAFORM_AWS_INSTANCE_TYPE_{safe_name} > +source "terraform/aws/kconfigs/instance-types/Kconfig.{family}.generated" > +endif > + > +""" > + > + return kconfig > + > + > +def generate_family_config(family: str, info: Dict) -> str: > + """Generate Kconfig entry for an instance family.""" > + safe_name = sanitize_kconfig_name(family) > + > + # Determine architecture 
dependencies > + architectures = info.get("architectures", set()) > + depends_line = "" > + if architectures: > + if "x86_64" in architectures and "arm64" not in architectures: > + depends_line = "\n\tdepends on TARGET_ARCH_X86_64" > + elif "arm64" in architectures and "x86_64" not in architectures: > + depends_line = "\n\tdepends on TARGET_ARCH_ARM64" > + > + # Family descriptions > + descriptions = { > + "t3": "Burstable performance instances powered by Intel processors", > + "t3a": "Burstable performance instances powered by AMD processors", > + "m5": "General purpose instances powered by Intel Xeon Platinum processors", > + "m7a": "Latest generation general purpose instances powered by AMD EPYC processors", > + "c5": "Compute optimized instances powered by Intel Xeon Platinum processors", > + "c7a": "Latest generation compute optimized instances powered by AMD EPYC processors", > + "i4i": "Storage optimized instances with NVMe SSD storage", > + "is4gen": "Storage optimized ARM instances powered by AWS Graviton2", > + "im4gn": "Storage optimized ARM instances with NVMe storage", > + "r5": "Memory optimized instances powered by Intel Xeon Platinum processors", > + "p3": "GPU instances for machine learning and HPC", > + "g4dn": "GPU instances for graphics-intensive applications", > + } > + > + description = descriptions.get(family, f"AWS {family.upper()} instance family") > + count = info.get("count", 0) > + > + config = f"""config TERRAFORM_AWS_INSTANCE_TYPE_{safe_name} > +\tbool "{family.upper()}" > +{depends_line} > +\thelp > +\t {description} > +\t Available instance types: {count} > + > +""" > + return config > + > + > +def generate_default_instance_families_kconfig() -> str: > + """Generate default Kconfig content when AWS CLI is not available.""" > + return """# AWS instance families (default - AWS CLI not available) > + > +choice > + prompt "AWS instance family" > + default TERRAFORM_AWS_INSTANCE_TYPE_M5 > + help > + Select the AWS instance family for your 
deployment. > + Note: AWS CLI is not available, showing default options. > + > +config TERRAFORM_AWS_INSTANCE_TYPE_M5 > + bool "M5" > + depends on TARGET_ARCH_X86_64 > + help > + General purpose instances powered by Intel Xeon Platinum processors. > + > +config TERRAFORM_AWS_INSTANCE_TYPE_M7A > + bool "M7a" > + depends on TARGET_ARCH_X86_64 > + help > + Latest generation general purpose instances powered by AMD EPYC processors. > + > +config TERRAFORM_AWS_INSTANCE_TYPE_T3 > + bool "T3" > + depends on TARGET_ARCH_X86_64 > + help > + Burstable performance instances powered by Intel processors. > + > +config TERRAFORM_AWS_INSTANCE_TYPE_C5 > + bool "C5" > + depends on TARGET_ARCH_X86_64 > + help > + Compute optimized instances powered by Intel Xeon Platinum processors. > + > +config TERRAFORM_AWS_INSTANCE_TYPE_I4I > + bool "I4i" > + depends on TARGET_ARCH_X86_64 > + help > + Storage optimized instances with NVMe SSD storage. > + > +endchoice > + > +# Include instance-specific configurations > +if TERRAFORM_AWS_INSTANCE_TYPE_M5 > +source "terraform/aws/kconfigs/instance-types/Kconfig.m5" > +endif > + > +if TERRAFORM_AWS_INSTANCE_TYPE_M7A > +source "terraform/aws/kconfigs/instance-types/Kconfig.m7a" > +endif > + > +if TERRAFORM_AWS_INSTANCE_TYPE_T3 > +source "terraform/aws/kconfigs/instance-types/Kconfig.t3.generated" > +endif > + > +if TERRAFORM_AWS_INSTANCE_TYPE_C5 > +source "terraform/aws/kconfigs/instance-types/Kconfig.c5.generated" > +endif > + > +if TERRAFORM_AWS_INSTANCE_TYPE_I4I > +source "terraform/aws/kconfigs/instance-types/Kconfig.i4i" > +endif > + > +""" > + > + > +def generate_instance_types_kconfig(family: str) -> str: > + """Generate Kconfig content for specific instance types within a family.""" > + if not check_aws_cli(): > + return "" > + > + instance_types = get_instance_types(family=family, fetch_all=True) > + if not instance_types: > + return "" > + > + # Filter to only exact family matches (e.g., c5a but not c5ad) > + filtered_instances = [] > + 
for instance in instance_types: > + instance_type = instance.get("InstanceType", "") > + if "." in instance_type: > + inst_family = instance_type.split(".")[0] > + if inst_family == family: > + filtered_instances.append(instance) > + > + instance_types = filtered_instances > + if not instance_types: > + return "" > + > + pricing = get_pricing_info() > + > + # Sort by vCPU count and memory > + instance_types.sort( > + key=lambda x: ( > + x.get("VCpuInfo", {}).get("DefaultVCpus", 0), > + x.get("MemoryInfo", {}).get("SizeInMiB", 0), > + ) > + ) > + > + safe_family = sanitize_kconfig_name(family) > + > + # Get the first instance type to use as default > + default_instance_name = f"{safe_family}_LARGE" # Fallback > + if instance_types: > + first_instance_type = instance_types[0].get("InstanceType", "") > + if "." in first_instance_type: > + first_full_name = first_instance_type.replace(".", "_") > + default_instance_name = sanitize_kconfig_name(first_full_name) > + > + kconfig = f"""# AWS {family.upper()} instance sizes (dynamically generated) > + > +choice > +\tprompt "Instance size for {family.upper()} family" > +\tdefault TERRAFORM_AWS_INSTANCE_{default_instance_name} > +\thelp > +\t Select the specific instance size within the {family.upper()} family. > + > +""" > + > + seen_configs = set() > + for instance in instance_types: > + instance_type = instance.get("InstanceType", "") > + if "." 
not in instance_type: > + continue > + > + # Get the full instance type name to make unique config names > + full_name = instance_type.replace(".", "_") > + safe_full_name = sanitize_kconfig_name(full_name) > + > + # Skip if we've already seen this config name > + if safe_full_name in seen_configs: > + continue > + seen_configs.add(safe_full_name) > + > + size = instance_type.split(".")[1] > + > + vcpus = instance.get("VCpuInfo", {}).get("DefaultVCpus", 0) > + memory_mib = instance.get("MemoryInfo", {}).get("SizeInMiB", 0) > + memory_gb = memory_mib / 1024 > + > + # Get pricing > + price = pricing.get(instance_type, {}).get("on_demand", 0.0) > + price_str = f"${price:.3f}/hour" if price > 0 else "pricing varies" > + > + # Network performance > + network = instance.get("NetworkInfo", {}).get("NetworkPerformance", "varies") > + > + # Storage > + storage_info = "" > + if instance.get("InstanceStorageSupported"): > + storage = instance.get("InstanceStorageInfo", {}) > + total_size = storage.get("TotalSizeInGB", 0) > + if total_size > 0: > + storage_info = f"\n\t Instance storage: {total_size} GB" > + > + kconfig += f"""config TERRAFORM_AWS_INSTANCE_{safe_full_name} > +\tbool "{instance_type}" > +\thelp > +\t vCPUs: {vcpus} > +\t Memory: {memory_gb:.1f} GB > +\t Network: {network} > +\t Price: {price_str}{storage_info} > + > +""" > + > + kconfig += "endchoice\n" > + > + # Add the actual instance type string config with full instance names > + kconfig += f""" > +config TERRAFORM_AWS_{safe_family}_SIZE > +\tstring > +""" > + > + # Generate default mappings for each seen instance type > + for instance in instance_types: > + instance_type = instance.get("InstanceType", "") > + if "." 
not in instance_type: > + continue > + > + full_name = instance_type.replace(".", "_") > + safe_full_name = sanitize_kconfig_name(full_name) > + > + kconfig += ( > + f'\tdefault "{instance_type}" if TERRAFORM_AWS_INSTANCE_{safe_full_name}\n' > + ) > + > + # Use the first instance type as the final fallback default > + final_default = f"{family}.large" > + if instance_types: > + first_instance_type = instance_types[0].get("InstanceType", "") > + if first_instance_type: > + final_default = first_instance_type > + > + kconfig += f'\tdefault "{final_default}"\n\n' > + > + return kconfig > + > + > +def generate_regions_kconfig() -> str: > + """Generate Kconfig content for AWS regions.""" > + if not check_aws_cli(): > + return generate_default_regions_kconfig() > + > + regions = get_regions() > + if not regions: > + return generate_default_regions_kconfig() > + > + kconfig = """# AWS regions (dynamically generated) > + > +choice > + prompt "AWS region" > + default TERRAFORM_AWS_REGION_USEAST1 > + help > + Select the AWS region for your deployment. > + Note: Not all instance types are available in all regions. 
> + > +""" > + > + # Group regions by geographic area > + us_regions = [] > + eu_regions = [] > + ap_regions = [] > + other_regions = [] > + > + for region in regions: > + region_name = region.get("RegionName", "") > + if region_name.startswith("us-"): > + us_regions.append(region) > + elif region_name.startswith("eu-"): > + eu_regions.append(region) > + elif region_name.startswith("ap-"): > + ap_regions.append(region) > + else: > + other_regions.append(region) > + > + # Add US regions > + if us_regions: > + kconfig += "# US Regions\n" > + for region in sorted(us_regions, key=lambda x: x.get("RegionName", "")): > + kconfig += generate_region_config(region) > + kconfig += "\n" > + > + # Add EU regions > + if eu_regions: > + kconfig += "# Europe Regions\n" > + for region in sorted(eu_regions, key=lambda x: x.get("RegionName", "")): > + kconfig += generate_region_config(region) > + kconfig += "\n" > + > + # Add Asia Pacific regions > + if ap_regions: > + kconfig += "# Asia Pacific Regions\n" > + for region in sorted(ap_regions, key=lambda x: x.get("RegionName", "")): > + kconfig += generate_region_config(region) > + kconfig += "\n" > + > + # Add other regions > + if other_regions: > + kconfig += "# Other Regions\n" > + for region in sorted(other_regions, key=lambda x: x.get("RegionName", "")): > + kconfig += generate_region_config(region) > + > + kconfig += "\nendchoice\n" > + > + # Add the actual region string config > + kconfig += """ > +config TERRAFORM_AWS_REGION > + string > +""" > + > + for region in regions: > + region_name = region.get("RegionName", "") > + safe_name = sanitize_kconfig_name(region_name) > + kconfig += f'\tdefault "{region_name}" if TERRAFORM_AWS_REGION_{safe_name}\n' > + > + kconfig += '\tdefault "us-east-1"\n' > + > + return kconfig > + > + > +def generate_region_config(region: Dict) -> str: > + """Generate Kconfig entry for a region.""" > + region_name = region.get("RegionName", "") > + safe_name = sanitize_kconfig_name(region_name) > + 
opt_in_status = region.get("OptInStatus", "") > + > + # Region display names > + display_names = { > + "us-east-1": "US East (N. Virginia)", > + "us-east-2": "US East (Ohio)", > + "us-west-1": "US West (N. California)", > + "us-west-2": "US West (Oregon)", > + "eu-west-1": "Europe (Ireland)", > + "eu-west-2": "Europe (London)", > + "eu-west-3": "Europe (Paris)", > + "eu-central-1": "Europe (Frankfurt)", > + "eu-north-1": "Europe (Stockholm)", > + "ap-southeast-1": "Asia Pacific (Singapore)", > + "ap-southeast-2": "Asia Pacific (Sydney)", > + "ap-northeast-1": "Asia Pacific (Tokyo)", > + "ap-northeast-2": "Asia Pacific (Seoul)", > + "ap-south-1": "Asia Pacific (Mumbai)", > + "ca-central-1": "Canada (Central)", > + "sa-east-1": "South America (São Paulo)", > + } > + > + display_name = display_names.get(region_name, region_name.replace("-", " ").title()) > + > + help_text = f"\t Region: {display_name}" > + if opt_in_status and opt_in_status != "opt-in-not-required": > + help_text += f"\n\t Status: {opt_in_status}" > + > + config = f"""config TERRAFORM_AWS_REGION_{safe_name} > +\tbool "{display_name}" > +\thelp > +{help_text} > + > +""" > + return config > + > + > +def get_gpu_amis(region: str = None) -> List[Dict[str, Any]]: > + """ > + Get available GPU-optimized AMIs including Deep Learning AMIs. 
> + > + Args: > + region: AWS region > + > + Returns: > + List of AMI information > + """ > + # Query for Deep Learning AMIs from AWS > + cmd = ["ec2", "describe-images"] > + filters = [ > + "Name=owner-alias,Values=amazon", > + "Name=name,Values=Deep Learning AMI GPU*", > + "Name=state,Values=available", > + "Name=architecture,Values=x86_64", > + ] > + cmd.append("--filters") > + cmd.extend(filters) > + cmd.extend(["--query", "Images[?contains(Name, '2024') || contains(Name, '2025')]"]) > + > + response = run_aws_command(cmd, region=region) > + > + if response: > + # Sort by creation date to get the most recent > + response.sort(key=lambda x: x.get("CreationDate", ""), reverse=True) > + return response[:10] # Return top 10 most recent > + return [] > + > + > +def generate_gpu_amis_kconfig() -> str: > + """Generate Kconfig content for GPU AMIs.""" > + # Check if AWS CLI is available > + if not check_aws_cli(): > + return generate_default_gpu_amis_kconfig() > + > + # Get available GPU AMIs > + amis = get_gpu_amis() > + > + if not amis: > + return generate_default_gpu_amis_kconfig() > + > + kconfig = """# GPU-optimized AMIs (dynamically generated) > + > +# GPU AMI Override - only shown for GPU instances > +config TERRAFORM_AWS_USE_GPU_AMI > + bool "Use GPU-optimized AMI instead of standard distribution" > + depends on TERRAFORM_AWS_IS_GPU_INSTANCE > + output yaml > + default n > + help > + Enable this to use a GPU-optimized AMI with pre-installed NVIDIA drivers, > + CUDA, and ML frameworks instead of the standard distribution AMI. > + > + When disabled, the standard distribution AMI will be used and you'll need > + to install GPU drivers manually. > + > +if TERRAFORM_AWS_USE_GPU_AMI > + > +choice > + prompt "GPU-optimized AMI selection" > + default TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + depends on TERRAFORM_AWS_IS_GPU_INSTANCE > + help > + Select which GPU-optimized AMI to use for your GPU instance. 
> + > +config TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + bool "AWS Deep Learning AMI (Ubuntu 22.04)" > + help > + AWS Deep Learning AMI with NVIDIA drivers, CUDA, cuDNN, and popular ML frameworks. > + Optimized for machine learning workloads on GPU instances. > + Includes: TensorFlow, PyTorch, MXNet, and Jupyter. > + > +config TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING_NVIDIA > + bool "NVIDIA Deep Learning AMI" > + help > + NVIDIA optimized Deep Learning AMI with latest GPU drivers. > + Includes NVIDIA GPU Cloud (NGC) containers and frameworks. > + > +config TERRAFORM_AWS_GPU_AMI_CUSTOM > + bool "Custom GPU AMI" > + help > + Specify a custom AMI ID for GPU instances. > + > +endchoice > + > +if TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + > +config TERRAFORM_AWS_GPU_AMI_NAME > + string > + output yaml > + default "Deep Learning AMI GPU TensorFlow*" > + help > + AMI name pattern for AWS Deep Learning AMI. > + > +config TERRAFORM_AWS_GPU_AMI_OWNER > + string > + output yaml > + default "amazon" > + > +endif # TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + > +if TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING_NVIDIA > + > +config TERRAFORM_AWS_GPU_AMI_NAME > + string > + output yaml > + default "NVIDIA Deep Learning AMI*" > + help > + AMI name pattern for NVIDIA Deep Learning AMI. > + > +config TERRAFORM_AWS_GPU_AMI_OWNER > + string > + output yaml > + default "amazon" > + > +endif # TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING_NVIDIA > + > +if TERRAFORM_AWS_GPU_AMI_CUSTOM > + > +config TERRAFORM_AWS_GPU_AMI_ID > + string "Custom GPU AMI ID" > + output yaml > + help > + Specify the AMI ID for your custom GPU image. 
> + Example: ami-0123456789abcdef0 > + > +endif # TERRAFORM_AWS_GPU_AMI_CUSTOM > + > +endif # TERRAFORM_AWS_USE_GPU_AMI > + > +# GPU instance detection > +config TERRAFORM_AWS_IS_GPU_INSTANCE > + bool > + output yaml > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G6E > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G6 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G5 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G5G > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G4DN > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G4AD > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P5 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P5EN > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P4D > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P4DE > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P3 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P3DN > + default n > + help > + Automatically detected based on selected instance type. > + This indicates whether the selected instance has GPU support. > + > +""" > + > + return kconfig > + > + > +def generate_default_gpu_amis_kconfig() -> str: > + """Generate default GPU AMI Kconfig when AWS CLI is not available.""" > + return """# GPU-optimized AMIs (default - AWS CLI not available) > + > +# GPU AMI Override - only shown for GPU instances > +config TERRAFORM_AWS_USE_GPU_AMI > + bool "Use GPU-optimized AMI instead of standard distribution" > + depends on TERRAFORM_AWS_IS_GPU_INSTANCE > + output yaml > + default n > + help > + Enable this to use a GPU-optimized AMI with pre-installed NVIDIA drivers, > + CUDA, and ML frameworks instead of the standard distribution AMI. > + Note: AWS CLI is not available, showing default options. > + > +if TERRAFORM_AWS_USE_GPU_AMI > + > +choice > + prompt "GPU-optimized AMI selection" > + default TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + depends on TERRAFORM_AWS_IS_GPU_INSTANCE > + help > + Select which GPU-optimized AMI to use for your GPU instance. 
> + > +config TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + bool "AWS Deep Learning AMI (Ubuntu 22.04)" > + help > + Pre-configured with NVIDIA drivers, CUDA, and ML frameworks. > + > +config TERRAFORM_AWS_GPU_AMI_CUSTOM > + bool "Custom GPU AMI" > + > +endchoice > + > +if TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + > +config TERRAFORM_AWS_GPU_AMI_NAME > + string > + output yaml > + default "Deep Learning AMI GPU TensorFlow*" > + > +config TERRAFORM_AWS_GPU_AMI_OWNER > + string > + output yaml > + default "amazon" > + > +endif # TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING > + > +if TERRAFORM_AWS_GPU_AMI_CUSTOM > + > +config TERRAFORM_AWS_GPU_AMI_ID > + string "Custom GPU AMI ID" > + output yaml > + help > + Specify the AMI ID for your custom GPU image. > + > +endif # TERRAFORM_AWS_GPU_AMI_CUSTOM > + > +endif # TERRAFORM_AWS_USE_GPU_AMI > + > +# GPU instance detection (static) > +config TERRAFORM_AWS_IS_GPU_INSTANCE > + bool > + output yaml > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G6E > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G6 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G5 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G5G > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G4DN > + default y if TERRAFORM_AWS_INSTANCE_TYPE_G4AD > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P5 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P5EN > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P4D > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P4DE > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P3 > + default y if TERRAFORM_AWS_INSTANCE_TYPE_P3DN > + default n > + help > + Automatically detected based on selected instance type. > + This indicates whether the selected instance has GPU support. 
> + > +""" > + > + > +def generate_default_regions_kconfig() -> str: > + """Generate default Kconfig content when AWS CLI is not available.""" > + return """# AWS regions (default - AWS CLI not available) > + > +choice > + prompt "AWS region" > + default TERRAFORM_AWS_REGION_USEAST1 > + help > + Select the AWS region for your deployment. > + Note: AWS CLI is not available, showing default options. > + > +# US Regions > +config TERRAFORM_AWS_REGION_USEAST1 > + bool "US East (N. Virginia)" > + > +config TERRAFORM_AWS_REGION_USEAST2 > + bool "US East (Ohio)" > + > +config TERRAFORM_AWS_REGION_USWEST1 > + bool "US West (N. California)" > + > +config TERRAFORM_AWS_REGION_USWEST2 > + bool "US West (Oregon)" > + > +# Europe Regions > +config TERRAFORM_AWS_REGION_EUWEST1 > + bool "Europe (Ireland)" > + > +config TERRAFORM_AWS_REGION_EUCENTRAL1 > + bool "Europe (Frankfurt)" > + > +# Asia Pacific Regions > +config TERRAFORM_AWS_REGION_APSOUTHEAST1 > + bool "Asia Pacific (Singapore)" > + > +config TERRAFORM_AWS_REGION_APNORTHEAST1 > + bool "Asia Pacific (Tokyo)" > + > +endchoice > + > +config TERRAFORM_AWS_REGION > + string > + default "us-east-1" if TERRAFORM_AWS_REGION_USEAST1 > + default "us-east-2" if TERRAFORM_AWS_REGION_USEAST2 > + default "us-west-1" if TERRAFORM_AWS_REGION_USWEST1 > + default "us-west-2" if TERRAFORM_AWS_REGION_USWEST2 > + default "eu-west-1" if TERRAFORM_AWS_REGION_EUWEST1 > + default "eu-central-1" if TERRAFORM_AWS_REGION_EUCENTRAL1 > + default "ap-southeast-1" if TERRAFORM_AWS_REGION_APSOUTHEAST1 > + default "ap-northeast-1" if TERRAFORM_AWS_REGION_APNORTHEAST1 > + default "us-east-1" > + > +""" > diff --git a/scripts/dynamic-cloud-kconfig.Makefile b/scripts/dynamic-cloud-kconfig.Makefile > index e15651ab..4105e706 100644 > --- a/scripts/dynamic-cloud-kconfig.Makefile > +++ b/scripts/dynamic-cloud-kconfig.Makefile > @@ -12,9 +12,24 @@ LAMBDALABS_KCONFIG_IMAGES := $(LAMBDALABS_KCONFIG_DIR)/Kconfig.images.generated > > LAMBDALABS_KCONFIGS := 
$(LAMBDALABS_KCONFIG_COMPUTE) $(LAMBDALABS_KCONFIG_LOCATION) $(LAMBDALABS_KCONFIG_IMAGES) > > +# AWS dynamic configuration > +AWS_KCONFIG_DIR := terraform/aws/kconfigs > +AWS_KCONFIG_COMPUTE := $(AWS_KCONFIG_DIR)/Kconfig.compute.generated > +AWS_KCONFIG_LOCATION := $(AWS_KCONFIG_DIR)/Kconfig.location.generated > +AWS_INSTANCE_TYPES_DIR := $(AWS_KCONFIG_DIR)/instance-types > + > +# List of AWS instance type family files that will be generated > +AWS_INSTANCE_TYPE_FAMILIES := m5 m7a t3 t3a c5 c7a i4i is4gen im4gn > +AWS_INSTANCE_TYPE_KCONFIGS := $(foreach family,$(AWS_INSTANCE_TYPE_FAMILIES),$(AWS_INSTANCE_TYPES_DIR)/Kconfig.$(family).generated) > + > +AWS_KCONFIGS := $(AWS_KCONFIG_COMPUTE) $(AWS_KCONFIG_LOCATION) $(AWS_INSTANCE_TYPE_KCONFIGS) > + > # Add Lambda Labs generated files to mrproper clean list > KDEVOPS_MRPROPER += $(LAMBDALABS_KCONFIGS) > > +# Add AWS generated files to mrproper clean list > +KDEVOPS_MRPROPER += $(AWS_KCONFIGS) > + > # Touch Lambda Labs generated files so Kconfig can source them > # This ensures the files exist (even if empty) before Kconfig runs > dynamic_lambdalabs_kconfig_touch: > @@ -22,20 +37,43 @@ dynamic_lambdalabs_kconfig_touch: > > DYNAMIC_KCONFIG += dynamic_lambdalabs_kconfig_touch > > +# Touch AWS generated files so Kconfig can source them > +# This ensures the files exist (even if empty) before Kconfig runs > +dynamic_aws_kconfig_touch: > + $(Q)mkdir -p $(AWS_INSTANCE_TYPES_DIR) > + $(Q)touch $(AWS_KCONFIG_COMPUTE) $(AWS_KCONFIG_LOCATION) > + $(Q)touch $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated > + $(Q)for family in $(AWS_INSTANCE_TYPE_FAMILIES); do \ > + touch $(AWS_INSTANCE_TYPES_DIR)/Kconfig.$$family.generated; \ > + done > + > +DYNAMIC_KCONFIG += dynamic_aws_kconfig_touch > + > # Individual Lambda Labs targets are now handled by generate_cloud_configs.py > cloud-config-lambdalabs: > $(Q)python3 scripts/generate_cloud_configs.py > > +# Individual AWS targets are now handled by generate_cloud_configs.py > 
+cloud-config-aws:
> +	$(Q)python3 scripts/generate_cloud_configs.py
> +
>  # Clean Lambda Labs generated files
>  clean-cloud-config-lambdalabs:
>  	$(Q)rm -f $(LAMBDALABS_KCONFIGS)
>
> -DYNAMIC_CLOUD_KCONFIG += cloud-config-lambdalabs
> +# Clean AWS generated files
> +clean-cloud-config-aws:
> +	$(Q)rm -f $(AWS_KCONFIGS)
> +	$(Q)rm -f .aws_cloud_config_generated
> +
> +DYNAMIC_CLOUD_KCONFIG += cloud-config-lambdalabs cloud-config-aws
>
>  cloud-config-help:
>  	@echo "Cloud-specific dynamic kconfig targets:"
>  	@echo "cloud-config - generates all cloud provider dynamic kconfig content"
>  	@echo "cloud-config-lambdalabs - generates Lambda Labs dynamic kconfig content"
> +	@echo "cloud-config-aws - generates AWS dynamic kconfig content"
> +	@echo "cloud-update - converts generated cloud configs to static (for committing)"
>  	@echo "clean-cloud-config - removes all generated cloud kconfig files"
>  	@echo "cloud-list-all - list all cloud instances for configured provider"
>
> @@ -44,11 +82,55 @@ HELP_TARGETS += cloud-config-help
>  cloud-config:
>  	$(Q)python3 scripts/generate_cloud_configs.py
>
> -clean-cloud-config: clean-cloud-config-lambdalabs
> +clean-cloud-config: clean-cloud-config-lambdalabs clean-cloud-config-aws
> +	$(Q)rm -f .cloud.initialized
>  	$(Q)echo "Cleaned all cloud provider dynamic Kconfig files."
>
>  cloud-list-all:
>  	$(Q)chmod +x scripts/cloud_list_all.sh
>  	$(Q)scripts/cloud_list_all.sh
>
> -PHONY += cloud-config cloud-config-lambdalabs clean-cloud-config clean-cloud-config-lambdalabs cloud-config-help cloud-list-all
> +# Convert dynamically generated cloud configs to static versions for git commits
> +# This allows admins to generate configs once and commit them for regular users
> +cloud-update:
> +	@echo "Converting generated cloud configs to static versions..."
> +	# AWS configs
> +	$(Q)if [ -f $(AWS_KCONFIG_COMPUTE) ]; then \
> +		cp $(AWS_KCONFIG_COMPUTE) $(AWS_KCONFIG_DIR)/Kconfig.compute.static; \
> +		sed -i 's/Kconfig\.\([^.]*\)\.generated/Kconfig.\1.static/g' $(AWS_KCONFIG_DIR)/Kconfig.compute.static; \
> +		echo " Created $(AWS_KCONFIG_DIR)/Kconfig.compute.static"; \
> +	fi
> +	$(Q)if [ -f $(AWS_KCONFIG_LOCATION) ]; then \
> +		cp $(AWS_KCONFIG_LOCATION) $(AWS_KCONFIG_DIR)/Kconfig.location.static; \
> +		sed -i 's/Kconfig\.\([^.]*\)\.generated/Kconfig.\1.static/g' $(AWS_KCONFIG_DIR)/Kconfig.location.static; \
> +		echo " Created $(AWS_KCONFIG_DIR)/Kconfig.location.static"; \
> +	fi
> +	$(Q)if [ -f $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated ]; then \
> +		cp $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static; \
> +		sed -i 's/Kconfig\.\([^.]*\)\.generated/Kconfig.\1.static/g' $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static; \
> +		echo " Created $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static"; \
> +	fi
> +	# AWS instance type families
> +	$(Q)for file in $(AWS_INSTANCE_TYPES_DIR)/Kconfig.*.generated; do \
> +		if [ -f "$$file" ]; then \
> +			static_file=$$(echo "$$file" | sed 's/\.generated$$/\.static/'); \
> +			cp "$$file" "$$static_file"; \
> +			echo " Created $$static_file"; \
> +		fi; \
> +	done
> +	# Lambda Labs configs
> +	$(Q)if [ -f $(LAMBDALABS_KCONFIG_COMPUTE) ]; then \
> +		cp $(LAMBDALABS_KCONFIG_COMPUTE) $(LAMBDALABS_KCONFIG_DIR)/Kconfig.compute.static; \
> +		echo " Created $(LAMBDALABS_KCONFIG_DIR)/Kconfig.compute.static"; \
> +	fi
> +	$(Q)if [ -f $(LAMBDALABS_KCONFIG_LOCATION) ]; then \
> +		cp $(LAMBDALABS_KCONFIG_LOCATION) $(LAMBDALABS_KCONFIG_DIR)/Kconfig.location.static; \
> +		echo " Created $(LAMBDALABS_KCONFIG_DIR)/Kconfig.location.static"; \
> +	fi
> +	$(Q)if [ -f $(LAMBDALABS_KCONFIG_IMAGES) ]; then \
> +		cp $(LAMBDALABS_KCONFIG_IMAGES) $(LAMBDALABS_KCONFIG_DIR)/Kconfig.images.static; \
> +		echo " Created $(LAMBDALABS_KCONFIG_DIR)/Kconfig.images.static"; \
> +	fi
> +	@echo "Static cloud configs created. You can now commit these .static files to git."
> +
> +PHONY += cloud-config cloud-config-lambdalabs cloud-config-aws clean-cloud-config clean-cloud-config-lambdalabs clean-cloud-config-aws cloud-config-help cloud-list-all cloud-update
> diff --git a/scripts/generate_cloud_configs.py b/scripts/generate_cloud_configs.py
> index b16294dd..332cebe7 100755
> --- a/scripts/generate_cloud_configs.py
> +++ b/scripts/generate_cloud_configs.py
> @@ -10,6 +10,9 @@ import os
>  import sys
>  import subprocess
>  import json
> +from concurrent.futures import ThreadPoolExecutor, as_completed
> +from pathlib import Path
> +from typing import Tuple
>
>
>  def generate_lambdalabs_kconfig() -> bool:
> @@ -100,29 +103,194 @@ def get_lambdalabs_summary() -> tuple[bool, str]:
>          return False, "Lambda Labs: Error querying API - using defaults"
>
>
> +def generate_aws_kconfig() -> bool:
> +    """
> +    Generate AWS Kconfig files.
> +    Returns True on success, False on failure.
> +    """
> +    script_dir = os.path.dirname(os.path.abspath(__file__))
> +    cli_path = os.path.join(script_dir, "aws-cli")
> +
> +    # Generate the Kconfig files
> +    result = subprocess.run(
> +        [cli_path, "generate-kconfig"],
> +        capture_output=True,
> +        text=True,
> +        check=False,
> +    )
> +
> +    return result.returncode == 0
> +
> +
> +def get_aws_summary() -> tuple[bool, str]:
> +    """
> +    Get a summary of AWS configurations using aws-cli.
> +    Returns (success, summary_string)
> +    """
> +    script_dir = os.path.dirname(os.path.abspath(__file__))
> +    cli_path = os.path.join(script_dir, "aws-cli")
> +
> +    try:
> +        # Check if AWS CLI is available
> +        result = subprocess.run(
> +            ["aws", "--version"],
> +            capture_output=True,
> +            text=True,
> +            check=False,
> +        )
> +
> +        if result.returncode != 0:
> +            return False, "AWS: AWS CLI not installed - using defaults"
> +
> +        # Check if credentials are configured
> +        result = subprocess.run(
> +            ["aws", "sts", "get-caller-identity"],
> +            capture_output=True,
> +            text=True,
> +            check=False,
> +        )
> +
> +        if result.returncode != 0:
> +            return False, "AWS: Credentials not configured - using defaults"
> +
> +        # Get instance types count
> +        result = subprocess.run(
> +            [
> +                cli_path,
> +                "--output",
> +                "json",
> +                "instance-types",
> +                "list",
> +                "--max-results",
> +                "100",
> +            ],
> +            capture_output=True,
> +            text=True,
> +            check=False,
> +        )
> +
> +        if result.returncode != 0:
> +            return False, "AWS: Error querying API - using defaults"
> +
> +        instances = json.loads(result.stdout)
> +        instance_count = len(instances)
> +
> +        # Get regions
> +        result = subprocess.run(
> +            [cli_path, "--output", "json", "regions", "list"],
> +            capture_output=True,
> +            text=True,
> +            check=False,
> +        )
> +
> +        if result.returncode == 0:
> +            regions = json.loads(result.stdout)
> +            region_count = len(regions)
> +        else:
> +            region_count = 0
> +
> +        # Get price range from a sample of instances
> +        prices = []
> +        for instance in instances[:20]:  # Sample first 20 for speed
> +            if "error" not in instance:
> +                # Extract price if available (would need pricing API)
> +                # For now, we'll use placeholder
> +                vcpus = instance.get("vcpu", 0)
> +                if vcpus > 0:
> +                    # Rough estimate: $0.05 per vCPU/hour
> +                    estimated_price = vcpus * 0.05
> +                    prices.append(estimated_price)
> +
> +        # Format summary
> +        if prices:
> +            min_price = min(prices)
> +            max_price = max(prices)
> +            price_range = f"~${min_price:.2f}-${max_price:.2f}/hr"
> +        else:
> +            price_range = "pricing varies by region"
> +
> +        return (
> +            True,
> +            f"AWS: {instance_count} instance types available, "
> +            f"{region_count} regions, {price_range}",
> +        )
> +
> +    except (subprocess.SubprocessError, json.JSONDecodeError, KeyError):
> +        return False, "AWS: Error querying API - using defaults"
> +
> +
> +def process_lambdalabs() -> Tuple[bool, bool, str]:
> +    """Process Lambda Labs configuration generation and summary.
> +    Returns (kconfig_generated, summary_success, summary_text)
> +    """
> +    kconfig_generated = generate_lambdalabs_kconfig()
> +    success, summary = get_lambdalabs_summary()
> +    return kconfig_generated, success, summary
> +
> +
> +def process_aws() -> Tuple[bool, bool, str]:
> +    """Process AWS configuration generation and summary.
> +    Returns (kconfig_generated, summary_success, summary_text)
> +    """
> +    kconfig_generated = generate_aws_kconfig()
> +    success, summary = get_aws_summary()
> +
> +    # Create marker file to indicate dynamic AWS config is available
> +    if kconfig_generated:
> +        marker_file = Path(".aws_cloud_config_generated")
> +        marker_file.touch()
> +
> +    return kconfig_generated, success, summary
> +
> +
>  def main():
>      """Main function to generate cloud configurations."""
>      print("Cloud Provider Configuration Summary")
>      print("=" * 60)
>      print()
>
> -    # Lambda Labs - Generate Kconfig files first
> -    kconfig_generated = generate_lambdalabs_kconfig()
> +    # Run cloud provider operations in parallel
> +    results = {}
> +    any_success = False
>
> -    # Lambda Labs - Get summary
> -    success, summary = get_lambdalabs_summary()
> -    if success:
> -        print(f"✓ {summary}")
> -        if kconfig_generated:
> -            print(" Kconfig files generated successfully")
> -        else:
> -            print(" Warning: Failed to generate Kconfig files")
> -    else:
> -        print(f"⚠ {summary}")
> -    print()
> +    with ThreadPoolExecutor(max_workers=4) as executor:
> +        # Submit all tasks
> +        futures = {
> +            executor.submit(process_lambdalabs): "lambdalabs",
> +            executor.submit(process_aws): "aws",
> +        }
> +
> +        # Process results as they complete
> +        for future in as_completed(futures):
> +            provider = futures[future]
> +            try:
> +                results[provider] = future.result()
> +            except Exception as e:
> +                results[provider] = (
> +                    False,
> +                    False,
> +                    f"{provider.upper()}: Error - {str(e)}",
> +                )
> +
> +    # Display results in consistent order
> +    for provider in ["lambdalabs", "aws"]:
> +        if provider in results:
> +            kconfig_gen, success, summary = results[provider]
> +            if success and kconfig_gen:
> +                any_success = True
> +            if success:
> +                print(f"✓ {summary}")
> +                if kconfig_gen:
> +                    print(" Kconfig files generated successfully")
> +                else:
> +                    print(" Warning: Failed to generate Kconfig files")
> +            else:
> +                print(f"⚠ {summary}")
> +            print()
>
> -    # AWS (placeholder - not implemented)
> -    print("⚠ AWS: Dynamic configuration not yet implemented")
> +    # Create .cloud.initialized if any provider succeeded
> +    if any_success:
> +        Path(".cloud.initialized").touch()
>
>      # Azure (placeholder - not implemented)
>      print("⚠ Azure: Dynamic configuration not yet implemented")
> diff --git a/terraform/aws/kconfigs/Kconfig.compute b/terraform/aws/kconfigs/Kconfig.compute
> index bae0ea1c..6b5ff900 100644
> --- a/terraform/aws/kconfigs/Kconfig.compute
> +++ b/terraform/aws/kconfigs/Kconfig.compute
> @@ -1,94 +1,54 @@
> -choice
> -	prompt "AWS instance types"
> -	help
> -	  Instance types comprise varying combinations of hardware
> -	  platform, CPU count, memory size, storage, and networking
> -	  capacity. Select the type that provides an appropriate mix
> -	  of resources for your preferred workflows.
> -
> -	  Some instance types are region- and capacity-limited.
> -
> -	  See https://aws.amazon.com/ec2/instance-types/ for
> -	  details.
> +# AWS compute configuration
>
> -config TERRAFORM_AWS_INSTANCE_TYPE_M5
> -	bool "M5"
> -	depends on TARGET_ARCH_X86_64
> +config TERRAFORM_AWS_USE_DYNAMIC_CONFIG
> +	bool "Use dynamically generated instance types"
> +	default $(shell, test -f .aws_cloud_config_generated && echo y || echo n)
>  	help
> -	  This is a general purpose type powered by Intel Xeon®
> -	  Platinum 8175M or 8259CL processors (Skylake or Cascade
> -	  Lake).
> +	  Enable this to use dynamically generated instance types from AWS CLI.
> +	  Run 'make cloud-config' to query AWS and generate available options.
> +	  When disabled, uses static predefined instance types.
>
> -	  See https://aws.amazon.com/ec2/instance-types/m5/ for
> -	  details.
> +	  This is automatically enabled when you run 'make cloud-config'.
>
> -config TERRAFORM_AWS_INSTANCE_TYPE_M7A
> -	bool "M7a"
> -	depends on TARGET_ARCH_X86_64
> -	help
> -	  This is a general purpose type powered by 4th Generation
> -	  AMD EPYC processors.
> -
> -	  See https://aws.amazon.com/ec2/instance-types/m7a/ for
> -	  details.
> -
> -config TERRAFORM_AWS_INSTANCE_TYPE_I4I
> -	bool "I4i"
> -	depends on TARGET_ARCH_X86_64
> -	help
> -	  This is a storage-optimized type powered by 3rd generation
> -	  Intel Xeon Scalable processors (Ice Lake) and use AWS Nitro
> -	  NVMe SSDs.
> +if TERRAFORM_AWS_USE_DYNAMIC_CONFIG
> +# Include cloud-generated or static instance families
> +# Try static first (pre-generated by admins for faster loading)
> +# Fall back to generated files (requires AWS CLI)
> +source "terraform/aws/kconfigs/Kconfig.compute.static"
> +endif
>
> -	  See https://aws.amazon.com/ec2/instance-types/i4i/ for
> -	  details.
> -
> -config TERRAFORM_AWS_INSTANCE_TYPE_IS4GEN
> -	bool "Is4gen"
> -	depends on TARGET_ARCH_ARM64
> -	help
> -	  This is a Storage-optimized type powered by AWS Graviton2
> -	  processors.
> -
> -	  See https://aws.amazon.com/ec2/instance-types/i4g/ for
> -	  details.
> -
> -config TERRAFORM_AWS_INSTANCE_TYPE_IM4GN
> -	bool "Im4gn"
> -	depends on TARGET_ARCH_ARM64
> +if !TERRAFORM_AWS_USE_DYNAMIC_CONFIG
> +# Static instance types when not using dynamic config
> +choice
> +	prompt "AWS instance types"
>  	help
> -	  This is a storage-optimized type powered by AWS Graviton2
> -	  processors.
> +	  Instance types comprise varying combinations of hardware
> +	  platform, CPU count, memory size, storage, and networking
> +	  capacity. Select the type that provides an appropriate mix
> +	  of resources for your preferred workflows.
>
> -	  See https://aws.amazon.com/ec2/instance-types/i4g/ for
> -	  details.
> +	  Some instance types are region- and capacity-limited.
>
> -config TERRAFORM_AWS_INSTANCE_TYPE_C7A
> -	depends on TARGET_ARCH_X86_64
> -	bool "c7a"
> -	help
> -	  This is a compute-optimized type powered by 4th generation
> -	  AMD EPYC processors.
> +	  See https://aws.amazon.com/ec2/instance-types/ for
> +	  details.
>
> -	  See https://aws.amazon.com/ec2/instance-types/c7a/ for
> -	  details.
>
>  endchoice
> +endif # !TERRAFORM_AWS_USE_DYNAMIC_CONFIG
>
> +if !TERRAFORM_AWS_USE_DYNAMIC_CONFIG
> +# Use static instance type definitions when not using dynamic config
>  source "terraform/aws/kconfigs/instance-types/Kconfig.m5"
>  source "terraform/aws/kconfigs/instance-types/Kconfig.m7a"
> -source "terraform/aws/kconfigs/instance-types/Kconfig.i4i"
> -source "terraform/aws/kconfigs/instance-types/Kconfig.is4gen"
> -source "terraform/aws/kconfigs/instance-types/Kconfig.im4gn"
> -source "terraform/aws/kconfigs/instance-types/Kconfig.c7a"
> +endif # !TERRAFORM_AWS_USE_DYNAMIC_CONFIG
>
>  choice
>  	prompt "Linux distribution"
>  	default TERRAFORM_AWS_DISTRO_DEBIAN
>  	help
> -	  Select a popular Linux distribution to install on your
> -	  instances, or use the "Custom AMI image" selection to
> -	  choose an image that is off the beaten path.
> +	  Select a popular Linux distribution to install on your
> +	  instances, or use the "Custom AMI image" selection to
> +	  choose an image that is off the beaten path.
>
>  config TERRAFORM_AWS_DISTRO_AMAZON
>  	bool "Amazon Linux"
> @@ -120,3 +80,8 @@ source "terraform/aws/kconfigs/distros/Kconfig.oracle"
>  source "terraform/aws/kconfigs/distros/Kconfig.rhel"
>  source "terraform/aws/kconfigs/distros/Kconfig.sles"
>  source "terraform/aws/kconfigs/distros/Kconfig.custom"
> +
> +# Include GPU AMI configuration if available (generated by cloud-config)
> +if TERRAFORM_AWS_USE_DYNAMIC_CONFIG
> +source "terraform/aws/kconfigs/Kconfig.gpu-amis.static"
> +endif

-- 
Chuck Lever