From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>, Daniel Gomez <da.gomez@kruces.com>,
kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 2/2] aws: enable GPU AMI support for GPU instances
Date: Sat, 6 Sep 2025 21:23:23 -0700 [thread overview]
Message-ID: <20250907042325.2228868-3-mcgrof@kernel.org> (raw)
In-Reply-To: <20250907042325.2228868-1-mcgrof@kernel.org>

Add support for GPU-optimized AMIs when using GPU instance types.
This includes:

- AWS Deep Learning AMI with pre-installed NVIDIA drivers, CUDA, and
  ML frameworks
- NVIDIA Deep Learning AMI option for NGC containers
- Custom GPU AMI support for specialized images
- Automatic detection of GPU instance types
- Conditional display of GPU AMI options only for GPU instances
- An updated terraform.tfvars template that uses the GPU AMI when
  configured
- A new defconfig for an AWS g6e.2xlarge GPU instance with the Deep
  Learning AMI

The system automatically detects when a GPU instance family (such as
g6e) is selected and offers appropriate GPU-optimized AMI options,
including the AWS Deep Learning AMI with the necessary drivers and
frameworks pre-installed.
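The detection described above boils down to matching the instance
type's family prefix against the known GPU families. A minimal sketch
of the idea (hypothetical helper and family list for illustration; the
real logic lives in scripts/aws_api.py and may differ):

```python
# Hypothetical sketch of GPU instance-family detection; the family
# set below is an illustrative subset, not the authoritative list.
GPU_FAMILIES = {"p3", "p4d", "p5", "g4dn", "g5", "g6", "g6e"}


def is_gpu_instance(instance_type: str) -> bool:
    """Return True when the instance type belongs to a GPU family.

    The family is the part before the first dot,
    e.g. "g6e.2xlarge" -> "g6e".
    """
    family = instance_type.split(".", 1)[0].lower()
    return family in GPU_FAMILIES


print(is_gpu_instance("g6e.2xlarge"))  # True: g6e is a GPU family
print(is_gpu_instance("t3.micro"))     # False: no GPU on t3
```

When detection succeeds, kconfig can then expose the GPU AMI options
only for those families, which is what the Kconfig.compute change in
this patch wires up.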

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
defconfigs/aws-gpu-g6e-ai | 42 +++++++++++++++++++
.../templates/aws/terraform.tfvars.j2 | 5 +++
scripts/aws_api.py | 4 +-
scripts/dynamic-cloud-kconfig.Makefile | 6 +++
terraform/aws/kconfigs/Kconfig.compute | 5 +++
5 files changed, 60 insertions(+), 2 deletions(-)
create mode 100644 defconfigs/aws-gpu-g6e-ai
diff --git a/defconfigs/aws-gpu-g6e-ai b/defconfigs/aws-gpu-g6e-ai
new file mode 100644
index 00000000..028b0c5e
--- /dev/null
+++ b/defconfigs/aws-gpu-g6e-ai
@@ -0,0 +1,42 @@
+# AWS G6e.2xlarge GPU instance with Deep Learning AMI for AI/ML workloads
+# This configuration sets up an AWS G6e.2xlarge instance with NVIDIA L40S GPU
+# optimized for machine learning, AI inference, and GPU-accelerated workloads
+
+# Cloud provider configuration
+CONFIG_KDEVOPS_ENABLE_TERRAFORM=y
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_AWS=y
+
+# AWS Dynamic configuration (required for G6E instance family and GPU AMIs)
+CONFIG_TERRAFORM_AWS_USE_DYNAMIC_CONFIG=y
+
+# AWS Instance configuration - G6E family with NVIDIA L40S GPU
+# G6E.2XLARGE specifications:
+# - 8 vCPUs (3rd Gen AMD EPYC processors)
+# - 32 GB system RAM
+# - 1x NVIDIA L40S Tensor Core GPU
+# - 48 GB GPU memory
+# - Up to 15 Gbps network performance
+# - Up to 10 Gbps EBS bandwidth
+CONFIG_TERRAFORM_AWS_INSTANCE_TYPE_G6E=y
+CONFIG_TERRAFORM_AWS_INSTANCE_G6E_2XLARGE=y
+
+# AWS Region - US East (N. Virginia) - primary availability for G6E
+CONFIG_TERRAFORM_AWS_REGION_US_EAST_1=y
+
+# GPU-optimized Deep Learning AMI
+# Includes: NVIDIA drivers 535+, CUDA 12.x, cuDNN, TensorFlow, PyTorch, MXNet
+CONFIG_TERRAFORM_AWS_USE_GPU_AMI=y
+CONFIG_TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING=y
+CONFIG_TERRAFORM_AWS_GPU_AMI_NAME="Deep Learning OSS Nvidia Driver AMI GPU PyTorch*Ubuntu 22.04*"
+CONFIG_TERRAFORM_AWS_GPU_AMI_OWNER="amazon"
+
+# Storage configuration optimized for ML workloads
+# 200 GB for datasets, models, and experiment artifacts
+CONFIG_TERRAFORM_AWS_DATA_VOLUME_SIZE=200
+
+# Note: After provisioning, the instance will have:
+# - Jupyter notebook server ready for ML experiments
+# - Pre-installed deep learning frameworks
+# - NVIDIA GPU drivers and CUDA toolkit
+# - Docker with NVIDIA Container Toolkit for containerized ML workloads
diff --git a/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2 b/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2
index d880254b..f8f4c842 100644
--- a/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2
+++ b/playbooks/roles/gen_tfvars/templates/aws/terraform.tfvars.j2
@@ -1,8 +1,13 @@
aws_profile = "{{ terraform_aws_profile }}"
aws_region = "{{ terraform_aws_region }}"
aws_availability_zone = "{{ terraform_aws_av_zone }}"
+{% if terraform_aws_use_gpu_ami is defined and terraform_aws_use_gpu_ami %}
+aws_name_search = "{{ terraform_aws_gpu_ami_name }}"
+aws_ami_owner = "{{ terraform_aws_gpu_ami_owner }}"
+{% else %}
aws_name_search = "{{ terraform_aws_ns }}"
aws_ami_owner = "{{ terraform_aws_ami_owner }}"
+{% endif %}
aws_instance_type = "{{ terraform_aws_instance_type }}"
aws_ebs_volumes_per_instance = "{{ terraform_aws_ebs_volumes_per_instance }}"
aws_ebs_volume_size = {{ terraform_aws_ebs_volume_size }}
diff --git a/scripts/aws_api.py b/scripts/aws_api.py
index e23acaa9..b22da559 100755
--- a/scripts/aws_api.py
+++ b/scripts/aws_api.py
@@ -956,7 +956,7 @@ if TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING
config TERRAFORM_AWS_GPU_AMI_NAME
string
output yaml
- default "Deep Learning AMI GPU TensorFlow*"
+ default "Deep Learning OSS Nvidia Driver AMI GPU PyTorch*Ubuntu 22.04*"
help
AMI name pattern for AWS Deep Learning AMI.
@@ -1061,7 +1061,7 @@ if TERRAFORM_AWS_GPU_AMI_DEEP_LEARNING
config TERRAFORM_AWS_GPU_AMI_NAME
string
output yaml
- default "Deep Learning AMI GPU TensorFlow*"
+ default "Deep Learning OSS Nvidia Driver AMI GPU PyTorch*Ubuntu 22.04*"
config TERRAFORM_AWS_GPU_AMI_OWNER
string
diff --git a/scripts/dynamic-cloud-kconfig.Makefile b/scripts/dynamic-cloud-kconfig.Makefile
index fffa5446..c2d187bf 100644
--- a/scripts/dynamic-cloud-kconfig.Makefile
+++ b/scripts/dynamic-cloud-kconfig.Makefile
@@ -45,6 +45,7 @@ dynamic_aws_kconfig_touch:
$(Q)touch $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated
$(Q)touch $(AWS_KCONFIG_DIR)/Kconfig.compute.static
$(Q)touch $(AWS_KCONFIG_DIR)/Kconfig.location.static
+ $(Q)touch $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static
$(Q)for family in $(AWS_INSTANCE_TYPE_FAMILIES); do \
touch $(AWS_INSTANCE_TYPES_DIR)/Kconfig.$$family.generated; \
touch $(AWS_INSTANCE_TYPES_DIR)/Kconfig.$$family.static; \
@@ -117,6 +118,11 @@ cloud-update:
sed -i 's/Kconfig\.\([^.]*\)\.generated/Kconfig.\1.static/g' $(AWS_KCONFIG_DIR)/Kconfig.location.static; \
echo " Created $(AWS_KCONFIG_DIR)/Kconfig.location.static"; \
fi
+ $(Q)if [ -f $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated ]; then \
+ cp $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.generated $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static; \
+ sed -i 's/Kconfig\.\([^.]*\)\.generated/Kconfig.\1.static/g' $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static; \
+ echo " Created $(AWS_KCONFIG_DIR)/Kconfig.gpu-amis.static"; \
+ fi
# AWS instance type families
$(Q)for file in $(AWS_INSTANCE_TYPES_DIR)/Kconfig.*.generated; do \
if [ -f "$$file" ]; then \
diff --git a/terraform/aws/kconfigs/Kconfig.compute b/terraform/aws/kconfigs/Kconfig.compute
index 12083d1a..6b5ff900 100644
--- a/terraform/aws/kconfigs/Kconfig.compute
+++ b/terraform/aws/kconfigs/Kconfig.compute
@@ -80,3 +80,8 @@ source "terraform/aws/kconfigs/distros/Kconfig.oracle"
source "terraform/aws/kconfigs/distros/Kconfig.rhel"
source "terraform/aws/kconfigs/distros/Kconfig.sles"
source "terraform/aws/kconfigs/distros/Kconfig.custom"
+
+# Include GPU AMI configuration if available (generated by cloud-config)
+if TERRAFORM_AWS_USE_DYNAMIC_CONFIG
+source "terraform/aws/kconfigs/Kconfig.gpu-amis.static"
+endif
--
2.50.1