[PATCH 0/4] kdevops: add support for A/B testing

public inbox for kdevops@lists.linux.dev
 help / color / mirror / Atom feed

* [PATCH 0/4] kdevops: add support for A/B testing
@ 2025-07-26  1:16 Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 1/4] Makefile: add make style for style checking Luis Chamberlain
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26  1:16 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Leverage KDEVOPS_BASELINE_AND_DEV to enable A/B testing.
When     KDEVOPS_BASELINE_AND_DEV is enabled we can let the user
now pick and choose different kernel tree and ref tag for A and B.

With all the automation we have ongoing, this will let us compare
performance / features / enhancements. With automation in place, it
also means we can get bots do easily do testing for us for random
inquiries in the future.

The first patch can go in with this or it can go with Daniel's work.
The CLAUDE.md changes are lessons learned for Claude code based on
cleaning up after it. One minor ansible warning is included as well.

Luis Chamberlain (4):
  Makefile: add make style for style checking
  CLAUDE.md: new workflow guide for hosts and nodes
  gen_nodes/gen_hosts: avoid usage of fs_config_path on task names
  bootlinux: add support for A/B kernel testing

 CLAUDE.md                                   | 689 ++++++++++++++++++++
 Makefile                                    |   5 +
 PROMPTS.md                                  |  48 ++
 docs/kdevops-make-linux.md                  | 158 +++++
 kdevops-ci                                  |   1 +
 playbooks/roles/bootlinux/defaults/main.yml |  12 +
 playbooks/roles/bootlinux/tasks/main.yml    |  99 ++-
 playbooks/roles/gen_hosts/tasks/main.yml    |   4 +-
 playbooks/roles/gen_nodes/tasks/main.yml    |   2 +-
 scripts/check_commit_format.py              |  85 +++
 scripts/detect_whitespace_issues.py         | 109 ++++
 scripts/fix_whitespace_issues.py            | 137 ++++
 scripts/infer_last_stable_kernel.sh         |  35 +
 workflows/linux/Kconfig                     | 116 +++-
 workflows/linux/Makefile                    |  39 ++
 15 files changed, 1529 insertions(+), 10 deletions(-)
 create mode 160000 kdevops-ci
 create mode 100755 scripts/check_commit_format.py
 create mode 100755 scripts/detect_whitespace_issues.py
 create mode 100755 scripts/fix_whitespace_issues.py
 create mode 100755 scripts/infer_last_stable_kernel.sh

-- 
2.47.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] Makefile: add make style for style checking
  2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
@ 2025-07-26  1:16 ` Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 2/4] CLAUDE.md: new workflow guide for hosts and nodes Luis Chamberlain
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26  1:16 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Add a 'make style' which helps humans and bots follow some sensible
coding conventions.

- Add scripts/detect_whitespace_issues.py check for white space eye sores
- Add scripts/check_commit_format.py to validate Generated-by and the
  Signed-off-by spacing
- Add scripts/fix_whitespace_issues.py to help with stupid spacing
  eyesores
- Update CLAUDE.md to ensure it runs 'make style'

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 CLAUDE.md                           |  61 +++++++++++++
 Makefile                            |   5 +
 kdevops-ci                          |   1 +
 scripts/check_commit_format.py      |  85 +++++++++++++++++
 scripts/detect_whitespace_issues.py | 109 ++++++++++++++++++++++
 scripts/fix_whitespace_issues.py    | 137 ++++++++++++++++++++++++++++
 6 files changed, 398 insertions(+)
 create mode 160000 kdevops-ci
 create mode 100755 scripts/check_commit_format.py
 create mode 100755 scripts/detect_whitespace_issues.py
 create mode 100755 scripts/fix_whitespace_issues.py

diff --git a/CLAUDE.md b/CLAUDE.md
index 8bee7c0..ea7c0ff 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -89,6 +89,7 @@ make help               # Show available targets
 make V=1 [target]       # Verbose build output
 make AV=1-6 [target]    # Ansible verbose output (levels 0-6)
 make dynconfig          # Generate dynamic configuration
+make style              # Check for whitespace issues - ALWAYS run before completing work
 make mrproper           # Clean everything and restart from scratch
 ```
 
@@ -211,6 +212,66 @@ Developer Certificate or Origin.
 Use this tag for code generated by Claude code AI. Put this before the
 Signed-off-by tag.
 
+**CRITICAL FORMATTING RULE**: When using "Generated-by: Claude AI", it MUST be
+immediately followed by the "Signed-off-by:" tag with NO empty lines between them.
+These two lines must be consecutive.
+
+Correct format:
+```
+Subject line
+
+Detailed description of changes...
+
+Generated-by: Claude AI
+Signed-off-by: Your Name <email@example.com>
+```
+
+**WRONG** - Do NOT add empty lines between Generated-by and Signed-off-by:
+```
+Generated-by: Claude AI
+
+Signed-off-by: Your Name <email@example.com>
+```
+
+**WRONG** - Do NOT add extra empty lines:
+```
+Generated-by: Claude AI
+
+
+Signed-off-by: Your Name <email@example.com>
+```
+
+## Code Quality Requirements
+
+**IMPORTANT**: Before completing any work, you MUST run `make style` to check for
+both whitespace issues and commit message formatting. This ensures code consistency
+and prevents formatting issues from being introduced into the codebase.
+
+The style checker will identify:
+- Trailing whitespace
+- Mixed tabs and spaces
+- Files without newlines at EOF
+- Other whitespace-related issues
+- Incorrect commit message formatting (Generated-by/Signed-off-by spacing)
+
+Fix all reported issues before submitting your work. The `make style` command
+checks both file whitespace and the most recent commit message format.
+
+### Automatic Whitespace Fixing
+
+For convenience, you can automatically fix whitespace issues using:
+```bash
+python3 scripts/fix_whitespace_issues.py              # Fix all modified files
+python3 scripts/fix_whitespace_issues.py file1 file2  # Fix specific files
+```
+
+The fixer script will:
+- Remove trailing whitespace from lines
+- Add missing newlines at end of files
+- Reduce excessive blank lines to maximum 2 consecutive
+
+Always run `make style` after using the fixer to verify all issues are resolved.
+
 ## Prompt Examples
 
 Refer to PROMPTS.md for example set of prompts used to generate code on
diff --git a/Makefile b/Makefile
index c6a5c32..bfff8f9 100644
--- a/Makefile
+++ b/Makefile
@@ -243,6 +243,11 @@ include scripts/ci.Makefile
 include scripts/archive.Makefile
 include scripts/defconfig.Makefile
 
+PHONY += style
+style:
+	$(Q)python3 scripts/detect_whitespace_issues.py
+	$(Q)python3 scripts/check_commit_format.py
+
 PHONY += clean
 clean:
 	$(Q)$(MAKE) -f scripts/build.Makefile $@
diff --git a/kdevops-ci b/kdevops-ci
new file mode 160000
index 0000000..8120607
--- /dev/null
+++ b/kdevops-ci
@@ -0,0 +1 @@
+Subproject commit 812060752af00e601add5716c3180fbb21c41784
diff --git a/scripts/check_commit_format.py b/scripts/check_commit_format.py
new file mode 100755
index 0000000..f72f9d1
--- /dev/null
+++ b/scripts/check_commit_format.py
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3
+"""
+Commit Message Format Checker for kdevops
+
+This script checks the most recent commit message for proper formatting:
+- If "Generated-by: Claude AI" is present, it must be immediately followed by
+  "Signed-off-by:" with no blank lines in between
+"""
+
+import subprocess
+import sys
+import re
+
+def get_latest_commit_message():
+    """Get the latest commit message"""
+    try:
+        result = subprocess.run(['git', 'log', '-1', '--pretty=format:%B'],
+                              capture_output=True, text=True, check=True)
+        return result.stdout
+    except subprocess.CalledProcessError:
+        print("Error: Failed to get commit message")
+        return None
+    except FileNotFoundError:
+        print("Error: git command not found")
+        return None
+
+def check_commit_format(commit_msg):
+    """Check commit message formatting"""
+    issues = []
+    if not commit_msg:
+        return ["No commit message found"]
+    lines = commit_msg.strip().split('\n')
+    # Find Generated-by line
+    generated_by_idx = None
+    signed_off_by_idx = None
+    for i, line in enumerate(lines):
+        if line.startswith('Generated-by: Claude AI'):
+            generated_by_idx = i
+        elif line.startswith('Signed-off-by:'):
+            signed_off_by_idx = i
+    # If Generated-by is present, check formatting
+    if generated_by_idx is not None:
+        if signed_off_by_idx is None:
+            issues.append("Generated-by: Claude AI found but no Signed-off-by tag present")
+        else:
+            # Check if Generated-by is immediately followed by Signed-off-by (no lines in between)
+            if signed_off_by_idx != generated_by_idx + 1:
+                lines_between = signed_off_by_idx - generated_by_idx - 1
+                if lines_between > 0:
+                    issues.append(f"Generated-by: Claude AI must be immediately followed by Signed-off-by (found {lines_between} lines between them)")
+                    for i in range(generated_by_idx + 1, signed_off_by_idx):
+                        if lines[i].strip():
+                            issues.append(f"  - Non-empty line at {i+1}: '{lines[i]}'")
+                        else:
+                            issues.append(f"  - Empty line at {i+1}")
+    return issues
+
+def main():
+    """Main function to check commit message format"""
+    commit_msg = get_latest_commit_message()
+    if commit_msg is None:
+        return 1
+    issues = check_commit_format(commit_msg)
+    if issues:
+        print("❌ Commit message formatting issues found:")
+        for issue in issues:
+            print(f"  ⚠️  {issue}")
+        print("\nLatest commit message:")
+        print("=" * 50)
+        print(commit_msg)
+        print("=" * 50)
+        print("\nCorrect format when using Generated-by:")
+        print("Subject line")
+        print("")
+        print("Detailed description...")
+        print("")
+        print("Generated-by: Claude AI")
+        print("Signed-off-by: Your Name <email@example.com>")
+        return 1
+    else:
+        print("✅ Commit message formatting is correct!")
+        return 0
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/scripts/detect_whitespace_issues.py b/scripts/detect_whitespace_issues.py
new file mode 100755
index 0000000..165a33e
--- /dev/null
+++ b/scripts/detect_whitespace_issues.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python3
+"""
+Whitespace Issue Detector for kdevops
+
+This script detects common whitespace issues that Claude AI tends to introduce:
+- Trailing whitespace at end of lines
+- Missing newline at end of file
+- Excessive blank lines
+"""
+
+import os
+import sys
+from pathlib import Path
+
+def check_file_whitespace(file_path):
+    """Check a single file for whitespace issues"""
+    issues = []
+
+    try:
+        with open(file_path, 'rb') as f:
+            content = f.read()
+
+        # Skip binary files
+        if b'\0' in content:
+            return issues
+
+        lines = content.decode('utf-8', errors='ignore').splitlines(keepends=True)
+
+        # Check trailing whitespace
+        for line_num, line in enumerate(lines, 1):
+            if line.rstrip('\n\r').endswith(' ') or line.rstrip('\n\r').endswith('\t'):
+                issues.append(f"Line {line_num}: Trailing whitespace")
+
+        # Check missing newline at end of file
+        if content and not content.endswith(b'\n'):
+            issues.append("Missing newline at end of file")
+
+        # Check for excessive blank lines (more than 2 consecutive)
+        blank_count = 0
+        for line_num, line in enumerate(lines, 1):
+            if line.strip() == '':
+                blank_count += 1
+            else:
+                if blank_count > 2:
+                    issues.append(f"Line {line_num - blank_count}: {blank_count} consecutive blank lines")
+                blank_count = 0
+
+    except Exception as e:
+        issues.append(f"Error reading file: {e}")
+
+    return issues
+
+def main():
+    """Main function to scan for whitespace issues"""
+    if len(sys.argv) > 1:
+        paths = sys.argv[1:]
+    else:
+        # Default to git tracked files with modifications
+        import subprocess
+        try:
+            result = subprocess.run(['git', 'diff', '--name-only'],
+                                  capture_output=True, text=True, check=True)
+            paths = result.stdout.strip().split('\n') if result.stdout.strip() else []
+            if not paths:
+                print("No modified files found in git")
+                return
+        except subprocess.CalledProcessError:
+            print("Error: Not in a git repository or git command failed")
+            return
+        except FileNotFoundError:
+            print("Error: git command not found")
+            return
+
+    total_issues = 0
+    files_with_issues = 0
+
+    for path_str in paths:
+        path = Path(path_str)
+        if not path.exists():
+            print(f"Warning: {path} does not exist")
+            continue
+
+        if path.is_file():
+            # Skip certain file types
+            if path.suffix in ['.pyc', '.so', '.o', '.bin', '.jpg', '.png', '.gif']:
+                continue
+
+            issues = check_file_whitespace(path)
+            if issues:
+                files_with_issues += 1
+                total_issues += len(issues)
+                print(f"\n{path}:")
+                for issue in issues:
+                    print(f"  ⚠️  {issue}")
+
+    print(f"\nSummary: {total_issues} whitespace issues found in {files_with_issues} files")
+
+    if total_issues > 0:
+        print("\nTo fix these issues:")
+        print("- Remove trailing spaces/tabs from lines")
+        print("- Add newline at end of files")
+        print("- Reduce excessive blank lines to 1-2 maximum")
+        return 1
+    else:
+        print("✅ No whitespace issues found!")
+        return 0
+
+if __name__ == '__main__':
+    sys.exit(main())
diff --git a/scripts/fix_whitespace_issues.py b/scripts/fix_whitespace_issues.py
new file mode 100755
index 0000000..3e69ea5
--- /dev/null
+++ b/scripts/fix_whitespace_issues.py
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+"""
+Whitespace Issue Fixer for kdevops
+
+This script fixes common whitespace issues that Claude AI tends to introduce:
+- Trailing whitespace at end of lines
+- Missing newline at end of file
+- Excessive blank lines (reduces to maximum 2 consecutive)
+"""
+
+import os
+import sys
+from pathlib import Path
+
+def fix_file_whitespace(file_path):
+    """Fix whitespace issues in a single file"""
+    issues_fixed = []
+
+    try:
+        with open(file_path, 'rb') as f:
+            content = f.read()
+
+        # Skip binary files
+        if b'\0' in content:
+            return issues_fixed
+
+        original_content = content.decode('utf-8', errors='ignore')
+        lines = original_content.splitlines(keepends=True)
+        modified = False
+
+        # Fix trailing whitespace
+        new_lines = []
+        for line_num, line in enumerate(lines, 1):
+            original_line = line
+            # Remove trailing whitespace but preserve line endings
+            if line.endswith('\r\n'):
+                cleaned_line = line.rstrip(' \t\r\n') + '\r\n'
+            elif line.endswith('\n'):
+                cleaned_line = line.rstrip(' \t\n') + '\n'
+            else:
+                cleaned_line = line.rstrip(' \t')
+
+            if original_line != cleaned_line:
+                issues_fixed.append(f"Line {line_num}: Removed trailing whitespace")
+                modified = True
+
+            new_lines.append(cleaned_line)
+
+        # Fix excessive blank lines (reduce to maximum 2 consecutive)
+        final_lines = []
+        blank_count = 0
+        i = 0
+        while i < len(new_lines):
+            line = new_lines[i]
+            if line.strip() == '':
+                blank_count += 1
+                if blank_count <= 2:
+                    final_lines.append(line)
+                else:
+                    issues_fixed.append(f"Line {i+1}: Removed excessive blank line")
+                    modified = True
+            else:
+                blank_count = 0
+                final_lines.append(line)
+            i += 1
+
+        # Fix missing newline at end of file
+        new_content = ''.join(final_lines)
+        if new_content and not new_content.endswith('\n'):
+            new_content += '\n'
+            issues_fixed.append("Added missing newline at end of file")
+            modified = True
+
+        # Write back if modified
+        if modified:
+            with open(file_path, 'w', encoding='utf-8') as f:
+                f.write(new_content)
+
+    except Exception as e:
+        issues_fixed.append(f"Error processing file: {e}")
+
+    return issues_fixed
+
+def main():
+    """Main function to fix whitespace issues"""
+    if len(sys.argv) > 1:
+        paths = sys.argv[1:]
+    else:
+        # Default to git tracked files with modifications
+        import subprocess
+        try:
+            result = subprocess.run(['git', 'diff', '--name-only'],
+                                  capture_output=True, text=True, check=True)
+            paths = result.stdout.strip().split('\n') if result.stdout.strip() else []
+            if not paths:
+                print("No modified files found in git")
+                return 0
+        except subprocess.CalledProcessError:
+            print("Error: Not in a git repository or git command failed")
+            return 1
+        except FileNotFoundError:
+            print("Error: git command not found")
+            return 1
+
+    total_fixes = 0
+    files_fixed = 0
+
+    for path_str in paths:
+        path = Path(path_str)
+        if not path.exists():
+            print(f"Warning: {path} does not exist")
+            continue
+
+        if path.is_file():
+            # Skip certain file types
+            if path.suffix in ['.pyc', '.so', '.o', '.bin', '.jpg', '.png', '.gif']:
+                continue
+
+            fixes = fix_file_whitespace(path)
+            if fixes:
+                files_fixed += 1
+                total_fixes += len(fixes)
+                print(f"\n{path}:")
+                for fix in fixes:
+                    print(f"  ✅ {fix}")
+
+    print(f"\nSummary: {total_fixes} whitespace issues fixed in {files_fixed} files")
+
+    if total_fixes > 0:
+        print("✅ Whitespace issues have been automatically fixed!")
+        return 0
+    else:
+        print("✅ No whitespace issues found to fix!")
+        return 0
+
+if __name__ == '__main__':
+    sys.exit(main())
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] CLAUDE.md: new workflow guide for hosts and nodes
  2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 1/4] Makefile: add make style for style checking Luis Chamberlain
@ 2025-07-26  1:16 ` Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 3/4] gen_nodes/gen_hosts: avoid usage of fs_config_path on task names Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 4/4] bootlinux: add support for A/B kernel testing Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26  1:16 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Extend CLAUDE.md with a slew of guidelines for the gen_host and gen_nodes
playbook as well as tons of other tips to work around complex ansible
issues which are not easy to infer.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 CLAUDE.md | 628 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 628 insertions(+)

diff --git a/CLAUDE.md b/CLAUDE.md
index ea7c0ff..7764bc6 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -160,6 +160,131 @@ make mrproper           # Clean everything and restart from scratch
 4. **Baseline Management**: Comprehensive tracking of known failures and regressions
 5. **Template Generation**: Dynamic file generation based on configuration
 
+## Adding New Workflows
+
+When adding a new workflow to kdevops, you must add node generation rules to both
+`playbooks/roles/gen_nodes/tasks/main.yml` and `playbooks/roles/gen_hosts/tasks/main.yml`
+to avoid the failure "dedicated workflow has no rules for node configuration".
+
+### Required Additions
+
+For each new workflow, add the following sections to both playbooks:
+
+#### gen_nodes playbook
+Add node generation rules with appropriate conditional logic based on whether the
+workflow uses individual test configuration flags (like mmtests) or choice-based
+configuration (like fio-tests):
+
+```yaml
+# For workflows with individual test flags (like mmtests)
+- name: Infer enabled WORKFLOW test section types
+  set_fact:
+    workflow_enabled_test_types: >-
+      {{
+        [kdevops_host_prefix + '-']
+        | product(
+            lookup('file', topdir_path + '/.config')
+            | regex_findall('^CONFIG_WORKFLOW_ENABLE_(.*)=y$', multiline=True)
+            | map('lower')
+            | list
+        )
+        | map('join')
+        | list
+      }}
+  when:
+    - kdevops_workflows_dedicated_workflow
+    - kdevops_workflow_enable_WORKFLOW
+    - ansible_nodes_template.stat.exists
+    - not kdevops_baseline_and_dev
+
+# For workflows with choice-based configuration (like fio-tests)
+- name: Generate the WORKFLOW kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template
+  tags: [ 'hosts' ]
+  vars:
+    node_template: "{{ kdevops_nodes_template | basename }}"
+    nodes: "{{ [kdevops_host_prefix + '-WORKFLOW'] }}"
+    all_generic_nodes: "{{ [kdevops_host_prefix + '-WORKFLOW'] }}"
+  template:
+    src: "{{ node_template }}"
+    dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
+    force: yes
+  when:
+    - kdevops_workflows_dedicated_workflow
+    - kdevops_workflow_enable_WORKFLOW
+    - ansible_nodes_template.stat.exists
+```
+
+#### gen_hosts playbook
+Add host file generation task for the workflow:
+
+```yaml
+- name: Generate the Ansible hosts file for a dedicated WORKFLOW setup
+  tags: [ 'hosts' ]
+  template:
+    src: "{{ kdevops_hosts_template }}"
+    dest: "{{ topdir_path }}/{{ kdevops_hosts }}"
+    force: yes
+    trim_blocks: True
+    lstrip_blocks: True
+  when:
+    - kdevops_workflows_dedicated_workflow
+    - kdevops_workflow_enable_WORKFLOW
+    - ansible_hosts_template.stat.exists
+```
+
+#### Update the generic hosts template
+Add support for your workflow in the generic hosts template at
+`playbooks/roles/gen_hosts/templates/hosts.j2`. Find the dedicated workflow
+section and add your workflow's conditional logic:
+
+```jinja2
+{% if kdevops_workflows_dedicated_workflow %}
+{% if kdevops_workflow_enable_WORKFLOW %}
+[all]
+{{ kdevops_host_prefix }}-WORKFLOW
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-WORKFLOW-dev
+{% endif %}
+
+[all:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+[baseline]
+{{ kdevops_host_prefix }}-WORKFLOW
+
+[baseline:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% if kdevops_baseline_and_dev %}
+[dev]
+{{ kdevops_host_prefix }}-WORKFLOW-dev
+
+[dev:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% endif %}
+[WORKFLOW]
+{{ kdevops_host_prefix }}-WORKFLOW
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-WORKFLOW-dev
+{% endif %}
+
+[WORKFLOW:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% else %}
+```
+
+### Examples
+
+Refer to the existing mmtests implementation for workflows with multiple individual
+test configuration flags, or the fio-tests implementation for workflows with
+choice-based configuration patterns.
+
+**Important**: All workflows use the same generic hosts template in
+`playbooks/roles/gen_hosts/templates/hosts.j2`. Do NOT create workflow-specific
+template files. Instead, extend the generic template with conditional logic
+for your workflow.
+
 ## Quick Setup Examples
 
 ### XFS Filesystem Testing
@@ -272,6 +397,509 @@ The fixer script will:
 
 Always run `make style` after using the fixer to verify all issues are resolved.
 
+## Complex System Interactions
+
+kdevops integrates multiple subsystems (Ansible, Kconfig, Git, Make) that often
+interact in non-obvious ways. Understanding these interactions is crucial for
+effective debugging and development.
+
+### Ansible Architecture Patterns
+
+#### Host vs Control Node Execution
+kdevops uses several Ansible execution patterns that affect variable scope:
+
+- **Control Host Execution**: `run_once: true, delegate_to: localhost`
+  - Executes once on the control host, not on target nodes
+  - Per-node variables may not be available in localhost context
+  - Common in 9P filesystem builds where single build is shared to all guests
+  - Use `hostvars[groups['group_name'][0]]['variable_name']` to access node-specific vars
+
+- **Variable Resolution Issues**:
+  - Variables set per-node (like A/B testing configs) aren't automatically available on localhost
+  - Need explicit variable resolution for cross-context access
+  - Git repository state must be managed carefully when switching between refs
+
+#### A/B Testing Variable Management
+```yaml
+# Detect dev nodes by hostname pattern
+- name: Determine if this is a dev node for A/B testing
+  set_fact:
+    bootlinux_is_dev_node: "{{ ansible_hostname | regex_search('^.*-dev$') is not none }}"
+
+# Resolve active parameters for 9P builds
+- name: Determine active kernel parameters for A/B testing with 9P
+  set_fact:
+    active_linux_ref: "{{ hostvars[groups['dev'][0]]['target_linux_ref'] if 'dev' in group_names else target_linux_ref }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+### Kconfig Dynamic Configuration Patterns
+
+#### Shell Command Integration
+```kconfig
+config BOOTLINUX_DEV_TREE_REF
+    string "Development kernel reference"
+    default $(shell, scripts/infer_last_stable_kernel.sh)
+    help
+      The default is automatically inferred as the most recent stable
+      kernel version from the git repository.
+```
+
+**Best Practices**:
+- Always provide fallback values in scripts
+- Place scripts in `scripts/` directory
+- Use conditional defaults: `default VALUE if CONDITION`
+- Test scripts work in different environments
+
+#### Dependencies and Conflicts
+```kconfig
+config BOOTLINUX_SHALLOW_CLONE
+    bool "Shallow git clone"
+    default y if !KDEVOPS_BASELINE_AND_DEV
+    depends on !BOOTLINUX_AB_DIFFERENT_REF
+    help
+      This option is automatically disabled when using A/B testing with
+      different kernel references, as shallow clones may not contain all
+      the required refs for checkout.
+```
+
+**Key Patterns**:
+- `depends on !CONFIG_OPTION` - Prevent incompatible combinations
+- `default y if !OTHER_CONFIG` - Conditional defaults
+- Document why restrictions exist in help text
+
+### Git Repository Management
+
+#### Shallow Clone Limitations
+- **Problem**: A/B testing with different refs requires full git history
+- **Solution**: Make shallow clones depend on `!BOOTLINUX_AB_DIFFERENT_REF`
+- **Detection**: Use `git --git-dir=/path/to/mirror.git` for mirror access
+
+#### Version Detection Scripts
+```bash
+# Get latest stable kernel version, excluding release candidates
+LAST_STABLE=$(git --git-dir="$GIT_TREE" tag --list 'v6.*' | \
+    grep -v -- '-rc' | \
+    sort -V | \
+    tail -1)
+```
+
+**Patterns**:
+- Use `sort -V` for proper semantic version ordering
+- Filter out pre-release versions with `grep -v -- '-rc'`
+- Always provide fallback values
+- Handle missing git repositories gracefully
+
+### Systematic Debugging Methodology
+
+#### Configuration Tracing
+1. **Check actual values**: Look at `extra_vars.yaml` for resolved variables
+2. **Trace execution context**: Identify if code runs on localhost vs target nodes
+3. **Verify prerequisites**: Ensure git refs exist before checkout attempts
+4. **Follow variable inheritance**: Understand Ansible variable precedence
+
+#### A/B Testing Debug Steps
+```bash
+# Check current configuration
+grep "BOOTLINUX.*REF" .config
+grep "bootlinux.*tree.*ref" extra_vars.yaml
+
+# Verify git repository state
+git branch -v
+git describe --tags --always
+
+# Test kernel version detection
+scripts/infer_last_stable_kernel.sh
+
+# Check available refs in mirror
+git --git-dir=/mirror/linux.git tag --list 'v6.*' | sort -V | tail -10
+```
+
+#### Common Root Causes
+- **Variable scope mismatches**: Per-node vars not available on localhost
+- **Git ref unavailability**: Shallow clones missing required refs
+- **Execution context confusion**: Code expecting node context running on localhost
+- **Configuration interdependencies**: Features conflicting in unexpected ways
+
+### Feature Integration Patterns
+
+#### When Features Conflict
+1. **Identify early**: Use Kconfig dependencies to prevent invalid combinations
+2. **Provide alternatives**: Suggest compatible configurations
+3. **Clear messaging**: Explain why restrictions exist
+4. **Graceful degradation**: Disable conflicting features automatically
+
+Example: Shallow clones + A/B different refs
+- **Problem**: Shallow clone may not have required git refs
+- **Solution**: `depends on !BOOTLINUX_AB_DIFFERENT_REF`
+- **User experience**: Feature automatically disabled with explanation
+
+#### Smart Defaults Philosophy
+- **Infer from system state**: Use dynamic detection where possible
+- **Show off capabilities**: Make examples compelling and useful
+- **Balance automation with control**: Provide overrides for advanced users
+- **Fail gracefully**: Always have sensible fallbacks
+
+### AI Assistant Development Guidelines
+
+#### Investigation Sequence
+1. **Understand the problem**: What's not working as expected?
+2. **Trace execution path**: Follow code from config → ansible → execution
+3. **Identify context and scope**: Where does code run? What variables are available?
+4. **Find intersection points**: Issues often emerge where subsystems meet
+5. **Design holistic solutions**: Fix root cause, enhance the feature
+6. **Validate across use cases**: Test both specific case and general functionality
+
+#### Common Anti-Patterns to Avoid
+- Band-aid fixes that ignore root cause
+- Breaking existing functionality while fixing edge cases
+- Ignoring variable scope and execution context
+- Missing cross-feature impact analysis
+- Not considering user experience implications
+
+#### Quality Gates
+- Always run `make style` before completion
+- Test both the specific case and general functionality
+- Consider impact on existing users and configurations
+- Document new patterns for future reference
+- Verify changes work across different execution contexts
+
+### Examples from Practice
+
+#### A/B Kernel Testing Issue
+**Symptoms**: `bootlinux_dev_tree_kernelrelease` not being used in dev builds
+
+**Root Cause Analysis**:
+- 9P builds execute `run_once: true, delegate_to: localhost`
+- Per-node A/B variables not available in localhost context
+- Git checkout failed due to shallow clone missing refs
+
+**Solution Components**:
+1. Variable resolution: `hostvars[groups['dev'][0]]['target_linux_ref']`
+2. Git ref management: Force checkout correct ref before build
+3. Configuration fix: Disable shallow clones for A/B different refs
+4. Smart defaults: Auto-detect latest stable kernel version
+
+**Key Insight**: Complex issues often involve multiple subsystem interactions
+rather than bugs in individual components.
+
+## Per-Node Variable Management and Scope Issues
+
+One of the most common and subtle sources of bugs in kdevops is per-node variable
+scope issues, particularly when combining Ansible's execution contexts with
+complex features like A/B testing and 9P builds.
+
+### Understanding Ansible Execution Contexts
+
+kdevops uses multiple Ansible execution patterns that affect variable visibility:
+
+#### 1. Normal Node Execution
+```yaml
+- name: Set per-node variable
+  set_fact:
+    my_node_var: "{{ some_value }}"
+  # Runs on each target node, variable is per-node
+```
+
+#### 2. Control Host Execution (run_once + delegate_to: localhost)
+```yaml
+- name: Process shared data
+  set_fact:
+    shared_var: "{{ processed_value }}"
+  run_once: true
+  delegate_to: localhost
+  # Runs once on control host, not on target nodes
+```
+
+#### 3. Mixed Context Operations
+```yaml
+- name: Access per-node data from control host
+  set_fact:
+    aggregated_data: "{{ hostvars[groups['target'][0]]['node_var'] }}"
+  run_once: true
+  delegate_to: localhost
+  # Attempts to access per-node variable from control context
+```
+
+### Common Variable Scope Problems
+
+#### Problem 1: Per-Node Variables Not Available on localhost
+
+**Symptom**: Variables set on target nodes are undefined when accessed from
+localhost tasks.
+
+**Example**:
+```yaml
+# This sets target_linux_ref ONLY on nodes matching the condition
+- name: Set development kernel parameters for dev nodes
+  set_fact:
+    target_linux_ref: "{{ bootlinux_dev_tree_ref }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_is_dev_node|default(false)|bool
+
+# This tries to access per-node variable from localhost - MAY FAIL
+- name: Use dev node's kernel ref for 9P build
+  set_fact:
+    active_ref: "{{ hostvars[groups['dev'][0]]['target_linux_ref'] }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+**Root Cause**: The first task only runs on dev nodes and sets a per-node
+variable. The second task runs on localhost but may not have access to the
+per-node variable if there are timing or context issues.
+
+#### Problem 2: Host Scope Limiting with HOSTS Parameter
+
+**Symptom**: When using `make target HOSTS=specific-host`, variables set on
+other hosts become inaccessible.
+
+**Example**:
+```bash
+# This limits the playbook scope to only the dev node
+make linux HOSTS=demo-fio-tests-dev
+```
+
+If your playbook tries to access variables from baseline nodes or uses
+`hostvars[groups['baseline'][0]]`, these may fail because the baseline
+nodes are not in the current run scope.
+
+#### Problem 3: Race Conditions in Variable Resolution
+
+**Symptom**: Variables appear to be set inconsistently or use wrong values.
+
+**Root Cause**: Tasks with `run_once: true` may execute before per-node
+tasks complete, leading to variable access before they're properly set.
+
+### Best Practices for Variable Management
+
+#### 1. Prefer Global Variables for Cross-Context Access
+
+**Bad**:
+```yaml
+# Set per-node, access from localhost - fragile
+- name: Set node-specific value
+  set_fact:
+    node_value: "{{ some_computation }}"
+
+- name: Use in shared context
+  command: "process {{ hostvars[groups['target'][0]]['node_value'] }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+**Good**:
+```yaml
+# Use global variable that's accessible everywhere
+- name: Use global configuration
+  command: "process {{ global_config_value }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+#### 2. Explicit Variable Resolution with Fallbacks
+
+**Recommended Pattern**:
+```yaml
+- name: Resolve variable with robust fallback
+  set_fact:
+    active_value: >-
+      {{
+        hostvars[groups['dev'][0]]['target_value']
+        if (groups['dev'] | length > 0 and
+            hostvars[groups['dev'][0]]['target_value'] is defined)
+        else fallback_global_value
+      }}
+  run_once: true
+  delegate_to: localhost
+```
+
+#### 3. Validate Variable Availability
+
+**Add Validation Tasks**:
+```yaml
+- name: Validate required variables are available
+  fail:
+    msg: "Required variable {{ item }} not found in dev node context"
+  when: hostvars[groups['dev'][0]][item] is not defined
+  loop:
+    - target_linux_ref
+    - target_linux_config
+  run_once: true
+  delegate_to: localhost
+```
+
+#### 4. Use Consistent Variable Naming
+
+**Pattern**: Use prefixes to indicate variable scope:
+- `global_*` - Available everywhere
+- `node_*` - Per-node variables
+- `active_*` - Resolved variables for shared operations
+
+### Debugging Variable Scope Issues
+
+#### 1. Add Debug Tasks
+
+```yaml
+- name: Debug variable availability
+  debug:
+    msg: |
+      Groups: {{ groups }}
+      Dev group: {{ groups['dev'] | default([]) }}
+      Hostvars keys: {{ hostvars.keys() | list }}
+      Target var: {{ hostvars[groups['dev'][0]]['target_var'] | default('UNDEFINED') }}
+  run_once: true
+  delegate_to: localhost
+```
+
+#### 2. Check Execution Context
+
+```yaml
+- name: Show execution context
+  debug:
+    msg: |
+      Running on: {{ inventory_hostname }}
+      Delegate to: {{ ansible_delegated_vars | default('none') }}
+      Group names: {{ group_names }}
+  tags: debug
+```
+
+#### 3. Use Ansible Verbose Mode
+
+```bash
+# Run with verbose output to see variable resolution
+make target AV=3  # Ansible verbose level 3
+```
+
+### A/B Testing Variable Resolution Example
+
+The A/B testing feature demonstrates proper variable resolution:
+
+```yaml
+# Step 1: Set per-node variables (runs on dev nodes only)
+- name: Set development kernel parameters for dev nodes
+  set_fact:
+    target_linux_ref: "{{ bootlinux_dev_tree_ref }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_is_dev_node|default(false)|bool
+
+# Step 2: Resolve for shared operations (robust fallback)
+- name: Determine active kernel parameters for A/B testing with 9P
+  set_fact:
+    active_linux_ref: "{{ bootlinux_dev_tree_ref }}"
+    # Direct use of global var instead of fragile hostvars access
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_9p|bool
+  run_once: true
+  delegate_to: localhost
+
+# Step 3: Use resolved variable
+- name: Checkout correct git ref for A/B testing with 9P
+  git:
+    version: "{{ active_linux_ref | default(target_linux_ref) }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+### Testing Variable Resolution
+
+When developing features that involve per-node variables:
+
+1. **Test with different HOSTS parameters**:
+   ```bash
+   make target HOSTS=demo-fio-tests      # baseline only
+   make target HOSTS=demo-fio-tests-dev  # dev only
+   make target                           # both nodes
+   ```
+
+2. **Test with different configurations**:
+   - A/B testing enabled/disabled
+   - Different node configurations
+   - Varying group memberships
+
+3. **Validate variable values**:
+   ```bash
+   # Check resolved variables
+   grep "active_.*:" extra_vars.yaml
+
+   # Verify node-specific settings
+   make target AV=2 | grep -A5 -B5 "Set.*fact"
+   ```
+
+### Common Patterns to Avoid
+
+#### Anti-Pattern 1: Assuming Variable Availability
+```yaml
+# DON'T: Assume hostvars access will work
+- name: Use dev node variable
+  command: "build {{ hostvars[groups['dev'][0]]['target_ref'] }}"
+```
+
+#### Anti-Pattern 2: Complex Conditional Logic in Variable Access
+```yaml
+# DON'T: Bury complex logic in variable expressions
+- name: Set complex variable
+  set_fact:
+    value: "{{ 'dev' in group_names | ternary(dev_value, baseline_value) if condition else other_value }}"
+```
+
+#### Anti-Pattern 3: Side Effects in Variable Resolution
+```yaml
+# DON'T: Modify state during variable resolution
+- name: Set variable with side effects
+  set_fact:
+    computed_value: "{{ some_value }}"
+  changed_when: true  # Misleading change indication
+```
+
+### Recommended Patterns
+
+#### Pattern 1: Explicit Variable Resolution Phase
+```yaml
+# Phase 1: Collect per-node data
+- name: Collect node-specific configurations
+  set_fact:
+    node_config: "{{ local_config }}"
+
+# Phase 2: Resolve for shared operations
+- name: Resolve shared configuration
+  set_fact:
+    shared_config: "{{ resolved_value }}"
+  run_once: true
+  delegate_to: localhost
+
+# Phase 3: Execute with resolved config
+- name: Execute shared operation
+  command: "process {{ shared_config }}"
+  run_once: true
+  delegate_to: localhost
+```
+
+#### Pattern 2: Configuration-Driven Variable Resolution
+```yaml
+# Use configuration variables that are globally accessible
+- name: Resolve kernel reference
+  set_fact:
+    active_kernel_ref: >-
+      {{
+        bootlinux_dev_tree_ref
+        if (kdevops_baseline_and_dev|bool and bootlinux_ab_different_ref|bool)
+        else target_linux_ref
+      }}
+  run_once: true
+  delegate_to: localhost
+```
+
+This approach avoids the fragile `hostvars` access pattern and relies on
+configuration variables that are available in all execution contexts.
+
 ## Prompt Examples
 
 Refer to PROMPTS.md for example set of prompts used to generate code on
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] gen_nodes/gen_hosts: avoid usage of fs_config_path on task names
  2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 1/4] Makefile: add make style for style checking Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 2/4] CLAUDE.md: new workflow guide for hosts and nodes Luis Chamberlain
@ 2025-07-26  1:16 ` Luis Chamberlain
  2025-07-26  1:16 ` [PATCH 4/4] bootlinux: add support for A/B kernel testing Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26  1:16 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

The variable fs_config_path is not defined when we don't enable
fstests and so we'll get a warning about it not being defined.
Fix this by not using it on task names.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 playbooks/roles/gen_hosts/tasks/main.yml | 4 ++--
 playbooks/roles/gen_nodes/tasks/main.yml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml
index 58ebd1d..c10abc5 100644
--- a/playbooks/roles/gen_hosts/tasks/main.yml
+++ b/playbooks/roles/gen_hosts/tasks/main.yml
@@ -49,12 +49,12 @@
     - ansible_hosts_template.stat.exists
   tags: vars
 
-- name: Verify fstest config file file exists {{ fs_config_path }}
+- name: Verify fstest config file exists
   stat:
     path: "{{ fs_config_path }}"
   register: fstests_config_file_reg
   when:
-    - is_fstests
+    - is_fstests|bool
 
 - name: Generate the Ansible hosts file
   tags: [ 'hosts' ]
diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml
index ebaba92..cb8459a 100644
--- a/playbooks/roles/gen_nodes/tasks/main.yml
+++ b/playbooks/roles/gen_nodes/tasks/main.yml
@@ -101,7 +101,7 @@
     - ansible_nodes_template.stat.exists
   tags: vars
 
-- name: Verify fstest config file file exists {{ fs_config_path }}
+- name: Verify fstest config file exists
   stat:
     path: "{{ fs_config_path }}"
   register: fstests_config_file_reg
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
                   ` (2 preceding siblings ...)
  2025-07-26  1:16 ` [PATCH 3/4] gen_nodes/gen_hosts: avoid usage of fs_config_path on task names Luis Chamberlain
@ 2025-07-26  1:16 ` Luis Chamberlain
  2025-07-26 18:00   ` Chuck Lever
  3 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26  1:16 UTC (permalink / raw)
  To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Right now we use the same kernel for all target nodes. We want to
compare and contrast different kenrels for different features. We
add support for A/B testing by leveraging the baseline and dev groups
provided to us by KDEVOPS_BASELINE_AND_DEV.

This extends the bootlinux playbook by enabling us to allow a different
kernel tree / ref to be used for the dev group. This just becomes a
configuration thing. The targets are intuitive:

  make linux                 # Handles A/B compilation transparently
  make linux-baseline        # Build and install baseline kernel only
  make linux-dev             # Build and install development kernel only

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 PROMPTS.md                                  |  48 ++++++
 docs/kdevops-make-linux.md                  | 158 ++++++++++++++++++++
 playbooks/roles/bootlinux/defaults/main.yml |  12 ++
 playbooks/roles/bootlinux/tasks/main.yml    |  99 +++++++++++-
 scripts/infer_last_stable_kernel.sh         |  35 +++++
 workflows/linux/Kconfig                     | 116 +++++++++++++-
 workflows/linux/Makefile                    |  39 +++++
 7 files changed, 500 insertions(+), 7 deletions(-)
 create mode 100755 scripts/infer_last_stable_kernel.sh

diff --git a/PROMPTS.md b/PROMPTS.md
index a4ecf39..3fde7e9 100644
--- a/PROMPTS.md
+++ b/PROMPTS.md
@@ -123,3 +123,51 @@ source "workflows/mmtests/Kconfig.thpchallenge"
 source "workflows/mmtests/Kconfig.fs"
 
    This separation is preferred as it helps us scale.
+
+## Kernel development and A/B testing support
+
+### Adding A/B kernel testing support for different kernel versions
+
+**Prompt:**
+We want to add support for when users enable KDEVOPS_BASELINE_AND_DEV we want
+to extend workflows/linux/Kconfig with the a choise set of options to either a)
+use the same kernel ref or b) allow the user to specify a different ref tag.
+This will enable A/B testing with different kernel versions. When a different
+kernel refs are desirable we will want to extend the compilation step and
+installation of the Linux kernel in two steps. The first will be for the ref
+and target of A (baseline tag) and the second will be for the target ref of B
+(dev tag). However we want to fold these two steps in one for when
+KDEVOPS_BASELINE_AND_DEV is used and make install is used, it would happen
+transparently for us. The resulting linux kernel directory would end up with
+the "dev" ref at the end. In case a user wants to re-compile a target ref for
+baseline or dev we want to add (if we don't have already) a make linux-baseline
+and make linux-dev so that we can build and install the target ref tag on the
+baseline (A) or dev (B). The make linux target then would serially do make
+linux-baseline and make linux-dev. Extend documentation for all this and also
+add the respective prompt to PROMPTS.md once done. Avoid adding extra spaces to
+code or documentation at the end of each line. These end up in red color on
+diffs and hurt my eyes. Extend CLAUDE.md to understand styling for these rules
+about not wanting lines ending in white space for styling.
+
+**AI:** Claude Code
+**Commit:** [To be determined]
+**Result:** Complete A/B kernel testing implementation with comprehensive configuration options.
+**Grading:** 70%
+
+**Notes:**
+
+The implementation successfully added:
+
+1. **Makefile Implementation**: the AI failed to grasp the value of
+   output yaml, and made ugly Makefile changes to extract variables.
+
+2. **Ansible Integration**: The AI failed to write the required changes on
+   the ansible playbook at first. A secondary prompt made it just move the
+   definitions to the ansible playbook but failed to address serially compiling
+   linux for the baseline group first followed by the dev group after.
+
+3. **Documentation**: The AI is not grasping the preference to respect 80
+   character lengths.
+
+4. **Menus**: The AI didn't do a good job at placing menus in a way that
+   would make more intuitive sense for users.
diff --git a/docs/kdevops-make-linux.md b/docs/kdevops-make-linux.md
index e68eee5..8f54372 100644
--- a/docs/kdevops-make-linux.md
+++ b/docs/kdevops-make-linux.md
@@ -13,3 +13,161 @@ To verify the kernel on it:
 ```bash
 make uname
 ```
+
+## A/B Kernel Testing
+
+kdevops supports A/B testing with different kernel versions when
+`KDEVOPS_BASELINE_AND_DEV` is enabled. This allows you to compare performance
+or behavior between different kernel versions across baseline and development nodes.
+
+### Configuration Options
+
+When A/B testing is enabled, you can choose between two approaches:
+
+#### Same Kernel Reference (Default)
+Use the same kernel tree and reference for both baseline and dev nodes:
+```
+A/B kernel testing configuration (BOOTLINUX_AB_SAME_REF) [Y/n/?]
+```
+
+This is useful for testing configuration changes or different test parameters
+with identical kernels.
+
+#### Different Kernel References
+Use different kernel references for baseline and dev nodes:
+```
+A/B kernel testing configuration
+  1. Use same kernel reference for baseline and dev (BOOTLINUX_AB_SAME_REF)
+> 2. Use different kernel references for baseline and dev (BOOTLINUX_AB_DIFFERENT_REF)
+```
+
+This enables testing between different kernel versions, commits, or branches.
+
+When using different references, configure:
+- **Development kernel tree URL**: Git repository (defaults to baseline tree)
+- **Development kernel reference**: Branch, tag, or commit (e.g., "v6.8", "linux-next")
+- **Development kernel release/local version**: Custom version strings for identification
+
+### Make Targets
+
+#### Standard Linux Building
+```bash
+make linux                 # Build and install kernels for all nodes
+```
+
+When A/B testing with different references is enabled, this automatically:
+1. Builds and installs baseline kernel on baseline nodes
+2. Builds and installs development kernel on dev nodes
+3. Leaves the working directory with the dev kernel checked out
+
+#### Individual Node Targeting
+```bash
+make linux-baseline        # Build and install kernel for baseline nodes only
+make linux-dev             # Build and install kernel for dev nodes only
+```
+
+These targets are available when `KDEVOPS_BASELINE_AND_DEV=y` and allow
+selective building and installation.
+
+### Usage Examples
+
+#### Testing Kernel Versions
+Compare v6.7 (baseline) vs v6.8 (development):
+
+```bash
+# Configure baseline kernel
+menuconfig → Workflows → Linux kernel → Git tree to clone: linus
+            Reference to use: v6.7
+
+# Configure A/B testing
+menuconfig → Workflows → Linux kernel → A/B kernel testing
+            → Use different kernel references
+            → Development kernel reference: v6.8
+
+make bringup               # Provision baseline and dev nodes
+make linux                 # Install v6.7 on baseline, v6.8 on dev
+make fstests               # Run tests on both kernel versions
+make fstests-compare       # Compare results between versions
+```
+
+#### Testing Development Branches
+Compare stable vs linux-next:
+
+```bash
+# Baseline: stable kernel
+menuconfig → Reference to use: v6.8
+
+# Development: linux-next
+menuconfig → A/B kernel testing → Development kernel reference: linux-next
+
+make linux-baseline        # Install stable kernel on baseline nodes
+make linux-dev             # Install linux-next on dev nodes
+```
+
+#### Bisection Support
+Test specific commits during bisection:
+
+```bash
+# Update development reference for bisection
+menuconfig → Development kernel reference: abc123def
+
+make linux-dev             # Install bisection commit on dev nodes
+# Run tests and analyze results
+```
+
+### Working Directory State
+
+After running `make linux` with different references:
+- The Linux source directory contains the **development kernel** checkout
+- Both baseline and dev nodes have their respective kernels installed
+- Use `git log --oneline -5` to verify the current checkout
+
+To switch the working directory to baseline:
+```bash
+git checkout v6.7          # Switch to baseline reference
+```
+
+### Integration with Testing Workflows
+
+A/B kernel testing integrates seamlessly with all kdevops testing workflows:
+
+```bash
+# Run fstests with kernel comparison
+make linux                 # Install different kernels
+make fstests               # Test both kernel versions
+make fstests-compare       # Generate comparison analysis
+
+# Run fio-tests with kernel comparison
+make linux                 # Install different kernels
+make fio-tests             # Performance test both kernels
+make fio-tests-compare     # Compare performance metrics
+
+# Run sysbench with kernel comparison
+make linux                 # Install different kernels
+make sysbench              # Database tests on both kernels
+```
+
+### Best Practices
+
+1. **Version Identification**: Use descriptive kernel release versions to distinguish builds
+2. **Sequential Testing**: Install kernels before running test workflows
+3. **Result Organization**: Use baseline/dev labels in test result analysis
+4. **Git Management**: Keep track of which reference is currently checked out
+5. **Systematic Comparison**: Use `*-compare` targets for meaningful analysis
+
+### Troubleshooting
+
+#### Build Failures
+- Ensure both kernel references are valid and accessible
+- Check that build dependencies are installed on all nodes
+- Verify git repository permissions and network connectivity
+
+#### Version Conflicts
+- Use different `kernelrelease` and `localversion` settings for clear identification
+- Check `/boot` directory for kernel installation conflicts
+- Verify GRUB configuration after kernel installation
+
+#### Node Targeting Issues
+- Confirm `KDEVOPS_BASELINE_AND_DEV=y` is enabled
+- Verify baseline and dev node groups exist in inventory
+- Check ansible host patterns with `make linux-baseline HOSTS=baseline`
diff --git a/playbooks/roles/bootlinux/defaults/main.yml b/playbooks/roles/bootlinux/defaults/main.yml
index fd5674b..4146292 100644
--- a/playbooks/roles/bootlinux/defaults/main.yml
+++ b/playbooks/roles/bootlinux/defaults/main.yml
@@ -52,3 +52,15 @@ kdevops_workflow_enable_cxl: False
 
 bootlinux_cxl_test: False
 bootlinux_tree_set_by_cli: False
+
+# A/B testing defaults
+bootlinux_ab_same_ref: True
+bootlinux_ab_different_ref: False
+
+# Development kernel settings (used when bootlinux_ab_different_ref is True)
+bootlinux_dev_tree: ""
+bootlinux_dev_tree_ref: "master"
+bootlinux_dev_tree_kernelrelease: ""
+bootlinux_dev_tree_localversion: ""
+bootlinux_tree_custom_kernelrelease: False
+bootlinux_tree_custom_localversion: false
diff --git a/playbooks/roles/bootlinux/tasks/main.yml b/playbooks/roles/bootlinux/tasks/main.yml
index 7671389..283ac88 100644
--- a/playbooks/roles/bootlinux/tasks/main.yml
+++ b/playbooks/roles/bootlinux/tasks/main.yml
@@ -61,6 +61,74 @@
   when:
     - not kdevops_baseline_and_dev|bool
 
+- name: Determine if this is a dev node for A/B testing
+  set_fact:
+    bootlinux_is_dev_node: "{{ ansible_hostname | regex_search('^.*-dev$') is not none }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+
+- name: Set development group full custom kernel release
+  set_fact:
+    target_linux_kernelrelease: "{{ bootlinux_dev_tree_kernelrelease if bootlinux_dev_tree_kernelrelease != '' else target_linux_kernelrelease }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_tree_custom_kernelrelease|bool
+    - bootlinux_is_dev_node|default(false)|bool
+
+- name: Set development group local append version
+  set_fact:
+    target_linux_localversion: "{{ bootlinux_dev_tree_localversion if bootlinux_dev_tree_localversion != '' else target_linux_localversion }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_tree_custom_localversion|bool
+    - bootlinux_is_dev_node|default(false)|bool
+
+- name: Set development kernel parameters for dev nodes
+  set_fact:
+    target_linux_git: "{{ bootlinux_dev_tree if bootlinux_dev_tree != '' else target_linux_git }}"
+    target_linux_ref: "{{ bootlinux_dev_tree_ref }}"
+    target_linux_config: "config-{{ bootlinux_dev_tree_ref }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_is_dev_node|default(false)|bool
+
+- name: Determine active kernel parameters for A/B testing with 9P
+  set_fact:
+    active_linux_ref: "{{ bootlinux_dev_tree_ref }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_tree_custom_kernelrelease|bool
+    - bootlinux_9p|bool
+  run_once: true
+  delegate_to: localhost
+
+- name: Determine full custom kernel release for A/B testing with 9P
+  set_fact:
+    active_linux_kernelrelease: "{{ hostvars[groups['dev'][0]]['target_linux_kernelrelease'] if hostvars[groups['dev'][0]]['target_linux_kernelrelease'] is defined else target_linux_kernelrelease }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_tree_custom_kernelrelease|bool
+    - bootlinux_9p|bool
+  run_once: true
+  delegate_to: localhost
+
+- name: Determine localversion kernel release for A/B testing with 9P
+  set_fact:
+    active_linux_localversion: "{{ hostvars[groups['dev'][0]]['target_linux_localversion'] if hostvars[groups['dev'][0]]['target_linux_localversion'] is defined else target_linux_localversion }}"
+  when:
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+    - bootlinux_tree_custom_localversion|bool
+    - bootlinux_9p|bool
+  run_once: true
+  delegate_to: localhost
+
 - include_role:
     name: create_data_partition
 
@@ -412,6 +480,20 @@
     - not bootlinux_9p|bool
     - snaik_oil_file.stat.exists
 
+- name: Checkout correct git ref for A/B testing with 9P
+  git:
+    repo: "{{ target_linux_git }}"
+    dest: "{{ bootlinux_9p_host_path }}"
+    version: "{{ active_linux_ref | default(target_linux_ref) }}"
+    force: yes
+  tags: [ 'build-linux' ]
+  when:
+    - bootlinux_9p|bool
+    - kdevops_baseline_and_dev|bool
+    - bootlinux_ab_different_ref|bool
+  run_once: true
+  delegate_to: localhost
+
 - name: Get nproc on the control node
   command: "{{ num_jobs }}"
   tags: [ 'build-linux', 'cxl-build' ]
@@ -429,21 +511,24 @@
   tags: [ 'build-linux' ]
   when:
     - bootlinux_9p|bool
-    - target_linux_kernelrelease | length > 0
+    - (active_linux_kernelrelease | default(target_linux_kernelrelease)) | length > 0
   run_once: true
   delegate_to: localhost
 
-- name: Generate user kernelrelease {{ target_linux_kernelversion.stdout }}-{{ target_linux_kernelrelease }}
+- name: Generate user kernelrelease
   set_fact:
-    target_user_kernelrelease: "{{ target_linux_kernelversion.stdout }}-{{ target_linux_kernelrelease }}"
+    target_user_kernelrelease: "{{ target_linux_kernelversion.stdout }}-{{ active_linux_kernelrelease | default(target_linux_kernelrelease) }}"
   tags: [ 'build-linux' ]
   when:
     - bootlinux_9p|bool
-    - target_linux_kernelrelease | length > 0
+    - bootlinux_tree_custom_kernelrelease|bool
+    - (active_linux_kernelrelease | default(target_linux_kernelrelease)) | length > 0
+    - target_linux_kernelversion is defined
+    - target_linux_kernelversion.stdout is defined
   run_once: true
   delegate_to: localhost
 
-- name: Build {{ target_linux_tree }} {{ target_user_kernelrelease }} on the control node using {{ nproc_9p.stdout }} threads
+- name: Build {{ target_linux_tree }} with custom kernel release on the control node using {{ nproc_9p.stdout }} threads
   make:
     jobs: "{{ nproc_9p.stdout }}"
     chdir: "{{ bootlinux_9p_host_path }}"
@@ -452,11 +537,13 @@
   tags: [ 'build-linux' ]
   when:
     - bootlinux_9p|bool
+    - bootlinux_tree_custom_kernelrelease|bool
     - target_linux_kernelrelease | length > 0
+    - target_user_kernelrelease is defined
   run_once: true
   delegate_to: localhost
 
-- name: Build {{ target_linux_tree }} {{ target_user_kernelrelease }} on the control node using {{ nproc_9p.stdout }} threads
+- name: Build {{ target_linux_tree }} on the control node using {{ nproc_9p.stdout }} threads
   make:
     jobs: "{{ nproc_9p.stdout }}"
     chdir: "{{ bootlinux_9p_host_path }}"
diff --git a/scripts/infer_last_stable_kernel.sh b/scripts/infer_last_stable_kernel.sh
new file mode 100755
index 0000000..9cc19a9
--- /dev/null
+++ b/scripts/infer_last_stable_kernel.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+# This script infers the last stable kernel version from the git repository.
+# It looks for the most recent non-rc tag (e.g., v6.14, v6.13) that would
+# be a good default for A/B testing with different kernel references.
+
+GIT_TREE="${1:-/mirror/linux.git}"
+
+if [ ! -d "$GIT_TREE" ]; then
+    echo "v6.12"  # fallback if no git tree available
+    exit 0
+fi
+
+# Get all v6.x tags, excluding release candidates
+# Sort them by version and get the last stable release
+LAST_STABLE=$(git --git-dir="$GIT_TREE" tag --list 'v6.*' | \
+    grep -v -- '-rc' | \
+    sort -V | \
+    tail -1)
+
+if [ -z "$LAST_STABLE" ]; then
+    # If no stable v6.x found, try v5.x as fallback
+    LAST_STABLE=$(git --git-dir="$GIT_TREE" tag --list 'v5.*' | \
+        grep -v -- '-rc' | \
+        sort -V | \
+        tail -1)
+fi
+
+# Final fallback if nothing found
+if [ -z "$LAST_STABLE" ]; then
+    echo "v6.12"
+else
+    echo "$LAST_STABLE"
+fi
\ No newline at end of file
diff --git a/workflows/linux/Kconfig b/workflows/linux/Kconfig
index 183ac77..d08270e 100644
--- a/workflows/linux/Kconfig
+++ b/workflows/linux/Kconfig
@@ -139,6 +139,41 @@ config BOOTLINUX_CUSTOM
 
 endchoice
 
+if KDEVOPS_BASELINE_AND_DEV
+
+choice
+	prompt "A/B kernel testing configuration"
+	default BOOTLINUX_AB_DIFFERENT_REF
+	help
+	  When A/B testing is enabled, you can choose to use the same
+	  kernel reference for both baseline and dev nodes, or specify
+	  different kernel references to test different kernel versions.
+	  We default to assuming you want to test a different kernel on
+	  each.
+
+config BOOTLINUX_AB_SAME_REF
+	bool "Use same kernel reference for baseline and dev"
+	output yaml
+	help
+	  Use the same kernel tree and reference for both baseline and
+	  development nodes. This is useful for testing configuration
+	  changes or different test parameters with the same kernel.
+
+config BOOTLINUX_AB_DIFFERENT_REF
+	bool "Use different kernel references for baseline and dev"
+	output yaml
+	help
+	  Use different kernel references for baseline and development
+	  nodes. This enables testing between different kernel versions,
+	  commits, or branches. The baseline will use the main configured
+	  kernel reference, while dev uses a separate reference.
+
+endchoice
+
+endif
+
+menu "A -     main    group kernel configuration"
+
 source "workflows/linux/Kconfig.linus"
 source "workflows/linux/Kconfig.stable"
 source "workflows/linux/Kconfig.dev"
@@ -180,6 +215,65 @@ config BOOTLINUX_TREE_CUSTOM_REF
 
 endif # BOOTLINUX_CUSTOM
 
+endmenu
+
+if KDEVOPS_BASELINE_AND_DEV
+
+if BOOTLINUX_AB_DIFFERENT_REF
+
+menu "B - development group kernel configuration"
+
+config BOOTLINUX_DEV_TREE
+	string "B group development kernel tree URL"
+	output yaml
+	default BOOTLINUX_TREE
+	help
+	  Git tree URL for the development kernel. If left empty or same
+	  as the baseline tree, the same tree will be used with a different
+	  reference. This allows testing different branches or forks.
+
+config BOOTLINUX_DEV_TREE_REF
+	string "B group development kernel reference"
+	output yaml
+	default $(shell, scripts/infer_last_stable_kernel.sh)
+	help
+	  Git reference (branch, tag, or commit) for the development kernel.
+	  This should be different from the baseline reference to enable
+	  meaningful A/B comparison between kernel versions.
+
+	  The default is automatically inferred as the most recent stable
+	  kernel version (e.g., v6.15) from the git repository.
+
+	  Examples:
+	  - "v6.8" (stable release)
+	  - "linux-next" (latest development)
+	  - "v6.7..v6.8" (range for bisection)
+	  - commit SHA (specific commit)
+
+config BOOTLINUX_DEV_TREE_KERNELRELEASE
+	string "Development kernel release version"
+	depends on BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
+	output yaml
+	help
+	  The string here (e.g. 'devel') will be appended to the result of make
+	  kernelversion. Example: '6.8.0-rc3-devel' but only for the dev group.
+	  Leave it empty unless you want a custom tag at the end.
+
+config BOOTLINUX_DEV_TREE_LOCALVERSION
+	string "Development kernel local version"
+	output yaml
+	depends on BOOTLINUX_TREE_CUSTOM_LOCALVERSION
+	default BOOTLINUX_TREE_LOCALVERSION
+	help
+	  The Linux local version to use for the development kernel (for uname).
+	  If left empty, will use the same as baseline.
+
+endmenu
+
+endif # BOOTLINUX_AB_DIFFERENT_REF
+
+endif # KDEVOPS_BASELINE_AND_DEV
+
 # This ends up being the directory name used for the /data/ partition
 # where linux is deployed on the nodes.
 config BOOTLINUX_TREE_NAME
@@ -264,23 +358,39 @@ config BOOTLINUX_TREE_REF
 	default BOOTLINUX_TREE_CEL_LINUX_REF if BOOTLINUX_TREE_CEL_LINUX
 	default BOOTLINUX_TREE_CUSTOM_REF if BOOTLINUX_CUSTOM
 
+config BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
+	bool "Do you want a full custom kernel release name?"
+	output yaml
+	help
+	  Do you want a full custom Linux kernel release which will be output
+	  through uname?
+
 config BOOTLINUX_TREE_KERNELRELEASE
 	string "Linux kernel release version to use"
+	depends on BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
 	help
 	  The Linux kernel release version to use (for uname).
 
 	  The string here (e.g. 'devel') will be appended to the result of make
 	  kernelversion. Example: '6.8.0-rc3-devel'
 
+config BOOTLINUX_TREE_CUSTOM_LOCALVERSION
+	bool "Do you want to append a custom kernel release tag?"
+	output yaml
+	help
+	  Do you want a full custom Linux kernel release which will be output
+	  through uname?
 
 config BOOTLINUX_TREE_LOCALVERSION
 	string "Linux local version to use"
+	depends on BOOTLINUX_TREE_CUSTOM_LOCALVERSION
 	help
 	  The Linux local version to use (for uname).
 
 config BOOTLINUX_SHALLOW_CLONE
 	bool "Shallow git clone"
-	default y
+	default y if !KDEVOPS_BASELINE_AND_DEV
+	depends on !BOOTLINUX_AB_DIFFERENT_REF
 	help
 	  If enabled the git tree cloned with be cloned using a shallow tree
 	  with history truncated. You want to enable this if you really don't
@@ -291,6 +401,10 @@ config BOOTLINUX_SHALLOW_CLONE
 	  just using the targets as dummy target runners and don't expect to
 	  be using 'git log' on the target guests.
 
+	  This option is automatically disabled when using A/B testing with
+	  different kernel references, as shallow clones may not contain all
+	  the required refs for checkout.
+
 config BOOTLINUX_SHALLOW_CLONE_DEPTH
 	int "Shallow git clone depth"
 	default 30 if BOOTLINUX_TREE_SET_BY_CLI
diff --git a/workflows/linux/Makefile b/workflows/linux/Makefile
index f68c090..ab8bcbb 100644
--- a/workflows/linux/Makefile
+++ b/workflows/linux/Makefile
@@ -71,6 +71,10 @@ PHONY +=  linux-help-menu
 linux-help-menu:
 	@echo "Linux git kernel development options"
 	@echo "linux              - Git clones a linux git tree, build Linux, installs and reboots into it"
+	@if [[ "$(CONFIG_KDEVOPS_BASELINE_AND_DEV)" == "y" ]]; then \
+		echo "linux-baseline     - Build and install kernel for baseline nodes only" ;\
+		echo "linux-dev          - Build and install kernel for dev nodes only" ;\
+	fi
 	@if [[ "$(CONFIG_BOOTLINUX_9P)" == "y" ]]; then \
 		echo "linux-mount        - Mounts 9p path on targets" ;\
 	fi
@@ -89,10 +93,45 @@ linux-help-end:
 LINUX_HELP_EXTRA :=
 
 PHONY += linux
+ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
+ifeq (y,$(CONFIG_BOOTLINUX_AB_DIFFERENT_REF))
+linux: linux-baseline linux-dev
+else
+linux: $(KDEVOPS_NODES)
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
+		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
+		--extra-vars="$(BOOTLINUX_ARGS)" $(LIMIT_HOSTS)
+endif
+else
 linux: $(KDEVOPS_NODES)
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
 		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
 		--extra-vars="$(BOOTLINUX_ARGS)" $(LIMIT_HOSTS)
+endif
+
+PHONY += linux-baseline
+ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
+linux-baseline: $(KDEVOPS_NODES)
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
+		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
+		--extra-vars="$(BOOTLINUX_ARGS)" --limit baseline
+else
+linux-baseline:
+	@echo "linux-baseline requires KDEVOPS_BASELINE_AND_DEV=y"
+	@exit 1
+endif
+
+PHONY += linux-dev
+ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
+linux-dev: $(KDEVOPS_NODES)
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
+		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
+		--extra-vars="$(BOOTLINUX_ARGS)" --limit dev
+else
+linux-dev:
+	@echo "linux-dev requires KDEVOPS_BASELINE_AND_DEV=y"
+	@exit 1
+endif
 
 PHONY += linux-mount
 linux-mount:
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26  1:16 ` [PATCH 4/4] bootlinux: add support for A/B kernel testing Luis Chamberlain
@ 2025-07-26 18:00   ` Chuck Lever
  2025-07-26 20:21     ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-07-26 18:00 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Daniel Gomez, kdevops

On 7/25/25 9:16 PM, Luis Chamberlain wrote:
> Right now we use the same kernel for all target nodes. We want to
> compare and contrast different kenrels for different features. We
> add support for A/B testing by leveraging the baseline and dev groups
> provided to us by KDEVOPS_BASELINE_AND_DEV.
> 
> This extends the bootlinux playbook by enabling us to allow a different
> kernel tree / ref to be used for the dev group. This just becomes a
> configuration thing. The targets are intuitive:
> 
>   make linux                 # Handles A/B compilation transparently
>   make linux-baseline        # Build and install baseline kernel only
>   make linux-dev             # Build and install development kernel only

My "build the kernel once and package it" patches are still under test,
but this patch conflicts heavily with that work. I'm not sure how to
reconcile it, but if you feel this patch is ready to merge now, let's
commit it and I will rework my set.

What my set does is change the operation of the "make linux" target so
that, depending on Kconfig settings, it either:

- Works as it does today (builds the kernel on each test runner)
- Builds the kernel and packages it on a separate node, saving the
  package
- Finds the packaged kernel and installs it on the test runners

Along the way there are a number of clean-ups suggested by Daniel,
including improving the selection of kernel .configs, and splitting
apart the tasks for 9p and non-9p builds.

It seems to me that we want to add a lot of complexity and nuance to
the "make linux" target. It might make sense to first split bootlinux
into three playbooks:

1. build this { URL, commit hash } tuple
2. install from the source tree or existing package
3. handle the grub details

Then A/B testing (and other new features) can be built on top of those
smaller plays.

Or, simply add a new "make linux-ab" target for this use case; I'm
guessing that the current set of "make linux" tasks might be utilized
far more frequently than this specialized case.

(Architecting the kdevops UX might be a little beyond the skill set of
AI code generators right at the moment)

More comments below...


> Generated-by: Claude AI
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  PROMPTS.md                                  |  48 ++++++
>  docs/kdevops-make-linux.md                  | 158 ++++++++++++++++++++
>  playbooks/roles/bootlinux/defaults/main.yml |  12 ++
>  playbooks/roles/bootlinux/tasks/main.yml    |  99 +++++++++++-
>  scripts/infer_last_stable_kernel.sh         |  35 +++++
>  workflows/linux/Kconfig                     | 116 +++++++++++++-
>  workflows/linux/Makefile                    |  39 +++++
>  7 files changed, 500 insertions(+), 7 deletions(-)
>  create mode 100755 scripts/infer_last_stable_kernel.sh
> 
> diff --git a/PROMPTS.md b/PROMPTS.md
> index a4ecf39..3fde7e9 100644
> --- a/PROMPTS.md
> +++ b/PROMPTS.md
> @@ -123,3 +123,51 @@ source "workflows/mmtests/Kconfig.thpchallenge"
>  source "workflows/mmtests/Kconfig.fs"
>  
>     This separation is preferred as it helps us scale.
> +
> +## Kernel development and A/B testing support
> +
> +### Adding A/B kernel testing support for different kernel versions
> +
> +**Prompt:**
> +We want to add support for when users enable KDEVOPS_BASELINE_AND_DEV we want
> +to extend workflows/linux/Kconfig with the a choise set of options to either a)
> +use the same kernel ref or b) allow the user to specify a different ref tag.
> +This will enable A/B testing with different kernel versions. When a different
> +kernel refs are desirable we will want to extend the compilation step and
> +installation of the Linux kernel in two steps. The first will be for the ref
> +and target of A (baseline tag) and the second will be for the target ref of B
> +(dev tag). However we want to fold these two steps in one for when
> +KDEVOPS_BASELINE_AND_DEV is used and make install is used, it would happen
> +transparently for us. The resulting linux kernel directory would end up with
> +the "dev" ref at the end. In case a user wants to re-compile a target ref for
> +baseline or dev we want to add (if we don't have already) a make linux-baseline
> +and make linux-dev so that we can build and install the target ref tag on the
> +baseline (A) or dev (B). The make linux target then would serially do make
> +linux-baseline and make linux-dev. Extend documentation for all this and also
> +add the respective prompt to PROMPTS.md once done. Avoid adding extra spaces to
> +code or documentation at the end of each line. These end up in red color on
> +diffs and hurt my eyes. Extend CLAUDE.md to understand styling for these rules
> +about not wanting lines ending in white space for styling.
> +
> +**AI:** Claude Code
> +**Commit:** [To be determined]
> +**Result:** Complete A/B kernel testing implementation with comprehensive configuration options.
> +**Grading:** 70%
> +
> +**Notes:**
> +
> +The implementation successfully added:
> +
> +1. **Makefile Implementation**: the AI failed to grasp the value of
> +   output yaml, and made ugly Makefile changes to extract variables.
> +
> +2. **Ansible Integration**: The AI failed to write the required changes on
> +   the ansible playbook at first. A secondary prompt made it just move the
> +   definitions to the ansible playbook but failed to address serially compiling
> +   linux for the baseline group first followed by the dev group after.
> +
> +3. **Documentation**: The AI is not grasping the preference to respect 80
> +   character lengths.
> +
> +4. **Menus**: The AI didn't do a good job at placing menus in a way that
> +   would make more intuitive sense for users.
> diff --git a/docs/kdevops-make-linux.md b/docs/kdevops-make-linux.md
> index e68eee5..8f54372 100644
> --- a/docs/kdevops-make-linux.md
> +++ b/docs/kdevops-make-linux.md
> @@ -13,3 +13,161 @@ To verify the kernel on it:
>  ```bash
>  make uname
>  ```
> +
> +## A/B Kernel Testing
> +
> +kdevops supports A/B testing with different kernel versions when
> +`KDEVOPS_BASELINE_AND_DEV` is enabled. This allows you to compare performance
> +or behavior between different kernel versions across baseline and development nodes.
> +
> +### Configuration Options
> +
> +When A/B testing is enabled, you can choose between two approaches:
> +
> +#### Same Kernel Reference (Default)
> +Use the same kernel tree and reference for both baseline and dev nodes:
> +```
> +A/B kernel testing configuration (BOOTLINUX_AB_SAME_REF) [Y/n/?]
> +```
> +
> +This is useful for testing configuration changes or different test parameters
> +with identical kernels.
> +
> +#### Different Kernel References
> +Use different kernel references for baseline and dev nodes:
> +```
> +A/B kernel testing configuration
> +  1. Use same kernel reference for baseline and dev (BOOTLINUX_AB_SAME_REF)
> +> 2. Use different kernel references for baseline and dev (BOOTLINUX_AB_DIFFERENT_REF)
> +```
> +
> +This enables testing between different kernel versions, commits, or branches.
> +
> +When using different references, configure:
> +- **Development kernel tree URL**: Git repository (defaults to baseline tree)
> +- **Development kernel reference**: Branch, tag, or commit (e.g., "v6.8", "linux-next")
> +- **Development kernel release/local version**: Custom version strings for identification
> +
> +### Make Targets
> +
> +#### Standard Linux Building
> +```bash
> +make linux                 # Build and install kernels for all nodes
> +```
> +
> +When A/B testing with different references is enabled, this automatically:
> +1. Builds and installs baseline kernel on baseline nodes
> +2. Builds and installs development kernel on dev nodes
> +3. Leaves the working directory with the dev kernel checked out
> +
> +#### Individual Node Targeting
> +```bash
> +make linux-baseline        # Build and install kernel for baseline nodes only
> +make linux-dev             # Build and install kernel for dev nodes only
> +```
> +
> +These targets are available when `KDEVOPS_BASELINE_AND_DEV=y` and allow
> +selective building and installation.
> +
> +### Usage Examples
> +
> +#### Testing Kernel Versions
> +Compare v6.7 (baseline) vs v6.8 (development):
> +
> +```bash
> +# Configure baseline kernel
> +menuconfig → Workflows → Linux kernel → Git tree to clone: linus
> +            Reference to use: v6.7
> +
> +# Configure A/B testing
> +menuconfig → Workflows → Linux kernel → A/B kernel testing
> +            → Use different kernel references
> +            → Development kernel reference: v6.8
> +
> +make bringup               # Provision baseline and dev nodes
> +make linux                 # Install v6.7 on baseline, v6.8 on dev
> +make fstests               # Run tests on both kernel versions
> +make fstests-compare       # Compare results between versions
> +```
> +
> +#### Testing Development Branches
> +Compare stable vs linux-next:
> +
> +```bash
> +# Baseline: stable kernel
> +menuconfig → Reference to use: v6.8
> +
> +# Development: linux-next
> +menuconfig → A/B kernel testing → Development kernel reference: linux-next
> +
> +make linux-baseline        # Install stable kernel on baseline nodes
> +make linux-dev             # Install linux-next on dev nodes
> +```
> +
> +#### Bisection Support
> +Test specific commits during bisection:
> +
> +```bash
> +# Update development reference for bisection
> +menuconfig → Development kernel reference: abc123def
> +
> +make linux-dev             # Install bisection commit on dev nodes
> +# Run tests and analyze results
> +```
> +
> +### Working Directory State
> +
> +After running `make linux` with different references:
> +- The Linux source directory contains the **development kernel** checkout
> +- Both baseline and dev nodes have their respective kernels installed
> +- Use `git log --oneline -5` to verify the current checkout
> +
> +To switch the working directory to baseline:
> +```bash
> +git checkout v6.7          # Switch to baseline reference
> +```
> +
> +### Integration with Testing Workflows
> +
> +A/B kernel testing integrates seamlessly with all kdevops testing workflows:
> +
> +```bash
> +# Run fstests with kernel comparison
> +make linux                 # Install different kernels
> +make fstests               # Test both kernel versions
> +make fstests-compare       # Generate comparison analysis
> +
> +# Run fio-tests with kernel comparison
> +make linux                 # Install different kernels
> +make fio-tests             # Performance test both kernels
> +make fio-tests-compare     # Compare performance metrics

I know that Chinner has made claims about how closely his QEMU-based
performance testing approaches the same results from bare metal testing.
Even so ...

This example is a little dangerous. fio results (especially latency)
will depend on there being enough idle CPU horsepower on the hypervisor
system. For cloud, all bets are off, because we have no control over the
other workloads on that physical host unless we have rented a bare metal
instance.

Even in the guestfs world, it's easy to set up a configuration where you
have provisioned more vCPUs than your system has threads / cores. There
can be other workload traffic on the system too, and you can be sure
that running the A and B tests at the same time will interfere with each
other.

"-compare" makes total sense for functional test results. For comparing
automatically generated performance results, there are loads of caveats.

(I rely on fio testing for NFS, so I'm pleased to see the fio-test
workflow -- but still, plenty of care must be taken before the results
can be used for inter-run comparison).


> +
> +# Run sysbench with kernel comparison
> +make linux                 # Install different kernels
> +make sysbench              # Database tests on both kernels
> +```
> +
> +### Best Practices
> +
> +1. **Version Identification**: Use descriptive kernel release versions to distinguish builds

This has been an ongoing problem for my "build once, package, and
install everywhere" set. How to identify which package to install,
in particular when the build step and the install step do not take
place during the same "make linux" run?

The kernel's Kconfig has the CONFIG_LOCALVERSION_AUTO option which
adds the 12-hexit commit hash to the name of the kernel:

   6.16.0-rc6-00005-g6b59765c97a3

Which:
- Gives us an identifier tied to a specific version of the code base
- rpmbuild makes part of the kernel package name, like so:

   kernel-6.12.40_rc1_g596aae841edf-2.x86_64.rpm

But CONFIG_LOCALVERSION_AUTO does not identify the particular .config
used to build the kernel. For instance you might want to A/B the same
code base built with different .configs. Or, maybe you are explicitly
intending not to support multiple .configs here.


> +2. **Sequential Testing**: Install kernels before running test workflows
> +3. **Result Organization**: Use baseline/dev labels in test result analysis
> +4. **Git Management**: Keep track of which reference is currently checked out
> +5. **Systematic Comparison**: Use `*-compare` targets for meaningful analysis
> +
> +### Troubleshooting
> +
> +#### Build Failures
> +- Ensure both kernel references are valid and accessible
> +- Check that build dependencies are installed on all nodes
> +- Verify git repository permissions and network connectivity
> +
> +#### Version Conflicts
> +- Use different `kernelrelease` and `localversion` settings for clear identification
> +- Check `/boot` directory for kernel installation conflicts
> +- Verify GRUB configuration after kernel installation
> +
> +#### Node Targeting Issues
> +- Confirm `KDEVOPS_BASELINE_AND_DEV=y` is enabled
> +- Verify baseline and dev node groups exist in inventory
> +- Check ansible host patterns with `make linux-baseline HOSTS=baseline`
> diff --git a/playbooks/roles/bootlinux/defaults/main.yml b/playbooks/roles/bootlinux/defaults/main.yml
> index fd5674b..4146292 100644
> --- a/playbooks/roles/bootlinux/defaults/main.yml
> +++ b/playbooks/roles/bootlinux/defaults/main.yml
> @@ -52,3 +52,15 @@ kdevops_workflow_enable_cxl: False
>  
>  bootlinux_cxl_test: False
>  bootlinux_tree_set_by_cli: False
> +
> +# A/B testing defaults
> +bootlinux_ab_same_ref: True
> +bootlinux_ab_different_ref: False
> +
> +# Development kernel settings (used when bootlinux_ab_different_ref is True)
> +bootlinux_dev_tree: ""
> +bootlinux_dev_tree_ref: "master"
> +bootlinux_dev_tree_kernelrelease: ""
> +bootlinux_dev_tree_localversion: ""
> +bootlinux_tree_custom_kernelrelease: False
> +bootlinux_tree_custom_localversion: false
> diff --git a/playbooks/roles/bootlinux/tasks/main.yml b/playbooks/roles/bootlinux/tasks/main.yml
> index 7671389..283ac88 100644
> --- a/playbooks/roles/bootlinux/tasks/main.yml
> +++ b/playbooks/roles/bootlinux/tasks/main.yml
> @@ -61,6 +61,74 @@
>    when:
>      - not kdevops_baseline_and_dev|bool
>  
> +- name: Determine if this is a dev node for A/B testing
> +  set_fact:
> +    bootlinux_is_dev_node: "{{ ansible_hostname | regex_search('^.*-dev$') is not none }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +
> +- name: Set development group full custom kernel release
> +  set_fact:
> +    target_linux_kernelrelease: "{{ bootlinux_dev_tree_kernelrelease if bootlinux_dev_tree_kernelrelease != '' else target_linux_kernelrelease }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_tree_custom_kernelrelease|bool
> +    - bootlinux_is_dev_node|default(false)|bool
> +
> +- name: Set development group local append version
> +  set_fact:
> +    target_linux_localversion: "{{ bootlinux_dev_tree_localversion if bootlinux_dev_tree_localversion != '' else target_linux_localversion }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_tree_custom_localversion|bool
> +    - bootlinux_is_dev_node|default(false)|bool
> +
> +- name: Set development kernel parameters for dev nodes
> +  set_fact:
> +    target_linux_git: "{{ bootlinux_dev_tree if bootlinux_dev_tree != '' else target_linux_git }}"
> +    target_linux_ref: "{{ bootlinux_dev_tree_ref }}"
> +    target_linux_config: "config-{{ bootlinux_dev_tree_ref }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_is_dev_node|default(false)|bool
> +
> +- name: Determine active kernel parameters for A/B testing with 9P
> +  set_fact:
> +    active_linux_ref: "{{ bootlinux_dev_tree_ref }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_tree_custom_kernelrelease|bool
> +    - bootlinux_9p|bool
> +  run_once: true
> +  delegate_to: localhost
> +
> +- name: Determine full custom kernel release for A/B testing with 9P
> +  set_fact:
> +    active_linux_kernelrelease: "{{ hostvars[groups['dev'][0]]['target_linux_kernelrelease'] if hostvars[groups['dev'][0]]['target_linux_kernelrelease'] is defined else target_linux_kernelrelease }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_tree_custom_kernelrelease|bool
> +    - bootlinux_9p|bool
> +  run_once: true
> +  delegate_to: localhost
> +
> +- name: Determine localversion kernel release for A/B testing with 9P
> +  set_fact:
> +    active_linux_localversion: "{{ hostvars[groups['dev'][0]]['target_linux_localversion'] if hostvars[groups['dev'][0]]['target_linux_localversion'] is defined else target_linux_localversion }}"
> +  when:
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +    - bootlinux_tree_custom_localversion|bool
> +    - bootlinux_9p|bool
> +  run_once: true
> +  delegate_to: localhost
> +
>  - include_role:
>      name: create_data_partition
>  
> @@ -412,6 +480,20 @@
>      - not bootlinux_9p|bool
>      - snaik_oil_file.stat.exists
>  
> +- name: Checkout correct git ref for A/B testing with 9P
> +  git:
> +    repo: "{{ target_linux_git }}"
> +    dest: "{{ bootlinux_9p_host_path }}"
> +    version: "{{ active_linux_ref | default(target_linux_ref) }}"
> +    force: yes
> +  tags: [ 'build-linux' ]
> +  when:
> +    - bootlinux_9p|bool
> +    - kdevops_baseline_and_dev|bool
> +    - bootlinux_ab_different_ref|bool
> +  run_once: true
> +  delegate_to: localhost
> +
>  - name: Get nproc on the control node
>    command: "{{ num_jobs }}"
>    tags: [ 'build-linux', 'cxl-build' ]
> @@ -429,21 +511,24 @@
>    tags: [ 'build-linux' ]
>    when:
>      - bootlinux_9p|bool
> -    - target_linux_kernelrelease | length > 0
> +    - (active_linux_kernelrelease | default(target_linux_kernelrelease)) | length > 0
>    run_once: true
>    delegate_to: localhost
>  
> -- name: Generate user kernelrelease {{ target_linux_kernelversion.stdout }}-{{ target_linux_kernelrelease }}
> +- name: Generate user kernelrelease
>    set_fact:
> -    target_user_kernelrelease: "{{ target_linux_kernelversion.stdout }}-{{ target_linux_kernelrelease }}"
> +    target_user_kernelrelease: "{{ target_linux_kernelversion.stdout }}-{{ active_linux_kernelrelease | default(target_linux_kernelrelease) }}"
>    tags: [ 'build-linux' ]
>    when:
>      - bootlinux_9p|bool
> -    - target_linux_kernelrelease | length > 0
> +    - bootlinux_tree_custom_kernelrelease|bool
> +    - (active_linux_kernelrelease | default(target_linux_kernelrelease)) | length > 0
> +    - target_linux_kernelversion is defined
> +    - target_linux_kernelversion.stdout is defined
>    run_once: true
>    delegate_to: localhost
>  
> -- name: Build {{ target_linux_tree }} {{ target_user_kernelrelease }} on the control node using {{ nproc_9p.stdout }} threads
> +- name: Build {{ target_linux_tree }} with custom kernel release on the control node using {{ nproc_9p.stdout }} threads
>    make:
>      jobs: "{{ nproc_9p.stdout }}"
>      chdir: "{{ bootlinux_9p_host_path }}"
> @@ -452,11 +537,13 @@
>    tags: [ 'build-linux' ]
>    when:
>      - bootlinux_9p|bool
> +    - bootlinux_tree_custom_kernelrelease|bool
>      - target_linux_kernelrelease | length > 0
> +    - target_user_kernelrelease is defined
>    run_once: true
>    delegate_to: localhost
>  
> -- name: Build {{ target_linux_tree }} {{ target_user_kernelrelease }} on the control node using {{ nproc_9p.stdout }} threads
> +- name: Build {{ target_linux_tree }} on the control node using {{ nproc_9p.stdout }} threads
>    make:
>      jobs: "{{ nproc_9p.stdout }}"
>      chdir: "{{ bootlinux_9p_host_path }}"
> diff --git a/scripts/infer_last_stable_kernel.sh b/scripts/infer_last_stable_kernel.sh
> new file mode 100755
> index 0000000..9cc19a9
> --- /dev/null
> +++ b/scripts/infer_last_stable_kernel.sh
> @@ -0,0 +1,35 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: copyleft-next-0.3.1
> +
> +# This script infers the last stable kernel version from the git repository.
> +# It looks for the most recent non-rc tag (e.g., v6.14, v6.13) that would
> +# be a good default for A/B testing with different kernel references.
> +
> +GIT_TREE="${1:-/mirror/linux.git}"
> +
> +if [ ! -d "$GIT_TREE" ]; then
> +    echo "v6.12"  # fallback if no git tree available
> +    exit 0
> +fi
> +
> +# Get all v6.x tags, excluding release candidates
> +# Sort them by version and get the last stable release
> +LAST_STABLE=$(git --git-dir="$GIT_TREE" tag --list 'v6.*' | \
> +    grep -v -- '-rc' | \
> +    sort -V | \
> +    tail -1)
> +
> +if [ -z "$LAST_STABLE" ]; then
> +    # If no stable v6.x found, try v5.x as fallback
> +    LAST_STABLE=$(git --git-dir="$GIT_TREE" tag --list 'v5.*' | \
> +        grep -v -- '-rc' | \
> +        sort -V | \
> +        tail -1)
> +fi
> +
> +# Final fallback if nothing found
> +if [ -z "$LAST_STABLE" ]; then
> +    echo "v6.12"
> +else
> +    echo "$LAST_STABLE"
> +fi
> \ No newline at end of file
> diff --git a/workflows/linux/Kconfig b/workflows/linux/Kconfig
> index 183ac77..d08270e 100644
> --- a/workflows/linux/Kconfig
> +++ b/workflows/linux/Kconfig
> @@ -139,6 +139,41 @@ config BOOTLINUX_CUSTOM
>  
>  endchoice
>  
> +if KDEVOPS_BASELINE_AND_DEV
> +
> +choice
> +	prompt "A/B kernel testing configuration"
> +	default BOOTLINUX_AB_DIFFERENT_REF
> +	help
> +	  When A/B testing is enabled, you can choose to use the same
> +	  kernel reference for both baseline and dev nodes, or specify
> +	  different kernel references to test different kernel versions.
> +	  We default to assuming you want to test a different kernel on
> +	  each.
> +
> +config BOOTLINUX_AB_SAME_REF
> +	bool "Use same kernel reference for baseline and dev"
> +	output yaml
> +	help
> +	  Use the same kernel tree and reference for both baseline and
> +	  development nodes. This is useful for testing configuration
> +	  changes or different test parameters with the same kernel.
> +
> +config BOOTLINUX_AB_DIFFERENT_REF
> +	bool "Use different kernel references for baseline and dev"
> +	output yaml
> +	help
> +	  Use different kernel references for baseline and development
> +	  nodes. This enables testing between different kernel versions,
> +	  commits, or branches. The baseline will use the main configured
> +	  kernel reference, while dev uses a separate reference.
> +
> +endchoice
> +
> +endif
> +
> +menu "A -     main    group kernel configuration"
> +
>  source "workflows/linux/Kconfig.linus"
>  source "workflows/linux/Kconfig.stable"
>  source "workflows/linux/Kconfig.dev"
> @@ -180,6 +215,65 @@ config BOOTLINUX_TREE_CUSTOM_REF
>  
>  endif # BOOTLINUX_CUSTOM
>  
> +endmenu
> +
> +if KDEVOPS_BASELINE_AND_DEV
> +
> +if BOOTLINUX_AB_DIFFERENT_REF
> +
> +menu "B - development group kernel configuration"
> +
> +config BOOTLINUX_DEV_TREE
> +	string "B group development kernel tree URL"
> +	output yaml
> +	default BOOTLINUX_TREE
> +	help
> +	  Git tree URL for the development kernel. If left empty or same
> +	  as the baseline tree, the same tree will be used with a different
> +	  reference. This allows testing different branches or forks.
> +
> +config BOOTLINUX_DEV_TREE_REF
> +	string "B group development kernel reference"
> +	output yaml
> +	default $(shell, scripts/infer_last_stable_kernel.sh)
> +	help
> +	  Git reference (branch, tag, or commit) for the development kernel.
> +	  This should be different from the baseline reference to enable
> +	  meaningful A/B comparison between kernel versions.
> +
> +	  The default is automatically inferred as the most recent stable
> +	  kernel version (e.g., v6.15) from the git repository.
> +
> +	  Examples:
> +	  - "v6.8" (stable release)
> +	  - "linux-next" (latest development)
> +	  - "v6.7..v6.8" (range for bisection)
> +	  - commit SHA (specific commit)
> +
> +config BOOTLINUX_DEV_TREE_KERNELRELEASE
> +	string "Development kernel release version"
> +	depends on BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
> +	output yaml
> +	help
> +	  The string here (e.g. 'devel') will be appended to the result of make
> +	  kernelversion. Example: '6.8.0-rc3-devel' but only for the dev group.
> +	  Leave it empty unless you want a custom tag at the end.
> +
> +config BOOTLINUX_DEV_TREE_LOCALVERSION
> +	string "Development kernel local version"
> +	output yaml
> +	depends on BOOTLINUX_TREE_CUSTOM_LOCALVERSION
> +	default BOOTLINUX_TREE_LOCALVERSION
> +	help
> +	  The Linux local version to use for the development kernel (for uname).
> +	  If left empty, will use the same as baseline.
> +
> +endmenu
> +
> +endif # BOOTLINUX_AB_DIFFERENT_REF
> +
> +endif # KDEVOPS_BASELINE_AND_DEV
> +
>  # This ends up being the directory name used for the /data/ partition
>  # where linux is deployed on the nodes.
>  config BOOTLINUX_TREE_NAME
> @@ -264,23 +358,39 @@ config BOOTLINUX_TREE_REF
>  	default BOOTLINUX_TREE_CEL_LINUX_REF if BOOTLINUX_TREE_CEL_LINUX
>  	default BOOTLINUX_TREE_CUSTOM_REF if BOOTLINUX_CUSTOM
>  
> +config BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
> +	bool "Do you want a full custom kernel release name?"
> +	output yaml
> +	help
> +	  Do you want a full custom Linux kernel release which will be output
> +	  through uname?
> +
>  config BOOTLINUX_TREE_KERNELRELEASE
>  	string "Linux kernel release version to use"
> +	depends on BOOTLINUX_TREE_CUSTOM_KERNELRELEASE
>  	help
>  	  The Linux kernel release version to use (for uname).
>  
>  	  The string here (e.g. 'devel') will be appended to the result of make
>  	  kernelversion. Example: '6.8.0-rc3-devel'
>  
> +config BOOTLINUX_TREE_CUSTOM_LOCALVERSION
> +	bool "Do you want to append a custom kernel release tag?"
> +	output yaml
> +	help
> +	  Do you want a full custom Linux kernel release which will be output
> +	  through uname?
>  
>  config BOOTLINUX_TREE_LOCALVERSION
>  	string "Linux local version to use"
> +	depends on BOOTLINUX_TREE_CUSTOM_LOCALVERSION
>  	help
>  	  The Linux local version to use (for uname).
>  
>  config BOOTLINUX_SHALLOW_CLONE
>  	bool "Shallow git clone"
> -	default y
> +	default y if !KDEVOPS_BASELINE_AND_DEV
> +	depends on !BOOTLINUX_AB_DIFFERENT_REF
>  	help
>  	  If enabled the git tree cloned with be cloned using a shallow tree
>  	  with history truncated. You want to enable this if you really don't
> @@ -291,6 +401,10 @@ config BOOTLINUX_SHALLOW_CLONE
>  	  just using the targets as dummy target runners and don't expect to
>  	  be using 'git log' on the target guests.
>  
> +	  This option is automatically disabled when using A/B testing with
> +	  different kernel references, as shallow clones may not contain all
> +	  the required refs for checkout.
> +
>  config BOOTLINUX_SHALLOW_CLONE_DEPTH
>  	int "Shallow git clone depth"
>  	default 30 if BOOTLINUX_TREE_SET_BY_CLI
> diff --git a/workflows/linux/Makefile b/workflows/linux/Makefile
> index f68c090..ab8bcbb 100644
> --- a/workflows/linux/Makefile
> +++ b/workflows/linux/Makefile
> @@ -71,6 +71,10 @@ PHONY +=  linux-help-menu
>  linux-help-menu:
>  	@echo "Linux git kernel development options"
>  	@echo "linux              - Git clones a linux git tree, build Linux, installs and reboots into it"
> +	@if [[ "$(CONFIG_KDEVOPS_BASELINE_AND_DEV)" == "y" ]]; then \
> +		echo "linux-baseline     - Build and install kernel for baseline nodes only" ;\
> +		echo "linux-dev          - Build and install kernel for dev nodes only" ;\
> +	fi
>  	@if [[ "$(CONFIG_BOOTLINUX_9P)" == "y" ]]; then \
>  		echo "linux-mount        - Mounts 9p path on targets" ;\
>  	fi
> @@ -89,10 +93,45 @@ linux-help-end:
>  LINUX_HELP_EXTRA :=
>  
>  PHONY += linux
> +ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
> +ifeq (y,$(CONFIG_BOOTLINUX_AB_DIFFERENT_REF))
> +linux: linux-baseline linux-dev
> +else
> +linux: $(KDEVOPS_NODES)
> +	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
> +		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
> +		--extra-vars="$(BOOTLINUX_ARGS)" $(LIMIT_HOSTS)
> +endif
> +else
>  linux: $(KDEVOPS_NODES)
>  	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
>  		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
>  		--extra-vars="$(BOOTLINUX_ARGS)" $(LIMIT_HOSTS)
> +endif
> +
> +PHONY += linux-baseline
> +ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
> +linux-baseline: $(KDEVOPS_NODES)
> +	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
> +		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
> +		--extra-vars="$(BOOTLINUX_ARGS)" --limit baseline
> +else
> +linux-baseline:
> +	@echo "linux-baseline requires KDEVOPS_BASELINE_AND_DEV=y"
> +	@exit 1
> +endif
> +
> +PHONY += linux-dev
> +ifeq (y,$(CONFIG_KDEVOPS_BASELINE_AND_DEV))
> +linux-dev: $(KDEVOPS_NODES)
> +	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) -i \
> +		$(KDEVOPS_HOSTFILE) $(KDEVOPS_PLAYBOOKS_DIR)/bootlinux.yml \
> +		--extra-vars="$(BOOTLINUX_ARGS)" --limit dev
> +else
> +linux-dev:
> +	@echo "linux-dev requires KDEVOPS_BASELINE_AND_DEV=y"
> +	@exit 1
> +endif
>  
>  PHONY += linux-mount
>  linux-mount:


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 18:00   ` Chuck Lever
@ 2025-07-26 20:21     ` Luis Chamberlain
  2025-07-26 21:37       ` Luis Chamberlain
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26 20:21 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Gomez, kdevops

On Sat, Jul 26, 2025 at 02:00:55PM -0400, Chuck Lever wrote:
> On 7/25/25 9:16 PM, Luis Chamberlain wrote:
> > Right now we use the same kernel for all target nodes. We want to
> > compare and contrast different kenrels for different features. We
> > add support for A/B testing by leveraging the baseline and dev groups
> > provided to us by KDEVOPS_BASELINE_AND_DEV.
> > 
> > This extends the bootlinux playbook by enabling us to allow a different
> > kernel tree / ref to be used for the dev group. This just becomes a
> > configuration thing. The targets are intuitive:
> > 
> >   make linux                 # Handles A/B compilation transparently
> >   make linux-baseline        # Build and install baseline kernel only
> >   make linux-dev             # Build and install development kernel only
> 
> My "build the kernel once and package it" patches are still under test,
> but this patch conflicts heavily with that work. I'm not sure how to
> reconcile it,

If you give me a branch, I can just fetch it and ask Claude Code to
refactor my branch with your and take preference for style for your
branch. Its as easy as that.

What do you think?

You can try it as well too, if you like as my tree is public already.
And it may help you test / fix any outstanding bugs.

Any preferred way to go forward?

> but if you feel this patch is ready to merge now, let's
> commit it and I will rework my set.

I rather get this right and focus on proper architecture. We are kernel
hackers and we want to ensure we focus well on this part.

A next feature thing I have on my queue is to have it just have a static
and once a week or so (as we do for the kernel releases) scrape and
update the git.kernel.org trees so they're automatically scraped and
generated for the drop down menu. The manual generation doesn't scale so
well.

> What my set does is change the operation of the "make linux" target so
> that, depending on Kconfig settings, it either:
> 
> - Works as it does today (builds the kernel on each test runner)
> - Builds the kernel and packages it on a separate node, saving the
>   package

I see, OK so an extra option is available to "build on foo system which
is wicked fast".

> - Finds the packaged kernel and installs it on the test runners

Sweeeeet. It would be nice for us to have a sort of registry for kdevops
where if a kernel is already built and available we can use it but alas
we don't yet have any public infrastructure for that. Maybe we can just
regularly later build using OBS daily for next and weekly for RCs and
see if the latest build is availble. That is, we just have a secondary
process which all it does is fetch and build and push to OBS. If using
a vanilla kernel or next kernel then it can use that. This limits the
functionality to standard trees but a possible nice enhacement later.

> Along the way there are a number of clean-ups suggested by Daniel,
> including improving the selection of kernel .configs, and splitting
> apart the tasks for 9p and non-9p builds.

Awesome.

> It seems to me that we want to add a lot of complexity and nuance to
> the "make linux" target. It might make sense to first split bootlinux
> into three playbooks:
> 
> 1. build this { URL, commit hash } tuple
> 2. install from the source tree or existing package
> 3. handle the grub details
> 
> Then A/B testing (and other new features) can be built on top of those
> smaller plays.
> 
> Or, simply add a new "make linux-ab" target for this use case; I'm
> guessing that the current set of "make linux" tasks might be utilized
> far more frequently than this specialized case.

I just exposed make linux-baseline and make make linux-dev, and so the
make linux becomes a sequential operation of these two. I decided to
blend it to make linux just because that's the goal. Its not clear if we
need a different target for when you already configured and know that
you want different kernels for baseline (A) and dev (B). Specially since
I added also an option so that when you enable KDEVOPS_BASELINE_AND_DEV
(essentially what we're using for A/B testing) in the bootlinux menu
you can select if you want both groups to have different kernels or
the same kernel.

So yeah I'm not sure if we really need a linux-ab ?

> (Architecting the kdevops UX might be a little beyond the skill set of
> AI code generators right at the moment)

Indeed. However I'm seeing that once I augment CLAUDE.md with
instructions on preferred style -- it sometimes does pick up on it.
But indeed -- stylistic preferences are best expressed by humans on
CLAUDE.md. And in this case I do think that proper architecture ends
up being something we need to think hard on. I think this is a good
example limitation of boundaries and what we humans end up having to
think more about.

> > +# Run fio-tests with kernel comparison
> > +make linux                 # Install different kernels
> > +make fio-tests             # Performance test both kernels
> > +make fio-tests-compare     # Compare performance metrics
> 
> I know that Chinner has made claims about how closely his QEMU-based
> performance testing approaches the same results from bare metal testing.
> Even so ...
> 
> This example is a little dangerous. fio results (especially latency)
> will depend on there being enough idle CPU horsepower on the hypervisor
> system. For cloud, all bets are off, because we have no control over the
> other workloads on that physical host unless we have rented a bare metal
> instance.

Agreed. Its up to the users to know the above. Its why my fio-tests
patches clarify the same caution you're observing. So I do agree a
revised enhancement on language to caution strongly on this would
be good.

I just use guests to prototype. When I want real data I use bare metal.
And fortunately I'm starting to see bare metal working on kdevops, its
just minor tweaks we have to do. The steady-state stuff I added already
works on bare metal. For other workflows its just a matter of modifying
the create_data role calls to be skipped when SKIP_BRINGUP is enabled,
as its inferred the user would have done this step too. The other one
is that SKIP_BRINGUP should likely select WORKFLOW_INFER_USER_AND_GROUP.
This is so data on /data/ gets the local user / group. And so the way
I intend to use future performance analysis *will* be on bare metal.

Guests are just for prototyping. However there are a few test which one
could *still* likely run with PCI-E passthrough onto SSDs which likely
could still be very useful too. Once we have real data on bare metal we
can compare and contrast against VMs with PCI-E passthorugh (which
kdevops supports) and try to see what the deltas are.

> Even in the guestfs world, it's easy to set up a configuration where you
> have provisioned more vCPUs than your system has threads / cores. There
> can be other workload traffic on the system too, and you can be sure
> that running the A and B tests at the same time will interfere with each
> other.

Yup!

> "-compare" makes total sense for functional test results. For comparing
> automatically generated performance results, there are loads of caveats.

Indeed.

> (I rely on fio testing for NFS, so I'm pleased to see the fio-test
> workflow -- but still, plenty of care must be taken before the results
> can be used for inter-run comparison).

Yes, yes, yes.

I think we're in sync with our preferences for avoiding stupid
performance results. I think the language needs to be enhanced to
ensure those who are not familiar with these issues are brought forward.

Maybe just recommend guests for protyping, and we'd evaluate which
performance matrics *do* make sense with PCI-E passthrough, but advocate
for only true bare metal as the real data. I'm not sure it would help to
have a Kconfig option which would taint performance results or something
like that if the config does not adhere to our accepted norms. Perhaps?
The more I think about it -- the more I like it. Then we'd have semantic
way to express the idea / norms / best practices -- which may also be
useful for the bots.

> > +# Run sysbench with kernel comparison
> > +make linux                 # Install different kernels
> > +make sysbench              # Database tests on both kernels
> > +```
> > +
> > +### Best Practices
> > +
> > +1. **Version Identification**: Use descriptive kernel release versions to distinguish builds
> 
> This has been an ongoing problem for my "build once, package, and
> install everywhere" set. How to identify which package to install,
> in particular when the build step and the install step do not take
> place during the same "make linux" run?
> 
> The kernel's Kconfig has the CONFIG_LOCALVERSION_AUTO option which
> adds the 12-hexit commit hash to the name of the kernel:
> 
>    6.16.0-rc6-00005-g6b59765c97a3
> 
> Which:
> - Gives us an identifier tied to a specific version of the code base
> - rpmbuild makes part of the kernel package name, like so:
> 
>    kernel-6.12.40_rc1_g596aae841edf-2.x86_64.rpm
> 
> But CONFIG_LOCALVERSION_AUTO does not identify the particular .config
> used to build the kernel.

I see -- I thought CONFIG_LOCALVERSION_AUTO was already nice enough,
so you wan to track configs more closely.

Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC and 
to enhance this further perhaps we can also add either upstream to Linux
or on kdevops an equivalent sha256sum of the config. So if not upstream
on Linux we'd add to the kdevops kconfig cat linux/.config | sha256sum
and I'd hope that $(zcat /proc/config.gz | sha256sum) on the runtime on
the booted target DUT would match.

> For instance you might want to A/B the same
> code base built with different .configs. Or, maybe you are explicitly
> intending not to support multiple .configs here.

You are totally right -- let's just extend our semantics to include more
config details as suggested above. Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 20:21     ` Luis Chamberlain
@ 2025-07-26 21:37       ` Luis Chamberlain
  2025-07-26 22:46       ` Luis Chamberlain
  2025-07-26 23:35       ` Chuck Lever
  2 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26 21:37 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Gomez, kdevops

On Sat, Jul 26, 2025 at 01:21:16PM -0700, Luis Chamberlain wrote:
> Maybe just recommend guests for protyping, and we'd evaluate which
> performance matrics *do* make sense with PCI-E passthrough, but advocate
> for only true bare metal as the real data.

Come to think of it -- I forgot our sysbench work which relied on AWS
i4i.4xlarge. I'd hope we can at least agree that's a sensible baseline
for hardware performance.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 20:21     ` Luis Chamberlain
  2025-07-26 21:37       ` Luis Chamberlain
@ 2025-07-26 22:46       ` Luis Chamberlain
  2025-07-26 23:16         ` Chuck Lever
  2025-07-26 23:35       ` Chuck Lever
  2 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26 22:46 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Gomez, kdevops

On Sat, Jul 26, 2025 at 01:21:16PM -0700, Luis Chamberlain wrote:
> Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC and 
> to enhance this further perhaps we can also add either upstream to Linux
> or on kdevops an equivalent sha256sum of the config. So if not upstream
> on Linux we'd add to the kdevops kconfig cat linux/.config | sha256sum
> and I'd hope that $(zcat /proc/config.gz | sha256sum) on the runtime on
> the booted target DUT would match.

Actually, augmenting the upstream way to append the version to *also*
append a checksum of the kernel config would be even more deterministic.
But this is getting out of hand in terms of how long that string would
get. I wonder if a) do we care about that long lenght b) if we should
have more information just printed on the kernel boot like this
metadata. Then uname -c or something could be added to, to help us get
the config checksum used.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 22:46       ` Luis Chamberlain
@ 2025-07-26 23:16         ` Chuck Lever
  2025-07-26 23:34           ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-07-26 23:16 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Daniel Gomez, kdevops

On 7/26/25 6:46 PM, Luis Chamberlain wrote:
> On Sat, Jul 26, 2025 at 01:21:16PM -0700, Luis Chamberlain wrote:
>> Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC and 
>> to enhance this further perhaps we can also add either upstream to Linux
>> or on kdevops an equivalent sha256sum of the config. So if not upstream
>> on Linux we'd add to the kdevops kconfig cat linux/.config | sha256sum
>> and I'd hope that $(zcat /proc/config.gz | sha256sum) on the runtime on
>> the booted target DUT would match.
> 
> Actually, augmenting the upstream way to append the version to *also*
> append a checksum of the kernel config would be even more deterministic.
> But this is getting out of hand in terms of how long that string would
> get. I wonder if a) do we care about that long lenght b) if we should
> have more information just printed on the kernel boot like this
> metadata. Then uname -c or something could be added to, to help us get
> the config checksum used.
Well, we are basically looking for a way to identify which release is
under test (or which RPM should be installed). Maybe the 12-hexit hash
plus the build date is unique enough?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 23:16         ` Chuck Lever
@ 2025-07-26 23:34           ` Luis Chamberlain
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-26 23:34 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Gomez, kdevops

On Sat, Jul 26, 2025 at 07:16:07PM -0400, Chuck Lever wrote:
> On 7/26/25 6:46 PM, Luis Chamberlain wrote:
> > On Sat, Jul 26, 2025 at 01:21:16PM -0700, Luis Chamberlain wrote:
> >> Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC and 
> >> to enhance this further perhaps we can also add either upstream to Linux
> >> or on kdevops an equivalent sha256sum of the config. So if not upstream
> >> on Linux we'd add to the kdevops kconfig cat linux/.config | sha256sum
> >> and I'd hope that $(zcat /proc/config.gz | sha256sum) on the runtime on
> >> the booted target DUT would match.
> > 
> > Actually, augmenting the upstream way to append the version to *also*
> > append a checksum of the kernel config would be even more deterministic.
> > But this is getting out of hand in terms of how long that string would
> > get. I wonder if a) do we care about that long lenght b) if we should
> > have more information just printed on the kernel boot like this
> > metadata. Then uname -c or something could be added to, to help us get
> > the config checksum used.
> Well, we are basically looking for a way to identify which release is
> under test (or which RPM should be installed). Maybe the 12-hexit hash
> plus the build date is unique enough?

Ah I though you wanted a kernel-commit and config used deterministic
thing. So this sounds like something different. Where would this
12-hexit hash come from? The git commit? I guess perhaps I don't
understand some other requirements you have yet.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 20:21     ` Luis Chamberlain
  2025-07-26 21:37       ` Luis Chamberlain
  2025-07-26 22:46       ` Luis Chamberlain
@ 2025-07-26 23:35       ` Chuck Lever
  2025-07-27  0:06         ` Luis Chamberlain
  2 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-07-26 23:35 UTC (permalink / raw)
  To: Luis Chamberlain; +Cc: Daniel Gomez, kdevops

On 7/26/25 4:21 PM, Luis Chamberlain wrote:
> On Sat, Jul 26, 2025 at 02:00:55PM -0400, Chuck Lever wrote:
>> On 7/25/25 9:16 PM, Luis Chamberlain wrote:
>>> Right now we use the same kernel for all target nodes. We want to
>>> compare and contrast different kenrels for different features. We
>>> add support for A/B testing by leveraging the baseline and dev groups
>>> provided to us by KDEVOPS_BASELINE_AND_DEV.
>>>
>>> This extends the bootlinux playbook by enabling us to allow a different
>>> kernel tree / ref to be used for the dev group. This just becomes a
>>> configuration thing. The targets are intuitive:
>>>
>>>   make linux                 # Handles A/B compilation transparently
>>>   make linux-baseline        # Build and install baseline kernel only
>>>   make linux-dev             # Build and install development kernel only
>>
>> My "build the kernel once and package it" patches are still under test,
>> but this patch conflicts heavily with that work. I'm not sure how to
>> reconcile it,
> 
> If you give me a branch, I can just fetch it and ask Claude Code to
> refactor my branch with your and take preference for style for your
> branch. Its as easy as that.

https://github.com/chucklever/kdevops/tree/builder

This is the first it's seen the light of day. :-) I have a few other
bowling balls in the air, so it's taken a while to get robust water
fowl alignment.


> What do you think?

You are welcome to experiment and/or provide feedback to me about the
ideas.


> You can try it as well too, if you like as my tree is public already.
> And it may help you test / fix any outstanding bugs.

Unfortunately my employer has a policy that we cannot make use of public
AI code generators at this time, so I can't try it quite yet.


>> What my set does is change the operation of the "make linux" target so
>> that, depending on Kconfig settings, it either:
>>
>> - Works as it does today (builds the kernel on each test runner)
>> - Builds the kernel and packages it on a separate node, saving the
>>   package
> 
> I see, OK so an extra option is available to "build on foo system which
> is wicked fast".

Yes, one new option for that, and one new option for "please install
from the packages sitting in this directory on the Ansible controller.

I'm also thinking that, for cloud, kdevops could set up a build pipeline
to build the kernel. So far, my tenancies are limited to 10 vCPUs at a
time. But the pipelines seem to get as many as 16, making the builds
quick.


>> - Finds the packaged kernel and installs it on the test runners
> 
> Sweeeeet. It would be nice for us to have a sort of registry for kdevops
> where if a kernel is already built and available we can use it but alas
> we don't yet have any public infrastructure for that.

Yes. Scott and I have been going back and forth about how to do such a
thing.

For cloud, it is typical for artifacts built by a pipeline to land in
a public S3-style bucket. We could certainly put them there and have
some scripting to create a website around it.

For guestfs, kdevops could set up a local minIO instance to do much the
same. Or the guests could NFS-mount the controller.


> Maybe we can just
> regularly later build using OBS daily for next and weekly for RCs and
> see if the latest build is availble. That is, we just have a secondary
> process which all it does is fetch and build and push to OBS. If using
> a vanilla kernel or next kernel then it can use that. This limits the
> functionality to standard trees but a possible nice enhacement later.

The primary goal is reducing the number of times we have to build any
kernel. Building them on a schedule (if there is a change) is better
than building the kernel every time a change is noticed, I think (and
unfortunately that is what I'm doing right now).

The reason to build kernels on the spot is for development work. For
CI, having one build, and using that build everywhere you can, seems
like that should be our goal.


>> Along the way there are a number of clean-ups suggested by Daniel,
>> including improving the selection of kernel .configs, and splitting
>> apart the tasks for 9p and non-9p builds.
> 
> Awesome.
> 
>> It seems to me that we want to add a lot of complexity and nuance to
>> the "make linux" target. It might make sense to first split bootlinux
>> into three playbooks:
>>
>> 1. build this { URL, commit hash } tuple
>> 2. install from the source tree or existing package
>> 3. handle the grub details
>>
>> Then A/B testing (and other new features) can be built on top of those
>> smaller plays.
>>
>> Or, simply add a new "make linux-ab" target for this use case; I'm
>> guessing that the current set of "make linux" tasks might be utilized
>> far more frequently than this specialized case.
> 
> I just exposed make linux-baseline and make make linux-dev, and so the
> make linux becomes a sequential operation of these two. I decided to
> blend it to make linux just because that's the goal. Its not clear if we
> need a different target for when you already configured and know that
> you want different kernels for baseline (A) and dev (B). Specially since
> I added also an option so that when you enable KDEVOPS_BASELINE_AND_DEV
> (essentially what we're using for A/B testing) in the bootlinux menu
> you can select if you want both groups to have different kernels or
> the same kernel.
> 
> So yeah I'm not sure if we really need a linux-ab ?
> 
>> (Architecting the kdevops UX might be a little beyond the skill set of
>> AI code generators right at the moment)
> 
> Indeed. However I'm seeing that once I augment CLAUDE.md with
> instructions on preferred style -- it sometimes does pick up on it.
> But indeed -- stylistic preferences are best expressed by humans on
> CLAUDE.md. And in this case I do think that proper architecture ends
> up being something we need to think hard on. I think this is a good
> example limitation of boundaries and what we humans end up having to
> think more about.
> 
>>> +# Run fio-tests with kernel comparison
>>> +make linux                 # Install different kernels
>>> +make fio-tests             # Performance test both kernels
>>> +make fio-tests-compare     # Compare performance metrics
>>
>> I know that Chinner has made claims about how closely his QEMU-based
>> performance testing approaches the same results from bare metal testing.
>> Even so ...
>>
>> This example is a little dangerous. fio results (especially latency)
>> will depend on there being enough idle CPU horsepower on the hypervisor
>> system. For cloud, all bets are off, because we have no control over the
>> other workloads on that physical host unless we have rented a bare metal
>> instance.
> 
> Agreed. Its up to the users to know the above. Its why my fio-tests
> patches clarify the same caution you're observing. So I do agree a
> revised enhancement on language to caution strongly on this would
> be good.
> 
> I just use guests to prototype. When I want real data I use bare metal.
> And fortunately I'm starting to see bare metal working on kdevops, its
> just minor tweaks we have to do. The steady-state stuff I added already
> works on bare metal. For other workflows its just a matter of modifying
> the create_data role calls to be skipped when SKIP_BRINGUP is enabled,
> as its inferred the user would have done this step too. The other one
> is that SKIP_BRINGUP should likely select WORKFLOW_INFER_USER_AND_GROUP.
> This is so data on /data/ gets the local user / group. And so the way
> I intend to use future performance analysis *will* be on bare metal.
> 
> Guests are just for prototyping. However there are a few test which one
> could *still* likely run with PCI-E passthrough onto SSDs which likely
> could still be very useful too. Once we have real data on bare metal we
> can compare and contrast against VMs with PCI-E passthorugh (which
> kdevops supports) and try to see what the deltas are.

Interesting -- I didn't know you have so much bare metal work going on!

AFAI recall, kdevops/aws is the only provider where we offer a bare
metal choice. I can have a look at the others to see what is easy to
introduce.


>> Even in the guestfs world, it's easy to set up a configuration where you
>> have provisioned more vCPUs than your system has threads / cores. There
>> can be other workload traffic on the system too, and you can be sure
>> that running the A and B tests at the same time will interfere with each
>> other.
> 
> Yup!
> 
>> "-compare" makes total sense for functional test results. For comparing
>> automatically generated performance results, there are loads of caveats.
> 
> Indeed.
> 
>> (I rely on fio testing for NFS, so I'm pleased to see the fio-test
>> workflow -- but still, plenty of care must be taken before the results
>> can be used for inter-run comparison).
> 
> Yes, yes, yes.
> 
> I think we're in sync with our preferences for avoiding stupid
> performance results. I think the language needs to be enhanced to
> ensure those who are not familiar with these issues are brought forward.
> 
> Maybe just recommend guests for protyping, and we'd evaluate which
> performance matrics *do* make sense with PCI-E passthrough, but advocate
> for only true bare metal as the real data. I'm not sure it would help to
> have a Kconfig option which would taint performance results or something
> like that if the config does not adhere to our accepted norms. Perhaps?
> The more I think about it -- the more I like it. Then we'd have semantic
> way to express the idea / norms / best practices -- which may also be
> useful for the bots.

Watermarking the results by injecting timing jitter? ;-)


>>> +# Run sysbench with kernel comparison
>>> +make linux                 # Install different kernels
>>> +make sysbench              # Database tests on both kernels
>>> +```
>>> +
>>> +### Best Practices
>>> +
>>> +1. **Version Identification**: Use descriptive kernel release versions to distinguish builds
>>
>> This has been an ongoing problem for my "build once, package, and
>> install everywhere" set. How to identify which package to install,
>> in particular when the build step and the install step do not take
>> place during the same "make linux" run?
>>
>> The kernel's Kconfig has the CONFIG_LOCALVERSION_AUTO option which
>> adds the 12-hexit commit hash to the name of the kernel:
>>
>>    6.16.0-rc6-00005-g6b59765c97a3
>>
>> Which:
>> - Gives us an identifier tied to a specific version of the code base
>> - rpmbuild makes part of the kernel package name, like so:
>>
>>    kernel-6.12.40_rc1_g596aae841edf-2.x86_64.rpm
>>
>> But CONFIG_LOCALVERSION_AUTO does not identify the particular .config
>> used to build the kernel.
> 
> I see -- I thought CONFIG_LOCALVERSION_AUTO was already nice enough,
> so you wan to track configs more closely.

I set CONFIG_LOCALVERSION_AUTO pretty much everywhere. I am already a
client, as the TV ads used to say.

I'm just pointing out a possible use case where it wouldn't be enough.
And maybe we worry about that use case some other time. There is a lot
going on in this one patch!


> Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC and 
> to enhance this further perhaps we can also add either upstream to Linux
> or on kdevops an equivalent sha256sum of the config. So if not upstream
> on Linux we'd add to the kdevops kconfig cat linux/.config | sha256sum
> and I'd hope that $(zcat /proc/config.gz | sha256sum) on the runtime on
> the booted target DUT would match.
> 
>> For instance you might want to A/B the same
>> code base built with different .configs. Or, maybe you are explicitly
>> intending not to support multiple .configs here.
> 
> You are totally right -- let's just extend our semantics to include more
> config details as suggested above. Thoughts?
> 
>   Luis


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
  2025-07-26 23:35       ` Chuck Lever
@ 2025-07-27  0:06         ` Luis Chamberlain
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2025-07-27  0:06 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Gomez, kdevops

On Sat, Jul 26, 2025 at 07:35:54PM -0400, Chuck Lever wrote:
> On 7/26/25 4:21 PM, Luis Chamberlain wrote:
> > If you give me a branch, I can just fetch it and ask Claude Code to
> > refactor my branch with your and take preference for style for your
> > branch. Its as easy as that.
> 
> https://github.com/chucklever/kdevops/tree/builder
> 
> You are welcome to experiment and/or provide feedback to me about the
> ideas.

Will do, specially given your employer constraints.

> > You can try it as well too, if you like as my tree is public already.
> > And it may help you test / fix any outstanding bugs.
> 
> Unfortunately my employer has a policy that we cannot make use of public
> AI code generators at this time, so I can't try it quite yet.

I hope PROMPTS.md's growth on kdevops will show the value and without
it folks will just fall behind. Fast. I intentionally made PROMPTS.md
to help folks not only learn but also help with employers.

> >> What my set does is change the operation of the "make linux" target so
> >> that, depending on Kconfig settings, it either:
> >>
> >> - Works as it does today (builds the kernel on each test runner)
> >> - Builds the kernel and packages it on a separate node, saving the
> >>   package
> > 
> > I see, OK so an extra option is available to "build on foo system which
> > is wicked fast".
> 
> Yes, one new option for that, and one new option for "please install
> from the packages sitting in this directory on the Ansible controller.

Sure.

> I'm also thinking that, for cloud, kdevops could set up a build pipeline
> to build the kernel. So far, my tenancies are limited to 10 vCPUs at a
> time. But the pipelines seem to get as many as 16, making the builds
> quick.

Indeed. A "super build host" for cloud seems sensible.

> >> - Finds the packaged kernel and installs it on the test runners
> > 
> > Sweeeeet. It would be nice for us to have a sort of registry for kdevops
> > where if a kernel is already built and available we can use it but alas
> > we don't yet have any public infrastructure for that.
> 
> Yes. Scott and I have been going back and forth about how to do such a
> thing.
> 
> For cloud, it is typical for artifacts built by a pipeline to land in
> a public S3-style bucket. We could certainly put them there and have
> some scripting to create a website around it.

Neat. Makes sense.

> For guestfs, kdevops could set up a local minIO instance to do much the
> same. Or the guests could NFS-mount the controller.

OK yes I see. We already have "enterprise" uses of kdevops in the sense
that for example /mirror/ is on an NFS mount. So it would make sense
to then also use minio for these possible pre-built daily kernels.
We'd probably want a sort of master and allow the one off servers in
a corp set up to essentially go and fetch the latest 30 kernel builds
using S3 or whatever.

> > Maybe we can just
> > regularly later build using OBS daily for next and weekly for RCs and
> > see if the latest build is availble. That is, we just have a secondary
> > process which all it does is fetch and build and push to OBS. If using
> > a vanilla kernel or next kernel then it can use that. This limits the
> > functionality to standard trees but a possible nice enhacement later.
> 
> The primary goal is reducing the number of times we have to build any
> kernel. Building them on a schedule (if there is a change) is better
> than building the kernel every time a change is noticed, I think (and
> unfortunately that is what I'm doing right now).

Indeed.

> The reason to build kernels on the spot is for development work. For
> CI, having one build, and using that build everywhere you can, seems
> like that should be our goal.

Agreed.

> Interesting -- I didn't know you have so much bare metal work going on!
> 
> AFAI recall, kdevops/aws is the only provider where we offer a bare
> metal choice. I can have a look at the others to see what is easy to
> introduce.

Indeed. And even with the marketed "AWS bare metal" setups, you still
end up without some interesting performance counters... so I couldn't
evaluate TLB coalsecing for instnace on AMD which they support since
early days.

> > I see -- I thought CONFIG_LOCALVERSION_AUTO was already nice enough,
> > so you wan to track configs more closely.
> 
> I set CONFIG_LOCALVERSION_AUTO pretty much everywhere. I am already a
> client, as the TV ads used to say.
> 
> I'm just pointing out a possible use case where it wouldn't be enough.
> And maybe we worry about that use case some other time. There is a lot
> going on in this one patch!

Punting for later sure.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-27  0:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
2025-07-26  1:16 ` [PATCH 1/4] Makefile: add make style for style checking Luis Chamberlain
2025-07-26  1:16 ` [PATCH 2/4] CLAUDE.md: new workflow guide for hosts and nodes Luis Chamberlain
2025-07-26  1:16 ` [PATCH 3/4] gen_nodes/gen_hosts: avoid usage of fs_config_path on task names Luis Chamberlain
2025-07-26  1:16 ` [PATCH 4/4] bootlinux: add support for A/B kernel testing Luis Chamberlain
2025-07-26 18:00   ` Chuck Lever
2025-07-26 20:21     ` Luis Chamberlain
2025-07-26 21:37       ` Luis Chamberlain
2025-07-26 22:46       ` Luis Chamberlain
2025-07-26 23:16         ` Chuck Lever
2025-07-26 23:34           ` Luis Chamberlain
2025-07-26 23:35       ` Chuck Lever
2025-07-27  0:06         ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox