* [PATCH 0/8] linux-ab enhancements + monitor support
@ 2025-08-11 22:42 Luis Chamberlain
2025-08-11 22:43 ` [PATCH 1/8] bootlinux: use different kernel for A/B testing by default Luis Chamberlain
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:42 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain
This is a collection of practical enhancements I developed while A/B
testing Linux kernels with the developmental parallel writeback patches,
to help analyze their impact on memory management. The last patch adds
the ability to do experimental (non-upstream) monitoring of different types.
We could obviously add monitoring for existing upstream knobs later, but
in this case we want to track folio migration success rates, and so we are
using a knob that is currently out of tree.
Luis Chamberlain (8):
bootlinux: use different kernel for A/B testing by default
bootlinux: add support for custom refs on dev kernels on the CLI
bootlinux: add git ref verification before cloning
bootlinux: add git dirty check before cloning
bootlinux: add intelligent git repository detection and management
bootlinux: enhance A/B testing and repository management
fstests: add make target for running tests on all hosts
monitoring: integrate monitoring collection into fstests workflow
Kconfig | 4 +
README.md | 1 +
defconfigs/xfs_reflink_lbs | 1 +
kconfigs/Kconfig.kdevops | 8 +-
playbooks/roles/bootlinux/tasks/build/9p.yml | 193 +++++++++++++++++-
.../roles/bootlinux/tasks/build/builder.yml | 165 +++++++++++++++
.../roles/bootlinux/tasks/build/targets.yml | 166 +++++++++++++++
playbooks/roles/fstests/tasks/main.yml | 14 ++
scripts/ensure_newlines.py | 3 +-
workflows/fstests/Makefile | 10 +
workflows/linux/Kconfig | 18 +-
11 files changed, 571 insertions(+), 12 deletions(-)
--
2.47.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/8] bootlinux: use different kernel for A/B testing by default
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 2/8] bootlinux: add support for custom refs on dev kernels on the CLI Luis Chamberlain
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

If you are doing A/B testing then by default you'd want to test
different kernels typically. So enable that by default. This will
help make our CIs easier to write leveraging defconfigs with less
crap in them.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 workflows/linux/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/workflows/linux/Kconfig b/workflows/linux/Kconfig
index 44456904..aa7c6ec8 100644
--- a/workflows/linux/Kconfig
+++ b/workflows/linux/Kconfig
@@ -385,7 +385,7 @@ if KDEVOPS_BASELINE_AND_DEV
 
 choice
 	prompt "A/B kernel testing configuration"
-	default BOOTLINUX_AB_SAME_REF
+	default BOOTLINUX_AB_DIFFERENT_REF
 	help
 	  When A/B testing is enabled, you can choose to use the same kernel
 	  reference for both baseline and dev nodes, or specify
--
2.47.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/8] bootlinux: add support for custom refs on dev kernels on the CLI
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
2025-08-11 22:43 ` [PATCH 1/8] bootlinux: use different kernel for A/B testing by default Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 3/8] bootlinux: add git ref verification before cloning Luis Chamberlain
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

We already have support to easily test arbitrary kernels and refs by
just using the LINUX_TREE and LINUX_TREE_REF environment variables. We
leverage this for our CIs:

  * LINUX_TREE: the target tree for the baseline group (A)
  * LINUX_TREE_REF: the target tree ref for the baseline group (A)

Now that we have A/B testing support we want to first add support so
that a simple TEST_AB=y will enable KDEVOPS_BASELINE_AND_DEV, and then
we want to be able to customize the target dev tree and ref. We do this
with two other environment variables:

  * LINUX_DEV_TREE: the target tree for the dev group (B)
  * LINUX_DEV_TREE_REF: the target tree ref for the dev group (B)

You just need to make sure to also pass TEST_AB=y.

By leveraging this we can easily use A/B testing in the future on CIs to
compare and contrast different kernel releases against any target
workload we have.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kconfigs/Kconfig.kdevops |  8 +++++++-
 workflows/linux/Kconfig  | 16 ++++++++++++++--
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/kconfigs/Kconfig.kdevops b/kconfigs/Kconfig.kdevops
index c2362adf..431d579a 100644
--- a/kconfigs/Kconfig.kdevops
+++ b/kconfigs/Kconfig.kdevops
@@ -51,6 +51,11 @@ config KDEVOPS_HOSTS_PREFIX_SET_BY_CLI
 	bool
 	default $(shell, scripts/check-cli-set-var.sh KDEVOPS_HOSTS_PREFIX)
 
+config KDEVOPS_BASELINE_AND_DEV_SET_BY_CLI
+	bool
+	output yaml
+	default $(shell, scripts/check-cli-set-var.sh TEST_AB)
+
 config KDEVOPS_HOSTS_PREFIX
 	string "The hostname prefix to use for nodes"
 	output yaml
@@ -78,7 +83,8 @@ config KDEVOPS_CUSTOM_SSH_KEXALGORITHMS
 
 config KDEVOPS_BASELINE_AND_DEV
 	bool "Enable both a baseline and development system per target test"
-	default n
+	default n if !KDEVOPS_BASELINE_AND_DEV_SET_BY_CLI
+	default y if KDEVOPS_BASELINE_AND_DEV_SET_BY_CLI
 	help
 	  By default kdevops will only spawn a baseline target node (local
 	  virtualization or cloud node) for your Linux kernel testing and
diff --git a/workflows/linux/Kconfig b/workflows/linux/Kconfig
index aa7c6ec8..2469ef97 100644
--- a/workflows/linux/Kconfig
+++ b/workflows/linux/Kconfig
@@ -10,6 +10,16 @@ config BOOTLINUX_TREE_REF_SET_BY_CLI
 	output yaml
 	default $(shell, scripts/check-cli-set-var.sh LINUX_TREE_REF)
 
+config BOOTLINUX_DEV_TREE_SET_BY_CLI
+	bool
+	output yaml
+	default $(shell, scripts/check-cli-set-var.sh LINUX_DEV_TREE)
+
+config BOOTLINUX_DEV_TREE_REF_SET_BY_CLI
+	bool
+	output yaml
+	default $(shell, scripts/check-cli-set-var.sh LINUX_DEV_TREE_REF)
+
 config BOOTLINUX_HAS_PURE_IOMAP_CONFIG
 	bool
 
@@ -415,7 +425,8 @@ if BOOTLINUX_AB_DIFFERENT_REF
 config BOOTLINUX_DEV_TREE
 	string "Development kernel tree URL"
 	output yaml
-	default BOOTLINUX_TREE
+	default BOOTLINUX_TREE if !BOOTLINUX_DEV_TREE_SET_BY_CLI
+	default $(shell, ./scripts/append-makefile-vars.sh $(LINUX_DEV_TREE)) if BOOTLINUX_DEV_TREE_SET_BY_CLI
 	help
 	  Git tree URL for the development kernel. If left empty or same as
 	  the baseline tree, the same tree will be used with a different
@@ -424,7 +435,8 @@ config BOOTLINUX_DEV_TREE
 config TARGET_LINUX_DEV_REF
 	string "Development kernel reference"
 	output yaml
-	default $(shell, scripts/infer_last_stable_kernel.sh)
+	default $(shell, scripts/infer_last_stable_kernel.sh) if !BOOTLINUX_DEV_TREE_REF_SET_BY_CLI
+	default $(shell, ./scripts/append-makefile-vars.sh $(LINUX_DEV_TREE_REF)) if BOOTLINUX_DEV_TREE_REF_SET_BY_CLI
 	help
 	  Git reference (branch, tag, or commit) for the development kernel.
 	  This should be different from the baseline reference to enable
--
2.47.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
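The way the new `*_SET_BY_CLI` symbols steer the defaults can be sketched in a few lines of Python. This is only an illustration of the Kconfig hunks above, not kdevops code: the function name `resolve_ab_defaults` is hypothetical, and it assumes that `scripts/check-cli-set-var.sh` reports "set" whenever the variable was passed with a non-empty value, which the patch does not show.

```python
def resolve_ab_defaults(cli_vars, baseline_tree, last_stable_ref):
    """Mimic the Kconfig defaults touched by this patch.

    cli_vars: variables passed on the make command line, e.g.
              {"TEST_AB": "y", "LINUX_DEV_TREE_REF": "v6.16"}.
    baseline_tree: value of BOOTLINUX_TREE.
    last_stable_ref: what scripts/infer_last_stable_kernel.sh would print.
    """
    def set_by_cli(name):
        # Assumption: "set by CLI" means present and non-empty.
        return bool(cli_vars.get(name))

    return {
        # default y if KDEVOPS_BASELINE_AND_DEV_SET_BY_CLI, else n
        "KDEVOPS_BASELINE_AND_DEV": set_by_cli("TEST_AB"),
        # default BOOTLINUX_TREE unless LINUX_DEV_TREE was passed
        "BOOTLINUX_DEV_TREE": (
            cli_vars["LINUX_DEV_TREE"]
            if set_by_cli("LINUX_DEV_TREE")
            else baseline_tree
        ),
        # default last stable kernel unless LINUX_DEV_TREE_REF was passed
        "TARGET_LINUX_DEV_REF": (
            cli_vars["LINUX_DEV_TREE_REF"]
            if set_by_cli("LINUX_DEV_TREE_REF")
            else last_stable_ref
        ),
    }
```

These are only defaults: a value chosen later in menuconfig would still override them, and the two dev symbols are only visible under `BOOTLINUX_AB_DIFFERENT_REF`.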
* [PATCH 3/8] bootlinux: add git ref verification before cloning
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
2025-08-11 22:43 ` [PATCH 1/8] bootlinux: use different kernel for A/B testing by default Luis Chamberlain
2025-08-11 22:43 ` [PATCH 2/8] bootlinux: add support for custom refs on dev kernels on the CLI Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 4/8] bootlinux: add git dirty check " Luis Chamberlain
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Add preliminary verification tasks to check if the target git ref exists
before attempting to clone the Linux kernel repository. This prevents
confusing git clone failures and provides clear, actionable error
messages to users.

The verification is particularly important for A/B testing scenarios
where:
- Different kernel refs may be used for baseline vs development builds
- Shallow clones might not contain all required refs
- Users may specify refs that don't exist in the repository

Each build method now verifies the ref availability:
- 9p.yml: Verifies active_linux_ref (or target_linux_ref fallback)
- targets.yml: Verifies target_linux_ref on target nodes
- builder.yml: Verifies target_linux_ref on builder nodes

The error messages guide users to:
1. Check if the ref actually exists in the repository
2. Disable shallow cloning when using A/B testing with different refs
3. Verify the repository URL is correct and accessible

This change improves the user experience by failing fast with helpful
diagnostics rather than letting git clone fail with cryptic errors.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 playbooks/roles/bootlinux/tasks/build/9p.yml  | 29 +++++++++++++++++++
 .../roles/bootlinux/tasks/build/builder.yml   | 21 ++++++++++++++
 .../roles/bootlinux/tasks/build/targets.yml   | 22 ++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/playbooks/roles/bootlinux/tasks/build/9p.yml b/playbooks/roles/bootlinux/tasks/build/9p.yml
index 1951e50e..98cbcb3c 100644
--- a/playbooks/roles/bootlinux/tasks/build/9p.yml
+++ b/playbooks/roles/bootlinux/tasks/build/9p.yml
@@ -44,6 +44,35 @@
     - bootlinux_tree_set_by_cli|bool
     - not target_directory_stat.stat.exists
 
+- name: Verify target git ref exists before cloning
+  command: "git ls-remote {{ target_linux_git }} {{ active_linux_ref | default(target_linux_ref) }}"
+  register: ref_check
+  run_once: true
+  delegate_to: localhost
+  when:
+    - not bootlinux_tree_set_by_cli|bool
+  tags: [ 'clone']
+
+- name: Fail if git ref does not exist
+  fail:
+    msg: |
+      Failed to verify git ref '{{ active_linux_ref | default(target_linux_ref) }}' exists in repository '{{ target_linux_git }}'.
+
+      This typically happens when:
+      1. The ref (branch/tag/commit) doesn't exist in the repository
+      2. You're using A/B testing with a shallow clone that doesn't contain the required ref
+      3. The repository URL is incorrect or inaccessible
+
+      Please verify:
+      - The ref '{{ active_linux_ref | default(target_linux_ref) }}' exists in the repository
+      - If using A/B testing with different refs, ensure shallow cloning is disabled
+      - The repository URL '{{ target_linux_git }}' is correct and accessible
+  when:
+    - not bootlinux_tree_set_by_cli|bool
+    - ref_check.rc != 0
+  run_once: true
+  delegate_to: localhost
+
 - name: git clone {{ target_linux_tree }} on the control node
   git:
     repo: "{{ target_linux_git }}"
diff --git a/playbooks/roles/bootlinux/tasks/build/builder.yml b/playbooks/roles/bootlinux/tasks/build/builder.yml
index c4c4b950..73cc6694 100644
--- a/playbooks/roles/bootlinux/tasks/build/builder.yml
+++ b/playbooks/roles/bootlinux/tasks/build/builder.yml
@@ -10,6 +10,27 @@
     - target_linux_install_b4
     - ansible_os_family == "Debian"
 
+- name: Verify target git ref exists before cloning
+  command: "git ls-remote {{ target_linux_git }} {{ target_linux_ref }}"
+  register: ref_check
+
+- name: Fail if git ref does not exist
+  fail:
+    msg: |
+      Failed to verify git ref '{{ target_linux_ref }}' exists in repository '{{ target_linux_git }}'.
+
+      This typically happens when:
+      1. The ref (branch/tag/commit) doesn't exist in the repository
+      2. You're using A/B testing with a shallow clone that doesn't contain the required ref
+      3. The repository URL is incorrect or inaccessible
+
+      Please verify:
+      - The ref '{{ target_linux_ref }}' exists in the repository
+      - If using A/B testing with different refs, ensure shallow cloning is disabled
+      - The repository URL '{{ target_linux_git }}' is correct and accessible
+  when:
+    - ref_check.rc != 0
+
 - name: Clone {{ target_linux_tree }}
   ansible.builtin.git:
     repo: "{{ target_linux_git }}"
diff --git a/playbooks/roles/bootlinux/tasks/build/targets.yml b/playbooks/roles/bootlinux/tasks/build/targets.yml
index 36339876..81465cc6 100644
--- a/playbooks/roles/bootlinux/tasks/build/targets.yml
+++ b/playbooks/roles/bootlinux/tasks/build/targets.yml
@@ -10,6 +10,28 @@
     - target_linux_install_b4
     - ansible_facts['os_family']|lower != 'debian'
 
+- name: Verify target git ref exists before cloning
+  command: "git ls-remote {{ target_linux_git }} {{ target_linux_ref }}"
+  register: ref_check
+  tags: [ 'clone']
+
+- name: Fail if git ref does not exist
+  fail:
+    msg: |
+      Failed to verify git ref '{{ target_linux_ref }}' exists in repository '{{ target_linux_git }}'.
+
+      This typically happens when:
+      1. The ref (branch/tag/commit) doesn't exist in the repository
+      2. You're using A/B testing with a shallow clone that doesn't contain the required ref
+      3. The repository URL is incorrect or inaccessible
+
+      Please verify:
+      - The ref '{{ target_linux_ref }}' exists in the repository
+      - If using A/B testing with different refs, ensure shallow cloning is disabled
+      - The repository URL '{{ target_linux_git }}' is correct and accessible
+  when:
+    - ref_check.rc != 0
+
 - name: git clone {{ target_linux_tree }} on the target nodes
   git:
     repo: "{{ target_linux_git }}"
--
2.47.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
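The ref check in this patch can be tried outside of Ansible with a small Python sketch. It is not the patch's implementation: the helper name `check_remote_ref` and the three result strings are hypothetical. One detail worth knowing when reading the tasks above is that plain `git ls-remote` exits 0 whenever it can talk to the remote, even if no ref matches; the `--exit-code` option makes it exit with status 2 in that case, which is what the sketch relies on.

```python
import subprocess


def check_remote_ref(repo, ref, run=subprocess.run):
    """Classify whether `ref` is advertised by the remote `repo`.

    Returns "found", "missing", or "error". The `run` parameter exists
    only so the function can be exercised without network access.
    """
    result = run(
        ["git", "ls-remote", "--exit-code", repo, ref],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode == 0:
        return "found"
    if result.returncode == 2:
        # --exit-code: the remote answered but no ref matched.
        return "missing"
    # Anything else: bad URL, no network, authentication failure, ...
    return "error"
```

Because `ls-remote` only lists advertised refs (branches, tags and the like), a bare commit SHA will normally be reported as missing even when the commit exists in the repository.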
* [PATCH 4/8] bootlinux: add git dirty check before cloning
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
` (2 preceding siblings ...)
2025-08-11 22:43 ` [PATCH 3/8] bootlinux: add git ref verification before cloning Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 5/8] bootlinux: add intelligent git repository detection and management Luis Chamberlain
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Add preliminary checks to detect if the git tree has local modifications
before attempting to clone or update. This provides clearer error
messages when git operations fail due to uncommitted changes.

The issue was discovered when prior kdevops commits inadvertently
modified files in the linux kernel tree through overly broad make style
operations. When git clone/update encounters a dirty tree, it fails with
a generic "Local modifications exist" error that doesn't guide users on
resolution.

Each build method now checks for dirty trees:
- 9p.yml: Checks bootlinux_9p_host_path on control node
- targets.yml: Checks target_linux_dir_path on target nodes
- builder.yml: Checks target_linux_dir_path on builder nodes

The error messages provide clear resolution steps:
1. Commit or stash changes
2. Hard reset the tree
3. Remove the directory for a fresh clone

This change improves debugging by:
- Failing fast with actionable error messages
- Showing which files are modified
- Providing multiple resolution options
- Preventing confusing git errors downstream

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 playbooks/roles/bootlinux/tasks/build/9p.yml  | 43 +++++++++++++++++++
 .../roles/bootlinux/tasks/build/builder.yml   | 33 ++++++++++++++
 .../roles/bootlinux/tasks/build/targets.yml   | 33 ++++++++++++++
 3 files changed, 109 insertions(+)

diff --git a/playbooks/roles/bootlinux/tasks/build/9p.yml b/playbooks/roles/bootlinux/tasks/build/9p.yml
index 98cbcb3c..5c9489c3 100644
--- a/playbooks/roles/bootlinux/tasks/build/9p.yml
+++ b/playbooks/roles/bootlinux/tasks/build/9p.yml
@@ -73,6 +73,49 @@
   run_once: true
   delegate_to: localhost
 
+- name: Check if target directory exists for dirty check
+  stat:
+    path: "{{ bootlinux_9p_host_path }}"
+  register: git_dir_stat
+  run_once: true
+  delegate_to: localhost
+  when:
+    - not bootlinux_tree_set_by_cli|bool
+
+- name: Check if git tree is dirty
+  command: "git -C {{ bootlinux_9p_host_path }} status --porcelain"
+  register: git_status
+  changed_when: false
+  failed_when: false
+  run_once: true
+  delegate_to: localhost
+  when:
+    - not bootlinux_tree_set_by_cli|bool
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+
+- name: Fail if git tree has local modifications
+  fail:
+    msg: |
+      Local modifications exist in the destination: {{ bootlinux_9p_host_path }}
+
+      The git tree is dirty with uncommitted changes. This prevents safe git operations.
+
+      To resolve this, you can:
+      1. Commit or stash your changes in {{ bootlinux_9p_host_path }}
+      2. Reset the tree with: git -C {{ bootlinux_9p_host_path }} reset --hard
+      3. Remove the directory and let kdevops clone fresh: rm -rf {{ bootlinux_9p_host_path }}
+
+      Modified files:
+      {{ git_status.stdout }}
+  when:
+    - not bootlinux_tree_set_by_cli|bool
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+    - git_status.stdout | length > 0
+  run_once: true
+  delegate_to: localhost
+
 - name: git clone {{ target_linux_tree }} on the control node
   git:
     repo: "{{ target_linux_git }}"
diff --git a/playbooks/roles/bootlinux/tasks/build/builder.yml b/playbooks/roles/bootlinux/tasks/build/builder.yml
index 73cc6694..d36815ed 100644
--- a/playbooks/roles/bootlinux/tasks/build/builder.yml
+++ b/playbooks/roles/bootlinux/tasks/build/builder.yml
@@ -31,6 +31,39 @@
   when:
     - ref_check.rc != 0
 
+- name: Check if target directory exists for dirty check
+  stat:
+    path: "{{ target_linux_dir_path }}"
+  register: git_dir_stat
+
+- name: Check if git tree is dirty
+  command: "git -C {{ target_linux_dir_path }} status --porcelain"
+  register: git_status
+  changed_when: false
+  failed_when: false
+  when:
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+
+- name: Fail if git tree has local modifications
+  fail:
+    msg: |
+      Local modifications exist in the destination: {{ target_linux_dir_path }}
+
+      The git tree is dirty with uncommitted changes. This prevents safe git operations.
+
+      To resolve this, you can:
+      1. Commit or stash your changes in {{ target_linux_dir_path }}
+      2. Reset the tree with: git -C {{ target_linux_dir_path }} reset --hard
+      3. Remove the directory and let kdevops clone fresh: rm -rf {{ target_linux_dir_path }}
+
+      Modified files:
+      {{ git_status.stdout }}
+  when:
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+    - git_status.stdout | length > 0
+
 - name: Clone {{ target_linux_tree }}
   ansible.builtin.git:
     repo: "{{ target_linux_git }}"
diff --git a/playbooks/roles/bootlinux/tasks/build/targets.yml b/playbooks/roles/bootlinux/tasks/build/targets.yml
index 81465cc6..414602ab 100644
--- a/playbooks/roles/bootlinux/tasks/build/targets.yml
+++ b/playbooks/roles/bootlinux/tasks/build/targets.yml
@@ -32,6 +32,39 @@
   when:
     - ref_check.rc != 0
 
+- name: Check if target directory exists for dirty check
+  stat:
+    path: "{{ target_linux_dir_path }}"
+  register: git_dir_stat
+
+- name: Check if git tree is dirty
+  command: "git -C {{ target_linux_dir_path }} status --porcelain"
+  register: git_status
+  changed_when: false
+  failed_when: false
+  when:
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+
+- name: Fail if git tree has local modifications
+  fail:
+    msg: |
+      Local modifications exist in the destination: {{ target_linux_dir_path }}
+
+      The git tree is dirty with uncommitted changes. This prevents safe git operations.
+
+      To resolve this, you can:
+      1. Commit or stash your changes in {{ target_linux_dir_path }}
+      2. Reset the tree with: git -C {{ target_linux_dir_path }} reset --hard
+      3. Remove the directory and let kdevops clone fresh: rm -rf {{ target_linux_dir_path }}
+
+      Modified files:
+      {{ git_status.stdout }}
+  when:
+    - git_dir_stat.stat.exists
+    - git_dir_stat.stat.isdir
+    - git_status.stdout | length > 0
+
 - name: git clone {{ target_linux_tree }} on the target nodes
   git:
     repo: "{{ target_linux_git }}"
--
2.47.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
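To see what the dirty check above actually reacts to, here is a minimal Python sketch that interprets `git status --porcelain` output the way the `git_status.stdout | length > 0` condition does. The function name `dirty_entries` and the `include_untracked` switch are hypothetical additions for illustration; the patch itself treats any output at all, untracked files included, as a dirty tree.

```python
def dirty_entries(porcelain, include_untracked=True):
    """Parse `git status --porcelain` (v1) output into (status, path) pairs.

    Each line has a two-character status code, one space, then the path.
    Lines must not be stripped on the left: a leading space is part of
    the status code (" M" means modified in the work tree only).
    """
    entries = []
    for line in porcelain.splitlines():
        if not line:
            continue
        status, path = line[:2], line[3:]
        if status == "??" and not include_untracked:
            # Untracked file; optionally ignored in this sketch.
            continue
        entries.append((status, path))
    return entries


def is_dirty(porcelain, include_untracked=True):
    """True when the tree has at least one reported entry."""
    return len(dirty_entries(porcelain, include_untracked)) > 0
```

With `include_untracked=True` the result matches the patch's behaviour; passing `False` shows how the check would change if stray untracked files such as a copied `.config` were tolerated.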
* [PATCH 5/8] bootlinux: add intelligent git repository detection and management 2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain ` (3 preceding siblings ...) 2025-08-11 22:43 ` [PATCH 4/8] bootlinux: add git dirty check " Luis Chamberlain @ 2025-08-11 22:43 ` Luis Chamberlain 2025-08-11 22:43 ` [PATCH 6/8] bootlinux: enhance A/B testing and repository management Luis Chamberlain ` (2 subsequent siblings) 7 siblings, 0 replies; 12+ messages in thread From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw) To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain Add smart logic to infer when git clone is needed regardless of the bootlinux_tree_set_by_cli setting. This solves the issue where directories exist with files (like .config) but no .git repository, which previously caused the git clone to be skipped. The new logic: 1. Checks if the target directory exists 2. Checks if .git directory exists within it 3. Infers git clone is needed if directory exists but .git doesn't 4. Clones the repository even when bootlinux_tree_set_by_cli is true 5. Ensures the correct ref is checked out if git exists but on wrong ref This handles the common case where: - Directory is pre-created for 9p mount support - Configuration files are copied before git clone - bootlinux_tree_set_by_cli persists from previous runs The implementation also: - Fetches updates if the target ref doesn't exist locally - Switches to the correct ref if repository exists but is on wrong branch - Maintains backward compatibility with existing workflows - Applies consistently across all build methods (9p, targets, builder) This makes the system more robust and user-friendly by intelligently handling partial setups and recovering from incomplete states. 
Generated-by: Claude AI Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- playbooks/roles/bootlinux/tasks/build/9p.yml | 119 ++++++++++++++---- .../roles/bootlinux/tasks/build/builder.yml | 98 +++++++++++++-- .../roles/bootlinux/tasks/build/targets.yml | 98 +++++++++++++-- 3 files changed, 273 insertions(+), 42 deletions(-) diff --git a/playbooks/roles/bootlinux/tasks/build/9p.yml b/playbooks/roles/bootlinux/tasks/build/9p.yml index 5c9489c3..d0ae61ad 100644 --- a/playbooks/roles/bootlinux/tasks/build/9p.yml +++ b/playbooks/roles/bootlinux/tasks/build/9p.yml @@ -26,23 +26,47 @@ run_once: true delegate_to: localhost -- name: Check if target directory exists when using 9p and Linux CLI was set +- name: Check if target directory exists stat: path: "{{ bootlinux_9p_host_path }}" register: target_directory_stat run_once: true delegate_to: localhost + +- name: Check if .git directory exists in target path + stat: + path: "{{ bootlinux_9p_host_path }}/.git" + register: git_directory_stat + run_once: true + delegate_to: localhost when: - - bootlinux_tree_set_by_cli|bool + - target_directory_stat.stat.exists -- name: Fail if target directory does not exist when using 9p and Linux CLI was set - fail: - msg: "The target directory {{ bootlinux_9p_host_path }} does not exist." 
+- name: Infer that git clone is needed when .git doesn't exist + set_fact: + needs_git_clone: true + when: + - target_directory_stat.stat.exists + - not git_directory_stat.stat.exists run_once: true delegate_to: localhost + +- name: Set needs_git_clone when directory doesn't exist + set_fact: + needs_git_clone: true when: - - bootlinux_tree_set_by_cli|bool - not target_directory_stat.stat.exists + run_once: true + delegate_to: localhost + +- name: Set needs_git_clone to false when .git exists + set_fact: + needs_git_clone: false + when: + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + run_once: true + delegate_to: localhost - name: Verify target git ref exists before cloning command: "git ls-remote {{ target_linux_git }} {{ active_linux_ref | default(target_linux_ref) }}" @@ -50,7 +74,7 @@ run_once: true delegate_to: localhost when: - - not bootlinux_tree_set_by_cli|bool + - needs_git_clone|bool tags: [ 'clone'] - name: Fail if git ref does not exist @@ -68,20 +92,11 @@ - If using A/B testing with different refs, ensure shallow cloning is disabled - The repository URL '{{ target_linux_git }}' is correct and accessible when: - - not bootlinux_tree_set_by_cli|bool + - needs_git_clone|bool - ref_check.rc != 0 run_once: true delegate_to: localhost -- name: Check if target directory exists for dirty check - stat: - path: "{{ bootlinux_9p_host_path }}" - register: git_dir_stat - run_once: true - delegate_to: localhost - when: - - not bootlinux_tree_set_by_cli|bool - - name: Check if git tree is dirty command: "git -C {{ bootlinux_9p_host_path }} status --porcelain" register: git_status @@ -90,9 +105,9 @@ run_once: true delegate_to: localhost when: - - not bootlinux_tree_set_by_cli|bool - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - name: Fail if git tree has local modifications fail: @@ -109,9 +124,9 @@ Modified files: {{ 
git_status.stdout }} when: - - not bootlinux_tree_set_by_cli|bool - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - git_status.stdout | length > 0 run_once: true delegate_to: localhost @@ -129,9 +144,65 @@ until: not result.failed tags: [ 'clone'] when: - - not bootlinux_tree_set_by_cli|bool + - needs_git_clone|bool + run_once: true + delegate_to: localhost + +- name: Get current git ref when git exists but clone wasn't needed + command: "git -C {{ bootlinux_9p_host_path }} rev-parse HEAD" + register: current_ref + changed_when: false + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Get target ref SHA + command: "git -C {{ bootlinux_9p_host_path }} rev-parse {{ active_linux_ref | default(target_linux_ref) }}" + register: target_ref_sha + changed_when: false + failed_when: false run_once: true delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Fetch updates if target ref doesn't exist locally + command: "git -C {{ bootlinux_9p_host_path }} fetch origin" + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Get target ref SHA after fetch + command: "git -C {{ bootlinux_9p_host_path }} rev-parse {{ active_linux_ref | default(target_linux_ref) }}" + register: target_ref_sha_after_fetch + changed_when: false + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Checkout target ref if not on correct ref + command: "git -C {{ bootlinux_9p_host_path }} checkout {{ active_linux_ref | 
default(target_linux_ref) }}" + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - (target_ref_sha.rc == 0 and current_ref.stdout != target_ref_sha.stdout) or + (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) - name: Copy kernel delta if requested on the control node template: diff --git a/playbooks/roles/bootlinux/tasks/build/builder.yml b/playbooks/roles/bootlinux/tasks/build/builder.yml index d36815ed..1213c56f 100644 --- a/playbooks/roles/bootlinux/tasks/build/builder.yml +++ b/playbooks/roles/bootlinux/tasks/build/builder.yml @@ -10,9 +10,43 @@ - target_linux_install_b4 - ansible_os_family == "Debian" +- name: Check if target directory exists + stat: + path: "{{ target_linux_dir_path }}" + register: target_directory_stat + +- name: Check if .git directory exists in target path + stat: + path: "{{ target_linux_dir_path }}/.git" + register: git_directory_stat + when: + - target_directory_stat.stat.exists + +- name: Infer that git clone is needed when .git doesn't exist + set_fact: + needs_git_clone: true + when: + - target_directory_stat.stat.exists + - not git_directory_stat.stat.exists + +- name: Set needs_git_clone when directory doesn't exist + set_fact: + needs_git_clone: true + when: + - not target_directory_stat.stat.exists + +- name: Set needs_git_clone to false when .git exists + set_fact: + needs_git_clone: false + when: + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - name: Verify target git ref exists before cloning command: "git ls-remote {{ target_linux_git }} {{ target_linux_ref }}" register: ref_check + when: + - needs_git_clone|bool - name: Fail if git ref does not exist fail: @@ -29,21 +63,18 @@ - If using A/B testing with different refs, ensure shallow cloning is disabled - The repository URL '{{ target_linux_git }}' is correct and accessible when: + - 
needs_git_clone|bool - ref_check.rc != 0 -- name: Check if target directory exists for dirty check - stat: - path: "{{ target_linux_dir_path }}" - register: git_dir_stat - - name: Check if git tree is dirty command: "git -C {{ target_linux_dir_path }} status --porcelain" register: git_status changed_when: false failed_when: false when: - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - name: Fail if git tree has local modifications fail: @@ -60,8 +91,9 @@ Modified files: {{ git_status.stdout }} when: - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - git_status.stdout | length > 0 - name: Clone {{ target_linux_tree }} @@ -75,6 +107,54 @@ retries: 3 delay: 5 until: result is succeeded + when: + - needs_git_clone|bool + +- name: Get current git ref when git exists but clone wasn't needed + command: "git -C {{ target_linux_dir_path }} rev-parse HEAD" + register: current_ref + changed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Get target ref SHA + command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" + register: target_ref_sha + changed_when: false + failed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Fetch updates if target ref doesn't exist locally + command: "git -C {{ target_linux_dir_path }} fetch origin" + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Get target ref SHA after fetch + command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" + register: target_ref_sha_after_fetch + changed_when: false + when: + - not 
needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Checkout target ref if not on correct ref + command: "git -C {{ target_linux_dir_path }} checkout {{ target_linux_ref }}" + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - (target_ref_sha.rc == 0 and current_ref.stdout != target_ref_sha.stdout) or + (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) - name: Copy the kernel delta to the builder ansible.builtin.template: diff --git a/playbooks/roles/bootlinux/tasks/build/targets.yml b/playbooks/roles/bootlinux/tasks/build/targets.yml index 414602ab..87393c74 100644 --- a/playbooks/roles/bootlinux/tasks/build/targets.yml +++ b/playbooks/roles/bootlinux/tasks/build/targets.yml @@ -10,10 +10,44 @@ - target_linux_install_b4 - ansible_facts['os_family']|lower != 'debian' +- name: Check if target directory exists + stat: + path: "{{ target_linux_dir_path }}" + register: target_directory_stat + +- name: Check if .git directory exists in target path + stat: + path: "{{ target_linux_dir_path }}/.git" + register: git_directory_stat + when: + - target_directory_stat.stat.exists + +- name: Infer that git clone is needed when .git doesn't exist + set_fact: + needs_git_clone: true + when: + - target_directory_stat.stat.exists + - not git_directory_stat.stat.exists + +- name: Set needs_git_clone when directory doesn't exist + set_fact: + needs_git_clone: true + when: + - not target_directory_stat.stat.exists + +- name: Set needs_git_clone to false when .git exists + set_fact: + needs_git_clone: false + when: + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - name: Verify target git ref exists before cloning command: "git ls-remote {{ target_linux_git }} {{ target_linux_ref }}" register: ref_check tags: [ 'clone'] + when: + - needs_git_clone|bool - name: Fail 
if git ref does not exist fail: @@ -30,21 +64,18 @@ - If using A/B testing with different refs, ensure shallow cloning is disabled - The repository URL '{{ target_linux_git }}' is correct and accessible when: + - needs_git_clone|bool - ref_check.rc != 0 -- name: Check if target directory exists for dirty check - stat: - path: "{{ target_linux_dir_path }}" - register: git_dir_stat - - name: Check if git tree is dirty command: "git -C {{ target_linux_dir_path }} status --porcelain" register: git_status changed_when: false failed_when: false when: - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - name: Fail if git tree has local modifications fail: @@ -61,8 +92,9 @@ Modified files: {{ git_status.stdout }} when: - - git_dir_stat.stat.exists - - git_dir_stat.stat.isdir + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - git_status.stdout | length > 0 - name: git clone {{ target_linux_tree }} on the target nodes @@ -77,6 +109,54 @@ register: result until: not result.failed tags: [ 'clone'] + when: + - needs_git_clone|bool + +- name: Get current git ref when git exists but clone wasn't needed + command: "git -C {{ target_linux_dir_path }} rev-parse HEAD" + register: current_ref + changed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Get target ref SHA + command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" + register: target_ref_sha + changed_when: false + failed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + +- name: Fetch updates if target ref doesn't exist locally + command: "git -C {{ target_linux_dir_path }} fetch origin" + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - 
git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Get target ref SHA after fetch + command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" + register: target_ref_sha_after_fetch + changed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Checkout target ref if not on correct ref + command: "git -C {{ target_linux_dir_path }} checkout {{ target_linux_ref }}" + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - (target_ref_sha.rc == 0 and current_ref.stdout != target_ref_sha.stdout) or + (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) - name: Copy kernel delta if requested on the target nodes template: -- 2.47.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
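For readers skimming the task list in this patch, the decision it encodes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the playbook's code: the function names `needs_git_clone` and `should_checkout` are hypothetical, a target SHA of `None` stands in for `target_ref_sha.rc != 0`, and `resolves_after_fetch` stands in for `target_ref_sha_after_fetch.rc == 0`.

```python
from pathlib import Path
from typing import Optional


def needs_git_clone(target_dir: Path) -> bool:
    """Mirror the three set_fact tasks: clone unless target_dir/.git exists."""
    if not target_dir.exists():
        return True
    return not (target_dir / ".git").exists()


def should_checkout(current_sha: str,
                    target_sha: Optional[str],
                    resolves_after_fetch: bool) -> bool:
    """Mirror the 'when' clause of the checkout task (hypothetical helper).

    target_sha is None when the ref could not be resolved before the fetch.
    """
    if target_sha is not None:
        # Ref known locally: check out only if HEAD points elsewhere.
        return current_sha != target_sha
    # Ref unknown locally: check out only if the fetch made it resolvable.
    return resolves_after_fetch
```

Note that with this logic an existing tree already on the requested ref is left untouched, which is what lets repeated runs skip both the clone and the checkout.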
* [PATCH 6/8] bootlinux: enhance A/B testing and repository management
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
` (4 preceding siblings ...)
2025-08-11 22:43 ` [PATCH 5/8] bootlinux: add intelligent git repository detection and management Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 7/8] fstests: add make target for running tests on all hosts Luis Chamberlain
2025-08-11 22:43 ` [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow Luis Chamberlain
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Improve git repository handling and A/B testing capabilities:

- Add intelligent fallback logic for git ref resolution
- Handle both direct refs and tag resolution gracefully
- Improve error handling for missing or invalid refs
- Add better debugging output for git operations
- Update defconfig to enable 4K reflink testing
- Exclude linux directory from newline checking script

These changes make the bootlinux workflow more robust when dealing with
different git ref types and improve the A/B testing experience with
better error reporting and fallback mechanisms.

Generated-by: Claude AI Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- defconfigs/xfs_reflink_lbs | 1 + playbooks/roles/bootlinux/tasks/build/9p.yml | 44 +++++++++++++++++-- .../roles/bootlinux/tasks/build/builder.yml | 39 ++++++++++++++-- .../roles/bootlinux/tasks/build/targets.yml | 39 ++++++++++++++-- scripts/ensure_newlines.py | 3 +- 5 files changed, 113 insertions(+), 13 deletions(-) diff --git a/defconfigs/xfs_reflink_lbs b/defconfigs/xfs_reflink_lbs index 77a94200..f20fa15a 100644 --- a/defconfigs/xfs_reflink_lbs +++ b/defconfigs/xfs_reflink_lbs @@ -20,6 +20,7 @@ CONFIG_FSTESTS_XFS_ENABLE_LBS=y CONFIG_FSTESTS_XFS_ENABLE_LBS_4KS=y CONFIG_FSTESTS_XFS_SECTION_REFLINK_ENABLED=y +CONFIG_FSTESTS_XFS_SECTION_REFLINK_4K=y CONFIG_FSTESTS_XFS_SECTION_REFLINK_8K_4KS=y CONFIG_FSTESTS_XFS_SECTION_REFLINK_16K_4KS=y CONFIG_FSTESTS_XFS_SECTION_REFLINK_32K_4KS=y diff --git a/playbooks/roles/bootlinux/tasks/build/9p.yml b/playbooks/roles/bootlinux/tasks/build/9p.yml index d0ae61ad..4c6e37a3 100644 --- a/playbooks/roles/bootlinux/tasks/build/9p.yml +++ b/playbooks/roles/bootlinux/tasks/build/9p.yml @@ -68,6 +68,7 @@ run_once: true delegate_to: localhost + - name: Verify target git ref exists before cloning command: "git ls-remote {{ target_linux_git }} {{ active_linux_ref | default(target_linux_ref) }}" register: ref_check @@ -181,10 +182,24 @@ - git_directory_stat.stat.exists - target_ref_sha.rc != 0 -- name: Get target ref SHA after fetch +- name: Try to resolve ref as direct ref after fetch command: "git -C {{ bootlinux_9p_host_path }} rev-parse {{ active_linux_ref | default(target_linux_ref) }}" - register: target_ref_sha_after_fetch + register: target_ref_sha_direct + changed_when: false + failed_when: false + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + +- name: Try to resolve ref as remote branch if direct ref failed + command: 
"git -C {{ bootlinux_9p_host_path }} rev-parse origin/{{ active_linux_ref | default(target_linux_ref) }}" + register: target_ref_sha_remote changed_when: false + failed_when: false run_once: true delegate_to: localhost when: @@ -192,9 +207,29 @@ - target_directory_stat.stat.exists - git_directory_stat.stat.exists - target_ref_sha.rc != 0 + - target_ref_sha_direct.rc != 0 + +- name: Set resolved ref for checkout + set_fact: + resolved_ref: | + {%- if target_ref_sha.rc == 0 -%} + {{ active_linux_ref | default(target_linux_ref) }} + {%- elif target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0 -%} + {{ active_linux_ref | default(target_linux_ref) }} + {%- elif target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0 -%} + origin/{{ active_linux_ref | default(target_linux_ref) }} + {%- else -%} + {{ active_linux_ref | default(target_linux_ref) }} + {%- endif -%} + run_once: true + delegate_to: localhost + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists - name: Checkout target ref if not on correct ref - command: "git -C {{ bootlinux_9p_host_path }} checkout {{ active_linux_ref | default(target_linux_ref) }}" + command: "git -C {{ bootlinux_9p_host_path }} checkout {{ resolved_ref | default(active_linux_ref | default(target_linux_ref)) }}" run_once: true delegate_to: localhost when: @@ -202,7 +237,8 @@ - target_directory_stat.stat.exists - git_directory_stat.stat.exists - (target_ref_sha.rc == 0 and current_ref.stdout != target_ref_sha.stdout) or - (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) + (target_ref_sha.rc != 0 and (target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0)) or + (target_ref_sha.rc != 0 and (target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0)) - name: Copy kernel delta if requested on the control node template: diff --git a/playbooks/roles/bootlinux/tasks/build/builder.yml 
b/playbooks/roles/bootlinux/tasks/build/builder.yml index 1213c56f..014a341d 100644 --- a/playbooks/roles/bootlinux/tasks/build/builder.yml +++ b/playbooks/roles/bootlinux/tasks/build/builder.yml @@ -137,24 +137,55 @@ - git_directory_stat.stat.exists - target_ref_sha.rc != 0 -- name: Get target ref SHA after fetch +- name: Try to resolve ref as direct ref after fetch command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" - register: target_ref_sha_after_fetch + register: target_ref_sha_direct changed_when: false + failed_when: false when: - not needs_git_clone|bool - target_directory_stat.stat.exists - git_directory_stat.stat.exists - target_ref_sha.rc != 0 +- name: Try to resolve ref as remote branch if direct ref failed + command: "git -C {{ target_linux_dir_path }} rev-parse origin/{{ target_linux_ref }}" + register: target_ref_sha_remote + changed_when: false + failed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + - target_ref_sha_direct.rc != 0 + +- name: Set resolved ref for checkout + set_fact: + resolved_ref: | + {%- if target_ref_sha.rc == 0 -%} + {{ target_linux_ref }} + {%- elif target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0 -%} + {{ target_linux_ref }} + {%- elif target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0 -%} + origin/{{ target_linux_ref }} + {%- else -%} + {{ target_linux_ref }} + {%- endif -%} + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - name: Checkout target ref if not on correct ref - command: "git -C {{ target_linux_dir_path }} checkout {{ target_linux_ref }}" + command: "git -C {{ target_linux_dir_path }} checkout {{ resolved_ref | default(target_linux_ref) }}" when: - not needs_git_clone|bool - target_directory_stat.stat.exists - git_directory_stat.stat.exists - (target_ref_sha.rc == 0 and current_ref.stdout 
!= target_ref_sha.stdout) or - (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) + (target_ref_sha.rc != 0 and (target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0)) or + (target_ref_sha.rc != 0 and (target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0)) - name: Copy the kernel delta to the builder ansible.builtin.template: diff --git a/playbooks/roles/bootlinux/tasks/build/targets.yml b/playbooks/roles/bootlinux/tasks/build/targets.yml index 87393c74..5942d2be 100644 --- a/playbooks/roles/bootlinux/tasks/build/targets.yml +++ b/playbooks/roles/bootlinux/tasks/build/targets.yml @@ -139,24 +139,55 @@ - git_directory_stat.stat.exists - target_ref_sha.rc != 0 -- name: Get target ref SHA after fetch +- name: Try to resolve ref as direct ref after fetch command: "git -C {{ target_linux_dir_path }} rev-parse {{ target_linux_ref }}" - register: target_ref_sha_after_fetch + register: target_ref_sha_direct changed_when: false + failed_when: false when: - not needs_git_clone|bool - target_directory_stat.stat.exists - git_directory_stat.stat.exists - target_ref_sha.rc != 0 +- name: Try to resolve ref as remote branch if direct ref failed + command: "git -C {{ target_linux_dir_path }} rev-parse origin/{{ target_linux_ref }}" + register: target_ref_sha_remote + changed_when: false + failed_when: false + when: + - not needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - target_ref_sha.rc != 0 + - target_ref_sha_direct.rc != 0 + +- name: Set resolved ref for checkout + set_fact: + resolved_ref: | + {%- if target_ref_sha.rc == 0 -%} + {{ target_linux_ref }} + {%- elif target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0 -%} + {{ target_linux_ref }} + {%- elif target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0 -%} + origin/{{ target_linux_ref }} + {%- else -%} + {{ target_linux_ref }} + {%- endif -%} + when: + - not 
needs_git_clone|bool + - target_directory_stat.stat.exists + - git_directory_stat.stat.exists + - name: Checkout target ref if not on correct ref - command: "git -C {{ target_linux_dir_path }} checkout {{ target_linux_ref }}" + command: "git -C {{ target_linux_dir_path }} checkout {{ resolved_ref | default(target_linux_ref) }}" when: - not needs_git_clone|bool - target_directory_stat.stat.exists - git_directory_stat.stat.exists - (target_ref_sha.rc == 0 and current_ref.stdout != target_ref_sha.stdout) or - (target_ref_sha.rc != 0 and target_ref_sha_after_fetch is defined and target_ref_sha_after_fetch.rc == 0) + (target_ref_sha.rc != 0 and (target_ref_sha_direct is defined and target_ref_sha_direct.rc == 0)) or + (target_ref_sha.rc != 0 and (target_ref_sha_remote is defined and target_ref_sha_remote.rc == 0)) - name: Copy kernel delta if requested on the target nodes template: diff --git a/scripts/ensure_newlines.py b/scripts/ensure_newlines.py index 969cf32a..f82ba136 100755 --- a/scripts/ensure_newlines.py +++ b/scripts/ensure_newlines.py @@ -45,7 +45,8 @@ def main(): dirs[:] = [ d for d in dirs - if not d.startswith(".") and d not in ["__pycache__", "node_modules"] + if not d.startswith(".") + and d not in ["__pycache__", "node_modules", "linux"] ] for file in files: -- 2.47.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
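The fallback order that the `resolved_ref` fact implements can be sketched as follows. This is a simplified illustration, not the playbook's code: `git_rev_parse` and `resolve_ref` are hypothetical names, and the sketch folds the pre-fetch and post-fetch lookups of the plain ref into a single attempt. The only git invocation used is `git -C <repo> rev-parse <ref>`, the same command the tasks run.

```python
import subprocess
from typing import Callable, Optional


def git_rev_parse(repo: str, ref: str) -> Optional[str]:
    """Return the SHA for ref in repo, or None if git cannot resolve it."""
    proc = subprocess.run(
        ["git", "-C", repo, "rev-parse", ref],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout.strip() if proc.returncode == 0 else None


def resolve_ref(ref: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Pick the name to hand to 'git checkout', in the patch's order.

    1. the ref as given (local branch, tag, or SHA)
    2. origin/<ref> (remote-tracking branch)
    3. the ref as given again, as the last-resort else branch
    """
    for candidate in (ref, f"origin/{ref}"):
        if lookup(candidate) is not None:
            return candidate
    return ref


# Usage with a fake lookup table instead of a real repository.
known = {"origin/feature-x": "abc123"}
print(resolve_ref("feature-x", known.get))
```

Against a real tree the lookup would be something like `functools.partial(git_rev_parse, "/path/to/linux")`, where the path is a placeholder. Checking out `origin/<ref>` leaves the tree on a detached HEAD, which should be fine for a build tree but is worth knowing when debugging.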
* [PATCH 7/8] fstests: add make target for running tests on all hosts
2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain
` (5 preceding siblings ...)
2025-08-11 22:43 ` [PATCH 6/8] bootlinux: enhance A/B testing and repository management Luis Chamberlain
@ 2025-08-11 22:43 ` Luis Chamberlain
2025-08-11 22:43 ` [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow Luis Chamberlain
7 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain

Add a new 'fstests-tests' make target that runs tests on both baseline
and dev hosts simultaneously. This target is useful for A/B testing
scenarios where you want to run the same tests on multiple host
configurations in parallel.

The target uses the same FSTESTS_DYNAMIC_RUNTIME_VARS as the existing
baseline and dev targets but limits execution to both baseline and dev
host groups together.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 workflows/fstests/Makefile | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/workflows/fstests/Makefile b/workflows/fstests/Makefile
index 77f8055c..64a9b61c 100644
--- a/workflows/fstests/Makefile
+++ b/workflows/fstests/Makefile
@@ -198,6 +198,15 @@ fstests-dev: $(KDEVOPS_EXTRA_VARS)
 		'{ $(FSTESTS_DYNAMIC_RUNTIME_VARS) }' \
 		--extra-vars=@./extra_vars.yaml $(LIMIT_HOSTS)
 
+fstests-tests: $(KDEVOPS_EXTRA_VARS)
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--limit 'baseline:dev' \
+		playbooks/fstests.yml \
+		--tags vars,run_tests,copy_results \
+		--extra-vars \
+		'{ $(FSTESTS_DYNAMIC_RUNTIME_VARS) }' \
+		--extra-vars=@./extra_vars.yaml $(LIMIT_HOSTS)
+
 fstests-baseline-results-tfb-ls: $(KDEVOPS_EXTRA_VARS)
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
 		--limit 'baseline' \
@@ -249,6 +258,7 @@ fstests-help-menu:
 	@echo "fstests-kdevops-setup - Install kdevops specific files, the fstests and running test targets also runs this"
 	@echo "fstests-baseline - Run fstests on baseline hosts and collect results"
 	@echo "fstests-dev - Run fstests on dev hosts and collect results"
+	@echo "fstests-tests - Run fstests on both baseline and dev hosts simultaneously"
 	@echo ""
 	@echo "fstests-config - Generates the filesystem configuration file only onto target systems"
 	@echo "fstests-config-debug - Generates the filesystem configuration file locally, useful for debugging"
--
2.47.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
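As a rough illustration of what `--limit 'baseline:dev'` selects: in Ansible host patterns a colon between group names means a union, so the play runs on every host in either group. The Python below is only a toy model of that one rule. The helper name `hosts_for_limit` and the host names are hypothetical, and real patterns also support intersections, exclusions and wildcards, which this sketch ignores.

```python
from typing import Dict, List


def hosts_for_limit(pattern: str, groups: Dict[str, List[str]]) -> List[str]:
    """Expand a colon-separated union pattern such as 'baseline:dev'."""
    selected: List[str] = []
    for group in pattern.split(":"):
        for host in groups.get(group, []):
            if host not in selected:  # a host in both groups is listed once
                selected.append(host)
    return selected


# Hypothetical inventory with one baseline and one dev node per profile.
inventory = {
    "baseline": ["demo-xfs", "demo-btrfs"],
    "dev": ["demo-xfs-dev", "demo-btrfs-dev"],
}
print(hosts_for_limit("baseline:dev", inventory))
```

With an inventory shaped like this, the new target would cover all four nodes in one playbook run, where the existing baseline and dev targets each cover two.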
* [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow 2025-08-11 22:42 [PATCH 0/8] linux-ab enhancements + monitor support Luis Chamberlain ` (6 preceding siblings ...) 2025-08-11 22:43 ` [PATCH 7/8] fstests: add make target for running tests on all hosts Luis Chamberlain @ 2025-08-11 22:43 ` Luis Chamberlain 2025-08-11 22:46 ` Luis Chamberlain 7 siblings, 1 reply; 12+ messages in thread From: Luis Chamberlain @ 2025-08-11 22:43 UTC (permalink / raw) To: Chuck Lever, Daniel Gomez, kdevops; +Cc: Luis Chamberlain Fix monitoring data collection to work properly during fstests runs by changing from include_role to import_tasks. The include_role directive with tasks_from parameter wasn't executing properly due to tag filtering during playbook execution, preventing monitoring data from being collected. Using import_tasks ensures the monitoring tasks are statically included at parse time and properly executed when the appropriate tags are present. This allows monitoring to run on all hosts and collect folio migration statistics during test execution. The monitoring documentation has also been updated to reflect that it's now a shared service available to all workflows. Generated-by: Claude AI Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- Kconfig | 4 ++++ README.md | 1 + playbooks/roles/fstests/tasks/main.yml | 14 ++++++++++++++ 3 files changed, 19 insertions(+) diff --git a/Kconfig b/Kconfig index 988782a9..3f2bc8cf 100644 --- a/Kconfig +++ b/Kconfig @@ -79,6 +79,10 @@ menu "Target workflows" source "kconfigs/workflows/Kconfig" endmenu +menu "Monitors" +source "kconfigs/monitors/Kconfig" +endmenu + menu "Kdevops configuration" source "kconfigs/Kconfig.kdevops" endmenu diff --git a/README.md b/README.md index e695d088..c9f44249 100644 --- a/README.md +++ b/README.md @@ -306,6 +306,7 @@ Below is kdevops' recommended documentation reading. 
* [kdevops' evolving make help](docs/evolving-make-help.md) * [kdevops configuration](docs/kdevops-configuration.md) * [kdevops mirror support](docs/kdevops-mirror.md) + * [kdevops monitoring services](docs/monitoring.md) * [kdevops first run](docs/kdevops-first-run.md) * [kdevops running make](docs/running-make.md) * [kdevops libvirt storage pool considerations](docs/libvirt-storage-pool.md) diff --git a/playbooks/roles/fstests/tasks/main.yml b/playbooks/roles/fstests/tasks/main.yml index 2665f693..3e2cbf99 100644 --- a/playbooks/roles/fstests/tasks/main.yml +++ b/playbooks/roles/fstests/tasks/main.yml @@ -1235,6 +1235,13 @@ when: - fstests_skip_run|bool +# Start monitoring services before running tests +- import_tasks: ../../monitoring/tasks/monitor_run.yml + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ] + # Recent environments runs are showing that environment variables # set below are not propagated. So best to stuff what you need # into the .kdevops_fstests_setup file which is sourced by root. @@ -1281,6 +1288,13 @@ when: - kdevops_run_fstests|bool +# Stop monitoring services and collect data after running tests +- import_tasks: ../../monitoring/tasks/monitor_collect.yml + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ] + - name: Remove watchdog hint that tests have started local_action: file path="{{ fstests_workflow_dir }}/.begin" state=absent tags: [ 'oscheck', 'fstests', 'run_tests' ] -- 2.47.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow
2025-08-11 22:43 ` [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow Luis Chamberlain
@ 2025-08-11 22:46 ` Luis Chamberlain
2025-08-12 0:49 ` Luis Chamberlain
0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-11 22:46 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops

On Mon, Aug 11, 2025 at 03:43:07PM -0700, Luis Chamberlain wrote:
> Fix monitoring data collection to work properly during fstests runs by
> changing from include_role to import_tasks. The include_role directive
> with tasks_from parameter wasn't executing properly due to tag filtering
> during playbook execution, preventing monitoring data from being collected.
>
> Using import_tasks ensures the monitoring tasks are statically included
> at parse time and properly executed when the appropriate tags are present.
> This allows monitoring to run on all hosts and collect folio migration
> statistics during test execution.
>
> The monitoring documentation has also been updated to reflect that it's
> now a shared service available to all workflows.
>
> Generated-by: Claude AI
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Bah, this patch can be ignored for now, I forgot to git am a few things.

Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow
2025-08-11 22:46 ` Luis Chamberlain
@ 2025-08-12 0:49 ` Luis Chamberlain
2025-08-14 0:59 ` Luis Chamberlain
0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-12 0:49 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops, Kundan Kumar, Anuj Gupta

On Mon, Aug 11, 2025 at 03:46:29PM -0700, Luis Chamberlain wrote:
> On Mon, Aug 11, 2025 at 03:43:07PM -0700, Luis Chamberlain wrote:
> > Fix monitoring data collection to work properly during fstests runs by
> > changing from include_role to import_tasks. The include_role directive
> > with tasks_from parameter wasn't executing properly due to tag filtering
> > during playbook execution, preventing monitoring data from being collected.
> >
> > Using import_tasks ensures the monitoring tasks are statically included
> > at parse time and properly executed when the appropriate tags are present.
> > This allows monitoring to run on all hosts and collect folio migration
> > statistics during test execution.
> >
> > The monitoring documentation has also been updated to reflect that it's
> > now a shared service available to all workflows.
> >
> > Generated-by: Claude AI
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>
> Bah, this patch can be ignored for now, I forgot to git am a few things.

OK, a v2 of that patch follows, along with a branch with *all* my
pending changes:

https://github.com/linux-kdevops/kdevops/tree/20250811-monitor-v2

From 75bbc91c5cdaced1ab294b979cf58c13d8ead34c Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Mon, 11 Aug 2025 15:33:17 -0700
Subject: [PATCH v2] monitoring: add monitoring framework for workflow execution

Add a flexible monitoring framework that collects system metrics during
workflow execution. The framework supports background monitoring services
that automatically start before workflows and collect results afterward.

Initial implementation includes: - Core monitoring infrastructure with Kconfig integration - Folio migration statistics monitor (for developmental kernel features) - Integration with fstests workflow - Result collection and visualization support - Documentation for adding new monitors and integrating with workflows The monitoring system is designed to be modular, allowing workflows to opt-in and new monitors to be easily added. Results are stored in workflow-specific directories and can include both raw data and visualizations. Generated-by: Claude AI Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- Kconfig | 4 + README.md | 1 + docs/monitoring.md | 279 ++++++++++++++++++ kconfigs/monitors/Kconfig | 74 +++++ playbooks/roles/fstests/tasks/main.yml | 14 + playbooks/roles/monitoring/defaults/main.yml | 18 ++ .../monitoring/files/plot_migration_stats.py | 220 ++++++++++++++ playbooks/roles/monitoring/tasks/main.yml | 23 ++ .../monitoring/tasks/monitor_collect.yml | 205 +++++++++++++ .../roles/monitoring/tasks/monitor_run.yml | 83 ++++++ 10 files changed, 921 insertions(+) create mode 100644 docs/monitoring.md create mode 100644 kconfigs/monitors/Kconfig create mode 100644 playbooks/roles/monitoring/defaults/main.yml create mode 100755 playbooks/roles/monitoring/files/plot_migration_stats.py create mode 100644 playbooks/roles/monitoring/tasks/main.yml create mode 100644 playbooks/roles/monitoring/tasks/monitor_collect.yml create mode 100644 playbooks/roles/monitoring/tasks/monitor_run.yml diff --git a/Kconfig b/Kconfig index 988782a9dc83..3f2bc8cf019a 100644 --- a/Kconfig +++ b/Kconfig @@ -79,6 +79,10 @@ menu "Target workflows" source "kconfigs/workflows/Kconfig" endmenu +menu "Monitors" +source "kconfigs/monitors/Kconfig" +endmenu + menu "Kdevops configuration" source "kconfigs/Kconfig.kdevops" endmenu diff --git a/README.md b/README.md index e695d088dcb8..c9f442492559 100644 --- a/README.md +++ b/README.md @@ -306,6 +306,7 @@ Below is kdevops' recommended 
documentation reading. * [kdevops' evolving make help](docs/evolving-make-help.md) * [kdevops configuration](docs/kdevops-configuration.md) * [kdevops mirror support](docs/kdevops-mirror.md) + * [kdevops monitoring services](docs/monitoring.md) * [kdevops first run](docs/kdevops-first-run.md) * [kdevops running make](docs/running-make.md) * [kdevops libvirt storage pool considerations](docs/libvirt-storage-pool.md) diff --git a/docs/monitoring.md b/docs/monitoring.md new file mode 100644 index 000000000000..7102db5d518b --- /dev/null +++ b/docs/monitoring.md @@ -0,0 +1,279 @@ +# Monitoring Services in kdevops + +## Overview + +kdevops provides a flexible monitoring framework that allows you to collect system metrics and statistics during workflow execution. This is particularly useful for: + +- Performance analysis during testing +- Debugging kernel behavior +- Understanding system resource usage patterns +- Validating new kernel features with custom metrics + +The monitoring framework runs services in the background during workflow execution and automatically collects results afterward. + +## Configuration + +### Enabling Monitoring + +Monitoring services are configured through the kdevops menuconfig system: + +```bash +make menuconfig +# Navigate to: Monitors +# Enable: "Enable monitoring services during workflow execution" +``` + +### Available Monitors + +#### Folio Migration Statistics (Developmental) + +This monitor tracks page/folio migration statistics in the Linux kernel. It's marked as "developmental" because it requires kernel patches that are not yet upstream. 
+ +**Requirements:** +- Kernel with folio migration debugfs stats patch applied +- Debugfs mounted at `/sys/kernel/debug` +- File exists: `/sys/kernel/debug/mm/migrate/stats` + +**Configuration:** +```bash +make menuconfig +# Navigate to: Monitors +# Enable: "Enable monitoring services during workflow execution" +# Enable: "Enable developmental statistics (not yet upstream)" +# Enable: "Monitor folio migration statistics" +# Set: "Folio migration monitoring interval" (default: 60 seconds) +``` + +## Integration with Workflows + +### Currently Supported Workflows + +- **fstests**: Filesystem testing framework + +### How Workflows Integrate Monitoring + +Workflows integrate monitoring by including the monitoring role at appropriate points. Here's the pattern used in fstests: + +```yaml +# Start monitoring before tests +- name: Start monitoring services + include_role: + name: monitoring + tasks_from: monitor_run + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ] + +# ... workflow tasks run here ... + +# Stop monitoring and collect data after tests +- name: Stop monitoring services and collect data + include_role: + name: monitoring + tasks_from: monitor_collect + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ] +``` + +### Adding Monitoring to Your Workflow + +To add monitoring support to a new workflow: + +1. **Identify the execution boundaries**: Determine where your workflow starts and completes its main work. + +2. 
**Include the monitoring role**: Add the monitoring role calls before and after your main tasks: + +```yaml +# In your workflow's main task file (e.g., playbooks/roles/YOUR_WORKFLOW/tasks/main.yml) + +# Set custom monitoring results path (optional) +- name: Set monitoring results path for this workflow + set_fact: + monitoring_results_base_path: "{{ topdir_path }}/workflows/YOUR_WORKFLOW/results/monitoring" + when: + - enable_monitoring|default(false)|bool + +# Start monitoring +- name: Start monitoring services + include_role: + name: monitoring + tasks_from: monitor_run + when: + - your_workflow_condition|bool + - enable_monitoring|default(false)|bool + tags: [ 'your_workflow', 'monitoring', 'monitor_run' ] + +# Your workflow tasks here... + +# Stop monitoring +- name: Stop monitoring services and collect data + include_role: + name: monitoring + tasks_from: monitor_collect + when: + - your_workflow_condition|bool + - enable_monitoring|default(false)|bool + tags: [ 'your_workflow', 'monitoring', 'monitor_collect' ] +``` + +3. **Test the integration**: Run your workflow with monitoring enabled to verify data collection. + +## Output and Results + +### Result Location + +Monitoring results are stored in workflow-specific directories: + +- **fstests**: `workflows/fstests/results/monitoring/` +- **Other workflows**: `workflows/YOUR_WORKFLOW/results/monitoring/` + +Workflows can customize the results path by setting the `monitoring_results_base_path` variable in their playbook. 
+ +### Result Files + +For folio migration monitoring: +- `<hostname>_folio_migration_stats.txt`: Raw statistics with timestamps +- `<hostname>_folio_migration_plot.png`: Visualization plot (if generation succeeds) + +### Example Output + +Raw statistics file format: +``` +2024-01-15 10:30:00 +success: 12345 +fail: 67 +total: 12412 + +2024-01-15 10:31:00 +success: 12456 +fail: 68 +total: 12524 +``` + +## Running Workflows with Monitoring + +### Example: fstests with Folio Migration Monitoring + +1. **Configure monitoring**: +```bash +make menuconfig +# Enable monitoring options as described above +make +``` + +2. **Provision systems**: +```bash +make bringup +``` + +3. **Run tests with monitoring**: +```bash +# Run on both baseline and dev groups +make fstests-tests TESTS=generic/003 + +# Or run on specific group +make fstests-baseline TESTS=generic/003 +``` + +4. **Check results**: +```bash +ls -la workflows/fstests/results/monitoring/ +``` + +## Advanced Usage + +### Custom Monitoring Intervals + +You can override the monitoring interval at runtime: + +```bash +make fstests-tests EXTRA_VARS="monitor_folio_migration_interval=30" +``` + +### Selective Monitoring + +You can enable/disable specific monitors at runtime: + +```bash +# Enable only folio migration monitoring +make fstests-tests EXTRA_VARS="enable_monitoring=true monitor_folio_migration=true" +``` + +## Troubleshooting + +### Monitor Not Starting + +1. **Check kernel support**: +```bash +ansible all -m shell -a "ls -la /sys/kernel/debug/mm/migrate/stats" +``` + +2. **Verify debugfs is mounted**: +```bash +ansible all -m shell -a "mount | grep debugfs" +``` + +3. **Check monitoring process**: +```bash +ansible all -m shell -a "ps aux | grep monitoring" +``` + +### No Data Collected + +1. **Verify monitoring was enabled**: +```bash +grep -E "enable_monitoring|monitor_" .config +``` + +2. **Check ansible output for monitoring tasks**: +```bash +make fstests-tests AV=2 | grep -A5 -B5 monitoring +``` + +3. 
**Look for error messages**: +```bash +ansible all -m shell -a "cat /root/monitoring/folio_migration.log" +``` + +## Adding New Monitors + +To add a new monitor to the framework: + +1. **Add Kconfig option** in `kconfigs/monitors/Kconfig`: +```kconfig +config MONITOR_YOUR_METRIC + bool "Monitor your metric description" + output yaml + default n + help + Detailed description of what this monitors... +``` + +2. **Extend monitoring role**: + - Add collection logic in `playbooks/roles/monitoring/tasks/monitor_run.yml` + - Add termination and data collection in `playbooks/roles/monitoring/tasks/monitor_collect.yml` + +3. **Add visualization** (optional): + - Place scripts in `playbooks/roles/monitoring/files/` + - Call them from `monitor_collect.yml` + +4. **Update documentation**: Add your monitor to this documentation file. + +## Performance Considerations + +- **Monitoring overhead**: Each monitor adds some system overhead. Consider the trade-off between data granularity and performance impact. +- **Storage requirements**: Long-running tests with frequent monitoring can generate large data files. +- **Concurrent monitors**: Running multiple monitors simultaneously increases overhead. + +## Future Enhancements + +Planned monitoring additions: +- Memory pressure statistics +- CPU utilization tracking +- I/O statistics collection +- Network traffic monitoring +- Custom perf event monitoring +- Integration with Grafana/Prometheus for real-time visualization diff --git a/kconfigs/monitors/Kconfig b/kconfigs/monitors/Kconfig new file mode 100644 index 000000000000..6dc1ddbdd2e9 --- /dev/null +++ b/kconfigs/monitors/Kconfig @@ -0,0 +1,74 @@ +# SPDX-License-Identifier: copyleft-next-0.3.1 + +config ENABLE_MONITORING + bool "Enable monitoring services during workflow execution" + output yaml + default n + help + Enable monitoring services to collect statistics during workflow + execution. This allows collection of various system metrics while + workflows are running. 
+ + Monitoring services run in the background during test execution and + automatically collect results afterward. The collected data can be + used for performance analysis, debugging, and understanding system + behavior during tests. + + Individual workflows must add support for monitoring integration. + Currently supported workflows: + - fstests + +if ENABLE_MONITORING + +config MONITOR_DEVELOPMENTAL_STATS + bool "Enable developmental statistics (not yet upstream)" + output yaml + default n + help + Enable collection of statistics that are still in development + and not yet merged upstream in the Linux kernel. + + This is useful for testing and validating new kernel features + that provide additional debugging or performance metrics. + +if MONITOR_DEVELOPMENTAL_STATS + +config MONITOR_FOLIO_MIGRATION + bool "Monitor folio migration statistics" + output yaml + default n + help + Enable monitoring of folio migration statistics if available. + This requires the kernel to have the folio migration debugfs + stats patch applied. + + The statistics are collected from: + /sys/kernel/debug/mm/migrate/stats + + This feature collects migration statistics periodically during + workflow execution and can generate plots for visualization. + +config MONITOR_FOLIO_MIGRATION_INTERVAL + int "Folio migration monitoring interval (seconds)" + output yaml + default 60 + depends on MONITOR_FOLIO_MIGRATION + help + How often to collect folio migration statistics in seconds. + Default is 60 seconds. + + Lower values provide more granular data but may impact system + performance. Higher values reduce overhead but may miss + short-lived migration events. 
+ +endif # MONITOR_DEVELOPMENTAL_STATS + +# Future monitoring options can be added here +# Examples: +# - Memory pressure monitoring +# - CPU utilization tracking +# - I/O statistics collection +# - Network traffic monitoring +# - Custom perf event monitoring + +endif # ENABLE_MONITORING diff --git a/playbooks/roles/fstests/tasks/main.yml b/playbooks/roles/fstests/tasks/main.yml index 2665f693af3c..3e2cbf99ff40 100644 --- a/playbooks/roles/fstests/tasks/main.yml +++ b/playbooks/roles/fstests/tasks/main.yml @@ -1235,6 +1235,13 @@ when: - fstests_skip_run|bool +# Start monitoring services before running tests +- import_tasks: ../../monitoring/tasks/monitor_run.yml + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ] + # Recent environments runs are showing that environment variables # set below are not propagated. So best to stuff what you need # into the .kdevops_fstests_setup file which is sourced by root. 
@@ -1281,6 +1288,13 @@ when: - kdevops_run_fstests|bool +# Stop monitoring services and collect data after running tests +- import_tasks: ../../monitoring/tasks/monitor_collect.yml + when: + - kdevops_run_fstests|bool + - enable_monitoring|default(false)|bool + tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ] + - name: Remove watchdog hint that tests have started local_action: file path="{{ fstests_workflow_dir }}/.begin" state=absent tags: [ 'oscheck', 'fstests', 'run_tests' ] diff --git a/playbooks/roles/monitoring/defaults/main.yml b/playbooks/roles/monitoring/defaults/main.yml new file mode 100644 index 000000000000..7306f5959859 --- /dev/null +++ b/playbooks/roles/monitoring/defaults/main.yml @@ -0,0 +1,18 @@ +--- +# Default values for monitoring role + +# Enable monitoring services +enable_monitoring: false + +# Enable developmental statistics +monitor_developmental_stats: false + +# Enable folio migration monitoring +monitor_folio_migration: false + +# Folio migration monitoring interval in seconds +monitor_folio_migration_interval: 60 + +# Base path to store monitoring results on the control host +# Workflows can override by setting monitoring_results_base_path +monitoring_results_base_path: "{{ topdir_path }}/workflows/fstests/results/monitoring" diff --git a/playbooks/roles/monitoring/files/plot_migration_stats.py b/playbooks/roles/monitoring/files/plot_migration_stats.py new file mode 100755 index 000000000000..6cb47d31bd60 --- /dev/null +++ b/playbooks/roles/monitoring/files/plot_migration_stats.py @@ -0,0 +1,220 @@ +#!/usr/bin/env python3 + +import argparse +import os +import re +import matplotlib.pyplot as plt +from matplotlib.ticker import FuncFormatter +from datetime import datetime + + +def human_format(num): + if num >= 1_000_000: + return f"{num//1_000_000:,}M" + elif num >= 1_000: + return f"{num//1_000:,}K" + return f"{num:,}" + + +def parse_stats_file(filename): + """Parse the new format stats file with timestamps 
and migrate_folio data.""" + timestamps = [] + calls = [] + success = [] + + with open(filename) as f: + content = f.read() + + # Split by timestamps + entries = re.split( + r"(\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} [AP]M \w{3} \d{4})", content + ) + + for i in range(1, len(entries), 2): + if i + 1 < len(entries): + timestamp = entries[i] + data_block = entries[i + 1] + + # Parse the migrate_folio section + calls_match = re.search(r"calls\s+(\d+)", data_block) + success_match = re.search(r"success\s+(\d+)", data_block) + + if calls_match and success_match: + timestamps.append(timestamp) + calls.append(int(calls_match.group(1))) + success.append(int(success_match.group(1))) + + return timestamps, calls, success + + +def find_start_index(values, threshold=1000): + """Find the index where values start jumping up significantly.""" + for i, val in enumerate(values): + if val >= threshold: + return i + return 0 + + +def cumulative_to_interval(cumulative_data): + """Convert cumulative data to per-interval data.""" + interval_data = [] + for i in range(len(cumulative_data)): + if i == 0: + interval_data.append(cumulative_data[0]) + else: + interval_data.append(cumulative_data[i] - cumulative_data[i - 1]) + return interval_data + + +def find_end_of_activity(interval_data, zero_threshold_hours=1): + """ + Find where activity ends by detecting consistent zero values. + + The heuristic stops data when there's been no activity (0 migrations per minute) + for a continuous period of 1 hour (60 consecutive zero values). This handles + cases where stats collection continues long after the workload has completed. 
+ + Args: + interval_data: List of per-interval values + zero_threshold_hours: Hours of continuous zero activity to detect end (default: 1) + + Returns: + Index where to truncate the data, or len(interval_data) if activity continues + """ + zero_threshold_minutes = int(zero_threshold_hours * 60) + consecutive_zeros = 0 + + for i, value in enumerate(interval_data): + if value == 0: + consecutive_zeros += 1 + if consecutive_zeros >= zero_threshold_minutes: + # Return the index where zeros started + return i - zero_threshold_minutes + 1 + else: + consecutive_zeros = 0 + + return len(interval_data) + + +def plot_folio_migration(stats_files, output_file): + """Plot unified folio migration stats from multiple files.""" + fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 14)) + + colors = plt.cm.tab10(range(len(stats_files))) + + for idx, file in enumerate(stats_files): + name = os.path.splitext(os.path.basename(file))[0] + timestamps, calls_cumulative, success_cumulative = parse_stats_file(file) + + if not calls_cumulative: + continue + + # Find where the workload starts (when calls jump up) + start_idx = find_start_index(calls_cumulative) + + # Trim data to start from when workload begins + calls_cumulative = calls_cumulative[start_idx:] + success_cumulative = success_cumulative[start_idx:] + + # Convert cumulative to per-interval + calls_interval = cumulative_to_interval(calls_cumulative) + success_interval = cumulative_to_interval(success_cumulative) + + # Find where activity ends (1 hour of zero activity) + end_idx = find_end_of_activity(calls_interval) + + # Truncate all data at the end of activity + calls_interval = calls_interval[:end_idx] + success_interval = success_interval[:end_idx] + calls_cumulative = calls_cumulative[:end_idx] + success_cumulative = success_cumulative[:end_idx] + + # Convert to hours from start + time_hours = list(range(len(calls_interval))) # Each entry is 1 minute apart + time_hours = [t / 60.0 for t in time_hours] # Convert to hours 
+ + # Calculate success rate per interval + success_rate = [] + for c, s in zip(calls_interval, success_interval): + if c > 0: + success_rate.append((s / c) * 100) + else: + success_rate.append(0) + + # Plot 1: Cumulative success count over time + ax1.plot( + time_hours, + success_cumulative, + label=f"{name}", + color=colors[idx], + linewidth=2, + alpha=0.8, + ) + + # Plot 2: Migration rate over time (calls per minute) + ax2.plot( + time_hours, + calls_interval, + label=f"{name}", + color=colors[idx], + linewidth=2, + alpha=0.8, + ) + + # Plot 3: Success rate per interval + ax3.plot( + time_hours, + success_rate, + label=f"{name}", + color=colors[idx], + linewidth=2, + marker="o", + markersize=3, + markevery=max(1, len(time_hours) // 20), + ) + + # Configure first plot (cumulative success) + ax1.set_title("Cumulative Successful Migrations", fontsize=16) + ax1.set_xlabel("Time (hours from workload start)", fontsize=12) + ax1.set_ylabel("Total Successful Migrations", fontsize=12) + ax1.yaxis.set_major_formatter(FuncFormatter(lambda x, _: human_format(int(x)))) + ax1.grid(True, alpha=0.3) + ax1.legend(loc="best", fontsize=10) + + # Configure second plot (migration rate) + ax2.set_title("Folio Migration Rate (calls per minute)", fontsize=16) + ax2.set_xlabel("Time (hours from workload start)", fontsize=12) + ax2.set_ylabel("Migrations per minute", fontsize=12) + ax2.yaxis.set_major_formatter(FuncFormatter(lambda x, _: human_format(int(x)))) + ax2.grid(True, alpha=0.3) + ax2.legend(loc="best", fontsize=10) + + # Configure third plot (success rate) + ax3.set_title("Folio Migration Success Rate (per interval)", fontsize=16) + ax3.set_xlabel("Time (hours from workload start)", fontsize=12) + ax3.set_ylabel("Success Rate (%)", fontsize=12) + ax3.set_ylim(0, 105) + ax3.grid(True, alpha=0.3) + ax3.legend(loc="best", fontsize=10) + + plt.tight_layout() + fig.savefig(output_file, dpi=150) + print(f"Saved folio migration plot to: {output_file}") + + +def main(): + parser = 
argparse.ArgumentParser(description="Plot folio migration stats.") + parser.add_argument("stats_files", nargs="+", help="List of *.stats.txt files") + parser.add_argument( + "-o", + "--output", + default="folio-migration.png", + help="Output PNG file (default: folio-migration.png)", + ) + args = parser.parse_args() + + plot_folio_migration(args.stats_files, args.output) + + +if __name__ == "__main__": + main() diff --git a/playbooks/roles/monitoring/tasks/main.yml b/playbooks/roles/monitoring/tasks/main.yml new file mode 100644 index 000000000000..9c4bd3ab04e1 --- /dev/null +++ b/playbooks/roles/monitoring/tasks/main.yml @@ -0,0 +1,23 @@ +--- +- name: Import optional extra_args file + include_vars: "{{ item }}" + ignore_errors: yes + with_first_found: + - files: + - "../extra_vars.yml" + - "../extra_vars.yaml" + - "../extra_vars.json" + skip: true + tags: vars + +- name: Include monitor_run tasks + include_tasks: monitor_run.yml + when: + - enable_monitoring|default(false)|bool + tags: [ 'monitoring', 'monitor_run' ] + +- name: Include monitor_collect tasks + include_tasks: monitor_collect.yml + when: + - enable_monitoring|default(false)|bool + tags: [ 'monitoring', 'monitor_collect' ] diff --git a/playbooks/roles/monitoring/tasks/monitor_collect.yml b/playbooks/roles/monitoring/tasks/monitor_collect.yml new file mode 100644 index 000000000000..e01e7c347897 --- /dev/null +++ b/playbooks/roles/monitoring/tasks/monitor_collect.yml @@ -0,0 +1,205 @@ +--- +# Tasks to stop monitoring services and collect data after test execution + +- name: Check if folio migration monitoring was started + become: yes + become_method: sudo + stat: + path: /root/monitoring/folio_migration.pid + register: folio_migration_pid_file + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Stop folio migration monitoring + become: yes + become_method: sudo + shell: | + if [ -f /root/monitoring/folio_migration.pid ]; then + 
pid=$(cat /root/monitoring/folio_migration.pid) + if ps -p $pid > /dev/null 2>&1; then + kill $pid + echo "Stopped monitoring process $pid" + else + echo "Monitoring process $pid was not running" + fi + rm -f /root/monitoring/folio_migration.pid + fi + register: stop_monitor + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_pid_file.stat.exists|default(false) + +- name: Display stop monitoring status + debug: + msg: "{{ stop_monitor.stdout }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - stop_monitor is defined + - stop_monitor.changed|default(false) + +- name: Check if monitoring data was collected + become: yes + become_method: sudo + stat: + path: /root/monitoring/folio_migration_stats.txt + register: folio_migration_data_file + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Copy plot_migration_stats.py to target + become: yes + become_method: sudo + copy: + src: "{{ playbook_dir }}/roles/monitoring/files/plot_migration_stats.py" + dest: /root/monitoring/plot_migration_stats.py + mode: '0755' + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + +- name: Check if matplotlib is available for plotting + become: yes + become_method: sudo + command: python3 -c "import matplotlib.pyplot" + register: matplotlib_check + ignore_errors: yes + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + +- name: Generate folio migration plots + become: yes + become_method: sudo + command: | + python3 /root/monitoring/plot_migration_stats.py + /root/monitoring/folio_migration_stats.txt + /root/monitoring/folio_migration_plot.png + args: + chdir: 
/root/monitoring + register: plot_generation + ignore_errors: yes + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + - matplotlib_check.rc == 0 + +- name: Log plot generation skip if matplotlib not available + debug: + msg: "Skipping plot generation - matplotlib not available on target system" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + - matplotlib_check.rc != 0 + +- name: Debug monitoring collection start + debug: + msg: | + Starting monitoring collection + monitor_developmental_stats: {{ monitor_developmental_stats|default(false) }} + monitor_folio_migration: {{ monitor_folio_migration|default(false) }} + enable_monitoring: {{ enable_monitoring|default(false) }} + kdevops_run_fstests: {{ kdevops_run_fstests|default(false) }} + +- name: Set monitoring results path + set_fact: + monitoring_results_path: "{{ monitoring_results_base_path | default(topdir_path + '/workflows/fstests/results/monitoring') }}" + +- name: Create local monitoring results directory + local_action: file path="{{ monitoring_results_path }}" state=directory + run_once: true + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Copy folio migration stats data to localhost + become: yes + become_method: sudo + fetch: + src: /root/monitoring/folio_migration_stats.txt + dest: "{{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_stats.txt" + flat: yes + validate_checksum: False + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + +- name: Check if plot was generated + become: yes + become_method: sudo + stat: + path: /root/monitoring/folio_migration_plot.png + register: 
folio_migration_plot_file + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Copy folio migration plot to localhost + become: yes + become_method: sudo + fetch: + src: /root/monitoring/folio_migration_plot.png + dest: "{{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_plot.png" + flat: yes + validate_checksum: False + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_plot_file.stat.exists|default(false) + +- name: Display monitoring data collection summary + debug: + msg: | + Folio migration monitoring collection complete. + Data saved to: {{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_stats.txt + {% if folio_migration_plot_file.stat.exists|default(false) %} + Plot saved to: {{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_plot.png + {% endif %} + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) + +# Generate plots on localhost after collecting data +- name: Check if matplotlib is available on localhost + local_action: command python3 -c "import matplotlib.pyplot" + register: localhost_matplotlib_check + ignore_errors: yes + run_once: true + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Generate folio migration plots on localhost + local_action: | + command python3 {{ playbook_dir }}/roles/monitoring/files/plot_migration_stats.py + -o {{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_plot.png + {{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_stats.txt + register: localhost_plot_generation + ignore_errors: yes + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - 
folio_migration_data_file.stat.exists|default(false) + - localhost_matplotlib_check.rc == 0 + +- name: Log localhost plot generation status + debug: + msg: | + {% if localhost_matplotlib_check.rc != 0 %} + Skipping plot generation - matplotlib not available on localhost + {% else %} + Plot generated: {{ monitoring_results_path }}/{{ ansible_hostname }}_folio_migration_plot.png + {% endif %} + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_data_file.stat.exists|default(false) diff --git a/playbooks/roles/monitoring/tasks/monitor_run.yml b/playbooks/roles/monitoring/tasks/monitor_run.yml new file mode 100644 index 000000000000..068ba67f53ce --- /dev/null +++ b/playbooks/roles/monitoring/tasks/monitor_run.yml @@ -0,0 +1,83 @@ +--- +# Tasks to start monitoring services before test execution + +- name: Check if folio migration stats are available + become: yes + become_method: sudo + stat: + path: /sys/kernel/debug/mm/migrate/stats + register: folio_migration_stats_file + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + +- name: Create monitoring directory + become: yes + become_method: sudo + file: + path: /root/monitoring + state: directory + mode: '0755' + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_stats_file.stat.exists|default(false) + +- name: Start folio migration monitoring in background + become: yes + become_method: sudo + shell: | + nohup bash -c 'while true; do + echo "$(date +"%Y-%m-%d %H:%M:%S")" >> /root/monitoring/folio_migration_stats.txt + cat /sys/kernel/debug/mm/migrate/stats >> /root/monitoring/folio_migration_stats.txt + echo "" >> /root/monitoring/folio_migration_stats.txt + sleep {{ monitor_folio_migration_interval|default(60) }} + done' > /root/monitoring/folio_migration.log 2>&1 & + echo $! 
> /root/monitoring/folio_migration.pid + async: 86400 # Run for up to 24 hours + poll: 0 + register: folio_migration_monitor + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_stats_file.stat.exists|default(false) + +- name: Save async job ID for later termination + set_fact: + folio_migration_monitor_job: "{{ folio_migration_monitor.ansible_job_id }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_stats_file.stat.exists|default(false) + - folio_migration_monitor is defined + +- name: Verify monitoring started successfully + become: yes + become_method: sudo + shell: | + if [ -f /root/monitoring/folio_migration.pid ]; then + pid=$(cat /root/monitoring/folio_migration.pid) + if ps -p $pid > /dev/null 2>&1; then + echo "Monitoring process $pid is running" + else + echo "ERROR: Monitoring process $pid is not running" >&2 + exit 1 + fi + else + echo "ERROR: PID file not found" >&2 + exit 1 + fi + register: monitor_status + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_stats_file.stat.exists|default(false) + +- name: Display monitoring status + debug: + msg: "{{ monitor_status.stdout }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_folio_migration|default(false)|bool + - folio_migration_stats_file.stat.exists|default(false) + - monitor_status is defined -- 2.45.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
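For anyone post-processing the collected files by hand, here is a minimal Python sketch, not part of the patch, of how the output of the collection loop in monitor_run.yml could be parsed. It relies on the `date +"%Y-%m-%d %H:%M:%S"` timestamp that the loop writes before each sample. The field layout of the debugfs file is an assumption: the sketch accepts both `name: value` and `name value` lines, since the out-of-tree stats format may differ. The helper names `parse_samples` and `per_interval` are hypothetical.

```python
import re
from datetime import datetime

# Timestamp layout written by the collection loop: date +"%Y-%m-%d %H:%M:%S"
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")
# Assumed field layout: "name: 123" or "name 123" (the real debugfs format may differ)
FIELD_RE = re.compile(r"^\s*(\w+):?\s+(\d+)\s*$")


def parse_samples(text):
    """Return a list of (datetime, {field: value}) tuples, one per sample."""
    samples = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if TIMESTAMP_RE.match(line):
            # A timestamp line opens a new sample
            current = (datetime.strptime(line, "%Y-%m-%d %H:%M:%S"), {})
            samples.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[1][field.group(1)] = int(field.group(2))
    return samples


def per_interval(samples, key):
    """Convert a cumulative counter into deltas between consecutive samples."""
    values = [fields.get(key, 0) for _, fields in samples]
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    example = """\
2024-01-15 10:30:00
success: 12345
fail: 67
total: 12412

2024-01-15 10:31:00
success: 12456
fail: 68
total: 12524
"""
    samples = parse_samples(example)
    print(len(samples), per_interval(samples, "success"))  # prints: 2 [111]
```

Unlike cumulative_to_interval() in plot_migration_stats.py, which keeps the first cumulative value as the first interval, this sketch reports only the differences between consecutive samples.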
* Re: [PATCH 8/8] monitoring: integrate monitoring collection into fstests workflow
2025-08-12 0:49 ` Luis Chamberlain
@ 2025-08-14 0:59 ` Luis Chamberlain
0 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-08-14 0:59 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, kdevops, Kundan Kumar, Anuj Gupta

On Mon, Aug 11, 2025 at 05:49:56PM -0700, Luis Chamberlain wrote:
> On Mon, Aug 11, 2025 at 03:46:29PM -0700, Luis Chamberlain wrote:
> > On Mon, Aug 11, 2025 at 03:43:07PM -0700, Luis Chamberlain wrote:
> > > Fix monitoring data collection to work properly during fstests runs by
> > > changing from include_role to import_tasks. The include_role directive
> > > with tasks_from parameter wasn't executing properly due to tag filtering
> > > during playbook execution, preventing monitoring data from being collected.
> > >
> > > Using import_tasks ensures the monitoring tasks are statically included
> > > at parse time and properly executed when the appropriate tags are present.
> > > This allows monitoring to run on all hosts and collect folio migration
> > > statistics during test execution.
> > >
> > > The monitoring documentation has also been updated to reflect that it's
> > > now a shared service available to all workflows.
> > >
> > > Generated-by: Claude AI
> > > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> >
> > Bah, this patch can be ignored for now, I forgot to git am a few things.
>
> OK here's a v2 of that patch follows and a branch with *all* my pending
> changes:
>
> https://github.com/linux-kdevops/kdevops/tree/20250811-monitor-v2

I've pushed a better version of this, along with some fstests watchdog
enhancements.

Luis
^ permalink raw reply [flat|nested] 12+ messages in thread