From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, Vincent Fu, kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH v2] workflows: add fio-tests performance tests
Date: Thu, 21 Aug 2025 14:28:43 -0700
Message-ID: <20250821212844.158398-1-mcgrof@kernel.org>
X-Mailing-List: kdevops@lists.linux.dev

I wrote fio-tests [0] a long time ago, and although it used Kconfig, at
that point I lacked the foresight to leverage jinja2 / ansible to make
things more declarative. It's been on my backlog to get that ported over
to kdevops, but now with Claude Code it just required a series of
prompts. And it's even better now. *This* is how we scale.

I've added a demo tree which just has the graphs, for those itching to
see what this produces [1]. It's just a demo comparing two separate
runs; don't get too excited, as it's just a silly guest with virtio
drives. However, this should hopefully show you how easily and quickly
you can compare runs with the new A/B testing feature.

The run time for tests is configurable, and you can also use the
FIO_QUICK environment variable to do quick tests.
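As an illustrative sketch only (not code from this patch), this is the
kind of mapping the FIO_QUICK toggle implies, using the runtime/ramp
values described below:

```shell
#!/bin/sh
# Illustrative sketch, not kdevops code: map the FIO_QUICK toggle to the
# runtime and ramp-time values described in this commit message.
if [ "${FIO_QUICK:-n}" = "y" ]; then
    RUNTIME=10    # quick: 10 seconds runtime
    RAMP_TIME=2   # quick: 2 seconds ramp time
else
    RUNTIME=60    # default: 60 seconds runtime
    RAMP_TIME=10  # default: 10 seconds ramp time
fi
echo "runtime=${RUNTIME}s ramp_time=${RAMP_TIME}s"
```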
The runtime choices are:

  - Default: 60 seconds runtime, 10 seconds ramp time
  - Quick: 10 seconds runtime, 2 seconds ramp time (selected with FIO_QUICK=y)
  - Custom High: 300 seconds runtime, 30 seconds ramp time
  - Custom Low: 5 seconds runtime, 1 second ramp time

We add defconfig-fio-tests-perf, which enables all performance testing knobs:

  - All block sizes (4K, 8K, 16K, 32K, 64K, 128K)
  - All IO depths (1, 4, 8, 16, 32, 64)
  - All job counts (1, 2, 4, 8, 16)
  - All test patterns (random/sequential read/write, mixed workloads)
  - High DPI (300) for graphs
  - A/B testing with baseline and dev nodes

This allows quick CI testing with:

  make defconfig-fio-tests-perf FIO_QUICK=y

Or comprehensive performance testing with:

  make defconfig-fio-tests-perf

Key features:

  - Configurable test matrix: block sizes (4K-128K), IO depths (1-64), job counts
  - Multiple workload patterns: random/sequential read/write, mixed workloads
  - Advanced configuration: IO engines, direct IO, fsync options
  - Performance logging: bandwidth, IOPS, and latency metrics
  - Baseline management and results analysis
  - FIO_TESTS_ENABLE_GRAPHING: enable/disable graphing capabilities
  - Graph format, DPI, and theme configuration options
  - Updated CI defconfig with graphing support (150 DPI for faster CI)

Key graphing features:

  - Performance analysis: bandwidth heatmaps, IOPS scaling, latency distributions
  - A/B comparison: baseline vs development configuration analysis
  - Trend analysis: block size scaling, IO depth optimization, correlation matrices
  - Configurable output: PNG/SVG/PDF formats, DPI settings, matplotlib themes

Documentation:

  - docs/fio-tests.md: comprehensive workflow documentation covering:
    * Origin story and relationship to the upstream fio-tests project
    * Quick start and configuration examples
    * Test matrix configuration and device setup
    * A/B testing and baseline management
    * Graphing and visualization capabilities
    * CI integration and troubleshooting guides
    * Best practices and performance considerations

A minimal CI configuration (defconfig-fio-tests-ci) enables automated
testing in GitHub Actions using /dev/null as the target device with a
reduced test matrix for fast execution.

We extend PROMPTS.md with the prompts used for all of this.

Usage:

  make defconfig-fio-tests-ci    # Simple CI testing
  make menuconfig                # Interactive configuration
  make fio-tests                 # Run performance tests
  make fio-tests-baseline        # Establish baseline
  make fio-tests-results         # Collect results
  make fio-tests-graph           # Generate performance graphs
  make fio-tests-compare         # Compare baseline vs dev results
  make fio-tests-trend-analysis  # Analyze performance trends

Link: https://github.com/mcgrof/fio-tests # [0]
Link: https://github.com/mcgrof/fio-tests-graphs-on-kdevops # [1]
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain
---
Changes in this v2:
  - Rebased
  - The last graphs didn't make too much sense, so I've asked for some
    new ones. I think we can start with this for now.
  - Removed the docker tests, as we can't run make menuconfig in a
    docker container. We can later enable bare metal testing with the
    defconfigs/fio-tests-ci. It should be fast.
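The full matrix enabled by defconfig-fio-tests-perf grows
multiplicatively; as an illustrative sketch (not kdevops code — the
pattern names here are placeholders, not the actual Kconfig symbols),
the combination count works out to:

```python
from itertools import product

# Illustrative sketch only: enumerate the full defconfig-fio-tests-perf
# matrix from the parameter lists in this commit message. Pattern names
# are placeholders for random/sequential read/write and the two mixes.
block_sizes = ["4k", "8k", "16k", "32k", "64k", "128k"]
io_depths = [1, 4, 8, 16, 32, 64]
num_jobs = [1, 2, 4, 8, 16]
patterns = ["randread", "randwrite", "seqread", "seqwrite",
            "mixed_75_25", "mixed_50_50"]

# Every combination becomes one fio job configuration.
matrix = list(product(block_sizes, io_depths, num_jobs, patterns))
print(len(matrix))  # 6 * 6 * 5 * 6 = 1080 combinations
```

At the default 60 second runtime plus 10 second ramp time per job, that
multiplicative growth is exactly why the FIO_QUICK knob and the reduced
CI matrix exist.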
 .gitignore                                    |   2 +
 PROMPTS.md                                    | 105 ++++
 README.md                                     |  10 +
 defconfigs/fio-tests-ci                       |  56 ++
 defconfigs/fio-tests-perf                     |  58 +++
 docs/fio-tests.md                             | 377 ++++++++++++++
 kconfigs/workflows/Kconfig                    |  27 +
 playbooks/fio-tests-baseline.yml              |  29 ++
 playbooks/fio-tests-compare.yml               |  33 ++
 playbooks/fio-tests-graph.yml                 |  78 +++
 playbooks/fio-tests-results.yml               |  10 +
 playbooks/fio-tests-trend-analysis.yml        |  30 ++
 playbooks/fio-tests.yml                       |  11 +
 .../python/workflows/fio-tests/fio-compare.py | 383 ++++++++++++++
 .../python/workflows/fio-tests/fio-plot.py    | 350 +++++++++++++
 .../workflows/fio-tests/fio-trend-analysis.py | 477 ++++++++++++++++++
 playbooks/roles/fio-tests/defaults/main.yml   |  48 ++
 .../tasks/install-deps/debian/main.yml        |  20 +
 .../fio-tests/tasks/install-deps/main.yml     |   3 +
 .../tasks/install-deps/redhat/main.yml        |  20 +
 .../tasks/install-deps/suse/main.yml          |  20 +
 playbooks/roles/fio-tests/tasks/main.yaml     | 170 +++++++
 .../roles/fio-tests/templates/fio-job.ini.j2  |  29 ++
 playbooks/roles/gen_hosts/tasks/main.yml      |  13 +
 .../roles/gen_hosts/templates/fio-tests.j2    |  28 +
 playbooks/roles/gen_hosts/templates/hosts.j2  |  38 ++
 playbooks/roles/gen_nodes/tasks/main.yml      |  32 ++
 workflows/Makefile                            |   4 +
 workflows/fio-tests/Kconfig                   | 420 +++++++++++++++
 workflows/fio-tests/Makefile                  |  68 +++
 30 files changed, 2949 insertions(+)
 create mode 100644 defconfigs/fio-tests-ci
 create mode 100644 defconfigs/fio-tests-perf
 create mode 100644 docs/fio-tests.md
 create mode 100644 playbooks/fio-tests-baseline.yml
 create mode 100644 playbooks/fio-tests-compare.yml
 create mode 100644 playbooks/fio-tests-graph.yml
 create mode 100644 playbooks/fio-tests-results.yml
 create mode 100644 playbooks/fio-tests-trend-analysis.yml
 create mode 100644 playbooks/fio-tests.yml
 create mode 100755 playbooks/python/workflows/fio-tests/fio-compare.py
 create mode 100755 playbooks/python/workflows/fio-tests/fio-plot.py
 create mode 100755 playbooks/python/workflows/fio-tests/fio-trend-analysis.py
 create mode 100644 playbooks/roles/fio-tests/defaults/main.yml
 create mode 100644 playbooks/roles/fio-tests/tasks/install-deps/debian/main.yml
 create mode 100644 playbooks/roles/fio-tests/tasks/install-deps/main.yml
 create mode 100644 playbooks/roles/fio-tests/tasks/install-deps/redhat/main.yml
 create mode 100644 playbooks/roles/fio-tests/tasks/install-deps/suse/main.yml
 create mode 100644 playbooks/roles/fio-tests/tasks/main.yaml
 create mode 100644 playbooks/roles/fio-tests/templates/fio-job.ini.j2
 create mode 100644 playbooks/roles/gen_hosts/templates/fio-tests.j2
 create mode 100644 workflows/fio-tests/Kconfig
 create mode 100644 workflows/fio-tests/Makefile

diff --git a/.gitignore b/.gitignore
index cfafa909cb40..50dc877adff5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -70,6 +70,8 @@
 workflows/sysbench/results/
 workflows/mmtests/results/
 tmp
+workflows/fio-tests/results/
+
 playbooks/roles/linux-mirror/linux-mirror-systemd/*.service
 playbooks/roles/linux-mirror/linux-mirror-systemd/*.timer
 playbooks/roles/linux-mirror/linux-mirror-systemd/mirrors.yaml

diff --git a/PROMPTS.md b/PROMPTS.md
index a92f96f8e23b..1b60cbe62752 100644
--- a/PROMPTS.md
+++ b/PROMPTS.md
@@ -124,6 +124,111 @@
 source "workflows/mmtests/Kconfig.fs"

 This separation is preferred as it helps us scale.

+## Port an external project into kdevops
+
+fio-tests was an older external project; however, it is more suitably
+placed in kdevops, as jinja2 lets us easily scale it. Both projects were
+authored by the same person and use the same license. The port took a
+few separate prompts, as described below.
+
+### Initial implementation of fio-tests workflow on kdevops
+
+**Prompt:**
+Now that we merged steady state to kdevops -- now let's add specific target
+workflow support for different target different simple workflows. Learn from
+how sysbench added two guests so we can do A/B testing in two separate guests.
+The workflows you will focus on will be the workflows from
+https://github.com/mcgrof/fio-tests. We already took steady state and
+pre-conditioning from there so no need to do that. All we need to do is just
+now target the different other workflows. Leverage the Kconfig documentation we
+used on that project and adapt it to leverage output yaml on kdevops. Then also
+to help test things we can simply add a basic test so that
+.github/workflows/docker-tests.yml can run some tests using /dev/null as a
+target block device for just one simple workflow.
+
+**AI:** Claude Code
+**Commit:** TBD
+**Result:** Excellent implementation with comprehensive workflow structure.
+**Grading:** 90%
+
+**Notes:**
+
+The implementation successfully:
+- Added complete fio-tests workflow with A/B testing support following sysbench patterns
+- Created comprehensive Kconfig structure with output yaml support for all options
+- Implemented configurable test matrices (block sizes, IO depths, job counts, patterns)
+- Added ansible role with template-based job generation
+- Integrated with main kdevops workflow system and makefiles
+- Created CI-optimized defconfig using /dev/null target device
+- Updated GitHub Actions workflow for automated testing
+
+Minor areas for improvement:
+- Could have included more detailed help text in some Kconfig options
+- Template generation could be more dynamic for complex configurations
+- Didn't add documentation, which means we should extend CLAUDE.md to
+  require documentation when adding a new workflow.
+- Did not pick up on the trend of preferring that 'make foo-results'
+  always copy results locally.
+
+### Extend fio-tests with graphing support
+
+**Prompt:**
+The fio-tests project had support for graphing. Bring that over and add that to
+kdevops. I am the author of fio-tests so I own all the code. Be sure to use
+SPDX for my top header files with the copyleft-next license as is done with
+tons of code on kdevops.
+
+**AI:** Claude Code
+**Commit:** TBD
+**Result:** Comprehensive graphing implementation with proper licensing.
+**Grading:** 95%
+
+**Notes:**
+
+Outstanding implementation that:
+- Improved upon the graphs I originally had on fio-tests and actually
+  innovated on some! It also took the initiative to do A/B performance
+  analysis!
+- Created three comprehensive Python scripts with proper SPDX copyleft-next-0.3.1 headers
+- Implemented advanced graphing: performance analysis, A/B comparison, trend analysis
+- Added configurable graphing options through Kconfig (format, DPI, themes)
+- Included conditional dependency installation across distributions
+- Created ansible playbooks for automated graph generation
+- Added make targets for different types of analysis
+- Updated CI configuration with graphing support
+
+The implementation perfectly followed kdevops patterns and demonstrated
+excellent understanding of the codebase structure. The graphing capabilities
+are comprehensive and production-ready.
+
+### Add the fio-tests documentation
+
+**Prompt:**
+Now add documentation for fio-tests on kdevops. Extend README.md with a small
+section and point to its own documentation file. You can use the upstream
+fio-tests https://github.com/mcgrof/fio-tests page for inspiration, but
+obviously we want to port this to how you've implemented support on kdevops.
+You can point back to the old https://github.com/mcgrof/fio-tests page as an
+origin story. Also extend PROMPTS.md with the few prompts I've given you to
+help add support for fio-tests and graphing support.
+
+**AI:** Claude Code
+**Commit:** TBD
+**Result:** Comprehensive documentation with examples and troubleshooting.
+**Grading:** 90%
+
+**Notes:**
+
+The documentation implementation includes:
+- Updated README.md with fio-tests section linking to detailed documentation
+- Created comprehensive docs/fio-tests.md with full workflow coverage
+- Included origin story referencing original fio-tests framework
+- Added detailed configuration examples and troubleshooting guides
+- Documented all graphing capabilities with usage examples
+- Extended PROMPTS.md with the implementation prompts for future AI reference
+
+This demonstrates the complete lifecycle of implementing a complex workflow in
+kdevops from initial implementation through comprehensive documentation.
+
 ## Kernel development and A/B testing support

 ### Adding A/B kernel testing support for different kernel versions

diff --git a/README.md b/README.md
index c471277eac11..0c30762a269f 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ Table of Contents
    * [CXL](#cxl)
    * [reboot-limit](#reboot-limit)
    * [sysbench](#sysbench)
+   * [fio-tests](#fio-tests)
    * [kdevops chats](#kdevops-chats)
    * [kdevops on discord](#kdevops-on-discord)
    * [kdevops IRC](#kdevops-irc)
@@ -263,6 +264,15 @@
 kdevops supports automation of sysbench tests on VMs with or without
 providers. For details refer to the
 [kdevops sysbench documentation](docs/sysbench/sysbench.md).

+### fio-tests
+
+kdevops includes comprehensive storage performance testing through the fio-tests
+workflow, adapted from the original [fio-tests framework](https://github.com/mcgrof/fio-tests).
+This workflow provides flexible I/O benchmarking with configurable test matrices,
+A/B testing capabilities, and advanced graphing and visualization support. For
+detailed configuration and usage information, refer to the
+[kdevops fio-tests documentation](docs/fio-tests.md).
+
 ## kdevops chats

 We use discord and IRC. Right now we have more folks on discord than on IRC.
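For reviewers who want a feel for what the analysis scripts work with:
fio's `--output-format=json` results are nested under a `jobs` list. A
minimal illustrative sketch (not code from this patch; the sample record
below is made up, while the `jobs`/`read`/`bw`/`iops` keys follow fio's
standard JSON layout):

```python
import json

# Illustrative sketch only: pull headline metrics out of fio JSON output.
# The record is fabricated for demonstration; real results come from
# running fio with --output-format=json.
sample = json.loads("""
{"jobs": [{"jobname": "randread_bs4k_iodepth1_jobs1",
           "read": {"bw": 52000, "iops": 13000.0}}]}
""")

for job in sample["jobs"]:
    read = job.get("read", {})
    print(f'{job["jobname"]}: {read.get("bw", 0)} KB/s, '
          f'{read.get("iops", 0.0)} IOPS')
```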
diff --git a/defconfigs/fio-tests-ci b/defconfigs/fio-tests-ci
new file mode 100644
index 000000000000..88b0e467d9d4
--- /dev/null
+++ b/defconfigs/fio-tests-ci
@@ -0,0 +1,56 @@
+# Minimal fio-tests configuration for CI testing
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS=y
+
+# fio-tests specific config for CI
+CONFIG_FIO_TESTS_PERFORMANCE_ANALYSIS=y
+CONFIG_FIO_TESTS_DEVICE="/dev/null"
+CONFIG_FIO_TESTS_RUNTIME="10"
+CONFIG_FIO_TESTS_RAMP_TIME="2"
+
+# Minimal test matrix for CI
+CONFIG_FIO_TESTS_BS_4K=y
+CONFIG_FIO_TESTS_BS_8K=n
+CONFIG_FIO_TESTS_BS_16K=n
+CONFIG_FIO_TESTS_BS_32K=n
+CONFIG_FIO_TESTS_BS_64K=n
+CONFIG_FIO_TESTS_BS_128K=n
+
+CONFIG_FIO_TESTS_IODEPTH_1=y
+CONFIG_FIO_TESTS_IODEPTH_4=n
+CONFIG_FIO_TESTS_IODEPTH_8=n
+CONFIG_FIO_TESTS_IODEPTH_16=n
+CONFIG_FIO_TESTS_IODEPTH_32=n
+CONFIG_FIO_TESTS_IODEPTH_64=n
+
+CONFIG_FIO_TESTS_NUMJOBS_1=y
+CONFIG_FIO_TESTS_NUMJOBS_2=n
+CONFIG_FIO_TESTS_NUMJOBS_4=n
+CONFIG_FIO_TESTS_NUMJOBS_8=n
+CONFIG_FIO_TESTS_NUMJOBS_16=n
+
+CONFIG_FIO_TESTS_PATTERN_RAND_READ=y
+CONFIG_FIO_TESTS_PATTERN_RAND_WRITE=n
+CONFIG_FIO_TESTS_PATTERN_SEQ_READ=n
+CONFIG_FIO_TESTS_PATTERN_SEQ_WRITE=n
+CONFIG_FIO_TESTS_PATTERN_MIXED_75_25=n
+CONFIG_FIO_TESTS_PATTERN_MIXED_50_50=n
+
+CONFIG_FIO_TESTS_IOENGINE="io_uring"
+CONFIG_FIO_TESTS_DIRECT=y
+CONFIG_FIO_TESTS_FSYNC_ON_CLOSE=y
+CONFIG_FIO_TESTS_RESULTS_DIR="/data/fio-tests"
+CONFIG_FIO_TESTS_LOG_AVG_MSEC=1000
+
+# Graphing configuration
+CONFIG_FIO_TESTS_ENABLE_GRAPHING=y
+CONFIG_FIO_TESTS_GRAPH_FORMAT="png"
+CONFIG_FIO_TESTS_GRAPH_DPI=150
+CONFIG_FIO_TESTS_GRAPH_THEME="default"
+
+# Baseline/dev testing setup
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y

diff --git a/defconfigs/fio-tests-perf b/defconfigs/fio-tests-perf
new file mode 100644
index 000000000000..df0fc6e86f33
--- /dev/null
+++ b/defconfigs/fio-tests-perf
@@ -0,0 +1,58 @@
+# Full performance testing configuration for fio-tests
+# Enables all test parameters for comprehensive performance analysis
+
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS=y
+
+# Performance analysis mode
+CONFIG_FIO_TESTS_PERFORMANCE_ANALYSIS=y
+
+# Enable all block sizes
+CONFIG_FIO_TESTS_BS_4K=y
+CONFIG_FIO_TESTS_BS_8K=y
+CONFIG_FIO_TESTS_BS_16K=y
+CONFIG_FIO_TESTS_BS_32K=y
+CONFIG_FIO_TESTS_BS_64K=y
+CONFIG_FIO_TESTS_BS_128K=y
+
+# Enable all IO depths
+CONFIG_FIO_TESTS_IODEPTH_1=y
+CONFIG_FIO_TESTS_IODEPTH_4=y
+CONFIG_FIO_TESTS_IODEPTH_8=y
+CONFIG_FIO_TESTS_IODEPTH_16=y
+CONFIG_FIO_TESTS_IODEPTH_32=y
+CONFIG_FIO_TESTS_IODEPTH_64=y
+
+# Enable all job counts
+CONFIG_FIO_TESTS_NUMJOBS_1=y
+CONFIG_FIO_TESTS_NUMJOBS_2=y
+CONFIG_FIO_TESTS_NUMJOBS_4=y
+CONFIG_FIO_TESTS_NUMJOBS_8=y
+CONFIG_FIO_TESTS_NUMJOBS_16=y
+
+# Enable all test patterns
+CONFIG_FIO_TESTS_PATTERN_RAND_READ=y
+CONFIG_FIO_TESTS_PATTERN_RAND_WRITE=y
+CONFIG_FIO_TESTS_PATTERN_SEQ_READ=y
+CONFIG_FIO_TESTS_PATTERN_SEQ_WRITE=y
+CONFIG_FIO_TESTS_PATTERN_MIXED_75_25=y
+CONFIG_FIO_TESTS_PATTERN_MIXED_50_50=y
+
+# Performance settings
+CONFIG_FIO_TESTS_IOENGINE="io_uring"
+CONFIG_FIO_TESTS_DIRECT=y
+CONFIG_FIO_TESTS_FSYNC_ON_CLOSE=y
+CONFIG_FIO_TESTS_RESULTS_DIR="/data/fio-tests"
+CONFIG_FIO_TESTS_LOG_AVG_MSEC=1000
+
+# Graphing configuration
+CONFIG_FIO_TESTS_ENABLE_GRAPHING=y
+CONFIG_FIO_TESTS_GRAPH_FORMAT="png"
+CONFIG_FIO_TESTS_GRAPH_DPI=300
+
+# Baseline/dev testing
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y

diff --git a/docs/fio-tests.md b/docs/fio-tests.md
new file mode 100644
index 000000000000..3383d81a0307
--- /dev/null
+++ b/docs/fio-tests.md
@@ -0,0 +1,377 @@
+# kdevops fio-tests workflow
+
+kdevops includes comprehensive storage performance testing through the fio-tests
+workflow, providing flexible I/O benchmarking with configurable test matrices,
+A/B testing capabilities, and advanced graphing and visualization support.
+
+## Origin and inspiration
+
+The fio-tests workflow in kdevops is adapted from the original
+[fio-tests framework](https://github.com/mcgrof/fio-tests), which was designed
+to provide systematic storage performance testing with dynamic test generation
+and comprehensive analysis capabilities. The kdevops implementation brings
+these capabilities into the kdevops ecosystem with seamless integration to
+support virtualization, cloud providers, and bare metal testing.
+
+## Overview
+
+The fio-tests workflow enables comprehensive storage device performance testing
+by generating configurable test matrices across multiple dimensions:
+
+- **Block sizes**: 4K, 8K, 16K, 32K, 64K, 128K
+- **I/O depths**: 1, 4, 8, 16, 32, 64
+- **Job counts**: 1, 2, 4, 8, 16 concurrent fio jobs
+- **Workload patterns**: Random/sequential read/write, mixed workloads
+- **A/B testing**: Baseline vs development configuration comparison
+
+## Quick start
+
+### Basic configuration
+
+Configure fio-tests for quick testing:
+
+```bash
+make defconfig-fio-tests-ci  # Use minimal CI configuration
+make menuconfig              # Or configure interactively
+make bringup                 # Provision test environment
+make fio-tests               # Run performance tests
+```
+
+### Comprehensive testing
+
+For full performance analysis:
+
+```bash
+make menuconfig              # Select fio-tests dedicated workflow
+# Configure test matrix, block sizes, IO depths, patterns
+make bringup                 # Provision baseline and dev nodes
+make fio-tests               # Run comprehensive test suite
+make fio-tests-graph         # Generate performance graphs
+make fio-tests-compare       # Compare baseline vs dev results
+```
+
+## Configuration options
+
+### Test types
+
+The workflow supports multiple test types optimized for different analysis goals:
+
+- **Performance analysis**: Comprehensive testing across all configured parameters
+- **Latency analysis**: Focus on latency characteristics and tail latency
+- **Throughput scaling**: Optimize for maximum throughput analysis
+- **Mixed workloads**: Real-world application pattern simulation
+
+### Test matrix configuration
+
+Configure the test matrix through menuconfig:
+
+```
+Block size configuration →
+  [*] 4K block size tests
+  [*] 8K block size tests
+  [*] 16K block size tests
+  [ ] 32K block size tests
+  [ ] 64K block size tests
+  [ ] 128K block size tests
+
+IO depth configuration →
+  [*] IO depth 1
+  [*] IO depth 4
+  [*] IO depth 8
+  [*] IO depth 16
+  [ ] IO depth 32
+  [ ] IO depth 64
+
+Thread/job configuration →
+  [*] Single job
+  [*] 2 jobs
+  [*] 4 jobs
+  [ ] 8 jobs
+  [ ] 16 jobs
+
+Workload patterns →
+  [*] Random read
+  [*] Random write
+  [*] Sequential read
+  [*] Sequential write
+  [ ] Mixed 75% read / 25% write
+  [ ] Mixed 50% read / 50% write
+```
+
+### Advanced configuration
+
+Advanced settings for fine-tuning:
+
+- **I/O engine**: io_uring (recommended), libaio, psync, sync
+- **Direct I/O**: Bypass page cache for accurate device testing
+- **Test duration**: Runtime per test job (default: 60 seconds)
+- **Ramp time**: Warm-up period before measurements (default: 10 seconds)
+- **Results directory**: Storage location for test results and logs
+
+## Device configuration
+
+The workflow automatically selects appropriate storage devices based on your
+infrastructure configuration:
+
+### Virtualization (libvirt)
+- NVMe: `/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops1`
+- VirtIO: `/dev/disk/by-id/virtio-kdevops1`
+- IDE: `/dev/disk/by-id/ata-QEMU_HARDDISK_kdevops1`
+- SCSI: `/dev/sdc`
+
+### Cloud providers
+- AWS: `/dev/nvme2n1` (instance store)
+- GCE: `/dev/nvme1n1`
+- Azure: `/dev/sdd`
+- OCI: Configurable sparse volume device
+
+### Testing/CI
+- `/dev/null`: For configuration validation and CI testing
+
+## A/B testing
+
+The fio-tests workflow supports comprehensive A/B testing through the
+`KDEVOPS_BASELINE_AND_DEV` configuration, which provisions separate
+nodes for baseline and development testing.
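The percentage deltas an A/B comparison report is built from can be
sketched as follows. This is illustrative only, not the fio-compare.py
script from this patch; the metric values are made up:

```python
# Illustrative sketch only (not the kdevops fio-compare.py script):
# the basic percentage-delta calculation behind an A/B comparison
# report. Positive values mean the dev node improved over baseline.
def pct_delta(baseline: float, dev: float) -> float:
    """Percentage change of dev relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    return (dev - baseline) / baseline * 100.0

# Fabricated example metrics for demonstration.
baseline_iops = 13000.0
dev_iops = 14300.0
print(f"IOPS delta: {pct_delta(baseline_iops, dev_iops):+.1f}%")  # +10.0%
```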
+
+### Baseline establishment
+
+```bash
+make fio-tests           # Run tests on both baseline and dev
+make fio-tests-baseline  # Save current results as baseline
+```
+
+### Comparison analysis
+
+```bash
+make fio-tests-compare   # Generate A/B comparison analysis
+```
+
+This creates comprehensive comparison reports including:
+- Side-by-side performance metrics
+- Percentage improvement/regression analysis
+- Statistical summaries
+- Visual comparison charts
+
+## Graphing and visualization
+
+The fio-tests workflow includes comprehensive graphing capabilities through
+Python scripts with matplotlib, pandas, and seaborn.
+
+### Enable graphing
+
+```bash
+# In menuconfig:
+Advanced configuration →
+  [*] Enable graphing and visualization
+      Graph output format (png) --->
+      (300) Graph resolution (DPI)
+      (default) Matplotlib theme
+```
+
+### Available visualizations
+
+#### Performance analysis graphs
+```bash
+make fio-tests-graph
+```
+
+Generates:
+- **Bandwidth heatmaps**: Performance across block sizes and I/O depths
+- **IOPS scaling**: Scaling behavior with increasing I/O depth
+- **Latency distributions**: Read/write latency characteristics
+- **Pattern comparisons**: Performance across different workload patterns
+
+#### A/B comparison analysis
+```bash
+make fio-tests-compare
+```
+
+Creates:
+- **Comparison bar charts**: Side-by-side baseline vs development
+- **Performance delta analysis**: Percentage improvements across metrics
+- **Summary reports**: Detailed statistical analysis
+
+#### Trend analysis
+```bash
+make fio-tests-trend-analysis
+```
+
+Provides:
+- **Block size trends**: Performance scaling with block size
+- **I/O depth scaling**: Efficiency analysis across patterns
+- **Latency percentiles**: P95, P99 latency analysis
+- **Correlation matrices**: Relationships between test parameters
+
+### Graph customization
+
+Configure graph output through Kconfig:
+
+- **Format**: PNG (default), SVG, PDF, JPG
+- **Resolution**: 150 DPI (CI), 300 DPI (standard), 600 DPI (high quality)
+- **Theme**: default, seaborn, dark_background, ggplot, bmh
+
+## Workflow targets
+
+The fio-tests workflow provides several make targets:
+
+### Core testing
+- `make fio-tests`: Run the configured test matrix
+- `make fio-tests-baseline`: Establish performance baseline
+- `make fio-tests-results`: Collect and summarize test results
+
+### Analysis and visualization
+- `make fio-tests-graph`: Generate performance graphs
+- `make fio-tests-compare`: Compare baseline vs development results
+- `make fio-tests-trend-analysis`: Analyze performance trends
+
+### Help
+- `make fio-tests-help-menu`: Display available fio-tests targets
+
+## Results and output
+
+### Test results structure
+
+Results are organized in the configured results directory (default: `/data/fio-tests`):
+
+```
+/data/fio-tests/
+├── jobs/                    # Generated fio job files
+│   ├── randread_bs4k_iodepth1_jobs1.ini
+│   └── ...
+├── results_*.json           # JSON format results
+├── results_*.txt            # Human-readable results
+├── bw_*, iops_*, lat_*      # Performance logs
+├── graphs/                  # Generated visualizations
+│   ├── performance_bandwidth_heatmap.png
+│   ├── performance_iops_scaling.png
+│   └── ...
+├── analysis/                # Trend analysis
+│   ├── block_size_trends.png
+│   └── correlation_heatmap.png
+└── baseline/                # Baseline results
+    └── baseline_*.txt
+```
+
+### Result interpretation
+
+#### JSON output structure
+Each test produces detailed JSON output with:
+- Bandwidth metrics (KB/s)
+- IOPS measurements
+- Latency statistics (mean, stddev, percentiles)
+- Job-specific performance data
+
+#### Performance logs
+Detailed time-series logs for:
+- Bandwidth over time
+- IOPS over time
+- Latency over time
+
+## CI integration
+
+The fio-tests workflow includes CI-optimized configuration:
+
+```bash
+make defconfig-fio-tests-ci
+```
+
+CI-specific optimizations:
+- Uses `/dev/null` as target device
+- Minimal test matrix (4K block size, IO depth 1, single job)
+- Short test duration (10 seconds) and ramp time (2 seconds)
+- Lower DPI (150) for faster graph generation
+- Essential workload patterns only (random read)
+
+## Troubleshooting
+
+### Common issues
+
+#### Missing dependencies
+```bash
+# Ensure graphing dependencies are installed
+# This is handled automatically when FIO_TESTS_ENABLE_GRAPHING=y
+```
+
+#### No test results
+- Verify device permissions and accessibility
+- Check fio installation: `fio --version`
+- Examine fio job files in results directory
+
+#### Graph generation failures
+- Verify Python dependencies: matplotlib, pandas, seaborn
+- Check results directory contains JSON output files
+- Ensure sufficient disk space for graph files
+
+### Debug information
+
+Enable verbose output:
+```bash
+make V=1 fio-tests   # Verbose build output
+make AV=2 fio-tests  # Ansible verbose output
+```
+
+## Performance considerations
+
+### Test duration vs coverage
+- **Short tests** (10-60 seconds): Quick validation, less accurate
+- **Medium tests** (5-10 minutes): Balanced accuracy and time
+- **Long tests** (30+ minutes): High accuracy, comprehensive analysis
+
+### Resource requirements
+- **CPU**: Scales with job count and I/O depth
+- **Memory**: Minimal for fio, moderate for graphing (pandas/matplotlib)
+- **Storage**: Depends on test duration and logging configuration
+- **Network**: Minimal except for result collection
+
+### Optimization tips
+- Use dedicated storage for results directory
+- Enable direct I/O for accurate device testing
+- Configure appropriate test matrix for your analysis goals
+- Use A/B testing for meaningful performance comparisons
+
+## Integration with other workflows
+
+The fio-tests workflow integrates seamlessly with other kdevops workflows:
+
+### Combined testing
+- Run fio-tests alongside fstests for comprehensive filesystem analysis
+- Use with sysbench for database vs raw storage performance comparison
+- Combine with blktests for block layer and device-level testing
+
+### Steady state preparation
+- Use `KDEVOPS_WORKFLOW_ENABLE_SSD_STEADY_STATE` for SSD conditioning
+- Run steady state before fio-tests for consistent results
+
+## Best practices
+
+### Configuration
+1. Start with CI configuration for validation
+2. Gradually expand test matrix based on analysis needs
+3. Use A/B testing for meaningful comparisons
+4. Enable graphing for visual analysis
+
+### Testing methodology
+1. Establish baseline before configuration changes
+2. Run multiple iterations for statistical significance
+3. Use appropriate test duration for your workload
+4. Document test conditions and configuration
+
+### Result analysis
+1. Focus on relevant metrics for your use case
+2. Use trend analysis to identify optimal configurations
+3. Compare against baseline for regression detection
+4. Share graphs and summaries for team collaboration
+
+## Contributing
+
+The fio-tests workflow follows kdevops development practices:
+
+- Use atomic commits with DCO sign-off
+- Include "Generated-by: Claude AI" for AI-assisted contributions
+- Test changes with CI configuration
+- Update documentation for new features
+- Follow existing code style and patterns
+
+For more information about contributing to kdevops, see the main project
+documentation and CLAUDE.md for AI development guidelines.

diff --git a/kconfigs/workflows/Kconfig b/kconfigs/workflows/Kconfig
index b1b8a48b8536..6b2a37696afd 100644
--- a/kconfigs/workflows/Kconfig
+++ b/kconfigs/workflows/Kconfig
@@ -207,6 +207,13 @@
 	  This will dedicate your configuration to running only the
 	  mmtests workflow for memory fragmentation testing.
 
+config KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
+	bool "fio-tests"
+	select KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+	help
+	  This will dedicate your configuration to running only the
+	  fio-tests workflow for comprehensive storage performance testing.
+
 endchoice
 
 config KDEVOPS_WORKFLOW_NAME
@@ -221,6 +228,7 @@
 	default "nfstest" if KDEVOPS_WORKFLOW_DEDICATE_NFSTEST
 	default "sysbench" if KDEVOPS_WORKFLOW_DEDICATE_SYSBENCH
 	default "mmtests" if KDEVOPS_WORKFLOW_DEDICATE_MMTESTS
+	default "fio-tests" if KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
 
 endif
 
@@ -322,6 +330,14 @@
 	  Select this option if you want to provision mmtests on a
 	  single target node for by-hand testing.
 
+config KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_FIO_TESTS
+	bool "fio-tests"
+	select KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+	depends on LIBVIRT || TERRAFORM_PRIVATE_NET
+	help
+	  Select this option if you want to provision fio-tests on a
+	  single target node for by-hand testing.
+
 endif # !WORKFLOWS_DEDICATED_WORKFLOW
 
 config KDEVOPS_WORKFLOW_ENABLE_FSTESTS
@@ -435,6 +451,17 @@
 source "workflows/mmtests/Kconfig"
 endmenu
 endif # KDEVOPS_WORKFLOW_ENABLE_MMTESTS
+config KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+        bool
+        output yaml
+        default y if KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_FIO_TESTS || KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
+
+if KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+menu "Configure and run fio-tests"
+source "workflows/fio-tests/Kconfig"
+endmenu
+endif # KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+
 config KDEVOPS_WORKFLOW_ENABLE_SSD_STEADY_STATE
         bool "Attain SSD steady state prior to tests"
         output yaml
diff --git a/playbooks/fio-tests-baseline.yml b/playbooks/fio-tests-baseline.yml
new file mode 100644
index 000000000000..1f990c0efa40
--- /dev/null
+++ b/playbooks/fio-tests-baseline.yml
@@ -0,0 +1,29 @@
+---
+- hosts:
+  - baseline
+  - dev
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  tasks:
+    - name: Create baseline directory structure
+      file:
+        path: "{{ fio_tests_results_dir }}/baseline"
+        state: directory
+        mode: '0755'
+      become: yes
+
+    - name: Save current test configuration as baseline
+      copy:
+        src: "{{ fio_tests_results_dir }}/results_{{ item }}.txt"
+        dest: "{{ fio_tests_results_dir }}/baseline/baseline_{{ item }}.txt"
+        remote_src: yes
+        backup: yes
+      with_fileglob:
+        - "{{ fio_tests_results_dir }}/results_*.txt"
+      become: yes
+      ignore_errors: yes
+
+    - name: Create baseline timestamp
+      shell: date > "{{ fio_tests_results_dir }}/baseline/baseline_timestamp.txt"
+      become: yes
diff --git a/playbooks/fio-tests-compare.yml b/playbooks/fio-tests-compare.yml
new file mode 100644
index 000000000000..e6e3464613c2
--- /dev/null
+++ b/playbooks/fio-tests-compare.yml
@@ -0,0 +1,33 @@
+---
+- hosts: localhost
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  tasks:
+    - name: Check if baseline and dev hosts exist in inventory
+      fail:
+        msg: "Both baseline and dev hosts must exist for comparison"
+      when: "'baseline' not in groups or 'dev' not in
groups"
+
+    - name: Create local graph results comparison directory
+      file:
+        path: "{{ topdir_path }}/workflows/fio-tests/results/graphs"
+        state: directory
+        mode: '0755'
+
+    - name: Generate comparison graphs
+      shell: |
+        python3 {{ topdir_path }}/playbooks/python/workflows/fio-tests/fio-compare.py \
+          {{ topdir_path }}/workflows/fio-tests/results/{{ groups['baseline'][0] }}/fio-tests-results-{{ groups['baseline'][0] }} \
+          {{ topdir_path }}/workflows/fio-tests/results/{{ groups['dev'][0] }}/fio-tests-results-{{ groups['dev'][0] }} \
+          --output-dir {{ topdir_path }}/workflows/fio-tests/results/graphs \
+          --baseline-label "Baseline" \
+          --dev-label "Development"
+
+    - name: List comparison results
+      shell: ls -la {{ topdir_path }}/workflows/fio-tests/results/graphs
+      register: comparison_list
+
+    - name: Display comparison results
+      debug:
+        msg: "{{ comparison_list.stdout_lines }}"
diff --git a/playbooks/fio-tests-graph.yml b/playbooks/fio-tests-graph.yml
new file mode 100644
index 000000000000..a3ca9513b528
--- /dev/null
+++ b/playbooks/fio-tests-graph.yml
@@ -0,0 +1,78 @@
+---
+- hosts: localhost
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  tasks:
+    - name: Ensure fio-tests results have been collected
+      stat:
+        path: "{{ topdir_path }}/workflows/fio-tests/results"
+      register: results_dir
+      tags: ['graph']
+
+    - name: Fail if results directory doesn't exist
+      fail:
+        msg: "Results directory not found. Please run 'make fio-tests-results' first to collect results from target nodes."
+      when: not results_dir.stat.exists
+      tags: ['graph']
+
+    - name: Find all collected result directories
+      find:
+        paths: "{{ topdir_path }}/workflows/fio-tests/results"
+        file_type: directory
+        recurse: no
+      register: result_dirs
+      tags: ['graph']
+
+    - name: Generate performance graphs for each host
+      shell: |
+        host_dir="{{ item.path }}"
+        host_name="{{ item.path | basename }}"
+        results_subdir="${host_dir}/fio-tests-results-${host_name}"
+
+        # Check if extracted results exist
+        if [[ ! -d "${results_subdir}" ]]; then
+          echo "No extracted results found for ${host_name}"
+          exit 0
+        fi
+
+        # Create graphs directory
+        mkdir -p "${host_dir}/graphs"
+
+        # Generate graphs using the fio-plot.py script
+        python3 {{ topdir_path }}/playbooks/python/workflows/fio-tests/fio-plot.py \
+          "${results_subdir}" \
+          --output-dir "${host_dir}/graphs" \
+          --prefix "${host_name}_performance"
+
+        echo "Generated graphs for ${host_name}"
+      loop: "{{ result_dirs.files }}"
+      when: item.isdir
+      tags: ['graph']
+      register: graph_results
+      ignore_errors: yes
+
+    - name: Display graph generation results
+      debug:
+        msg: "{{ item.stdout_lines | default(['No output']) }}"
+      loop: "{{ graph_results.results }}"
+      when: graph_results is defined
+      tags: ['graph']
+
+    - name: List all generated graphs
+      shell: |
+        for host_dir in {{ topdir_path }}/workflows/fio-tests/results/*/; do
+          if [[ -d "${host_dir}/graphs" ]]; then
+            host_name=$(basename "$host_dir")
+            echo "=== Graphs for ${host_name} ==="
+            ls -la "${host_dir}/graphs/" 2>/dev/null || echo "No graphs found"
+            echo ""
+          fi
+        done
+      register: all_graphs
+      tags: ['graph']
+
+    - name: Display generated graphs summary
+      debug:
+        msg: "{{ all_graphs.stdout_lines }}"
diff --git a/playbooks/fio-tests-results.yml b/playbooks/fio-tests-results.yml
new file mode 100644
index 000000000000..dcc2ea1847e9
--- /dev/null
+++ b/playbooks/fio-tests-results.yml
@@ -0,0 +1,10 @@
+---
+- hosts:
+  - baseline
+  - dev
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  roles:
+    - role: fio-tests
+      tags: ['results']
diff --git a/playbooks/fio-tests-trend-analysis.yml b/playbooks/fio-tests-trend-analysis.yml
new file mode 100644
index 000000000000..e94a1a1f76ee
--- /dev/null
+++ b/playbooks/fio-tests-trend-analysis.yml
@@ -0,0 +1,30 @@
+---
+- hosts:
+  - baseline
+  - dev
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  tasks:
+    - include_role:
+        name: create_data_partition
+      tags: [ 'oscheck', 'data_partition' ]
+
+    - name: Generate fio trend analysis
+      shell: |
+        cd {{ fio_tests_results_dir }}
+        mkdir -p analysis
+        python3 {{ kdevops_data }}/playbooks/python/workflows/fio-tests/fio-trend-analysis.py \
+          . --output-dir analysis
+      args:
+        creates: "{{ fio_tests_results_dir }}/analysis/block_size_trends.png"
+      become: yes
+
+    - name: List generated analysis files
+      shell: ls -la {{ fio_tests_results_dir }}/analysis/
+      become: yes
+      register: analysis_list
+
+    - name: Display generated analysis files
+      debug:
+        msg: "{{ analysis_list.stdout_lines }}"
diff --git a/playbooks/fio-tests.yml b/playbooks/fio-tests.yml
new file mode 100644
index 000000000000..b9f5f936c653
--- /dev/null
+++ b/playbooks/fio-tests.yml
@@ -0,0 +1,11 @@
+---
+- hosts:
+  - baseline
+  - dev
+  become: no
+  vars:
+    ansible_ssh_pipelining: True
+  roles:
+    - role: create_data_partition
+      tags: ['data_partition']
+    - role: fio-tests
diff --git a/playbooks/python/workflows/fio-tests/fio-compare.py b/playbooks/python/workflows/fio-tests/fio-compare.py
new file mode 100755
index 000000000000..0ed3e2a101c1
--- /dev/null
+++ b/playbooks/python/workflows/fio-tests/fio-compare.py
@@ -0,0 +1,383 @@
+#!/usr/bin/python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+# Compare fio test results between baseline and dev configurations for A/B testing
+
+import pandas as pd
+import matplotlib.pyplot as plt
+import json
+import argparse
+import os
+import sys
+from pathlib import Path
+
+
+def parse_fio_json(file_path):
+    """Parse fio JSON
output and extract key metrics"""
+    try:
+        with open(file_path, "r") as f:
+            data = json.load(f)
+
+        if "jobs" not in data:
+            return None
+
+        job = data["jobs"][0]  # Use first job
+
+        # Extract read metrics; fio reports bw in KiB/s and latency in ns
+        read_stats = job.get("read", {})
+        read_bw = read_stats.get("bw", 0) / 1024  # Convert to MiB/s
+        read_iops = read_stats.get("iops", 0)
+        read_lat_mean = (
+            read_stats.get("lat_ns", {}).get("mean", 0) / 1000000
+        )  # Convert to ms
+
+        # Extract write metrics
+        write_stats = job.get("write", {})
+        write_bw = write_stats.get("bw", 0) / 1024  # Convert to MiB/s
+        write_iops = write_stats.get("iops", 0)
+        write_lat_mean = (
+            write_stats.get("lat_ns", {}).get("mean", 0) / 1000000
+        )  # Convert to ms
+
+        return {
+            "read_bw": read_bw,
+            "read_iops": read_iops,
+            "read_lat": read_lat_mean,
+            "write_bw": write_bw,
+            "write_iops": write_iops,
+            "write_lat": write_lat_mean,
+            "total_bw": read_bw + write_bw,
+            "total_iops": read_iops + write_iops,
+        }
+    except (json.JSONDecodeError, FileNotFoundError, KeyError) as e:
+        print(f"Error parsing {file_path}: {e}")
+        return None
+
+
+def extract_test_params(filename):
+    """Extract test parameters from filename"""
+    name = filename.replace(".json", "").replace("results_", "")
+
+    params = {}
+
+    # The mixed pattern names themselves contain underscores, so match
+    # them before splitting the remainder of the filename on "_";
+    # otherwise "mixed_75_25" and "mixed_50_50" could never match.
+    for mixed in ("mixed_75_25", "mixed_50_50"):
+        if name.startswith(mixed):
+            params["pattern"] = mixed
+            name = name[len(mixed) + 1 :]
+            break
+
+    for part in name.split("_"):
+        if part.startswith("bs"):
+            params["block_size"] = part[2:]
+        elif part.startswith("iodepth"):
+            params["io_depth"] = int(part[7:])
+        elif part.startswith("jobs"):
+            params["num_jobs"] = int(part[4:])
+        elif part in ["randread", "randwrite", "seqread", "seqwrite"]:
+            params["pattern"] = part
+
+    return params
+
+
+def load_results(results_dir, config_name):
+    """Load all fio results from a directory"""
+    results = []
+
+    json_files = list(Path(results_dir).glob("results_*.json"))
+    if not json_files:
+        json_files = list(Path(results_dir).glob("results_*.txt"))
+
+    for file_path in json_files:
+        if file_path.name.endswith(".json"):
+            metrics =
parse_fio_json(file_path) + else: + continue + + if metrics: + params = extract_test_params(file_path.name) + result = {**params, **metrics, "config": config_name} + results.append(result) + + return pd.DataFrame(results) if results else None + + +def plot_comparison_bar_chart(baseline_df, dev_df, metric, output_file, title, ylabel): + """Create side-by-side bar chart comparison""" + if baseline_df.empty or dev_df.empty: + return + + # Group by test configuration and calculate means + baseline_grouped = baseline_df.groupby(["pattern", "block_size", "io_depth"])[ + metric + ].mean() + dev_grouped = dev_df.groupby(["pattern", "block_size", "io_depth"])[metric].mean() + + # Find common test configurations + common_configs = baseline_grouped.index.intersection(dev_grouped.index) + + if len(common_configs) == 0: + return + + baseline_values = [baseline_grouped[config] for config in common_configs] + dev_values = [dev_grouped[config] for config in common_configs] + + # Create labels from config tuples + labels = [f"{pattern}\n{bs}@{depth}" for pattern, bs, depth in common_configs] + + x = range(len(labels)) + width = 0.35 + + plt.figure(figsize=(16, 8)) + + plt.bar( + [i - width / 2 for i in x], + baseline_values, + width, + label="Baseline", + color="skyblue", + edgecolor="navy", + ) + plt.bar( + [i + width / 2 for i in x], + dev_values, + width, + label="Development", + color="lightcoral", + edgecolor="darkred", + ) + + # Add percentage improvement annotations + for i, (baseline_val, dev_val) in enumerate(zip(baseline_values, dev_values)): + if baseline_val > 0: + improvement = ((dev_val - baseline_val) / baseline_val) * 100 + y_pos = max(baseline_val, dev_val) * 1.05 + color = "green" if improvement > 0 else "red" + plt.text( + i, + y_pos, + f"{improvement:+.1f}%", + ha="center", + va="bottom", + color=color, + fontweight="bold", + ) + + plt.xlabel("Test Configuration (Pattern Block_Size@IO_Depth)") + plt.ylabel(ylabel) + plt.title(title) + plt.xticks(x, labels, 
rotation=45, ha="right") + plt.legend() + plt.grid(True, alpha=0.3, axis="y") + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def plot_performance_delta(baseline_df, dev_df, output_file): + """Plot performance delta (percentage improvement) across metrics""" + if baseline_df.empty or dev_df.empty: + return + + metrics = ["total_bw", "total_iops", "read_lat", "write_lat"] + metric_names = ["Bandwidth", "IOPS", "Read Latency", "Write Latency"] + + fig, axes = plt.subplots(2, 2, figsize=(16, 12)) + axes = axes.flatten() + + for idx, (metric, name) in enumerate(zip(metrics, metric_names)): + baseline_grouped = baseline_df.groupby(["pattern", "block_size", "io_depth"])[ + metric + ].mean() + dev_grouped = dev_df.groupby(["pattern", "block_size", "io_depth"])[ + metric + ].mean() + + common_configs = baseline_grouped.index.intersection(dev_grouped.index) + + if len(common_configs) == 0: + continue + + # Calculate percentage changes + percent_changes = [] + config_labels = [] + + for config in common_configs: + baseline_val = baseline_grouped[config] + dev_val = dev_grouped[config] + + if baseline_val > 0: + # For latency, lower is better, so invert the calculation + if "lat" in metric: + change = ((baseline_val - dev_val) / baseline_val) * 100 + else: + change = ((dev_val - baseline_val) / baseline_val) * 100 + + percent_changes.append(change) + pattern, bs, depth = config + config_labels.append(f"{pattern}\n{bs}@{depth}") + + if percent_changes: + colors = ["green" if x > 0 else "red" for x in percent_changes] + bars = axes[idx].bar( + range(len(percent_changes)), percent_changes, color=colors + ) + + # Add value labels on bars + for bar, value in zip(bars, percent_changes): + height = bar.get_height() + axes[idx].text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{value:.1f}%", + ha="center", + va="bottom" if height > 0 else "top", + ) + + axes[idx].set_title(f"{name} Performance Change") + 
axes[idx].set_ylabel("Percentage Change (%)") + axes[idx].set_xticks(range(len(config_labels))) + axes[idx].set_xticklabels(config_labels, rotation=45, ha="right") + axes[idx].axhline(y=0, color="black", linestyle="-", alpha=0.3) + axes[idx].grid(True, alpha=0.3, axis="y") + + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def generate_summary_report(baseline_df, dev_df, output_file): + """Generate a text summary report of the comparison""" + with open(output_file, "w") as f: + f.write("FIO Performance Comparison Report\n") + f.write("=" * 40 + "\n\n") + + f.write(f"Baseline tests: {len(baseline_df)} configurations\n") + f.write(f"Development tests: {len(dev_df)} configurations\n\n") + + metrics = ["total_bw", "total_iops", "read_lat", "write_lat"] + metric_names = [ + "Total Bandwidth (MB/s)", + "Total IOPS", + "Read Latency (ms)", + "Write Latency (ms)", + ] + + for metric, name in zip(metrics, metric_names): + f.write(f"{name}:\n") + f.write("-" * len(name) + "\n") + + baseline_mean = baseline_df[metric].mean() + dev_mean = dev_df[metric].mean() + + if baseline_mean > 0: + if "lat" in metric: + improvement = ((baseline_mean - dev_mean) / baseline_mean) * 100 + direction = "reduction" if improvement > 0 else "increase" + else: + improvement = ((dev_mean - baseline_mean) / baseline_mean) * 100 + direction = "improvement" if improvement > 0 else "regression" + + f.write(f" Baseline average: {baseline_mean:.2f}\n") + f.write(f" Development average: {dev_mean:.2f}\n") + f.write(f" Change: {improvement:+.1f}% {direction}\n\n") + else: + f.write(f" No data available\n\n") + + +def main(): + parser = argparse.ArgumentParser( + description="Compare fio performance between baseline and development configurations" + ) + parser.add_argument( + "baseline_dir", type=str, help="Directory containing baseline results" + ) + parser.add_argument( + "dev_dir", type=str, help="Directory containing development results" + ) + 
parser.add_argument( + "--output-dir", + type=str, + default=".", + help="Output directory for comparison graphs", + ) + parser.add_argument( + "--prefix", type=str, default="fio_comparison", help="Prefix for output files" + ) + parser.add_argument( + "--baseline-label", + type=str, + default="Baseline", + help="Label for baseline configuration", + ) + parser.add_argument( + "--dev-label", + type=str, + default="Development", + help="Label for development configuration", + ) + + args = parser.parse_args() + + if not os.path.exists(args.baseline_dir): + print(f"Error: Baseline directory '{args.baseline_dir}' not found.") + sys.exit(1) + + if not os.path.exists(args.dev_dir): + print(f"Error: Development directory '{args.dev_dir}' not found.") + sys.exit(1) + + os.makedirs(args.output_dir, exist_ok=True) + + print("Loading baseline results...") + baseline_df = load_results(args.baseline_dir, args.baseline_label) + + print("Loading development results...") + dev_df = load_results(args.dev_dir, args.dev_label) + + if baseline_df is None or baseline_df.empty: + print("No baseline results found.") + sys.exit(1) + + if dev_df is None or dev_df.empty: + print("No development results found.") + sys.exit(1) + + print( + f"Comparing {len(baseline_df)} baseline vs {len(dev_df)} development results..." 
+ ) + + # Generate comparison charts + plot_comparison_bar_chart( + baseline_df, + dev_df, + "total_bw", + os.path.join(args.output_dir, f"{args.prefix}_bandwidth_comparison.png"), + "Bandwidth Comparison", + "Bandwidth (MB/s)", + ) + + plot_comparison_bar_chart( + baseline_df, + dev_df, + "total_iops", + os.path.join(args.output_dir, f"{args.prefix}_iops_comparison.png"), + "IOPS Comparison", + "IOPS", + ) + + plot_performance_delta( + baseline_df, + dev_df, + os.path.join(args.output_dir, f"{args.prefix}_performance_delta.png"), + ) + + # Generate summary report + generate_summary_report( + baseline_df, dev_df, os.path.join(args.output_dir, f"{args.prefix}_summary.txt") + ) + + print(f"Comparison results saved to {args.output_dir}") + + +if __name__ == "__main__": + main() diff --git a/playbooks/python/workflows/fio-tests/fio-plot.py b/playbooks/python/workflows/fio-tests/fio-plot.py new file mode 100755 index 000000000000..2a1948063cb1 --- /dev/null +++ b/playbooks/python/workflows/fio-tests/fio-plot.py @@ -0,0 +1,350 @@ +#!/usr/bin/python3 +# SPDX-License-Identifier: copyleft-next-0.3.1 + +# Accepts fio output and provides comprehensive plots for performance analysis + +import pandas as pd +import matplotlib.pyplot as plt +import json +import argparse +import os +import sys +from pathlib import Path + + +def parse_fio_json(file_path): + """Parse fio JSON output and extract key metrics""" + try: + with open(file_path, "r") as f: + data = json.load(f) + + if "jobs" not in data: + return None + + job = data["jobs"][0] # Use first job + + # Extract read metrics + read_stats = job.get("read", {}) + read_bw = read_stats.get("bw", 0) / 1024 # Convert to MB/s + read_iops = read_stats.get("iops", 0) + read_lat_mean = ( + read_stats.get("lat_ns", {}).get("mean", 0) / 1000000 + ) # Convert to ms + + # Extract write metrics + write_stats = job.get("write", {}) + write_bw = write_stats.get("bw", 0) / 1024 # Convert to MB/s + write_iops = write_stats.get("iops", 0) + 
write_lat_mean = ( + write_stats.get("lat_ns", {}).get("mean", 0) / 1000000 + ) # Convert to ms + + return { + "read_bw": read_bw, + "read_iops": read_iops, + "read_lat": read_lat_mean, + "write_bw": write_bw, + "write_iops": write_iops, + "write_lat": write_lat_mean, + "total_bw": read_bw + write_bw, + "total_iops": read_iops + write_iops, + } + except (json.JSONDecodeError, FileNotFoundError, KeyError) as e: + print(f"Error parsing {file_path}: {e}") + return None + + +def extract_test_params(filename): + """Extract test parameters from filename""" + # Expected format: pattern_bs4k_iodepth1_jobs1.json + parts = filename.replace(".json", "").replace("results_", "").split("_") + + params = {} + for part in parts: + if part.startswith("bs"): + params["block_size"] = part[2:] + elif part.startswith("iodepth"): + params["io_depth"] = int(part[7:]) + elif part.startswith("jobs"): + params["num_jobs"] = int(part[4:]) + elif part in [ + "randread", + "randwrite", + "seqread", + "seqwrite", + "mixed_75_25", + "mixed_50_50", + ]: + params["pattern"] = part + + return params + + +def create_performance_matrix(results_dir): + """Create performance matrix from all test results""" + results = [] + + # Look for JSON result files + json_files = list(Path(results_dir).glob("results_*.json")) + if not json_files: + # Fallback to text files if JSON not available + json_files = list(Path(results_dir).glob("results_*.txt")) + + for file_path in json_files: + if file_path.name.endswith(".json"): + metrics = parse_fio_json(file_path) + else: + continue # Skip text files for now, could add text parsing later + + if metrics: + params = extract_test_params(file_path.name) + result = {**params, **metrics} + results.append(result) + + return pd.DataFrame(results) if results else None + + +def plot_bandwidth_heatmap(df, output_file): + """Create bandwidth heatmap across block sizes and IO depths""" + if df.empty or "block_size" not in df.columns or "io_depth" not in df.columns: + return + + 
# Create pivot table for heatmap + pivot_data = df.pivot_table( + values="total_bw", index="io_depth", columns="block_size", aggfunc="mean" + ) + + plt.figure(figsize=(12, 8)) + im = plt.imshow(pivot_data.values, cmap="viridis", aspect="auto") + + # Add colorbar + plt.colorbar(im, label="Bandwidth (MB/s)") + + # Set ticks and labels + plt.xticks(range(len(pivot_data.columns)), pivot_data.columns) + plt.yticks(range(len(pivot_data.index)), pivot_data.index) + + plt.xlabel("Block Size") + plt.ylabel("IO Depth") + plt.title("Bandwidth Performance Matrix") + + # Add text annotations + for i in range(len(pivot_data.index)): + for j in range(len(pivot_data.columns)): + if not pd.isna(pivot_data.iloc[i, j]): + plt.text( + j, + i, + f"{pivot_data.iloc[i, j]:.0f}", + ha="center", + va="center", + color="white", + fontweight="bold", + ) + + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def plot_iops_scaling(df, output_file): + """Plot IOPS scaling with IO depth""" + if df.empty or "io_depth" not in df.columns: + return + + plt.figure(figsize=(12, 8)) + + # Group by pattern and plot separately + patterns = df["pattern"].unique() if "pattern" in df.columns else ["all"] + + for pattern in patterns: + if pattern != "all": + pattern_df = df[df["pattern"] == pattern] + else: + pattern_df = df + + # Group by IO depth and calculate mean IOPS + iops_by_depth = pattern_df.groupby("io_depth")["total_iops"].mean() + + plt.plot( + iops_by_depth.index, + iops_by_depth.values, + marker="o", + linewidth=2, + markersize=6, + label=pattern, + ) + + plt.xlabel("IO Depth") + plt.ylabel("IOPS") + plt.title("IOPS Scaling with IO Depth") + plt.grid(True, alpha=0.3) + plt.legend() + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def plot_latency_distribution(df, output_file): + """Plot latency distribution across different configurations""" + if df.empty: + return + + fig, (ax1, ax2) = plt.subplots(1, 
2, figsize=(16, 6)) + + # Read latency + if "read_lat" in df.columns: + read_lat_data = df[df["read_lat"] > 0]["read_lat"] + if not read_lat_data.empty: + ax1.hist(read_lat_data, bins=20, alpha=0.7, color="blue", edgecolor="black") + ax1.set_xlabel("Read Latency (ms)") + ax1.set_ylabel("Frequency") + ax1.set_title("Read Latency Distribution") + ax1.grid(True, alpha=0.3) + + # Write latency + if "write_lat" in df.columns: + write_lat_data = df[df["write_lat"] > 0]["write_lat"] + if not write_lat_data.empty: + ax2.hist(write_lat_data, bins=20, alpha=0.7, color="red", edgecolor="black") + ax2.set_xlabel("Write Latency (ms)") + ax2.set_ylabel("Frequency") + ax2.set_title("Write Latency Distribution") + ax2.grid(True, alpha=0.3) + + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def plot_pattern_comparison(df, output_file): + """Compare performance across different workload patterns""" + if df.empty or "pattern" not in df.columns: + return + + patterns = df["pattern"].unique() + if len(patterns) <= 1: + return + + # Calculate mean metrics for each pattern + pattern_stats = ( + df.groupby("pattern") + .agg( + { + "total_bw": "mean", + "total_iops": "mean", + "read_lat": "mean", + "write_lat": "mean", + } + ) + .reset_index() + ) + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12)) + + # Bandwidth comparison + ax1.bar( + pattern_stats["pattern"], + pattern_stats["total_bw"], + color="skyblue", + edgecolor="navy", + ) + ax1.set_ylabel("Bandwidth (MB/s)") + ax1.set_title("Bandwidth by Workload Pattern") + ax1.tick_params(axis="x", rotation=45) + + # IOPS comparison + ax2.bar( + pattern_stats["pattern"], + pattern_stats["total_iops"], + color="lightgreen", + edgecolor="darkgreen", + ) + ax2.set_ylabel("IOPS") + ax2.set_title("IOPS by Workload Pattern") + ax2.tick_params(axis="x", rotation=45) + + # Read latency comparison + read_lat_data = pattern_stats[pattern_stats["read_lat"] > 0] + if not 
read_lat_data.empty: + ax3.bar( + read_lat_data["pattern"], + read_lat_data["read_lat"], + color="orange", + edgecolor="darkorange", + ) + ax3.set_ylabel("Read Latency (ms)") + ax3.set_title("Read Latency by Workload Pattern") + ax3.tick_params(axis="x", rotation=45) + + # Write latency comparison + write_lat_data = pattern_stats[pattern_stats["write_lat"] > 0] + if not write_lat_data.empty: + ax4.bar( + write_lat_data["pattern"], + write_lat_data["write_lat"], + color="salmon", + edgecolor="darkred", + ) + ax4.set_ylabel("Write Latency (ms)") + ax4.set_title("Write Latency by Workload Pattern") + ax4.tick_params(axis="x", rotation=45) + + plt.tight_layout() + plt.savefig(output_file, dpi=300, bbox_inches="tight") + plt.close() + + +def main(): + parser = argparse.ArgumentParser( + description="Generate comprehensive performance graphs from fio test results" + ) + parser.add_argument( + "results_dir", type=str, help="Directory containing fio test results" + ) + parser.add_argument( + "--output-dir", type=str, default=".", help="Output directory for graphs" + ) + parser.add_argument( + "--prefix", type=str, default="fio_performance", help="Prefix for output files" + ) + + args = parser.parse_args() + + if not os.path.exists(args.results_dir): + print(f"Error: Results directory '{args.results_dir}' not found.") + sys.exit(1) + + # Create output directory if it doesn't exist + os.makedirs(args.output_dir, exist_ok=True) + + # Load and process results + print("Loading fio test results...") + df = create_performance_matrix(args.results_dir) + + if df is None or df.empty: + print("No valid fio results found.") + sys.exit(1) + + print(f"Found {len(df)} test results") + print("Generating graphs...") + + # Generate different types of graphs + plot_bandwidth_heatmap( + df, os.path.join(args.output_dir, f"{args.prefix}_bandwidth_heatmap.png") + ) + plot_iops_scaling( + df, os.path.join(args.output_dir, f"{args.prefix}_iops_scaling.png") + ) + plot_latency_distribution( + df, 
os.path.join(args.output_dir, f"{args.prefix}_latency_distribution.png") + ) + plot_pattern_comparison( + df, os.path.join(args.output_dir, f"{args.prefix}_pattern_comparison.png") + ) + + print(f"Graphs saved to {args.output_dir}") + + +if __name__ == "__main__": + main() diff --git a/playbooks/python/workflows/fio-tests/fio-trend-analysis.py b/playbooks/python/workflows/fio-tests/fio-trend-analysis.py new file mode 100755 index 000000000000..0213a8d5cf6c --- /dev/null +++ b/playbooks/python/workflows/fio-tests/fio-trend-analysis.py @@ -0,0 +1,477 @@ +#!/usr/bin/python3 +# SPDX-License-Identifier: copyleft-next-0.3.1 + +# Analyze fio performance trends across different test parameters + +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns +import numpy as np +import json +import argparse +import os +import sys +from pathlib import Path + + +def parse_fio_json(file_path): + """Parse fio JSON output and extract detailed metrics""" + try: + with open(file_path, "r") as f: + data = json.load(f) + + if "jobs" not in data: + return None + + job = data["jobs"][0] # Use first job + + # Extract read metrics + read_stats = job.get("read", {}) + read_bw = read_stats.get("bw", 0) / 1024 # Convert to MB/s + read_iops = read_stats.get("iops", 0) + read_lat = read_stats.get("lat_ns", {}) + read_lat_mean = read_lat.get("mean", 0) / 1000000 # Convert to ms + read_lat_stddev = read_lat.get("stddev", 0) / 1000000 + read_lat_p95 = read_lat.get("percentile", {}).get("95.000000", 0) / 1000000 + read_lat_p99 = read_lat.get("percentile", {}).get("99.000000", 0) / 1000000 + + # Extract write metrics + write_stats = job.get("write", {}) + write_bw = write_stats.get("bw", 0) / 1024 # Convert to MB/s + write_iops = write_stats.get("iops", 0) + write_lat = write_stats.get("lat_ns", {}) + write_lat_mean = write_lat.get("mean", 0) / 1000000 # Convert to ms + write_lat_stddev = write_lat.get("stddev", 0) / 1000000 + write_lat_p95 = write_lat.get("percentile", 
{}).get("95.000000", 0) / 1000000 + write_lat_p99 = write_lat.get("percentile", {}).get("99.000000", 0) / 1000000 + + return { + "read_bw": read_bw, + "read_iops": read_iops, + "read_lat_mean": read_lat_mean, + "read_lat_stddev": read_lat_stddev, + "read_lat_p95": read_lat_p95, + "read_lat_p99": read_lat_p99, + "write_bw": write_bw, + "write_iops": write_iops, + "write_lat_mean": write_lat_mean, + "write_lat_stddev": write_lat_stddev, + "write_lat_p95": write_lat_p95, + "write_lat_p99": write_lat_p99, + "total_bw": read_bw + write_bw, + "total_iops": read_iops + write_iops, + } + except (json.JSONDecodeError, FileNotFoundError, KeyError) as e: + print(f"Error parsing {file_path}: {e}") + return None + + +def extract_test_params(filename): + """Extract test parameters from filename""" + parts = filename.replace(".json", "").replace("results_", "").split("_") + + params = {} + for part in parts: + if part.startswith("bs"): + # Convert block size to numeric KB for sorting + bs_str = part[2:] + if bs_str.endswith("k"): + params["block_size_kb"] = int(bs_str[:-1]) + params["block_size"] = bs_str + else: + params["block_size_kb"] = int(bs_str) + params["block_size"] = bs_str + elif part.startswith("iodepth"): + params["io_depth"] = int(part[7:]) + elif part.startswith("jobs"): + params["num_jobs"] = int(part[4:]) + elif part in [ + "randread", + "randwrite", + "seqread", + "seqwrite", + "mixed_75_25", + "mixed_50_50", + ]: + params["pattern"] = part + + return params + + +def load_all_results(results_dir): + """Load all fio results from directory""" + results = [] + + json_files = list(Path(results_dir).glob("results_*.json")) + if not json_files: + json_files = list(Path(results_dir).glob("results_*.txt")) + + for file_path in json_files: + if file_path.name.endswith(".json"): + metrics = parse_fio_json(file_path) + else: + continue + + if metrics: + params = extract_test_params(file_path.name) + result = {**params, **metrics} + results.append(result) + + return 
pd.DataFrame(results) if results else None + + +def plot_block_size_trends(df, output_dir): + """Plot performance trends across block sizes""" + if df.empty or "block_size_kb" not in df.columns: + return + + # Group by block size and calculate means + bs_trends = ( + df.groupby("block_size_kb") + .agg( + { + "total_bw": "mean", + "total_iops": "mean", + "read_lat_mean": "mean", + "write_lat_mean": "mean", + } + ) + .reset_index() + ) + + bs_trends = bs_trends.sort_values("block_size_kb") + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12)) + + # Bandwidth trend + ax1.plot( + bs_trends["block_size_kb"], + bs_trends["total_bw"], + marker="o", + linewidth=2, + markersize=8, + color="blue", + ) + ax1.set_xlabel("Block Size (KB)") + ax1.set_ylabel("Bandwidth (MB/s)") + ax1.set_title("Bandwidth vs Block Size") + ax1.grid(True, alpha=0.3) + + # IOPS trend + ax2.plot( + bs_trends["block_size_kb"], + bs_trends["total_iops"], + marker="s", + linewidth=2, + markersize=8, + color="green", + ) + ax2.set_xlabel("Block Size (KB)") + ax2.set_ylabel("IOPS") + ax2.set_title("IOPS vs Block Size") + ax2.grid(True, alpha=0.3) + + # Read latency trend + read_lat_data = bs_trends[bs_trends["read_lat_mean"] > 0] + if not read_lat_data.empty: + ax3.plot( + read_lat_data["block_size_kb"], + read_lat_data["read_lat_mean"], + marker="^", + linewidth=2, + markersize=8, + color="orange", + ) + ax3.set_xlabel("Block Size (KB)") + ax3.set_ylabel("Read Latency (ms)") + ax3.set_title("Read Latency vs Block Size") + ax3.grid(True, alpha=0.3) + + # Write latency trend + write_lat_data = bs_trends[bs_trends["write_lat_mean"] > 0] + if not write_lat_data.empty: + ax4.plot( + write_lat_data["block_size_kb"], + write_lat_data["write_lat_mean"], + marker="v", + linewidth=2, + markersize=8, + color="red", + ) + ax4.set_xlabel("Block Size (KB)") + ax4.set_ylabel("Write Latency (ms)") + ax4.set_title("Write Latency vs Block Size") + ax4.grid(True, alpha=0.3) + + plt.tight_layout() + 
plt.savefig( + os.path.join(output_dir, "block_size_trends.png"), dpi=300, bbox_inches="tight" + ) + plt.close() + + +def plot_io_depth_scaling(df, output_dir): + """Plot performance scaling with IO depth""" + if df.empty or "io_depth" not in df.columns: + return + + patterns = df["pattern"].unique() if "pattern" in df.columns else [None] + + fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12)) + + colors = plt.cm.tab10(np.linspace(0, 1, len(patterns))) + + for pattern, color in zip(patterns, colors): + if pattern is not None: + pattern_df = df[df["pattern"] == pattern] + label = pattern + else: + pattern_df = df + label = "All" + + if pattern_df.empty: + continue + + depth_trends = ( + pattern_df.groupby("io_depth") + .agg( + { + "total_bw": "mean", + "total_iops": "mean", + "read_lat_mean": "mean", + "write_lat_mean": "mean", + } + ) + .reset_index() + ) + + depth_trends = depth_trends.sort_values("io_depth") + + # Bandwidth scaling + ax1.plot( + depth_trends["io_depth"], + depth_trends["total_bw"], + marker="o", + linewidth=2, + markersize=6, + label=label, + color=color, + ) + + # IOPS scaling + ax2.plot( + depth_trends["io_depth"], + depth_trends["total_iops"], + marker="s", + linewidth=2, + markersize=6, + label=label, + color=color, + ) + + # Read latency scaling + read_lat_data = depth_trends[depth_trends["read_lat_mean"] > 0] + if not read_lat_data.empty: + ax3.plot( + read_lat_data["io_depth"], + read_lat_data["read_lat_mean"], + marker="^", + linewidth=2, + markersize=6, + label=label, + color=color, + ) + + # Write latency scaling + write_lat_data = depth_trends[depth_trends["write_lat_mean"] > 0] + if not write_lat_data.empty: + ax4.plot( + write_lat_data["io_depth"], + write_lat_data["write_lat_mean"], + marker="v", + linewidth=2, + markersize=6, + label=label, + color=color, + ) + + ax1.set_xlabel("IO Depth") + ax1.set_ylabel("Bandwidth (MB/s)") + ax1.set_title("Bandwidth Scaling with IO Depth") + ax1.grid(True, alpha=0.3) + 
ax1.legend() + + ax2.set_xlabel("IO Depth") + ax2.set_ylabel("IOPS") + ax2.set_title("IOPS Scaling with IO Depth") + ax2.grid(True, alpha=0.3) + ax2.legend() + + ax3.set_xlabel("IO Depth") + ax3.set_ylabel("Read Latency (ms)") + ax3.set_title("Read Latency vs IO Depth") + ax3.grid(True, alpha=0.3) + ax3.legend() + + ax4.set_xlabel("IO Depth") + ax4.set_ylabel("Write Latency (ms)") + ax4.set_title("Write Latency vs IO Depth") + ax4.grid(True, alpha=0.3) + ax4.legend() + + plt.tight_layout() + plt.savefig( + os.path.join(output_dir, "io_depth_scaling.png"), dpi=300, bbox_inches="tight" + ) + plt.close() + + +def plot_latency_percentiles(df, output_dir): + """Plot latency percentile analysis""" + if df.empty: + return + + latency_cols = [ + "read_lat_mean", + "read_lat_p95", + "read_lat_p99", + "write_lat_mean", + "write_lat_p95", + "write_lat_p99", + ] + + # Filter out zero latencies + lat_df = df[latency_cols] + lat_df = lat_df[(lat_df > 0).any(axis=1)] + + if lat_df.empty: + return + + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6)) + + # Read latency percentiles + read_cols = [col for col in latency_cols if col.startswith("read_")] + if any(col in lat_df.columns for col in read_cols): + read_data = lat_df[read_cols].dropna() + if not read_data.empty: + bp1 = ax1.boxplot( + [read_data[col] for col in read_cols], + labels=["Mean", "P95", "P99"], + patch_artist=True, + ) + for patch in bp1["boxes"]: + patch.set_facecolor("lightblue") + ax1.set_ylabel("Latency (ms)") + ax1.set_title("Read Latency Distribution") + ax1.grid(True, alpha=0.3) + + # Write latency percentiles + write_cols = [col for col in latency_cols if col.startswith("write_")] + if any(col in lat_df.columns for col in write_cols): + write_data = lat_df[write_cols].dropna() + if not write_data.empty: + bp2 = ax2.boxplot( + [write_data[col] for col in write_cols], + labels=["Mean", "P95", "P99"], + patch_artist=True, + ) + for patch in bp2["boxes"]: + patch.set_facecolor("lightcoral") + 
ax2.set_ylabel("Latency (ms)") + ax2.set_title("Write Latency Distribution") + ax2.grid(True, alpha=0.3) + + plt.tight_layout() + plt.savefig( + os.path.join(output_dir, "latency_percentiles.png"), + dpi=300, + bbox_inches="tight", + ) + plt.close() + + +def create_correlation_heatmap(df, output_dir): + """Create correlation heatmap of performance metrics""" + if df.empty: + return + + # Select numeric columns for correlation + numeric_cols = [ + "block_size_kb", + "io_depth", + "num_jobs", + "total_bw", + "total_iops", + "read_lat_mean", + "write_lat_mean", + ] + + corr_df = df[numeric_cols].dropna() + + if corr_df.empty: + return + + correlation_matrix = corr_df.corr() + + plt.figure(figsize=(10, 8)) + sns.heatmap( + correlation_matrix, + annot=True, + cmap="coolwarm", + center=0, + square=True, + linewidths=0.5, + cbar_kws={"shrink": 0.8}, + ) + plt.title("Performance Metrics Correlation Matrix") + plt.tight_layout() + plt.savefig( + os.path.join(output_dir, "correlation_heatmap.png"), + dpi=300, + bbox_inches="tight", + ) + plt.close() + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze fio performance trends and patterns" + ) + parser.add_argument( + "results_dir", type=str, help="Directory containing fio test results" + ) + parser.add_argument( + "--output-dir", + type=str, + default=".", + help="Output directory for analysis graphs", + ) + + args = parser.parse_args() + + if not os.path.exists(args.results_dir): + print(f"Error: Results directory '{args.results_dir}' not found.") + sys.exit(1) + + os.makedirs(args.output_dir, exist_ok=True) + + print("Loading fio test results...") + df = load_all_results(args.results_dir) + + if df is None or df.empty: + print("No valid fio results found.") + sys.exit(1) + + print(f"Analyzing {len(df)} test results...") + + # Generate trend analysis + plot_block_size_trends(df, args.output_dir) + plot_io_depth_scaling(df, args.output_dir) + plot_latency_percentiles(df, args.output_dir) + 
create_correlation_heatmap(df, args.output_dir) + + print(f"Trend analysis saved to {args.output_dir}") + + +if __name__ == "__main__": + main() diff --git a/playbooks/roles/fio-tests/defaults/main.yml b/playbooks/roles/fio-tests/defaults/main.yml new file mode 100644 index 000000000000..a53184406d2d --- /dev/null +++ b/playbooks/roles/fio-tests/defaults/main.yml @@ -0,0 +1,48 @@ +--- +# fio-tests role defaults + +fio_tests_results_dir: "/data/fio-tests" +fio_tests_binary: "/usr/bin/fio" + +# These variables are populated from kconfig via extra_vars.yaml +# fio_tests_device: "" +# fio_tests_runtime: "" +# fio_tests_ramp_time: "" +# fio_tests_ioengine: "" +# fio_tests_direct: "" +# fio_tests_fsync_on_close: "" +# fio_tests_log_avg_msec: "" + +# Test configuration booleans (populated from kconfig) +# fio_tests_bs_4k: false +# fio_tests_bs_8k: false +# fio_tests_bs_16k: false +# fio_tests_bs_32k: false +# fio_tests_bs_64k: false +# fio_tests_bs_128k: false + +# fio_tests_iodepth_1: false +# fio_tests_iodepth_4: false +# fio_tests_iodepth_8: false +# fio_tests_iodepth_16: false +# fio_tests_iodepth_32: false +# fio_tests_iodepth_64: false + +# fio_tests_numjobs_1: false +# fio_tests_numjobs_2: false +# fio_tests_numjobs_4: false +# fio_tests_numjobs_8: false +# fio_tests_numjobs_16: false + +# fio_tests_pattern_rand_read: false +# fio_tests_pattern_rand_write: false +# fio_tests_pattern_seq_read: false +# fio_tests_pattern_seq_write: false +# fio_tests_pattern_mixed_75_25: false +# fio_tests_pattern_mixed_50_50: false + +# Derived configuration lists +fio_tests_block_sizes: [] +fio_tests_io_depths: [] +fio_tests_num_jobs: [] +fio_tests_patterns: [] diff --git a/playbooks/roles/fio-tests/tasks/install-deps/debian/main.yml b/playbooks/roles/fio-tests/tasks/install-deps/debian/main.yml new file mode 100644 index 000000000000..327ab493ba81 --- /dev/null +++ b/playbooks/roles/fio-tests/tasks/install-deps/debian/main.yml @@ -0,0 +1,20 @@ +--- +- name: Install fio for 
Debian/Ubuntu + package: + name: + - fio + - python3 + state: present + become: yes + +- name: Install graphing dependencies for Debian/Ubuntu + package: + name: + - python3-pip + - python3-pandas + - python3-matplotlib + - python3-seaborn + - python3-numpy + state: present + become: yes + when: fio_tests_enable_graphing is defined and fio_tests_enable_graphing diff --git a/playbooks/roles/fio-tests/tasks/install-deps/main.yml b/playbooks/roles/fio-tests/tasks/install-deps/main.yml new file mode 100644 index 000000000000..c29e6f751fce --- /dev/null +++ b/playbooks/roles/fio-tests/tasks/install-deps/main.yml @@ -0,0 +1,3 @@ +--- +- name: Include distribution-specific installation tasks + include_tasks: "{{ ansible_os_family | lower }}/main.yml" diff --git a/playbooks/roles/fio-tests/tasks/install-deps/redhat/main.yml b/playbooks/roles/fio-tests/tasks/install-deps/redhat/main.yml new file mode 100644 index 000000000000..7decdcf84fc0 --- /dev/null +++ b/playbooks/roles/fio-tests/tasks/install-deps/redhat/main.yml @@ -0,0 +1,20 @@ +--- +- name: Install fio for RHEL/CentOS/Fedora + package: + name: + - fio + - python3 + state: present + become: yes + +- name: Install graphing dependencies for RHEL/CentOS/Fedora + package: + name: + - python3-pip + - python3-pandas + - python3-matplotlib + - python3-seaborn + - python3-numpy + state: present + become: yes + when: fio_tests_enable_graphing is defined and fio_tests_enable_graphing diff --git a/playbooks/roles/fio-tests/tasks/install-deps/suse/main.yml b/playbooks/roles/fio-tests/tasks/install-deps/suse/main.yml new file mode 100644 index 000000000000..8bd5cf6261b2 --- /dev/null +++ b/playbooks/roles/fio-tests/tasks/install-deps/suse/main.yml @@ -0,0 +1,20 @@ +--- +- name: Install fio for SUSE + package: + name: + - fio + - python3 + state: present + become: yes + +- name: Install graphing dependencies for SUSE + package: + name: + - python3-pip + - python3-pandas + - python3-matplotlib + - python3-seaborn + - python3-numpy 
+ state: present + become: yes + when: fio_tests_enable_graphing is defined and fio_tests_enable_graphing diff --git a/playbooks/roles/fio-tests/tasks/main.yaml b/playbooks/roles/fio-tests/tasks/main.yaml new file mode 100644 index 000000000000..5cd25ba7ef7a --- /dev/null +++ b/playbooks/roles/fio-tests/tasks/main.yaml @@ -0,0 +1,170 @@ +--- +- name: Set derived configuration variables + set_fact: + fio_tests_block_sizes: >- + {{ + (['4k'] if fio_tests_bs_4k else []) + + (['8k'] if fio_tests_bs_8k else []) + + (['16k'] if fio_tests_bs_16k else []) + + (['32k'] if fio_tests_bs_32k else []) + + (['64k'] if fio_tests_bs_64k else []) + + (['128k'] if fio_tests_bs_128k else []) + }} + fio_tests_io_depths: >- + {{ + ([1] if fio_tests_iodepth_1 else []) + + ([4] if fio_tests_iodepth_4 else []) + + ([8] if fio_tests_iodepth_8 else []) + + ([16] if fio_tests_iodepth_16 else []) + + ([32] if fio_tests_iodepth_32 else []) + + ([64] if fio_tests_iodepth_64 else []) + }} + fio_tests_num_jobs: >- + {{ + ([1] if fio_tests_numjobs_1 else []) + + ([2] if fio_tests_numjobs_2 else []) + + ([4] if fio_tests_numjobs_4 else []) + + ([8] if fio_tests_numjobs_8 else []) + + ([16] if fio_tests_numjobs_16 else []) + }} + fio_tests_patterns: >- + {{ + ([{'name': 'randread', 'rw': 'randread', 'rwmixread': 100}] if fio_tests_pattern_rand_read else []) + + ([{'name': 'randwrite', 'rw': 'randwrite', 'rwmixread': 0}] if fio_tests_pattern_rand_write else []) + + ([{'name': 'seqread', 'rw': 'read', 'rwmixread': 100}] if fio_tests_pattern_seq_read else []) + + ([{'name': 'seqwrite', 'rw': 'write', 'rwmixread': 0}] if fio_tests_pattern_seq_write else []) + + ([{'name': 'mixed_75_25', 'rw': 'randrw', 'rwmixread': 75}] if fio_tests_pattern_mixed_75_25 else []) + + ([{'name': 'mixed_50_50', 'rw': 'randrw', 'rwmixread': 50}] if fio_tests_pattern_mixed_50_50 else []) + }} + +- name: Calculate total test combinations and timeout + set_fact: + fio_tests_total_combinations: "{{ fio_tests_block_sizes | length 
* fio_tests_io_depths | length * fio_tests_num_jobs | length * fio_tests_patterns | length }}" + fio_test_time_per_job: "{{ (fio_tests_runtime | int) + (fio_tests_ramp_time | int) }}" + +- name: Calculate async timeout with safety margin + set_fact: + # Each test runs twice (JSON + normal output), add 60s per test for overhead, then add 30% safety margin + fio_tests_async_timeout: "{{ (((fio_tests_total_combinations | int * fio_test_time_per_job | int * 2) + (fio_tests_total_combinations | int * 60)) * 1.3) | int }}" + +- name: Display test configuration + debug: + msg: | + FIO Test Configuration: + - Total test combinations: {{ fio_tests_total_combinations }} + - Runtime per test: {{ fio_tests_runtime }}s + - Ramp time per test: {{ fio_tests_ramp_time }}s + - Estimated total time: {{ (fio_tests_total_combinations | int * fio_test_time_per_job | int * 2 / 60) | round(1) }} minutes + - Async timeout: {{ (fio_tests_async_timeout | int / 60) | round(1) }} minutes + {% if fio_tests_device == '/dev/null' %} + - Note: Using /dev/null - fsync_on_close and direct IO disabled automatically + {% endif %} + +- name: Install fio and dependencies + include_tasks: install-deps/main.yml + +- name: Create results directory + file: + path: "{{ fio_tests_results_dir }}" + state: directory + mode: '0755' + become: yes + +- name: Create fio job files directory + file: + path: "{{ fio_tests_results_dir }}/jobs" + state: directory + mode: '0755' + become: yes + +- name: Generate fio job files + template: + src: fio-job.ini.j2 + dest: "{{ fio_tests_results_dir }}/jobs/{{ item.0.name }}_bs{{ item.1 }}_iodepth{{ item.2 }}_jobs{{ item.3 }}.ini" + mode: '0644' + vars: + pattern: "{{ item.0 }}" + block_size: "{{ item.1 }}" + io_depth: "{{ item.2 }}" + num_jobs: "{{ item.3 }}" + with_nested: + - "{{ fio_tests_patterns }}" + - "{{ fio_tests_block_sizes }}" + - "{{ fio_tests_io_depths }}" + - "{{ fio_tests_num_jobs }}" + become: yes + +- name: Run fio tests + shell: | + cd {{ fio_tests_results_dir 
}}/jobs + for job_file in *.ini; do + echo "Running test: $job_file" + # Run test with both JSON and normal output + fio "$job_file" --output="{{ fio_tests_results_dir }}/results_${job_file%.ini}.json" \ + --output-format=json \ + --write_bw_log="{{ fio_tests_results_dir }}/bw_${job_file%.ini}" \ + --write_iops_log="{{ fio_tests_results_dir }}/iops_${job_file%.ini}" \ + --write_lat_log="{{ fio_tests_results_dir }}/lat_${job_file%.ini}" \ + --log_avg_msec={{ fio_tests_log_avg_msec }} + # Also create text output for compatibility + fio "$job_file" --output="{{ fio_tests_results_dir }}/results_${job_file%.ini}.txt" \ + --output-format=normal + done + become: yes + async: "{{ fio_tests_async_timeout | default(7200) }}" + poll: 30 + +- name: Remove old fio-tests results archive if it exists + file: + path: "{{ fio_tests_results_dir }}/fio-tests-results-{{ inventory_hostname }}.tar.gz" + state: absent + tags: [ 'results' ] + become: yes + +- name: Archive fio-tests results directory on remote host + become: yes + shell: | + cd {{ fio_tests_results_dir }} + tar czf /tmp/fio-tests-results-{{ inventory_hostname }}.tar.gz \ + --exclude='*.tar.gz' \ + results_*.json results_*.txt *.log jobs/ 2>/dev/null || true + mv /tmp/fio-tests-results-{{ inventory_hostname }}.tar.gz {{ fio_tests_results_dir }}/ || true + tags: [ 'results' ] + +- name: Remove previously fetched fio-tests results archive if it exists + become: no + delegate_to: localhost + ansible.builtin.file: + path: "{{ item }}" + state: absent + tags: [ 'results' ] + with_items: + - "{{ topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/fio-tests-results-{{ inventory_hostname }}.tar.gz" + - "{{ topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/fio-tests-results-{{ inventory_hostname }}" + +- name: Copy fio-tests results + tags: [ 'results' ] + become: yes + ansible.builtin.fetch: + src: "{{ fio_tests_results_dir }}/fio-tests-results-{{ inventory_hostname }}.tar.gz" + dest: "{{ 
topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/" + flat: yes + +- name: Ensure local fio-tests results extraction directory exists + become: no + delegate_to: localhost + ansible.builtin.file: + path: "{{ topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/fio-tests-results-{{ inventory_hostname }}" + state: directory + mode: '0755' + recurse: yes + tags: [ 'results' ] + +- name: Extract fio-tests results archive locally + become: no + delegate_to: localhost + ansible.builtin.unarchive: + src: "{{ topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/fio-tests-results-{{ inventory_hostname }}.tar.gz" + dest: "{{ topdir_path }}/workflows/fio-tests/results/{{ inventory_hostname }}/fio-tests-results-{{ inventory_hostname }}" + remote_src: no + tags: [ 'results' ] diff --git a/playbooks/roles/fio-tests/templates/fio-job.ini.j2 b/playbooks/roles/fio-tests/templates/fio-job.ini.j2 new file mode 100644 index 000000000000..49727d461e36 --- /dev/null +++ b/playbooks/roles/fio-tests/templates/fio-job.ini.j2 @@ -0,0 +1,29 @@ +[global] +ioengine={{ fio_tests_ioengine }} +{% if fio_tests_device == '/dev/null' %} +direct=0 +{% else %} +direct={{ fio_tests_direct | int }} +{% endif %} +{% if fio_tests_device == '/dev/null' %} +fsync_on_close=0 +{% else %} +fsync_on_close={{ fio_tests_fsync_on_close | int }} +{% endif %} +group_reporting=1 +time_based=1 +runtime={{ fio_tests_runtime }} +ramp_time={{ fio_tests_ramp_time }} + +[{{ pattern.name }}_bs{{ block_size }}_iodepth{{ io_depth }}_jobs{{ num_jobs }}] +filename={{ fio_tests_device }} +{% if fio_tests_device == '/dev/null' %} +size=1G +{% endif %} +rw={{ pattern.rw }} +{% if pattern.rwmixread is defined and pattern.rw in ['randrw', 'rw'] %} +rwmixread={{ pattern.rwmixread }} +{% endif %} +bs={{ block_size }} +iodepth={{ io_depth }} +numjobs={{ num_jobs }} diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml index 
e36d71fc5535..febe81ff2a39 100644 --- a/playbooks/roles/gen_hosts/tasks/main.yml +++ b/playbooks/roles/gen_hosts/tasks/main.yml @@ -324,6 +324,19 @@ - kdevops_workflow_enable_sysbench - ansible_hosts_template.stat.exists +- name: Generate the Ansible hosts file for a dedicated fio-tests setup + tags: [ 'hosts' ] + template: + src: "{{ kdevops_hosts_template }}" + dest: "{{ ansible_cfg_inventory }}" + force: yes + trim_blocks: True + lstrip_blocks: True + when: + - kdevops_workflows_dedicated_workflow + - kdevops_workflow_enable_fio_tests + - ansible_hosts_template.stat.exists + - name: Infer enabled mmtests test types set_fact: mmtests_enabled_test_types: >- diff --git a/playbooks/roles/gen_hosts/templates/fio-tests.j2 b/playbooks/roles/gen_hosts/templates/fio-tests.j2 new file mode 100644 index 000000000000..75bc0c53569e --- /dev/null +++ b/playbooks/roles/gen_hosts/templates/fio-tests.j2 @@ -0,0 +1,28 @@ +[all] +localhost ansible_connection=local +{{ kdevops_host_prefix }}-fio-tests +{% if kdevops_baseline_and_dev %} +{{ kdevops_host_prefix }}-fio-tests-dev +{% endif %} +[all:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" +[baseline] +{{ kdevops_host_prefix }}-fio-tests +[baseline:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" +[dev] +{% if kdevops_baseline_and_dev %} +{{ kdevops_host_prefix }}-fio-tests-dev +{% endif %} +[dev:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" +[fio_tests] +{{ kdevops_host_prefix }}-fio-tests +{% if kdevops_baseline_and_dev %} +{{ kdevops_host_prefix }}-fio-tests-dev +{% endif %} +[fio_tests:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" +[service] +[service:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" diff --git a/playbooks/roles/gen_hosts/templates/hosts.j2 b/playbooks/roles/gen_hosts/templates/hosts.j2 index f89fae48c349..6d83191d93ce 100644 --- a/playbooks/roles/gen_hosts/templates/hosts.j2 +++ 
b/playbooks/roles/gen_hosts/templates/hosts.j2 @@ -39,6 +39,44 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}" [reboot-limit:vars] ansible_python_interpreter = "{{ kdevops_python_interpreter }}" +{% elif kdevops_workflow_enable_fio_tests %} +[all] +localhost ansible_connection=local +{{ kdevops_host_prefix }}-fio-tests +{% if kdevops_baseline_and_dev %} +{{ kdevops_host_prefix }}-fio-tests-dev +{% endif %} + +[all:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" + +[baseline] +{{ kdevops_host_prefix }}-fio-tests + +[baseline:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" + +{% if kdevops_baseline_and_dev %} +[dev] +{{ kdevops_host_prefix }}-fio-tests-dev + +[dev:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" + +{% endif %} +[fio_tests] +{{ kdevops_host_prefix }}-fio-tests +{% if kdevops_baseline_and_dev %} +{{ kdevops_host_prefix }}-fio-tests-dev +{% endif %} + +[fio_tests:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" + +[service] + +[service:vars] +ansible_python_interpreter = "{{ kdevops_python_interpreter }}" {% else %} [all] localhost ansible_connection=local diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml index 8c1772546800..0451761fbf30 100644 --- a/playbooks/roles/gen_nodes/tasks/main.yml +++ b/playbooks/roles/gen_nodes/tasks/main.yml @@ -508,6 +508,38 @@ - kdevops_workflow_enable_sysbench - ansible_nodes_template.stat.exists +- name: Generate the fio-tests kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template + tags: [ 'hosts' ] + vars: + node_template: "{{ kdevops_nodes_template | basename }}" + nodes: "{{ [kdevops_host_prefix + '-fio-tests'] }}" + all_generic_nodes: "{{ [kdevops_host_prefix + '-fio-tests'] }}" + template: + src: "{{ node_template }}" + dest: "{{ topdir_path }}/{{ kdevops_nodes }}" + force: yes + when: + - kdevops_workflows_dedicated_workflow + - 
kdevops_workflow_enable_fio_tests + - ansible_nodes_template.stat.exists + - not kdevops_baseline_and_dev + +- name: Generate the fio-tests kdevops nodes file with dev hosts using {{ kdevops_nodes_template }} as jinja2 source template + tags: [ 'hosts' ] + vars: + node_template: "{{ kdevops_nodes_template | basename }}" + nodes: "{{ [kdevops_host_prefix + '-fio-tests', kdevops_host_prefix + '-fio-tests-dev'] }}" + all_generic_nodes: "{{ [kdevops_host_prefix + '-fio-tests', kdevops_host_prefix + '-fio-tests-dev'] }}" + template: + src: "{{ node_template }}" + dest: "{{ topdir_path }}/{{ kdevops_nodes }}" + force: yes + when: + - kdevops_workflows_dedicated_workflow + - kdevops_workflow_enable_fio_tests + - ansible_nodes_template.stat.exists + - kdevops_baseline_and_dev + - name: Infer enabled mmtests test section types set_fact: mmtests_enabled_test_types: >- diff --git a/workflows/Makefile b/workflows/Makefile index ef3cc2d1f0ee..b5f54ff5f57b 100644 --- a/workflows/Makefile +++ b/workflows/Makefile @@ -62,6 +62,10 @@ ifeq (y,$(CONFIG_KDEVOPS_WORKFLOW_ENABLE_SSD_STEADY_STATE)) include workflows/steady_state/Makefile endif # CONFIG_KDEVOPS_WORKFLOW_ENABLE_SSD_STEADY_STATE == y +ifeq (y,$(CONFIG_KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS)) +include workflows/fio-tests/Makefile +endif # CONFIG_KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS == y + ANSIBLE_EXTRA_ARGS += $(WORKFLOW_ARGS) ANSIBLE_EXTRA_ARGS_SEPARATED += $(WORKFLOW_ARGS_SEPARATED) ANSIBLE_EXTRA_ARGS_DIRECT += $(WORKFLOW_ARGS_DIRECT) diff --git a/workflows/fio-tests/Kconfig b/workflows/fio-tests/Kconfig new file mode 100644 index 000000000000..98e7ac637ac5 --- /dev/null +++ b/workflows/fio-tests/Kconfig @@ -0,0 +1,420 @@ +choice + prompt "What type of fio testing do you want to run?" 
+ default FIO_TESTS_PERFORMANCE_ANALYSIS + +config FIO_TESTS_PERFORMANCE_ANALYSIS + bool "Performance analysis tests" + select KDEVOPS_BASELINE_AND_DEV + output yaml + help + Run comprehensive performance analysis tests across different + configurations to understand storage device characteristics. + This includes testing various block sizes, IO depths, and + thread counts to generate performance profiles. + + A/B testing is enabled to compare performance across different + configurations using baseline and development nodes. + +config FIO_TESTS_LATENCY_ANALYSIS + bool "Latency analysis tests" + select KDEVOPS_BASELINE_AND_DEV + output yaml + help + Focus on latency characteristics and tail latency analysis + across different workload patterns. This helps identify + performance outliers and latency distribution patterns. + +config FIO_TESTS_THROUGHPUT_SCALING + bool "Throughput scaling tests" + select KDEVOPS_BASELINE_AND_DEV + output yaml + help + Test how throughput scales with increasing IO depth and + thread count. Useful for understanding the optimal + configuration for maximum throughput. + +config FIO_TESTS_MIXED_WORKLOADS + bool "Mixed workload tests" + select KDEVOPS_BASELINE_AND_DEV + output yaml + help + Test mixed read/write workloads with various ratios to + simulate real-world application patterns. 
+ +endchoice + +config FIO_TESTS_DEVICE + string "Device to use for fio testing" + output yaml + default "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops2" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_NVME + default "/dev/disk/by-id/virtio-kdevops2" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_VIRTIO + default "/dev/disk/by-id/ata-QEMU_HARDDISK_kdevops2" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_IDE + default "/dev/sdc" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_SCSI + default "/dev/nvme2n1" if TERRAFORM_AWS_INSTANCE_M5AD_2XLARGE + default "/dev/nvme2n1" if TERRAFORM_AWS_INSTANCE_M5AD_4XLARGE + default "/dev/nvme1n1" if TERRAFORM_GCE + default "/dev/sdd" if TERRAFORM_AZURE + default TERRAFORM_OCI_SPARSE_VOLUME_DEVICE_FILE_NAME if TERRAFORM_OCI + help + The block device to use for fio testing. For CI/testing + purposes, /dev/null can be used as a simple target. + +config FIO_TESTS_QUICK_SET_BY_CLI + bool + output yaml + default $(shell, scripts/check-cli-set-var.sh FIO_QUICK) + +choice + prompt "FIO test runtime duration" + default FIO_TESTS_RUNTIME_DEFAULT if !FIO_TESTS_QUICK_SET_BY_CLI + default FIO_TESTS_RUNTIME_QUICK if FIO_TESTS_QUICK_SET_BY_CLI + +config FIO_TESTS_RUNTIME_DEFAULT + bool "Default runtime (60 seconds)" + help + Use default runtime of 60 seconds per job for comprehensive + performance testing. + +config FIO_TESTS_RUNTIME_QUICK + bool "Quick runtime (10 seconds)" + help + Use quick runtime of 10 seconds per job for rapid testing + or CI environments. + +config FIO_TESTS_RUNTIME_CUSTOM_HIGH + bool "Custom high runtime (300 seconds)" + help + Use extended runtime of 300 seconds per job for thorough + long-duration testing. + +config FIO_TESTS_RUNTIME_CUSTOM_LOW + bool "Custom low runtime (5 seconds)" + help + Use minimal runtime of 5 seconds per job for very quick + smoke testing. 
+ +endchoice + +config FIO_TESTS_RUNTIME + string "Test runtime per job" + output yaml + default "60" if FIO_TESTS_RUNTIME_DEFAULT + default "10" if FIO_TESTS_RUNTIME_QUICK + default "300" if FIO_TESTS_RUNTIME_CUSTOM_HIGH + default "5" if FIO_TESTS_RUNTIME_CUSTOM_LOW + help + Runtime in seconds for each fio job. + +config FIO_TESTS_RAMP_TIME + string "Ramp time before measurements" + output yaml + default "10" if FIO_TESTS_RUNTIME_DEFAULT + default "2" if FIO_TESTS_RUNTIME_QUICK + default "30" if FIO_TESTS_RUNTIME_CUSTOM_HIGH + default "1" if FIO_TESTS_RUNTIME_CUSTOM_LOW + help + Time in seconds to ramp up before starting measurements. + This allows the workload to stabilize before collecting + performance data. + +menu "Block size configuration" + +config FIO_TESTS_BS_4K + bool "4K block size tests" + output yaml + default y + help + Enable 4K block size testing. This is the most common + block size for many applications. + +config FIO_TESTS_BS_8K + bool "8K block size tests" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable 8K block size testing. + +config FIO_TESTS_BS_16K + bool "16K block size tests" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable 16K block size testing. + +config FIO_TESTS_BS_32K + bool "32K block size tests" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable 32K block size testing. + +config FIO_TESTS_BS_64K + bool "64K block size tests" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable 64K block size testing. + +config FIO_TESTS_BS_128K + bool "128K block size tests" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable 128K block size testing. 
+ +endmenu + +menu "IO depth configuration" + +config FIO_TESTS_IODEPTH_1 + bool "IO depth 1" + output yaml + default y + help + Test with IO depth of 1 (synchronous IO). + +config FIO_TESTS_IODEPTH_4 + bool "IO depth 4" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with IO depth of 4. + +config FIO_TESTS_IODEPTH_8 + bool "IO depth 8" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with IO depth of 8. + +config FIO_TESTS_IODEPTH_16 + bool "IO depth 16" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with IO depth of 16. + +config FIO_TESTS_IODEPTH_32 + bool "IO depth 32" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with IO depth of 32. + +config FIO_TESTS_IODEPTH_64 + bool "IO depth 64" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with IO depth of 64. + +endmenu + +menu "Thread/job configuration" + +config FIO_TESTS_NUMJOBS_1 + bool "Single job" + output yaml + default y + help + Test with a single fio job. + +config FIO_TESTS_NUMJOBS_2 + bool "2 jobs" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with 2 concurrent fio jobs. + +config FIO_TESTS_NUMJOBS_4 + bool "4 jobs" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with 4 concurrent fio jobs. + +config FIO_TESTS_NUMJOBS_8 + bool "8 jobs" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with 8 concurrent fio jobs. 
+ +config FIO_TESTS_NUMJOBS_16 + bool "16 jobs" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Test with 16 concurrent fio jobs. + +endmenu + +menu "Workload patterns" + +config FIO_TESTS_PATTERN_RAND_READ + bool "Random read" + output yaml + default y + help + Enable random read workload testing. + +config FIO_TESTS_PATTERN_RAND_WRITE + bool "Random write" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable random write workload testing. + +config FIO_TESTS_PATTERN_SEQ_READ + bool "Sequential read" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable sequential read workload testing. + +config FIO_TESTS_PATTERN_SEQ_WRITE + bool "Sequential write" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable sequential write workload testing. + +config FIO_TESTS_PATTERN_MIXED_75_25 + bool "Mixed 75% read / 25% write" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable mixed workload with 75% reads and 25% writes. + +config FIO_TESTS_PATTERN_MIXED_50_50 + bool "Mixed 50% read / 50% write" + output yaml + default y if !FIO_TESTS_QUICK_SET_BY_CLI + default n if FIO_TESTS_QUICK_SET_BY_CLI + help + Enable mixed workload with 50% reads and 50% writes. + +endmenu + +menu "Advanced configuration" + +config FIO_TESTS_IOENGINE + string "IO engine to use" + output yaml + default "io_uring" + help + The fio IO engine to use. Options include: + - io_uring: Linux native async IO (recommended) + - libaio: Linux native async IO (legacy) + - psync: POSIX sync IO + - sync: Basic sync IO + +config FIO_TESTS_DIRECT + bool "Use direct IO" + output yaml + default y + help + Enable direct IO to bypass the page cache. This provides + more accurate storage device performance measurements. 
+
+config FIO_TESTS_FSYNC_ON_CLOSE
+	bool "Fsync on close"
+	output yaml
+	default y
+	help
+	  Call fsync() before closing files to ensure data is
+	  written to storage.
+
+	  Note: This is automatically disabled when using /dev/null
+	  as the test device since /dev/null doesn't support fsync.
+
+config FIO_TESTS_RESULTS_DIR
+	string "Results directory"
+	output yaml
+	default "/data/fio-tests"
+	help
+	  Directory where test results and logs will be stored.
+	  This should be on a different filesystem than the test
+	  target to avoid interference.
+
+config FIO_TESTS_LOG_AVG_MSEC
+	int "Log averaging interval (msec)"
+	output yaml
+	default 1000
+	help
+	  Interval in milliseconds for averaging performance logs.
+	  Lower values provide more granular data but larger log files.
+
+config FIO_TESTS_ENABLE_GRAPHING
+	bool "Enable graphing and visualization"
+	output yaml
+	default y
+	help
+	  Enable comprehensive graphing and visualization capabilities
+	  for fio test results. This installs Python dependencies
+	  including matplotlib, pandas, and seaborn for generating
+	  performance analysis graphs.
+
+	  Graphing features include:
+	  - Performance heatmaps across block sizes and IO depths
+	  - IOPS scaling analysis
+	  - Latency distribution charts
+	  - Workload pattern comparisons
+	  - Baseline vs development A/B comparisons
+	  - Trend analysis and correlation matrices
+
+if FIO_TESTS_ENABLE_GRAPHING
+
+config FIO_TESTS_GRAPH_FORMAT
+	string "Graph output format"
+	output yaml
+	default "png"
+	help
+	  Output format for generated graphs. Common formats include:
+	  - png: Portable Network Graphics (recommended)
+	  - svg: Scalable Vector Graphics
+	  - pdf: Portable Document Format
+	  - jpg: JPEG format
+
+config FIO_TESTS_GRAPH_DPI
+	int "Graph resolution (DPI)"
+	output yaml
+	default 300
+	help
+	  Resolution for generated graphs in dots per inch.
+	  Higher values produce better quality but larger files.
+	  Common values: 150 (screen), 300 (print), 600 (high quality).
+
+config FIO_TESTS_GRAPH_THEME
+	string "Matplotlib theme"
+	output yaml
+	default "default"
+	help
+	  Matplotlib style theme for graphs. Options include:
+	  - default: Default matplotlib style
+	  - seaborn: Clean seaborn style
+	  - dark_background: Dark theme
+	  - ggplot: R ggplot2 style
+	  - bmh: Bayesian Methods for Hackers style
+
+endif # FIO_TESTS_ENABLE_GRAPHING
+
+endmenu
diff --git a/workflows/fio-tests/Makefile b/workflows/fio-tests/Makefile
new file mode 100644
index 000000000000..cc641606e98c
--- /dev/null
+++ b/workflows/fio-tests/Makefile
@@ -0,0 +1,68 @@
+ifeq (y,$(CONFIG_WORKFLOWS_DEDICATED_WORKFLOW))
+export KDEVOPS_HOSTS_TEMPLATE := fio-tests.j2
+endif
+
+fio-tests:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests.yml \
+		$(LIMIT_HOSTS)
+
+fio-tests-baseline:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests-baseline.yml \
+		$(LIMIT_HOSTS)
+
+fio-tests-results:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests.yml \
+		--tags results \
+		$(LIMIT_HOSTS)
+
+fio-tests-graph:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests-graph.yml \
+		$(LIMIT_HOSTS)
+
+fio-tests-compare:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests-compare.yml \
+		$(LIMIT_HOSTS)
+
+fio-tests-trend-analysis:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		--connection=ssh \
+		--inventory hosts \
+		--extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+		playbooks/fio-tests-trend-analysis.yml \
+		$(LIMIT_HOSTS)
+
+fio-tests-estimate:
+	$(Q)python3 scripts/workflows/fio-tests/estimate-runtime.py
+
+fio-tests-help-menu:
+	@echo "fio-tests options:"
+	@echo "fio-tests                - run fio performance tests"
+	@echo "fio-tests-baseline       - establish baseline results"
+	@echo "fio-tests-results        - collect results from target nodes to localhost"
+	@echo "fio-tests-graph          - generate performance graphs on localhost"
+	@echo "fio-tests-compare        - compare baseline vs dev results"
+	@echo "fio-tests-trend-analysis - analyze performance trends"
+	@echo "fio-tests-estimate       - estimate runtime for current configuration"
+	@echo ""
+
+HELP_TARGETS += fio-tests-help-menu
-- 
2.45.2
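P.S. To give a feel for why `fio-tests-estimate` matters: with the defaults above
(six IO depths, five job counts, six workload patterns) the test matrix multiplies
out quickly. The sketch below is only an illustration of that arithmetic, not the
actual `scripts/workflows/fio-tests/estimate-runtime.py`; the block sizes and the
per-test runtime are assumed values, since those knobs are configured elsewhere.

```python
# Illustrative runtime arithmetic for the fio-tests matrix. The option
# lists mirror the Kconfig defaults in this patch; block sizes and the
# per-test runtime are hypothetical placeholders.
from itertools import product

iodepths = [1, 4, 8, 16, 32, 64]       # FIO_TESTS_IODEPTH_*
numjobs = [1, 2, 4, 8, 16]             # FIO_TESTS_NUMJOBS_*
patterns = ["randread", "randwrite",   # FIO_TESTS_PATTERN_*
            "read", "write", "rw75", "rw50"]
blocksizes = ["4k", "64k", "1m"]       # assumed, configured elsewhere
runtime_per_test = 60                  # seconds, assumed

combos = list(product(blocksizes, iodepths, numjobs, patterns))
total_secs = len(combos) * runtime_per_test
print(f"{len(combos)} tests, ~{total_secs / 3600:.1f} hours")
# With quick mode the QUICK_SET_BY_CLI defaults trim each axis to its
# first entry, so the same product collapses to a handful of tests.
```

Quick mode exists precisely because the full product above runs into hours; the
real estimator reads the generated config rather than hard-coded lists.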