* [PATCH] nvme: test nvmet-wq sysfs interface
@ 2024-11-04 19:29 Chaitanya Kulkarni
2024-11-05 7:02 ` Daniel Wagner
2024-11-08 5:21 ` Shinichiro Kawasaki
0 siblings, 2 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-11-04 19:29 UTC (permalink / raw)
To: kanie
Cc: linux-nvme, linux-block, shinichiro.kawasaki, dwagner,
Chaitanya Kulkarni
Add a test that randomly sets the cpumask from the available CPUs for
nvmet-wq while running a fio workload. This patch has been
tested on the nvme-loop and nvme-tcp transports.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
tests/nvme/055 | 99 ++++++++++++++++++++++++++++++++++++++++++++++
tests/nvme/055.out | 3 ++
2 files changed, 102 insertions(+)
create mode 100755 tests/nvme/055
create mode 100644 tests/nvme/055.out
diff --git a/tests/nvme/055 b/tests/nvme/055
new file mode 100755
index 0000000..9fe27a3
--- /dev/null
+++ b/tests/nvme/055
@@ -0,0 +1,99 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright (C) 2024 Chaitanya Kulkarni
+#
+# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
+#
+
+. tests/nvme/rc
+
+DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
+TIMED=1
+
+requires() {
+ _nvme_requires
+ _have_fio && _have_loop
+ _require_nvme_trtype_is_fabrics
+}
+
+cleanup_setup() {
+ _nvme_disconnect_subsys
+ _nvmet_target_cleanup
+}
+
+test() {
+ local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
+ local original_cpumask
+ local min_cpus
+ local max_cpus
+ local numbers
+ local idx
+ local ns
+
+ echo "Running ${TEST_NAME}"
+
+ _setup_nvmet
+ _nvmet_target_setup
+ _nvme_connect_subsys
+
+ if [ ! -f "$cpumask_path" ]; then
+ SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
+ cleanup_setup
+ return 1
+ fi
+
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+
+ original_cpumask=$(cat "$cpumask_path")
+
+ num_cpus=$(nproc)
+ max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
+ min_cpus=0
+ #shellcheck disable=SC2207
+ numbers=($(seq $min_cpus $max_cpus))
+
+ _run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
+ --iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
+
+ # Let the fio settle down else we will break in the loop for fio check
+ sleep 1
+ for ((i = 0; i < max_cpus; i++)); do
+ if ! pgrep -x fio &> /dev/null ; then
+ break
+ fi
+
+ if [[ ${#numbers[@]} -eq 0 ]]; then
+ break
+ fi
+
+ idx=$((RANDOM % ${#numbers[@]}))
+
+ #shellcheck disable=SC2004
+ cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
+ echo "$cpu_mask" > "$cpumask_path"
+ if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
+ echo "Test Failed: cpumask was not set correctly "
+ echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
+ cleanup_setup
+ return 1
+ fi
+ sleep 3
+ # Remove the selected number
+ numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
+ done
+
+ killall fio &> /dev/null
+
+ # Restore original cpumask
+ echo "$original_cpumask" > "$cpumask_path"
+ restored_cpumask=$(cat "$cpumask_path")
+
+ if [[ "$restored_cpumask" != "$original_cpumask" ]]; then
+ echo "Failed to restore original cpumask."
+ cleanup_setup
+ return 1
+ fi
+
+ cleanup_setup
+ echo "Test complete"
+}
diff --git a/tests/nvme/055.out b/tests/nvme/055.out
new file mode 100644
index 0000000..427dfee
--- /dev/null
+++ b/tests/nvme/055.out
@@ -0,0 +1,3 @@
+Running nvme/055
+disconnected 1 controller(s)
+Test complete
--
2.40.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-04 19:29 [PATCH] nvme: test nvmet-wq sysfs interface Chaitanya Kulkarni
@ 2024-11-05 7:02 ` Daniel Wagner
2024-11-06 3:46 ` Chaitanya Kulkarni
2024-11-08 5:21 ` Shinichiro Kawasaki
1 sibling, 1 reply; 7+ messages in thread
From: Daniel Wagner @ 2024-11-05 7:02 UTC (permalink / raw)
To: Chaitanya Kulkarni; +Cc: kanie, linux-nvme, linux-block, shinichiro.kawasaki
On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
What do you want to test here? What's the objective? I don't see any
block layer related parts, or am I missing something?
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-05 7:02 ` Daniel Wagner
@ 2024-11-06 3:46 ` Chaitanya Kulkarni
2024-11-06 12:13 ` Daniel Wagner
0 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-11-06 3:46 UTC (permalink / raw)
To: Daniel Wagner
Cc: kanie@linux.alibaba.com, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, shinichiro.kawasaki@wdc.com,
Chaitanya Kulkarni
On 11/4/24 23:02, Daniel Wagner wrote:
> On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
>> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> What do you want to test here? What's the objective? I don't see any
> block layer related parts or do I miss something?
For the NVMe-oF target, we exported nvmet-wq to userspace with a
recent patch. This behavior is new, and I don't think any existing test
runs a fio workload while randomly changing the cpumask to ensure the
stability of nvmet-wq while an application is using the block layer.
Hence we need this test for the latest patch we have added for 6.13.
-ck
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-06 3:46 ` Chaitanya Kulkarni
@ 2024-11-06 12:13 ` Daniel Wagner
2024-11-08 5:00 ` Shinichiro Kawasaki
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Wagner @ 2024-11-06 12:13 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: kanie@linux.alibaba.com, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, shinichiro.kawasaki@wdc.com
On Wed, Nov 06, 2024 at 03:46:26AM GMT, Chaitanya Kulkarni wrote:
> On 11/4/24 23:02, Daniel Wagner wrote:
> > On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> >> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> > What do you want to test here? What's the objective? I don't see any
> > block layer related parts or do I miss something?
>
> For NVMeOF target we have exported nvmet-wq to userspace with
> recent patch. This behavior is new and I don't think there is any test
> exists that runs the fio workload while changing the CPUMASK randomly
> to ensure the stability for nvmet-wq when application is using the block
> layer. Hence we need the test for latest patch we have added to the
> 6.13.
Though this is not a performance measurement, just functional testing of
whether the scheduler works. I don't think we should add such tests,
because they will add to the overall runtime for little benefit. I am
pretty sure there are already tests (e.g. kunit) which do more elaborate
scheduler tests.
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-06 12:13 ` Daniel Wagner
@ 2024-11-08 5:00 ` Shinichiro Kawasaki
0 siblings, 0 replies; 7+ messages in thread
From: Shinichiro Kawasaki @ 2024-11-08 5:00 UTC (permalink / raw)
To: Daniel Wagner
Cc: Chaitanya Kulkarni, kanie@linux.alibaba.com,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
On Nov 06, 2024 / 13:13, Daniel Wagner wrote:
> On Wed, Nov 06, 2024 at 03:46:26AM GMT, Chaitanya Kulkarni wrote:
> > On 11/4/24 23:02, Daniel Wagner wrote:
> > > On Mon, Nov 04, 2024 at 11:29:07AM GMT, Chaitanya Kulkarni wrote:
> > >> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> > > What do you want to test here? What's the objective? I don't see any
> > > block layer related parts or do I miss something?
> >
> > For NVMeOF target we have exported nvmet-wq to userspace with
> > recent patch. This behavior is new and I don't think there is any test
> > exists that runs the fio workload while changing the CPUMASK randomly
> > to ensure the stability for nvmet-wq when application is using the block
> > layer. Hence we need the test for latest patch we have added to the
> > 6.13.
>
> Though this is not a performance measurement just a functional testing if
> the scheduler works. I don't think we should add such tests
> because it will add to the overall runtime for little benefit. I am
> pretty sure there already tests (e.g. kunit) which do more elaborate
> scheduler tests.
When I ran the test case on my test system using the kernel v6.12-rc6 and the
patch "nvmet: make nvmet_wq visible in sysfs", I observed a kernel INFO
splat [1]. This INFO looks like it indicates a bug in the nvmet driver. Is this
a known issue? The INFO is stably recreated with the loop, rdma and tcp
transports on my test system. If this is a new, unknown issue, this test case
is a good one to exercise the nvmet driver, and it looks worth adding to
blktests.
As for the runtime cost, I think the test case can be modified to reflect the
TIMEOUT variable users set. This will allow users to control the runtime cost to
some extent.
[1]
[ 87.365354][ T963] run blktests nvme/055 at 2024-11-08 09:32:42
[ 87.433572][ T1011] loop0: detected capacity change from 0 to 2097152
[ 87.452511][ T1014] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 87.478018][ T1019] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 87.565565][ T226] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.20.
[ 87.570700][ T1026] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port dev.
[ 87.573064][ T1026] nvme nvme1: creating 4 I/O queues.
[ 87.577041][ T1026] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[ 87.580437][ T1026] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nq9
[ 101.337422][ T1073] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[ 101.383814][ T104] INFO: trying to register non-static key.
[ 101.384456][ T104] The code is fine but needs lockdep annotation, or maybe
[ 101.385136][ T104] you didn't initialize this object before use?
[ 101.385758][ T104] turning off the locking correctness validator.
[ 101.386392][ T104] CPU: 1 UID: 0 PID: 104 Comm: kworker/u16:4 Not tainted 6.12.0-rc6+ #361
[ 101.387197][ T104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/204
[ 101.388109][ T104] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 101.388868][ T104] Call Trace:
[ 101.389203][ T104] <TASK>
[ 101.389484][ T104] dump_stack_lvl+0x6a/0x90
[ 101.389927][ T104] register_lock_class+0xe2a/0x10a0
[ 101.390444][ T104] ? __lock_acquire+0xd1b/0x5f20
[ 101.390921][ T104] ? __pfx_register_lock_class+0x10/0x10
[ 101.391479][ T104] __lock_acquire+0x81e/0x5f20
[ 101.391939][ T104] ? lock_is_held_type+0xd5/0x130
[ 101.392434][ T104] ? find_held_lock+0x2d/0x110
[ 101.392891][ T104] ? __pfx___lock_acquire+0x10/0x10
[ 101.393399][ T104] ? lock_release+0x460/0x7a0
[ 101.393853][ T104] ? __pfx_lock_release+0x10/0x10
[ 101.394351][ T104] lock_acquire.part.0+0x12d/0x360
[ 101.394841][ T104] ? xa_erase+0xd/0x30
[ 101.395255][ T104] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 101.395793][ T104] ? rcu_is_watching+0x11/0xb0
[ 101.396271][ T104] ? trace_lock_acquire+0x12f/0x1a0
[ 101.396770][ T104] ? __pfx___flush_work+0x10/0x10
[ 101.397266][ T104] ? xa_erase+0xd/0x30
[ 101.397658][ T104] ? lock_acquire+0x2d/0xc0
[ 101.398089][ T104] ? xa_erase+0xd/0x30
[ 101.398501][ T104] _raw_spin_lock+0x2f/0x40
[ 101.398933][ T104] ? xa_erase+0xd/0x30
[ 101.400167][ T104] xa_erase+0xd/0x30
[ 101.401359][ T104] nvmet_ctrl_destroy_pr+0x10e/0x1c0 [nvmet]
[ 101.402767][ T104] ? __pfx_nvmet_ctrl_destroy_pr+0x10/0x10 [nvmet]
[ 101.404235][ T104] ? __pfx___might_resched+0x10/0x10
[ 101.405558][ T104] nvmet_ctrl_free+0x2f0/0x830 [nvmet]
[ 101.406900][ T104] ? lockdep_hardirqs_on+0x78/0x100
[ 101.408192][ T104] ? __cancel_work+0x166/0x230
[ 101.409398][ T104] ? __pfx_nvmet_ctrl_free+0x10/0x10 [nvmet]
[ 101.410726][ T104] ? rcu_is_watching+0x11/0xb0
[ 101.411938][ T104] ? kfree+0x13e/0x4a0
[ 101.413073][ T104] ? lockdep_hardirqs_on+0x78/0x100
[ 101.414318][ T104] nvmet_sq_destroy+0x1f2/0x3a0 [nvmet]
[ 101.415556][ T104] nvmet_tcp_release_queue_work+0x4c0/0xe40 [nvmet_tcp]
[ 101.416912][ T104] process_one_work+0x85a/0x1460
[ 101.418064][ T104] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 101.419266][ T104] ? __pfx_process_one_work+0x10/0x10
[ 101.420427][ T104] ? assign_work+0x16c/0x240
[ 101.421486][ T104] ? lock_is_held_type+0xd5/0x130
[ 101.422555][ T104] worker_thread+0x5e2/0xfc0
[ 101.423574][ T104] ? __pfx_worker_thread+0x10/0x10
[ 101.424639][ T104] kthread+0x2d1/0x3a0
[ 101.425587][ T104] ? _raw_spin_unlock_irq+0x24/0x50
[ 101.426644][ T104] ? __pfx_kthread+0x10/0x10
[ 101.427637][ T104] ret_from_fork+0x30/0x70
[ 101.428596][ T104] ? __pfx_kthread+0x10/0x10
[ 101.429562][ T104] ret_from_fork_asm+0x1a/0x30
[ 101.430535][ T104] </TASK>
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-04 19:29 [PATCH] nvme: test nvmet-wq sysfs interface Chaitanya Kulkarni
2024-11-05 7:02 ` Daniel Wagner
@ 2024-11-08 5:21 ` Shinichiro Kawasaki
2024-11-12 0:58 ` Chaitanya Kulkarni
1 sibling, 1 reply; 7+ messages in thread
From: Shinichiro Kawasaki @ 2024-11-08 5:21 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: kanie@linux.alibaba.com, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, dwagner@suse.de
On Nov 04, 2024 / 11:29, Chaitanya Kulkarni wrote:
> Add a test that randomly sets the cpumask from available CPUs for
> the nvmet-wq while running the fio workload. This patch has been
> tested on nvme-loop and nvme-tcp transport.
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Thanks. As I noted in another mail, this test case generates a kernel
INFO message and looks like it catches a new bug, so I think this
test case is worth adding. Please find my review comments inline.
> ---
> tests/nvme/055 | 99 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/nvme/055.out | 3 ++
> 2 files changed, 102 insertions(+)
> create mode 100755 tests/nvme/055
> create mode 100644 tests/nvme/055.out
>
> diff --git a/tests/nvme/055 b/tests/nvme/055
> new file mode 100755
> index 0000000..9fe27a3
> --- /dev/null
> +++ b/tests/nvme/055
> @@ -0,0 +1,99 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0+
> +# Copyright (C) 2024 Chaitanya Kulkarni
> +#
> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
> +#
> +
> +. tests/nvme/rc
> +
> +DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
> +TIMED=1
> +
> +requires() {
> + _nvme_requires
> + _have_fio && _have_loop
Nit: an unnecessary space.
> + _require_nvme_trtype_is_fabrics
> +}
I suggest adding set_conditions() here as below. Without it, the test case is
skipped when users do not set the NVMET_TRTYPES variable. If we add
set_conditions(), the _have_loop call in requires() can be removed.
set_conditions() {
_set_nvme_trtype "$@"
}
> +
> +cleanup_setup() {
> + _nvme_disconnect_subsys
> + _nvmet_target_cleanup
> +}
> +
> +test() {
> + local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
> + local original_cpumask
> + local min_cpus
> + local max_cpus
> + local numbers
> + local idx
> + local ns
> +
> + echo "Running ${TEST_NAME}"
> +
> + _setup_nvmet
> + _nvmet_target_setup
> + _nvme_connect_subsys
> +
> + if [ ! -f "$cpumask_path" ]; then
> + SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
> + cleanup_setup
> + return 1
> + fi
> +
> + ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +
> + original_cpumask=$(cat "$cpumask_path")
> +
> + num_cpus=$(nproc)
> + max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
> + min_cpus=0
> + #shellcheck disable=SC2207
> + numbers=($(seq $min_cpus $max_cpus))
Nit: the shellcheck error can be avoided with this:
read -a numbers -d '' < <(seq $min_cpus $max_cpus)
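A minimal standalone sketch of this idiom (the bounds here are illustrative,
not taken from a real host):

```shell
# Populate a bash array from seq without triggering SC2207.
min_cpus=0
max_cpus=4
# read returns non-zero at EOF because no NUL delimiter is found,
# so guard it when running under "set -e".
read -r -a numbers -d '' < <(seq "$min_cpus" "$max_cpus") || true
echo "${#numbers[@]}"   # 5
echo "${numbers[*]}"    # 0 1 2 3 4
```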
> +
> + _run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
The --time_based and --runtime=130s options are not required, because the fio
helper bash functions add them.
Instead, let's add this line before the fio command.
: ${TIMEOUT:=60}
The fio helper functions will reflect this TIMEOUT value in the --runtime
option. This will allow users to control the runtime cost, as TIMED=1 indicates.
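For reference, a small sketch of the ${VAR:=default} expansion this relies on:
an unset TIMEOUT falls back to 60, while a caller-provided value is kept
untouched (the values 60 and 30 are just illustrative):

```shell
# Unset variable: the := expansion assigns the default.
unset TIMEOUT
: "${TIMEOUT:=60}"
echo "$TIMEOUT"   # 60

# Caller-provided value: the := expansion leaves it alone.
TIMEOUT=30
: "${TIMEOUT:=60}"
echo "$TIMEOUT"   # 30
```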
> + --iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
> +
> + # Let the fio settle down else we will break in the loop for fio check
> + sleep 1
> + for ((i = 0; i < max_cpus; i++)); do
> + if ! pgrep -x fio &> /dev/null ; then
The pgrep command is not in GNU coreutils. I suggest keeping the fio process id
in a variable and checking it with the "kill -0" command. The test case
block/005 does this.
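A sketch of the block/005-style liveness check; here a short sleep stands in
for the fio workload:

```shell
# Start the background workload and remember its PID.
sleep 5 &
fio_pid=$!

# kill -0 sends no signal; it only tests whether the PID is alive.
if kill -0 "$fio_pid" 2>/dev/null; then
	alive=yes
else
	alive=no
fi
echo "$alive"   # yes

# Clean up the background job; wait returns non-zero for a killed child.
kill "$fio_pid" 2>/dev/null || true
wait "$fio_pid" 2>/dev/null || true
```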
> + break
> + fi
> +
> + if [[ ${#numbers[@]} -eq 0 ]]; then
> + break
> + fi
> +
> + idx=$((RANDOM % ${#numbers[@]}))
> +
> + #shellcheck disable=SC2004
> + cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
> + echo "$cpu_mask" > "$cpumask_path"
When I ran this test case, I observed an error at this line:
echo: write error: Value too large for defined data type
I think the cpu_mask calculation is wrong, and it should be like this:
cpu_mask=0
for ((n = 0; n < numbers[idx]; n++)); do
cpu_mask=$((cpu_mask + (1 << n)))
done
cpu_mask=$(printf "%X" $((cpu_mask)))
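For a concrete value, the suggested loop produces an all-lower-bits mask,
i.e. 2^k - 1 for a chosen count k; the k below is an illustrative stand-in
for ${numbers[idx]}:

```shell
# Sum 1<<n for n in 0..k-1, setting every bit below k.
k=4
cpu_mask=0
for ((n = 0; n < k; n++)); do
	cpu_mask=$((cpu_mask + (1 << n)))
done
printf '%X\n' "$cpu_mask"   # F (CPUs 0-3)
```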
> + if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
> + echo "Test Failed: cpumask was not set correctly "
> + echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
> + cleanup_setup
> + return 1
> + fi
> + sleep 3
> + # Remove the selected number
> + numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
> + done
> +
> + killall fio &> /dev/null
killall is not in GNU coreutils either (blktests already uses it in other
places though...). Does "kill -9" work instead?
> +
> + # Restore original cpumask
> + echo "$original_cpumask" > "$cpumask_path"
> + restored_cpumask=$(cat "$cpumask_path")
> +
> + if [[ "$restored_cpumask" != "$original_cpumask" ]]; then
> + echo "Failed to restore original cpumask."
> + cleanup_setup
> + return 1
> + fi
> +
> + cleanup_setup
> + echo "Test complete"
> +}
> diff --git a/tests/nvme/055.out b/tests/nvme/055.out
> new file mode 100644
> index 0000000..427dfee
> --- /dev/null
> +++ b/tests/nvme/055.out
> @@ -0,0 +1,3 @@
> +Running nvme/055
> +disconnected 1 controller(s)
> +Test complete
> --
> 2.40.0
>
* Re: [PATCH] nvme: test nvmet-wq sysfs interface
2024-11-08 5:21 ` Shinichiro Kawasaki
@ 2024-11-12 0:58 ` Chaitanya Kulkarni
0 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-11-12 0:58 UTC (permalink / raw)
To: Shinichiro Kawasaki, Chaitanya Kulkarni
Cc: kanie@linux.alibaba.com, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, dwagner@suse.de
On 11/7/24 21:21, Shinichiro Kawasaki wrote:
> On Nov 04, 2024 / 11:29, Chaitanya Kulkarni wrote:
>> Add a test that randomly sets the cpumask from available CPUs for
>> the nvmet-wq while running the fio workload. This patch has been
>> tested on nvme-loop and nvme-tcp transport.
>>
>> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
>
> Thanks. As I noted in another mail, this test case generates a kernel
> INFO message, and it looks like catching a new bug. So I think this
> test case worth adding. Please find my review comments in line.
>
>> ---
>> tests/nvme/055 | 99 ++++++++++++++++++++++++++++++++++++++++++++++
>> tests/nvme/055.out | 3 ++
>> 2 files changed, 102 insertions(+)
>> create mode 100755 tests/nvme/055
>> create mode 100644 tests/nvme/055.out
>>
>> diff --git a/tests/nvme/055 b/tests/nvme/055
>> new file mode 100755
>> index 0000000..9fe27a3
>> --- /dev/null
>> +++ b/tests/nvme/055
>> @@ -0,0 +1,99 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0+
>> +# Copyright (C) 2024 Chaitanya Kulkarni
>> +#
>> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
>> +#
>> +
>> +. tests/nvme/rc
>> +
>> +DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
>> +TIMED=1
>> +
>> +requires() {
>> + _nvme_requires
>> + _have_fio && _have_loop
>
> Nit: an unnecessary space.
>
>> + _require_nvme_trtype_is_fabrics
>> +}
>
done.
> I suggest to add set_conditions() here as below. Without it, the test case is
> skipped when users do not set the NVMET_TRTYPES variable. If we add set_conditions(),
> _have_loop call in requires() can be removed.
>
> set_conditions() {
> _set_nvme_trtype "$@"
> }
>
done.
>> +
>> +cleanup_setup() {
>> + _nvme_disconnect_subsys
>> + _nvmet_target_cleanup
>> +}
>> +
>> +test() {
>> + local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
>> + local original_cpumask
>> + local min_cpus
>> + local max_cpus
>> + local numbers
>> + local idx
>> + local ns
>> +
>> + echo "Running ${TEST_NAME}"
>> +
>> + _setup_nvmet
>> + _nvmet_target_setup
>> + _nvme_connect_subsys
>> +
>> + if [ ! -f "$cpumask_path" ]; then
>> + SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
>> + cleanup_setup
>> + return 1
>> + fi
>> +
>> + ns=$(_find_nvme_ns "${def_subsys_uuid}")
>> +
>> + original_cpumask=$(cat "$cpumask_path")
>> +
>> + num_cpus=$(nproc)
>> + max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
>> + min_cpus=0
>> + #shellcheck disable=SC2207
>> + numbers=($(seq $min_cpus $max_cpus))
>
> Nit: the shellcheck error can be avoided with this:
>
> read -a numbers -d '' < <(seq $min_cpus $max_cpus)
>
done.
>> +
>> + _run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
>
> The --time_based and --runtime=130s options are not required, because fio helper
> bash functions will add them.
>
> Instead, let's add this line before the fio command.
>
> : ${TIMEOUT:=60}
>
> The fio helper functions will reflect this TIMEOUT value in the --runtime
> option. This will allow users to control runtime cost as TIMED=1 indicates.
>
done.
>> + --iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
>> +
>> + # Let the fio settle down else we will break in the loop for fio check
>> + sleep 1
>> + for ((i = 0; i < max_cpus; i++)); do
>> + if ! pgrep -x fio &> /dev/null ; then
>
> pgrep command is not in the GNU coreutils. I suggest to keep the fio process id
> in a variable and check with "kill -0" command. The test case block/005 does it.
>
Thanks for the pointer; done, and it looks much cleaner.
>> + break
>> + fi
>> +
>> + if [[ ${#numbers[@]} -eq 0 ]]; then
>> + break
>> + fi
>> +
>> + idx=$((RANDOM % ${#numbers[@]}))
>> +
>> + #shellcheck disable=SC2004
>> + cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
>> + echo "$cpu_mask" > "$cpumask_path"
>
> When I ran this test case, I observed an error at this line:
>
> echo: write error: Value too large for defined data type
>
> I think the cpu_mask calculation is wrong, and it should be like this:
>
> cpu_mask=0
> for ((n = 0; n < numbers[idx]; n++)); do
> cpu_mask=$((cpu_mask + (1 << n)))
> done
> cpu_mask=$(printf "%X" $((cpu_mask)))
>
>
I didn't see that error in my runs, but done. This version tried to
use only one CPU at a time; with your suggestion it will now consider
more than one CPU, which I think is useful too. I modified the
calculation to keep it simple for now.
>> + if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
>> + echo "Test Failed: cpumask was not set correctly "
>> + echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
>> + cleanup_setup
>> + return 1
>> + fi
>> + sleep 3
>> + # Remove the selected number
>> + numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
>> + done
>> +
>> + killall fio &> /dev/null
>
> killall is not in GNU coreutils either (blktests already uses it at other
> places though...) Does "kill -9" work instead?
>
done.
Thanks for the review comments, I'll send the V2 soon.
-ck