public inbox for linux-block@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Alexander Atanasov <alex@zazolabs.com>
Cc: Shuah Khan <shuah@kernel.org>,
	linux-block@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] selftests: ublk: io-reorder triggered in test_generic_01.sh
Date: Sun, 25 Jan 2026 23:28:41 +0800
Message-ID: <aXY2qdYtfVTSc7jB@fedora>
In-Reply-To: <03C8F5B9-B2E1-4C78-A043-B4F9422508B7@zazolabs.com>

On Fri, Jan 23, 2026 at 05:00:33PM +0200, Alexander Atanasov wrote:
> 
> 
> > On 23 Jan 2026, at 16:33, Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > On Fri, Jan 23, 2026 at 03:59:31PM +0200, Alexander Atanasov wrote:
> >> On 23 Jan 2026, at 15:33, Ming Lei <ming.lei@redhat.com> wrote:
> >>> 
> >>> On Fri, Jan 23, 2026 at 11:20:36AM +0000, Alexander Atanasov wrote:
> >>>> Create a temp dir for temporary files and use it instead of
> >>>> placing them inside source tree.
> >>> 
> >>> Many temporary files are backing files of file storage target, so far
> >>> the code requires O_DIRECT, or the size could be a bit big.
> >>> 
> >>> If the temp dir is on ramfs/tmpfs, it may cause problems for tests.
> >>> 
> >> 
> >> I am aware of the O_DIRECT problem, but you can export a different TMPDIR that has working O_DIRECT.
> > 
> > Can you share how to export TMPDIR capable of O_DIRECT?
> 
> 
> +export TMPDIR=$(mktemp -d ${TMPDIR:-/tmp}/ublktest-dir.XXXXXX)
> 
> I made the tests run in their own TMPDIR, which is created under the already-set
> TMPDIR or, if TMPDIR is not set, under /tmp.
> 
> Set
> export TMPDIR=/path/to/odirect/capable
> before running, and the tests will run in:
> /path/to/odirect/capable/ublktest-dir.XXXXXX
> 
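
For reference, a minimal sketch of how a directory could be probed for
O_DIRECT support before a run. It assumes GNU dd (oflag=direct, status=none)
and mktemp; dir_supports_odirect is a hypothetical helper, not something in
the ublk selftests.

```shell
#!/bin/bash
# Return 0 if a file created in directory $1 accepts an O_DIRECT write.
# dir_supports_odirect is a hypothetical name used for illustration only.
dir_supports_odirect() {
	local dir=$1 probe
	# Fails (and we return 1) if the directory is missing or not writable
	probe=$(mktemp "$dir/odirect-probe.XXXXXX" 2>/dev/null) || return 1
	# Write one aligned 4 KiB block; dd fails if O_DIRECT is refused
	if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct status=none 2>/dev/null; then
		rm -f "$probe"
		return 0
	fi
	rm -f "$probe"
	return 1
}
```

It could be called as
dir_supports_odirect "${TMPDIR:-/tmp}" || echo "TMPDIR lacks O_DIRECT"
so that a test can skip instead of failing later.
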
> 
> > 
> >> 
> >> I use an sshfs mount of the build to run the tests, and that is a problem: sshfs/fuse does not
> >> do O_DIRECT either.
> >> 
> >> I think test_generic_06.sh is the only one that fails due to this (though I still have to investigate).
> >> 
> >> If O_DIRECT is required by the tests, it may be possible to go through a RAM disk, which does support it,
> >> so it works everywhere.
> >> 
> >> Another option is to keep working in the source tree as it is now, and just add a variable to specify the working directory -
> >> UBLK_TMPDIR or something.
> >> 
> >> 
> >> I get a lot of out of order io - between 0 and 10 on average on my test setup:
> >> tools/testing/selftests/ublk/test_generic_01.sh 
> >> Attached 3 probes
> >> io_out_of_order: exp 564688 actual 564648
> >> io_out_of_order: exp 564648 actual 565584
> >> io_out_of_order: exp 565584 actual 564688
> >> io_out_of_order: exp 565592 actual 564688
> >> io_out_of_order: exp 566328 actual 565592
> >> io_out_of_order: exp 882256 actual 882248
> >> io_out_of_order: exp 883032 actual 882912
> >> io_out_of_order: exp 882912 actual 883040
> >> io_out_of_order: exp 883040 actual 883032
> >> 
> >> 
> >> generic_01 : [FAIL]
> >> 
> >> All rq-s are there, just reordered. AFAIK blk-mq does not guarantee that requests will be completed in order; what’s the idea to catch this and
> > 
> > If there are just 0 ~ 10, it could be fine. But if all are reordered,
> > something must be wrong. One improvement could be to check whether there
> > are too many reorders...
> > 
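
A minimal sketch of that threshold idea, assuming the trace log keeps one
line per reorder event. The helper name too_many_reorders and the default
limit of 10 are hypothetical; the pattern is passed in so it can follow
whatever the bpftrace script prints.

```shell
#!/bin/bash
# too_many_reorders LOG PATTERN [LIMIT]
# Succeeds (returns 0) when LOG has more than LIMIT lines matching PATTERN.
# The helper name and the default LIMIT of 10 are hypothetical.
too_many_reorders() {
	local log=$1 pattern=$2 limit=${3:-10} count
	# grep -c prints the match count; it exits non-zero on zero matches
	# or on a missing file, so fall back to 0 in both cases.
	count=$(grep -c -- "$pattern" "$log" 2>/dev/null) || true
	count=${count:-0}
	[ "$count" -gt "$limit" ]
}
```

In the test this could replace the plain grep -q check, for example:
if too_many_reorders "$UBLK_TMP" "io_out_of_order" 10; then ERR_CODE=255; fi
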
> > Actually what I am trying to test is to make sure same order is observed
> > from both ublk driver dispatch code path and ublk target io handling code
> > path, because io_uring task work schedule uses llist, which may introduce io
> > reorder.
> 
> There are for sure other places where reordering can be introduced, so the code should be ready for and expect
> it. (For my case see below.) Is preserving the order required for some reason for ublk?
> 
> > 
> > However, that involves ublk kprobe/kfunc trace, which may not be stable,
> > so I simply check the end-to-end IO order. Sometimes blk-mq IO queue/dispatch
> > may re-order IO.
> > 
> > I guess the following change may avoid the re-order, but batch IO case may
> > not be covered:
> > 
> > diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
> > index 21a31cd5491a..5805da4c84c5 100755
> > --- a/tools/testing/selftests/ublk/test_generic_01.sh
> > +++ b/tools/testing/selftests/ublk/test_generic_01.sh
> > @@ -29,14 +29,8 @@ if ! kill -0 "$btrace_pid" > /dev/null 2>&1; then
> >        exit "$UBLK_SKIP_CODE"
> > fi
> > 
> > -# run fio over this ublk disk
> > -fio --name=write_seq \
> > -    --filename=/dev/ublkb"${dev_id}" \
> > -    --ioengine=libaio --iodepth=16 \
> > -    --rw=write \
> > -    --size=512M \
> > -    --direct=1 \
> > -    --bs=4k > /dev/null 2>&1
> > +taskset -c 0 dd if=/dev/zero of=/dev/ublkb"${dev_id}" bs=1M count=256 oflag=direct > /dev/null 2>&1
> > +
> > 
> > 
> >> consider it an error? (Latest tree with batch io and batch io fixes on top, if that matters)
> > 
> > I have never observed a generic_01 failure on my test VM or hardware.
> > 
> > My kernel config is based on Fedora; maybe a scheduler config option makes the difference.
> 
> Fedora 43 default config with some debugging options enabled, but no changes in schedulers.
> Test VM storage is on a networked NAS over iSCSI - both boxes VM host and NAS have two NICs,
> I get the errors when I load the network. So I believe the requests really complete out of 
> order due to the network in my case. All tests that have the bpftrace check fail on occasion.

Can you test the following patch and see whether the reorder can still happen?


diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
index 26cf3c7ceeb5..26d5e52ece29 100755
--- a/tools/testing/selftests/ublk/test_generic_01.sh
+++ b/tools/testing/selftests/ublk/test_generic_01.sh
@@ -13,7 +13,7 @@ if ! _have_program fio; then
 	exit "$UBLK_SKIP_CODE"
 fi
 
-_prep_test "null" "sequential io order"
+_prep_test "null" "ublk dispatch won't reorder IO"
 
 dev_id=$(_add_ublk_dev -t null)
 _check_add_dev $TID $?
@@ -39,9 +39,13 @@ fio --name=write_seq \
 ERR_CODE=$?
 kill "$btrace_pid"
 wait
-if grep -q "io_out_of_order" "$UBLK_TMP"; then
-	cat "$UBLK_TMP"
+
+# Check for out-of-order completions detected by bpftrace
+if grep -q "^out_of_order:" "$UBLK_TMP"; then
+	echo "I/O reordering detected:"
+	grep "^out_of_order:" "$UBLK_TMP"
 	ERR_CODE=255
 fi
+
 _cleanup_test "null"
 _show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/trace/seq_io.bt b/tools/testing/selftests/ublk/trace/seq_io.bt
index b2f60a92b118..60ac40e66606 100644
--- a/tools/testing/selftests/ublk/trace/seq_io.bt
+++ b/tools/testing/selftests/ublk/trace/seq_io.bt
@@ -2,23 +2,45 @@
 	$1: 	dev_t
 	$2: 	RWBS
 	$3:     strlen($2)
+
+	Track request order between block_io_start and block_rq_complete.
+	For each request, record its start sequence number and verify
+	completions happen in the same order.
 */
+
 BEGIN {
-	@last_rw[$1, str($2)] = (uint64)0;
+	@start_seq = (uint64)0;
+	@complete_seq = (uint64)0;
+	@out_of_order = (uint64)0;
+}
+
+tracepoint:block:block_io_start
+{
+	if ((int64)args.dev == $1 && !strncmp(args.rwbs, str($2), $3)) {
+		@start_order[args.sector] = @start_seq;
+		@start_seq = @start_seq + 1;
+	}
 }
+
 tracepoint:block:block_rq_complete
 {
-	$dev = $1;
 	if ((int64)args.dev == $1 && !strncmp(args.rwbs, str($2), $3)) {
-		$last = @last_rw[$dev, str($2)];
-		if ((uint64)args.sector != $last) {
-			printf("io_out_of_order: exp %llu actual %llu\n",
-				args.sector, $last);
+		$expected_order = @start_order[args.sector];
+		if ($expected_order != @complete_seq) {
+			printf("out_of_order: sector %llu started at seq %llu but completed at seq %llu\n",
+				args.sector, $expected_order, @complete_seq);
+			@out_of_order = @out_of_order + 1;
 		}
-		@last_rw[$dev, str($2)] = (args.sector + args.nr_sector);
+		delete(@start_order[args.sector]);
+		@complete_seq = @complete_seq + 1;
 	}
 }
 
 END {
-	clear(@last_rw);
+	printf("total_start: %llu total_complete: %llu out_of_order: %llu\n",
+		@start_seq, @complete_seq, @out_of_order);
+	clear(@start_order);
+	clear(@start_seq);
+	clear(@complete_seq);
+	clear(@out_of_order);
 }
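
To illustrate the bookkeeping in the new seq_io.bt, here is a rough bash
replay of the same logic over a list of events. It is only a sketch: the
"S <sector>" / "C <sector>" input format and the check_order name are
hypothetical, and it needs bash 4+ for associative arrays. As in bpftrace
maps, a sector with no recorded start is treated as sequence 0.

```shell
#!/bin/bash
# Replay "S <sector>" (start) and "C <sector>" (complete) events from stdin
# and report completions whose order differs from the start order.
# The event format and check_order are hypothetical, for illustration only.
check_order() {
	local -A start_order=()
	local start_seq=0 complete_seq=0 out_of_order=0 kind sector expected
	while read -r kind sector; do
		case $kind in
		S)
			# Remember the sequence number at which this sector started
			start_order[$sector]=$start_seq
			start_seq=$((start_seq + 1))
			;;
		C)
			# A missing entry defaults to 0, like a bpftrace map lookup
			expected=${start_order[$sector]:-0}
			if [ "$expected" -ne "$complete_seq" ]; then
				echo "out_of_order: sector $sector started at seq $expected but completed at seq $complete_seq"
				out_of_order=$((out_of_order + 1))
			fi
			unset "start_order[$sector]"
			complete_seq=$((complete_seq + 1))
			;;
		esac
	done
	echo "total_start: $start_seq total_complete: $complete_seq out_of_order: $out_of_order"
}
```

Feeding it three starts (sectors 0, 8, 16) followed by completions in the
order 0, 16, 8 reports two out_of_order lines, because both swapped
completions land at an unexpected sequence number. So one swapped pair
counts twice in the total.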

Thanks,
Ming

