public inbox for linux-doc@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Uday Shankar <ushankar@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Caleb Sander Mateos <csander@purestorage.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v7 1/8] ublk: have a per-io daemon instead of a per-queue daemon
Date: Thu, 29 May 2025 17:59:54 +0800
Message-ID: <aDgwGoGCEpwd1mFY@fedora>
In-Reply-To: <20250527-ublk_task_per_io-v7-1-cbdbaf283baa@purestorage.com>

On Tue, May 27, 2025 at 05:01:24PM -0600, Uday Shankar wrote:
> Currently, ublk_drv associates to each hardware queue (hctx) a unique
> task (called the queue's ubq_daemon) which is allowed to issue
> COMMIT_AND_FETCH commands against the hctx. If any other task attempts
> to do so, the command fails immediately with EINVAL. When considered
> together with the block layer architecture, the result is that for each
> CPU C on the system, there is a unique ublk server thread which is
> allowed to handle I/O submitted on CPU C. This can lead to suboptimal
> performance under imbalanced load generation. For an extreme example,
> suppose all the load is generated on CPUs mapping to a single ublk
> server thread. Then that thread may be fully utilized and become the
> bottleneck in the system, while other ublk server threads are totally
> idle.
> 
> This issue can also be addressed directly in the ublk server without
> kernel support by having threads dequeue I/Os and pass them around to
> ensure even load. But this solution requires inter-thread communication
> at least twice for each I/O (submission and completion), which is
> generally a bad pattern for performance. The problem gets even worse
> with zero copy, as more inter-thread communication would be required to
> have the buffer register/unregister calls to come from the correct
> thread.
> 
> Therefore, address this issue in ublk_drv by allowing each I/O to have
> its own daemon task. Two I/Os in the same queue are now allowed to be
> serviced by different daemon tasks - this was not possible before.
> Imbalanced load can then be balanced across all ublk server threads by
> having the ublk server threads issue FETCH_REQs in a round-robin manner.
> As a small toy example, consider a system with a single ublk device
> having 2 queues, each of depth 4. A ublk server having 4 threads could
> issue its FETCH_REQs against this device as follows (where each entry is
> the qid,tag pair that the FETCH_REQ targets):
> 
> ublk server thread:	T0	T1	T2	T3
> 			0,0	0,1	0,2	0,3
> 			1,3	1,0	1,1	1,2
> 
> This setup allows for load that is concentrated on one hctx/ublk_queue
> to be spread out across all ublk server threads, alleviating the issue
> described above.
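
The round-robin layout in the table above can be sketched in a few lines of C. This is only an illustration: `fetch_thread_for()` and `print_assignment()` are hypothetical names, and the `(qid + tag) % nthreads` rule is just one mapping that happens to reproduce the toy table, not necessarily what kublk or any real ublk server does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of ublk_drv or kublk): choose which server
 * thread issues FETCH_REQ for a given (qid, tag). Offsetting by qid staggers
 * each queue's tags, so every queue is spread over all threads.
 */
static unsigned int fetch_thread_for(unsigned int qid, unsigned int tag,
				     unsigned int nthreads)
{
	assert(nthreads > 0);
	return (qid + tag) % nthreads;
}

/* Print, per thread, the qid,tag pairs it would fetch. */
static void print_assignment(unsigned int nr_queues, unsigned int depth,
			     unsigned int nthreads)
{
	for (unsigned int t = 0; t < nthreads; t++) {
		printf("T%u:", t);
		for (unsigned int q = 0; q < nr_queues; q++)
			for (unsigned int tag = 0; tag < depth; tag++)
				if (fetch_thread_for(q, tag, nthreads) == t)
					printf(" %u,%u", q, tag);
		printf("\n");
	}
}
```

Calling `print_assignment(2, 4, 4)` lists, for each of the 4 threads, one tag from queue 0 and one from queue 1, so load concentrated on a single queue still reaches every thread.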
> 
> Add the new UBLK_F_PER_IO_DAEMON feature to ublk_drv, which ublk servers
> can use to essentially test for the presence of this change and tailor
> their behavior accordingly.
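
A server-side check for the new feature might look roughly like the sketch below. The names `EXAMPLE_F_PER_IO_DAEMON` and `server_can_use_per_io_daemon()` are hypothetical, and the bit position is an assumption inferred from the `flags 0x2042` value that kublk prints further down in this mail; real code should use the UBLK_F_PER_IO_DAEMON definition from the uapi header and whatever feature mask the driver actually reports.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: bit 13, guessed from the "flags 0x2042" kublk output in this
 * thread. Use UBLK_F_PER_IO_DAEMON from the kernel uapi header in real code.
 */
#define EXAMPLE_F_PER_IO_DAEMON (1ULL << 13)

/*
 * Hypothetical helper: 'features' is the feature mask the server obtained
 * from the driver (how it was obtained is not shown here).
 */
static bool server_can_use_per_io_daemon(uint64_t features)
{
	return (features & EXAMPLE_F_PER_IO_DAEMON) != 0;
}
```

If the check fails, the server would presumably fall back to the existing one-thread-per-queue layout.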
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>

This patch looks close to ready, but the steps below trigger a panic
immediately, and I think that needs to be addressed first.

We should probably also add a stress test like this for
UBLK_F_PER_IO_DAEMON.


1) run heavy IO:

[root@ktest-40 ublk]# ./kublk add -t null -q 2 --nthreads 4 --per_io_tasks
dev id 0: nr_hw_queues 2 queue_depth 128 block size 512 dev_capacity 524288000
	max rq size 1048576 daemon pid 1283 flags 0x2042 state LIVE
	queue 0: affinity(0 )
	queue 1: affinity(8 )
[root@ktest-40 ublk]#
[root@ktest-40 ublk]# ~/git/fio/t/io_uring -p 0 -n 8 /dev/ublkb0

Or

`fio --numjobs=8 --ioengine=libaio --iodepth=128 --iodepth_batch_submit=32 \
	--iodepth_batch_complete_min=32`

2) panic immediately:

[   51.297750] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   51.298719] #PF: supervisor read access in kernel mode
[   51.299403] #PF: error_code(0x0000) - not-present page
[   51.300069] PGD 1161c8067 P4D 1161c8067 PUD 11a793067 PMD 0 
[   51.300825] Oops: Oops: 0000 [#1] SMP NOPTI
[   51.301389] CPU: 0 UID: 0 PID: 1285 Comm: kublk Not tainted 6.15.0+ #288 PREEMPT(full) 
[   51.302375] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
[   51.303551] RIP: 0010:io_uring_cmd_done+0xa7/0x1d0
[   51.304226] Code: 48 89 f1 48 89 f0 48 83 e1 bf 80 cc 01 48 81 c9 00 01 80 00 83 e6 40 48 0f 45 c1 48 89 43 48 44 89 6b 58 c7 43 5c 00 00 00 00 <8b> 07 f6 c4 08 74 12 48 89 93 e8 00 00 0
[   51.306554] RSP: 0018:ffffd1da436e3a40 EFLAGS: 00010246
[   51.307253] RAX: 0000000000000100 RBX: ffff8d9cd3737300 RCX: 0000000000000001
[   51.308178] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   51.309333] RBP: 0000000000000001 R08: 0000000000000018 R09: 0000000000190015
[   51.310744] R10: 0000000000190015 R11: 0000000000000035 R12: ffff8d9cd1c7c000
[   51.311986] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   51.313386] FS:  00007f2c293916c0(0000) GS:ffff8da179df6000(0000) knlGS:0000000000000000
[   51.314899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   51.315926] CR2: 0000000000000000 CR3: 00000001161c9002 CR4: 0000000000772ef0
[   51.317179] PKRU: 55555554
[   51.317682] Call Trace:
[   51.318040]  <TASK>
[   51.318355]  ublk_cmd_list_tw_cb+0x30/0x40 [ublk_drv]
[   51.319061]  __io_run_local_work_loop+0x72/0x80
[   51.319696]  __io_run_local_work+0x69/0x1e0
[   51.320274]  io_cqring_wait+0x8f/0x6a0
[   51.320794]  __do_sys_io_uring_enter+0x500/0x770
[   51.321422]  do_syscall_64+0x82/0x170
[   51.321891]  ? __do_sys_io_uring_enter+0x500/0x770




Thanks,
Ming

