From: Ming Lei <ming.lei@redhat.com>
To: Uday Shankar <ushankar@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
Caleb Sander Mateos <csander@purestorage.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v7 1/8] ublk: have a per-io daemon instead of a per-queue daemon
Date: Thu, 29 May 2025 17:59:54 +0800
Message-ID: <aDgwGoGCEpwd1mFY@fedora>
In-Reply-To: <20250527-ublk_task_per_io-v7-1-cbdbaf283baa@purestorage.com>
On Tue, May 27, 2025 at 05:01:24PM -0600, Uday Shankar wrote:
> Currently, ublk_drv associates to each hardware queue (hctx) a unique
> task (called the queue's ubq_daemon) which is allowed to issue
> COMMIT_AND_FETCH commands against the hctx. If any other task attempts
> to do so, the command fails immediately with EINVAL. When considered
> together with the block layer architecture, the result is that for each
> CPU C on the system, there is a unique ublk server thread which is
> allowed to handle I/O submitted on CPU C. This can lead to suboptimal
> performance under imbalanced load generation. For an extreme example,
> suppose all the load is generated on CPUs mapping to a single ublk
> server thread. Then that thread may be fully utilized and become the
> bottleneck in the system, while other ublk server threads are totally
> idle.
>
> This issue can also be addressed directly in the ublk server without
> kernel support by having threads dequeue I/Os and pass them around to
> ensure even load. But this solution requires inter-thread communication
> at least twice for each I/O (submission and completion), which is
> generally a bad pattern for performance. The problem gets even worse
> with zero copy, as more inter-thread communication would be required to
> have the buffer register/unregister calls to come from the correct
> thread.
>
> Therefore, address this issue in ublk_drv by allowing each I/O to have
> its own daemon task. Two I/Os in the same queue are now allowed to be
> serviced by different daemon tasks - this was not possible before.
> Imbalanced load can then be balanced across all ublk server threads by
> having the ublk server threads issue FETCH_REQs in a round-robin manner.
> As a small toy example, consider a system with a single ublk device
> having 2 queues, each of depth 4. A ublk server having 4 threads could
> issue its FETCH_REQs against this device as follows (where each entry is
> the qid,tag pair that the FETCH_REQ targets):
>
> ublk server thread:  T0   T1   T2   T3
>                      0,0  0,1  0,2  0,3
>                      1,3  1,0  1,1  1,2
>
> This setup allows for load that is concentrated on one hctx/ublk_queue
> to be spread out across all ublk server threads, alleviating the issue
> described above.
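
For readers following along, here is a minimal sketch of the assignment in
the table above. The helper names and the (qid + tag) % nthreads rotation
are assumptions chosen only to reproduce that table; they are not taken
from ublk_drv or kublk, and a real server may distribute its FETCH_REQs
differently.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of ublk_drv or kublk): return the index
 * of the server thread that issues FETCH_REQ for the given (qid, tag).
 * Rotating the starting thread by qid is an assumption that matches the
 * 2-queue, depth-4, 4-thread table quoted above.
 */
unsigned int fetch_thread(unsigned int qid, unsigned int tag,
			  unsigned int nthreads)
{
	assert(nthreads > 0);
	return (qid + tag) % nthreads;
}

/* Print one line per server thread listing the qid,tag pairs it fetches. */
void print_fetch_plan(unsigned int nr_queues, unsigned int depth,
		      unsigned int nthreads)
{
	for (unsigned int t = 0; t < nthreads; t++) {
		printf("T%u:", t);
		for (unsigned int q = 0; q < nr_queues; q++)
			for (unsigned int tag = 0; tag < depth; tag++)
				if (fetch_thread(q, tag, nthreads) == t)
					printf(" %u,%u", q, tag);
		printf("\n");
	}
}
```

Calling print_fetch_plan(2, 4, 4) from a main() lists, per thread, the same
pairs as the columns of the table. With such a layout, load concentrated on
queue 0 alone still reaches all four threads.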
>
> Add the new UBLK_F_PER_IO_DAEMON feature to ublk_drv, which ublk servers
> can use to essentially test for the presence of this change and tailor
> their behavior accordingly.
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
This patch looks close to ready. There is just one panic, triggered
immediately by the following steps, which I think needs to be addressed first.
Maybe we should also add a stress test like this for UBLK_F_PER_IO_DAEMON.
1) run heavy IO:
[root@ktest-40 ublk]# ./kublk add -t null -q 2 --nthreads 4 --per_io_tasks
dev id 0: nr_hw_queues 2 queue_depth 128 block size 512 dev_capacity 524288000
max rq size 1048576 daemon pid 1283 flags 0x2042 state LIVE
queue 0: affinity(0 )
queue 1: affinity(8 )
[root@ktest-40 ublk]#
[root@ktest-40 ublk]# ~/git/fio/t/io_uring -p 0 -n 8 /dev/ublkb0
Or
`fio -numjobs=8 --ioengine=libaio --iodepth=128 --iodepth_batch_submit=32 \
--iodepth_batch_complete_min=32`
2) panic immediately:
[ 51.297750] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 51.298719] #PF: supervisor read access in kernel mode
[ 51.299403] #PF: error_code(0x0000) - not-present page
[ 51.300069] PGD 1161c8067 P4D 1161c8067 PUD 11a793067 PMD 0
[ 51.300825] Oops: Oops: 0000 [#1] SMP NOPTI
[ 51.301389] CPU: 0 UID: 0 PID: 1285 Comm: kublk Not tainted 6.15.0+ #288 PREEMPT(full)
[ 51.302375] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
[ 51.303551] RIP: 0010:io_uring_cmd_done+0xa7/0x1d0
[ 51.304226] Code: 48 89 f1 48 89 f0 48 83 e1 bf 80 cc 01 48 81 c9 00 01 80 00 83 e6 40 48 0f 45 c1 48 89 43 48 44 89 6b 58 c7 43 5c 00 00 00 00 <8b> 07 f6 c4 08 74 12 48 89 93 e8 00 00 0
[ 51.306554] RSP: 0018:ffffd1da436e3a40 EFLAGS: 00010246
[ 51.307253] RAX: 0000000000000100 RBX: ffff8d9cd3737300 RCX: 0000000000000001
[ 51.308178] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 51.309333] RBP: 0000000000000001 R08: 0000000000000018 R09: 0000000000190015
[ 51.310744] R10: 0000000000190015 R11: 0000000000000035 R12: ffff8d9cd1c7c000
[ 51.311986] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 51.313386] FS: 00007f2c293916c0(0000) GS:ffff8da179df6000(0000) knlGS:0000000000000000
[ 51.314899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.315926] CR2: 0000000000000000 CR3: 00000001161c9002 CR4: 0000000000772ef0
[ 51.317179] PKRU: 55555554
[ 51.317682] Call Trace:
[ 51.318040] <TASK>
[ 51.318355] ublk_cmd_list_tw_cb+0x30/0x40 [ublk_drv]
[ 51.319061] __io_run_local_work_loop+0x72/0x80
[ 51.319696] __io_run_local_work+0x69/0x1e0
[ 51.320274] io_cqring_wait+0x8f/0x6a0
[ 51.320794] __do_sys_io_uring_enter+0x500/0x770
[ 51.321422] do_syscall_64+0x82/0x170
[ 51.321891] ? __do_sys_io_uring_enter+0x500/0x770
Thanks,
Ming