From: Jens Axboe <axboe@kernel.dk>
To: linux-block@vger.kernel.org
Subject: [PATCHSET v3] mq-deadline and BFQ scalability improvements
Date: Tue, 23 Jan 2024 10:34:12 -0700
Message-ID: <20240123174021.1967461-1-axboe@kernel.dk>
Hi,
It's no secret that mq-deadline doesn't scale very well - it was
originally done as a proof-of-concept conversion from deadline, when the
blk-mq multiqueue layer was written. In the single queue world, the
queue lock protected the IO scheduler as well, and mq-deadline simply
adopted an internal dd->lock to fill the place of that.
While mq-deadline works under blk-mq and doesn't suffer any scaling
issues on that side, as soon as request insertion or dispatch is done,
we're hitting the per-queue dd->lock quite intensely. On a basic test
box with 16 cores / 32 threads, running a number of IO intensive
threads on either null_blk (single hw queue) or nvme0n1 (many hw
queues) shows this quite easily.
The test case looks like this:
fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
	--ioengine=io_uring --norandommap --runtime=60 --rw=randread \
	--thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
	--iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
	--name=scaletest --filename=/dev/$DEV
This is run on a desktop 7950X box, with 32 threads each doing 4 IOs,
for a total queue depth of 128.
Results before the patches:
Device      IOPS    sys     contention  diff
====================================================
null_blk    879K    89%     93.6%
nvme0n1     901K    86%     94.5%
which looks pretty miserable; most of the time is spent contending on
dd->lock.
This RFC patchset attempts to address that by:
1) Serializing dispatch of requests. If we fail dispatching, rely on
   the next completion to dispatch the next one. This could potentially
   reduce the overall depth achieved on the device side, however even
   for the heavily contended test I'm running here, no observable
   change is seen. This is patch 2.
2) Serializing request insertion, using separate insertion lists to
   temporarily store requests until insertion can proceed. This is
   patch 4.
3) Skipping expensive merge lookups if the queue is already contended.
   The reasoning is provided in that patch, patch 3.
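To make the three ideas above a bit more concrete, here is a userspace
toy model in Python. It is only a sketch under assumptions: ToyScheduler
and its methods are hypothetical names, threading.Lock stands in for
dd->lock, and a non-blocking acquire stands in for a trylock. It is not
the code from the patches.

```python
import threading
from collections import deque


class ToyScheduler:
    """Userspace toy model of the three ideas; not the kernel code."""

    def __init__(self):
        self.lock = threading.Lock()           # stands in for dd->lock
        self.insert_lock = threading.Lock()    # protects only the pending list
        self.dispatch_busy = threading.Lock()  # held by the one active dispatcher
        self.pending = deque()                 # freshly inserted requests
        self.sorted = deque()                  # requests visible to dispatch

    def insert(self, sector):
        # Separate insertion list: inserters never touch the main lock.
        with self.insert_lock:
            self.pending.append(sector)

    def try_merge(self, sector):
        # Merging is an optimization, so skip it if the main lock is contended.
        if not self.lock.acquire(blocking=False):
            return False
        try:
            # Placeholder for a real merge lookup: is the previous sector queued?
            return (sector - 1) in self.sorted
        finally:
            self.lock.release()

    def dispatch(self):
        # Serialized dispatch: if someone else is dispatching, give up and
        # rely on a later call (in the kernel, the next completion).
        if not self.dispatch_busy.acquire(blocking=False):
            return None
        try:
            with self.lock:
                # Splice pending insertions into the main queue.
                with self.insert_lock:
                    self.sorted.extend(self.pending)
                    self.pending.clear()
                return self.sorted.popleft() if self.sorted else None
        finally:
            self.dispatch_busy.release()
```

The point of the model is that only the single active dispatcher ever
takes the main lock for real work; inserters and merge lookups either
use a cheaper lock or back off.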
With that in place, the same test case now does:
Device      IOPS    sys     contention  diff
====================================================
null_blk    2867K   11.1%   ~6.0%       +226%
nvme0n1     3162K   9.9%    ~5.0%       +250%
and while that doesn't completely eliminate the lock contention, it's
oodles better than what it was before. The throughput increase shows
that nicely, with more than a 200% improvement for both cases.
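For anyone wanting to redo the diff column on their own numbers, it is
the relative IOPS gain over the unpatched run. A minimal Python sketch,
where percent_gain is a hypothetical helper and the inputs are the
rounded IOPS values from the two tables (so the result can be off by a
point from the column above):

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# IOPS in thousands, copied from the before/after tables.
results = {
    "null_blk": (879, 2867),
    "nvme0n1": (901, 3162),
}

for dev, (before, after) in results.items():
    print(f"{dev}: {percent_gain(before, after):.1f}%")
```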
Since the above is very high IOPS testing to show the scalability
limitations, I also ran this on a more normal drive on a Dell R7525 test
box. It doesn't change the performance there (around 66K IOPS), but
it does reduce the system time required to do the IO from 12.6% to
10.7%, or about 15% less time spent in the kernel.
block/mq-deadline.c | 178 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 161 insertions(+), 17 deletions(-)
Since v2:
- Update mq-deadline insertion locking optimization patch to
  use Bart's variant instead. This also drops the per-cpu
  buckets and hence resolves the need to potentially make
  the number of buckets dependent on the host.
- Use locking bitops
- Add similar series for BFQ, with good results as well
- Rebase on 6.8-rc1
Thread overview: 30+ messages
2024-01-23 17:34 Jens Axboe [this message]
2024-01-23 17:34 ` [PATCH 1/8] block/mq-deadline: pass in queue directly to dd_insert_request() Jens Axboe
2024-01-24 9:21 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 2/8] block/mq-deadline: serialize request dispatching Jens Axboe
2024-01-23 18:36 ` Bart Van Assche
2024-01-23 19:13 ` Jens Axboe
2024-01-24 9:31 ` Christoph Hellwig
2024-01-24 15:00 ` Jens Axboe
2024-01-24 9:29 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 3/8] block/mq-deadline: skip expensive merge lookups if contended Jens Axboe
2024-01-24 9:31 ` Johannes Thumshirn
2024-01-24 9:32 ` Christoph Hellwig
2024-01-24 15:02 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 4/8] block/mq-deadline: use separate insertion lists Jens Axboe
2024-01-23 18:37 ` Bart Van Assche
2024-01-24 9:42 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 5/8] block/bfq: pass in queue directly to bfq_insert_request() Jens Axboe
2024-01-23 18:38 ` Bart Van Assche
2024-01-24 9:46 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 6/8] block/bfq: serialize request dispatching Jens Axboe
2024-01-23 18:40 ` Bart Van Assche
2024-01-23 19:14 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 7/8] block/bfq: skip expensive merge lookups if contended Jens Axboe
2024-01-23 18:44 ` Bart Van Assche
2024-01-23 19:14 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 8/8] block/bfq: use separate insertion lists Jens Axboe
2024-01-23 18:47 ` Bart Van Assche
2024-01-23 19:18 ` Jens Axboe
2024-01-23 20:03 ` [PATCHSET v3] mq-deadline and BFQ scalability improvements Oleksandr Natalenko
2024-01-23 22:14 ` Jens Axboe