From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCHSET v3] mq-deadline and BFQ scalability improvements
Date: Tue, 23 Jan 2024 21:03:51 +0100 [thread overview]
Message-ID: <2313676.ElGaqSPkdT@natalenko.name> (raw)
In-Reply-To: <20240123174021.1967461-1-axboe@kernel.dk>
Hello.
On Tuesday, 23 January 2024 at 18:34:12 CET, Jens Axboe wrote:
> Hi,
>
> It's no secret that mq-deadline doesn't scale very well - it was
> originally done as a proof-of-concept conversion from deadline, when the
> blk-mq multiqueue layer was written. In the single-queue world, the
> queue lock protected the IO scheduler as well, and mq-deadline simply
> adopted an internal dd->lock to take its place.
>
> While mq-deadline works under blk-mq and doesn't suffer any scaling
> issues on that side, as soon as request insertion or dispatch is
> performed, we hit the per-queue dd->lock quite intensely. On a basic
> test box with 16 cores / 32 threads, running a number of IO-intensive
> threads on either null_blk (single hw queue) or nvme0n1 (many hw
> queues) shows this quite easily:
>
> The test case looks like this:
>
> fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
> --ioengine=io_uring --norandommap --runtime=60 --rw=randread \
> --thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
> --iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
> --name=scaletest --filename=/dev/$DEV
>
> and is being run on a desktop 7950X box.
>
> That's 32 threads each doing 4 IOs, for a total queue depth of 128.
>
> Results before the patches:
>
> Device          IOPS      sys      contention    diff
> ====================================================
> null_blk        879K      89%      93.6%
> nvme0n1         901K      86%      94.5%
>
> which looks pretty miserable; most of the time is spent contending on
> the queue lock.
>
> This patchset attempts to address that by:
>
> 1) Serializing dispatch of requests. If we fail dispatching, rely on
> the next completion to dispatch the next one. This could potentially
> reduce the overall depth achieved on the device side; however, even
> for the heavily contended test I'm running here, no observable
> change is seen. This is patch 2.
>
> 2) Serialize request insertion, using separate internal lists to
> temporarily store requests until insertion can proceed. This is
> patch 4.
>
> 3) Skip expensive merges if the queue is already contended. Reasoning
> is provided in that patch, patch 3. (A rough sketch illustrating all
> three ideas follows this list.)
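
For a concrete picture of how the three ideas above fit together, here is a
minimal illustrative sketch in kernel-style C. It uses real primitives
(spinlocks, locking bitops, list splicing), but the structure and all names
(dd_sketch, DD_DISPATCHING, insert_lock, and so on) are invented for this
sketch and are not taken from the actual patches:

/*
 * Illustrative sketch only: not the actual patches. Real kernel
 * primitives, made-up structure and function names.
 */
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/bitops.h>
#include <linux/types.h>

enum { DD_DISPATCHING };                /* run-state bit for idea 1 */

struct dd_sketch {
	spinlock_t lock;                /* stands in for the existing dd->lock */
	unsigned long run_state;
	spinlock_t insert_lock;         /* short-held lock for idea 2 */
	struct list_head insert_list;   /* requests parked until insertion */
	struct list_head fifo;          /* stand-in for the real sort/fifo lists */
};

/* Idea 2: insertion only touches a cheap side list, not dd->lock. */
static void dd_sketch_insert(struct dd_sketch *dd, struct list_head *rq_node)
{
	spin_lock(&dd->insert_lock);
	list_add_tail(rq_node, &dd->insert_list);
	spin_unlock(&dd->insert_lock);
}

/* Idea 1: only one context dispatches at a time. */
static void dd_sketch_dispatch(struct dd_sketch *dd)
{
	LIST_HEAD(pending);

	/*
	 * Lost the race? The current dispatcher, or the next completion,
	 * will pick up whatever was just inserted.
	 */
	if (test_and_set_bit_lock(DD_DISPATCHING, &dd->run_state))
		return;

	/* Drain the side list... */
	spin_lock(&dd->insert_lock);
	list_splice_init(&dd->insert_list, &pending);
	spin_unlock(&dd->insert_lock);

	/* ...and do the real scheduler work under dd->lock. */
	spin_lock(&dd->lock);
	list_splice_tail_init(&pending, &dd->fifo);
	/* actual request selection would go here */
	spin_unlock(&dd->lock);

	clear_bit_unlock(DD_DISPATCHING, &dd->run_state);
}

/* Idea 3: if dd->lock is contended, skip the merge lookup entirely. */
static bool dd_sketch_bio_merge(struct dd_sketch *dd)
{
	bool merged = false;

	if (!spin_trylock(&dd->lock))
		return false;   /* contended: not worth waiting for a merge */
	/* merge lookup would go here, setting 'merged' on success */
	spin_unlock(&dd->lock);
	return merged;
}

The real patches wire this into the blk-mq insert/dispatch/merge hooks and
keep the existing mq-deadline and BFQ data structures; the sketch only shows
where the extra bit and the extra lock sit relative to the heavily contended
dd->lock.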
>
> With that in place, the same test case now does:
>
> Device          IOPS      sys      contention    diff
> ====================================================
> null_blk        2867K     11.1%    ~6.0%         +226%
> nvme0n1         3162K     9.9%     ~5.0%         +250%
>
> and while that doesn't completely eliminate the lock contention, it's
> oodles better than what it was before. The throughput increase shows
> that nicely, with more than a 200% improvement for both cases.
>
> Since the above is very high-IOPS testing to show the scalability
> limitations, I also ran this on a more normal drive on a Dell R7525 test
> box. It doesn't change the performance there (around 66K IOPS), but
> it does reduce the system time required to do the IO from 12.6% to
> 10.7%, or about 15% less time spent in the kernel.
>
> block/mq-deadline.c | 178 +++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 161 insertions(+), 17 deletions(-)
>
> Since v2:
> - Update the mq-deadline insertion locking optimization patch to
> use Bart's variant instead. This also drops the per-CPU
> buckets and hence removes the need to potentially make
> the number of buckets dependent on the host.
> - Use locking bitops (a short note on these follows below)
> - Add a similar series of patches for BFQ, with good results as well
> - Rebase on 6.8-rc1
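
For context on the "locking bitops" bullet above: these are the bit
operations with acquire/release ordering, test_and_set_bit_lock() and
clear_bit_unlock(), which let a single bit in an existing word act as a
lock. A tiny generic example, not taken from the patches:

#include <linux/bitops.h>
#include <linux/types.h>

#define MY_RUNNING	0	/* illustrative bit number */

/* Returns true only for the one caller that actually acquired the bit. */
static bool my_try_start(unsigned long *state)
{
	return !test_and_set_bit_lock(MY_RUNNING, state);
}

/* Pairs with the acquire above; release ordering on the same bit. */
static void my_finish(unsigned long *state)
{
	clear_bit_unlock(MY_RUNNING, state);
}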
>
>
>
I've been running this for a couple of days with no issues, hence for the series:
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Thank you.
--
Oleksandr Natalenko (post-factum)
Thread overview: 30+ messages
2024-01-23 17:34 [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe
2024-01-23 17:34 ` [PATCH 1/8] block/mq-deadline: pass in queue directly to dd_insert_request() Jens Axboe
2024-01-24 9:21 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 2/8] block/mq-deadline: serialize request dispatching Jens Axboe
2024-01-23 18:36 ` Bart Van Assche
2024-01-23 19:13 ` Jens Axboe
2024-01-24 9:31 ` Christoph Hellwig
2024-01-24 15:00 ` Jens Axboe
2024-01-24 9:29 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 3/8] block/mq-deadline: skip expensive merge lookups if contended Jens Axboe
2024-01-24 9:31 ` Johannes Thumshirn
2024-01-24 9:32 ` Christoph Hellwig
2024-01-24 15:02 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 4/8] block/mq-deadline: use separate insertion lists Jens Axboe
2024-01-23 18:37 ` Bart Van Assche
2024-01-24 9:42 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 5/8] block/bfq: pass in queue directly to bfq_insert_request() Jens Axboe
2024-01-23 18:38 ` Bart Van Assche
2024-01-24 9:46 ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 6/8] block/bfq: serialize request dispatching Jens Axboe
2024-01-23 18:40 ` Bart Van Assche
2024-01-23 19:14 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 7/8] block/bfq: skip expensive merge lookups if contended Jens Axboe
2024-01-23 18:44 ` Bart Van Assche
2024-01-23 19:14 ` Jens Axboe
2024-01-23 17:34 ` [PATCH 8/8] block/bfq: use separate insertion lists Jens Axboe
2024-01-23 18:47 ` Bart Van Assche
2024-01-23 19:18 ` Jens Axboe
2024-01-23 20:03 ` Oleksandr Natalenko [this message]
2024-01-23 22:14 ` [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe