Re: [PATCHSET v3] mq-deadline and BFQ scalability improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCHSET v3] mq-deadline and BFQ scalability improvements
Date: Tue, 23 Jan 2024 21:03:51 +0100	[thread overview]
Message-ID: <2313676.ElGaqSPkdT@natalenko.name> (raw)
In-Reply-To: <20240123174021.1967461-1-axboe@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]

Hello.

On úterý 23. ledna 2024 18:34:12 CET Jens Axboe wrote:
> Hi,
> 
> It's no secret that mq-deadline doesn't scale very well - it was
> originally done as a proof-of-concept conversion from deadline, when the
> blk-mq multiqueue layer was written. In the single queue world, the
> queue lock protected the IO scheduler as well, and mq-deadline simply
> adopted an internal dd->lock to fill the place of that.
> 
> While mq-deadline works under blk-mq and doesn't suffer any scaling on
> that side, as soon as request insertion or dispatch is done, we're
> hitting the per-queue dd->lock quite intensely. On a basic test box
> with 16 cores / 32 threads, running a number of IO intensive threads
> on either null_blk (single hw queue) or nvme0n1 (many hw queues) shows
> this quite easily:
> 
> The test case looks like this:
> 
> fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
> 	--ioengine=io_uring --norandommap --runtime=60 --rw=randread \
> 	--thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
> 	--iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
> 	--name=scaletest --filename=/dev/$DEV
> 
> and is being run on a desktop 7950X box.
> 
> which is 32 threads each doing 4 IOs, for a total queue depth of 128.
> 
> Results before the patches:
> 
> Device		IOPS	sys	contention	diff
> ====================================================
> null_blk	879K	89%	93.6%
> nvme0n1		901K	86%	94.5%
> 
> which looks pretty miserable, most of the time is spent contending on
> the queue lock.
> 
> This RFC patchset attempts to address that by:
> 
> 1) Serializing dispatch of requests. If we fail dispatching, rely on
>    the next completion to dispatch the next one. This could potentially
>    reduce the overall depth achieved on the device side, however even
>    for the heavily contended test I'm running here, no observable
>    change is seen. This is patch 2.
> 
> 2) Serialize request insertion, using internal per-cpu lists to
>    temporarily store requests until insertion can proceed. This is
>    patch 3.
> 
> 3) Skip expensive merges if the queue is already contended. Reasonings
>    provided in that patch, patch 4.
> 
> With that in place, the same test case now does:
> 
> Device		IOPS	sys	contention	diff
> ====================================================
> null_blk	2867K	11.1%	~6.0%		+226%
> nvme0n1		3162K	 9.9%	~5.0%		+250%
> 
> and while that doesn't completely eliminate the lock contention, it's
> oodles better than what it was before. The throughput increase shows
> that nicely, with more than a 200% improvement for both cases.
> 
> Since the above is very high IOPS testing to show the scalability
> limitations, I also ran this on a more normal drive on a Dell R7525 test
> box. It doesn't change the performance there (around 66K IOPS), but
> it does reduce the system time required to do the IO from 12.6% to
> 10.7%, or about 20% less time spent in the kernel.
> 
>  block/mq-deadline.c | 178 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 161 insertions(+), 17 deletions(-)
> 
> Since v2:
> 	- Update mq-deadline insertion locking optimization patch to
> 	  use Bart's variant instead. This also drops the per-cpu
> 	  buckets and hence resolves the need to potentially make
> 	  the number of buckets dependent on the host.
> 	- Use locking bitops
> 	- Add similar series for BFQ, with good results as well
> 	- Rebase on 6.8-rc1
> 
> 
> 

I'm running this for a couple of days with no issues, hence for the series:

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Thank you.

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2024-01-23 20:04 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-23 17:34 [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe
2024-01-23 17:34 ` [PATCH 1/8] block/mq-deadline: pass in queue directly to dd_insert_request() Jens Axboe
2024-01-24  9:21   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 2/8] block/mq-deadline: serialize request dispatching Jens Axboe
2024-01-23 18:36   ` Bart Van Assche
2024-01-23 19:13     ` Jens Axboe
2024-01-24  9:31       ` Christoph Hellwig
2024-01-24 15:00         ` Jens Axboe
2024-01-24  9:29   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 3/8] block/mq-deadline: skip expensive merge lookups if contended Jens Axboe
2024-01-24  9:31   ` Johannes Thumshirn
2024-01-24  9:32   ` Christoph Hellwig
2024-01-24 15:02     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 4/8] block/mq-deadline: use separate insertion lists Jens Axboe
2024-01-23 18:37   ` Bart Van Assche
2024-01-24  9:42   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 5/8] block/bfq: pass in queue directly to bfq_insert_request() Jens Axboe
2024-01-23 18:38   ` Bart Van Assche
2024-01-24  9:46   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 6/8] block/bfq: serialize request dispatching Jens Axboe
2024-01-23 18:40   ` Bart Van Assche
2024-01-23 19:14     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 7/8] block/bfq: skip expensive merge lookups if contended Jens Axboe
2024-01-23 18:44   ` Bart Van Assche
2024-01-23 19:14     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 8/8] block/bfq: use separate insertion lists Jens Axboe
2024-01-23 18:47   ` Bart Van Assche
2024-01-23 19:18     ` Jens Axboe
2024-01-23 20:03 ` Oleksandr Natalenko [this message]
2024-01-23 22:14   ` [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2313676.ElGaqSPkdT@natalenko.name \
    --to=oleksandr@natalenko.name \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.