linux-block.vger.kernel.org archive mirror
From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCHSET v3] mq-deadline and BFQ scalability improvements
Date: Tue, 23 Jan 2024 21:03:51 +0100
Message-ID: <2313676.ElGaqSPkdT@natalenko.name>
In-Reply-To: <20240123174021.1967461-1-axboe@kernel.dk>


Hello.

On Tuesday, 23 January 2024 at 18:34:12 CET, Jens Axboe wrote:
> Hi,
> 
> It's no secret that mq-deadline doesn't scale very well - it was
> originally done as a proof-of-concept conversion from deadline, when the
> blk-mq multiqueue layer was written. In the single queue world, the
> queue lock protected the IO scheduler as well, and mq-deadline simply
> adopted an internal dd->lock to fill the place of that.
> 
> While mq-deadline works under blk-mq and doesn't suffer any scaling
> issues on that side, as soon as request insertion or dispatch is done, we're
> hitting the per-queue dd->lock quite intensely. On a basic test box
> with 16 cores / 32 threads, running a number of IO intensive threads
> on either null_blk (single hw queue) or nvme0n1 (many hw queues) shows
> this quite easily:
> 
> The test case looks like this:
> 
> fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
> 	--ioengine=io_uring --norandommap --runtime=60 --rw=randread \
> 	--thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
> 	--iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
> 	--name=scaletest --filename=/dev/$DEV
> 
> and is being run on a desktop 7950X box: 32 threads each doing 4 IOs,
> for a total queue depth of 128.
> 
> Results before the patches:
> 
> Device		IOPS	sys	contention	diff
> ====================================================
> null_blk	879K	89%	93.6%		-
> nvme0n1		901K	86%	94.5%		-
> 
> which looks pretty miserable, most of the time is spent contending on
> the queue lock.
> 
> This patchset attempts to address that by:
> 
> 1) Serializing dispatch of requests. If we fail dispatching, rely on
>    the next completion to dispatch the next one. This could potentially
>    reduce the overall depth achieved on the device side, however even
>    for the heavily contended test I'm running here, no observable
>    change is seen. This is patch 2.
> 
> 2) Skip expensive merge lookups if the queue lock is already contended.
>    The reasoning is provided in that patch, patch 3.
> 
> 3) Serialize request insertion, using separate internal insertion lists
>    to temporarily store requests until insertion can proceed. This is
>    patch 4.
> 
> With that in place, the same test case now does:
> 
> Device		IOPS	sys	contention	diff
> ====================================================
> null_blk	2867K	11.1%	~6.0%		+226%
> nvme0n1		3162K	 9.9%	~5.0%		+250%
> 
> and while that doesn't completely eliminate the lock contention, it's
> oodles better than what it was before. The throughput increase shows
> that nicely, with more than a 200% improvement for both cases.
> 
> Since the above is very high IOPS testing to show the scalability
> limitations, I also ran this on a more normal drive on a Dell R7525 test
> box. It doesn't change the performance there (around 66K IOPS), but
> it does reduce the system time required to do the IO from 12.6% to
> 10.7%, or about 15% less time spent in the kernel.
> 
>  block/mq-deadline.c | 178 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 161 insertions(+), 17 deletions(-)
> 
> Since v2:
> 	- Update mq-deadline insertion locking optimization patch to
> 	  use Bart's variant instead. This also drops the per-cpu
> 	  buckets and hence resolves the need to potentially make
> 	  the number of buckets dependent on the host.
> 	- Use locking bitops
> 	- Add similar series for BFQ, with good results as well
> 	- Rebase on 6.8-rc1
> 

I've been running this for a couple of days with no issues, hence for the series:

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Thank you.

-- 
Oleksandr Natalenko (post-factum)


Thread overview: 30+ messages
2024-01-23 17:34 [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe
2024-01-23 17:34 ` [PATCH 1/8] block/mq-deadline: pass in queue directly to dd_insert_request() Jens Axboe
2024-01-24  9:21   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 2/8] block/mq-deadline: serialize request dispatching Jens Axboe
2024-01-23 18:36   ` Bart Van Assche
2024-01-23 19:13     ` Jens Axboe
2024-01-24  9:31       ` Christoph Hellwig
2024-01-24 15:00         ` Jens Axboe
2024-01-24  9:29   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 3/8] block/mq-deadline: skip expensive merge lookups if contended Jens Axboe
2024-01-24  9:31   ` Johannes Thumshirn
2024-01-24  9:32   ` Christoph Hellwig
2024-01-24 15:02     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 4/8] block/mq-deadline: use separate insertion lists Jens Axboe
2024-01-23 18:37   ` Bart Van Assche
2024-01-24  9:42   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 5/8] block/bfq: pass in queue directly to bfq_insert_request() Jens Axboe
2024-01-23 18:38   ` Bart Van Assche
2024-01-24  9:46   ` Johannes Thumshirn
2024-01-23 17:34 ` [PATCH 6/8] block/bfq: serialize request dispatching Jens Axboe
2024-01-23 18:40   ` Bart Van Assche
2024-01-23 19:14     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 7/8] block/bfq: skip expensive merge lookups if contended Jens Axboe
2024-01-23 18:44   ` Bart Van Assche
2024-01-23 19:14     ` Jens Axboe
2024-01-23 17:34 ` [PATCH 8/8] block/bfq: use separate insertion lists Jens Axboe
2024-01-23 18:47   ` Bart Van Assche
2024-01-23 19:18     ` Jens Axboe
2024-01-23 20:03 ` Oleksandr Natalenko [this message]
2024-01-23 22:14   ` [PATCHSET v3] mq-deadline and BFQ scalability improvements Jens Axboe
