Re: [LSF/MM/BPF TOPIC] Improving Zoned Storage Support

From: Jens Axboe <axboe@kernel.dk>
To: Bart Van Assche <bvanassche@acm.org>,
	Damien Le Moal <dlemoal@kernel.org>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [LSF/MM/BPF TOPIC] Improving Zoned Storage Support
Date: Wed, 17 Jan 2024 14:02:51 -0700
Message-ID: <207a985d-ad4e-4cad-ac07-961633967bfc@kernel.dk>
In-Reply-To: <9f4a6b8a-1c17-46b7-8344-cbf4bcb406ab@kernel.dk>

On 1/17/24 1:20 PM, Jens Axboe wrote:
> On 1/17/24 1:18 PM, Bart Van Assche wrote:
>> On 1/17/24 12:06, Jens Axboe wrote:
>>> Case in point, I spent 10 min hacking up some smarts on the insertion
>>> and dispatch side, and then we get:
>>>
>>> IOPS=2.54M, BW=1240MiB/s, IOS/call=32/32
>>>
>>> or about a 63% improvement when running the _exact same thing_. Looking
>>> at profiles:
>>>
>>> -   13.71%  io_uring  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>>>
>>> reducing the > 70% of locking contention down to ~14%. No change in data
>>> structures, just an ugly hack that:
>>>
>>> - Serializes dispatch, no point having someone hammer on dd->lock for
>>>    dispatch when already running
>>> - Serialize insertions, punt to one of N buckets if insertion is already
>>>    busy. Current insertion will notice someone else did that, and will
>>>    prune the buckets and re-run insertion.
>>>
>>> And while I seriously doubt that my quick hack is 100% fool proof, it
>>> works as a proof of concept. If we can get that kind of reduction with
>>> minimal effort, well...
>>
>> If nobody else beats me to it then I will look into using separate
>> locks in the mq-deadline scheduler for insertion and dispatch.
> 
> That's not going to help by itself, as most of the contention (as I
> showed in the profile trace in the email) is from dispatch competing
> with itself, and not necessarily dispatch competing with insertion. And
> not sure how that would even work, as insert and dispatch are working on
> the same structures.
> 
> Do some proper analysis first, then that will show you where the problem
> is.

Here's a quick'n dirty hack that brings it from 1.56M to:

IOPS=3.50M, BW=1711MiB/s, IOS/call=32/32

by just doing something stupid - if someone is already dispatching, then
don't dispatch anything. This clearly shows that it's just dispatch
contention. But a ~124% improvement (1.56M -> 3.50M) from looking at the
initial profile I sent and hacking up something stupid in a few minutes
does show that there's a ton of low hanging fruit here.

This is run on nvme, so there's going to be lots of hardware queues.
This may even be worth solving in blk-mq rather than trying to hack
around it in the scheduler, since blk-mq has no idea that mq-deadline is
serializing all hardware queues like this. Or we just solve it in the io
scheduler, since that's the one with the knowledge.
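
To play with the "if someone is already dispatching, don't dispatch" idea
outside the kernel, here is a minimal userspace sketch using C11 atomics.
It is only an illustration under stated assumptions: struct sched_state,
try_begin_dispatch(), end_dispatch() and dispatch_one() are hypothetical
names and not part of mq-deadline, and the pending counter stands in for
the real request lists (which in the kernel are still protected by
dd->lock, since insertion touches them too).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for scheduler state; not the kernel structure. */
struct sched_state {
	atomic_bool dispatching;	/* true while one caller owns dispatch */
	int pending;			/* toy queue depth, touched only by the owner */
};

/* Returns true if the caller became the single dispatcher. */
static bool try_begin_dispatch(struct sched_state *s)
{
	/* Cheap read first, so losers do not bounce the cacheline with a write. */
	if (atomic_load_explicit(&s->dispatching, memory_order_relaxed))
		return false;

	/* The exchange returns the old value; false means we won the race. */
	return !atomic_exchange_explicit(&s->dispatching, true,
					 memory_order_acquire);
}

/* Drops dispatch ownership so the next caller can get in. */
static void end_dispatch(struct sched_state *s)
{
	atomic_store_explicit(&s->dispatching, false, memory_order_release);
}

/* Returns 1 if one item was dispatched, 0 if idle or someone else is at it. */
static int dispatch_one(struct sched_state *s)
{
	int got = 0;

	if (!try_begin_dispatch(s))
		return 0;
	if (s->pending > 0) {
		s->pending--;
		got = 1;
	}
	end_dispatch(s);
	return got;
}
```

A caller that loses the race simply returns empty-handed instead of
spinning on a lock, which is the whole point of the hack. Whether
skipped callers need a re-run to avoid stalls is left open here.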

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..822337521fc5 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -80,6 +80,11 @@ struct dd_per_prio {
 };
 
 struct deadline_data {
+	spinlock_t lock;
+	spinlock_t zone_lock ____cacheline_aligned_in_smp;
+
+	unsigned long dispatch_state;
+
 	/*
 	 * run time data
 	 */
@@ -100,9 +105,6 @@ struct deadline_data {
 	int front_merges;
 	u32 async_depth;
 	int prio_aging_expire;
-
-	spinlock_t lock;
-	spinlock_t zone_lock;
 };
 
 /* Maps an I/O priority class to a deadline scheduler priority. */
@@ -600,6 +602,10 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
+	if (test_bit(0, &dd->dispatch_state) ||
+	    test_and_set_bit(0, &dd->dispatch_state))
+		return NULL;
+
 	spin_lock(&dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
@@ -616,6 +622,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	}
 
 unlock:
+	clear_bit(0, &dd->dispatch_state);
 	spin_unlock(&dd->lock);
 
 	return rq;

-- 
Jens Axboe