All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, John Garry <john.garry@huawei.com>,
	Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH 5/5] blk-mq: Replace tags->lock with SRCU for tag iterators
Date: Thu, 7 Aug 2025 10:12:32 +0800	[thread overview]
Message-ID: <aJQLkDTiHVboo6CT@fedora> (raw)
In-Reply-To: <700b6ab7-c0c0-58ea-fccf-fd9c3d806d59@huaweicloud.com>

On Thu, Aug 07, 2025 at 09:23:24AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/08/06 21:28, Ming Lei 写道:
> > On Wed, Aug 06, 2025 at 05:21:32PM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2025/08/01 19:44, Ming Lei 写道:
> > > > Replace the spinlock in blk_mq_find_and_get_req() with an SRCU read lock
> > > > around the tag iterators.
> > > > 
> > > > This is done by:
> > > > 
> > > > - Holding the SRCU read lock in blk_mq_queue_tag_busy_iter(),
> > > > blk_mq_tagset_busy_iter(), and blk_mq_hctx_has_requests().
> > > > 
> > > > - Removing the now-redundant tags->lock from blk_mq_find_and_get_req().
> > > > 
> > > > This change improves performance by replacing a spinlock with a more
> > > > scalable SRCU lock, and fixes lockup issue in scsi_host_busy() in case of
> > > > shost->host_blocked.
> > > > 
> > > > Meantime it becomes possible to use blk_mq_in_driver_rw() for io
> > > > accounting.
> > > > 
> > > > Signed-off-by: Ming Lei<ming.lei@redhat.com>
> > > > ---
> > > >    block/blk-mq-tag.c | 12 ++++++++----
> > > >    block/blk-mq.c     | 24 ++++--------------------
> > > >    2 files changed, 12 insertions(+), 24 deletions(-)
> > > 
> > > I think it's not good to use blk_mq_in_driver_rw() for io accounting, we
> > > start io accounting from blk_account_io_start(), where such io is not in
> > > driver yet.
> > 
> > In blk_account_io_start(), the current IO is _not_ taken into account in
> > update_io_ticks() yet.
> 
> However, this is exactly what we did from coding for a long time, for
> example, consider just one IO:
> 
> t1: blk_account_io_start
> t2: blk_mq_start_request
> t3: blk_account_io_done
> 
> The update_io_ticks() is called from blk_account_io_start() and
> blk_account_io_done(), the time (t3 - t1) will be accounted into
> io_ticks.

That still may not be correct, please see "Documentation/block/stat.rst":

```
io_ticks        milliseconds  total time this block device has been active
```

What I meant is that it doesn't matter wrt. "io accounting from
blk_account_io_start()", because the current io is excluded from `inflight ios` in
update_io_ticks() from blk_account_io_start().

> 
> And if we consider more IO, there will be a mess:
> 
> t1: IO a: blk_account_io_start
> t2: IO b: blk_account_io_start
> t3: IO a: blk_mq_start_request
> t4: IO b: blk_mq_start_request
> t5: IO a: blk_account_io_done
> t6: IO b: blk_account_io_done
> 
> In the old cases that IO is inflight untill blk_mq_start_request, the
> io_ticks accounted is t6 - t2, which is werid. That's the reason I
> changed the IO accouting, and consider IO inflight from
> blk_account_io_start, and this will unify all the fields in iostat.

In reality implementation may include odd things, but the top thing is that
what/how 'io_ticks' should be defined in theory? same with util%.

> > 
> > Also please look at 'man iostat':
> > 
> >      %util  Percentage  of  elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device
> >             saturation occurs when this value is close to 100% for devices serving requests serially.  But for devices serving requests  in
> >             parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
> > 
> > which shows %util in device level, not from queuing IO to complete io from device.
> > 
> > That said the current approach for counting inflight IO via percpu counter
> > from blk_account_io_start() is not correct from %util viewpoint from
> > request based driver.
> > 
> 
> I'll prefer to update the documents, on the one hand, util is not

Can we update every distributed iostat's man page? And we can't refresh
every user's mind about %util's definition in short time.

Also what/how should we update the document to? which is one serious thing.

Thanks, 
Ming


  reply	other threads:[~2025-08-07  2:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-01 11:44 [PATCH 0/5] blk-mq: Replace tags->lock with SRCU for tag iterators Ming Lei
2025-08-01 11:44 ` [PATCH 1/5] blk-mq: Move flush queue allocation into blk_mq_init_hctx() Ming Lei
2025-08-04  6:06   ` Yu Kuai
2025-08-04  7:07   ` Hannes Reinecke
2025-08-01 11:44 ` [PATCH 2/5] blk-mq: Pass tag_set to blk_mq_free_rq_map/tags Ming Lei
2025-08-04  7:08   ` Hannes Reinecke
2025-08-05  7:48   ` Yu Kuai
2025-08-01 11:44 ` [PATCH 3/5] blk-mq: Defer freeing of tags page_list to SRCU callback Ming Lei
2025-08-04  7:09   ` Hannes Reinecke
2025-08-06  9:15   ` Yu Kuai
2025-08-01 11:44 ` [PATCH 4/5] blk-mq: Defer freeing flush queue " Ming Lei
2025-08-04  7:11   ` Hannes Reinecke
2025-08-06  9:17   ` Yu Kuai
2025-08-01 11:44 ` [PATCH 5/5] blk-mq: Replace tags->lock with SRCU for tag iterators Ming Lei
2025-08-04  6:30   ` Yu Kuai
2025-08-04 11:32     ` Ming Lei
2025-08-05  8:33       ` Yu Kuai
2025-08-05  8:38         ` Yu Kuai
2025-08-05  8:48           ` Ming Lei
2025-08-06  1:06             ` Yu Kuai
2025-08-06  8:59               ` Ming Lei
2025-08-06  9:06                 ` Yu Kuai
2025-08-04  7:13   ` Hannes Reinecke
2025-08-04 11:35     ` Ming Lei
2025-08-04 11:45       ` Hannes Reinecke
2025-08-06  9:21   ` Yu Kuai
2025-08-06 13:28     ` Ming Lei
2025-08-07  1:23       ` Yu Kuai
2025-08-07  2:12         ` Ming Lei [this message]
2025-08-07  3:44           ` Yu Kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJQLkDTiHVboo6CT@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=john.garry@huawei.com \
    --cc=linux-block@vger.kernel.org \
    --cc=sathya.prakash@broadcom.com \
    --cc=yukuai1@huaweicloud.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.