public inbox for linux-block@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	linux-block@vger.kernel.org, dm-devel@lists.linux.dev
Subject: Re: [PATCH] block: update queue limits atomically
Date: Wed, 19 Mar 2025 09:22:24 +0800
Message-ID: <Z9ocUCrvXQRJHFVY@fedora>
In-Reply-To: <d5193df0-5944-8cf6-7eb6-26adf191269e@redhat.com>

On Tue, Mar 18, 2025 at 04:31:35PM +0100, Mikulas Patocka wrote:
> 
> 
> On Tue, 18 Mar 2025, Ming Lei wrote:
> 
> > On Tue, Mar 18, 2025 at 03:26:10PM +0100, Mikulas Patocka wrote:
> > > The block limits may be read while they are being modified. The statement
> > 
> > It is supposed to not be so for IO path, that is why queue is usually down
> > or frozen when updating limit.
> 
> The limits are read at some points when constructing a bio - for example 
> bio_integrity_add_page, bvec_try_merge_hw_page, bio_integrity_map_user.

For the request-based code path there is no such issue, because the queue
usage counter is grabbed.

It should be a device mapper specific issue, because the above interfaces
may not be called from dm_submit_bio().

One fix is to make sure that the queue usage counter is grabbed in dm's
bio/clone submission code path.
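
To make the idea concrete, here is a minimal user-space sketch of a usage
counter that keeps limit updates away from in-flight submitters. It is only a
model under stated assumptions: every name in it (struct model_queue,
model_queue_enter(), model_queue_exit(), model_update_limits()) is hypothetical
and is not the kernel's or dm's API, and where real code would wait for the
counter to drain, the sketch simply refuses the update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not a kernel structure. */
struct model_queue {
	atomic_int usage;         /* submitters currently inside the queue */
	atomic_bool frozen;       /* set while an update is in progress */
	unsigned int max_sectors; /* the "limit" being protected */
};

/* Grab the usage counter; fails while the queue is frozen. */
static bool model_queue_enter(struct model_queue *q)
{
	atomic_fetch_add(&q->usage, 1);
	if (atomic_load(&q->frozen)) {
		atomic_fetch_sub(&q->usage, 1);
		return false;
	}
	return true;
}

/* Drop the usage counter taken by model_queue_enter(). */
static void model_queue_exit(struct model_queue *q)
{
	int prev = atomic_fetch_sub(&q->usage, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Update the limit only when no submitter holds the counter.
 * Assumption: a real implementation would block until the counter
 * drains; this sketch returns false instead.
 */
static bool model_update_limits(struct model_queue *q, unsigned int max_sectors)
{
	atomic_store(&q->frozen, true);
	if (atomic_load(&q->usage) != 0) {
		atomic_store(&q->frozen, false);
		return false;
	}
	q->max_sectors = max_sectors;
	atomic_store(&q->frozen, false);
	return true;
}
```

A submission path in this model would bracket every read of max_sectors with
model_queue_enter() and model_queue_exit(), so it can never observe a
half-finished update.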

> 
> > For other cases, limit lock can be held for sync the read/write.
> > 
> > Or you have cases not covered by both queue freeze and limit lock?
> 
> For example, device mapper reads the limits of the underlying devices 
> without holding any lock (dm_set_device_limits,

dm_set_device_limits() needs to be fixed by holding the limit lock.
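
As a rough illustration of what "holding the limit lock" buys the reader, the
following user-space sketch takes a consistent snapshot of a limits structure
under a mutex. It is not the dm or block layer code: a pthread mutex stands in
for the limit lock, and struct lq_limits, struct lq_queue, lq_read_limits() and
lq_commit_limits() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, much reduced stand-in for the queue limits. */
struct lq_limits {
	unsigned int max_sectors;
	unsigned int io_min;
};

/* Hypothetical queue: the limits plus the lock that guards them. */
struct lq_queue {
	pthread_mutex_t limits_lock;
	struct lq_limits limits;
};

/* Reader side: copy the limits out while holding the lock. */
static struct lq_limits lq_read_limits(struct lq_queue *q)
{
	struct lq_limits snap;
	int rc;

	rc = pthread_mutex_lock(&q->limits_lock);
	assert(rc == 0);
	snap = q->limits;
	rc = pthread_mutex_unlock(&q->limits_lock);
	assert(rc == 0);
	(void)rc;
	return snap;
}

/* Writer side: replace the limits as a whole under the same lock. */
static void lq_commit_limits(struct lq_queue *q, const struct lq_limits *lim)
{
	int rc;

	rc = pthread_mutex_lock(&q->limits_lock);
	assert(rc == 0);
	q->limits = *lim;
	rc = pthread_mutex_unlock(&q->limits_lock);
	assert(rc == 0);
	(void)rc;
}
```

Because both sides take the same lock, the struct assignment no longer needs
to be atomic by itself; a reader sees either the old limits or the new ones.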


> __process_abnormal_io, 
> __max_io_len).

The two are called with the queue usage counter grabbed, so it should be fine.


> It also writes the limits in the I/O path - 
> disable_discard, disable_write_zeroes - you couldn't easily lock it here 
> because it happens in the interrupt context.

IMO that is a bad implementation: why does device mapper have to clear
it in bio->end_io() or the request's blk_mq_ops->complete()?

> 
> I'm not sure how many other kernel subsystems do it and whether they could 
> all be converted to locking.

Most request-based drivers should have been converted to the new API.

I guess only device mapper, raid, and other bio-based drivers carry this
kind of risk.

> 
> > > "q->limits = *lim" is not really atomic. The compiler may turn it into
> > > memcpy (clang does).
> > > 
> > > On x86-64, the kernel uses the "rep movsb" instruction for memcpy - it is
> > > optimized on modern CPUs, but it is not atomic, it may be interrupted at
> > > any byte boundary - and if it is interrupted, the readers may read
> > > garbage.
> > > 
> > > On sparc64, there's an instruction that zeroes a cache line without
> > > reading it from memory. The kernel memcpy implementation uses it (see
> > > b3a04ed507bf) to avoid loading the destination buffer from memory. The
> > > problem is that if we copy a block of data to q->limits and someone reads
> > > it at the same time, the reader may read zeros.
> > > 
> > > This commit changes it to use WRITE_ONCE, so that individual words are
> > > updated atomically.
> > 
> > It isn't necessary, for this particular problem, it is also fragile to
> > provide atomic word update in this low level way, such as, what if
> > sizeof(struct queue_limits) isn't 8byte aligned?
> 
> struct queue_limits contains two "unsigned long" fields, so it must be 
> aligned on "unsigned long" boundary.
> 
> In order to make it future-proof, we could use __alignof__(struct 
> queue_limits) to determine the size of the update step.

Yeah, it looks fine, but I feel it is still fragile, and I am not sure it is
an accepted solution.
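
For readers following the proposal, a word-at-a-time copy whose step is tied
to the structure's alignment might look roughly like the user-space sketch
below. It is not the patch under discussion: struct wl_limits and
wl_copy_wordwise() are hypothetical, a volatile store stands in for
WRITE_ONCE, the may_alias attribute is a GCC/Clang extension, and the sketch
assumes that an aligned word-sized store is not torn on the target
architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct queue_limits with mixed field widths. */
struct wl_limits {
	unsigned long seg_boundary_mask;
	unsigned long virt_boundary_mask;
	unsigned int max_sectors;
	unsigned int io_min;
};

/* Word type allowed to alias other types (GCC/Clang extension). */
typedef unsigned long __attribute__((may_alias)) wl_word_t;

/*
 * The copy step is one unsigned long, so refuse to build if the
 * structure's alignment ever stops matching that step.
 */
_Static_assert(_Alignof(struct wl_limits) == _Alignof(unsigned long),
	       "copy step must match the structure alignment");

/* Copy src to dst one aligned word at a time. */
static void wl_copy_wordwise(struct wl_limits *dst, const struct wl_limits *src)
{
	volatile wl_word_t *d = (volatile wl_word_t *)dst;
	const wl_word_t *s = (const wl_word_t *)src;
	size_t i;

	for (i = 0; i < sizeof(*dst) / sizeof(wl_word_t); i++)
		d[i] = s[i];	/* one word-sized volatile store per step */
}
```

Even in this form a concurrent reader can observe a mix of old and new words,
so only individual fields are consistent, not the structure as a whole. That
is one way to read the "fragile" concern above.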



Thanks,
Ming

