From: Ming Lei <ming.lei@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
linux-block@vger.kernel.org, dm-devel@lists.linux.dev
Subject: Re: [PATC] block: update queue limits atomically
Date: Wed, 19 Mar 2025 09:22:24 +0800
Message-ID: <Z9ocUCrvXQRJHFVY@fedora>
In-Reply-To: <d5193df0-5944-8cf6-7eb6-26adf191269e@redhat.com>

On Tue, Mar 18, 2025 at 04:31:35PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 18 Mar 2025, Ming Lei wrote:
>
> > On Tue, Mar 18, 2025 at 03:26:10PM +0100, Mikulas Patocka wrote:
> > > The block limits may be read while they are being modified. The statement
> >
> > It is supposed to not be so for IO path, that is why queue is usually down
> > or frozen when updating limit.
>
> The limits are read at some points when constructing a bio - for example
> bio_integrity_add_page, bvec_try_merge_hw_page, bio_integrity_map_user.

For the request-based code path there is no such issue, because the queue
usage counter is grabbed.

It should be a device-mapper-specific issue, because the above interfaces
may not be called from dm_submit_bio().

One fix is to make sure that the queue usage counter is grabbed in dm's
bio/clone submission code path.
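
To illustrate the idea, here is a minimal userspace sketch of "read limits
only while holding a usage reference". It is not kernel code: model_queue,
model_queue_enter() and the other model_* names are hypothetical stand-ins,
and the updater simply fails instead of sleeping until the counter drains,
which is what a real freeze would do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a request queue; not kernel code. */
struct model_queue {
	atomic_int usage;         /* models the queue usage counter */
	atomic_bool frozen;       /* set while an updater wants exclusive access */
	unsigned int max_sectors; /* one representative limit */
};

static void model_queue_init(struct model_queue *q, unsigned int max_sectors)
{
	atomic_init(&q->usage, 0);
	atomic_init(&q->frozen, false);
	q->max_sectors = max_sectors;
}

/* Reader side: take a usage reference, back off if the queue is frozen. */
static bool model_queue_enter(struct model_queue *q)
{
	atomic_fetch_add(&q->usage, 1);
	if (atomic_load(&q->frozen)) {
		atomic_fetch_sub(&q->usage, 1);
		return false;
	}
	return true;
}

static void model_queue_exit(struct model_queue *q)
{
	int old = atomic_fetch_sub(&q->usage, 1);

	assert(old > 0); /* exit must pair with a successful enter */
}

/* Updater side: change the limit only if no reader holds a reference. */
static bool model_try_update_limit(struct model_queue *q, unsigned int v)
{
	bool ok = false;

	atomic_store(&q->frozen, true);
	if (atomic_load(&q->usage) == 0) {
		q->max_sectors = v;
		ok = true;
	}
	atomic_store(&q->frozen, false);
	return ok;
}

/* Submission path: the limit is read only between enter and exit. */
static int model_submit(struct model_queue *q, unsigned int *seen)
{
	if (!model_queue_enter(q))
		return -1;
	*seen = q->max_sectors;
	model_queue_exit(q);
	return 0;
}
```

In this model an update attempted while a submitter is between enter and
exit is refused, so a reader never sees a half-written limit.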
>
> > For other cases, limit lock can be held for sync the read/write.
> >
> > Or you have cases not covered by both queue freeze and limit lock?
>
> For example, device mapper reads the limits of the underlying devices
> without holding any lock (dm_set_device_limits,
dm_set_device_limits() needs to be fixed by holding the limits lock.
> __process_abnormal_io,
> __max_io_len).
These two are called with the queue usage counter grabbed, so they should
be fine.
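
For the dm_set_device_limits() case, the shape of the fix would be roughly
as follows. This is a hedged userspace sketch, not a patch: model_dev,
model_limits and the model_* helpers are hypothetical, and a simple
atomic_flag spinlock stands in for the limits lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical cut-down limits structure; not the kernel's queue_limits. */
struct model_limits {
	unsigned int max_sectors;
	unsigned int io_min;
};

struct model_dev {
	atomic_flag lock;           /* stand-in for the limits lock */
	struct model_limits limits;
};

static void model_dev_init(struct model_dev *d, unsigned int max_sectors,
			   unsigned int io_min)
{
	atomic_flag_clear(&d->lock);
	d->limits.max_sectors = max_sectors;
	d->limits.io_min = io_min;
}

static void model_lock(struct model_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Writer: the whole structure is replaced under the lock. */
static void model_commit_limits(struct model_dev *d,
				const struct model_limits *new_lim)
{
	model_lock(d);
	d->limits = *new_lim;
	model_unlock(d);
}

/* Reader: copy a consistent snapshot under the same lock. */
static struct model_limits model_snapshot_limits(struct model_dev *d)
{
	struct model_limits snap;

	model_lock(d);
	snap = d->limits;
	model_unlock(d);
	return snap;
}
```

A stacking driver would then derive its own limits from the snapshot rather
than from the live structure. Note that in the kernel a sleeping lock cannot
be taken from interrupt context, which is the concern raised for
disable_discard and disable_write_zeroes.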
> It also writes the limits in the I/O path -
> disable_discard, disable_write_zeroes - you couldn't easily lock it here
> because it happens in the interrupt context.
IMO that is a bad implementation. Why does device mapper have to clear
it in bio->end_io() or the request's blk_mq_ops->complete()?
>
> I'm not sure how many other kernel subsystems do it and whether they could
> all be converted to locking.
Most request-based drivers should already have been converted to the new API.
I guess only device mapper, raid, and other bio-based drivers carry this
kind of risk.
>
> > > "q->limits = *lim" is not really atomic. The compiler may turn it into
> > > memcpy (clang does).
> > >
> > > On x86-64, the kernel uses the "rep movsb" instruction for memcpy - it is
> > > optimized on modern CPUs, but it is not atomic, it may be interrupted at
> > > any byte boundary - and if it is interrupted, the readers may read
> > > garbage.
> > >
> > > On sparc64, there's an instruction that zeroes a cache line without
> > > reading it from memory. The kernel memcpy implementation uses it (see
> > > b3a04ed507bf) to avoid loading the destination buffer from memory. The
> > > problem is that if we copy a block of data to q->limits and someone reads
> > > it at the same time, the reader may read zeros.
> > >
> > > This commit changes it to use WRITE_ONCE, so that individual words are
> > > updated atomically.
> >
> > It isn't necessary, for this particular problem, it is also fragile to
> > provide atomic word update in this low level way, such as, what if
> > sizeof(struct queue_limits) isn't 8byte aligned?
>
> struct queue_limits contains two "unsigned long" fields, so it must be
> aligned on "unsigned long" boundary.
>
> In order to make it future-proof, we could use __alignof__(struct
> queue_limits) to determine the size of the update step.
Yeah, it looks fine, but I feel it is still fragile, and I am not sure it is
an acceptable solution.
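
For reference, the word-by-word update with an alignment guard could look
roughly like this in userspace C. It is only a sketch under assumptions:
sketch_limits is a made-up, cut-down structure, a volatile store stands in
for WRITE_ONCE, and instead of choosing the step from the alignment it just
fails the build when the structure is no longer word-aligned or word-sized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down limits structure; not the kernel's queue_limits. */
struct sketch_limits {
	unsigned long seg_boundary_mask;
	unsigned long virt_boundary_mask;
	unsigned int max_sectors;
	unsigned int io_min;
};

typedef unsigned long sketch_word_t;

/* Break the build if word-sized stores would stop covering the structure. */
static_assert(_Alignof(struct sketch_limits) % _Alignof(sketch_word_t) == 0,
	      "limits must be aligned to the update step");
static_assert(sizeof(struct sketch_limits) % sizeof(sketch_word_t) == 0,
	      "limits size must be a multiple of the update step");

#define SKETCH_NWORDS (sizeof(struct sketch_limits) / sizeof(sketch_word_t))

/* The union lets the same storage be written as words and read as fields. */
union sketch_limits_box {
	struct sketch_limits lim;
	sketch_word_t words[SKETCH_NWORDS];
};

/* Copy *src into dst one aligned word at a time using volatile stores. */
static void sketch_limits_store(union sketch_limits_box *dst,
				const struct sketch_limits *src)
{
	union sketch_limits_box tmp;
	volatile sketch_word_t *d = dst->words;
	size_t i;

	tmp.lim = *src;
	for (i = 0; i < SKETCH_NWORDS; i++)
		d[i] = tmp.words[i];
}
```

Even then, only each individual word is updated as a unit. A concurrent
reader can still observe a mix of old and new words, which is part of why
this feels fragile compared with serializing readers and writers.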
Thanks,
Ming