From: Mike Snitzer <snitzer@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
Benjamin Marzinski <bmarzins@redhat.com>,
dm-devel@lists.linux.dev
Subject: Re: dm-integrity: align the outgoing bio in integrity_recheck
Date: Fri, 22 Mar 2024 11:53:20 -0400 [thread overview]
Message-ID: <Zf2pcG_X1VyBimfV@redhat.com> (raw)
In-Reply-To: <Zf1zfjuzb9be0Uwz@fedora>
On Fri, Mar 22 2024 at 8:03P -0400,
Ming Lei <ming.lei@redhat.com> wrote:
> On Fri, Mar 22, 2024 at 11:30:33AM +0100, Mikulas Patocka wrote:
> >
> >
> > On Fri, 22 Mar 2024, Ming Lei wrote:
> >
> > > On Thu, Mar 21, 2024 at 05:48:45PM +0100, Mikulas Patocka wrote:
> > > > It may be possible to set up dm-integrity with smaller sector size than
> > > > the logical sector size of the underlying device. In this situation,
> > > > dm-integrity guarantees that the outgoing bios have the same alignment as
> > > > incoming bios (so, if you create a filesystem with 4k block size,
> > > > dm-integrity would send 4k-aligned bios to the underlying device).
> > > >
> > > > This guarantee was broken when integrity_recheck was implemented.
> > > > integrity_recheck sends bio that is aligned to ic->sectors_per_block. So
> > > > if we set up integrity with 512-byte sector size on a device with logical
> > > > block size 4k, we would be sending unaligned bio. This triggered a bug in
> > > > one of our internal tests.
> > > >
> > > > This commit fixes it - it determines what's the actual alignment of the
> > > > incoming bio and then makes sure that the outgoing bio in
> > > > integrity_recheck has the same alignment.
> > > >
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
> > > > Cc: stable@vger.kernel.org
> > > >
> > > > ---
> > > > drivers/md/dm-integrity.c | 12 ++++++++++--
> > > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > Index: linux-2.6/drivers/md/dm-integrity.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/drivers/md/dm-integrity.c 2024-03-21 14:25:45.000000000 +0100
> > > > +++ linux-2.6/drivers/md/dm-integrity.c 2024-03-21 17:47:39.000000000 +0100
> > > > @@ -1699,7 +1699,6 @@ static noinline void integrity_recheck(s
> > > > struct bio_vec bv;
> > > > sector_t sector, logical_sector, area, offset;
> > > > struct page *page;
> > > > - void *buffer;
> > > >
> > > > get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
> > > > dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset,
> > > > @@ -1708,13 +1707,14 @@ static noinline void integrity_recheck(s
> > > > logical_sector = dio->range.logical_sector;
> > > >
> > > > page = mempool_alloc(&ic->recheck_pool, GFP_NOIO);
> > > > - buffer = page_to_virt(page);
> > > >
> > > > __bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
> > > > unsigned pos = 0;
> > > >
> > > > do {
> > > > + sector_t alignment;
> > > > char *mem;
> > > > + char *buffer = page_to_virt(page);
> > > > int r;
> > > > struct dm_io_request io_req;
> > > > struct dm_io_region io_loc;
> > > > @@ -1727,6 +1727,14 @@ static noinline void integrity_recheck(s
> > > > io_loc.sector = sector;
> > > > io_loc.count = ic->sectors_per_block;
> > > >
> > > > + /* Align the bio to logical block size */
> > > > + alignment = dio->range.logical_sector | bio_sectors(bio) | (PAGE_SIZE >> SECTOR_SHIFT);
> > > > + alignment &= -alignment;
> > >
> > > The above is less readable, :-(
> >
> > It isolates the lowest bit from dio->range.logical_sector,
> > bio_sectors(bio) and (PAGE_SIZE >> SECTOR_SHIFT).
> >
> > See for example this https://www.felixcloutier.com/x86/blsi
>
> Fine, but I have to say such usage isn't popular.
Yeah, at a minimum it should have a comment explaining the
optimization of combining and then getting lsbit.
The non-ffs() optimized gcd() uses the same but comments it:
/* Isolate lsbit of r */
r &= -r;
> > > > + io_loc.sector = round_down(io_loc.sector, alignment);
> > > > + io_loc.count += sector - io_loc.sector;
> > > > + buffer += (sector - io_loc.sector) << SECTOR_SHIFT;
> > > > + io_loc.count = round_up(io_loc.count, alignment);
> > >
> > > I feel the above code isn't very reliable, what we need actually is to
> > > make sure that io's sector & size is aligned with dm's
> > > bdev_logical_block_size(bdev).
> >
> > I thought about using bdev_logical_block_size. But it may be wrong if the
> > device stack is reconfigured. So, I concluded that taking the alignment
> > from the bio would be better.
>
> If logical block becomes mismatched by reconfiguration, the whole DM stack can't work:
>
> - at the beginning, DM is over NVMe(512 bs), DM & NVMe lbs is 512
> - later, nvme is reconfigured and its lbs becomes 4k, but DM's lbs can't
> be updated
> - then unaligned IO is submitted to NVMe
>
> So DM _never_ works with mis-matched logical block size because of
> reconfigure, and same with MD.
At some point we need to trust the queue_limits and DM takes
considerable pain to validate the alignment when a dm-table is
(re)loaded.
But we could get into problems with deep(er) device stacks where an
underlying DM device is reloaded but the upper level devices'
queue_limits aren't restacked. Thankfully, in practice that generally
doesn't occur!
If it were to become a prevalent issue DM would need to grow
validation that DM devices aren't changing their logic_block_size and
overall alignment during runtime.
> > > Yeah, so far the max logical block size is 4k, but it may be increased
> > > in future and you can see the recent lsfmm proposal, so can we force it to be
> > > aligned with bdev_logical_block_size(bdev) here?
> > >
> > > Also can the above change work efficiently in case of 64K PAGE_SIZE?
> >
> > It doesn't work efficiently at all - this piece of code is only run in a
> > pathological case where the user writes into a buffer while reading it (or
> > when he reads multiple blocks into the same buffer), so I optimized it for
> > size, not for performance.
> >
> > But yes, it works with 64K PAGE_SIZE.
>
> Fine, but I still think PAGE_SIZE is hard to follow than logical block
> size.
Thanks for your review. I shared many of your review concerns (the
math isn't apporachable, and why not just use logical_block_size in
queue_limits?).
That said, I'm OK with the code as-is because it has been tested to
fix the reported misalignment issue. BUT, I would like to see
follow-on cleanup in a separate commit, at a minimum there should be
some helpful comments (to address the math and assumptions made,
e.g. this recheck code is not fast path).
Mike
prev parent reply other threads:[~2024-03-22 15:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-21 16:48 [PATCH] dm-integrity: align the outgoing bio in integrity_recheck Mikulas Patocka
2024-03-22 1:41 ` Ming Lei
2024-03-22 10:30 ` Mikulas Patocka
2024-03-22 12:03 ` Ming Lei
2024-03-22 12:32 ` Mikulas Patocka
2024-03-22 12:58 ` Ming Lei
2024-03-22 15:53 ` Mike Snitzer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zf2pcG_X1VyBimfV@redhat.com \
--to=snitzer@kernel.org \
--cc=bmarzins@redhat.com \
--cc=dm-devel@lists.linux.dev \
--cc=ming.lei@redhat.com \
--cc=mpatocka@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.