The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* Re: [PATCH v3 0/6] block: fix integrity offset/length conversions
       [not found]       ` <CADUfDZqkT4g3T6uE=hxt9J6JDMXbJt49rM7_Vgs3EBPdFeuuLw@mail.gmail.com>
@ 2026-05-04 17:55         ` Caleb Sander Mateos
  2026-05-13  2:15           ` Martin K. Petersen
  0 siblings, 1 reply; 3+ messages in thread
From: Caleb Sander Mateos @ 2026-05-04 17:55 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
	Anuj Gupta, linux-block, linux-nvme, linux-scsi, target-devel,
	linux-kernel

On Thu, Apr 23, 2026 at 11:02 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Mon, Apr 20, 2026 at 7:09 PM Martin K. Petersen
> <martin.petersen@oracle.com> wrote:
> >
> >
> > Hi Caleb!
> >
> > > NVM Command Set specification 1.1 section 5.3.3 requires the reference
> > > tag to increment by 1 per logical block, so that seems to determine
> > > the increment unit:
> >
> > SCSI allows PI to be interleaved at intervals smaller than the logical
> > block size. This was done for PI compatibility in mixed environments
> > with both 512[en] and 4Kn disks. Interleaving allows 8 bytes of PI per
> > 512 bytes of data on devices using 4 KB logical blocks. That is the
> > reason why we use the term "integrity interval" instead of assuming
> > logical block size.
>
> Thanks for the explanation, I'm not too familiar with SCSI. I meant to
> refer to integrity intervals in my explanation if they differ from the
> logical block size.
>
> >
> > > The ref tag used for a particular block needs to be consistent. And
> > > since reftag(block N) can be computed as the reftag(M) + N - M if
> > > block N is accessed as part of an I/O that begins at block M, the
> > > function must be of the form reftag(block N) = N + c for some constant
> > > c. Thus, the ref tag seed needs to be computed in units of logical
> > > blocks (integrity intervals); no other unit (e.g. 512-byte sectors)
> > > works.
> >
> > Whoever attaches the PI decides on the seed value. In the case of the
> > block layer it made sense to use block layer sector number since that
> > value is inevitably going to be the same for a future read.
>
> I'm not following "going to be the same for a future read". The block
> can be read back by an I/O with a different starting
> offset/sector/seed, as my example illustrates. When the integrity
> interval size differs from the sector size (512 bytes), mixing the two
> units results in a different ref tag seed for the block depending on
> the starting offset of the I/O.
>
> >
> > Note that with MD, DM, and partitioning in the mix, the sector number
> > seen by whoever submits the I/O is going to be different from the LBAs
> > on the target devices which eventually receive the I/O. Nobody says
> > there is a computable constant offset. Think scattered LVM extent
> > allocations. Or RAID stripes placed at mismatched LBA offsets.
>
> The constant offset relationship still needs to hold over any
> contiguous range of a backing block device that can be accessed by a
> single I/O. For example, with partitions, it's not possible for a
> single I/O to cross a partition boundary, so each partition can have a
> different constant offset between the ref tags and absolute integrity
> interval numbers. With RAID, each shard can have a different constant
> offset. etc.
>
> >
> > > To see the issue with the current approach, consider an example
> > > accessing LBA 1 on a device with a 4 KB block size. If the block is
> > > written as part of a write that begins at LBA 0, its ref tag in the
> > > generated PI will be 1 (sector 0 + 1 integrity interval). If it's
> > > later read by a read starting at LBA 1, its expected ref tag will be 8
> > > (sector 8 + 0 integrity intervals), and the auto-integrity code will
> > > fail the read due to a reftag mismatch.
> >
> > Something is broken, then. Because the ref tag in the received PI should
> > have been remapped to start at 8 in that case.
>
> Ah, I missed the remapping piece. Thanks for pointing that out. I
> guess I was testing with a ublk device that doesn't advertise
> BLK_INTEGRITY_REF_TAG. Since commit 203247c5cb97 ("blk-integrity:
> support arbitrary buffer alignment"), the ref tag is unconditionally
> set in the PI from the (sector) seed, but the remapping is conditional
> on BLK_INTEGRITY_REF_TAG. That explains why I was seeing ref tags in
> the PI that didn't match the integrity interval numbers.
>
> So seems like patch 1 ("block: use integrity interval instead of
> sector as seed") doesn't need a Fixes tag. Still, I'm confused why the
> auto-integrity code bothers setting the seed to the sector number in
> the first place if it's going to be remapped later. Why not just leave
> the seed zeroed?

Martin,
I would appreciate a response here. Would you be okay with patch 1 if
the Fixes tags were dropped? Do you think we can get rid of the ref
tag seed initialization entirely if the ref tags get remapped later
anyways? Even if patch 1 is not required for correctness, patch 2 is a
fix for a separate issue introduced in the 7.1 merge window and has
reviews from Christoph and Anuj. I would prefer not to hold up that
fix over this ref tag seed discussion.

Best,
Caleb

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 0/6] block: fix integrity offset/length conversions
  2026-05-04 17:55         ` [PATCH v3 0/6] block: fix integrity offset/length conversions Caleb Sander Mateos
@ 2026-05-13  2:15           ` Martin K. Petersen
  2026-05-13  2:51             ` Caleb Sander Mateos
  0 siblings, 1 reply; 3+ messages in thread
From: Martin K. Petersen @ 2026-05-13  2:15 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, Anuj Gupta, linux-block, linux-nvme,
	linux-scsi, target-devel, linux-kernel


Hi Caleb!

Sorry about the delay. Been away for a few weeks...

>> So seems like patch 1 ("block: use integrity interval instead of
>> sector as seed") doesn't need a Fixes tag. Still, I'm confused why
>> the auto-integrity code bothers setting the seed to the sector number
>> in the first place if it's going to be remapped later. Why not just
>> leave the seed zeroed?

It adds a bit of extra protection in the sense that it there is one more
parameter that can be validated. The premise of the integrity
infrastructure is that things in the two supplied buffers (data + PI) as
well as the control path (bip in the block layer case plus the SCSI or
NVMe command fields) all need to agree for the I/O to go through.

It is valid to generate the PI starting with 0. But that is
indistinguishable from "the seed value was not initialized".

> I would appreciate a response here. Would you be okay with patch 1 if
> the Fixes tags were dropped?

I am afraid I still don't completely understand why things are broken.

For writes, the meaning of the bip seed is: "This is the value you
should expect in the ref tag for the first integrity interval in the PI
buffer I prepared". With block layer autoprotect, the seed is set before
generating the PI and thus implicitly affects the generation.

When the write operation subsequently reaches the bottom of the stack,
we will check that the first ref tag in the PI buffer matches the
supplied seed value. And then proceed to remap the ref tags for each
protection interval to the target LBA + n since that is what the storage
requires (ignoring the odd Type 2 interval mismatch for now).

For reads, the meaning of the bip seed is: "This is what I expect to
receive in the ref tag for the first integrity interval in the PI
buffer". At the bottom of the stack we will receive PI from the storage
and that will contain ref tags matching the lower 32 bits of the LBA
since that is what the hardware returns. And we will then remap all
those ref tags starting with whichever bip seed value was requested by
the caller. It doesn't matter whether the requested seed value was 0,
10, or 42. The ref tags are remapped to whatever the caller wants them
to be.

I tend to think of the seed as a register you program with the value you
want. And then hardware or software remaps between what the storage
device's protection envelope requires and what the application (or in
this case the block layer) requested. With SCSI + DIX 1.1, the seed
literally controls a remapping register in the HBA ASIC. In NVMe we have
ILBRT/EILBRT.

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 0/6] block: fix integrity offset/length conversions
  2026-05-13  2:15           ` Martin K. Petersen
@ 2026-05-13  2:51             ` Caleb Sander Mateos
  0 siblings, 0 replies; 3+ messages in thread
From: Caleb Sander Mateos @ 2026-05-13  2:51 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
	Anuj Gupta, linux-block, linux-nvme, linux-scsi, target-devel,
	linux-kernel

On Tue, May 12, 2026 at 7:16 PM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Hi Caleb!
>
> Sorry about the delay. Been away for a few weeks...
>
> >> So seems like patch 1 ("block: use integrity interval instead of
> >> sector as seed") doesn't need a Fixes tag. Still, I'm confused why
> >> the auto-integrity code bothers setting the seed to the sector number
> >> in the first place if it's going to be remapped later. Why not just
> >> leave the seed zeroed?
>
> It adds a bit of extra protection in the sense that it there is one more
> parameter that can be validated. The premise of the integrity
> infrastructure is that things in the two supplied buffers (data + PI) as
> well as the control path (bip in the block layer case plus the SCSI or
> NVMe command fields) all need to agree for the I/O to go through.
>
> It is valid to generate the PI starting with 0. But that is
> indistinguishable from "the seed value was not initialized".
>
> > I would appreciate a response here. Would you be okay with patch 1 if
> > the Fixes tags were dropped?
>
> I am afraid I still don't completely understand why things are broken.

Nothing is broken, I just mean that the seed value stored in
bip_iter.bi_sector is strange in that it's initialized in units of
512-byte sectors but incremented in units of integrity intervals. As
you point out, the remapping step makes the initial seed value
irrelevant, but I was certainly confused by it when I printed it
during some debugging. I can update the commit message to clarify the
rationale for the change.

>
> For writes, the meaning of the bip seed is: "This is the value you
> should expect in the ref tag for the first integrity interval in the PI
> buffer I prepared". With block layer autoprotect, the seed is set before
> generating the PI and thus implicitly affects the generation.
>
> When the write operation subsequently reaches the bottom of the stack,
> we will check that the first ref tag in the PI buffer matches the
> supplied seed value. And then proceed to remap the ref tags for each
> protection interval to the target LBA + n since that is what the storage
> requires (ignoring the odd Type 2 interval mismatch for now).
>
> For reads, the meaning of the bip seed is: "This is what I expect to
> receive in the ref tag for the first integrity interval in the PI
> buffer". At the bottom of the stack we will receive PI from the storage
> and that will contain ref tags matching the lower 32 bits of the LBA
> since that is what the hardware returns. And we will then remap all
> those ref tags starting with whichever bip seed value was requested by
> the caller. It doesn't matter whether the requested seed value was 0,
> 10, or 42. The ref tags are remapped to whatever the caller wants them
> to be.
>
> I tend to think of the seed as a register you program with the value you
> want. And then hardware or software remaps between what the storage
> device's protection envelope requires and what the application (or in
> this case the block layer) requested. With SCSI + DIX 1.1, the seed
> literally controls a remapping register in the HBA ASIC. In NVMe we have
> ILBRT/EILBRT.

What I find confusing is that the seed value stored in
bip_iter.bi_sector isn't what's actually passed to the SCSI/NVMe
device. It's only used in blk_integrity_iterate() and
__blk_reftag_remap() to generate/verify/remap the reftags in the
integrity/PI buffer. However, (E)ILBRT field (taking NVMe as an
example) comes from the physical block device offset rather than the
reftag seed. See t10_pi_ref_tag(), which returns blk_rq_pos()
converted to integrity intervals. It looks like this works because the
remap step ensures the reftags passed in the integrity buffer match
the physical integrity interval numbers, but this means the device is
comparing physical integrity interval numbers rather than reftag
seeds. My point is that if the remap step undoes the effect of the
seed by setting all the reftags in the integrity buffer to their
physical integrity interval, I don't see why the block integrity code
bothers setting a seed in the first place.

But it sounds like this may be a longer discussion, so I will split
out the two fixes for 7.1 into a separate series.

Thanks,
Caleb

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-13  2:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260417015732.2692434-1-csander@purestorage.com>
     [not found] ` <yq18qams3re.fsf@ca-mkp.ca.oracle.com>
     [not found]   ` <CADUfDZrwzUTi2TOj6M-+FtBK6u5evMsWSBqRDwJsLb8yLbOGvw@mail.gmail.com>
     [not found]     ` <yq15x5lqfdx.fsf@ca-mkp.ca.oracle.com>
     [not found]       ` <CADUfDZqkT4g3T6uE=hxt9J6JDMXbJt49rM7_Vgs3EBPdFeuuLw@mail.gmail.com>
2026-05-04 17:55         ` [PATCH v3 0/6] block: fix integrity offset/length conversions Caleb Sander Mateos
2026-05-13  2:15           ` Martin K. Petersen
2026-05-13  2:51             ` Caleb Sander Mateos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox