linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>, Shuah Khan <shuah@kernel.org>,
	linux-block@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Stanley Zhang <stazhang@purestorage.com>,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH v2 04/19] ublk: set request integrity params in ublksrv_io_desc
Date: Tue, 6 Jan 2026 09:55:22 +0800	[thread overview]
Message-ID: <aVxrih4F_V58QM3S@fedora> (raw)
In-Reply-To: <CADUfDZqF2rWRQVptMjM7JedkkHfM-K+V65=odLcdrc3OM9jsJg@mail.gmail.com>

On Mon, Jan 05, 2026 at 08:44:48AM -0800, Caleb Sander Mateos wrote:
> On Sat, Jan 3, 2026 at 6:14 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, Jan 02, 2026 at 05:45:14PM -0700, Caleb Sander Mateos wrote:
> > > Indicate to the ublk server when an incoming request has integrity data
> > > by setting UBLK_IO_F_INTEGRITY in the ublksrv_io_desc's op_flags field.
> > > If the ublk device doesn't support integrity, the request will never
> > > provide integrity data. If the ublk device supports integrity, the
> > > request may omit the integrity buffer only if metadata_size matches the
> > > PI tuple size determined by csum_type. In this case, the ublk server
> > > should internally generate/verify the protection information from the
> > > data and sector offset.
> > > Set the UBLK_IO_F_CHECK_{GUARD,REFTAG,APPTAG} flags based on the
> > > request's BIP_CHECK_{GUARD,REFTAG,APPTAG} flags, indicating whether to
> > > verify the guard, reference, and app tags in the protection information.
> > > The expected reference tag (32 or 48 bits) and app tag (16 bits) are
> > > indicated in ublksrv_io_desc's new struct ublksrv_io_integrity integrity
> > > field. This field is unioned with the addr field to avoid changing the
> >
> > It might be fine to set per-rq app_tag, but bios in one request might have
> > different app_tag in case of io merge actually.
> 
> I based this logic largely on the code under if (ns->head->ms) in
> nvme_setup_rw(). That also assumes a single app_tag for the request.
> Sounds like an existing bug if bios with different app_tags can be
> merged together?

Looks it is true.

> 
> >
> > Also block layer builds ref_tag for each internal, please see
> 
> What do you mean by "internal"? "interval"?
> 
> > t10_pi_generate() and ext_pi_crc64_generate().
> 
> Yes, the reftag increases by 1 for each integrity interval. That's why
> it suffices for an NVMe command reading multiple blocks to specify
> only the expected reftag for the first block; the reftags for
> subsequent blocks are incremented accordingly.
> 
> Actually, I think we probably don't need to communicate the reftag
> seed to the ublk server. NVMe doesn't use the reftag seed (which can
> be overridden by struct uio_meta's seed field). Instead,
> nvme_set_ref_tag() always uses the offset into the block device
> divided by the integrity interval size, as required by all the
> existing csum_type formats the kernel supports. So a ublk server could
> just use the start_sector field of struct ublksrv_io_desc to compute
> the expected reftags. And using start_sector as the reftag also means
> merging requests would preserve their expected reftags.

IMO, this way looks fine from user viewpoint, especially aligning with NVMe. 

> 
> >
> > So looks this way is wrong.
> >
> > More importantly reusing iod->addr for other purpose not related with IO
> > buffer is very unfriendly for adding new features, and one lesson is for ZONED support
> > by reusing ublksrv_io_cmd->addr for zoned's append lba.
> 
> That's a fair point.

One candidate is add per-IO mmaped meta area, which can be flexible to
cover more use cases. 

> 
> >
> > For example, there is chance to support dma-buf based zero copy for ublk, and
> > please see the io-uring dma-buf support[1], and iod->addr might carry IO buffer info
> > in dma-buf format in future.
> >
> > [1] https://lore.kernel.org/io-uring/cover.1763725387.git.asml.silence@gmail.com/#t

BTW, PI data size is often small, and it belongs to kernel, there could be
chance to define PI data as pre-mapped DMA-BUF, then almost all drivers can
benefit from avoiding the runtime dma mapping for meta. But that may be one
bigger thing.

> >
> >
> > > size of struct ublksrv_io_desc. UBLK_F_INTEGRITY requires
> > > UBLK_F_USER_COPY and the addr field isn't used for UBLK_F_USER_COPY, so
> > > the two fields aren't needed simultaneously.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > ---
> > >  drivers/block/ublk_drv.c      | 43 +++++++++++++++++++++++++++++++----
> > >  include/uapi/linux/ublk_cmd.h | 27 ++++++++++++++++++++--
> > >  2 files changed, 64 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 2f9316febf83..51469e0627ff 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -316,10 +316,36 @@ static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> > >  static inline bool ublk_queue_is_zoned(const struct ublk_queue *ubq)
> > >  {
> > >       return ubq->flags & UBLK_F_ZONED;
> > >  }
> > >
> > > +static void ublk_setup_iod_buf(const struct ublk_queue *ubq,
> > > +                            const struct request *req,
> > > +                            struct ublksrv_io_desc *iod)
> > > +{
> > > +#ifdef CONFIG_BLK_DEV_INTEGRITY
> > > +     if (ubq->flags & UBLK_F_INTEGRITY) {
> > > +             struct bio_integrity_payload *bip;
> > > +             sector_t ref_tag_seed;
> > > +
> > > +             if (!blk_integrity_rq(req))
> > > +                     return;
> > > +
> > > +             bip = bio_integrity(req->bio);
> > > +             ref_tag_seed = bip_get_seed(bip);
> >
> > As mentioned, t10_pi_generate() and ext_pi_crc64_generate() builds
> > per-internal ref tag.
> 
> As mentioned, the reftags for subsequent intervals can be computed by
> simply incrementing the seed. If the seed is assumed to always be
> start_sector >> (interval_exp - SECTOR_SHIFT), then it may not be
> necessary to communicate ref_tag_seed at all.

Fair enough, but this should be documented in UAPI interface.

> 
> >
> >
> > > +             iod->integrity.ref_tag_lo = ref_tag_seed;
> > > +             iod->integrity.ref_tag_hi = ref_tag_seed >> 32;
> > > +             iod->integrity.app_tag = bip->app_tag;
> >
> > In case of io merge, each bio may have different ->app_tag.
> 
> It seems like it would make more sense to prevent merging bios with
> different app_tags. In the common case where a request contains a
> single bio, which has a single app_tag, it would be much more
> efficient to communicate only the 1 app_tag instead of having to pass
> a separate app_tag for every logical block/integrity interval.

OK.

> 
> >
> > Given you have to copy meta data via user copy, I suggest to follow the PI
> > standard and make it per-internal.
> 
> How are you suggesting the ublk server access bip->app_tag and
> bip_get_seed(bip) (if overriding the reftag seed is supported)? Would
> the ublk server need to make another user copy syscall?
> 
> Or would you prefer I drop the BIP_CHECK_* flag support from this
> patch set for now?

I can understand the motivation, and extra syscall should be avoided for
communicating reftag & apptag only, given you have explained both can be
per-request instead of per-interval.

But iod->addr should be avoided for this purpose, otherwise, new feature
can conflict with this usage easily.

But per-io mmapped area can solve this issue, the meta size can be one parameter
of `ublksrv_ctrl_dev_info` with feature flag of UBLK_F_MMAPED_IO_META, what
do you think of this way?


Thanks,
Ming


  reply	other threads:[~2026-01-06  1:55 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-03  0:45 [PATCH v2 00/19] ublk: add support for integrity data Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 01/19] blk-integrity: take const pointer in blk_integrity_rq() Caleb Sander Mateos
2026-01-04 21:50   ` Martin K. Petersen
2026-01-03  0:45 ` [PATCH v2 02/19] ublk: move ublk flag check functions earlier Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 03/19] ublk: support UBLK_PARAM_TYPE_INTEGRITY in device creation Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 04/19] ublk: set request integrity params in ublksrv_io_desc Caleb Sander Mateos
2026-01-03 14:14   ` Ming Lei
2026-01-05 16:44     ` Caleb Sander Mateos
2026-01-06  1:55       ` Ming Lei [this message]
2026-01-07 17:45         ` Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 05/19] ublk: add ublk_copy_user_bvec() helper Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 06/19] ublk: split out ublk_user_copy() helper Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 07/19] ublk: inline ublk_check_and_get_req() into ublk_user_copy() Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 08/19] ublk: move offset check out of __ublk_check_and_get_req() Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 09/19] ublk: implement integrity user copy Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 10/19] ublk: support UBLK_F_INTEGRITY Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 11/19] ublk: optimize ublk_user_copy() on daemon task Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 12/19] selftests: ublk: display UBLK_F_INTEGRITY support Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 13/19] selftests: ublk: add utility to get block device metadata size Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 14/19] selftests: ublk: add kublk support for integrity params Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 15/19] selftests: ublk: implement integrity user copy in kublk Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 16/19] selftests: ublk: support non-O_DIRECT backing files Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 17/19] selftests: ublk: add integrity data support to loop target Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 18/19] selftests: ublk: add integrity params test Caleb Sander Mateos
2026-01-03  0:45 ` [PATCH v2 19/19] selftests: ublk: add end-to-end integrity test Caleb Sander Mateos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aVxrih4F_V58QM3S@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=stazhang@purestorage.com \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).