All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	dm-devel@lists.linux.dev
Subject: Re: [GIT PULL] Block updates for 6.9-rc1
Date: Tue, 12 Mar 2024 11:22:53 -0400	[thread overview]
Message-ID: <ZfBzTWM7NBbGymsY@redhat.com> (raw)
In-Reply-To: <Ze-rRvKpux44ueao@infradead.org>

On Mon, Mar 11 2024 at  9:09P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, Mar 11, 2024 at 08:28:50PM -0400, Mike Snitzer wrote:
> > All for Jens being made to suffer with dm-crypt but I think we need a
> > proper root cause of what is happening for you and Johannes ;)
> 
> I'm going to try to stay out of the cranking, but I think the reason is
> that the limits stacking inherits the max_segment_size, nvme has weird
> rules for them due their odd PRPs, and dm-crypt set it's own
> max_segment_size to split out each page.  The regression here is
> that we now actually verify that conflict.
> 
> So this happens only for dm-crypt on nvme.  The fix is probably
> to not inherit low-level limits like max_segment_size, but I need
> to think about it a bit more and come up with an automated test case
> using say nvme-loop.

Yeah, I generally agree.

I looked at the latest code to more fully understand why this failed.

1) dm-crypt.c:crypt_io_hints() sets limits->max_segment_size = PAGE_SIZE;

2) drivers/nvme/host/core.c:nvme_set_ctrl_limits() sets:
   lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
   lim->max_segment_size = UINT_MAX;

3) blk_stack_limits(t=dm-crypt, b=nvme-pci) will combine limits:
        t->virt_boundary_mask = min_not_zero(t->virt_boundary_mask,
                                            b->virt_boundary_mask);
        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);

4) blk_validate_limits() will reject the limits that
   blk_stack_limits() created:
        /*
         * Devices that require a virtual boundary do not support scatter/gather
         * I/O natively, but instead require a descriptor list entry for each
         * page (which might not be identical to the Linux PAGE_SIZE).  Because
         * of that they are not limited by our notion of "segment size".
         */
	if (lim->virt_boundary_mask) {
                if (WARN_ON_ONCE(lim->max_segment_size &&
                                 lim->max_segment_size != UINT_MAX))
                        return -EINVAL;
                lim->max_segment_size = UINT_MAX;
	} else {
                /*
                 * The maximum segment size has an odd historic 64k default that
                 * drivers probably should override.  Just like the I/O size we
                 * require drivers to at least handle a full page per segment.
                 */
		if (!lim->max_segment_size)
                        lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
                if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
                	return -EINVAL;
        }

blk_validate_limits() is currently very pedantic. I discussed with Jens
briefly and we're thinking it might make sense for blk_validate_limits()
to be more forgiving by _not_ imposing hard -EINVAL failure.  That in
the interim, during this transition to more curated and atomic limits, a
WARN_ON_ONCE() splat should serve as enough notice to developers (be it
lower level nvme or higher-level virtual devices like DM).

BUT for this specific max_segment_size case, the constraints of dm-crypt
are actually more conservative due to crypto requirements. Yet nvme's
more general "don't care, but will care if non-nvme driver does" for
this particular max_segment_size limit is being imposed when validating
the combined limits that dm-crypt will impose at the top-level.

All said, the above "if (lim->virt_boundary_mask)" check in
blk_validate_limits() looks bogus for stacked device limits.

Mike

  parent reply	other threads:[~2024-03-12 15:22 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-10 20:30 [GIT PULL] Block updates for 6.9-rc1 Jens Axboe
2024-03-11 19:43 ` pr-tracker-bot
2024-03-11 23:50 ` Johannes Weiner
2024-03-11 23:53   ` Jens Axboe
2024-03-11 23:58   ` Linus Torvalds
2024-03-12  0:02     ` Jens Axboe
2024-03-12  0:21       ` Linus Torvalds
2024-03-12  0:28         ` Mike Snitzer
2024-03-12  1:03           ` Jens Axboe
2024-03-12  1:09           ` Christoph Hellwig
2024-03-12  1:17             ` Jens Axboe
2024-03-12  1:20               ` Linus Torvalds
2024-03-12  1:23                 ` Jens Axboe
2024-03-12  1:28                   ` Linus Torvalds
2024-03-12  1:37                     ` Jens Axboe
2024-03-12 16:39                       ` Keith Busch
2024-03-12 11:53                   ` Christoph Hellwig
2024-03-12 15:25                     ` Jens Axboe
2024-03-12 11:52                 ` Christoph Hellwig
2024-03-12 15:22             ` Mike Snitzer [this message]
2024-03-12 16:28               ` Keith Busch
2024-03-12 21:10               ` Christoph Hellwig
2024-03-12 22:22                 ` Mike Snitzer
2024-03-12 22:30                   ` Christoph Hellwig
2024-03-12 22:50                     ` Mike Snitzer
2024-03-12 22:58                       ` Christoph Hellwig
2024-04-11 20:15                         ` [PATCH for-6.10 0/2] dm: use late bio-splitting and queue_limits_set Mike Snitzer
2024-04-11 20:15                         ` [PATCH for-6.10 1/2] dm-crypt: stop constraining max_segment_size to PAGE_SIZE Mike Snitzer
2024-04-12  6:11                           ` Christoph Hellwig
2024-04-15 14:08                           ` Mikulas Patocka
2024-04-23  7:32                           ` Ming Lei
2024-04-11 20:15                         ` [PATCH for-6.10 2/2] dm: use queue_limits_set Mike Snitzer
2024-04-23  7:33                           ` Ming Lei
2024-03-13 13:11                 ` [GIT PULL] Block updates for 6.9-rc1 Ming Lei
2024-03-12  1:01         ` Jens Axboe
2024-03-12  0:25       ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfBzTWM7NBbGymsY@redhat.com \
    --to=snitzer@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@lists.linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.