From: Mike Snitzer <snitzer@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: dm-devel@lists.linux.dev, linux-block@vger.kernel.org,
Marco Patalano <mpatalan@redhat.com>,
Ewan Milne <emilne@redhat.com>
Subject: Re: dm: retain stacked max_sectors when setting queue_limits
Date: Wed, 22 May 2024 12:48:59 -0400 [thread overview]
Message-ID: <Zk4h-6f2M0XmraJV@kernel.org> (raw)
In-Reply-To: <20240522142458.GB7502@lst.de>
On Wed, May 22, 2024 at 04:24:58PM +0200, Christoph Hellwig wrote:
> On Tue, May 21, 2024 at 10:51:17PM -0400, Mike Snitzer wrote:
> > Otherwise, blk_validate_limits() will throw-away the max_sectors that
> > was stacked from underlying device(s). In doing so it can set a
> > max_sectors limit that violates underlying device limits.
>
> Hmm, yes it sort of is "throwing the limit away", but it really
> recalculates it from max_hw_sectors, max_dev_sectors and user_max_sectors.
Yes, but it needs to do that recalculation at each level of a stacked
device. And then we need to combine them via blk_stack_limits() -- as
is done with the various limits stacking loops in
drivers/md/dm-table.c:dm_calculate_queue_limits
> > This caused dm-multipath IO failures like the following because the
> > underlying devices' max_sectors were stacked up to be 1024, yet
> > blk_validate_limits() defaulted max_sectors to BLK_DEF_MAX_SECTORS_CAP
> > (2560):
>
> I suspect the problem is that SCSI messed directly with max_sectors instead
> and ignores max_user_sectors (and really shouldn't touch either, but that's
> a separate discussion). Can you try the patch below and maybe also provide
> the sysfs output for max_sectors_kb and max_hw_sectors_kb for all involved
> devices?
FYI, you can easily reproduce with:
git clone https://github.com/snitm/mptest.git
cd mptest
<edit so it uses: export MULTIPATH_BACKEND_MODULE="scsidebug">
./runtest ./tests/test_00_no_failure
Also, best to change this line:
./lib/mpath_generic: local _feature="4 queue_if_no_path retain_attached_hw_handler queue_mode $MULTIPATH_QUEUE_MODE"
to:
./lib/mpath_generic: local _feature="3 retain_attached_hw_handler queue_mode $MULTIPATH_QUEUE_MODE"
Otherwise the test will hang due to queue_if_no_path.
all underlying scsi-debug scsi devices:
./max_hw_sectors_kb:2147483647
./max_sectors_kb:512
multipath device:
before my fix:
./max_hw_sectors_kb:2147483647
./max_sectors_kb:1280
after my fix:
./max_hw_sectors_kb:2147483647
./max_sectors_kb:512
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 332eb9dac22d91..f6c822c9cbd2d3 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3700,8 +3700,10 @@ static int sd_revalidate_disk(struct gendisk *disk)
> */
> if (sdkp->first_scan ||
> q->limits.max_sectors > q->limits.max_dev_sectors ||
> - q->limits.max_sectors > q->limits.max_hw_sectors)
> + q->limits.max_sectors > q->limits.max_hw_sectors) {
> q->limits.max_sectors = rw_max;
> + q->limits.max_user_sectors = rw_max;
> + }
>
> sdkp->first_scan = 0;
>
>
Driver shouldn't be changing max_user_sectors..
But it also didn't fix it (mpath still gets ./max_sectors_kb:1280):
[ 74.872485] blk_insert_cloned_request: over max size limit. (2048 > 1024)
[ 74.872505] device-mapper: multipath: 254:3: Failing path 8:16.
[ 74.872620] blk_insert_cloned_request: over max size limit. (2048 > 1024)
[ 74.872641] device-mapper: multipath: 254:3: Failing path 8:32.
[ 74.872712] blk_insert_cloned_request: over max size limit. (2048 > 1024)
[ 74.872732] device-mapper: multipath: 254:3: Failing path 8:48.
[ 74.872788] blk_insert_cloned_request: over max size limit. (2048 > 1024)
[ 74.872808] device-mapper: multipath: 254:3: Failing path 8:64.
Simply setting max_user_sectors won't help with stacked devices
because blk_stack_limits() doesn't stack max_user_sectors. It'll
inform the underlying device's blk_validate_limits() calculation which
will result in max_sectors having the desired value (which it already
did, as I showed above). But when stacking limits from underlying
devices up to the higher-level dm-mpath queue_limits we still have
information loss.
Mike
next prev parent reply other threads:[~2024-05-22 16:49 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-22 2:51 [PATCH] dm: retain stacked max_sectors when setting queue_limits Mike Snitzer
2024-05-22 14:24 ` Christoph Hellwig
2024-05-22 16:48 ` Mike Snitzer [this message]
2024-05-22 17:37 ` Ewan Milne
2024-05-23 1:52 ` Ming Lei
2024-05-23 15:38 ` [PATCH for-6.10-rc1] block: fix blk_validate_limits() to properly handle stacked devices Mike Snitzer
2024-05-23 15:44 ` Christoph Hellwig
2024-05-23 15:48 ` Mike Snitzer
2024-05-23 15:52 ` Christoph Hellwig
2024-05-23 16:38 ` Mike Snitzer
2024-05-23 17:05 ` Christoph Hellwig
2024-05-23 17:14 ` Mike Snitzer
2024-05-23 7:16 ` dm: retain stacked max_sectors when setting queue_limits Christoph Hellwig
2024-05-23 8:27 ` Christoph Hellwig
2024-05-23 14:12 ` Mike Snitzer
2024-05-23 14:49 ` Christoph Hellwig
2024-05-23 15:44 ` Mike Snitzer
2024-05-23 15:50 ` Christoph Hellwig
2024-05-23 16:44 ` Mike Snitzer
2024-05-23 17:03 ` Christoph Hellwig
2024-05-22 20:33 ` [PATCH] " Ewan Milne
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zk4h-6f2M0XmraJV@kernel.org \
--to=snitzer@kernel.org \
--cc=dm-devel@lists.linux.dev \
--cc=emilne@redhat.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=mpatalan@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.