From: Benjamin Marzinski <bmarzins@redhat.com>
To: Yongpeng Yang <yangyongpeng.storage@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
Mikulas Patocka <mpatocka@redhat.com>,
dm-devel@lists.linux.dev, Yongpeng Yang <yangyongpeng@xiaomi.com>
Subject: Re: [PATCH 1/1] dm-stripe: adjust max_hw_discard_sectors to avoid unnecessary discard bio splitting
Date: Wed, 10 Dec 2025 01:07:31 -0500 [thread overview]
Message-ID: <aTkOIzJVOcEJB7SU@redhat.com> (raw)
In-Reply-To: <20251208100713.697542-2-yangyongpeng.storage@gmail.com>
On Mon, Dec 08, 2025 at 06:07:14PM +0800, Yongpeng Yang wrote:
> From: Yongpeng Yang <yangyongpeng@xiaomi.com>
>
> Currently, the max_hw_discard_sectors of a stripe target is set to the
> minimum max_hw_discard_sectors among all sub devices. When the discard
> bio is larger than max_hw_discard_sectors, this may cause the stripe
> device to split discard bios unnecessarily, because the value of
> max_hw_discard_sectors affects max_discard_sectors, which equal to
> min(max_hw_discard_sectors, max_user_discard_sectors).
>
> For example:
> root@vm:~# echo '0 33554432 striped 2 256 /dev/vdd 0 /dev/vde 0' | dmsetup create stripe_dev
> root@vm:~# cat /sys/block/dm-1/queue/discard_max_bytes
> 536870912
> root@vm:~# cat /sys/block/dm-1/slaves/vdd/queue/discard_max_bytes
> 536870912
> root@vm:~# blkdiscard -o 0 -l 1073741824 -p 1073741824 /dev/mapper/stripe_dev
>
> dm-1 is the stripe device, and its discard_max_bytes is equal to
> each sub device’s discard_max_bytes. Since the requested discard
> length exceeds discard_max_bytes, the block layer splits the discard bio:
>
> block_bio_queue: 252,1 DS 0 + 2097152 [blkdiscard]
> block_split: 252,1 DS 0 / 1048576 [blkdiscard]
> block_rq_issue: 253,48 DS 268435456 () 0 + 524288 be,0,4 [blkdiscard]
> block_bio_queue: 253,64 DS 524288 + 524288 [blkdiscard]
>
> However, both vdd and vde can actually handle a discard bio of 536870912
> bytes, so this split is not necessary.
>
> This patch updates the stripe target’s q->limits.max_hw_discard_sectors
> to be the minimum max_hw_discard_sectors of the sub devices multiplied
> by the # of stripe devices. This enables the stripe device to handle
> larger discard bios without incurring unnecessary splitting.
>
> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
> ---
> drivers/md/dm-stripe.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index 1461dc740dae..799d6def699b 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -38,6 +38,9 @@ struct stripe_c {
> uint32_t chunk_size;
> int chunk_size_shift;
>
> + /* The minimum max_hw_discard_sectors of all sub devices. */
> + unsigned int max_hw_discard_sectors;
> +
> /* Needed for handling events */
> struct dm_target *ti;
>
> @@ -169,6 +172,8 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> * Get the stripe destinations.
> */
> for (i = 0; i < stripes; i++) {
> + struct request_queue *q;
> +
> argv += 2;
>
> r = get_stripe(ti, sc, i, argv);
> @@ -180,6 +185,13 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> return r;
> }
> atomic_set(&(sc->stripe[i].error_count), 0);
> +
> + q = bdev_get_queue(sc->stripe[i].dev->bdev);
> + if (i == 0)
> + sc->max_hw_discard_sectors = q->limits.max_hw_discard_sectors;
> + else
> + sc->max_hw_discard_sectors = min_not_zero(sc->max_hw_discard_sectors,
> + q->limits.max_hw_discard_sectors);
I don't think any of the above is necessary. When stripe_io_hints() is
called, dm_set_device_limits() will already have been called on all the
underlying stripe devices to combine the limits. So
limits->max_hw_discard_sectors should already be set to the same value
as you're computing for sc->max_hw_discard_sectors. Right?
> }
>
> ti->private = sc;
> @@ -456,7 +468,7 @@ static void stripe_io_hints(struct dm_target *ti,
> struct queue_limits *limits)
> {
> struct stripe_c *sc = ti->private;
> - unsigned int io_min, io_opt;
> + unsigned int io_min, io_opt, max_hw_discard_sectors;
>
> limits->chunk_sectors = sc->chunk_size;
>
> @@ -465,6 +477,8 @@ static void stripe_io_hints(struct dm_target *ti,
> limits->io_min = io_min;
> limits->io_opt = io_opt;
> }
> + if (!check_mul_overflow(sc->max_hw_discard_sectors, sc->stripes, &max_hw_discard_sectors))
sc->max_hw_discard_sectors should be the same as
limits->max_hw_discard_sectors here
> + limits->max_hw_discard_sectors = max_hw_discard_sectors;
I see a couple of issues with this calculation. First, this only works
if max_hw_discard_sectors is greater than sc->chunk_size. But more than
that, before you multiply the original limit by the number of stripes,
you must round it down to a multiple of chunk_size. Otherwise, you can
end up writing too much to some of the underlying devices. To show this
with simple numbers, imagine you had 3 stripes that with a chunk_size of
4 and all the underlying devices had a max_hw_discard_sectors of 5. You
would set max_hw_discard_sectors to 5 * 3 = 15. But if you discared 15
sectors from the beginning of the device, you would end up discarding
the first 4 sectors of each underlying device, and then loop around
discard 3 more sectors of the first device. This means that you would
discard 7 from the first device, instead of the 5 that it could handle.
Rounding max_hw_discard_sectors down to a multiple of chunk_size will
fix that.
Lastly, if you do overflow when multiplying max_hw_discard_sectors by
the number of stripes, you should probably just set
limits->max_hw_discard_sectors to UINT_MAX >> SECTOR_SHIFT, instead of
leaving it at what it was.
-Ben
> }
>
> static struct target_type stripe_target = {
> --
> 2.43.0
next prev parent reply other threads:[~2025-12-10 6:07 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-08 10:07 [PATCH 1/1] dm-stripe: adjust max_hw_discard_sectors to avoid unnecessary discard bio splitting Yongpeng Yang
2025-12-10 6:07 ` Benjamin Marzinski [this message]
2025-12-11 7:03 ` Yongpeng Yang
2025-12-11 23:13 ` Benjamin Marzinski
2025-12-12 12:11 ` Yongpeng Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTkOIzJVOcEJB7SU@redhat.com \
--to=bmarzins@redhat.com \
--cc=agk@redhat.com \
--cc=dm-devel@lists.linux.dev \
--cc=mpatocka@redhat.com \
--cc=snitzer@kernel.org \
--cc=yangyongpeng.storage@gmail.com \
--cc=yangyongpeng@xiaomi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.