Re: [PATCH 1/2] block: ignore underlying non-stack devices io_opt

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Yu Kuai <yukuai1@huaweicloud.com>
To: Damien Le Moal <dlemoal@kernel.org>,
	colyli@kernel.org, linux-raid@vger.kernel.org
Cc: linux-block@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH 1/2] block: ignore underlying non-stack devices io_opt
Date: Mon, 18 Aug 2025 10:57:50 +0800	[thread overview]
Message-ID: <ffa13f8c-5dfe-20d4-f67d-e3ccd0c70b86@huaweicloud.com> (raw)
In-Reply-To: <756b587d-2a5c-4749-a03b-cfc786740684@kernel.org>

Hi,

在 2025/08/18 10:51, Damien Le Moal 写道:
> On 8/18/25 12:26 AM, colyli@kernel.org wrote:
>> From: Coly Li <colyli@kernel.org>
>>
>> This patch adds a new BLK_FLAG_STACK_IO_OPT for stack block device. If a
>> stack block device like md raid5 declares its io_opt when don't want
>> blk_stack_limits() to change it with io_opt of underlying non-stack
>> block devices, BLK_FLAG_STACK_IO_OPT can be set on limits.flags. Then in
>> blk_stack_limits(), lcm_not_zero(t->io_opt, b->io_opt) will be avoided.
>>
>> For md raid5, it is necessary to keep a proper io_opt size for better
>> I/O thoughput.
>>
>> Signed-off-by: Coly Li <colyli@kernel.org>
>> ---
>>   block/blk-settings.c   | 6 +++++-
>>   drivers/md/raid5.c     | 1 +
>>   include/linux/blkdev.h | 3 +++
>>   3 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 07874e9b609f..46ee538b2be9 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -782,6 +782,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>   		t->features &= ~BLK_FEAT_POLL;
>>   
>>   	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
>> +	t->flags |= (b->flags & BLK_FLAG_STACK_IO_OPT);
>>   
>>   	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
>>   	t->max_user_sectors = min_not_zero(t->max_user_sectors,
>> @@ -839,7 +840,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>   				     b->physical_block_size);
>>   
>>   	t->io_min = max(t->io_min, b->io_min);
>> -	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
>> +	if (!t->io_opt || !(t->flags & BLK_FLAG_STACK_IO_OPT) ||
>> +	    (b->flags & BLK_FLAG_STACK_IO_OPT))
>> +		t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
>> +
>>   	t->dma_alignment = max(t->dma_alignment, b->dma_alignment);
>>   
>>   	/* Set non-power-of-2 compatible chunk_sectors boundary */
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 023649fe2476..989acd8abd98 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -7730,6 +7730,7 @@ static int raid5_set_limits(struct mddev *mddev)
>>   	lim.io_min = mddev->chunk_sectors << 9;
>>   	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
> 
> It seems to me that moving this *after* the call to mddev_stack_rdev_limits()
> would simply overwrite the io_opt limit coming from stacking and get you the
> same result as your patch, but without adding the new limit flags.

This is not enough, we have the case array is build on the top of
another array, we still need the lcm_not_zero() to not break this case.
And I would expect this flag for all the arrays, not just raid5.

Thanks,
Kuai

> 
> I do not think that the use of such flag is good as we definitely do not want
> to have more of these. That would make the limit stacking far too complicated
> with too many corner cases. Instead, drivers should simply change whatever
> limit they do not like. DM has the .io_hint method for that. md should have
> something similar.
> 
>>   	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
>> +	lim.flags |= BLK_FLAG_STACK_IO_OPT;
>>   	lim.discard_granularity = stripe;
>>   	lim.max_write_zeroes_sectors = 0;
>>   	mddev_stack_rdev_limits(mddev, &lim, 0);
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 95886b404b16..a22c7cea9836 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -366,6 +366,9 @@ typedef unsigned int __bitwise blk_flags_t;
>>   /* passthrough command IO accounting */
>>   #define BLK_FLAG_IOSTATS_PASSTHROUGH	((__force blk_flags_t)(1u << 2))
>>   
>> +/* ignore underlying non-stack devices io_opt */
>> +#define BLK_FLAG_STACK_IO_OPT		((__force blk_flags_t)(1u << 3))
>> +
>>   struct queue_limits {
>>   	blk_features_t		features;
>>   	blk_flags_t		flags;
> 
>

next prev parent reply	other threads:[~2025-08-18  2:57 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-17 15:26 [PATCH 1/2] block: ignore underlying non-stack devices io_opt colyli
2025-08-17 15:26 ` [PATCH 2/2] md: split bio by io_opt size in md_submit_bio() colyli
2025-08-18  1:38   ` Yu Kuai
2025-08-18  8:01   ` Christoph Hellwig
2025-08-18  9:51   ` John Garry
     [not found]     ` <6DA25F37-26B3-4912-90A3-346CFD9A6EEA@coly.li>
2025-08-18 12:20       ` John Garry
2025-08-17 18:37 ` [PATCH 1/2] block: ignore underlying non-stack devices io_opt Paul Menzel
2025-08-18  1:14 ` Yu Kuai
2025-08-18  2:51 ` Damien Le Moal
2025-08-18  2:57   ` Yu Kuai [this message]
2025-08-18  3:18     ` Damien Le Moal
2025-08-18  3:40       ` Yu Kuai
2025-08-18  5:56         ` Christoph Hellwig
2025-08-18  6:14           ` Yu Kuai
2025-08-18  6:18             ` Christoph Hellwig
2025-08-18  6:31               ` Yu Kuai
2025-08-18  8:00                 ` Christoph Hellwig
2025-08-18  8:10                   ` Yu Kuai
2025-08-18  8:14                     ` Christoph Hellwig
2025-08-18  8:57                       ` Yu Kuai
2025-08-18  9:08                         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffa13f8c-5dfe-20d4-f67d-e3ccd0c70b86@huaweicloud.com \
    --to=yukuai1@huaweicloud.com \
    --cc=colyli@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).