From: Nilay Shroff <nilay@linux.ibm.com>
To: John Garry <john.g.garry@oracle.com>, linux-block@vger.kernel.org
Cc: hch@lst.de, martin.petersen@oracle.com, axboe@kernel.dk,
ojaswin@linux.ibm.com, gjoyce@ibm.com
Subject: Re: [PATCH] block: fix atomic write limits for stacked devices
Date: Tue, 3 Jun 2025 20:46:24 +0530 [thread overview]
Message-ID: <0747313c-a70d-482f-8ea6-ce2dff772c2c@linux.ibm.com> (raw)
In-Reply-To: <07cfb3a1-c410-4544-ae4d-5808114e02d7@oracle.com>
On 6/3/25 5:47 PM, John Garry wrote:
> On 03/06/2025 12:27, Nilay Shroff wrote:
>> The current logic applies the bottom device's atomic write limits to
>> the stacked (top) device only if the top device does not support chunk
>> sectors. However, this approach is too restrictive.
>
> Note that the assumption is that md raid0/10 or dm-stripe chunk size is in io_min. It's not ideal, but that's the field which always holds chunk_sectors for those drivers.
>
> max_hw_sectors would seem to be more appropriate to me...
>
>>
>> We should also propagate the bottom device's atomic write limits to the
>> stacked device if atomic_write_hw_unit_{min,max} of the bottom device
>> are aligned with the top device's chunk size (io_min). Failing to do so
>> may unnecessarily reduce the stacked device's atomic write limits to
>> values much smaller than what the hardware can actually support.
>>
>> For example, on an NVMe disk with the following controller capability:
>> $ nvme id-ctrl /dev/nvme1 | grep awupf
>> awupf : 63
>>
>> Without the patch applied,
>>
>> The bottom device (nvme1c1n1) atomic queue limits:
>> $ /sys/block/nvme1c1n1/queue/atomic_write_boundary_bytes:0
>> $ /sys/block/nvme1c1n1/queue/atomic_write_max_bytes:262144
>> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_max_bytes:262144
>> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> The top device (nvme1n1) atomic queue limits:
>> $ /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
>> $ /sys/block/nvme1n1/queue/atomic_write_max_bytes:4096
>> $ /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:4096
>> $ /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> With this patch applied,
>>
>> The top device (nvme1n1) atomic queue limits:
>> /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
>> /sys/block/nvme1n1/queue/atomic_write_max_bytes:262144
>> /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:262144
>> /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> This change ensures that the stacked device retains optimal atomic write
>> capability when alignment permits, improving overall performance and
>> correctness.
>>
>> Reported-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> Fixes: d7f36dc446e8 ("block: Support atomic writes limits for stacked devices")
>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>> ---
>> block/blk-settings.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index a000daafbfb4..35c1354dd5ae 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>> !blk_stack_atomic_writes_boundary_head(t, b))
>> return false;
>> - if (t->io_min <= SECTOR_SIZE) {
>> - /* No chunk sectors, so use bottom device values directly */
>> + if (t->io_min <= SECTOR_SIZE ||
>> + (!(t->atomic_write_hw_unit_max % t->io_min) &&
>> + !(t->atomic_write_hw_unit_min % t->io_min))) {
>
> So will this now break md raid0/10 or dm stripe when t->io_min is set (> SECTOR_SIZE)? I mean, for md raid0/10 or dm-stripe, we should be taking the chunk size into account there and we now don't seem to be doing so now.
>
Shouldn't it be work good if we ensure that a bottom device atomic write unit min/max are
aligned with the top device chunk sectors then top device could simply copy and use the
bottom device atomic write limits directly? Or do we have a special case for raid0/10 and
dm-strip which can't handle atomic write if chunk size for stacked device is greater than
SECTOR_SIZE?
BTW there's a typo in the above change, we should have the above if check written as below
(my bad):
if (t->io_min <= SECTOR_SIZE ||
(!(b->atomic_write_hw_unit_max % t->io_min) &&
!(b->atomic_write_hw_unit_min % t->io_min))) {
...
...
> What is the value of top device io_min and physical_block_size in your example?
The NVme disk which I am using has both t->io_min and t->physical_block_size set
to 4096.
Thanks,
--Nilay
next prev parent reply other threads:[~2025-06-03 15:16 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-03 11:27 [PATCH] block: fix atomic write limits for stacked devices Nilay Shroff
2025-06-03 12:17 ` John Garry
2025-06-03 15:16 ` Nilay Shroff [this message]
2025-06-04 7:29 ` John Garry
2025-06-04 15:09 ` Nilay Shroff
2025-06-05 9:01 ` John Garry
2025-06-05 9:50 ` Nilay Shroff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0747313c-a70d-482f-8ea6-ce2dff772c2c@linux.ibm.com \
--to=nilay@linux.ibm.com \
--cc=axboe@kernel.dk \
--cc=gjoyce@ibm.com \
--cc=hch@lst.de \
--cc=john.g.garry@oracle.com \
--cc=linux-block@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ojaswin@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox