* [PATCH RFC 0/4] RAID0 atomic write support
@ 2024-09-03 15:07 John Garry
2024-09-03 15:07 ` [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size John Garry
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: John Garry @ 2024-09-03 15:07 UTC (permalink / raw)
To: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen
Cc: linux-block, linux-kernel, linux-raid, linux-nvme, linux-scsi,
John Garry
This series introduces atomic write support for software RAID0.
The main changes are to ensure that we can calculate the stacked device
request_queue limits appropriately for atomic writes. Fundamentally, if
some bottom does not support atomic writes, then atomic writes are not
supported for the top device. Furthermore, the atomic writes limits are
the lowest common supported limits from all bottom devices.
Flag BLK_FEAT_ATOMIC_WRITES is introduced to enable atomic writes for
stacked devices selectively. This ensures that we can analyze and test
atomic writes support per individual md/dm personality.
I am sending as an RFC as I need to test more and I'd like some initial
feedback, if any. Furthermore, RAID1 support is still be analyzed, so I
am 100% confident that the solution presented here is final.
The first block patch can be considered as a fix.
Based on v6.11-rc6
John Garry (4):
block: Make bdev_can_atomic_write() robust against mis-aligned bdev
size
block: Add BLK_FEAT_ATOMIC_WRITES flag
block: Support atomic writes limits for stacked devices
md/raid0: Atomic write support
block/blk-settings.c | 22 +++++++++++++++++++++-
drivers/md/raid0.c | 8 +++++++-
drivers/nvme/host/core.c | 3 +++
drivers/scsi/sd.c | 13 ++++++++++---
include/linux/blkdev.h | 6 ++++++
5 files changed, 47 insertions(+), 5 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-03 15:07 [PATCH RFC 0/4] RAID0 atomic write support John Garry
@ 2024-09-03 15:07 ` John Garry
2024-09-12 13:15 ` Christoph Hellwig
2024-09-03 15:07 ` [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag John Garry
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-03 15:07 UTC (permalink / raw)
To: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen
Cc: linux-block, linux-kernel, linux-raid, linux-nvme, linux-scsi,
John Garry
For bdev file operations, a write will be truncated when trying to write
past the end of the device. This could not be tolerated for an atomic
write.
Ensure that the size of the bdev matches max atomic write unit so that this
truncation would never occur.
Fixes: 9da3d1e912f3 ("block: Add core atomic write support")
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
include/linux/blkdev.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b7664d593486..af8434e391fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1662,6 +1662,9 @@ static inline bool bdev_can_atomic_write(struct block_device *bdev)
if (!limits->atomic_write_unit_min)
return false;
+ if (!IS_ALIGNED(bdev_nr_bytes(bdev), limits->atomic_write_unit_max))
+ return false;
+
if (bdev_is_partition(bdev)) {
sector_t bd_start_sect = bdev->bd_start_sect;
unsigned int alignment =
--
2.31.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag
2024-09-03 15:07 [PATCH RFC 0/4] RAID0 atomic write support John Garry
2024-09-03 15:07 ` [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size John Garry
@ 2024-09-03 15:07 ` John Garry
2024-09-12 13:16 ` Christoph Hellwig
2024-09-03 15:07 ` [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices John Garry
2024-09-03 15:07 ` [PATCH RFC 4/4] md/raid0: Atomic write support John Garry
3 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-03 15:07 UTC (permalink / raw)
To: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen
Cc: linux-block, linux-kernel, linux-raid, linux-nvme, linux-scsi,
John Garry
For stacking devices, not all underlying bottom devices may support atomic
writes, so a feature flag would be helpful in deciding whether the top
device may support atomic writes.
Furthermore, even if all bottom devices support atomic writes, a md/dm
personality may not.
Add flag BLK_FEAT_ATOMIC_WRITES to decide whether a block device supports
atomic writes.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-settings.c | 3 ++-
drivers/nvme/host/core.c | 3 +++
drivers/scsi/sd.c | 13 ++++++++++---
include/linux/blkdev.h | 3 +++
4 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index cd8a8eabc9a5..036e67f73116 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -176,7 +176,7 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
{
unsigned int boundary_sectors;
- if (!lim->atomic_write_hw_max)
+ if (!(lim->features & BLK_FEAT_ATOMIC_WRITES) || !lim->atomic_write_hw_max)
goto unsupported;
boundary_sectors = lim->atomic_write_hw_boundary >> SECTOR_SHIFT;
@@ -217,6 +217,7 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
lim->atomic_write_boundary_sectors = 0;
lim->atomic_write_unit_min = 0;
lim->atomic_write_unit_max = 0;
+ lim->features &= ~BLK_FEAT_ATOMIC_WRITES;
}
/*
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0dc8bcc664f2..c70d8e7602bf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1975,6 +1975,7 @@ static void nvme_update_atomic_write_disk_info(struct nvme_ns *ns,
lim->atomic_write_hw_boundary = boundary;
lim->atomic_write_hw_unit_min = bs;
lim->atomic_write_hw_unit_max = rounddown_pow_of_two(atomic_bs);
+ lim->features |= BLK_FEAT_ATOMIC_WRITES;
}
static u32 nvme_max_drv_segments(struct nvme_ctrl *ctrl)
@@ -2025,6 +2026,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs;
nvme_update_atomic_write_disk_info(ns, id, lim, bs, atomic_bs);
+ } else {
+ lim->features &= ~BLK_FEAT_ATOMIC_WRITES;
}
if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9db86943d04c..aecd05165ee8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -948,7 +948,7 @@ static void sd_config_atomic(struct scsi_disk *sdkp, struct queue_limits *lim)
if ((!sdkp->max_atomic && !sdkp->max_atomic_with_boundary) ||
sdkp->protection_type == T10_PI_TYPE2_PROTECTION)
- return;
+ goto out_unsupported;
physical_block_size_sectors = sdkp->physical_block_size /
sdkp->device->sector_size;
@@ -988,15 +988,22 @@ static void sd_config_atomic(struct scsi_disk *sdkp, struct queue_limits *lim)
if (sdkp->atomic_alignment > 1) {
if (unit_min > 1 && unit_min % sdkp->atomic_alignment)
- return;
+ goto out_unsupported;
if (unit_max > 1 && unit_max % sdkp->atomic_alignment)
- return;
+ goto out_unsupported;
}
lim->atomic_write_hw_max = max_atomic * logical_block_size;
lim->atomic_write_hw_boundary = 0;
lim->atomic_write_hw_unit_min = unit_min * logical_block_size;
lim->atomic_write_hw_unit_max = unit_max * logical_block_size;
+
+ lim->features |= BLK_FEAT_ATOMIC_WRITES;
+ return;
+
+out_unsupported:
+ lim->features &= ~BLK_FEAT_ATOMIC_WRITES;
+ return;
}
static blk_status_t sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index af8434e391fa..c8c6a496a6ed 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -332,6 +332,9 @@ typedef unsigned int __bitwise blk_features_t;
#define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
((__force blk_features_t)(1u << 15))
+/* device supports atomic writes */
+#define BLK_FEAT_ATOMIC_WRITES ((__force blk_features_t)(1u << 16))
+
/*
* Flags automatically inherited when stacking limits.
*/
--
2.31.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices
2024-09-03 15:07 [PATCH RFC 0/4] RAID0 atomic write support John Garry
2024-09-03 15:07 ` [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size John Garry
2024-09-03 15:07 ` [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag John Garry
@ 2024-09-03 15:07 ` John Garry
2024-09-12 13:16 ` Christoph Hellwig
2024-09-03 15:07 ` [PATCH RFC 4/4] md/raid0: Atomic write support John Garry
3 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-03 15:07 UTC (permalink / raw)
To: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen
Cc: linux-block, linux-kernel, linux-raid, linux-nvme, linux-scsi,
John Garry
Allow stacked devices to support atomic writes by aggregating the minimum
capacility of all bottom devices.
If a bottom device does not support atomic writes, then
BLK_FEAT_ATOMIC_WRITES should be cleared for that device, and the top
device then will also not support atomic writes.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-settings.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 036e67f73116..aeb05fb24801 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -682,6 +682,25 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->zone_write_granularity = 0;
t->max_zone_append_sectors = 0;
}
+ if (!(b->features & BLK_FEAT_ATOMIC_WRITES)) {
+ t->atomic_write_hw_max = 0;
+ t->atomic_write_hw_unit_max = 0;
+ t->atomic_write_hw_unit_min = 0;
+ t->atomic_write_hw_boundary = 0;
+ t->features &= ~BLK_FEAT_ATOMIC_WRITES;
+ } else if (t->features & BLK_FEAT_ATOMIC_WRITES) {
+ t->atomic_write_hw_max = min_not_zero(t->atomic_write_hw_max,
+ b->atomic_write_hw_max);
+ t->atomic_write_boundary_sectors =
+ min_not_zero(t->atomic_write_boundary_sectors,
+ b->atomic_write_boundary_sectors);
+ t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
+ b->atomic_write_hw_unit_min);
+ t->atomic_write_hw_unit_max =
+ min_not_zero(t->atomic_write_hw_unit_max,
+ b->atomic_write_hw_unit_max);
+ }
+
return ret;
}
EXPORT_SYMBOL(blk_stack_limits);
--
2.31.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH RFC 4/4] md/raid0: Atomic write support
2024-09-03 15:07 [PATCH RFC 0/4] RAID0 atomic write support John Garry
` (2 preceding siblings ...)
2024-09-03 15:07 ` [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices John Garry
@ 2024-09-03 15:07 ` John Garry
2024-09-12 13:18 ` Christoph Hellwig
3 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-03 15:07 UTC (permalink / raw)
To: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen
Cc: linux-block, linux-kernel, linux-raid, linux-nvme, linux-scsi,
John Garry
Set BLK_FEAT_ATOMIC_WRITES to enable atomic writes. All other stacked
device request queue limits should automatically be set properly. With
regards to atomic write max bytes limit, this will be set at
hw_max_sectors and this is limited by the stripe width, which we want.
Atomic requests may not be split, so error an attempt to do so.
It is noted that returning false from .make_request CB results in
bio_io_error() being called for the bio, which results in BLK_STS_IOERR.
This is not suitable for atomic writes, as BLK_STS_INVAL should be
returned.
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
drivers/md/raid0.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 32d587524778..caf1ecb55d11 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
lim.max_write_zeroes_sectors = mddev->chunk_sectors;
lim.io_min = mddev->chunk_sectors << 9;
lim.io_opt = lim.io_min * mddev->raid_disks;
+ lim.features |= BLK_FEAT_ATOMIC_WRITES;
err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
if (err) {
queue_limits_cancel_update(mddev->gendisk->queue);
@@ -606,7 +607,12 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
: sector_div(sector, chunk_sects));
if (sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, sectors, GFP_NOIO,
+ struct bio *split;
+
+ if (bio->bi_opf & REQ_ATOMIC)
+ return false;
+
+ split = bio_split(bio, sectors, GFP_NOIO,
&mddev->bio_set);
bio_chain(split, bio);
raid0_map_submit_bio(mddev, bio);
--
2.31.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-03 15:07 ` [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size John Garry
@ 2024-09-12 13:15 ` Christoph Hellwig
2024-09-12 14:58 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 13:15 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
> For bdev file operations, a write will be truncated when trying to write
> past the end of the device. This could not be tolerated for an atomic
> write.
>
> Ensure that the size of the bdev matches max atomic write unit so that this
> truncation would never occur.
But we'd still support atomic writes for all but the last sectors of
the device? Isn't this really an application problem?
If not supporting atomic writes at all for unaligned devices is the right
thing to do, we'll need to clearly document this somewhere. Any maybe
also add a pr_once to log a message?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag
2024-09-03 15:07 ` [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag John Garry
@ 2024-09-12 13:16 ` Christoph Hellwig
2024-09-12 14:58 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 13:16 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
> @@ -176,7 +176,7 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
> {
> unsigned int boundary_sectors;
>
> - if (!lim->atomic_write_hw_max)
> + if (!(lim->features & BLK_FEAT_ATOMIC_WRITES) || !lim->atomic_write_hw_max)
Overly long line here.
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices
2024-09-03 15:07 ` [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices John Garry
@ 2024-09-12 13:16 ` Christoph Hellwig
2024-09-12 15:05 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 13:16 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On Tue, Sep 03, 2024 at 03:07:47PM +0000, John Garry wrote:
> + } else if (t->features & BLK_FEAT_ATOMIC_WRITES) {
> + t->atomic_write_hw_max = min_not_zero(t->atomic_write_hw_max,
> + b->atomic_write_hw_max);
> + t->atomic_write_boundary_sectors =
> + min_not_zero(t->atomic_write_boundary_sectors,
> + b->atomic_write_boundary_sectors);
> + t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
> + b->atomic_write_hw_unit_min);
> + t->atomic_write_hw_unit_max =
> + min_not_zero(t->atomic_write_hw_unit_max,
> + b->atomic_write_hw_unit_max);
Maybe split this into a helper to make the code more readable?
Otherwise this looks good to me.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 4/4] md/raid0: Atomic write support
2024-09-03 15:07 ` [PATCH RFC 4/4] md/raid0: Atomic write support John Garry
@ 2024-09-12 13:18 ` Christoph Hellwig
2024-09-12 14:48 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 13:18 UTC (permalink / raw)
To: John Garry
Cc: axboe, song, yukuai3, kbusch, hch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
> if (sectors < bio_sectors(bio)) {
> - struct bio *split = bio_split(bio, sectors, GFP_NOIO,
> + struct bio *split;
> +
> + if (bio->bi_opf & REQ_ATOMIC)
> + return false;
I guess this is the erroring out when attempting to split the request.
Can you add a comment to explain that and why it can't happen for the
normal I/O patterns?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 4/4] md/raid0: Atomic write support
2024-09-12 13:18 ` Christoph Hellwig
@ 2024-09-12 14:48 ` John Garry
2024-09-12 15:10 ` Christoph Hellwig
0 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-12 14:48 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 14:18, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
>> if (sectors < bio_sectors(bio)) {
>> - struct bio *split = bio_split(bio, sectors, GFP_NOIO,
>> + struct bio *split;
>> +
>> + if (bio->bi_opf & REQ_ATOMIC)
>> + return false;
> I guess this is the erroring out when attempting to split the request.
I actually now think that I should change bio_split() to return NULL for
splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling
bio_split() like this is a common pattern in md RAID personalities.
However, none of the md RAID code check for a NULL split, which they
really should, so I can make that change also.
> Can you add a comment to explain that and why it can't happen for the
> normal I/O patterns?
ok, will do.
Cheers,
John
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-12 13:15 ` Christoph Hellwig
@ 2024-09-12 14:58 ` John Garry
2024-09-12 15:07 ` Christoph Hellwig
0 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-12 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 14:15, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
>> For bdev file operations, a write will be truncated when trying to write
>> past the end of the device. This could not be tolerated for an atomic
>> write.
>>
>> Ensure that the size of the bdev matches max atomic write unit so that this
>> truncation would never occur.
>
> But we'd still support atomic writes for all but the last sectors of
> the device?
We should do be able to, but with this patch we cannot. However, a
misaligned partition would be very much unexpected.
> Isn't this really an application problem?
Sure, if the application tried to do an atomic write to the end of the
device and it was truncated.
>
> If not supporting atomic writes at all for unaligned devices is the right
> thing to do, we'll need to clearly document this somewhere. Any maybe
> also add a pr_once to log a message?
I could also just reject any truncation on the atomic write in fops.
Maybe that is better.
And at some stage looking at making parted, fdisk, and other
partitioning tools atomic write aware would be good, so that the user
knows about these restrictions.
Thanks,
John
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag
2024-09-12 13:16 ` Christoph Hellwig
@ 2024-09-12 14:58 ` John Garry
0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2024-09-12 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 14:16, Christoph Hellwig wrote:
>> @@ -176,7 +176,7 @@ static void blk_validate_atomic_write_limits(struct queue_limits *lim)
>> {
>> unsigned int boundary_sectors;
>>
>> - if (!lim->atomic_write_hw_max)
>> + if (!(lim->features & BLK_FEAT_ATOMIC_WRITES) || !lim->atomic_write_hw_max)
> Overly long line here.
ok
>
> Otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
cheers
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices
2024-09-12 13:16 ` Christoph Hellwig
@ 2024-09-12 15:05 ` John Garry
0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2024-09-12 15:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 14:16, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:47PM +0000, John Garry wrote:
>> + } else if (t->features & BLK_FEAT_ATOMIC_WRITES) {
>> + t->atomic_write_hw_max = min_not_zero(t->atomic_write_hw_max,
>> + b->atomic_write_hw_max);
>> + t->atomic_write_boundary_sectors =
>> + min_not_zero(t->atomic_write_boundary_sectors,
>> + b->atomic_write_boundary_sectors);
>> + t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
>> + b->atomic_write_hw_unit_min);
>> + t->atomic_write_hw_unit_max =
>> + min_not_zero(t->atomic_write_hw_unit_max,
>> + b->atomic_write_hw_unit_max);
>
> Maybe split this into a helper to make the code more readable?
Yeah, I will do.
I was reworking this anyway.
So far I am not supporting a stripe unit with which is not a power-of-2.
But that is too restrictive. And lifting that restriction makes
calculating atomic write limits more complicated.
>
> Otherwise this looks good to me.
cheers
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-12 14:58 ` John Garry
@ 2024-09-12 15:07 ` Christoph Hellwig
2024-09-12 15:22 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 15:07 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, song, yukuai3, kbusch, sagi,
James.Bottomley, martin.petersen, linux-block, linux-kernel,
linux-raid, linux-nvme, linux-scsi
On Thu, Sep 12, 2024 at 03:58:00PM +0100, John Garry wrote:
> On 12/09/2024 14:15, Christoph Hellwig wrote:
>> On Tue, Sep 03, 2024 at 03:07:45PM +0000, John Garry wrote:
>>> For bdev file operations, a write will be truncated when trying to write
>>> past the end of the device. This could not be tolerated for an atomic
>>> write.
>>>
>>> Ensure that the size of the bdev matches max atomic write unit so that this
>>> truncation would never occur.
>>
>> But we'd still support atomic writes for all but the last sectors of
>> the device?
>
> We should do be able to, but with this patch we cannot. However, a
> misaligned partition would be very much unexpected.
Yes, misaligned partitions is very unexpected, but with large and
potentially unlimited atomic boundaries I would not expect the size
to always be aligned. But then again at least in NVMe atomic writes
don't need to match the max size anyway, so I'm not entirely sure
what the problem actually is.
> I could also just reject any truncation on the atomic write in fops. Maybe
> that is better.
It probably is.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 4/4] md/raid0: Atomic write support
2024-09-12 14:48 ` John Garry
@ 2024-09-12 15:10 ` Christoph Hellwig
2024-09-12 15:38 ` John Garry
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2024-09-12 15:10 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, axboe, song, yukuai3, kbusch, sagi,
James.Bottomley, martin.petersen, linux-block, linux-kernel,
linux-raid, linux-nvme, linux-scsi
On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
>
> I actually now think that I should change bio_split() to return NULL for
> splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling bio_split()
> like this is a common pattern in md RAID personalities. However, none of
> the md RAID code check for a NULL split, which they really should, so I can
> make that change also.
bio_split is a bit of a mess - even NULL isn't very good at returning
what caused it to fail. Maybe a switch to ERR_PTR and an audit of
all callers might be a good idea.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-12 15:07 ` Christoph Hellwig
@ 2024-09-12 15:22 ` John Garry
2024-09-13 8:36 ` Hannes Reinecke
0 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2024-09-12 15:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 16:07, Christoph Hellwig wrote:
>> We should do be able to, but with this patch we cannot. However, a
>> misaligned partition would be very much unexpected.
> Yes, misaligned partitions is very unexpected, but with large and
> potentially unlimited atomic boundaries I would not expect the size
> to always be aligned. But then again at least in NVMe atomic writes
> don't need to match the max size anyway, so I'm not entirely sure
> what the problem actually is.
Actually it's not an alignment issue, but a size issue.
Consider a 3.5MB partition and atomic write max is 1MB. If we tried to
atomic write 1MB at offset 3MB, then it would be truncated to a 0.5MB write.
So maybe it is an application bug.
>
>> I could also just reject any truncation on the atomic write in fops. Maybe
>> that is better.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 4/4] md/raid0: Atomic write support
2024-09-12 15:10 ` Christoph Hellwig
@ 2024-09-12 15:38 ` John Garry
0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2024-09-12 15:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 12/09/2024 16:10, Christoph Hellwig wrote:
> On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
>>
>> I actually now think that I should change bio_split() to return NULL for
>> splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling bio_split()
>> like this is a common pattern in md RAID personalities. However, none of
>> the md RAID code check for a NULL split, which they really should, so I can
>> make that change also.
>
> bio_split is a bit of a mess - even NULL isn't very good at returning
> what caused it to fail. Maybe a switch to ERR_PTR and an audit of
> all callers might be a good idea.
>
So for bio_split() I guess that we would change as follows:
--->8----
diff --git a/block/bio.c b/block/bio.c
index c4053d49679a..36ddf458753f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1671,11 +1671,11 @@ struct bio *bio_split(struct bio *bio, int sectors,
/* Zone append commands cannot be split */
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
- return NULL;
+ return ERR_PTR(-EOPNOTSUPP);
split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
if (!split)
- return NULL;
+ return ERR_PTR(-ENOMEM);
split->bi_iter.bi_size = sectors << 9;
---8<----
And then fix up the callers to use IS_ERR().
Or also change bio_alloc_clone() to return an ERR_PTR()? I don't see
much point in that, as we will only ever return ENOMEM (from the
callees) anyway.
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size
2024-09-12 15:22 ` John Garry
@ 2024-09-13 8:36 ` Hannes Reinecke
0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2024-09-13 8:36 UTC (permalink / raw)
To: John Garry, Christoph Hellwig
Cc: axboe, song, yukuai3, kbusch, sagi, James.Bottomley,
martin.petersen, linux-block, linux-kernel, linux-raid,
linux-nvme, linux-scsi
On 9/12/24 17:22, John Garry wrote:
> On 12/09/2024 16:07, Christoph Hellwig wrote:
>>> We should do be able to, but with this patch we cannot. However, a
>>> misaligned partition would be very much unexpected.
>> Yes, misaligned partitions is very unexpected, but with large and
>> potentially unlimited atomic boundaries I would not expect the size
>> to always be aligned. But then again at least in NVMe atomic writes
>> don't need to match the max size anyway, so I'm not entirely sure
>> what the problem actually is.
>
> Actually it's not an alignment issue, but a size issue.
>
> Consider a 3.5MB partition and atomic write max is 1MB. If we tried to
> atomic write 1MB at offset 3MB, then it would be truncated to a 0.5MB
> write.
>
> So maybe it is an application bug.
>
Hmm. Why don't we reject such an I/O? We cannot guarantee an atomic
write, so I think we should be perfectly fine to return an error to
userspace.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-09-13 8:36 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-03 15:07 [PATCH RFC 0/4] RAID0 atomic write support John Garry
2024-09-03 15:07 ` [PATCH RFC 1/4] block: Make bdev_can_atomic_write() robust against mis-aligned bdev size John Garry
2024-09-12 13:15 ` Christoph Hellwig
2024-09-12 14:58 ` John Garry
2024-09-12 15:07 ` Christoph Hellwig
2024-09-12 15:22 ` John Garry
2024-09-13 8:36 ` Hannes Reinecke
2024-09-03 15:07 ` [PATCH RFC 2/4] block: Add BLK_FEAT_ATOMIC_WRITES flag John Garry
2024-09-12 13:16 ` Christoph Hellwig
2024-09-12 14:58 ` John Garry
2024-09-03 15:07 ` [PATCH RFC 3/4] block: Support atomic writes limits for stacked devices John Garry
2024-09-12 13:16 ` Christoph Hellwig
2024-09-12 15:05 ` John Garry
2024-09-03 15:07 ` [PATCH RFC 4/4] md/raid0: Atomic write support John Garry
2024-09-12 13:18 ` Christoph Hellwig
2024-09-12 14:48 ` John Garry
2024-09-12 15:10 ` Christoph Hellwig
2024-09-12 15:38 ` John Garry
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).