* [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling
@ 2024-08-15 16:32 John Garry
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: John Garry @ 2024-08-15 16:32 UTC (permalink / raw)
To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry
As reported in [0], we may get an infinite loop in
__blkdev_issue_write_zeroes() for making an XFS FS on a raid0 config.
Fix __blkdev_issue_write_zeroes() limit handling by reading the write
zeroes limit outside the loop.
Also include a change to drop the unnecessary NULL queue check in
bdev_write_zeroes_sectors().
[0] https://lore.kernel.org/linux-block/20240815062112.GA14067@lst.de/T/#m14ed5d882f9390a46cfe2fcfa2b51218aafbd32e
Differences to v1:
- Add RB tags from Christoph (thanks!)
- Update comment on __blkdev_issue_write_zeroes (Martin, Christoph)
John Garry (2):
block: Read max write zeroes once for __blkdev_issue_write_zeroes()
block: Drop NULL check in bdev_write_zeroes_sectors()
block/blk-lib.c | 25 ++++++++++++++++++-------
include/linux/blkdev.h | 7 +------
2 files changed, 19 insertions(+), 13 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-15 16:32 [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
@ 2024-08-15 16:32 ` John Garry
2024-08-15 17:43 ` Martin K. Petersen
` (2 more replies)
2024-08-15 16:32 ` [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
2024-08-19 15:49 ` [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling Jens Axboe
2 siblings, 3 replies; 13+ messages in thread
From: John Garry @ 2024-08-15 16:32 UTC (permalink / raw)
To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry
As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
drive.
Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
changed __blkdev_issue_write_zeroes() to read the max write zeroes
value in the loop. This is not safe as max write zeroes may change in
value. Specifically for the case of [0], the value goes to 0, and we get
an infinite loop.
Lift the limit reading out of the loop.
[0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t
Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-lib.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9f735efa6c94..83eb7761c2bf 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -111,13 +111,20 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
(UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
}
+/*
+ * There is no reliable way for the SCSI subsystem to determine whether a
+ * device supports a WRITE SAME operation without actually performing a write
+ * to media. As a result, write_zeroes is enabled by default and will be
+ * disabled if a zeroing operation subsequently fails. This means that this
+ * queue limit is likely to change at runtime.
+ */
static void __blkdev_issue_write_zeroes(struct block_device *bdev,
sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
- struct bio **biop, unsigned flags)
+ struct bio **biop, unsigned flags, sector_t limit)
{
+
while (nr_sects) {
- unsigned int len = min_t(sector_t, nr_sects,
- bio_write_zeroes_limit(bdev));
+ unsigned int len = min(nr_sects, limit);
struct bio *bio;
if ((flags & BLKDEV_ZERO_KILLABLE) &&
@@ -141,12 +148,14 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp, unsigned flags)
{
+ sector_t limit = bio_write_zeroes_limit(bdev);
struct bio *bio = NULL;
struct blk_plug plug;
int ret = 0;
blk_start_plug(&plug);
- __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
+ __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
+ flags, limit);
if (bio) {
if ((flags & BLKDEV_ZERO_KILLABLE) &&
fatal_signal_pending(current)) {
@@ -165,7 +174,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
* on an I/O error, in which case we'll turn any error into
* "not supported" here.
*/
- if (ret && !bdev_write_zeroes_sectors(bdev))
+ if (ret && !limit)
return -EOPNOTSUPP;
return ret;
}
@@ -265,12 +274,14 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
unsigned flags)
{
+ sector_t limit = bio_write_zeroes_limit(bdev);
+
if (bdev_read_only(bdev))
return -EPERM;
- if (bdev_write_zeroes_sectors(bdev)) {
+ if (limit) {
__blkdev_issue_write_zeroes(bdev, sector, nr_sects,
- gfp_mask, biop, flags);
+ gfp_mask, biop, flags, limit);
} else {
if (flags & BLKDEV_ZERO_NOFALLBACK)
return -EOPNOTSUPP;
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
2024-08-15 16:32 [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
@ 2024-08-15 16:32 ` John Garry
2024-08-15 17:43 ` Martin K. Petersen
2024-08-19 15:16 ` Nitesh Shetty
2024-08-19 15:49 ` [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling Jens Axboe
2 siblings, 2 replies; 13+ messages in thread
From: John Garry @ 2024-08-15 16:32 UTC (permalink / raw)
To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry
Function bdev_get_queue() must not return NULL, so drop the check in
bdev_write_zeroes_sectors().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
include/linux/blkdev.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e85ec73a07d5..b7664d593486 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1296,12 +1296,7 @@ bdev_max_secure_erase_sectors(struct block_device *bdev)
static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (q)
- return q->limits.max_write_zeroes_sectors;
-
- return 0;
+ return bdev_get_queue(bdev)->limits.max_write_zeroes_sectors;
}
static inline bool bdev_nonrot(struct block_device *bdev)
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
@ 2024-08-15 17:43 ` Martin K. Petersen
2024-08-16 18:34 ` Nitesh Shetty
2024-08-28 7:15 ` Shinichiro Kawasaki
2 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2024-08-15 17:43 UTC (permalink / raw)
To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
John,
> As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
> drive.
>
> Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
> changed __blkdev_issue_write_zeroes() to read the max write zeroes
> value in the loop. This is not safe as max write zeroes may change in
> value. Specifically for the case of [0], the value goes to 0, and we get
> an infinite loop.
>
> Lift the limit reading out of the loop.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
2024-08-15 16:32 ` [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
@ 2024-08-15 17:43 ` Martin K. Petersen
2024-08-19 15:16 ` Nitesh Shetty
1 sibling, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2024-08-15 17:43 UTC (permalink / raw)
To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
John,
> Function bdev_get_queue() must not return NULL, so drop the check in
> bdev_write_zeroes_sectors().
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
2024-08-15 17:43 ` Martin K. Petersen
@ 2024-08-16 18:34 ` Nitesh Shetty
2024-08-17 15:33 ` John Garry
2024-08-28 7:15 ` Shinichiro Kawasaki
2 siblings, 1 reply; 13+ messages in thread
From: Nitesh Shetty @ 2024-08-16 18:34 UTC (permalink / raw)
To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
[-- Attachment #1: Type: text/plain, Size: 2539 bytes --]
On 15/08/24 04:32PM, John Garry wrote:
>As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
>drive.
>
>Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>changed __blkdev_issue_write_zeroes() to read the max write zeroes
>value in the loop. This is not safe as max write zeroes may change in
>value. Specifically for the case of [0], the value goes to 0, and we get
>an infinite loop.
>
>Lift the limit reading out of the loop.
>
>[0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t
>
>Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>Reviewed-by: Christoph Hellwig <hch@lst.de>
>Signed-off-by: John Garry <john.g.garry@oracle.com>
>---
> block/blk-lib.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
>diff --git a/block/blk-lib.c b/block/blk-lib.c
>index 9f735efa6c94..83eb7761c2bf 100644
>--- a/block/blk-lib.c
>+++ b/block/blk-lib.c
>@@ -111,13 +111,20 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
> (UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
> }
>
>+/*
>+ * There is no reliable way for the SCSI subsystem to determine whether a
>+ * device supports a WRITE SAME operation without actually performing a write
>+ * to media. As a result, write_zeroes is enabled by default and will be
>+ * disabled if a zeroing operation subsequently fails. This means that this
>+ * queue limit is likely to change at runtime.
>+ */
> static void __blkdev_issue_write_zeroes(struct block_device *bdev,
> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>- struct bio **biop, unsigned flags)
>+ struct bio **biop, unsigned flags, sector_t limit)
> {
>+
> while (nr_sects) {
>- unsigned int len = min_t(sector_t, nr_sects,
>- bio_write_zeroes_limit(bdev));
>+ unsigned int len = min(nr_sects, limit);
I feel changes something like below will simplify the whole patch.
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -115,9 +115,13 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
struct bio **biop, unsigned flags)
{
+ sector_t limit = bio_write_zeroes_limit(bdev);
+
+ if (!limit)
+ return;
+
while (nr_sects) {
- unsigned int len = min_t(sector_t, nr_sects,
- bio_write_zeroes_limit(bdev));
+ unsigned int len = min_t(sector_t, nr_sects, limit);
struct bio *bio;
if ((flags & BLKDEV_ZERO_KILLABLE) &&
Otherwise, this series looks good to me.
-- Nitesh Shetty
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-16 18:34 ` Nitesh Shetty
@ 2024-08-17 15:33 ` John Garry
2024-08-19 15:15 ` Nitesh Shetty
0 siblings, 1 reply; 13+ messages in thread
From: John Garry @ 2024-08-17 15:33 UTC (permalink / raw)
To: Nitesh Shetty; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
On 16/08/2024 19:34, Nitesh Shetty wrote:
> On 15/08/24 04:32PM, John Garry wrote:
>> As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
>> drive.
>>
>> Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>> changed __blkdev_issue_write_zeroes() to read the max write zeroes
>> value in the loop. This is not safe as max write zeroes may change in
>> value. Specifically for the case of [0], the value goes to 0, and we get
>> an infinite loop.
>>
>> Lift the limit reading out of the loop.
>>
>> [0] https://urldefense.com/v3/__https://lore.kernel.org/linux-
>> xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/*t__;Iw!!
>> ACWV5N9M2RV99hQ!
>> KNrgu0c216k_Y_2RLxTasjxizyhbN8eKD61JwIwDxT5OJJDamER6hw1nvf5biNMqQLaLl9PqC2qRUDdHlrGF7g$
>> Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> block/blk-lib.c | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 9f735efa6c94..83eb7761c2bf 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -111,13 +111,20 @@ static sector_t bio_write_zeroes_limit(struct
>> block_device *bdev)
>> (UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
>> }
>>
>> +/*
>> + * There is no reliable way for the SCSI subsystem to determine
>> whether a
>> + * device supports a WRITE SAME operation without actually performing
>> a write
>> + * to media. As a result, write_zeroes is enabled by default and will be
>> + * disabled if a zeroing operation subsequently fails. This means
>> that this
>> + * queue limit is likely to change at runtime.
>> + */
>> static void __blkdev_issue_write_zeroes(struct block_device *bdev,
>> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>> - struct bio **biop, unsigned flags)
>> + struct bio **biop, unsigned flags, sector_t limit)
>> {
>> +
>> while (nr_sects) {
>> - unsigned int len = min_t(sector_t, nr_sects,
>> - bio_write_zeroes_limit(bdev));
>> + unsigned int len = min(nr_sects, limit);
>
> I feel changes something like below will simplify the whole patch.
>
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -115,9 +115,13 @@ static void __blkdev_issue_write_zeroes(struct
> block_device *bdev,
> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> struct bio **biop, unsigned flags)
> {
> + sector_t limit = bio_write_zeroes_limit(bdev);
> +
> + if (!limit)
> + return;
Just this?
If yes, then __blkdev_issue_write_zeroes() does not return an error
code, so can we tell whether the zeroes were actually issued?
Furthermore, I would rather limit points at which we call
bio_write_zeroes_limit()
BTW, I am going on a short vacation now, so I can't quickly rework this
(if that was actually required).
> +
> while (nr_sects) {
> - unsigned int len = min_t(sector_t, nr_sects,
> - bio_write_zeroes_limit(bdev));
> + unsigned int len = min_t(sector_t, nr_sects, limit);
> struct bio *bio;
>
> if ((flags & BLKDEV_ZERO_KILLABLE) &&
>
>
> Otherwise, this series looks good to me.
>
> -- Nitesh Shetty
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-17 15:33 ` John Garry
@ 2024-08-19 15:15 ` Nitesh Shetty
0 siblings, 0 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-08-19 15:15 UTC (permalink / raw)
To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
[-- Attachment #1: Type: text/plain, Size: 3194 bytes --]
On 17/08/24 04:33PM, John Garry wrote:
>On 16/08/2024 19:34, Nitesh Shetty wrote:
>>On 15/08/24 04:32PM, John Garry wrote:
>>>As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
>>>drive.
>>>
>>>Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>>>changed __blkdev_issue_write_zeroes() to read the max write zeroes
>>>value in the loop. This is not safe as max write zeroes may change in
>>>value. Specifically for the case of [0], the value goes to 0, and we get
>>>an infinite loop.
>>>
>>>Lift the limit reading out of the loop.
>>>
>>>[0] https://urldefense.com/v3/__https://lore.kernel.org/linux-
>>>xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/*t__;Iw!!
>>>ACWV5N9M2RV99hQ! KNrgu0c216k_Y_2RLxTasjxizyhbN8eKD61JwIwDxT5OJJDamER6hw1nvf5biNMqQLaLl9PqC2qRUDdHlrGF7g$
>>>Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
>>>Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>Signed-off-by: John Garry <john.g.garry@oracle.com>
>>>---
>>>block/blk-lib.c | 25 ++++++++++++++++++-------
>>>1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>>diff --git a/block/blk-lib.c b/block/blk-lib.c
>>>index 9f735efa6c94..83eb7761c2bf 100644
>>>--- a/block/blk-lib.c
>>>+++ b/block/blk-lib.c
>>>@@ -111,13 +111,20 @@ static sector_t
>>>bio_write_zeroes_limit(struct block_device *bdev)
>>> (UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
>>>}
>>>
>>>+/*
>>>+ * There is no reliable way for the SCSI subsystem to determine
>>>whether a
>>>+ * device supports a WRITE SAME operation without actually
>>>performing a write
>>>+ * to media. As a result, write_zeroes is enabled by default and will be
>>>+ * disabled if a zeroing operation subsequently fails. This means
>>>that this
>>>+ * queue limit is likely to change at runtime.
>>>+ */
>>>static void __blkdev_issue_write_zeroes(struct block_device *bdev,
>>> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>>>- struct bio **biop, unsigned flags)
>>>+ struct bio **biop, unsigned flags, sector_t limit)
>>>{
>>>+
>>> while (nr_sects) {
>>>- unsigned int len = min_t(sector_t, nr_sects,
>>>- bio_write_zeroes_limit(bdev));
>>>+ unsigned int len = min(nr_sects, limit);
>>
>>I feel changes something like below will simplify the whole patch.
>>
>>--- a/block/blk-lib.c
>>+++ b/block/blk-lib.c
>>@@ -115,9 +115,13 @@ static void __blkdev_issue_write_zeroes(struct
>>block_device *bdev,
>> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>> struct bio **biop, unsigned flags)
>> {
>>+ sector_t limit = bio_write_zeroes_limit(bdev);
>>+
>>+ if (!limit)
>>+ return;
>
>Just this?
>
>If yes, then __blkdev_issue_write_zeroes() does not return an error
>code, so can we tell whether the zeroes were actually issued?
>
I failed to see this.
>Furthermore, I would rather limit points at which we call
>bio_write_zeroes_limit()
>
>BTW, I am going on a short vacation now, so I can't quickly rework
>this (if that was actually required).
>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
2024-08-15 16:32 ` [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
2024-08-15 17:43 ` Martin K. Petersen
@ 2024-08-19 15:16 ` Nitesh Shetty
1 sibling, 0 replies; 13+ messages in thread
From: Nitesh Shetty @ 2024-08-19 15:16 UTC (permalink / raw)
To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch
[-- Attachment #1: Type: text/plain, Size: 291 bytes --]
On 15/08/24 04:32PM, John Garry wrote:
>Function bdev_get_queue() must not return NULL, so drop the check in
>bdev_write_zeroes_sectors().
>
>Reviewed-by: Christoph Hellwig <hch@lst.de>
>Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling
2024-08-15 16:32 [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
2024-08-15 16:32 ` [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
@ 2024-08-19 15:49 ` Jens Axboe
2 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-08-19 15:49 UTC (permalink / raw)
To: hch, John Garry; +Cc: martin.petersen, linux-block, kbusch
On Thu, 15 Aug 2024 16:32:26 +0000, John Garry wrote:
> As reported in [0], we may get an infinite loop in
> __blkdev_issue_write_zeroes() for making an XFS FS on a raid0 config.
>
> Fix __blkdev_issue_write_zeroes() limit handling by reading the write
> zeroes limit outside the loop.
>
> Also include a change to drop the unnecessary NULL queue check in
> bdev_write_zeroes_sectors().
>
> [...]
Applied, thanks!
[1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
commit: 64b582ca88ca11400467b282d5fa3b870ded1c11
[2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
commit: 81475beb1b5996505a39cd1d9316ce1e668932a2
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
2024-08-15 17:43 ` Martin K. Petersen
2024-08-16 18:34 ` Nitesh Shetty
@ 2024-08-28 7:15 ` Shinichiro Kawasaki
2024-08-28 7:17 ` hch
2 siblings, 1 reply; 13+ messages in thread
From: Shinichiro Kawasaki @ 2024-08-28 7:15 UTC (permalink / raw)
To: John Garry
Cc: axboe@kernel.dk, hch, martin.petersen@oracle.com,
linux-block@vger.kernel.org, kbusch@kernel.org, Damien Le Moal
On Aug 15, 2024 / 16:32, John Garry wrote:
> As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
> drive.
>
> Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
> changed __blkdev_issue_write_zeroes() to read the max write zeroes
> value in the loop. This is not safe as max write zeroes may change in
> value. Specifically for the case of [0], the value goes to 0, and we get
> an infinite loop.
>
> Lift the limit reading out of the loop.
>
> [0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t
>
> Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
Hello John,
I ran may test set for the kernel v6.11-rc5 and found that this commit triggers
an error. When I set up xfs with a dm-zoned device on a TCMU zoned device and
filled the filesystem, the kernel reported this error message:
device-mapper: zoned reclaim: (sdg): Align zone 19 wp 0 to 32736 (wp+32736) blocks failed -121
When dm-zoned calls blkdev_issue_zeorout(), EREMOTEIO 121 was returned, then
the error was reported. I think a change in this commit is the cause. Please
find my comment below.
> ---
> block/blk-lib.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 9f735efa6c94..83eb7761c2bf 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -111,13 +111,20 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
> (UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
> }
>
> +/*
> + * There is no reliable way for the SCSI subsystem to determine whether a
> + * device supports a WRITE SAME operation without actually performing a write
> + * to media. As a result, write_zeroes is enabled by default and will be
> + * disabled if a zeroing operation subsequently fails. This means that this
> + * queue limit is likely to change at runtime.
> + */
> static void __blkdev_issue_write_zeroes(struct block_device *bdev,
> sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> - struct bio **biop, unsigned flags)
> + struct bio **biop, unsigned flags, sector_t limit)
> {
> +
> while (nr_sects) {
> - unsigned int len = min_t(sector_t, nr_sects,
> - bio_write_zeroes_limit(bdev));
> + unsigned int len = min(nr_sects, limit);
> struct bio *bio;
>
> if ((flags & BLKDEV_ZERO_KILLABLE) &&
> @@ -141,12 +148,14 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
> static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp, unsigned flags)
> {
> + sector_t limit = bio_write_zeroes_limit(bdev);
> struct bio *bio = NULL;
> struct blk_plug plug;
> int ret = 0;
>
> blk_start_plug(&plug);
> - __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
> + __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
> + flags, limit);
> if (bio) {
> if ((flags & BLKDEV_ZERO_KILLABLE) &&
> fatal_signal_pending(current)) {
> @@ -165,7 +174,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
> * on an I/O error, in which case we'll turn any error into
> * "not supported" here.
> */
> - if (ret && !bdev_write_zeroes_sectors(bdev))
> + if (ret && !limit)
> return -EOPNOTSUPP;
> return ret;
> }
The hunk above changed the timing to check WRITE ZEROES capability. Before the
change, bdev_write_zeroes_sectors(bdev) was called after WRITE ZEROES bio
completion. Then it was able to detect the q->limits.max_write_zeroes_sector
value change by drivers at WRITE ZEROES failure (scsi sd driver does it).
However, after applying the hunk, the check is done for the local variable
'limit' which is cached before the WRITE ZEROES bio issue. So it can not detect
the q->limits.max_write_zeroes_sector value change at the WRITE ZEROES failure.
Hence, it does not replace the ERMOTEIO with EOPNOTSUPP.
I think the hunk above should be reverted. I created a patch below for it, and
confirmed it avoids the error. Now I'm running my test set to confirm no
regression. If you have comments on my findings and the patch, please share.
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 83eb7761c2bf..b336d57673d3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -148,14 +148,13 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp, unsigned flags)
{
- sector_t limit = bio_write_zeroes_limit(bdev);
struct bio *bio = NULL;
struct blk_plug plug;
int ret = 0;
blk_start_plug(&plug);
__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
- flags, limit);
+ flags, bio_write_zeroes_limit(bdev));
if (bio) {
if ((flags & BLKDEV_ZERO_KILLABLE) &&
fatal_signal_pending(current)) {
@@ -174,7 +173,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
* on an I/O error, in which case we'll turn any error into
* "not supported" here.
*/
- if (ret && !limit)
+ if (ret && !bdev_write_zeroes_sectors(bdev))
return -EOPNOTSUPP;
return ret;
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-28 7:15 ` Shinichiro Kawasaki
@ 2024-08-28 7:17 ` hch
2024-08-28 7:25 ` Shinichiro Kawasaki
0 siblings, 1 reply; 13+ messages in thread
From: hch @ 2024-08-28 7:17 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: John Garry, axboe@kernel.dk, hch, martin.petersen@oracle.com,
linux-block@vger.kernel.org, kbusch@kernel.org, Damien Le Moal
On Wed, Aug 28, 2024 at 07:15:55AM +0000, Shinichiro Kawasaki wrote:
> I ran may test set for the kernel v6.11-rc5 and found that this commit triggers
> an error. When I set up xfs with a dm-zoned device on a TCMU zoned device and
> filled the filesystem, the kernel reported this error message:
>
> device-mapper: zoned reclaim: (sdg): Align zone 19 wp 0 to 32736 (wp+32736) blocks failed -121
>
> When dm-zoned calls blkdev_issue_zeorout(), EREMOTEIO 121 was returned, then
> the error was reported. I think a change in this commit is the cause. Please
> find my comment below.
This patch should fix it:
https://lore.kernel.org/linux-block/20240827175340.GB1977952@frogsfrogsfrogs/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
2024-08-28 7:17 ` hch
@ 2024-08-28 7:25 ` Shinichiro Kawasaki
0 siblings, 0 replies; 13+ messages in thread
From: Shinichiro Kawasaki @ 2024-08-28 7:25 UTC (permalink / raw)
To: hch
Cc: John Garry, axboe@kernel.dk, martin.petersen@oracle.com,
linux-block@vger.kernel.org, kbusch@kernel.org, Damien Le Moal
On Aug 28, 2024 / 09:17, hch wrote:
> On Wed, Aug 28, 2024 at 07:15:55AM +0000, Shinichiro Kawasaki wrote:
> > I ran may test set for the kernel v6.11-rc5 and found that this commit triggers
> > an error. When I set up xfs with a dm-zoned device on a TCMU zoned device and
> > filled the filesystem, the kernel reported this error message:
> >
> > device-mapper: zoned reclaim: (sdg): Align zone 19 wp 0 to 32736 (wp+32736) blocks failed -121
> >
> > When dm-zoned calls blkdev_issue_zeorout(), EREMOTEIO 121 was returned, then
> > the error was reported. I think a change in this commit is the cause. Please
> > find my comment below.
>
> This patch should fix it:
>
> https://lore.kernel.org/linux-block/20240827175340.GB1977952@frogsfrogsfrogs/
Oh I should have checked the latest linux-block posts. Anyway, good to have the
fix patch. Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-08-28 7:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-15 16:32 [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
2024-08-15 16:32 ` [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
2024-08-15 17:43 ` Martin K. Petersen
2024-08-16 18:34 ` Nitesh Shetty
2024-08-17 15:33 ` John Garry
2024-08-19 15:15 ` Nitesh Shetty
2024-08-28 7:15 ` Shinichiro Kawasaki
2024-08-28 7:17 ` hch
2024-08-28 7:25 ` Shinichiro Kawasaki
2024-08-15 16:32 ` [PATCH v2 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
2024-08-15 17:43 ` Martin K. Petersen
2024-08-19 15:16 ` Nitesh Shetty
2024-08-19 15:49 ` [PATCH v2 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox