Linux block layer
 help / color / mirror / Atom feed
* [PATCH 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling
@ 2024-08-15  8:27 John Garry
  2024-08-15  8:27 ` [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
  2024-08-15  8:27 ` [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
  0 siblings, 2 replies; 9+ messages in thread
From: John Garry @ 2024-08-15  8:27 UTC (permalink / raw)
  To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry

As reported in [0], we may get an infinite loop in
__blkdev_issue_write_zeroes() for making an XFS FS on a raid0 config.

Fix __blkdev_issue_write_zeroes() limit handling by reading the write
zeroes limit outside the loop.

Also include a change to drop the unnecessary NULL queue check in
bdev_write_zeroes_sectors().

[0] https://lore.kernel.org/linux-block/20240815062112.GA14067@lst.de/T/#m14ed5d882f9390a46cfe2fcfa2b51218aafbd32e

John Garry (2):
  block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  block: Drop NULL check in bdev_write_zeroes_sectors()

 block/blk-lib.c        | 22 +++++++++++++++-------
 include/linux/blkdev.h |  7 +------
 2 files changed, 16 insertions(+), 13 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15  8:27 [PATCH 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
@ 2024-08-15  8:27 ` John Garry
  2024-08-15 12:40   ` Christoph Hellwig
  2024-08-15  8:27 ` [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
  1 sibling, 1 reply; 9+ messages in thread
From: John Garry @ 2024-08-15  8:27 UTC (permalink / raw)
  To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry

As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
drive.

Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
changed __blkdev_issue_write_zeroes() to read the max write zeroes
value in the loop. This is not safe as max write zeroes may change in
value. Specifically for the case of [0], the value goes to 0, and we get
an infinite loop.

Lift the limit reading out of the loop.

[0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t

Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-lib.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9f735efa6c94..00cee94e5d42 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -111,13 +111,17 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
 		(UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
 }
 
+/*
+ * Pass bio_write_zeroes_limit() return value in @limit, as the return
+ * value may change after a REQ_OP_WRITE_ZEROES is issued.
+ */
 static void __blkdev_issue_write_zeroes(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
-		struct bio **biop, unsigned flags)
+		struct bio **biop, unsigned flags, sector_t limit)
 {
+
 	while (nr_sects) {
-		unsigned int len = min_t(sector_t, nr_sects,
-				bio_write_zeroes_limit(bdev));
+		unsigned int len = min(nr_sects, limit);
 		struct bio *bio;
 
 		if ((flags & BLKDEV_ZERO_KILLABLE) &&
@@ -141,12 +145,14 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
 static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp, unsigned flags)
 {
+	sector_t limit = bio_write_zeroes_limit(bdev);
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 	int ret = 0;
 
 	blk_start_plug(&plug);
-	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
+	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
+			flags, limit);
 	if (bio) {
 		if ((flags & BLKDEV_ZERO_KILLABLE) &&
 		    fatal_signal_pending(current)) {
@@ -165,7 +171,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 	 * on an I/O error, in which case we'll turn any error into
 	 * "not supported" here.
 	 */
-	if (ret && !bdev_write_zeroes_sectors(bdev))
+	if (ret && !limit)
 		return -EOPNOTSUPP;
 	return ret;
 }
@@ -265,12 +271,14 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
 		unsigned flags)
 {
+	sector_t limit = bio_write_zeroes_limit(bdev);
+
 	if (bdev_read_only(bdev))
 		return -EPERM;
 
-	if (bdev_write_zeroes_sectors(bdev)) {
+	if (limit) {
 		__blkdev_issue_write_zeroes(bdev, sector, nr_sects,
-				gfp_mask, biop, flags);
+				gfp_mask, biop, flags, limit);
 	} else {
 		if (flags & BLKDEV_ZERO_NOFALLBACK)
 			return -EOPNOTSUPP;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
  2024-08-15  8:27 [PATCH 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
  2024-08-15  8:27 ` [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
@ 2024-08-15  8:27 ` John Garry
  2024-08-15 12:40   ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: John Garry @ 2024-08-15  8:27 UTC (permalink / raw)
  To: axboe, hch; +Cc: martin.petersen, linux-block, kbusch, John Garry

Function bdev_get_queue() must not return NULL, so drop the check in
bdev_write_zeroes_sectors().

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 include/linux/blkdev.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e85ec73a07d5..b7664d593486 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1296,12 +1296,7 @@ bdev_max_secure_erase_sectors(struct block_device *bdev)
 
 static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
-
-	if (q)
-		return q->limits.max_write_zeroes_sectors;
-
-	return 0;
+	return bdev_get_queue(bdev)->limits.max_write_zeroes_sectors;
 }
 
 static inline bool bdev_nonrot(struct block_device *bdev)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15  8:27 ` [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
@ 2024-08-15 12:40   ` Christoph Hellwig
  2024-08-15 13:29     ` John Garry
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-08-15 12:40 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch

On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote:
> +/*
> + * Pass bio_write_zeroes_limit() return value in @limit, as the return
> + * value may change after a REQ_OP_WRITE_ZEROES is issued.
> + */

I don't think that really helps all that much to explain the issue,
which is about SCSI not having an ahead of time flag that reliably
works for write same support, which makes it clear the limit to 0 on
the first I/O completion.  Maybe you can actually spell this out?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors()
  2024-08-15  8:27 ` [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
@ 2024-08-15 12:40   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-08-15 12:40 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, hch, martin.petersen, linux-block, kbusch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15 12:40   ` Christoph Hellwig
@ 2024-08-15 13:29     ` John Garry
  2024-08-15 15:26       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2024-08-15 13:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, martin.petersen, linux-block, kbusch

On 15/08/2024 13:40, Christoph Hellwig wrote:
> On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote:
>> +/*
>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return
>> + * value may change after a REQ_OP_WRITE_ZEROES is issued.
>> + */
> I don't think that really helps all that much to explain the issue,
> which is about SCSI not having an ahead of time flag that reliably
> works for write same support, which makes it clear the limit to 0 on
> the first I/O completion.  Maybe you can actually spell this out?

Please just tell me what you would like to see, and I will copy verbatim.

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15 13:29     ` John Garry
@ 2024-08-15 15:26       ` Christoph Hellwig
  2024-08-15 15:38         ` John Garry
  2024-08-15 15:41         ` Martin K. Petersen
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-08-15 15:26 UTC (permalink / raw)
  To: John Garry; +Cc: Christoph Hellwig, axboe, martin.petersen, linux-block, kbusch

On Thu, Aug 15, 2024 at 02:29:46PM +0100, John Garry wrote:
> On 15/08/2024 13:40, Christoph Hellwig wrote:
>> On Thu, Aug 15, 2024 at 08:27:54AM +0000, John Garry wrote:
>>> +/*
>>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return
>>> + * value may change after a REQ_OP_WRITE_ZEROES is issued.
>>> + */
>> I don't think that really helps all that much to explain the issue,
>> which is about SCSI not having an ahead of time flag that reliably
>> works for write same support, which makes it clear the limit to 0 on
>> the first I/O completion.  Maybe you can actually spell this out?
>
> Please just tell me what you would like to see, and I will copy verbatim.

Probably just what I just wrote.  Unless Martin can come up with
better language.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15 15:26       ` Christoph Hellwig
@ 2024-08-15 15:38         ` John Garry
  2024-08-15 15:41         ` Martin K. Petersen
  1 sibling, 0 replies; 9+ messages in thread
From: John Garry @ 2024-08-15 15:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, martin.petersen, linux-block, kbusch

On 15/08/2024 16:26, Christoph Hellwig wrote:
>>>> +/*
>>>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return
>>>> + * value may change after a REQ_OP_WRITE_ZEROES is issued.
>>>> + */
>>> I don't think that really helps all that much to explain the issue,
>>> which is about SCSI not having an ahead of time flag that reliably
>>> works for write same support, which makes it clear the limit to 0 on
>>> the first I/O completion.  Maybe you can actually spell this out?
>> Please just tell me what you would like to see, and I will copy verbatim.
> Probably just what I just wrote.  Unless Martin can come up with
> better language.


/*
  * SCSI does not have a reliable ahead of time flag that reliably
  * works for write same support; instead, the limit is set to zero for
  * the first failed I/O completion. As such, only read the limit prior
  * to submission, as it may later change.
  */

ok?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()
  2024-08-15 15:26       ` Christoph Hellwig
  2024-08-15 15:38         ` John Garry
@ 2024-08-15 15:41         ` Martin K. Petersen
  1 sibling, 0 replies; 9+ messages in thread
From: Martin K. Petersen @ 2024-08-15 15:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: John Garry, axboe, martin.petersen, linux-block, kbusch


Christoph,

>>>> +/*
>>>> + * Pass bio_write_zeroes_limit() return value in @limit, as the return
>>>> + * value may change after a REQ_OP_WRITE_ZEROES is issued.
>>>> + */
>>> I don't think that really helps all that much to explain the issue,
>>> which is about SCSI not having an ahead of time flag that reliably
>>> works for write same support, which makes it clear the limit to 0 on
>>> the first I/O completion.  Maybe you can actually spell this out?
>>
>> Please just tell me what you would like to see, and I will copy verbatim.
>
> Probably just what I just wrote.  Unless Martin can come up with
> better language.

There is no reliable way for the SCSI subsystem to determine whether a
device supports a WRITE SAME operation without actually performing a
write to media. As a result, write_zeroes is enabled by default and will
be disabled if a zeroing operation subsequently fails. This means that
this queue limit is likely to change at runtime.

Something like that, perhaps?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-08-15 15:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-15  8:27 [PATCH 0/2] block: Fix __blkdev_issue_write_zeroes() limit handling John Garry
2024-08-15  8:27 ` [PATCH 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes() John Garry
2024-08-15 12:40   ` Christoph Hellwig
2024-08-15 13:29     ` John Garry
2024-08-15 15:26       ` Christoph Hellwig
2024-08-15 15:38         ` John Garry
2024-08-15 15:41         ` Martin K. Petersen
2024-08-15  8:27 ` [PATCH 2/2] block: Drop NULL check in bdev_write_zeroes_sectors() John Garry
2024-08-15 12:40   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox