All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yongpeng Yang <yangyongpeng.storage@gmail.com>
To: Yongpeng Yang <yangyongpeng.storage@gmail.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Chaitanya Kulkarni <ckulkarnilinux@gmail.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"agk@redhat.com" <agk@redhat.com>,
	"snitzer@kernel.org" <snitzer@kernel.org>,
	"mpatocka@redhat.com" <mpatocka@redhat.com>,
	"song@kernel.org" <song@kernel.org>,
	"yukuai@fnnas.com" <yukuai@fnnas.com>, "hch@lst.de" <hch@lst.de>,
	"sagi@grimberg.me" <sagi@grimberg.me>,
	"jaegeuk@kernel.org" <jaegeuk@kernel.org>,
	"chao@kernel.org" <chao@kernel.org>,
	"cem@kernel.org" <cem@kernel.org>
Cc: "dm-devel@lists.linux.dev" <dm-devel@lists.linux.dev>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Yongpeng Yang <yangyongpeng@xiaomi.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [f2fs-dev] [PATCH V3 6/6] xfs: ignore discard return value
Date: Wed, 26 Nov 2025 17:48:28 +0800	[thread overview]
Message-ID: <5a1387fd-4952-42e0-b7a9-e614f7b20325@gmail.com> (raw)
In-Reply-To: <2da95607-9b21-4d21-8926-9463021a6f33@gmail.com>

On 11/26/25 17:14, Yongpeng Yang wrote:
> On 11/26/25 16:07, Chaitanya Kulkarni via Linux-f2fs-devel wrote:
>> On 11/25/25 18:37, Yongpeng Yang wrote:
>>>> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
>>>> index 6917de832191..b6ffe4807a11 100644
>>>> --- a/fs/xfs/xfs_discard.c
>>>> +++ b/fs/xfs/xfs_discard.c
>>>> @@ -108,7 +108,7 @@ xfs_discard_endio(
>>>>     * list. We plug and chain the bios so that we only need a single
>>>> completion
>>>>     * call to clear all the busy extents once the discards are 
>>>> complete.
>>>>     */
>>>> -int
>>>> +void
>>>>    xfs_discard_extents(
>>>>        struct xfs_mount    *mp,
>>>>        struct xfs_busy_extents    *extents)
>>>> @@ -116,7 +116,6 @@ xfs_discard_extents(
>>>>        struct xfs_extent_busy    *busyp;
>>>>        struct bio        *bio = NULL;
>>>>        struct blk_plug        plug;
>>>> -    int            error = 0;
>>>>          blk_start_plug(&plug);
>>>>        list_for_each_entry(busyp, &extents->extent_list, list) {
>>>> @@ -126,18 +125,10 @@ xfs_discard_extents(
>>>>              trace_xfs_discard_extent(xg, busyp->bno, busyp->length);
>>>>    -        error = __blkdev_issue_discard(btp->bt_bdev,
>>>> +        __blkdev_issue_discard(btp->bt_bdev,
>>>>                    xfs_gbno_to_daddr(xg, busyp->bno),
>>>>                    XFS_FSB_TO_BB(mp, busyp->length),
>>>>                    GFP_KERNEL, &bio);
>>>
>>> If blk_alloc_discard_bio() fails to allocate a bio inside
>>> __blkdev_issue_discard(), this may lead to an invalid loop in
>>> list_for_each_entry{}. Instead of using __blkdev_issue_discard(), how
>>> about allocate and submit the discard bios explicitly in
>>> list_for_each_entry{}?
>>>
>>> Yongpeng,
>>
>>
>> Calling __blkdev_issue_discard() keeps managing all the bio with the
>> appropriate GFP mask, so the semantics stay the same. You are just
>> moving memory allocation to the caller and potentially looking at
>> implementing retry on bio allocation failure.
>>
>> The retry for discard bio memory allocation is not desired I think,
>> since it's only a hint to the controller.
> 
> Agreed. I'm not trying to retry bio allocation inside the
> list_for_each_entry{} loop. Instead, since blk_alloc_discard_bio()
> returning NULL cannot reliably indicate whether the failure is due to
> bio allocation failure, it could also be caused by 'bio_sects == 0', I'd
> like to allocate the bio explicitly.
> 
>>
>> This patch is simply cleaning up dead error-handling branches at the
>> callers no behavioral changes intended.
>>
>> What maybe useful is to stop iterating once we fail to allocate the
>> bio [1].
>>
>> -ck
>>
>> [1] Potential addition on the top of this to exit early in discard loop
>>       on bio allocation failure.
>>
>> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
>> index b6ffe4807a11..1519f708bb79 100644
>> --- a/fs/xfs/xfs_discard.c
>> +++ b/fs/xfs/xfs_discard.c
>> @@ -129,6 +129,13 @@ xfs_discard_extents(
>>                                   xfs_gbno_to_daddr(xg, busyp->bno),
>>                                   XFS_FSB_TO_BB(mp, busyp->length),
>>                                   GFP_KERNEL, &bio);
>> +               /*
>> +                * We failed to allocate bio instead of continuing the 
>> loop
>> +                * so it will lead to inconsistent discards to the disk
>> +                * exit early and jump into xfs_discard_busy_clear().
>> +                */
>> +               if (!bio)
>> +                       break;
> 
> I noticed that as long as XFS_FSB_TO_BB(mp, busyp->length) is greater
> than 0 and there is no bio allocation failure, __blkdev_issue_discard()
> will never return NULL. I'm not familiar with this part of the xfs, so
> I'm not sure whether there are cases where 'XFS_FSB_TO_BB(mp,
> busyp->length)' could be 0. If such cases do not exist, then
> checking whether the bio is NULL should be sufficient.
> 
> Yongpeng,

If __blkdev_issue_discard() requires multiple calls to
blk_alloc_discard_bio(), once the first bio allocation succeeds, it will
never result in bio == NULL, meaning that any subsequent bio allocation
failures cannot be detected.

Yongpeng,

> 
>>           }
>>           if (bio) {
>> > If we keep looping after the first bio == NULL, the rest of the 
>> range is
>> guaranteed to be inconsistent anyways, because every subsequent iteration
>> will fall into one of three cases:
>>
>> - The allocator keeps returning NULL, so none of the remaining LBAs 
>> receive
>>     discard.
>> - Rest of the allocator succeeds, but we’ve already skipped a chunk, 
>> leaving
>>     a hole in the discard range.
>> - We get intermittent successes, which produces alternating chunks of
>>     discarded and undiscarded blocks.
>>
>> In each of those scenarios, the disk ends up with a partially discarded
>> range, so the correct fix is to break out of the loop immediately and
>> proceed to xfs_discard_busy_clear() once the very first allocation fails.

WARNING: multiple messages have this Message-ID (diff)
From: Yongpeng Yang <yangyongpeng.storage@gmail.com>
To: Yongpeng Yang <yangyongpeng.storage@gmail.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Chaitanya Kulkarni <ckulkarnilinux@gmail.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"agk@redhat.com" <agk@redhat.com>,
	"snitzer@kernel.org" <snitzer@kernel.org>,
	"mpatocka@redhat.com" <mpatocka@redhat.com>,
	"song@kernel.org" <song@kernel.org>,
	"yukuai@fnnas.com" <yukuai@fnnas.com>, "hch@lst.de" <hch@lst.de>,
	"sagi@grimberg.me" <sagi@grimberg.me>,
	"jaegeuk@kernel.org" <jaegeuk@kernel.org>,
	"chao@kernel.org" <chao@kernel.org>,
	"cem@kernel.org" <cem@kernel.org>
Cc: "dm-devel@lists.linux.dev" <dm-devel@lists.linux.dev>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Yongpeng Yang <yangyongpeng@xiaomi.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [f2fs-dev] [PATCH V3 6/6] xfs: ignore discard return value
Date: Wed, 26 Nov 2025 17:48:28 +0800	[thread overview]
Message-ID: <5a1387fd-4952-42e0-b7a9-e614f7b20325@gmail.com> (raw)
In-Reply-To: <2da95607-9b21-4d21-8926-9463021a6f33@gmail.com>

On 11/26/25 17:14, Yongpeng Yang wrote:
> On 11/26/25 16:07, Chaitanya Kulkarni via Linux-f2fs-devel wrote:
>> On 11/25/25 18:37, Yongpeng Yang wrote:
>>>> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
>>>> index 6917de832191..b6ffe4807a11 100644
>>>> --- a/fs/xfs/xfs_discard.c
>>>> +++ b/fs/xfs/xfs_discard.c
>>>> @@ -108,7 +108,7 @@ xfs_discard_endio(
>>>>     * list. We plug and chain the bios so that we only need a single
>>>> completion
>>>>     * call to clear all the busy extents once the discards are 
>>>> complete.
>>>>     */
>>>> -int
>>>> +void
>>>>    xfs_discard_extents(
>>>>        struct xfs_mount    *mp,
>>>>        struct xfs_busy_extents    *extents)
>>>> @@ -116,7 +116,6 @@ xfs_discard_extents(
>>>>        struct xfs_extent_busy    *busyp;
>>>>        struct bio        *bio = NULL;
>>>>        struct blk_plug        plug;
>>>> -    int            error = 0;
>>>>          blk_start_plug(&plug);
>>>>        list_for_each_entry(busyp, &extents->extent_list, list) {
>>>> @@ -126,18 +125,10 @@ xfs_discard_extents(
>>>>              trace_xfs_discard_extent(xg, busyp->bno, busyp->length);
>>>>    -        error = __blkdev_issue_discard(btp->bt_bdev,
>>>> +        __blkdev_issue_discard(btp->bt_bdev,
>>>>                    xfs_gbno_to_daddr(xg, busyp->bno),
>>>>                    XFS_FSB_TO_BB(mp, busyp->length),
>>>>                    GFP_KERNEL, &bio);
>>>
>>> If blk_alloc_discard_bio() fails to allocate a bio inside
>>> __blkdev_issue_discard(), this may lead to an invalid loop in
>>> list_for_each_entry{}. Instead of using __blkdev_issue_discard(), how
>>> about allocate and submit the discard bios explicitly in
>>> list_for_each_entry{}?
>>>
>>> Yongpeng,
>>
>>
>> Calling __blkdev_issue_discard() keeps managing all the bio with the
>> appropriate GFP mask, so the semantics stay the same. You are just
>> moving memory allocation to the caller and potentially looking at
>> implementing retry on bio allocation failure.
>>
>> The retry for discard bio memory allocation is not desired I think,
>> since it's only a hint to the controller.
> 
> Agreed. I'm not trying to retry bio allocation inside the
> list_for_each_entry{} loop. Instead, since blk_alloc_discard_bio()
> returning NULL cannot reliably indicate whether the failure is due to
> bio allocation failure, it could also be caused by 'bio_sects == 0', I'd
> like to allocate the bio explicitly.
> 
>>
>> This patch is simply cleaning up dead error-handling branches at the
>> callers no behavioral changes intended.
>>
>> What maybe useful is to stop iterating once we fail to allocate the
>> bio [1].
>>
>> -ck
>>
>> [1] Potential addition on the top of this to exit early in discard loop
>>       on bio allocation failure.
>>
>> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
>> index b6ffe4807a11..1519f708bb79 100644
>> --- a/fs/xfs/xfs_discard.c
>> +++ b/fs/xfs/xfs_discard.c
>> @@ -129,6 +129,13 @@ xfs_discard_extents(
>>                                   xfs_gbno_to_daddr(xg, busyp->bno),
>>                                   XFS_FSB_TO_BB(mp, busyp->length),
>>                                   GFP_KERNEL, &bio);
>> +               /*
>> +                * We failed to allocate bio instead of continuing the 
>> loop
>> +                * so it will lead to inconsistent discards to the disk
>> +                * exit early and jump into xfs_discard_busy_clear().
>> +                */
>> +               if (!bio)
>> +                       break;
> 
> I noticed that as long as XFS_FSB_TO_BB(mp, busyp->length) is greater
> than 0 and there is no bio allocation failure, __blkdev_issue_discard()
> will never return NULL. I'm not familiar with this part of the xfs, so
> I'm not sure whether there are cases where 'XFS_FSB_TO_BB(mp,
> busyp->length)' could be 0. If such cases do not exist, then
> checking whether the bio is NULL should be sufficient.
> 
> Yongpeng,

If __blkdev_issue_discard() requires multiple calls to
blk_alloc_discard_bio(), once the first bio allocation succeeds, it will
never result in bio == NULL, meaning that any subsequent bio allocation
failures cannot be detected.

Yongpeng,

> 
>>           }
>>           if (bio) {
>> > If we keep looping after the first bio == NULL, the rest of the 
>> range is
>> guaranteed to be inconsistent anyways, because every subsequent iteration
>> will fall into one of three cases:
>>
>> - The allocator keeps returning NULL, so none of the remaining LBAs 
>> receive
>>     discard.
>> - Rest of the allocator succeeds, but we’ve already skipped a chunk, 
>> leaving
>>     a hole in the discard range.
>> - We get intermittent successes, which produces alternating chunks of
>>     discarded and undiscarded blocks.
>>
>> In each of those scenarios, the disk ends up with a partially discarded
>> range, so the correct fix is to break out of the loop immediately and
>> proceed to xfs_discard_busy_clear() once the very first allocation fails.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2025-11-26  9:48 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-24 23:48 [PATCH V3 0/6] block: ignore __blkdev_issue_discard() ret value Chaitanya Kulkarni
2025-11-24 23:48 ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-24 23:48 ` [PATCH V3 1/6] block: ignore discard return value Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-25 17:38   ` Jens Axboe
2025-11-25 17:38     ` [f2fs-dev] " Jens Axboe
2025-11-25 19:09     ` Chaitanya Kulkarni
2025-11-25 19:09       ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-25 19:19       ` Jens Axboe
2025-11-25 19:19         ` [f2fs-dev] " Jens Axboe
2025-11-24 23:48 ` [PATCH V3 2/6] md: " Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-12-01  6:22   ` Chaitanya Kulkarni
2026-01-24 21:29     ` Chaitanya Kulkarni
2025-11-24 23:48 ` [PATCH V3 3/6] dm: " Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-12-01  6:23   ` Chaitanya Kulkarni
2025-12-01 21:07     ` Mikulas Patocka
2025-11-24 23:48 ` [PATCH V3 4/6] nvmet: " Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-26 10:14   ` Yongpeng Yang
2025-11-26 10:14     ` Yongpeng Yang
2025-11-26 10:37     ` Christoph Hellwig
2025-11-26 10:37       ` Christoph Hellwig
2026-01-24 21:35   ` Chaitanya Kulkarni
2026-01-24 21:35     ` [f2fs-dev] " Chaitanya Kulkarni via Linux-f2fs-devel
2026-01-26  4:57     ` hch
2026-01-26  4:57       ` [f2fs-dev] " hch
2026-01-27 22:40       ` Chaitanya Kulkarni
2026-01-27 22:40         ` [f2fs-dev] " Chaitanya Kulkarni via Linux-f2fs-devel
2025-11-24 23:48 ` [PATCH V3 5/6] f2fs: " Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-25  1:10   ` Chao Yu
2025-11-25  1:10     ` [f2fs-dev] " Chao Yu via Linux-f2fs-devel
2025-11-25  6:33     ` Christoph Hellwig
2025-11-25  6:33       ` [f2fs-dev] " Christoph Hellwig
2025-11-25  7:11       ` Chao Yu
2025-11-25  7:11         ` [f2fs-dev] " Chao Yu via Linux-f2fs-devel
2025-12-01  6:26         ` Chaitanya Kulkarni via Linux-f2fs-devel
2025-11-26  2:47   ` Chao Yu
2025-11-26  2:47     ` [f2fs-dev] " Chao Yu via Linux-f2fs-devel
2025-11-26  3:37     ` Chaitanya Kulkarni
2025-11-26  3:37       ` [f2fs-dev] " Chaitanya Kulkarni via Linux-f2fs-devel
2025-11-26  3:58       ` Chao Yu
2025-11-26  3:58         ` [f2fs-dev] " Chao Yu via Linux-f2fs-devel
2025-11-24 23:48 ` [PATCH V3 6/6] xfs: " Chaitanya Kulkarni
2025-11-24 23:48   ` [f2fs-dev] " Chaitanya Kulkarni
2025-11-26  2:37   ` Yongpeng Yang
2025-11-26  2:37     ` Yongpeng Yang
2025-11-26  8:07     ` Chaitanya Kulkarni
2025-11-26  8:07       ` Chaitanya Kulkarni via Linux-f2fs-devel
2025-11-26  9:14       ` Yongpeng Yang
2025-11-26  9:14         ` Yongpeng Yang
2025-11-26  9:48         ` Yongpeng Yang [this message]
2025-11-26  9:48           ` Yongpeng Yang
2025-11-26 10:34       ` hch
2025-11-26 10:34         ` hch
2025-11-26 10:33     ` Christoph Hellwig
2025-11-26 10:33       ` Christoph Hellwig
2025-12-01  6:28   ` Chaitanya Kulkarni
2025-12-01 12:20     ` Carlos Maiolino
2025-11-25 13:20 ` [PATCH V3 0/6] block: ignore __blkdev_issue_discard() ret value Anuj gupta
2025-11-25 13:20   ` [f2fs-dev] " Anuj gupta
2025-11-25 19:20 ` (subset) " Jens Axboe
2025-11-25 19:20   ` [f2fs-dev] " Jens Axboe
2025-12-03 21:50 ` [f2fs-dev] " patchwork-bot+f2fs
2025-12-03 21:50   ` patchwork-bot+f2fs--- via Linux-f2fs-devel
2025-12-09 17:18 ` patchwork-bot+f2fs
2025-12-09 17:18   ` patchwork-bot+f2fs--- via Linux-f2fs-devel
2025-12-16  0:50 ` patchwork-bot+f2fs
2025-12-16  0:50   ` patchwork-bot+f2fs--- via Linux-f2fs-devel
2026-02-17 21:14 ` patchwork-bot+f2fs
2026-02-17 21:14   ` patchwork-bot+f2fs--- via Linux-f2fs-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a1387fd-4952-42e0-b7a9-e614f7b20325@gmail.com \
    --to=yangyongpeng.storage@gmail.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=cem@kernel.org \
    --cc=chaitanyak@nvidia.com \
    --cc=chao@kernel.org \
    --cc=ckulkarnilinux@gmail.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=jaegeuk@kernel.org \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=sagi@grimberg.me \
    --cc=snitzer@kernel.org \
    --cc=song@kernel.org \
    --cc=yangyongpeng@xiaomi.com \
    --cc=yukuai@fnnas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.