* [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back
@ 2022-02-07 5:17 Qu Wenruo
2022-02-07 10:35 ` Filipe Manana
0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2022-02-07 5:17 UTC (permalink / raw)
To: linux-btrfs
In defrag_collect_targets() if we hit an extent map which is created by
create_io_em(), it will be considered as target as its generation is
(u64)-1, thus will pass the generation check.
Furthermore since all delalloc functions will clear EXTENT_DELALLOC,
such extent map will also pass the EXTENT_DELALLOC check.
Defragging such extent will make no sense, in fact this will cause extra
IO as we will just re-dirty the range and submit it for writeback again,
causing wasted IO.
Unfortunately this behavior seems to exist in older kernels too (v5.15
and older), but I don't have a solid test case to prove it nor test the
patched behavior.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/ioctl.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 133e3e2e2e79..0ba98e1d9329 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1353,6 +1353,10 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
if (em->generation < ctrl->newer_than)
goto next;
+ /* This em is goging to be written back, no need to defrag */
+ if (em->generation == (u64)-1)
+ goto next;
+
/*
* Our start offset might be in the middle of an existing extent
* map, so take that into account.
--
2.35.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back
2022-02-07 5:17 [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back Qu Wenruo
@ 2022-02-07 10:35 ` Filipe Manana
2022-02-07 11:06 ` Qu Wenruo
0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2022-02-07 10:35 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Feb 07, 2022 at 01:17:15PM +0800, Qu Wenruo wrote:
> In defrag_collect_targets() if we hit an extent map which is created by
> create_io_em(), it will be considered as target as its generation is
> (u64)-1, thus will pass the generation check.
>
> Furthermore since all delalloc functions will clear EXTENT_DELALLOC,
What are delalloc functions?
This should say that once we start writeback (we run delalloc), we allocate
an extent, create an extent map point to that extent, with a generation of
(u64)-1, created the ordered extent and then clear the DELALLOC bit from the
range in the inode's io tree.
> such extent map will also pass the EXTENT_DELALLOC check.
>
> Defragging such extent will make no sense, in fact this will cause extra
> IO as we will just re-dirty the range and submit it for writeback again,
> causing wasted IO.
defrag_prepare_one_page() will wait for the ordered extent to complete,
and after the wait, the extent map's generation is updated from (u64)-1 to
something else.
So the second pass of defrag_collect_targets() will see a generation that
is not (u64)-1.
>
> Unfortunately this behavior seems to exist in older kernels too (v5.15
> and older), but I don't have a solid test case to prove it nor test the
> patched behavior.
This is exactly the first patch I sent François when the first report
of unusable autodefrag popped up:
https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/ioctl.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 133e3e2e2e79..0ba98e1d9329 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1353,6 +1353,10 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> if (em->generation < ctrl->newer_than)
> goto next;
>
> + /* This em is goging to be written back, no need to defrag */
goging -> going
Saying that it is under writeback is more correct.
By saying is "going to be", it gives the idea that writeback may have not started yet,
but an extent map with a generation of (u64)-1 is created when writeback starts.
> + if (em->generation == (u64)-1)
> + goto next;
> +
> /*
> * Our start offset might be in the middle of an existing extent
> * map, so take that into account.
> --
> 2.35.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back
2022-02-07 10:35 ` Filipe Manana
@ 2022-02-07 11:06 ` Qu Wenruo
2022-02-07 12:53 ` Filipe Manana
0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2022-02-07 11:06 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs
On 2022/2/7 18:35, Filipe Manana wrote:
> On Mon, Feb 07, 2022 at 01:17:15PM +0800, Qu Wenruo wrote:
>> In defrag_collect_targets() if we hit an extent map which is created by
>> create_io_em(), it will be considered as target as its generation is
>> (u64)-1, thus will pass the generation check.
>>
>> Furthermore since all delalloc functions will clear EXTENT_DELALLOC,
>
> What are delalloc functions?
> This should say that once we start writeback (we run delalloc), we allocate
> an extent, create an extent map point to that extent, with a generation of
> (u64)-1, created the ordered extent and then clear the DELALLOC bit from the
> range in the inode's io tree.
I mean the functions called inside btrfs_run_dealloc_ranges(), like
cow_file_range() etc.
>
>> such extent map will also pass the EXTENT_DELALLOC check.
>>
>> Defragging such extent will make no sense, in fact this will cause extra
>> IO as we will just re-dirty the range and submit it for writeback again,
>> causing wasted IO.
>
> defrag_prepare_one_page() will wait for the ordered extent to complete,
> and after the wait, the extent map's generation is updated from (u64)-1 to
> something else.
>
> So the second pass of defrag_collect_targets() will see a generation that
> is not (u64)-1.
Yes, for the 2nd loop we will no longer see the (u64)-1 generations.
>
>>
>> Unfortunately this behavior seems to exist in older kernels too (v5.15
>> and older), but I don't have a solid test case to prove it nor test the
>> patched behavior.
>
> This is exactly the first patch I sent François when the first report
> of unusable autodefrag popped up:
>
> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
Oh, mind to send out the proper patch as you're the original author with
this idea?
Thanks,
Qu
>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/ioctl.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 133e3e2e2e79..0ba98e1d9329 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1353,6 +1353,10 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>> if (em->generation < ctrl->newer_than)
>> goto next;
>>
>> + /* This em is goging to be written back, no need to defrag */
>
> goging -> going
>
> Saying that it is under writeback is more correct.
> By saying is "going to be", it gives the idea that writeback may have not started yet,
> but an extent map with a generation of (u64)-1 is created when writeback starts.
>
>
>> + if (em->generation == (u64)-1)
>> + goto next;
>> +
>> /*
>> * Our start offset might be in the middle of an existing extent
>> * map, so take that into account.
>> --
>> 2.35.0
>>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back
2022-02-07 11:06 ` Qu Wenruo
@ 2022-02-07 12:53 ` Filipe Manana
0 siblings, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2022-02-07 12:53 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On Mon, Feb 07, 2022 at 07:06:27PM +0800, Qu Wenruo wrote:
>
>
> On 2022/2/7 18:35, Filipe Manana wrote:
> > On Mon, Feb 07, 2022 at 01:17:15PM +0800, Qu Wenruo wrote:
> > > In defrag_collect_targets() if we hit an extent map which is created by
> > > create_io_em(), it will be considered as target as its generation is
> > > (u64)-1, thus will pass the generation check.
> > >
> > > Furthermore since all delalloc functions will clear EXTENT_DELALLOC,
> >
> > What are delalloc functions?
> > This should say that once we start writeback (we run delalloc), we allocate
> > an extent, create an extent map point to that extent, with a generation of
> > (u64)-1, created the ordered extent and then clear the DELALLOC bit from the
> > range in the inode's io tree.
>
> I mean the functions called inside btrfs_run_dealloc_ranges(), like
> cow_file_range() etc.
>
> >
> > > such extent map will also pass the EXTENT_DELALLOC check.
> > >
> > > Defragging such extent will make no sense, in fact this will cause extra
> > > IO as we will just re-dirty the range and submit it for writeback again,
> > > causing wasted IO.
> >
> > defrag_prepare_one_page() will wait for the ordered extent to complete,
> > and after the wait, the extent map's generation is updated from (u64)-1 to
> > something else.
> >
> > So the second pass of defrag_collect_targets() will see a generation that
> > is not (u64)-1.
>
> Yes, for the 2nd loop we will no longer see the (u64)-1 generations.
>
> >
> > >
> > > Unfortunately this behavior seems to exist in older kernels too (v5.15
> > > and older), but I don't have a solid test case to prove it nor test the
> > > patched behavior.
> >
> > This is exactly the first patch I sent François when the first report
> > of unusable autodefrag popped up:
> >
> > https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@debian9.Home/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>
> Oh, mind to send out the proper patch as you're the original author with
> this idea?
I don't mind if you send your version. That's fine, my comment there about
the infinite loop is not correct, as that was before having a better understanding
of all that changed in the refactoring and before finding out the 1 byte file
case triggering the almost infinite loop.
Just the correction to the change log, comment, and also the subject ("which is going
to be written back" should be "which is under writeback").
Thanks.
>
> Thanks,
> Qu
> >
> > >
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > > fs/btrfs/ioctl.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index 133e3e2e2e79..0ba98e1d9329 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -1353,6 +1353,10 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> > > if (em->generation < ctrl->newer_than)
> > > goto next;
> > >
> > > + /* This em is goging to be written back, no need to defrag */
> >
> > goging -> going
> >
> > Saying that it is under writeback is more correct.
> > By saying is "going to be", it gives the idea that writeback may have not started yet,
> > but an extent map with a generation of (u64)-1 is created when writeback starts.
> >
> >
> > > + if (em->generation == (u64)-1)
> > > + goto next;
> > > +
> > > /*
> > > * Our start offset might be in the middle of an existing extent
> > > * map, so take that into account.
> > > --
> > > 2.35.0
> > >
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-02-07 13:00 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-02-07 5:17 [PATCH] btrfs: defrag: don't try to defrag extent which is going to be written back Qu Wenruo
2022-02-07 10:35 ` Filipe Manana
2022-02-07 11:06 ` Qu Wenruo
2022-02-07 12:53 ` Filipe Manana
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox