* [PATCH 0/2] btrfs: fix extent state leaks after device replace
@ 2023-04-26 17:12 fdmanana
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: fdmanana @ 2023-04-26 17:12 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
This fixes a recent regression (on its way to Linus' tree) that results in
leaking extent state records in the allocation io tree of a device used as
a source device for a device replace. Also unexport btrfs_free_device().
Filipe Manana (2):
btrfs: fix leak of source device allocation state after device replace
btrfs: make btrfs_free_device() static
fs/btrfs/volumes.c | 3 ++-
fs/btrfs/volumes.h | 1 -
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.35.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] btrfs: fix leak of source device allocation state after device replace
2023-04-26 17:12 [PATCH 0/2] btrfs: fix extent state leaks after device replace fdmanana
@ 2023-04-26 17:13 ` fdmanana
2023-04-27 3:15 ` Qu Wenruo
2023-04-27 15:16 ` Anand Jain
2023-04-26 17:13 ` [PATCH 2/2] btrfs: make btrfs_free_device() static fdmanana
2023-04-27 22:52 ` [PATCH 0/2] btrfs: fix extent state leaks after device replace David Sterba
2 siblings, 2 replies; 12+ messages in thread
From: fdmanana @ 2023-04-26 17:13 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When a device replace finishes, the source device is freed by calling
btrfs_free_device() at btrfs_rm_dev_replace_free_srcdev(), but the
allocation state, tracked in the device's alloc_state io tree, is never
freed.
This is a regression recently introduced by commit f0bb5474cff0 ("btrfs:
remove redundant release of btrfs_device::alloc_state"), which removed a
call to extent_io_tree_release() from btrfs_free_device(), with the
rationale that btrfs_close_one_device() already releases the allocation
state from a device and btrfs_close_one_device() is always called before
a device is freed with btrfs_free_device(). However that is not true for
the device replace case, as btrfs_free_device() is called without any
previous call to btrfs_close_one_device().
The issue is trivial to reproduce, for example, by running test btrfs/027
from fstests:
$ ./check btrfs/027
$ rmmod btrfs
$ dmesg
(...)
[84519.395485] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg started
[84519.466224] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg finished
[84519.552251] BTRFS info (device sdc): scrub: started on devid 1
[84519.552277] BTRFS info (device sdc): scrub: started on devid 2
[84519.552332] BTRFS info (device sdc): scrub: started on devid 3
[84519.552705] BTRFS info (device sdc): scrub: started on devid 4
[84519.604261] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
[84519.609374] BTRFS info (device sdc): scrub: finished on devid 3 with status: 0
[84519.610818] BTRFS info (device sdc): scrub: finished on devid 1 with status: 0
[84519.610927] BTRFS info (device sdc): scrub: finished on devid 2 with status: 0
[84559.503795] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
[84559.506764] BTRFS: state leak: start 1048576 end 1347420159 state 1 in tree 1 refs 1
[84559.510294] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
So fix this by adding back the call to extent_io_tree_release() at
btrfs_free_device().
Fixes: f0bb5474cff0 ("btrfs: remove redundant release of btrfs_device::alloc_state")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/volumes.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 03f52e4a20aa..841e799dece5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -395,6 +395,7 @@ void btrfs_free_device(struct btrfs_device *device)
{
WARN_ON(!list_empty(&device->post_commit_list));
rcu_string_free(device->name);
+ extent_io_tree_release(&device->alloc_state);
btrfs_destroy_dev_zone_info(device);
kfree(device);
}
--
2.35.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] btrfs: make btrfs_free_device() static
2023-04-26 17:12 [PATCH 0/2] btrfs: fix extent state leaks after device replace fdmanana
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
@ 2023-04-26 17:13 ` fdmanana
2023-04-27 11:31 ` Qu Wenruo
2023-04-27 15:17 ` Anand Jain
2023-04-27 22:52 ` [PATCH 0/2] btrfs: fix extent state leaks after device replace David Sterba
2 siblings, 2 replies; 12+ messages in thread
From: fdmanana @ 2023-04-26 17:13 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The function btrfs_free_device() is never used outside of volumes.c, so
make it static and remove its prototype declaration at volumes.h.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/volumes.c | 2 +-
fs/btrfs/volumes.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 841e799dece5..1a7620680f50 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -391,7 +391,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
return fs_devs;
}
-void btrfs_free_device(struct btrfs_device *device)
+static void btrfs_free_device(struct btrfs_device *device)
{
WARN_ON(!list_empty(&device->post_commit_list));
rcu_string_free(device->name);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bf47a1a70813..5cbbee32748c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -617,7 +617,6 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
const u64 *devid, const u8 *uuid,
const char *path);
void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
-void btrfs_free_device(struct btrfs_device *device);
int btrfs_rm_device(struct btrfs_fs_info *fs_info,
struct btrfs_dev_lookup_args *args,
struct block_device **bdev, fmode_t *mode);
--
2.35.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: fix leak of source device allocation state after device replace
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
@ 2023-04-27 3:15 ` Qu Wenruo
2023-04-27 15:16 ` Anand Jain
1 sibling, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2023-04-27 3:15 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 2023/4/27 01:13, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When a device replace finishes, the source device is freed by calling
> btrfs_free_device() at btrfs_rm_dev_replace_free_srcdev(), but the
> allocation state, tracked in the device's alloc_state io tree, is never
> freed.
>
> This is a regression recently introduced by commit f0bb5474cff0 ("btrfs:
> remove redundant release of btrfs_device::alloc_state"), which removed a
> call to extent_io_tree_release() from btrfs_free_device(), with the
> rationale that btrfs_close_one_device() already releases the allocation
> state from a device and btrfs_close_one_device() is always called before
> a device is freed with btrfs_free_device(). However that is not true for
> the device replace case, as btrfs_free_device() is called without any
> previous call to btrfs_close_one_device().
>
> The issue is trivial to reproduce, for example, by running test btrfs/027
> from fstests:
>
> $ ./check btrfs/027
> $ rmmod btrfs
> $ dmesg
> (...)
> [84519.395485] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg started
> [84519.466224] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg finished
> [84519.552251] BTRFS info (device sdc): scrub: started on devid 1
> [84519.552277] BTRFS info (device sdc): scrub: started on devid 2
> [84519.552332] BTRFS info (device sdc): scrub: started on devid 3
> [84519.552705] BTRFS info (device sdc): scrub: started on devid 4
> [84519.604261] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
> [84519.609374] BTRFS info (device sdc): scrub: finished on devid 3 with status: 0
> [84519.610818] BTRFS info (device sdc): scrub: finished on devid 1 with status: 0
> [84519.610927] BTRFS info (device sdc): scrub: finished on devid 2 with status: 0
> [84559.503795] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
> [84559.506764] BTRFS: state leak: start 1048576 end 1347420159 state 1 in tree 1 refs 1
> [84559.510294] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
>
> So fix this by adding back the call to extent_io_tree_release() at
> btrfs_free_device().
>
> Fixes: f0bb5474cff0 ("btrfs: remove redundant release of btrfs_device::alloc_state")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> fs/btrfs/volumes.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 03f52e4a20aa..841e799dece5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -395,6 +395,7 @@ void btrfs_free_device(struct btrfs_device *device)
> {
> WARN_ON(!list_empty(&device->post_commit_list));
> rcu_string_free(device->name);
> + extent_io_tree_release(&device->alloc_state);
> btrfs_destroy_dev_zone_info(device);
> kfree(device);
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: make btrfs_free_device() static
2023-04-26 17:13 ` [PATCH 2/2] btrfs: make btrfs_free_device() static fdmanana
@ 2023-04-27 11:31 ` Qu Wenruo
2023-04-27 15:17 ` Anand Jain
1 sibling, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2023-04-27 11:31 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 2023/4/27 01:13, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> The function btrfs_free_device() is never used outside of volumes.c, so
> make it static and remove its prototype declaration at volumes.h.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> fs/btrfs/volumes.c | 2 +-
> fs/btrfs/volumes.h | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 841e799dece5..1a7620680f50 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -391,7 +391,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
> return fs_devs;
> }
>
> -void btrfs_free_device(struct btrfs_device *device)
> +static void btrfs_free_device(struct btrfs_device *device)
> {
> WARN_ON(!list_empty(&device->post_commit_list));
> rcu_string_free(device->name);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index bf47a1a70813..5cbbee32748c 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -617,7 +617,6 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
> const u64 *devid, const u8 *uuid,
> const char *path);
> void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
> -void btrfs_free_device(struct btrfs_device *device);
> int btrfs_rm_device(struct btrfs_fs_info *fs_info,
> struct btrfs_dev_lookup_args *args,
> struct block_device **bdev, fmode_t *mode);
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: fix leak of source device allocation state after device replace
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
2023-04-27 3:15 ` Qu Wenruo
@ 2023-04-27 15:16 ` Anand Jain
2023-04-27 16:34 ` Filipe Manana
1 sibling, 1 reply; 12+ messages in thread
From: Anand Jain @ 2023-04-27 15:16 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 27/04/2023 01:13, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When a device replace finishes, the source device is freed by calling
> btrfs_free_device() at btrfs_rm_dev_replace_free_srcdev(), but the
> allocation state, tracked in the device's alloc_state io tree, is never
> freed.
>
> This is a regression recently introduced by commit f0bb5474cff0 ("btrfs:
> remove redundant release of btrfs_device::alloc_state"), which removed a
> call to extent_io_tree_release() from btrfs_free_device(), with the
> rationale that btrfs_close_one_device() already releases the allocation
> state from a device and btrfs_close_one_device() is always called before
> a device is freed with btrfs_free_device(). However that is not true for
> the device replace case, as btrfs_free_device() is called without any
> previous call to btrfs_close_one_device().
>
> The issue is trivial to reproduce, for example, by running test btrfs/027
> from fstests:
>
> $ ./check btrfs/027
> $ rmmod btrfs
Ah, module reload is a useful way to verify. I have now enabled
it in the fstests by setting TEST_FS_MODULE_RELOAD=1, and I am
able to reproduce the issue.
> $ dmesg
> (...)
> [84519.395485] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg started
> [84519.466224] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg finished
> [84519.552251] BTRFS info (device sdc): scrub: started on devid 1
> [84519.552277] BTRFS info (device sdc): scrub: started on devid 2
> [84519.552332] BTRFS info (device sdc): scrub: started on devid 3
> [84519.552705] BTRFS info (device sdc): scrub: started on devid 4
> [84519.604261] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
> [84519.609374] BTRFS info (device sdc): scrub: finished on devid 3 with status: 0
> [84519.610818] BTRFS info (device sdc): scrub: finished on devid 1 with status: 0
> [84519.610927] BTRFS info (device sdc): scrub: finished on devid 2 with status: 0
> [84559.503795] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
> [84559.506764] BTRFS: state leak: start 1048576 end 1347420159 state 1 in tree 1 refs 1
> [84559.510294] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
>
> So fix this by adding back the call to extent_io_tree_release() at
> btrfs_free_device().
>
> Fixes: f0bb5474cff0 ("btrfs: remove redundant release of btrfs_device::alloc_state")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/volumes.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 03f52e4a20aa..841e799dece5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -395,6 +395,7 @@ void btrfs_free_device(struct btrfs_device *device)
> {
> WARN_ON(!list_empty(&device->post_commit_list));
> rcu_string_free(device->name);
> + extent_io_tree_release(&device->alloc_state);
> btrfs_destroy_dev_zone_info(device);
> kfree(device);
> }
Hmm. Is this fix incomplete? It does not address the concern raised in
f0bb5474cff0. Why don't we also do this...
-----
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 40ef429d10a5..e8c26856426e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1133,7 +1133,6 @@ static void btrfs_close_one_device(struct
btrfs_device *device)
device->fs_info = NULL;
atomic_set(&device->dev_stats_ccnt, 0);
- extent_io_tree_release(&device->alloc_state);
/*
* Reset the flush error record. We might have a transient
flush error
-----
Thanks, Anand
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: make btrfs_free_device() static
2023-04-26 17:13 ` [PATCH 2/2] btrfs: make btrfs_free_device() static fdmanana
2023-04-27 11:31 ` Qu Wenruo
@ 2023-04-27 15:17 ` Anand Jain
1 sibling, 0 replies; 12+ messages in thread
From: Anand Jain @ 2023-04-27 15:17 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 27/04/2023 01:13, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> The function btrfs_free_device() is never used outside of volumes.c, so
> make it static and remove its prototype declaration at volumes.h.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: fix leak of source device allocation state after device replace
2023-04-27 15:16 ` Anand Jain
@ 2023-04-27 16:34 ` Filipe Manana
2023-04-27 23:39 ` Anand Jain
0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2023-04-27 16:34 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Thu, Apr 27, 2023 at 4:16 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> On 27/04/2023 01:13, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When a device replace finishes, the source device is freed by calling
> > btrfs_free_device() at btrfs_rm_dev_replace_free_srcdev(), but the
> > allocation state, tracked in the device's alloc_state io tree, is never
> > freed.
> >
> > This is a regression recently introduced by commit f0bb5474cff0 ("btrfs:
> > remove redundant release of btrfs_device::alloc_state"), which removed a
> > call to extent_io_tree_release() from btrfs_free_device(), with the
> > rationale that btrfs_close_one_device() already releases the allocation
> > state from a device and btrfs_close_one_device() is always called before
> > a device is freed with btrfs_free_device(). However that is not true for
> > the device replace case, as btrfs_free_device() is called without any
> > previous call to btrfs_close_one_device().
> >
> > The issue is trivial to reproduce, for example, by running test btrfs/027
> > from fstests:
> >
> > $ ./check btrfs/027
> > $ rmmod btrfs
>
> Ah, module reload is a useful way to verify. I have now enabled
> it in the fstests by setting TEST_FS_MODULE_RELOAD=1, and I am
> able to reproduce the issue.
>
> > $ dmesg
> > (...)
> > [84519.395485] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg started
> > [84519.466224] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg finished
> > [84519.552251] BTRFS info (device sdc): scrub: started on devid 1
> > [84519.552277] BTRFS info (device sdc): scrub: started on devid 2
> > [84519.552332] BTRFS info (device sdc): scrub: started on devid 3
> > [84519.552705] BTRFS info (device sdc): scrub: started on devid 4
> > [84519.604261] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
> > [84519.609374] BTRFS info (device sdc): scrub: finished on devid 3 with status: 0
> > [84519.610818] BTRFS info (device sdc): scrub: finished on devid 1 with status: 0
> > [84519.610927] BTRFS info (device sdc): scrub: finished on devid 2 with status: 0
> > [84559.503795] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
> > [84559.506764] BTRFS: state leak: start 1048576 end 1347420159 state 1 in tree 1 refs 1
> > [84559.510294] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
> >
> > So fix this by adding back the call to extent_io_tree_release() at
> > btrfs_free_device().
> >
> > Fixes: f0bb5474cff0 ("btrfs: remove redundant release of btrfs_device::alloc_state")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/volumes.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 03f52e4a20aa..841e799dece5 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -395,6 +395,7 @@ void btrfs_free_device(struct btrfs_device *device)
> > {
> > WARN_ON(!list_empty(&device->post_commit_list));
> > rcu_string_free(device->name);
> > + extent_io_tree_release(&device->alloc_state);
> > btrfs_destroy_dev_zone_info(device);
> > kfree(device);
> > }
>
>
> Hmm. Is this fix incomplete? It does not address the concern raised in
> f0bb5474cff0. Why don't we also do this...
What's the concern exactly?
One extra call to extent_io_tree_release() that's actually needed?
>
> -----
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 40ef429d10a5..e8c26856426e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1133,7 +1133,6 @@ static void btrfs_close_one_device(struct
> btrfs_device *device)
>
> device->fs_info = NULL;
> atomic_set(&device->dev_stats_ccnt, 0);
> - extent_io_tree_release(&device->alloc_state);
This will change semantics a bit.
Removing this, will make us hold on to more memory for a longer period
when fs_devices->opened > 1, so btrfs_free_device() /
free_fs_devices() will be called potentially much later.
I'd rather keep the behaviour we had unless there's a stronger reason
other than removing 1 line of code.
Thanks.
>
> /*
> * Reset the flush error record. We might have a transient
> flush error
> -----
>
> Thanks, Anand
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] btrfs: fix extent state leaks after device replace
2023-04-26 17:12 [PATCH 0/2] btrfs: fix extent state leaks after device replace fdmanana
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
2023-04-26 17:13 ` [PATCH 2/2] btrfs: make btrfs_free_device() static fdmanana
@ 2023-04-27 22:52 ` David Sterba
2023-04-27 23:51 ` Anand Jain
2 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2023-04-27 22:52 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Wed, Apr 26, 2023 at 06:12:59PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> This fixes a recent regression (on its way to Linus' tree) that results in
> leaking extent state records in the allocation io tree of a device used as
> a source device for a device replace. Also unexport btrfs_free_device().
>
> Filipe Manana (2):
> btrfs: fix leak of source device allocation state after device replace
> btrfs: make btrfs_free_device() static
Added to misc-next, thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: fix leak of source device allocation state after device replace
2023-04-27 16:34 ` Filipe Manana
@ 2023-04-27 23:39 ` Anand Jain
0 siblings, 0 replies; 12+ messages in thread
From: Anand Jain @ 2023-04-27 23:39 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On 28/4/23 00:34, Filipe Manana wrote:
> On Thu, Apr 27, 2023 at 4:16 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> On 27/04/2023 01:13, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When a device replace finishes, the source device is freed by calling
>>> btrfs_free_device() at btrfs_rm_dev_replace_free_srcdev(), but the
>>> allocation state, tracked in the device's alloc_state io tree, is never
>>> freed.
>>>
>>> This is a regression recently introduced by commit f0bb5474cff0 ("btrfs:
>>> remove redundant release of btrfs_device::alloc_state"), which removed a
>>> call to extent_io_tree_release() from btrfs_free_device(), with the
>>> rationale that btrfs_close_one_device() already releases the allocation
>>> state from a device and btrfs_close_one_device() is always called before
>>> a device is freed with btrfs_free_device(). However that is not true for
>>> the device replace case, as btrfs_free_device() is called without any
>>> previous call to btrfs_close_one_device().
>>>
>>> The issue is trivial to reproduce, for example, by running test btrfs/027
>>> from fstests:
>>>
>>> $ ./check btrfs/027
>>> $ rmmod btrfs
>>
>> Ah, module reload is a useful way to verify. I have now enabled
>> it in the fstests by setting TEST_FS_MODULE_RELOAD=1, and I am
>> able to reproduce the issue.
>>
>>> $ dmesg
>>> (...)
>>> [84519.395485] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg started
>>> [84519.466224] BTRFS info (device sdc): dev_replace from <missing disk> (devid 2) to /dev/sdg finished
>>> [84519.552251] BTRFS info (device sdc): scrub: started on devid 1
>>> [84519.552277] BTRFS info (device sdc): scrub: started on devid 2
>>> [84519.552332] BTRFS info (device sdc): scrub: started on devid 3
>>> [84519.552705] BTRFS info (device sdc): scrub: started on devid 4
>>> [84519.604261] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
>>> [84519.609374] BTRFS info (device sdc): scrub: finished on devid 3 with status: 0
>>> [84519.610818] BTRFS info (device sdc): scrub: finished on devid 1 with status: 0
>>> [84519.610927] BTRFS info (device sdc): scrub: finished on devid 2 with status: 0
>>> [84559.503795] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
>>> [84559.506764] BTRFS: state leak: start 1048576 end 1347420159 state 1 in tree 1 refs 1
>>> [84559.510294] BTRFS: state leak: start 1048576 end 1351614463 state 1 in tree 1 refs 1
>>>
>>> So fix this by adding back the call to extent_io_tree_release() at
>>> btrfs_free_device().
>>>
>>> Fixes: f0bb5474cff0 ("btrfs: remove redundant release of btrfs_device::alloc_state")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>> fs/btrfs/volumes.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 03f52e4a20aa..841e799dece5 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -395,6 +395,7 @@ void btrfs_free_device(struct btrfs_device *device)
>>> {
>>> WARN_ON(!list_empty(&device->post_commit_list));
>>> rcu_string_free(device->name);
>>> + extent_io_tree_release(&device->alloc_state);
>>> btrfs_destroy_dev_zone_info(device);
>>> kfree(device);
>>> }
>>
>>
>> Hmm. Is this fix incomplete? It does not address the concern raised in
>> f0bb5474cff0. Why don't we also do this...
>
> What's the concern exactly?
> One extra call to extent_io_tree_release() that's actually needed?
I'm looking for a solution that meets the need without leading to a
situation where extent_io_tree_release() is invoked twice elsewhere.
>
>>
>> -----
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 40ef429d10a5..e8c26856426e 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1133,7 +1133,6 @@ static void btrfs_close_one_device(struct
>> btrfs_device *device)
>>
>> device->fs_info = NULL;
>> atomic_set(&device->dev_stats_ccnt, 0);
>> - extent_io_tree_release(&device->alloc_state);
>
> This will change semantics a bit.
> Removing this, will make us hold on to more memory for a longer period
> when fs_devices->opened > 1, so btrfs_free_device() /
> free_fs_devices() will be called potentially much later.
>
> I'd rather keep the behaviour we had unless there's a stronger reason
> other than removing 1 line of code.
I'm trying to simplify by merging steps in either
btrfs_rm_dev_replace_remove_srcdev() or
btrfs_rm_dev_replace_free_srcdev() with btrfs_close_one_device().
This could beautifully address the issue. Let's see if it works.
Thanks, Anand
>
> Thanks.
>
>
>
>>
>> /*
>> * Reset the flush error record. We might have a transient
>> flush error
>> -----
>>
>> Thanks, Anand
>>
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] btrfs: fix extent state leaks after device replace
2023-04-27 22:52 ` [PATCH 0/2] btrfs: fix extent state leaks after device replace David Sterba
@ 2023-04-27 23:51 ` Anand Jain
2023-04-28 13:07 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: Anand Jain @ 2023-04-27 23:51 UTC (permalink / raw)
To: dsterba, fdmanana; +Cc: linux-btrfs
On 28/4/23 06:52, David Sterba wrote:
> On Wed, Apr 26, 2023 at 06:12:59PM +0100, fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> This fixes a recent regression (on its way to Linus' tree) that results in
>> leaking extent state records in the allocation io tree of a device used as
>> a source device for a device replace. Also unexport btrfs_free_device().
>>
>> Filipe Manana (2):
>> btrfs: fix leak of source device allocation state after device replace
>> btrfs: make btrfs_free_device() static
>
> Added to misc-next, thanks.
Oh, I hope you saw the discussions in the patch 1/2.
We can address both calling extent_io_tree_release() twice and the
leak after device replace. Or no need of it?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] btrfs: fix extent state leaks after device replace
2023-04-27 23:51 ` Anand Jain
@ 2023-04-28 13:07 ` David Sterba
0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2023-04-28 13:07 UTC (permalink / raw)
To: Anand Jain; +Cc: fdmanana, linux-btrfs
On Fri, Apr 28, 2023 at 07:51:42AM +0800, Anand Jain wrote:
> On 28/4/23 06:52, David Sterba wrote:
> > On Wed, Apr 26, 2023 at 06:12:59PM +0100, fdmanana@kernel.org wrote:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> This fixes a recent regression (on its way to Linus' tree) that results in
> >> leaking extent state records in the allocation io tree of a device used as
> >> a source device for a device replace. Also unexport btrfs_free_device().
> >>
> >> Filipe Manana (2):
> >> btrfs: fix leak of source device allocation state after device replace
> >> btrfs: make btrfs_free_device() static
> >
>
>
> > Added to misc-next, thanks.
>
> Oh, I hope you saw the discussions in the patch 1/2.
> We can address both calling extent_io_tree_release() twice and the
> leak after device replace. Or no need of it?
Yes I saw the discussion, I'd rather revert back to the known behaviour,
this does not seem to be worth the optimization, apparently it's easy to
miss some cases.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-04-28 13:13 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-26 17:12 [PATCH 0/2] btrfs: fix extent state leaks after device replace fdmanana
2023-04-26 17:13 ` [PATCH 1/2] btrfs: fix leak of source device allocation state " fdmanana
2023-04-27 3:15 ` Qu Wenruo
2023-04-27 15:16 ` Anand Jain
2023-04-27 16:34 ` Filipe Manana
2023-04-27 23:39 ` Anand Jain
2023-04-26 17:13 ` [PATCH 2/2] btrfs: make btrfs_free_device() static fdmanana
2023-04-27 11:31 ` Qu Wenruo
2023-04-27 15:17 ` Anand Jain
2023-04-27 22:52 ` [PATCH 0/2] btrfs: fix extent state leaks after device replace David Sterba
2023-04-27 23:51 ` Anand Jain
2023-04-28 13:07 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox