* [PATCH v2] btrfs: make dropped merge extent_map immutable
@ 2024-10-18 23:34 Boris Burkov
2024-10-18 23:37 ` Qu Wenruo
0 siblings, 1 reply; 2+ messages in thread
From: Boris Burkov @ 2024-10-18 23:34 UTC (permalink / raw)
To: linux-btrfs, kernel-team
In debugging some corrupt squashfs files, we observed symptoms of
corrupt page cache pages but correct on-disk contents. Further
investigation revealed that the exact symptom was a correct page
followed by an incorrect, duplicate, page. This got us thinking about
extent maps.
commit ac05ca913e9f ("Btrfs: fix race between using extent maps and merging them")
enforces a reference count on the primary `em` extent_map being merged,
as that one gets modified.
However, since,
commit 3d2ac9922465 ("btrfs: introduce new members for extent_map")
both 'em' and 'merge' get modified, which started modifying 'merge'
and thus introduced the same race.
We were able to reproduce this by looping the affected squashfs workload
in parallel on a bunch of separate btrfs-es while also dropping caches.
We are still working on a simple enough reproducer to make into an fstest.
The simplest fix is to stop modifying 'merge', which is not essential,
as it is dropped immediately after the merge. This behavior is simply
a consequence of the order of the two extent maps being important in
computing the new values. Modify merge_ondisk_extents to take prev and
next by const* and also take a third merged parameter that it puts the
results in. Note that this introduces the rather odd behavior of passing
'em' to merge_ondisk_extents as a const * and as a regular ptr.
Fixes: 3d2ac9922465 ("btrfs: introduce new members for extent_map")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Boris Burkov <boris@bur.io>
---
Changelog:
v2:
- make 'merge' immutable instead of refcounting it
---
fs/btrfs/extent_map.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index d58d6fc40da1..a8290724475a 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -252,7 +252,8 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
* @prev and @next will be both updated to point to the new merged range.
* Thus one of them should be removed by the caller.
*/
-static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *next)
+static void merge_ondisk_extents(struct extent_map const *prev, struct extent_map const *next,
+ struct extent_map *merged)
{
u64 new_disk_bytenr;
u64 new_disk_num_bytes;
@@ -287,15 +288,10 @@ static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *nex
new_disk_bytenr;
new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;
- prev->disk_bytenr = new_disk_bytenr;
- prev->disk_num_bytes = new_disk_num_bytes;
- prev->ram_bytes = new_disk_num_bytes;
- prev->offset = new_offset;
-
- next->disk_bytenr = new_disk_bytenr;
- next->disk_num_bytes = new_disk_num_bytes;
- next->ram_bytes = new_disk_num_bytes;
- next->offset = new_offset;
+ merged->disk_bytenr = new_disk_bytenr;
+ merged->disk_num_bytes = new_disk_num_bytes;
+ merged->ram_bytes = new_disk_num_bytes;
+ merged->offset = new_offset;
}
static void dump_extent_map(struct btrfs_fs_info *fs_info, const char *prefix,
@@ -363,7 +359,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
em->generation = max(em->generation, merge->generation);
if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
- merge_ondisk_extents(merge, em);
+ merge_ondisk_extents(merge, em, em);
em->flags |= EXTENT_FLAG_MERGED;
validate_extent_map(fs_info, em);
@@ -378,7 +374,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
em->len += merge->len;
if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
- merge_ondisk_extents(em, merge);
+ merge_ondisk_extents(em, merge, em);
validate_extent_map(fs_info, em);
em->generation = max(em->generation, merge->generation);
em->flags |= EXTENT_FLAG_MERGED;
--
2.47.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH v2] btrfs: make dropped merge extent_map immutable
2024-10-18 23:34 [PATCH v2] btrfs: make dropped merge extent_map immutable Boris Burkov
@ 2024-10-18 23:37 ` Qu Wenruo
0 siblings, 0 replies; 2+ messages in thread
From: Qu Wenruo @ 2024-10-18 23:37 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team
在 2024/10/19 10:04, Boris Burkov 写道:
> In debugging some corrupt squashfs files, we observed symptoms of
> corrupt page cache pages but correct on-disk contents. Further
> investigation revealed that the exact symptom was a correct page
> followed by an incorrect, duplicate, page. This got us thinking about
> extent maps.
>
> commit ac05ca913e9f ("Btrfs: fix race between using extent maps and merging them")
> enforces a reference count on the primary `em` extent_map being merged,
> as that one gets modified.
>
> However, since,
> commit 3d2ac9922465 ("btrfs: introduce new members for extent_map")
> both 'em' and 'merge' get modified, which started modifying 'merge'
> and thus introduced the same race.
>
> We were able to reproduce this by looping the affected squashfs workload
> in parallel on a bunch of separate btrfs-es while also dropping caches.
> We are still working on a simple enough reproducer to make into an fstest.
>
> The simplest fix is to stop modifying 'merge', which is not essential,
> as it is dropped immediately after the merge. This behavior is simply
> a consequence of the order of the two extent maps being important in
> computing the new values. Modify merge_ondisk_extents to take prev and
> next by const* and also take a third merged parameter that it puts the
> results in. Note that this introduces the rather odd behavior of passing
> 'em' to merge_ondisk_extents as a const * and as a regular ptr.
>
> Fixes: 3d2ac9922465 ("btrfs: introduce new members for extent_map")
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Just some small nitpicks inlined below, which can be done at merge time.
> ---
> Changelog:
> v2:
> - make 'merge' immutable instead of refcounting it
> ---
> fs/btrfs/extent_map.c | 20 ++++++++------------
> 1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index d58d6fc40da1..a8290724475a 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -252,7 +252,8 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
> * @prev and @next will be both updated to point to the new merged range.
This comment needs to be updated.
Thanks,
Qu
> * Thus one of them should be removed by the caller.
> */
> -static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *next)
> +static void merge_ondisk_extents(struct extent_map const *prev, struct extent_map const *next,
> + struct extent_map *merged)
> {
> u64 new_disk_bytenr;
> u64 new_disk_num_bytes;
> @@ -287,15 +288,10 @@ static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *nex
> new_disk_bytenr;
> new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;
>
> - prev->disk_bytenr = new_disk_bytenr;
> - prev->disk_num_bytes = new_disk_num_bytes;
> - prev->ram_bytes = new_disk_num_bytes;
> - prev->offset = new_offset;
> -
> - next->disk_bytenr = new_disk_bytenr;
> - next->disk_num_bytes = new_disk_num_bytes;
> - next->ram_bytes = new_disk_num_bytes;
> - next->offset = new_offset;
> + merged->disk_bytenr = new_disk_bytenr;
> + merged->disk_num_bytes = new_disk_num_bytes;
> + merged->ram_bytes = new_disk_num_bytes;
> + merged->offset = new_offset;
> }
>
> static void dump_extent_map(struct btrfs_fs_info *fs_info, const char *prefix,
> @@ -363,7 +359,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
> em->generation = max(em->generation, merge->generation);
>
> if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
> - merge_ondisk_extents(merge, em);
> + merge_ondisk_extents(merge, em, em);
> em->flags |= EXTENT_FLAG_MERGED;
>
> validate_extent_map(fs_info, em);
> @@ -378,7 +374,7 @@ static void try_merge_map(struct btrfs_inode *inode, struct extent_map *em)
> if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
> em->len += merge->len;
> if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
> - merge_ondisk_extents(em, merge);
> + merge_ondisk_extents(em, merge, em);
> validate_extent_map(fs_info, em);
> em->generation = max(em->generation, merge->generation);
> em->flags |= EXTENT_FLAG_MERGED;
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-10-18 23:37 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-18 23:34 [PATCH v2] btrfs: make dropped merge extent_map immutable Boris Burkov
2024-10-18 23:37 ` Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).