* [PATCH 1/4] btrfs-progs: add warning for -s option of btrfs-image
2024-07-08 5:14 [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
@ 2024-07-08 5:14 ` Qu Wenruo
2024-07-08 5:14 ` [PATCH 2/4] btrfs-progs: image: fix the bug that filename sanitization not working Qu Wenruo
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-08 5:14 UTC (permalink / raw)
To: linux-btrfs
The filename sanitization is not recommended as it introduces mismatches
between DIR_ITEM and INODE_REF.
Even hash confliction mode (double "-s" option) is not ensured to always
find a conflict, and when failed to find a conflict, a mismatch would
happen.
And when a mismatch happens, the kernel will not resolve the path
correctly since kernel uses the hash from DIR_ITEM to lookup the child
inode.
So add a warning into the "-s" option of btrfs-image.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Documentation/btrfs-image.rst | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/Documentation/btrfs-image.rst b/Documentation/btrfs-image.rst
index a63b0da273c5..febbf3877c82 100644
--- a/Documentation/btrfs-image.rst
+++ b/Documentation/btrfs-image.rst
@@ -37,13 +37,16 @@ OPTIONS
file system will not be able to be mounted.
-s
- Sanitize the file names when generating the image. One -s means just
- generate random garbage, which means that the directory indexes won't match up
- since the hashes won't match with the garbage filenames. Using *-s* will
- calculate a collision for the filename so that the hashes match, and if it
- can't calculate a collision then it will just generate garbage. The collision
- calculator is very time and CPU intensive so only use it if you are having
- problems with your file system tree and need to have it mostly working.
+ Sanitize the file names when generating the image.
+ Not recommended as this would introduce new hash mismatch, thus if your
+ problem involves subvolume tress, it can even mask the existing problems.
+
+ One *-s* means just generate random garbage, which means that the
+ directory hash won't match up its filenames.
+ Using two *-s* will calculate a collision for the filename so that the
+ hashes match, and if it can't calculate a collision then it will just
+ generate garbage. The collision calculator is very time and CPU
+ intensive.
-w
Walk all the trees manually and copy any blocks that are referenced. Use this
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 2/4] btrfs-progs: image: fix the bug that filename sanitization not working
2024-07-08 5:14 [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
2024-07-08 5:14 ` [PATCH 1/4] btrfs-progs: add warning for -s option " Qu Wenruo
@ 2024-07-08 5:14 ` Qu Wenruo
2024-07-08 5:14 ` [PATCH 3/4] btrfs-progs: fix rand_range() Qu Wenruo
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-08 5:14 UTC (permalink / raw)
To: linux-btrfs; +Cc: Andrea Gelmini
[BUG]
There is a bug report that image dump taken by "btrfs-image -s" doesn't
really sanitize the filenames:
# truncates -s 1G source.raw
# mkfs.btrfs -f source.raw
# mount source.raw $mnt
# touch $mnt/top_secret_filename
# touch $mnt/secret_filename
# umount $mnt
# btrfs-image -s source.raw dump.img
# string dump.img | grep filename
top_secret_filename
secret_filename
top_secret_filename
secret_filename
top_secret_filename
[CAUSE]
Using above image to store the fs, and we got the following result in fs
tree:
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
generation 3 transid 7 size 68 nbytes 16384
block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
sequence 2 flags 0x0(none)
item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
index 0 namelen 2 name: ..
item 2 key (256 DIR_ITEM 439756795) itemoff 16062 itemsize 49
location key (257 INODE_ITEM 0) type FILE
transid 7 data_len 0 name_len 19
name: top_secret_filename
item 3 key (256 DIR_ITEM 693462946) itemoff 16017 itemsize 45
location key (258 INODE_ITEM 0) type FILE
transid 7 data_len 0 name_len 15
name: secret_filename
item 4 key (256 DIR_INDEX 2) itemoff 15968 itemsize 49
location key (257 INODE_ITEM 0) type FILE
transid 7 data_len 0 name_len 19
name: top_secret_filename
item 5 key (256 DIR_INDEX 3) itemoff 15923 itemsize 45
location key (258 INODE_ITEM 0) type FILE
transid 7 data_len 0 name_len 15
name: secret_filename
item 6 key (257 INODE_ITEM 0) itemoff 15763 itemsize 160
generation 7 transid 7 size 0 nbytes 0
block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 7 key (257 INODE_REF 256) itemoff 15734 itemsize 29
index 2 namelen 19 name: top_secret_filename
item 8 key (258 INODE_ITEM 0) itemoff 15574 itemsize 160
generation 7 transid 7 size 0 nbytes 0
block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 9 key (258 INODE_REF 256) itemoff 15549 itemsize 25
index 3 namelen 15 name: 1���'�gc*&R
The result shows, only the last INODE_REF got sanitized, all the
remaining is not touched at all.
This is cauesd by how we sanitize the filenames:
copy_buffer()
|- memcpy(dst, src->data, src->len);
| This means we copy the whole eb into our buffer already.
|
|- zero_items()
|- sanitize_name()
|- eb = alloc_dummy_eb();
|- memcpy(eb->data, src->data, src->len);
| This means we generate a dummy eb with the same contents of
| the source eb.
|
|- sanitize_dir_item();
| We override the dir item of the given item (specified by the
| slot number) inside our dummy eb.
|
|- memcpy(dst, eb->data, eb->lem);
The last one copy the dummy eb into our buffer, with only the slot
corrupted.
But when the whole work flow hits the next slot, we only corrupt the
next slot, but still copy the whole dummy eb back to buffer.
This means the previous slot would be overwritten by the old unsanitized
data.
Resulting only the last slot is corrupted.
[FIX]
Fix the bug by only copying back the corrupted item to the buffer.
So that other slots won't be overwritten by unsanitized data.
Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
image/sanitize.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/image/sanitize.c b/image/sanitize.c
index 084da22e401e..ef4f6e515633 100644
--- a/image/sanitize.c
+++ b/image/sanitize.c
@@ -449,6 +449,8 @@ void sanitize_name(enum sanitize_mode sanitize, struct rb_root *name_tree,
int slot)
{
struct extent_buffer *eb;
+ u32 item_ptr_off = btrfs_item_ptr_offset(src, slot);
+ u32 item_ptr_size = btrfs_item_size(src, slot);
eb = alloc_dummy_eb(src->start, src->len);
if (!eb) {
@@ -476,7 +478,11 @@ void sanitize_name(enum sanitize_mode sanitize, struct rb_root *name_tree,
break;
}
- memcpy(dst, eb->data, eb->len);
+ /*
+ * Only overwrite the sanitized part, or we can overwrite previous
+ * sanitized value with the old values from @src.
+ */
+ memcpy(dst + item_ptr_off, eb->data + item_ptr_off, item_ptr_size);
free(eb);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 3/4] btrfs-progs: fix rand_range()
2024-07-08 5:14 [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
2024-07-08 5:14 ` [PATCH 1/4] btrfs-progs: add warning for -s option " Qu Wenruo
2024-07-08 5:14 ` [PATCH 2/4] btrfs-progs: image: fix the bug that filename sanitization not working Qu Wenruo
@ 2024-07-08 5:14 ` Qu Wenruo
2024-07-08 5:14 ` [PATCH 4/4] btrfs-progs: misc-tests: add a test case for filename sanitization Qu Wenruo
2024-07-14 22:09 ` [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
4 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-08 5:14 UTC (permalink / raw)
To: linux-btrfs
[BUG]
Btrfs-image with "-s" option is not properly genearting the desired
range [33, 126], thus even with the filename sanitization fixed, it's
still generate filenames beyond ASCII ranges:
item 9 key (258 INODE_REF 256) itemoff 15549 itemsize 25
index 3 namelen 15 name: 1���'�gc*&R
[CAUSE]
It's the function rand_range() return value larger than the specified
@upper.
The cause is in the timing when we trim the value down to 32 bits:
return (unsigned int)(jrand48(rand_seed) % upper);
Unlike the name, jrand48() generate uniformly distrbuted value between
[-2^31, 2^31 - 1], which is the full range of s32.
And the result of a modulus operation with a minus input is still minus.
Thus even if we later convert it to unsigned int, the minus value is
much larger than @upper and caused the problem.
[FIX]
Convert the value to unsigned int first, then do the modulus operation.
Furthermore to prevent the problem from happening again, add a new
UASSERT() to make sure the result is indeed smaller than @upper.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
common/utils.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/common/utils.c b/common/utils.c
index 3ca7cff396fe..8f6556b345e8 100644
--- a/common/utils.c
+++ b/common/utils.c
@@ -922,12 +922,12 @@ u32 rand_u32(void)
/* Return random number in range [0, upper) */
unsigned int rand_range(unsigned int upper)
{
+ int ret;
+
__init_seed();
- /*
- * Use the full 48bits to mod, which would be more uniformly
- * distributed
- */
- return (unsigned int)(jrand48(rand_seed) % upper);
+ ret = (unsigned int)(jrand48(rand_seed)) % upper;
+ UASSERT(ret < upper);
+ return ret;
}
int rand_int(void)
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 4/4] btrfs-progs: misc-tests: add a test case for filename sanitization
2024-07-08 5:14 [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
` (2 preceding siblings ...)
2024-07-08 5:14 ` [PATCH 3/4] btrfs-progs: fix rand_range() Qu Wenruo
@ 2024-07-08 5:14 ` Qu Wenruo
2024-07-14 22:09 ` [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
4 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-08 5:14 UTC (permalink / raw)
To: linux-btrfs
This test case checks:
- If a regular btrfs-image dump has the unsanitized filenames
- If a sanitized btrfs-image dump has filenames properly censored
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tests/misc-tests/065-image-filename/test.sh | 33 +++++++++++++++++++++
1 file changed, 33 insertions(+)
create mode 100755 tests/misc-tests/065-image-filename/test.sh
diff --git a/tests/misc-tests/065-image-filename/test.sh b/tests/misc-tests/065-image-filename/test.sh
new file mode 100755
index 000000000000..a9567ad672ef
--- /dev/null
+++ b/tests/misc-tests/065-image-filename/test.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+# Verify "btrfs-image -s" sanitize the filenames correctly
+
+source "$TEST_TOP/common" || exit
+source "$TEST_TOP/common.convert" || exit
+
+setup_root_helper
+prepare_test_dev
+
+tmp=$(mktemp --tmpdir btrfs-progs-image-filename.XXXXXX)
+
+run_check_mkfs_test_dev
+run_check_mount_test_dev
+run_check $SUDO_HELPER touch "$TEST_MNT/top_secret_filename"
+run_check $SUDO_HELPER touch "$TEST_MNT/secret_filename"
+run_check $SUDO_HELPER touch "$TEST_MNT/confidential_filename"
+run_check_umount_test_dev
+
+run_check "$TOP/btrfs-image" "$TEST_DEV" "$tmp"
+echo "strings found inside the regular dump:" >> "$RESULTS"
+strings "$tmp" >> "$RESULTS"
+if ! strings "$tmp" | grep -q _filename "$tmp"; then
+ rm -f -- "$tmp"
+ _fail "regular dump sanitized the filenames"
+fi
+run_check "$TOP/btrfs-image" -s "$TEST_DEV" "$tmp"
+echo "strings found inside the sanitized dump:" >> "$RESULTS"
+strings "$tmp" >> "$RESULTS"
+if strings "$tmp" | grep -q _filename "$tmp"; then
+ rm -f -- "$tmp"
+ _fail "filenames not properly sanitized"
+fi
+rm -f -- "$tmp"
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image
2024-07-08 5:14 [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
` (3 preceding siblings ...)
2024-07-08 5:14 ` [PATCH 4/4] btrfs-progs: misc-tests: add a test case for filename sanitization Qu Wenruo
@ 2024-07-14 22:09 ` Qu Wenruo
2024-07-18 15:47 ` David Sterba
4 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-07-14 22:09 UTC (permalink / raw)
To: linux-btrfs, David Sterba
Ping?
Any update on this?
Thanks,
Qu
在 2024/7/8 14:44, Qu Wenruo 写道:
> There are several bugs in btrfs-image filename sanitization:
>
> - Ensured kernel path resolution bug
> Since during path resolution btrfs uses hash to find the child inode,
> with garbage filled DIR_ITEMs, it's definitely unable to properly
> resolve the path.
>
> A warning is added to the man page by the first patch.
>
> - Only the last item got properly sanitized
> All the remaining INODE_REF/DIR_INDEX/DIR_ITEM are not sanitized at
> all.
>
> This is fixed by the second patch.
>
> - Sanitized filename contains non-ASCII chars
> This is fixed by the third patch.
>
>
> Finally a new test case is introduced to verify the filename
> sanitization behavior of btrfs-image.
>
> Qu Wenruo (4):
> btrfs-progs: add warning for -s option of btrfs-image
> btrfs-progs: image: fix the bug that filename sanitization not working
> btrfs-progs: fix rand_range()
> btrfs-progs: misc-tests: add a test case for filename sanitization
>
> Documentation/btrfs-image.rst | 17 ++++++-----
> common/utils.c | 10 +++----
> image/sanitize.c | 8 ++++-
> tests/misc-tests/065-image-filename/test.sh | 33 +++++++++++++++++++++
> 4 files changed, 55 insertions(+), 13 deletions(-)
> create mode 100755 tests/misc-tests/065-image-filename/test.sh
>
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image
2024-07-14 22:09 ` [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image Qu Wenruo
@ 2024-07-18 15:47 ` David Sterba
2024-07-18 22:38 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2024-07-18 15:47 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, David Sterba
On Mon, Jul 15, 2024 at 07:39:32AM +0930, Qu Wenruo wrote:
> Ping?
>
> Any update on this?
Tracked as https://github.com/kdave/btrfs-progs/pull/837
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image
2024-07-18 15:47 ` David Sterba
@ 2024-07-18 22:38 ` Qu Wenruo
2024-07-18 22:55 ` David Sterba
0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-07-18 22:38 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs, David Sterba
在 2024/7/19 01:17, David Sterba 写道:
> On Mon, Jul 15, 2024 at 07:39:32AM +0930, Qu Wenruo wrote:
>> Ping?
>>
>> Any update on this?
>
> Tracked as https://github.com/kdave/btrfs-progs/pull/837
>
Just to mention, since the github review system is working now, do you
prefer me to still send the updated patches to the ML?
I'm mostly using the ML as a way to track the work...
Thanks,
Qu
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image
2024-07-18 22:38 ` Qu Wenruo
@ 2024-07-18 22:55 ` David Sterba
2024-07-19 0:46 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2024-07-18 22:55 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, David Sterba
On Fri, Jul 19, 2024 at 08:08:30AM +0930, Qu Wenruo wrote:
> 在 2024/7/19 01:17, David Sterba 写道:
> > On Mon, Jul 15, 2024 at 07:39:32AM +0930, Qu Wenruo wrote:
> >> Ping?
> >>
> >> Any update on this?
> >
> > Tracked as https://github.com/kdave/btrfs-progs/pull/837
>
> Just to mention, since the github review system is working now, do you
> prefer me to still send the updated patches to the ML?
> I'm mostly using the ML as a way to track the work...
You don't need to send it to the mailinglist if there's a PR, for
tracking purposes it's ok to send it. It would be good to keep
dicsussion at one place but I think this won't be common or we'll
manage.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/4] btrfs-progs: fix the filename sanitization of btrfs-image
2024-07-18 22:55 ` David Sterba
@ 2024-07-19 0:46 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-19 0:46 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs, David Sterba
在 2024/7/19 08:25, David Sterba 写道:
> On Fri, Jul 19, 2024 at 08:08:30AM +0930, Qu Wenruo wrote:
>> 在 2024/7/19 01:17, David Sterba 写道:
>>> On Mon, Jul 15, 2024 at 07:39:32AM +0930, Qu Wenruo wrote:
>>>> Ping?
>>>>
>>>> Any update on this?
>>>
>>> Tracked as https://github.com/kdave/btrfs-progs/pull/837
>>
>> Just to mention, since the github review system is working now, do you
>> prefer me to still send the updated patches to the ML?
>> I'm mostly using the ML as a way to track the work...
>
> You don't need to send it to the mailinglist if there's a PR, for
> tracking purposes it's ok to send it. It would be good to keep
> dicsussion at one place but I think this won't be common or we'll
> manage.
Got it. Would mention the PR URL in the future cover letters, so
discussion would still happen there.
Thanks,
Qu
^ permalink raw reply [flat|nested] 10+ messages in thread