* [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
@ 2020-07-31 9:48 Qu Wenruo
2020-07-31 10:05 ` Filipe Manana
2020-07-31 20:52 ` kernel test robot
0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2020-07-31 9:48 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe Manana
[BUG]
The following script can lead to tons of beyond device boundary access:
mkfs.btrfs -f $dev -b 10G
mount $dev $mnt
trimfs $mnt
btrfs filesystem resize 1:-1G $mnt
trimfs $mnt
[CAUSE]
Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
find_first_clear_extent_bit"), we try to avoid trimming ranges that's
already trimmed.
So we check device->alloc_state by finding the first range which doesn't
have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
But if we shrunk the device, that bits are not cleared, thus we could
easily got a range starts beyond the shrunk device size.
This results the returned @start and @end are all beyond device size,
then we call "end = min(end, device->total_bytes -1);" making @end
smaller than device size.
Then finally we goes "len = end - start + 1", totally underflow the
result, and lead to the beyond-device-boundary access.
[FIX]
This patch will fix the problem in two ways:
- Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
This is the root fix
- Add extra safe net when trimming free device extents
We check and warn if the returned range is already beyond current
device.
Link: https://github.com/kdave/btrfs-progs/issues/282
Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
Changelog:
v2:
- Add proper fixes tag
- Add extra warning for beyond device end case
- Add graceful exit for already trimmed case
v3:
- Don't return EUCLEAN for beyond boundary access
- Rephrase the warning message for beyond boundary access
---
fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
fs/btrfs/volumes.c | 12 ++++++++++++
2 files changed, 33 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..7c5e0961c93b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,7 @@
#include "delalloc-space.h"
#include "block-group.h"
#include "discard.h"
+#include "rcu-string.h"
#undef SCRAMBLE_DELAYED_REFS
@@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
&start, &end,
CHUNK_TRIMMED | CHUNK_ALLOCATED);
+ /* CHUNK_* bits not cleared properly */
+ if (start > device->total_bytes) {
+ WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+ btrfs_warn_in_rcu(fs_info,
+"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
+ start, end - start + 1,
+ rcu_str_deref(device->name),
+ device->total_bytes);
+ mutex_unlock(&fs_info->chunk_mutex);
+ ret = 0;
+ break;
+ }
+
+ /* The remaining part has already been trimmed */
+ if (start == device->total_bytes) {
+ mutex_unlock(&fs_info->chunk_mutex);
+ ret = 0;
+ break;
+ }
+
/* Ensure we skip the reserved area in the first 1M */
start = max_t(u64, start, SZ_1M);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7670e2a9f39..4e51ef68ea72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
}
mutex_lock(&fs_info->chunk_mutex);
+ /*
+ * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
+ * current device boundary.
+ * This shouldn't fail, as alloc_state should only utilize those two
+ * bits, thus we shouldn't alloc new memory for clearing the status.
+ *
+ * So here we just do an ASSERT() to catch future behavior change.
+ */
+ ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
+ CHUNK_TRIMMED | CHUNK_ALLOCATED);
+ ASSERT(!ret);
+
btrfs_device_set_disk_total_bytes(device, new_size);
if (list_empty(&device->post_commit_list))
list_add_tail(&device->post_commit_list,
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 9:48 [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary Qu Wenruo
@ 2020-07-31 10:05 ` Filipe Manana
2020-07-31 10:20 ` Qu Wenruo
2020-07-31 20:52 ` kernel test robot
1 sibling, 1 reply; 7+ messages in thread
From: Filipe Manana @ 2020-07-31 10:05 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana
On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> The following script can lead to tons of beyond device boundary access:
>
> mkfs.btrfs -f $dev -b 10G
> mount $dev $mnt
> trimfs $mnt
> btrfs filesystem resize 1:-1G $mnt
> trimfs $mnt
>
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
>
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
>
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
>
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
>
> [FIX]
> This patch will fix the problem in two ways:
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> This is the root fix
>
> - Add extra safe net when trimming free device extents
> We check and warn if the returned range is already beyond current
> device.
>
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
> Changelog:
> v2:
> - Add proper fixes tag
> - Add extra warning for beyond device end case
> - Add graceful exit for already trimmed case
> v3:
> - Don't return EUCLEAN for beyond boundary access
> - Rephrase the warning message for beyond boundary access
> ---
> fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
> fs/btrfs/volumes.c | 12 ++++++++++++
> 2 files changed, 33 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fa7d83051587..7c5e0961c93b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -33,6 +33,7 @@
> #include "delalloc-space.h"
> #include "block-group.h"
> #include "discard.h"
> +#include "rcu-string.h"
>
> #undef SCRAMBLE_DELAYED_REFS
>
> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> &start, &end,
> CHUNK_TRIMMED | CHUNK_ALLOCATED);
>
> + /* CHUNK_* bits not cleared properly */
> + if (start > device->total_bytes) {
> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> + btrfs_warn_in_rcu(fs_info,
> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> + start, end - start + 1,
> + rcu_str_deref(device->name),
> + device->total_bytes);
> + mutex_unlock(&fs_info->chunk_mutex);
> + ret = 0;
> + break;
> + }
> +
> + /* The remaining part has already been trimmed */
> + if (start == device->total_bytes) {
> + mutex_unlock(&fs_info->chunk_mutex);
> + ret = 0;
> + break;
> + }
Sorry I missed this earlier, but why is this a special case? Couldn't
this be merged into the previous check?
Why is an offset matching the ending of the device not considered unexpected?
I also don't understand the comment, what is the remaining part?
Thanks.
> +
> /* Ensure we skip the reserved area in the first 1M */
> start = max_t(u64, start, SZ_1M);
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d7670e2a9f39..4e51ef68ea72 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> }
>
> mutex_lock(&fs_info->chunk_mutex);
> + /*
> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> + * current device boundary.
> + * This shouldn't fail, as alloc_state should only utilize those two
> + * bits, thus we shouldn't alloc new memory for clearing the status.
> + *
> + * So here we just do an ASSERT() to catch future behavior change.
> + */
> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> + CHUNK_TRIMMED | CHUNK_ALLOCATED);
> + ASSERT(!ret);
> +
> btrfs_device_set_disk_total_bytes(device, new_size);
> if (list_empty(&device->post_commit_list))
> list_add_tail(&device->post_commit_list,
> --
> 2.28.0
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 10:05 ` Filipe Manana
@ 2020-07-31 10:20 ` Qu Wenruo
2020-07-31 10:38 ` Qu Wenruo
2020-07-31 10:40 ` Filipe Manana
0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2020-07-31 10:20 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, Filipe Manana
On 2020/7/31 下午6:05, Filipe Manana wrote:
> On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> The following script can lead to tons of beyond device boundary access:
>>
>> mkfs.btrfs -f $dev -b 10G
>> mount $dev $mnt
>> trimfs $mnt
>> btrfs filesystem resize 1:-1G $mnt
>> trimfs $mnt
>>
>> [CAUSE]
>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>> already trimmed.
>>
>> So we check device->alloc_state by finding the first range which doesn't
>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>
>> But if we shrunk the device, that bits are not cleared, thus we could
>> easily got a range starts beyond the shrunk device size.
>>
>> This results the returned @start and @end are all beyond device size,
>> then we call "end = min(end, device->total_bytes -1);" making @end
>> smaller than device size.
>>
>> Then finally we goes "len = end - start + 1", totally underflow the
>> result, and lead to the beyond-device-boundary access.
>>
>> [FIX]
>> This patch will fix the problem in two ways:
>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>> This is the root fix
>>
>> - Add extra safe net when trimming free device extents
>> We check and warn if the returned range is already beyond current
>> device.
>>
>> Link: https://github.com/kdave/btrfs-progs/issues/282
>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Add proper fixes tag
>> - Add extra warning for beyond device end case
>> - Add graceful exit for already trimmed case
>> v3:
>> - Don't return EUCLEAN for beyond boundary access
>> - Rephrase the warning message for beyond boundary access
>> ---
>> fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
>> fs/btrfs/volumes.c | 12 ++++++++++++
>> 2 files changed, 33 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index fa7d83051587..7c5e0961c93b 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -33,6 +33,7 @@
>> #include "delalloc-space.h"
>> #include "block-group.h"
>> #include "discard.h"
>> +#include "rcu-string.h"
>>
>> #undef SCRAMBLE_DELAYED_REFS
>>
>> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>> &start, &end,
>> CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>
>> + /* CHUNK_* bits not cleared properly */
>> + if (start > device->total_bytes) {
>> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> + btrfs_warn_in_rcu(fs_info,
>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
>> + start, end - start + 1,
>> + rcu_str_deref(device->name),
>> + device->total_bytes);
>> + mutex_unlock(&fs_info->chunk_mutex);
>> + ret = 0;
>> + break;
>> + }
>> +
>> + /* The remaining part has already been trimmed */
>> + if (start == device->total_bytes) {
>> + mutex_unlock(&fs_info->chunk_mutex);
>> + ret = 0;
>> + break;
>> + }
>
> Sorry I missed this earlier, but why is this a special case? Couldn't
> this be merged into the previous check?
> Why is an offset matching the ending of the device not considered unexpected?
For such example:
0 1g 2g
device 1: |///////////////| |
|//| = Allocated space
| | = Free space.
After one fstrim, [1G, 2G) get trimmed.
So in the alloc_state we have
0 1G 2G
device 1: | |***************|
|**| = CHUNK_TRIMMED bits set
Here we just focus on the unallocated space, ignoring the block group parts.
Then we run fstrim again.
We call find_first_clear_extent_bit(start == 1G), then we got the result
start == 2G, end = U64_MAX.
In that case, we got start == device->total_bytes, and it's completely
valid.
>
> I also don't understand the comment, what is the remaining part?
The remaining means the unallocated space from the @start of
find_first_clear_extent_bit().
Any better suggestion?
Thanks,
Qu
>
> Thanks.
>
>> +
>> /* Ensure we skip the reserved area in the first 1M */
>> start = max_t(u64, start, SZ_1M);
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d7670e2a9f39..4e51ef68ea72 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>> }
>>
>> mutex_lock(&fs_info->chunk_mutex);
>> + /*
>> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> + * current device boundary.
>> + * This shouldn't fail, as alloc_state should only utilize those two
>> + * bits, thus we shouldn't alloc new memory for clearing the status.
>> + *
>> + * So here we just do an ASSERT() to catch future behavior change.
>> + */
>> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> + CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> + ASSERT(!ret);
>> +
>> btrfs_device_set_disk_total_bytes(device, new_size);
>> if (list_empty(&device->post_commit_list))
>> list_add_tail(&device->post_commit_list,
>> --
>> 2.28.0
>>
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 10:20 ` Qu Wenruo
@ 2020-07-31 10:38 ` Qu Wenruo
2020-07-31 10:42 ` Filipe Manana
2020-07-31 10:40 ` Filipe Manana
1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2020-07-31 10:38 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, Filipe Manana
[-- Attachment #1.1: Type: text/plain, Size: 6899 bytes --]
On 2020/7/31 下午6:20, Qu Wenruo wrote:
>
>
> On 2020/7/31 下午6:05, Filipe Manana wrote:
>> On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
>>>
>>> [BUG]
>>> The following script can lead to tons of beyond device boundary access:
>>>
>>> mkfs.btrfs -f $dev -b 10G
>>> mount $dev $mnt
>>> trimfs $mnt
>>> btrfs filesystem resize 1:-1G $mnt
>>> trimfs $mnt
>>>
>>> [CAUSE]
>>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>>> already trimmed.
>>>
>>> So we check device->alloc_state by finding the first range which doesn't
>>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>>
>>> But if we shrunk the device, that bits are not cleared, thus we could
>>> easily got a range starts beyond the shrunk device size.
>>>
>>> This results the returned @start and @end are all beyond device size,
>>> then we call "end = min(end, device->total_bytes -1);" making @end
>>> smaller than device size.
>>>
>>> Then finally we goes "len = end - start + 1", totally underflow the
>>> result, and lead to the beyond-device-boundary access.
>>>
>>> [FIX]
>>> This patch will fix the problem in two ways:
>>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>> This is the root fix
>>>
>>> - Add extra safe net when trimming free device extents
>>> We check and warn if the returned range is already beyond current
>>> device.
>>>
>>> Link: https://github.com/kdave/btrfs-progs/issues/282
>>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>> Changelog:
>>> v2:
>>> - Add proper fixes tag
>>> - Add extra warning for beyond device end case
>>> - Add graceful exit for already trimmed case
>>> v3:
>>> - Don't return EUCLEAN for beyond boundary access
>>> - Rephrase the warning message for beyond boundary access
>>> ---
>>> fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
>>> fs/btrfs/volumes.c | 12 ++++++++++++
>>> 2 files changed, 33 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index fa7d83051587..7c5e0961c93b 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -33,6 +33,7 @@
>>> #include "delalloc-space.h"
>>> #include "block-group.h"
>>> #include "discard.h"
>>> +#include "rcu-string.h"
>>>
>>> #undef SCRAMBLE_DELAYED_REFS
>>>
>>> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>> &start, &end,
>>> CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>>
>>> + /* CHUNK_* bits not cleared properly */
>>> + if (start > device->total_bytes) {
>>> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>>> + btrfs_warn_in_rcu(fs_info,
>>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
>>> + start, end - start + 1,
>>> + rcu_str_deref(device->name),
>>> + device->total_bytes);
>>> + mutex_unlock(&fs_info->chunk_mutex);
>>> + ret = 0;
>>> + break;
>>> + }
>>> +
>>> + /* The remaining part has already been trimmed */
>>> + if (start == device->total_bytes) {
>>> + mutex_unlock(&fs_info->chunk_mutex);
>>> + ret = 0;
>>> + break;
>>> + }
>>
>> Sorry I missed this earlier, but why is this a special case? Couldn't
>> this be merged into the previous check?
>> Why is an offset matching the ending of the device not considered unexpected?
>
> For such example:
> 0 1g 2g
> device 1: |///////////////| |
> |//| = Allocated space
> | | = Free space.
>
> After one fstrim, [1G, 2G) get trimmed.
> So in the alloc_state we have
> 0 1G 2G
> device 1: | |***************|
> |**| = CHUNK_TRIMMED bits set
>
> Here we just focus on the unallocated space, ignoring the block group parts.
>
> Then we run fstrim again.
> We call find_first_clear_extent_bit(start == 1G), then we got the result
> start == 2G, end = U64_MAX.
>
> In that case, we got start == device->total_bytes, and it's completely
> valid.
Sorry, although this is correct, it's duplicated with the later checks:
end = min(end, device->total_bytes - 1);
len = end - start + 1;
/* We didn't find any extents */
if (!len) {
mutex_unlock(&fs_info->chunk_mutex);
ret = 0;
break;
}
If we got a returned start == device->total_bytes, then here we would
hit len == 0, and exit.
So my (start == device->total_bytes) is duplicated.
I guess the existing one is easier to understand, thus my extra check
should be removed.
Thanks,
Qu
>
>>
>> I also don't understand the comment, what is the remaining part?
>
> The remaining means the unallocated space from the @start of
> find_first_clear_extent_bit().
>
> Any better suggestion?
>
> Thanks,
> Qu
>
>>
>> Thanks.
>>
>>> +
>>> /* Ensure we skip the reserved area in the first 1M */
>>> start = max_t(u64, start, SZ_1M);
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index d7670e2a9f39..4e51ef68ea72 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>> }
>>>
>>> mutex_lock(&fs_info->chunk_mutex);
>>> + /*
>>> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>>> + * current device boundary.
>>> + * This shouldn't fail, as alloc_state should only utilize those two
>>> + * bits, thus we shouldn't alloc new memory for clearing the status.
>>> + *
>>> + * So here we just do an ASSERT() to catch future behavior change.
>>> + */
>>> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>>> + CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>> + ASSERT(!ret);
>>> +
>>> btrfs_device_set_disk_total_bytes(device, new_size);
>>> if (list_empty(&device->post_commit_list))
>>> list_add_tail(&device->post_commit_list,
>>> --
>>> 2.28.0
>>>
>>
>>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 10:20 ` Qu Wenruo
2020-07-31 10:38 ` Qu Wenruo
@ 2020-07-31 10:40 ` Filipe Manana
1 sibling, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2020-07-31 10:40 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana
On Fri, Jul 31, 2020 at 11:21 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2020/7/31 下午6:05, Filipe Manana wrote:
> > On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> [BUG]
> >> The following script can lead to tons of beyond device boundary access:
> >>
> >> mkfs.btrfs -f $dev -b 10G
> >> mount $dev $mnt
> >> trimfs $mnt
> >> btrfs filesystem resize 1:-1G $mnt
> >> trimfs $mnt
> >>
> >> [CAUSE]
> >> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> >> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> >> already trimmed.
> >>
> >> So we check device->alloc_state by finding the first range which doesn't
> >> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> >>
> >> But if we shrunk the device, that bits are not cleared, thus we could
> >> easily got a range starts beyond the shrunk device size.
> >>
> >> This results the returned @start and @end are all beyond device size,
> >> then we call "end = min(end, device->total_bytes -1);" making @end
> >> smaller than device size.
> >>
> >> Then finally we goes "len = end - start + 1", totally underflow the
> >> result, and lead to the beyond-device-boundary access.
> >>
> >> [FIX]
> >> This patch will fix the problem in two ways:
> >> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> >> This is the root fix
> >>
> >> - Add extra safe net when trimming free device extents
> >> We check and warn if the returned range is already beyond current
> >> device.
> >>
> >> Link: https://github.com/kdave/btrfs-progs/issues/282
> >> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Add proper fixes tag
> >> - Add extra warning for beyond device end case
> >> - Add graceful exit for already trimmed case
> >> v3:
> >> - Don't return EUCLEAN for beyond boundary access
> >> - Rephrase the warning message for beyond boundary access
> >> ---
> >> fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
> >> fs/btrfs/volumes.c | 12 ++++++++++++
> >> 2 files changed, 33 insertions(+)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index fa7d83051587..7c5e0961c93b 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -33,6 +33,7 @@
> >> #include "delalloc-space.h"
> >> #include "block-group.h"
> >> #include "discard.h"
> >> +#include "rcu-string.h"
> >>
> >> #undef SCRAMBLE_DELAYED_REFS
> >>
> >> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> >> &start, &end,
> >> CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >>
> >> + /* CHUNK_* bits not cleared properly */
> >> + if (start > device->total_bytes) {
> >> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >> + btrfs_warn_in_rcu(fs_info,
> >> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> >> + start, end - start + 1,
> >> + rcu_str_deref(device->name),
> >> + device->total_bytes);
> >> + mutex_unlock(&fs_info->chunk_mutex);
> >> + ret = 0;
> >> + break;
> >> + }
> >> +
> >> + /* The remaining part has already been trimmed */
> >> + if (start == device->total_bytes) {
> >> + mutex_unlock(&fs_info->chunk_mutex);
> >> + ret = 0;
> >> + break;
> >> + }
> >
> > Sorry I missed this earlier, but why is this a special case? Couldn't
> > this be merged into the previous check?
> > Why is an offset matching the ending of the device not considered unexpected?
>
> For such example:
> 0 1g 2g
> device 1: |///////////////| |
> |//| = Allocated space
> | | = Free space.
>
> After one fstrim, [1G, 2G) get trimmed.
> So in the alloc_state we have
> 0 1G 2G
> device 1: | |***************|
> |**| = CHUNK_TRIMMED bits set
>
> Here we just focus on the unallocated space, ignoring the block group parts.
>
> Then we run fstrim again.
> We call find_first_clear_extent_bit(start == 1G), then we got the result
> start == 2G, end = U64_MAX.
>
> In that case, we got start == device->total_bytes, and it's completely
> valid.
Ok. But this can happen without shrinking the device before, and it
seems we already handle it, or is the existing handling buggy?
If it is, it should be replaced or updated.
Thanks.
>
> >
> > I also don't understand the comment, what is the remaining part?
>
> The remaining means the unallocated space from the @start of
> find_first_clear_extent_bit().
>
> Any better suggestion?
>
> Thanks,
> Qu
>
> >
> > Thanks.
> >
> >> +
> >> /* Ensure we skip the reserved area in the first 1M */
> >> start = max_t(u64, start, SZ_1M);
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index d7670e2a9f39..4e51ef68ea72 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> >> }
> >>
> >> mutex_lock(&fs_info->chunk_mutex);
> >> + /*
> >> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> >> + * current device boundary.
> >> + * This shouldn't fail, as alloc_state should only utilize those two
> >> + * bits, thus we shouldn't alloc new memory for clearing the status.
> >> + *
> >> + * So here we just do an ASSERT() to catch future behavior change.
> >> + */
> >> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> >> + CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >> + ASSERT(!ret);
> >> +
> >> btrfs_device_set_disk_total_bytes(device, new_size);
> >> if (list_empty(&device->post_commit_list))
> >> list_add_tail(&device->post_commit_list,
> >> --
> >> 2.28.0
> >>
> >
> >
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 10:38 ` Qu Wenruo
@ 2020-07-31 10:42 ` Filipe Manana
0 siblings, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2020-07-31 10:42 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana
On Fri, Jul 31, 2020 at 11:38 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2020/7/31 下午6:20, Qu Wenruo wrote:
> >
> >
> > On 2020/7/31 下午6:05, Filipe Manana wrote:
> >> On Fri, Jul 31, 2020 at 10:49 AM Qu Wenruo <wqu@suse.com> wrote:
> >>>
> >>> [BUG]
> >>> The following script can lead to tons of beyond device boundary access:
> >>>
> >>> mkfs.btrfs -f $dev -b 10G
> >>> mount $dev $mnt
> >>> trimfs $mnt
> >>> btrfs filesystem resize 1:-1G $mnt
> >>> trimfs $mnt
> >>>
> >>> [CAUSE]
> >>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> >>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> >>> already trimmed.
> >>>
> >>> So we check device->alloc_state by finding the first range which doesn't
> >>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> >>>
> >>> But if we shrunk the device, that bits are not cleared, thus we could
> >>> easily got a range starts beyond the shrunk device size.
> >>>
> >>> This results the returned @start and @end are all beyond device size,
> >>> then we call "end = min(end, device->total_bytes -1);" making @end
> >>> smaller than device size.
> >>>
> >>> Then finally we goes "len = end - start + 1", totally underflow the
> >>> result, and lead to the beyond-device-boundary access.
> >>>
> >>> [FIX]
> >>> This patch will fix the problem in two ways:
> >>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> >>> This is the root fix
> >>>
> >>> - Add extra safe net when trimming free device extents
> >>> We check and warn if the returned range is already beyond current
> >>> device.
> >>>
> >>> Link: https://github.com/kdave/btrfs-progs/issues/282
> >>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> >>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >>> ---
> >>> Changelog:
> >>> v2:
> >>> - Add proper fixes tag
> >>> - Add extra warning for beyond device end case
> >>> - Add graceful exit for already trimmed case
> >>> v3:
> >>> - Don't return EUCLEAN for beyond boundary access
> >>> - Rephrase the warning message for beyond boundary access
> >>> ---
> >>> fs/btrfs/extent-tree.c | 21 +++++++++++++++++++++
> >>> fs/btrfs/volumes.c | 12 ++++++++++++
> >>> 2 files changed, 33 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >>> index fa7d83051587..7c5e0961c93b 100644
> >>> --- a/fs/btrfs/extent-tree.c
> >>> +++ b/fs/btrfs/extent-tree.c
> >>> @@ -33,6 +33,7 @@
> >>> #include "delalloc-space.h"
> >>> #include "block-group.h"
> >>> #include "discard.h"
> >>> +#include "rcu-string.h"
> >>>
> >>> #undef SCRAMBLE_DELAYED_REFS
> >>>
> >>> @@ -5669,6 +5670,26 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> >>> &start, &end,
> >>> CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >>>
> >>> + /* CHUNK_* bits not cleared properly */
> >>> + if (start > device->total_bytes) {
> >>> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >>> + btrfs_warn_in_rcu(fs_info,
> >>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> >>> + start, end - start + 1,
> >>> + rcu_str_deref(device->name),
> >>> + device->total_bytes);
> >>> + mutex_unlock(&fs_info->chunk_mutex);
> >>> + ret = 0;
> >>> + break;
> >>> + }
> >>> +
> >>> + /* The remaining part has already been trimmed */
> >>> + if (start == device->total_bytes) {
> >>> + mutex_unlock(&fs_info->chunk_mutex);
> >>> + ret = 0;
> >>> + break;
> >>> + }
> >>
> >> Sorry I missed this earlier, but why is this a special case? Couldn't
> >> this be merged into the previous check?
> >> Why is an offset matching the ending of the device not considered unexpected?
> >
> > For such example:
> > 0 1g 2g
> > device 1: |///////////////| |
> > |//| = Allocated space
> > | | = Free space.
> >
> > After one fstrim, [1G, 2G) get trimmed.
> > So in the alloc_state we have
> > 0 1G 2G
> > device 1: | |***************|
> > |**| = CHUNK_TRIMMED bits set
> >
> > Here we just focus on the unallocated space, ignoring the block group parts.
> >
> > Then we run fstrim again.
> > We call find_first_clear_extent_bit(start == 1G), then we got the result
> > start == 2G, end = U64_MAX.
> >
> > In that case, we got start == device->total_bytes, and it's completely
> > valid.
>
> Sorry, although this is correct, it's duplicated with the later checks:
>
> end = min(end, device->total_bytes - 1);
>
> len = end - start + 1;
>
> /* We didn't find any extents */
> if (!len) {
> mutex_unlock(&fs_info->chunk_mutex);
> ret = 0;
> break;
> }
>
> If we got a returned start == device->total_bytes, then here we would
> hit len == 0, and exit.
>
> So my (start == device->total_bytes) is duplicated.
>
> I guess the existing one is easier to understand, thus my extra check
> should be removed.
Hit reply too soon before seeing this reply. Yes, it seems correct to
me as well.
Thanks.
>
> Thanks,
> Qu
>
> >
> >>
> >> I also don't understand the comment, what is the remaining part?
> >
> > The remaining means the unallocated space from the @start of
> > find_first_clear_extent_bit().
> >
> > Any better suggestion?
> >
> > Thanks,
> > Qu
> >
> >>
> >> Thanks.
> >>
> >>> +
> >>> /* Ensure we skip the reserved area in the first 1M */
> >>> start = max_t(u64, start, SZ_1M);
> >>>
> >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>> index d7670e2a9f39..4e51ef68ea72 100644
> >>> --- a/fs/btrfs/volumes.c
> >>> +++ b/fs/btrfs/volumes.c
> >>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> >>> }
> >>>
> >>> mutex_lock(&fs_info->chunk_mutex);
> >>> + /*
> >>> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> >>> + * current device boundary.
> >>> + * This shouldn't fail, as alloc_state should only utilize those two
> >>> + * bits, thus we shouldn't alloc new memory for clearing the status.
> >>> + *
> >>> + * So here we just do an ASSERT() to catch future behavior change.
> >>> + */
> >>> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> >>> + CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >>> + ASSERT(!ret);
> >>> +
> >>> btrfs_device_set_disk_total_bytes(device, new_size);
> >>> if (list_empty(&device->post_commit_list))
> >>> list_add_tail(&device->post_commit_list,
> >>> --
> >>> 2.28.0
> >>>
> >>
> >>
> >
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
2020-07-31 9:48 [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary Qu Wenruo
2020-07-31 10:05 ` Filipe Manana
@ 2020-07-31 20:52 ` kernel test robot
1 sibling, 0 replies; 7+ messages in thread
From: kernel test robot @ 2020-07-31 20:52 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all, Filipe Manana
[-- Attachment #1: Type: text/plain, Size: 5997 bytes --]
Hi Qu,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.8-rc7 next-20200731]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-trim-fix-underflow-in-trim-length-to-prevent-access-beyond-device-boundary/20200731-174928
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-s022-20200731 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-115-g5fc204f2-dirty
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> fs/btrfs/extent-tree.c:5676:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> fs/btrfs/extent-tree.c:5676:25: sparse: struct rcu_string [noderef] __rcu *
>> fs/btrfs/extent-tree.c:5676:25: sparse: struct rcu_string *
fs/btrfs/extent-tree.c:1769:9: sparse: sparse: context imbalance in 'run_and_cleanup_extent_op' - unexpected unlock
fs/btrfs/extent-tree.c:1842:28: sparse: sparse: context imbalance in 'cleanup_ref_head' - unexpected unlock
fs/btrfs/extent-tree.c:1918:36: sparse: sparse: context imbalance in 'btrfs_run_delayed_refs_for_head' - unexpected unlock
fs/btrfs/extent-tree.c:1983:21: sparse: sparse: context imbalance in '__btrfs_run_delayed_refs' - wrong count at exit
vim +5676 fs/btrfs/extent-tree.c
5619
5620 /*
5621 * It used to be that old block groups would be left around forever.
5622 * Iterating over them would be enough to trim unused space. Since we
5623 * now automatically remove them, we also need to iterate over unallocated
5624 * space.
5625 *
5626 * We don't want a transaction for this since the discard may take a
5627 * substantial amount of time. We don't require that a transaction be
5628 * running, but we do need to take a running transaction into account
5629 * to ensure that we're not discarding chunks that were released or
5630 * allocated in the current transaction.
5631 *
5632 * Holding the chunks lock will prevent other threads from allocating
5633 * or releasing chunks, but it won't prevent a running transaction
5634 * from committing and releasing the memory that the pending chunks
5635 * list head uses. For that, we need to take a reference to the
5636 * transaction and hold the commit root sem. We only need to hold
5637 * it while performing the free space search since we have already
5638 * held back allocations.
5639 */
5640 static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
5641 {
5642 u64 start = SZ_1M, len = 0, end = 0;
5643 int ret;
5644
5645 *trimmed = 0;
5646
5647 /* Discard not supported = nothing to do. */
5648 if (!blk_queue_discard(bdev_get_queue(device->bdev)))
5649 return 0;
5650
5651 /* Not writable = nothing to do. */
5652 if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
5653 return 0;
5654
5655 /* No free space = nothing to do. */
5656 if (device->total_bytes <= device->bytes_used)
5657 return 0;
5658
5659 ret = 0;
5660
5661 while (1) {
5662 struct btrfs_fs_info *fs_info = device->fs_info;
5663 u64 bytes;
5664
5665 ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
5666 if (ret)
5667 break;
5668
5669 find_first_clear_extent_bit(&device->alloc_state, start,
5670 &start, &end,
5671 CHUNK_TRIMMED | CHUNK_ALLOCATED);
5672
5673 /* CHUNK_* bits not cleared properly */
5674 if (start > device->total_bytes) {
5675 WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> 5676 btrfs_warn_in_rcu(fs_info,
5677 "ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
5678 start, end - start + 1,
5679 rcu_str_deref(device->name),
5680 device->total_bytes);
5681 mutex_unlock(&fs_info->chunk_mutex);
5682 ret = 0;
5683 break;
5684 }
5685
5686 /* The remaining part has already been trimmed */
5687 if (start == device->total_bytes) {
5688 mutex_unlock(&fs_info->chunk_mutex);
5689 ret = 0;
5690 break;
5691 }
5692
5693 /* Ensure we skip the reserved area in the first 1M */
5694 start = max_t(u64, start, SZ_1M);
5695
5696 /*
5697 * If find_first_clear_extent_bit find a range that spans the
5698 * end of the device it will set end to -1, in this case it's up
5699 * to the caller to trim the value to the size of the device.
5700 */
5701 end = min(end, device->total_bytes - 1);
5702
5703 len = end - start + 1;
5704
5705 /* We didn't find any extents */
5706 if (!len) {
5707 mutex_unlock(&fs_info->chunk_mutex);
5708 ret = 0;
5709 break;
5710 }
5711
5712 ret = btrfs_issue_discard(device->bdev, start, len,
5713 &bytes);
5714 if (!ret)
5715 set_extent_bits(&device->alloc_state, start,
5716 start + bytes - 1,
5717 CHUNK_TRIMMED);
5718 mutex_unlock(&fs_info->chunk_mutex);
5719
5720 if (ret)
5721 break;
5722
5723 start += len;
5724 *trimmed += bytes;
5725
5726 if (fatal_signal_pending(current)) {
5727 ret = -ERESTARTSYS;
5728 break;
5729 }
5730
5731 cond_resched();
5732 }
5733
5734 return ret;
5735 }
5736
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33364 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-07-31 20:53 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-31 9:48 [PATCH v3] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary Qu Wenruo
2020-07-31 10:05 ` Filipe Manana
2020-07-31 10:20 ` Qu Wenruo
2020-07-31 10:38 ` Qu Wenruo
2020-07-31 10:42 ` Filipe Manana
2020-07-31 10:40 ` Filipe Manana
2020-07-31 20:52 ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox