Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
@ 2023-06-20  9:57 Qu Wenruo
  2023-06-20 10:27 ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2023-06-20  9:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

[BUG]
David reported an ASSERT() get triggered during certain fio load.

The ASSERT() is from rbio_add_bio() of raid56.c:

	ASSERT(orig_logical >= full_stripe_start &&
	       orig_logical + orig_len <= full_stripe_start +
	       rbio->nr_data * BTRFS_STRIPE_LEN);

Which is checking if the target rbio is crossing the full stripe
boundary.

[CAUSE]
Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
u64 division.

Function btrfs_max_io_len() is to get the length to the stripe boundary.

It calculates the full stripe start offset (inside the chunk) by the
following command:

		*full_stripe_start =
			rounddown(*stripe_nr, nr_data_stripes(map)) <<
			BTRFS_STRIPE_LEN_SHIFT;

The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returned int).

Thus the result is also u32, then we do the left shift, which can
overflow u32.

If such overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, thus make later length calculation to
have no stripe boundary limit, resulting a write bio to exceed stripe
boundary.

There are some other locations like this, with a u32 @stripe_nr got left
shift, which can lead to a similar overflow.

[FIX]
Fix all @stripe_nr with left shift with a type cast to u64 before the
left shift.

Those involved @stripe_nr or similar variables are recording the stripe
number inside the chunk, which is small enough to be contained by u32,
but their offset inside the chunk can not fit into u32.

Thus for those specific left shifts, a type cast to u64 is necessary.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Fix all @stripe_nr with left shift
- Apply the ASSERT() on full stripe checks for all RAID56 IOs.
---
 fs/btrfs/volumes.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8540af6e136..ed3765d21cb0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
 	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
 
 	/* stripe_offset is the offset of this block in its stripe */
-	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
 
 	stripe_nr_end = round_up(offset + length, BTRFS_STRIPE_LEN) >>
 			BTRFS_STRIPE_LEN_SHIFT;
 	stripe_cnt = stripe_nr_end - stripe_nr;
-	stripe_end_offset = (stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
+	stripe_end_offset = ((u64)stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
 			    (offset + length);
 	/*
 	 * after this, stripe_nr is the number of stripes on this
@@ -6033,7 +6033,7 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
 	for (i = 0; i < *num_stripes; i++) {
 		stripes[i].physical =
 			map->stripes[stripe_index].physical +
-			stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+			stripe_offset + ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
 		stripes[i].dev = map->stripes[stripe_index].dev;
 
 		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
@@ -6199,15 +6199,18 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
 		 * not ensured to be power of 2.
 		 */
 		*full_stripe_start =
-			rounddown(*stripe_nr, nr_data_stripes(map)) <<
+			(u64)rounddown(*stripe_nr, nr_data_stripes(map)) <<
 			BTRFS_STRIPE_LEN_SHIFT;
 
+		ASSERT(*full_stripe_start + full_stripe_len > offset);
+		ASSERT(*full_stripe_start <= offset);
 		/*
 		 * For writes to RAID56, allow to write a full stripe set, but
 		 * no straddling of stripe sets.
 		 */
-		if (op == BTRFS_MAP_WRITE)
+		if (op == BTRFS_MAP_WRITE) {
 			return full_stripe_len - (offset - *full_stripe_start);
+		}
 	}
 
 	/*
@@ -6224,7 +6227,7 @@ static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup *
 {
 	dst->dev = map->stripes[stripe_index].dev;
 	dst->physical = map->stripes[stripe_index].physical +
-			stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
+			stripe_offset + ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
 }
 
 int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
  2023-06-20  9:57 [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr Qu Wenruo
@ 2023-06-20 10:27 ` David Sterba
  2023-06-20 11:24   ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2023-06-20 10:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Tue, Jun 20, 2023 at 05:57:31PM +0800, Qu Wenruo wrote:
> [BUG]
> David reported an ASSERT() get triggered during certain fio load.
> 
> The ASSERT() is from rbio_add_bio() of raid56.c:
> 
> 	ASSERT(orig_logical >= full_stripe_start &&
> 	       orig_logical + orig_len <= full_stripe_start +
> 	       rbio->nr_data * BTRFS_STRIPE_LEN);
> 
> Which is checking if the target rbio is crossing the full stripe
> boundary.
> 
> [CAUSE]
> Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
> BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
> u64 division.
> 
> Function btrfs_max_io_len() is to get the length to the stripe boundary.
> 
> It calculates the full stripe start offset (inside the chunk) by the
> following command:
> 
> 		*full_stripe_start =
> 			rounddown(*stripe_nr, nr_data_stripes(map)) <<
> 			BTRFS_STRIPE_LEN_SHIFT;
> 
> The calculation itself is fine, but the value returned by rounddown() is
> dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
> returned int).
> 
> Thus the result is also u32, then we do the left shift, which can
> overflow u32.
> 
> If such overflow happens, @full_stripe_start will be a value way smaller
> than @offset, causing later "full_stripe_len - (offset -
> *full_stripe_start)" to underflow, thus make later length calculation to
> have no stripe boundary limit, resulting a write bio to exceed stripe
> boundary.
> 
> There are some other locations like this, with a u32 @stripe_nr got left
> shift, which can lead to a similar overflow.
> 
> [FIX]
> Fix all @stripe_nr with left shift with a type cast to u64 before the
> left shift.
> 
> Those involved @stripe_nr or similar variables are recording the stripe
> number inside the chunk, which is small enough to be contained by u32,
> but their offset inside the chunk can not fit into u32.
> 
> Thus for those specific left shifts, a type cast to u64 is necessary.
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Fix all @stripe_nr with left shift
> - Apply the ASSERT() on full stripe checks for all RAID56 IOs.
> ---
>  fs/btrfs/volumes.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b8540af6e136..ed3765d21cb0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>  	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
>  
>  	/* stripe_offset is the offset of this block in its stripe */
> -	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> +	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);

This needs a helper, mandating a type cast for correctness in so many
places is a bad pattern.

>  
>  	stripe_nr_end = round_up(offset + length, BTRFS_STRIPE_LEN) >>
>  			BTRFS_STRIPE_LEN_SHIFT;
>  	stripe_cnt = stripe_nr_end - stripe_nr;
> -	stripe_end_offset = (stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
> +	stripe_end_offset = ((u64)stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
>  			    (offset + length);
>  	/*
>  	 * after this, stripe_nr is the number of stripes on this
> @@ -6033,7 +6033,7 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>  	for (i = 0; i < *num_stripes; i++) {
>  		stripes[i].physical =
>  			map->stripes[stripe_index].physical +
> -			stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> +			stripe_offset + ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>  		stripes[i].dev = map->stripes[stripe_index].dev;
>  
>  		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
> @@ -6199,15 +6199,18 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
>  		 * not ensured to be power of 2.
>  		 */
>  		*full_stripe_start =
> -			rounddown(*stripe_nr, nr_data_stripes(map)) <<
> +			(u64)rounddown(*stripe_nr, nr_data_stripes(map)) <<
>  			BTRFS_STRIPE_LEN_SHIFT;
>  
> +		ASSERT(*full_stripe_start + full_stripe_len > offset);
> +		ASSERT(*full_stripe_start <= offset);
>  		/*
>  		 * For writes to RAID56, allow to write a full stripe set, but
>  		 * no straddling of stripe sets.
>  		 */
> -		if (op == BTRFS_MAP_WRITE)
> +		if (op == BTRFS_MAP_WRITE) {
>  			return full_stripe_len - (offset - *full_stripe_start);
> +		}

No { }

>  	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
  2023-06-20 10:27 ` David Sterba
@ 2023-06-20 11:24   ` Qu Wenruo
  2023-06-20 11:56     ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2023-06-20 11:24 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, dsterba



On 2023/6/20 18:27, David Sterba wrote:
> On Tue, Jun 20, 2023 at 05:57:31PM +0800, Qu Wenruo wrote:
>> [BUG]
>> David reported an ASSERT() get triggered during certain fio load.
>>
>> The ASSERT() is from rbio_add_bio() of raid56.c:
>>
>> 	ASSERT(orig_logical >= full_stripe_start &&
>> 	       orig_logical + orig_len <= full_stripe_start +
>> 	       rbio->nr_data * BTRFS_STRIPE_LEN);
>>
>> Which is checking if the target rbio is crossing the full stripe
>> boundary.
>>
>> [CAUSE]
>> Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
>> BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
>> u64 division.
>>
>> Function btrfs_max_io_len() is to get the length to the stripe boundary.
>>
>> It calculates the full stripe start offset (inside the chunk) by the
>> following command:
>>
>> 		*full_stripe_start =
>> 			rounddown(*stripe_nr, nr_data_stripes(map)) <<
>> 			BTRFS_STRIPE_LEN_SHIFT;
>>
>> The calculation itself is fine, but the value returned by rounddown() is
>> dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
>> returned int).
>>
>> Thus the result is also u32, then we do the left shift, which can
>> overflow u32.
>>
>> If such overflow happens, @full_stripe_start will be a value way smaller
>> than @offset, causing later "full_stripe_len - (offset -
>> *full_stripe_start)" to underflow, thus make later length calculation to
>> have no stripe boundary limit, resulting a write bio to exceed stripe
>> boundary.
>>
>> There are some other locations like this, with a u32 @stripe_nr got left
>> shift, which can lead to a similar overflow.
>>
>> [FIX]
>> Fix all @stripe_nr with left shift with a type cast to u64 before the
>> left shift.
>>
>> Those involved @stripe_nr or similar variables are recording the stripe
>> number inside the chunk, which is small enough to be contained by u32,
>> but their offset inside the chunk can not fit into u32.
>>
>> Thus for those specific left shifts, a type cast to u64 is necessary.
>>
>> Reported-by: David Sterba <dsterba@suse.com>
>> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Fix all @stripe_nr with left shift
>> - Apply the ASSERT() on full stripe checks for all RAID56 IOs.
>> ---
>>   fs/btrfs/volumes.c | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b8540af6e136..ed3765d21cb0 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>>   	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
>>
>>   	/* stripe_offset is the offset of this block in its stripe */
>> -	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>> +	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>
> This needs a helper, mandating a type cast for correctness in so many
> places is a bad pattern.

The problem is, we still need to manually determine if we need a cast or
not.

For a lot of cases like "for (int i = 0; i < nr_data_stripes; i++) { do
with i << BTRFS_STRIPE_LEN_SHIFT;}", it's safe to go with 32 bit and
left shift.

So even with a helper, it's still the same, we need to manually decide
if we need such conversion.

Thanks,
Qu
>
>>
>>   	stripe_nr_end = round_up(offset + length, BTRFS_STRIPE_LEN) >>
>>   			BTRFS_STRIPE_LEN_SHIFT;
>>   	stripe_cnt = stripe_nr_end - stripe_nr;
>> -	stripe_end_offset = (stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
>> +	stripe_end_offset = ((u64)stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) -
>>   			    (offset + length);
>>   	/*
>>   	 * after this, stripe_nr is the number of stripes on this
>> @@ -6033,7 +6033,7 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>>   	for (i = 0; i < *num_stripes; i++) {
>>   		stripes[i].physical =
>>   			map->stripes[stripe_index].physical +
>> -			stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>> +			stripe_offset + ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>>   		stripes[i].dev = map->stripes[stripe_index].dev;
>>
>>   		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
>> @@ -6199,15 +6199,18 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
>>   		 * not ensured to be power of 2.
>>   		 */
>>   		*full_stripe_start =
>> -			rounddown(*stripe_nr, nr_data_stripes(map)) <<
>> +			(u64)rounddown(*stripe_nr, nr_data_stripes(map)) <<
>>   			BTRFS_STRIPE_LEN_SHIFT;
>>
>> +		ASSERT(*full_stripe_start + full_stripe_len > offset);
>> +		ASSERT(*full_stripe_start <= offset);
>>   		/*
>>   		 * For writes to RAID56, allow to write a full stripe set, but
>>   		 * no straddling of stripe sets.
>>   		 */
>> -		if (op == BTRFS_MAP_WRITE)
>> +		if (op == BTRFS_MAP_WRITE) {
>>   			return full_stripe_len - (offset - *full_stripe_start);
>> +		}
>
> No { }
>
>>   	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
  2023-06-20 11:24   ` Qu Wenruo
@ 2023-06-20 11:56     ` David Sterba
  2023-06-20 12:05       ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2023-06-20 11:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, dsterba

On Tue, Jun 20, 2023 at 07:24:24PM +0800, Qu Wenruo wrote:
> On 2023/6/20 18:27, David Sterba wrote:
> > On Tue, Jun 20, 2023 at 05:57:31PM +0800, Qu Wenruo wrote:
> >> [BUG]
> >> David reported an ASSERT() get triggered during certain fio load.
> >>
> >> The ASSERT() is from rbio_add_bio() of raid56.c:
> >>
> >> 	ASSERT(orig_logical >= full_stripe_start &&
> >> 	       orig_logical + orig_len <= full_stripe_start +
> >> 	       rbio->nr_data * BTRFS_STRIPE_LEN);
> >>
> >> Which is checking if the target rbio is crossing the full stripe
> >> boundary.
> >>
> >> [CAUSE]
> >> Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
> >> BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
> >> u64 division.
> >>
> >> Function btrfs_max_io_len() is to get the length to the stripe boundary.
> >>
> >> It calculates the full stripe start offset (inside the chunk) by the
> >> following command:
> >>
> >> 		*full_stripe_start =
> >> 			rounddown(*stripe_nr, nr_data_stripes(map)) <<
> >> 			BTRFS_STRIPE_LEN_SHIFT;
> >>
> >> The calculation itself is fine, but the value returned by rounddown() is
> >> dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
> >> returned int).
> >>
> >> Thus the result is also u32, then we do the left shift, which can
> >> overflow u32.
> >>
> >> If such overflow happens, @full_stripe_start will be a value way smaller
> >> than @offset, causing later "full_stripe_len - (offset -
> >> *full_stripe_start)" to underflow, thus make later length calculation to
> >> have no stripe boundary limit, resulting a write bio to exceed stripe
> >> boundary.
> >>
> >> There are some other locations like this, with a u32 @stripe_nr got left
> >> shift, which can lead to a similar overflow.
> >>
> >> [FIX]
> >> Fix all @stripe_nr with left shift with a type cast to u64 before the
> >> left shift.
> >>
> >> Those involved @stripe_nr or similar variables are recording the stripe
> >> number inside the chunk, which is small enough to be contained by u32,
> >> but their offset inside the chunk can not fit into u32.
> >>
> >> Thus for those specific left shifts, a type cast to u64 is necessary.
> >>
> >> Reported-by: David Sterba <dsterba@suse.com>
> >> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Fix all @stripe_nr with left shift
> >> - Apply the ASSERT() on full stripe checks for all RAID56 IOs.
> >> ---
> >>   fs/btrfs/volumes.c | 15 +++++++++------
> >>   1 file changed, 9 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index b8540af6e136..ed3765d21cb0 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
> >>   	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
> >>
> >>   	/* stripe_offset is the offset of this block in its stripe */
> >> -	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> >> +	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> >
> > This needs a helper, mandating a type cast for correctness in so many
> > places is a bad pattern.
> 
> The problem is, we still need to manually determine if we need a cast or
> not.
> 
> For a lot of cases like "for (int i = 0; i < nr_data_stripes; i++) { do
> with i << BTRFS_STRIPE_LEN_SHIFT;}", it's safe to go with 32 bit and
> left shift.

The helper is supposed to avoid deciding if the cast is needed or not,
so the raw "<< BTRFS_STRIPE_LEN_SHIFT" should be abstracted away
everywhere and any uncommented occurece considered for closer
inspection. If you have a specific example where this would not work
please point to the code.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
  2023-06-20 11:56     ` David Sterba
@ 2023-06-20 12:05       ` Qu Wenruo
  2023-06-20 18:27         ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2023-06-20 12:05 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, dsterba



On 2023/6/20 19:56, David Sterba wrote:
> On Tue, Jun 20, 2023 at 07:24:24PM +0800, Qu Wenruo wrote:
>> On 2023/6/20 18:27, David Sterba wrote:
>>> On Tue, Jun 20, 2023 at 05:57:31PM +0800, Qu Wenruo wrote:
>>>> [BUG]
>>>> David reported an ASSERT() get triggered during certain fio load.
>>>>
>>>> The ASSERT() is from rbio_add_bio() of raid56.c:
>>>>
>>>> 	ASSERT(orig_logical >= full_stripe_start &&
>>>> 	       orig_logical + orig_len <= full_stripe_start +
>>>> 	       rbio->nr_data * BTRFS_STRIPE_LEN);
>>>>
>>>> Which is checking if the target rbio is crossing the full stripe
>>>> boundary.
>>>>
>>>> [CAUSE]
>>>> Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
>>>> BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
>>>> u64 division.
>>>>
>>>> Function btrfs_max_io_len() is to get the length to the stripe boundary.
>>>>
>>>> It calculates the full stripe start offset (inside the chunk) by the
>>>> following command:
>>>>
>>>> 		*full_stripe_start =
>>>> 			rounddown(*stripe_nr, nr_data_stripes(map)) <<
>>>> 			BTRFS_STRIPE_LEN_SHIFT;
>>>>
>>>> The calculation itself is fine, but the value returned by rounddown() is
>>>> dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
>>>> returned int).
>>>>
>>>> Thus the result is also u32, then we do the left shift, which can
>>>> overflow u32.
>>>>
>>>> If such overflow happens, @full_stripe_start will be a value way smaller
>>>> than @offset, causing later "full_stripe_len - (offset -
>>>> *full_stripe_start)" to underflow, thus make later length calculation to
>>>> have no stripe boundary limit, resulting a write bio to exceed stripe
>>>> boundary.
>>>>
>>>> There are some other locations like this, with a u32 @stripe_nr got left
>>>> shift, which can lead to a similar overflow.
>>>>
>>>> [FIX]
>>>> Fix all @stripe_nr with left shift with a type cast to u64 before the
>>>> left shift.
>>>>
>>>> Those involved @stripe_nr or similar variables are recording the stripe
>>>> number inside the chunk, which is small enough to be contained by u32,
>>>> but their offset inside the chunk can not fit into u32.
>>>>
>>>> Thus for those specific left shifts, a type cast to u64 is necessary.
>>>>
>>>> Reported-by: David Sterba <dsterba@suse.com>
>>>> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Changelog:
>>>> v2:
>>>> - Fix all @stripe_nr with left shift
>>>> - Apply the ASSERT() on full stripe checks for all RAID56 IOs.
>>>> ---
>>>>    fs/btrfs/volumes.c | 15 +++++++++------
>>>>    1 file changed, 9 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index b8540af6e136..ed3765d21cb0 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
>>>>    	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
>>>>
>>>>    	/* stripe_offset is the offset of this block in its stripe */
>>>> -	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>>>> +	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
>>>
>>> This needs a helper, mandating a type cast for correctness in so many
>>> places is a bad pattern.
>>
>> The problem is, we still need to manually determine if we need a cast or
>> not.
>>
>> For a lot of cases like "for (int i = 0; i < nr_data_stripes; i++) { do
>> with i << BTRFS_STRIPE_LEN_SHIFT;}", it's safe to go with 32 bit and
>> left shift.
> 
> The helper is supposed to avoid deciding if the cast is needed or not,
> so the raw "<< BTRFS_STRIPE_LEN_SHIFT" should be abstracted away
> everywhere and any uncommented occurece considered for closer
> inspection. If you have a specific example where this would not work
> please point to the code.

E.g. for the code inside RAID56 utilizing the left shift, they are all 
safe and no need to do a u64 cast.

Yes, I got your point, but for the bug fix, can we split them into two 
patches?
The first one introduce the helper and fix the 5 call sites, this should 
be very small and easy to backport.

Then the second patch to convert the remaining ones no matter if it's 
safe or not.

Would this be a reasonable solution?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr
  2023-06-20 12:05       ` Qu Wenruo
@ 2023-06-20 18:27         ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2023-06-20 18:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, dsterba

On Tue, Jun 20, 2023 at 08:05:58PM +0800, Qu Wenruo wrote:
> On 2023/6/20 19:56, David Sterba wrote:
> > On Tue, Jun 20, 2023 at 07:24:24PM +0800, Qu Wenruo wrote:
> >> On 2023/6/20 18:27, David Sterba wrote:
> >>> On Tue, Jun 20, 2023 at 05:57:31PM +0800, Qu Wenruo wrote:
> >>>> ---
> >>>>    fs/btrfs/volumes.c | 15 +++++++++------
> >>>>    1 file changed, 9 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>>> index b8540af6e136..ed3765d21cb0 100644
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -5985,12 +5985,12 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
> >>>>    	stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT;
> >>>>
> >>>>    	/* stripe_offset is the offset of this block in its stripe */
> >>>> -	stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> >>>> +	stripe_offset = offset - ((u64)stripe_nr << BTRFS_STRIPE_LEN_SHIFT);
> >>>
> >>> This needs a helper, mandating a type cast for correctness in so many
> >>> places is a bad pattern.
> >>
> >> The problem is, we still need to manually determine if we need a cast or
> >> not.
> >>
> >> For a lot of cases like "for (int i = 0; i < nr_data_stripes; i++) { do
> >> with i << BTRFS_STRIPE_LEN_SHIFT;}", it's safe to go with 32 bit and
> >> left shift.
> > 
> > The helper is supposed to avoid deciding if the cast is needed or not,
> > so the raw "<< BTRFS_STRIPE_LEN_SHIFT" should be abstracted away
> > everywhere and any uncommented occurece considered for closer
> > inspection. If you have a specific example where this would not work
> > please point to the code.
> 
> E.g. for the code inside RAID56 utilizing the left shift, they are all 
> safe and no need to do a u64 cast.
> 
> Yes, I got your point, but for the bug fix, can we split them into two 
> patches?
> The first one introduce the helper and fix the 5 call sites, this should 
> be very small and easy to backport.
> 
> Then the second patch to convert the remaining ones no matter if it's 
> safe or not.
> 
> Would this be a reasonable solution?

Yes, given the time constraints it's safer to do a minimal fix. No other
problem has appeared, I tried the workload on 26 devices, some profile
conversions, device deletions, scrub etc.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-20 18:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-20  9:57 [PATCH v2] btrfs: fix u32 overflows when left shifting @stripe_nr Qu Wenruo
2023-06-20 10:27 ` David Sterba
2023-06-20 11:24   ` Qu Wenruo
2023-06-20 11:56     ` David Sterba
2023-06-20 12:05       ` Qu Wenruo
2023-06-20 18:27         ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox