Re: [PATCH 2/4] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Baokun Li <libaokun1@huawei.com>, linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ojaswin@linux.ibm.com, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com, yukuai3@huawei.com,
	Baokun Li <libaokun1@huawei.com>
Subject: Re: [PATCH 2/4] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
Date: Fri, 21 Jul 2023 01:00:51 +0530	[thread overview]
Message-ID: <87zg3qmlck.fsf@doe.com> (raw)
In-Reply-To: <d75562f3-62fc-c54b-967b-0c2e9571b914@huawei.com>

Baokun Li <libaokun1@huawei.com> writes:

> On 2023/7/20 20:44, Ritesh Harjani (IBM) wrote:
>> Baokun Li <libaokun1@huawei.com> writes:
>>
>>> When we calculate the end position of ext4_free_extent, this position may
>>> be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
>>> ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
>>> computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
>>> the first case of adjusting the best extent, that is, new_bex_end > 0, the
>>> following BUG_ON will be triggered:
>> Nice spotting.
>>
>>> =========================================================
>>> kernel BUG at fs/ext4/mballoc.c:5116!
>>> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>> CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
>>> RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
>>> Call Trace:
>>>   <TASK>
>>>   ext4_mb_use_best_found+0x203/0x2f0
>>>   ext4_mb_try_best_found+0x163/0x240
>>>   ext4_mb_regular_allocator+0x158/0x1550
>>>   ext4_mb_new_blocks+0x86a/0xe10
>>>   ext4_ext_map_blocks+0xb0c/0x13a0
>>>   ext4_map_blocks+0x2cd/0x8f0
>>>   ext4_iomap_begin+0x27b/0x400
>>>   iomap_iter+0x222/0x3d0
>>>   __iomap_dio_rw+0x243/0xcb0
>>>   iomap_dio_rw+0x16/0x80
>>> =========================================================
>>>
>>> A simple reproducer demonstrating the problem:
>>>
>>> 	mkfs.ext4 -F /dev/sda -b 4096 100M
>>> 	mount /dev/sda /tmp/test
>>> 	fallocate -l1M /tmp/test/tmp
>>> 	fallocate -l10M /tmp/test/file
>>> 	fallocate -i -o 1M -l16777203M /tmp/test/file
>>> 	fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
>>> 	sleep 10 && killall -9 fsstress
>>> 	rm -f /tmp/test/tmp
>>> 	xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
>>
>> Could you please also add it into xfstests?
> Sure！I'll try to push this test case to xfstests.

Thanks that would be great!

>> I think it's a nice test which can check the boundary conditions for
>> start and end of data types used in mballoc. I think it should even work
>> if you don't do fsstress but instead just fallocate some remaining space
>> in filesystem, so that when you open and try to write it to 0th offset,
>> if automatically hits this error in ext4_mb_new_inode_pa().
> Yes, the fsstress here is just to fill up the remaining space on the disk.
>>
>>> We declare new_bex_start and new_bex_end as correct types and use fex_end()
>>> to avoid the problems caused by the ext4_lblk_t overflow above.
>>>
>>> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>> ---
>>>   fs/ext4/mballoc.c | 11 +++++------
>>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index eb7f5d35ef96..2090e5e7ba58 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>>   	pa = ac->ac_pa;
>>>   
>>>   	if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
>>> -		int new_bex_start;
>>> -		int new_bex_end;
>>> +		ext4_lblk_t new_bex_start;
>>> +		loff_t new_bex_end;
>>>   
>>>   		/* we can't allocate as much as normalizer wants.
>>>   		 * so, found space must get proper lstart
>>> @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>>   		 *    still cover original start
>>>   		 * 3. Else, keep the best ex at start of original request.
>>>   		 */
>>> -		new_bex_end = ac->ac_g_ex.fe_logical +
>>> -			EXT4_C2B(sbi, ac->ac_orig_goal_len);
>>> +		new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len);
>>>   		new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
>>>   		if (ac->ac_o_ex.fe_logical >= new_bex_start)
>>>   			goto adjust_bex;
>>> @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>>   
>>>   		BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
>>>   		BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
>>> -		BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
>>> -				      EXT4_C2B(sbi, ac->ac_orig_goal_len)));
>> Ok so the right hand becomes 0 (because then end can go upto 1<<32 which
>> will be 0 for unsigned int). And the left (new_bex_end) might be
>> negative due to some operations above as I see it.
>> And comparing an int with unsigned int, it will promote new_bex_end to
>> unsigned int which will make it's value too large and will hit the
>> bug_on.
> Exactly!
>>
>> I would like to carefully review all such paths. I will soon review and
>> get back.
> Okay, thank you very much for your careful review.
> The 2nd and 3rd cases of adjusting the best extent are impossible to 
> overflow,
> so only the first case is converted here.

I noticed them too during review. I think it would be safe to make the
changes in other two places as well such that in future we never
trip over such overlooked overflow bugs.

>>
>>
>>> +		BUG_ON(new_bex_end >
>>> +			fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len));
>> I am not sure whether using fex_end or pa_end is any helpful.
>> I think we can just typecast if needed and keep it simple rather
>> than adding helpers functions for addition operation.
>> (because of the fact that fex_end() can take a third parameter which
>> sometimes you pass as NULL. Hence it doesn't look clean, IMO)
> I added the helper functions here for two reasons:
> 1. restricting the type of the return value.
> 2. This avoids the ugly line breaks in most cases.
>
> The fex_end() indeed doesn't look as clean as the pa_end(), because we 
> might use
> the start of the free extent plus some other length to get a new end, 
> like right in
> ext4_mb_new_inode_pa(), which makes me have to add another extra length
> argument, but I think it's worth it, and even with the addition of a 
> parameter
> that will probably be unused, it still looks a lot shorter than the 
> original code.

IMO, we don't need pa_end() and fex_end() at all. In several places in
ext4 we always have taken care by directly typecasting to avoid
overflows. Also it reads much simpler rather to typecast in place than
having a helper function which is also not very elegant due to a third
parameter. Hence I think we should drop those helpers.

Thanks once again for catching the overflows and coming up with a
easy reproducer. I am surprised that this bug was never caught with LTP,
fstests, smatch static checker.
How did you find it? :)

-ritesh

next prev parent reply	other threads:[~2023-07-20 19:31 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18 13:10 [PATCH 0/4] ext4: fix some ext4_lblk_t overflow issues Baokun Li
2023-07-18 13:10 ` [PATCH 1/4] ext4: add two helper functions fex_end() and pa_end() Baokun Li
2023-07-20 17:48   ` Ritesh Harjani
2023-07-18 13:10 ` [PATCH 2/4] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow Baokun Li
2023-07-20 12:44   ` Ritesh Harjani
2023-07-20 13:42     ` Baokun Li
2023-07-20 19:30       ` Ritesh Harjani [this message]
2023-07-21  7:29         ` Baokun Li
2023-07-21  8:24           ` Ritesh Harjani
2023-07-21  9:13             ` Baokun Li
2023-07-21 17:15               ` Theodore Ts'o
2023-07-22  3:16                 ` Baokun Li
2023-07-20 19:07   ` Ritesh Harjani
2023-07-21  8:01     ` Baokun Li
2023-07-18 13:10 ` [PATCH 3/4] ext4: avoid overlapping preallocations " Baokun Li
2023-07-20 19:24   ` Ritesh Harjani
2023-07-18 13:10 ` [PATCH 4/4] ext4: avoid prealloc space being skipped " Baokun Li
2023-07-20 19:22   ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zg3qmlck.fsf@doe.com \
    --to=ritesh.list@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=jack@suse.cz \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=tytso@mit.edu \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox