linux-fsdevel.vger.kernel.org archive mirror
* Re: kernel BUG in zero_user_segments
       [not found] <CADCV8spm=TtW_Lu6p-5q-jdHv1ryLcx45mNBEcYdELbHv_4TnQ@mail.gmail.com>
@ 2025-04-28  8:14 ` Jan Kara
  2025-04-28 12:55   ` Matthew Wilcox
  2025-04-29  7:55   ` Zhang Yi
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2025-04-28  8:14 UTC (permalink / raw)
  To: Liebes Wang
  Cc: Jan Kara, ojaswin, Theodore Ts'o, yi.zhang, Matthew Wilcox,
	linux-fsdevel, syzkaller

On Fri 25-04-25 15:29:41, Liebes Wang wrote:
> Dear Linux maintainers and reviewers:
> We are reporting a Linux kernel bug titled **kernel BUG in
> zero_user_segments**, discovered using a modified version of Syzkaller.
> 
> This bug seems to be a duplicate of
> https://syzkaller.appspot.com/bug?extid=78eeb671facb19832e95, but the test
> case is much smaller, which may be helpful for analyzing the bug.
> 
> Linux version: 9d7a0577c9db35c4cc52db90bc415ea248446472
> 
> The bisection log shows that the first bad commit is
> 982bf37da09d078570650b691d9084f43805a5de
> commit 982bf37da09d078570650b691d9084f43805a5de
> Author: Zhang Yi <yi.zhang@huawei.com>
> Date:   Fri Dec 20 09:16:31 2024 +0800
> 
>     ext4: refactor ext4_punch_hole()
> 
>     The current implementation of ext4_punch_hole() contains complex
>     position calculations and stale error tags. To improve the code's
>     clarity and maintainability, it is essential to clean up the code and
>     improve its readability, this can be achieved by: a) simplifying and
>     renaming variables; b) eliminating unnecessary position calculations;
>     c) writing back all data in data=journal mode, and drop page cache from
>     the original offset to the end, rather than using aligned blocks,
>     d) renaming the stale error tags.
> 
>     Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>     Reviewed-by: Jan Kara <jack@suse.cz>
>     Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>     Link:
> https://patch.msgid.link/20241220011637.1157197-5-yi.zhang@huaweicloud.com
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>

So there's something suspicious about this report. The stacktrace shows
we've crashed in punch hole code (call from ioctl_preallocate()) but the
reproducer actually never calls this. Anyway, the reported stack trace ends
with truncate_inode_partial_folio() -> folio_zero_range() ->
zero_user_segments(). The assertion that's failing is:

BUG_ON(end1 > page_size(page) || end2 > page_size(page));

Now it seems that this assertion can indeed easily trigger when we have
a large folio because truncate_inode_partial_folio() is called to zero out
the tail of the whole folio, which can certainly be more than a page. Matthew,
am I missing something (I guess I am because otherwise I'd expect we'd be
crashing left and right) or is the folio conversion on this path indeed
broken?

								Honza

> 
> The test case, kernel config and full bisection log are attached.
> 
> The report is (The full report is attached):
> EXT4-fs (loop7): mounted filesystem 00000000-0000-0000-0000-000000000000
> r/w without journal. Quota mode: writeback.
> EXT4-fs warning (device loop7): ext4_block_to_path:105: block 2147483648 >
> max in inode 15
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/highmem.h:275!
> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> CPU: 0 UID: 0 PID: 6795 Comm: syz.7.479 Not tainted
> 6.15.0-rc3-g9d7a0577c9db #1 PREEMPT(voluntary)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:zero_user_segments.constprop.0+0x10c/0x290
> include/linux/highmem.h:275
> Code: 0f b6 4b 40 ba 00 10 00 00 48 d3 e2 49 89 d7 e8 ba d5 e2 ff 4c 89 fe
> 4c 89 ef e8 3f d0 e2 ff 4d 39 fd 76 08 e8 a5 d5 e2 ff 90 <0f> 0b e8 9d d5
> e2 ff be 08 00 00 00 48 89 df e8 a0 9c 1d 00 48 89
> RSP: 0018:ffff8881235ff678 EFLAGS: 00010216
> RAX: 000000000000025d RBX: ffffea00056071c0 RCX: ffffc90002e0b000
> RDX: 0000000000080000 RSI: ffffffff818f7b0b RDI: 0000000000000006
> RBP: 000000000040b000 R08: 0000000000000000 R09: fffff94000ac0e38
> R10: 0000000000001000 R11: 0000000000000000 R12: 0000000000000005
> R13: 000000000040b000 R14: 0000000000000000 R15: 0000000000001000
> FS:  00007fecef19d700(0000) GS:ffff888543948000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5e38b40008 CR3: 000000013ebaa001 CR4: 0000000000770ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> PKRU: 80000000
> Call Trace:
>  <TASK>
>  folio_zero_range include/linux/highmem.h:647 [inline]
>  truncate_inode_partial_folio+0x6da/0xbd0 mm/truncate.c:219
>  truncate_inode_pages_range+0x3fc/0xcc0 mm/truncate.c:387
>  ext4_truncate_page_cache_block_range+0xb3/0x5c0 fs/ext4/inode.c:3974
>  ext4_punch_hole+0x2cd/0xec0 fs/ext4/inode.c:4049
>  ext4_fallocate+0x128d/0x32c0 fs/ext4/extents.c:4766
>  vfs_fallocate+0x3ed/0xd70 fs/open.c:338
>  ioctl_preallocate+0x190/0x200 fs/ioctl.c:290
>  file_ioctl fs/ioctl.c:333 [inline]
>  do_vfs_ioctl+0x149c/0x1850 fs/ioctl.c:885
>  __do_sys_ioctl fs/ioctl.c:904 [inline]
>  __se_sys_ioctl fs/ioctl.c:892 [inline]
>  __x64_sys_ioctl+0x11f/0x200 fs/ioctl.c:892
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xc1/0x1d0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f

> syz_mount_image$ext4(&(0x7f0000000400)='ext4\x00', &(0x7f00000001c0)='./file0\x00', 0x0, &(0x7f0000000280)={[{@journal_ioprio}, {@mb_optimize_scan}, {@data_err_ignore}, {@grpquota}, {@barrier}]}, 0x1, 0x3cb, &(0x7f00000026c0)="$eJzs3M9rHFUcAPDvTH61aXUjeBC9LAgaELPZpFoFRQUFD55sLx48LLtpLW4aabZgSw4VPHnVf0AE79V/QBDFmzdvgmBFKRRJe/K0Mrsz6ZrsxsTduEn6+cBj35uZzXvfndnhO5OdF8ADqxwRr0bEREQsRUQpX57mJW50S7bdvc2NelaSaLfP/ZlEEhF3Nzfqxd9K8tdTeWM+jUg/jnjixs5+169df7/WbK5cyduV1uoHlfVr15+9tFq7uHJx5XL1hRefX1o+Wz1zdmSx3vz53PKv377+w1e/PfXTj+2Xv8jGezpf1xvHqJSjvPWZbPfcqDsbs+lxDwAAgD1J89x/spP/l2KiU+sqRWVjrIMDAAAARqL9Sv4KAAAAHGOJa38AAAA45orfAdzd3KgXZYw/R/jf3XktIua68RfPN3fXTMaJfJupA3y+tRwRJ95uvJOVOKDnkAEAen2T5T+L/fK/NB7r2W4my1Mi4uSI+y9va+/Mf9LbI+7yH7L876WeuW3u9cSfm5vIWw91UsWp5MKl5spiRDwcEfMxNZO1q7v0cWvmk5lB63rzv6xk/Re5YD6O25Pb3t2otWrDxNzrzkcRj0/2iz/Zyn+TiJgdoo8v/7p5ddC6f4//YLU/j3i67/6/P3NPsvv8RJXO8VApjoqdbq3+8u6g/scdf7b/Z3ePfy7pna9pff99/L54frVT6XPy+K/H/3RyvlMvrss+rLVaV6oR08lbO5cv3X9v0S62z+Kff7L/9784/yX5nFan83PAfn339XufDlp3GPZ/Y1/7f/+VN978fojvf7b/z3Rq8/mSvZz/9jrAYT47AAAAOCrSzn2NJF3YqqfpwkL3fsejMZs219Zbz1xYu3q50b3/MRdTaXGnq9RzP7Ta/Tf6VntpW3s5Ih6JiM9KJzvthfpaszHu4AEAAOABcWrA9X/mj9K4RwcAAACMzNy4BwAAAAAcONf/AAAAcKwNM6+fymGv1ONQDEPlCFbGfWYCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA42v4OAAD///tZxK0=")
> quotactl_fd$Q_SETINFO(0xffffffffffffffff, 0x2, 0x0, &(0x7f0000000080)={0x80000000000002, 0x80000000005, 0x1, 0x6})
> r0 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file1\x00', 0x42, 0x1ff)
> ioctl$EXT4_IOC_CHECKPOINT(r0, 0x40305829, &(0x7f0000000080)=0x5)
> r1 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file1\x00', 0x42, 0x1ff)
> ioctl$EXT4_IOC_CHECKPOINT(r1, 0x40305829, &(0x7f0000000080)=0x5)





-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: kernel BUG in zero_user_segments
  2025-04-28  8:14 ` kernel BUG in zero_user_segments Jan Kara
@ 2025-04-28 12:55   ` Matthew Wilcox
  2025-04-29  7:55   ` Zhang Yi
  1 sibling, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2025-04-28 12:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Liebes Wang, ojaswin, Theodore Ts'o, yi.zhang, linux-fsdevel,
	syzkaller

On Mon, Apr 28, 2025 at 10:14:10AM +0200, Jan Kara wrote:
> So there's something suspicious about this report. The stacktrace shows
> we've crashed in punch hole code (call from ioctl_preallocate()) but the
> reproducer actually never calls this. Anyway, the reported stack trace ends
> with truncate_inode_partial_folio() -> folio_zero_range() ->
> zero_user_segments(). The assertion that's failing is:
> 
> BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> 
> Now it seems that this assertion can indeed easily trigger when we have
> a large folio because truncate_inode_partial_folio() is called to zero out
> the tail of the whole folio, which can certainly be more than a page. Matthew,
> am I missing something (I guess I am because otherwise I'd expect we'd be
> crashing left and right) or is the folio conversion on this path indeed
> broken?

page_size(page) is not PAGE_SIZE (necessarily).  It's from the bad
old days (ie 2019) when we didn't have folios yet.  We haven't yet
got round to removing zero_user_segments() and related functions, so
folio_zero_range() is still implemented in terms of it.

Anyway, ext4 doesn't have large folio support, so all this is really
telling you is that the truncation path called folio_zero_range() with
bad arguments.
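
To make the point about page_size() concrete, here is a small userspace
model of the check. It is only a sketch: MODEL_PAGE_SIZE, model_page_size()
and model_zero_segments_would_bug() are hypothetical names, 4 KiB base pages
are assumed, and the real helpers live in include/linux/highmem.h and operate
on struct page rather than on a bare order.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/*
 * Hypothetical stand-in for page_size(): the base page size scaled by
 * the compound order, so it is only equal to PAGE_SIZE for order 0.
 */
static size_t model_page_size(unsigned int order)
{
	return MODEL_PAGE_SIZE << order;
}

/*
 * Mirrors the condition of the quoted BUG_ON(): returns true when either
 * segment end lies beyond the (possibly compound) page.
 */
static bool model_zero_segments_would_bug(unsigned int order,
					  size_t end1, size_t end2)
{
	size_t limit = model_page_size(order);

	return end1 > limit || end2 > limit;
}
```

In this model an end of 8192 trips the check for an order-0 page (limit
4096 bytes) but is accepted for an order-9 page (limit 2 MiB), which is why
a large folio alone would not explain the crash.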

But I'm not sure I see how.  truncate_inode_pages_range() takes (mapping,
start, end) and looks up the folios and calculates everything there.
You'd think if there were a bug in the calculations we'd see it by now,
and in any case it wouldn't be bisectable to an ext4 commit.

It does look like there's a deliberately-corrupt ext4 image involved
here, but I'm not sure how that could upset the page cache like this.



* Re: kernel BUG in zero_user_segments
  2025-04-28  8:14 ` kernel BUG in zero_user_segments Jan Kara
  2025-04-28 12:55   ` Matthew Wilcox
@ 2025-04-29  7:55   ` Zhang Yi
       [not found]     ` <CADCV8spSjWbnr_cUTzcB=zn0M92s_AhRx-byz0A8zZZa4cZ=Lg@mail.gmail.com>
  2025-04-30  3:14     ` Matthew Wilcox
  1 sibling, 2 replies; 9+ messages in thread
From: Zhang Yi @ 2025-04-29  7:55 UTC (permalink / raw)
  To: Liebes Wang, Jan Kara
  Cc: ojaswin, Theodore Ts'o, Matthew Wilcox, linux-fsdevel,
	syzkaller, Ext4 Developers List

On 2025/4/28 16:14, Jan Kara wrote:
> So there's something suspicious about this report. The stacktrace shows
> we've crashed in punch hole code (call from ioctl_preallocate()) but the
> reproducer actually never calls this. Anyway, the reported stack trace ends
> with truncate_inode_partial_folio() -> folio_zero_range() ->
> zero_user_segments(). The assertion that's failing is:
> 
> BUG_ON(end1 > page_size(page) || end2 > page_size(page));

After debugging, I found that this problem is caused by punching a hole
at an offset larger than max_end on a corrupted ext4 inode whose i_size
exceeds the maxbytes limit. This results in a negative length in
truncate_inode_partial_folio(), which triggers the BUG_ON().
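
The arithmetic can be sketched in userspace as follows. This is a simplified
model, not the ext4 code: it assumes the hole's end is clamped to i_size and
to max_end and that the length is then recomputed from the clamped end. All
names (model_loff_t, model_punch_length(), model_punch_length_guarded()) are
hypothetical.

```c
#include <stdint.h>

typedef int64_t model_loff_t;	/* stand-in for the kernel's signed loff_t */

/*
 * Assumed shape of the range clamp: limit the end of the hole to i_size
 * and to max_end, then derive the length from the clamped end.
 */
static model_loff_t model_punch_length(model_loff_t offset, model_loff_t length,
				       model_loff_t i_size, model_loff_t max_end)
{
	model_loff_t end = offset + length;

	if (end > i_size)
		end = i_size;
	if (end > max_end)
		end = max_end;

	/* Negative whenever offset lies beyond the clamped end. */
	return end - offset;
}

/* Same model with the early return from the proposed patch applied first. */
static model_loff_t model_punch_length_guarded(model_loff_t offset,
					       model_loff_t length,
					       model_loff_t i_size,
					       model_loff_t max_end)
{
	if (offset >= i_size || offset >= max_end)
		return 0;	/* nothing to punch */

	return model_punch_length(offset, length, i_size, max_end);
}
```

With a corrupted i_size larger than max_end, an offset between the two
passes the i_size check alone, and the unguarded model returns a negative
length. The guarded variant returns 0 for the same input.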

Hi, Liebes!

Thank you for the report. Could you please try the patch below? I have
tested it, and it resolves this issue on my machine.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..4ec4a80b6879 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4016,7 +4016,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
	WARN_ON_ONCE(!inode_is_locked(inode));

	/* No need to punch hole beyond i_size */
-       if (offset >= inode->i_size)
+       if (offset >= inode->i_size || offset >= max_end)
		return 0;

	/*

BTW, I also found that the calculation of the max_end variable in
ext4_punch_hole() is wrong for extent inodes. It should be
inode->i_sb->s_maxbytes - sb->s_blocksize instead of
s_bitmap_maxbytes - sb->s_blocksize. I will fix both together.
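
A rough userspace sketch of the intended max_end selection, under the
assumption that s_bitmap_maxbytes should apply only to indirect-block-mapped
inodes and s_maxbytes to extent-mapped ones. struct model_sb and
model_max_end() are hypothetical; the final patch may be structured
differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields involved. */
struct model_sb {
	int64_t s_maxbytes;		/* limit for extent-mapped files */
	int64_t s_bitmap_maxbytes;	/* limit for indirect-block-mapped files */
	int64_t s_blocksize;
};

/*
 * Pick the size limit that matches the inode's mapping type and leave
 * one block of headroom, as described in the message above.
 */
static int64_t model_max_end(const struct model_sb *sb, bool uses_extents)
{
	int64_t limit = uses_extents ? sb->s_maxbytes : sb->s_bitmap_maxbytes;

	return limit - sb->s_blocksize;
}
```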

Thanks,
Yi.




* Re: kernel BUG in zero_user_segments
       [not found]     ` <CADCV8spSjWbnr_cUTzcB=zn0M92s_AhRx-byz0A8zZZa4cZ=Lg@mail.gmail.com>
@ 2025-04-30  1:16       ` Zhang Yi
  0 siblings, 0 replies; 9+ messages in thread
From: Zhang Yi @ 2025-04-30  1:16 UTC (permalink / raw)
  To: Liebes Wang
  Cc: Jan Kara, ojaswin, Theodore Ts'o, Matthew Wilcox,
	linux-fsdevel, syzkaller, Ext4 Developers List

On 2025/4/29 16:17, Liebes Wang wrote:
> Hi Yi,
> 
> I’ve tested the patch on kernel version 9d7a0577c9db35c4cc52db90bc415ea248446472, and it indeed resolves the issue. The crash no longer occurs.
> 

Thank you for the test. I will send out the fix after completing all tests.

Yi.

> Best regards,
> Liebes



* Re: kernel BUG in zero_user_segments
  2025-04-29  7:55   ` Zhang Yi
       [not found]     ` <CADCV8spSjWbnr_cUTzcB=zn0M92s_AhRx-byz0A8zZZa4cZ=Lg@mail.gmail.com>
@ 2025-04-30  3:14     ` Matthew Wilcox
  2025-05-01 11:19       ` Jan Kara
  1 sibling, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2025-04-30  3:14 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Liebes Wang, Jan Kara, ojaswin, Theodore Ts'o, linux-fsdevel,
	syzkaller, Ext4 Developers List

On Tue, Apr 29, 2025 at 03:55:18PM +0800, Zhang Yi wrote:
> After debugging, I found that this problem is caused by punching a hole
> with an offset variable larger than max_end on a corrupted ext4 inode,
> whose i_size is larger than maxbyte. It will result in a negative length
> in the truncate_inode_partial_folio(), which will trigger this problem.

It seems to me like we're asking for trouble when we allow an inode with
an i_size larger than max_end to be instantiated.  There are probably
other places which assume it is smaller than max_end.  We should probably
decline to create the bad inode in the first place?
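
To make the failure mode concrete, here is a minimal user-space sketch in C
of the arithmetic Zhang Yi describes and of the load-time check suggested
above. It is a simplified model under stated assumptions, not the ext4 code:
punch_length() and inode_size_is_sane() are hypothetical helpers, and max_end
and maxbytes are plain parameters standing in for whatever per-filesystem
limits apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the length computation in a hole-punch path:
 * the end of the hole is clamped to i_size and to max_end, but the
 * start offset is only checked against i_size.
 * Assumes offset + len does not overflow.
 */
int64_t punch_length(int64_t offset, int64_t len,
		     int64_t i_size, int64_t max_end)
{
	int64_t end;

	assert(offset >= 0 && len > 0);

	/* Nothing to punch beyond i_size. */
	if (offset >= i_size)
		return 0;

	end = offset + len;
	if (end > i_size)
		end = i_size;
	if (end > max_end)
		end = max_end;

	/* Negative when offset > max_end, which requires i_size > max_end. */
	return end - offset;
}

/*
 * Hypothetical load-time check: refuse an inode whose on-disk size
 * exceeds the largest offset the filesystem can address.
 */
bool inode_size_is_sane(int64_t i_size, int64_t maxbytes)
{
	return i_size >= 0 && i_size <= maxbytes;
}
```

With a limit of 1000 bytes and a corrupted size of 5000,
punch_length(2000, 1000, 5000, 1000) yields -1000, whereas
inode_size_is_sane(5000, 1000) would have rejected the inode before any
such call could be made.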



* Re: kernel BUG in zero_user_segments
  2025-04-30  3:14     ` Matthew Wilcox
@ 2025-05-01 11:19       ` Jan Kara
  2025-05-06  2:25         ` Zhang Yi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2025-05-01 11:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Zhang Yi, Liebes Wang, Jan Kara, ojaswin, Theodore Ts'o,
	linux-fsdevel, syzkaller, Ext4 Developers List

On Wed 30-04-25 04:14:32, Matthew Wilcox wrote:
> On Tue, Apr 29, 2025 at 03:55:18PM +0800, Zhang Yi wrote:
> > After debugging, I found that this problem is caused by punching a hole
> > with an offset variable larger than max_end on a corrupted ext4 inode,
> > whose i_size is larger than maxbyte. It will result in a negative length
> > in the truncate_inode_partial_folio(), which will trigger this problem.
> 
> It seems to me like we're asking for trouble when we allow an inode with
> an i_size larger than max_end to be instantiated.  There are probably
> other places which assume it is smaller than max_end.  We should probably
> decline to create the bad inode in the first place?

Indeed, a somewhat less quirky fix could be to make ext4_max_bitmap_size()
return a limit that is one block smaller. Something like:

        /* Compute how many blocks we can address by block tree */
        res += ppb;
        res += ppb * ppb;
        res += ((loff_t)ppb) * ppb * ppb;
+	/*
+	 * Hole punching assumes it can map the block past end of hole to
+	 * tree offsets
+	 */
+	res -= 1;
        /* Compute how many metadata blocks are needed */
        meta_blocks = 1;
        meta_blocks += 1 + ppb;

The slight caveat is that in theory there could be filesystems out there
with files that large, and then we'd stop allowing access to them. But I
guess the chances are so low that it's probably worth trying.
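
For a sense of the numbers involved, the block-tree part of that computation
can be sketched in user-space C. This is a simplified model, not the kernel
function: it assumes 12 direct block pointers in the inode and 4-byte block
pointers, and it leaves out the i_blocks and metadata limits that the real
ext4_max_bitmap_size() also applies. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS 12 /* direct block pointers in the inode (assumed) */

/* Blocks reachable through direct, single, double and triple indirection. */
uint64_t block_tree_blocks(unsigned int blocksize_bits)
{
	uint64_t ppb, res;

	assert(blocksize_bits >= 10 && blocksize_bits <= 16);

	ppb = 1ULL << (blocksize_bits - 2); /* 4-byte pointers per block */
	res = NDIR_BLOCKS;
	res += ppb;
	res += ppb * ppb;
	res += ppb * ppb * ppb;
	return res;
}

/* Same limit with the proposed "res -= 1" applied. */
uint64_t block_tree_blocks_minus_one(unsigned int blocksize_bits)
{
	return block_tree_blocks(blocksize_bits) - 1;
}
```

For 4 KiB blocks (blocksize_bits = 12) the block tree alone covers
1074791436 blocks, slightly over 4 TiB of data, so the proposed change gives
up a single block at the very top of that range.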

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: kernel BUG in zero_user_segments
  2025-05-01 11:19       ` Jan Kara
@ 2025-05-06  2:25         ` Zhang Yi
  2025-05-06 11:33           ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2025-05-06  2:25 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox
  Cc: Liebes Wang, ojaswin, Theodore Ts'o, linux-fsdevel, syzkaller,
	Ext4 Developers List

On 2025/5/1 19:19, Jan Kara wrote:
> On Wed 30-04-25 04:14:32, Matthew Wilcox wrote:
>> On Tue, Apr 29, 2025 at 03:55:18PM +0800, Zhang Yi wrote:
>>> After debugging, I found that this problem is caused by punching a hole
>>> with an offset variable larger than max_end on a corrupted ext4 inode,
>>> whose i_size is larger than maxbyte. It will result in a negative length
>>> in the truncate_inode_partial_folio(), which will trigger this problem.
>>
>> It seems to me like we're asking for trouble when we allow an inode with
>> an i_size larger than max_end to be instantiated.  There are probably
>> other places which assume it is smaller than max_end.  We should probably
>> decline to create the bad inode in the first place?
> 
> Indeed somewhat less quirky fix could be to make ext4_max_bitmap_size()
> return one block smaller limit. Something like:
> 
>         /* Compute how many blocks we can address by block tree */
>         res += ppb;
>         res += ppb * ppb;
>         res += ((loff_t)ppb) * ppb * ppb;
> +	/*
> +	 * Hole punching assumes it can map the block past end of hole to
> +	 * tree offsets
> +	 */
> +	res -= 1;
>         /* Compute how many metadata blocks are needed */
>         meta_blocks = 1;
>         meta_blocks += 1 + ppb;
> 
> The slight caveat is that in theory there could be filesystems out there
> with so large files and then we'd stop allowing access to such files. But I
> guess the chances are so low that it's probably worth trying.
> 

Hmm, I suppose this approach could pose some risks to our legacy products,
and it makes me feel uneasy. Personally, I am more inclined toward the
current solution, unless we decide to fix ext4_ind_remove_space()
directly. :)

Thanks,
Yi.




* Re: kernel BUG in zero_user_segments
  2025-05-06  2:25         ` Zhang Yi
@ 2025-05-06 11:33           ` Jan Kara
  2025-05-06 12:12             ` Zhang Yi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2025-05-06 11:33 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, Matthew Wilcox, Liebes Wang, ojaswin, Theodore Ts'o,
	linux-fsdevel, syzkaller, Ext4 Developers List

On Tue 06-05-25 10:25:06, Zhang Yi wrote:
> On 2025/5/1 19:19, Jan Kara wrote:
> > On Wed 30-04-25 04:14:32, Matthew Wilcox wrote:
> >> On Tue, Apr 29, 2025 at 03:55:18PM +0800, Zhang Yi wrote:
> >>> After debugging, I found that this problem is caused by punching a hole
> >>> with an offset variable larger than max_end on a corrupted ext4 inode,
> >>> whose i_size is larger than maxbyte. It will result in a negative length
> >>> in the truncate_inode_partial_folio(), which will trigger this problem.
> >>
> >> It seems to me like we're asking for trouble when we allow an inode with
> >> an i_size larger than max_end to be instantiated.  There are probably
> >> other places which assume it is smaller than max_end.  We should probably
> >> decline to create the bad inode in the first place?
> > 
> > Indeed somewhat less quirky fix could be to make ext4_max_bitmap_size()
> > return one block smaller limit. Something like:
> > 
> >         /* Compute how many blocks we can address by block tree */
> >         res += ppb;
> >         res += ppb * ppb;
> >         res += ((loff_t)ppb) * ppb * ppb;
> > +	/*
> > +	 * Hole punching assumes it can map the block past end of hole to
> > +	 * tree offsets
> > +	 */
> > +	res -= 1;
> >         /* Compute how many metadata blocks are needed */
> >         meta_blocks = 1;
> >         meta_blocks += 1 + ppb;
> > 
> > The slight caveat is that in theory there could be filesystems out there
> > with so large files and then we'd stop allowing access to such files. But I
> > guess the chances are so low that it's probably worth trying.
> > 
> 
> Hmm, I suppose this approach could pose some risks to our legacy products,
> and it makes me feel uneasy. Personally, I am more inclined toward the
> current solution, unless we decide to fix the ext4_ind_remove_space()
> directly. :)

OK. I'm just curious, are you using indirect-block based inodes and using
them up to the current s_bitmap_maxbytes size? :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: kernel BUG in zero_user_segments
  2025-05-06 11:33           ` Jan Kara
@ 2025-05-06 12:12             ` Zhang Yi
  0 siblings, 0 replies; 9+ messages in thread
From: Zhang Yi @ 2025-05-06 12:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Liebes Wang, ojaswin, Theodore Ts'o,
	linux-fsdevel, syzkaller, Ext4 Developers List

On 2025/5/6 19:33, Jan Kara wrote:
> On Tue 06-05-25 10:25:06, Zhang Yi wrote:
>> On 2025/5/1 19:19, Jan Kara wrote:
>>> On Wed 30-04-25 04:14:32, Matthew Wilcox wrote:
>>>> On Tue, Apr 29, 2025 at 03:55:18PM +0800, Zhang Yi wrote:
>>>>> After debugging, I found that this problem is caused by punching a hole
>>>>> with an offset variable larger than max_end on a corrupted ext4 inode,
>>>>> whose i_size is larger than maxbyte. It will result in a negative length
>>>>> in the truncate_inode_partial_folio(), which will trigger this problem.
>>>>
>>>> It seems to me like we're asking for trouble when we allow an inode with
>>>> an i_size larger than max_end to be instantiated.  There are probably
>>>> other places which assume it is smaller than max_end.  We should probably
>>>> decline to create the bad inode in the first place?
>>>
>>> Indeed somewhat less quirky fix could be to make ext4_max_bitmap_size()
>>> return one block smaller limit. Something like:
>>>
>>>         /* Compute how many blocks we can address by block tree */
>>>         res += ppb;
>>>         res += ppb * ppb;
>>>         res += ((loff_t)ppb) * ppb * ppb;
>>> +	/*
>>> +	 * Hole punching assumes it can map the block past end of hole to
>>> +	 * tree offsets
>>> +	 */
>>> +	res -= 1;
>>>         /* Compute how many metadata blocks are needed */
>>>         meta_blocks = 1;
>>>         meta_blocks += 1 + ppb;
>>>
>>> The slight caveat is that in theory there could be filesystems out there
>>> with so large files and then we'd stop allowing access to such files. But I
>>> guess the chances are so low that it's probably worth trying.
>>>
>>
>> Hmm, I suppose this approach could pose some risks to our legacy products,
>> and it makes me feel uneasy. Personally, I am more inclined toward the
>> current solution, unless we decide to fix the ext4_ind_remove_space()
>> directly. :)
> 
> OK. I'm just curious, are you using indirect-block based inodes and using
> them upto the current s_bitmap_maxbytes size? :)
> 

Yes, we have many legacy products that still use ext3 images, which utilize
indirect-block-based inodes. However, most of these are open scenarios, so
I'm not entirely sure about the size of the files that the products and
customers will store. Although it is unlikely that the s_bitmap_maxbytes
size will be reached, I cannot be 100% certain.

Best regards,
Yi.




Thread overview: 9+ messages
     [not found] <CADCV8spm=TtW_Lu6p-5q-jdHv1ryLcx45mNBEcYdELbHv_4TnQ@mail.gmail.com>
2025-04-28  8:14 ` kernel BUG in zero_user_segments Jan Kara
2025-04-28 12:55   ` Matthew Wilcox
2025-04-29  7:55   ` Zhang Yi
     [not found]     ` <CADCV8spSjWbnr_cUTzcB=zn0M92s_AhRx-byz0A8zZZa4cZ=Lg@mail.gmail.com>
2025-04-30  1:16       ` Zhang Yi
2025-04-30  3:14     ` Matthew Wilcox
2025-05-01 11:19       ` Jan Kara
2025-05-06  2:25         ` Zhang Yi
2025-05-06 11:33           ` Jan Kara
2025-05-06 12:12             ` Zhang Yi
