[BUG] kernel BUG in __ext4_journal

The Linux Kernel Mailing List

* [BUG] kernel BUG in __ext4_journal_stop
@ 2026-06-29  6:43 Xianying Wang
  2026-06-29  9:29 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Xianying Wang @ 2026-06-29  6:43 UTC (permalink / raw)
  To: tytso
  Cc: adilger.kernel, libaokun, jack, ojaswin, yi.zhang, linux-ext4,
	linux-kernel

Hi,

I would like to report a bug that has been reported before but can
still be triggered on Linux 7.1-rc5 by a syzkaller reproducer.

The issue is a kernel BUG in the ext4 inline-data write path. Before
the crash, ext4 reports corrupted block allocation metadata:

EXT4-fs error: ext4_mb_generate_buddy: group 0, block bitmap and bg
descriptor inconsistent

The crash happens while the reproducer is writing to an ext4 file
through sendfile64(). The write path reaches the ext4 buffered write
and inline-data write-end code, and then triggers a BUG when stopping
the journal handle.
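
For readers unfamiliar with this entry point, here is a minimal userspace
sketch of writing into a regular file through sendfile(2), the syscall
named above. It is not the syzkaller reproducer: the helper name
copy_with_sendfile and the paths are hypothetical, it assumes Linux (where
the output descriptor may be a regular file), and on a healthy file system
it simply copies the data.

```python
import os


def copy_with_sendfile(src_path, dst_path):
    """Copy src_path into dst_path with sendfile(2); return bytes written.

    Hypothetical helper for illustration only. Assumes Linux, where the
    output descriptor of sendfile(2) may refer to a regular file.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            # The kernel moves the data itself; the bytes are written to
            # dst by the destination file system's own write path.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:
                break  # unexpected end of input
            offset += sent
            remaining -= sent
            total += sent
    return total
```

Whether such a write goes through the inline-data code depends on the
destination file system (for example, whether the inline_data feature is
enabled and the file is small enough), which this sketch does not control.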

Based on the execution context, the issue appears to be related to the
interaction between corrupted ext4 block allocation metadata and the
inline-data buffered write path. After ext4 detects that the block
bitmap and block group descriptor are inconsistent, the sendfile64()
write still proceeds into ext4_write_inline_data_end(). During this
phase, ext4 needs to update inline-data/inode metadata and stop the
journal transaction. However, the journal handle or the inline-data
write state appears to be inconsistent, and __ext4_journal_stop()
eventually hits an internal BUG_ON().

So the suspected problem is that the ext4 error handling path after
detecting corrupted allocation metadata does not fully prevent the
inline-data write-end path from continuing with an invalid or
unexpected journal handle state. This results in a kernel BUG in
__ext4_journal_stop().

This can be reproduced on:

HEAD commit:

e7ae89a0c97ce2b68b0983cd01eda67cf373517d

report: https://pastebin.com/raw/1aWWc2Uj

console output : https://pastebin.com/raw/MS8YxkTn

kernel config : https://pastebin.com/raw/fUwrL2uz

C reproducer : https://pastebin.com/raw/HgPfLbKs

Let me know if you need more details or testing.

Best regards,

Xianying


* Re: [BUG] kernel BUG in __ext4_journal_stop
  2026-06-29  6:43 [BUG] kernel BUG in __ext4_journal_stop Xianying Wang
@ 2026-06-29  9:29 ` Jan Kara
  2026-06-29 13:07   ` Theodore Tso
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2026-06-29  9:29 UTC (permalink / raw)
  To: Xianying Wang
  Cc: tytso, adilger.kernel, libaokun, jack, ojaswin, yi.zhang,
	linux-ext4, linux-kernel

Hi!

On Mon 29-06-26 14:43:55, Xianying Wang wrote:
> I would like to report a bug that has been reported before but can
> still be triggered on Linux 7.1-rc5 by a syzkaller reproducer.
> 
> The issue is a kernel BUG in the ext4 inline-data write path. Before
> the crash, ext4 reports corrupted block allocation metadata:
> 
> EXT4-fs error: ext4_mb_generate_buddy: group 0, block bitmap and bg
> descriptor inconsistent
> 
> The crash happens while the reproducer is writing to an ext4 file
> through sendfile64(). The write path reaches the ext4 buffered write
> and inline-data write-end code, and then triggers a BUG when stopping
> the journal handle.

Thanks for the report, but frankly, we have no capacity to analyze every
fuzzing report somebody comes up with. We generally give higher priority to
syzbot-produced fuzzing results because syzbot provides an environment for
tracking reproducers, easy access to artifacts, etc., which significantly
speeds up analysis.

For example, in this case I couldn't even access the console log on
pastebin to check the exact BUG message.

								Honza

> Based on the execution context, the issue appears to be related to the
> interaction between corrupted ext4 block allocation metadata and the
> inline-data buffered write path. After ext4 detects that the block
> bitmap and block group descriptor are inconsistent, the sendfile64()
> write still proceeds into ext4_write_inline_data_end(). During this
> phase, ext4 needs to update inline-data/inode metadata and stop the
> journal transaction. However, the journal handle or the inline-data
> write state appears to be inconsistent, and __ext4_journal_stop()
> eventually hits an internal BUG_ON().
> 
> So the suspected problem is that the ext4 error handling path after
> detecting corrupted allocation metadata does not fully prevent the
> inline-data write-end path from continuing with an invalid or
> unexpected journal handle state. This results in a kernel BUG in
> __ext4_journal_stop().
> 
> This can be reproduced on:
> 
> HEAD commit:
> 
> e7ae89a0c97ce2b68b0983cd01eda67cf373517d
> 
> report: https://pastebin.com/raw/1aWWc2Uj
> 
> console output : https://pastebin.com/raw/MS8YxkTn
> 
> kernel config : https://pastebin.com/raw/fUwrL2uz
> 
> C reproducer : https://pastebin.com/raw/HgPfLbKs
> 
> Let me know if you need more details or testing.
> 
> Best regards,
> 
> Xianying
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [BUG] kernel BUG in __ext4_journal_stop
  2026-06-29  9:29 ` Jan Kara
@ 2026-06-29 13:07   ` Theodore Tso
  0 siblings, 0 replies; 3+ messages in thread
From: Theodore Tso @ 2026-06-29 13:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Xianying Wang, adilger.kernel, libaokun, ojaswin, yi.zhang,
	linux-ext4, linux-kernel

On Mon, Jun 29, 2026 at 11:29:51AM +0200, Jan Kara wrote:
> Thanks for the report, but frankly, we have no capacity to analyze every
> fuzzing report somebody comes up with. We generally give higher priority to
> syzbot-produced fuzzing results because syzbot provides an environment for
> tracking reproducers, easy access to artifacts, etc., which significantly
> speeds up analysis.

Xianying,

When we say "syzbot" we're referring to the upstream syzkaller instance,
which has a dashboard[1] that makes it significantly easier for us to
request rerunning the reproducer in the same environment as used by
the upstream syzkaller.  (Very often reproducers are very timing
dependent, so just because one triggers on *your* system doesn't
guarantee that it will trigger elsewhere.)

[1] https://syzkaller.appspot.com/upstream

Also, in particular, if the syzkaller (or modified syzkaller) report
involves a fuzzed, corrupted file system image, I personally significantly
down-prioritize investigating it, because we are getting flooded by
such reports, and I don't consider it a particularly interesting threat
model.  Users shouldn't be blindly mounting untrusted file systems without
running fsck on the image first.

This is similar to the concern of kernel developers being hit by a
denial of service from a flood of low-quality security reports
generated by LLMs[2].

[2] https://www.theverge.com/tech/932312/linus-torvalds-linux-ai-security-bugs
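
As an illustration of the "run fsck on the image first" advice above, here
is a minimal sketch of a read-only check before mounting. The helper names
e2fsck_command and image_is_clean are hypothetical; the sketch assumes
e2fsck from e2fsprogs is installed and relies only on its documented -f
(force a check) and -n (open read-only, answer "no" to every question)
options and on exit status 0 meaning no errors were found.

```python
import subprocess


def e2fsck_command(image_path):
    """Build a read-only e2fsck invocation for an ext2/3/4 image.

    -f forces a full check even if the file system is marked clean;
    -n opens it read-only and answers "no" to all repair questions.
    """
    return ["e2fsck", "-f", "-n", image_path]


def image_is_clean(image_path):
    """Return True only if e2fsck exits with status 0 (no errors found).

    Hypothetical helper; raises FileNotFoundError if e2fsck is missing.
    """
    result = subprocess.run(
        e2fsck_command(image_path), capture_output=True, text=True
    )
    return result.returncode == 0
```

A caller would mount the image only when image_is_clean() returns True.
This checks metadata consistency as e2fsck understands it and is not a
guarantee that the kernel will handle every remaining field safely.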

> For example, in this case I couldn't even access the console log on
> pastebin to check the exact BUG message.

netlink: 'syz.7.1096': attribute type 3 has an invalid length.
EXT4-fs error (device loop7): ext4_mb_generate_buddy:1314: group 0, block bitmap and bg descriptor inconsistent: 25 vs 150994969 free clusters
------------[ cut here ]------------
kernel BUG at fs/ext4/ext4_jbd2.c:53!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 13658 Comm: syz.7.1098 Not tainted 7.1.0-rc5 #2 PREEMPT(lazy)
Hardware name: QEMU Ubuntu 24.10 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__ext4_journal_stop+0x189/0x1c0
Code: e8 3c 5e a2 ff 48 89 ef e8 14 55 18 00 85 db 0f 44 d8 41 89 dc e9 6d ff ff ff e8 12 79 d5 ff e9 d0 fe ff ff e8 18 5e a2 ff 90 <0f> 0b 4c 89 e7 e8 2d 79 d5 ff e9 06 ff ff ff 48 89 ef e8 20 79 d5
RSP: 0018:ffff88810d3974d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000009 RCX: ffffffff88486688
RDX: ffff8880164cc600 RSI: 000000000000035c RDI: ffffffff8af2ad80
RBP: 0000000000000000 R08: ffff88800e96f9d8 R09: ffffed1001d2df48
R10: ffffed1001d2df47 R11: ffff88800e96fa3b R12: ffffea00044dcb40
R13: ffffffff8af2ad80 R14: 000000000000035c R15: ffff88800e9045c0
FS:  00007f4bb3d37640(0000) GS:ffff888187bc7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4bb532e6e0 CR3: 000000010328d000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
<TASK>
ext4_write_inline_data_end+0x3b3/0xa30
ext4_da_write_end+0x3a3/0xa70 inode.c:-1
generic_perform_write+0x215/0x730
ext4_buffered_write_iter+0x194/0x500 file.c:-1
ext4_file_write_iter+0x4b5/0x1400 file.c:-1
iter_file_splice_write+0x8dc/0xfa0
direct_splice_actor+0x181/0x5c0 splice.c:-1
splice_direct_to_actor+0x335/0x920
do_splice_direct_actor+0x168/0x230 splice.c:-1
do_splice_direct+0x41/0x60
do_sendfile+0x9c4/0xd30 read_write.c:-1
__x64_sys_sendfile64+0x195/0x1d0
do_syscall_64+0x104/0x5b0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Note the "-1" for the line number.  If you want to reduce the amount
of time that you are wasting, try to fix your build setup so that the
line numbers are correctly reported in the kernel stack trace.
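
A reporter could catch this before sending. The following is a minimal
sketch that scans a call trace for frames whose symbolized line number is
not positive. The frame layout it expects (symbol+0xoff/0xsize optionally
followed by file:line) is inferred from the trace above, and the names
FRAME_RE and frames_with_bad_lines are hypothetical.

```python
import re

# Matches e.g. "ext4_da_write_end+0x3a3/0xa70 inode.c:-1"; the trailing
# "file:line" part is optional because some frames carry no location.
FRAME_RE = re.compile(
    r"^\s*(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+(?P<file>\S+):(?P<line>-?\d+))?\s*$"
)


def frames_with_bad_lines(trace_text):
    """Return (symbol, file) pairs whose reported line number is < 1."""
    bad = []
    for raw in trace_text.splitlines():
        match = FRAME_RE.match(raw)
        if match is None or match.group("line") is None:
            continue  # not a frame, or a frame without file:line info
        if int(match.group("line")) < 1:
            bad.append((match.group("sym"), match.group("file")))
    return bad
```

Run over the trace above, this would list frames such as
ext4_da_write_end (inode.c) and do_sendfile (read_write.c), a hint that
the symbolization step of the build or report pipeline needs fixing.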

Or better yet, send us a well-formed kernel patch following the
established patch submission protocols[3].

[3] https://docs.kernel.org/process/submitting-patches.html
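
A side observation on the EXT4-fs error line in the console log above,
offered as a sketch and not as a confirmed diagnosis: the second number,
150994969, is 0x9000019, and its low 16 bits equal the first number, 25.
ext4 stores a group's free cluster count as a 16-bit low field plus, with
64-bit descriptors, a 16-bit high field. So one possible reading (an
assumption) is that only the high half of the descriptor's count was
corrupted by the fuzzed image. The helper name split_free_clusters is
hypothetical.

```python
def split_free_clusters(value):
    """Split a 32-bit count into (low 16 bits, high 16 bits)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


# Second number from the "25 vs 150994969 free clusters" message.
lo, hi = split_free_clusters(150994969)
print(hex(150994969), lo, hex(hi))  # prints "0x9000019 25 0x900"
```

This is arithmetic on the logged values only; it says nothing about why
__ext4_journal_stop() later hits the BUG.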

Regards,

						- Ted

