[syzbot] [ext4?] general protection fault in hrtimer

* [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
@ 2023-11-01  5:36 syzbot
  2023-11-01 12:58 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2023-11-01  5:36 UTC (permalink / raw)
  To: adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tglx, tytso

Hello,

syzbot found the following issue on:

HEAD commit:    888cf78c29e2 Merge tag 'iommu-fix-v6.6-rc7' of git://git.k..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10339673680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7d1f30869bb78ec6
dashboard link: https://syzkaller.appspot.com/bug?extid=b408cd9b40ec25380ee1
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=165bbce3680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/2e776d64243c/disk-888cf78c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9ce776a2bcfc/vmlinux-888cf78c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/86a6c193c013/bzImage-888cf78c.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/8021bba287f0/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b408cd9b40ec25380ee1@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x00000001ffff8898-0x00000001ffff889f]
CPU: 1 PID: 5308 Comm: syz-executor.4 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012
RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
FS:  0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 hrtimer_init_sleeper_on_stack kernel/time/hrtimer.c:447 [inline]
 hrtimer_nanosleep+0x122/0x440 kernel/time/hrtimer.c:2098
 common_nsleep+0xa1/0xc0 kernel/time/posix-timers.c:1350
 __do_sys_clock_nanosleep kernel/time/posix-timers.c:1396 [inline]
 __se_sys_clock_nanosleep kernel/time/posix-timers.c:1373 [inline]
 __x64_sys_clock_nanosleep+0x344/0x490 kernel/time/posix-timers.c:1373
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff1a56a7ef5
Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 f6 b9 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 4f ba ff ff 48 8b 04 24 48 83 c4 28 f7 d8
RSP: 002b:00007ffe80c6ee30 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
RAX: ffffffffffffffda RBX: 00007ff1a579bf80 RCX: 00007ff1a56a7ef5
RDX: 00007ffe80c6ee70 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007ff1a579d980 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000fef3
R13: ffffffffffffffff R14: 00007ff1a5200000 R15: 000000000000fbb2
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012

RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
FS:  0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0
----------------
Code disassembly (best guess):
   0:	d8 48 c1             	fmuls  -0x3f(%rax)
   3:	e8 03 42 80 3c       	call   0x3c80420b
   8:	20 00                	and    %al,(%rax)
   a:	0f 85 85 01 00 00    	jne    0x195
  10:	48 8b 1b             	mov    (%rbx),%rbx
  13:	48 85 db             	test   %rbx,%rbx
  16:	0f 84 9f 00 00 00    	je     0xbb
  1c:	48 8d 7b 18          	lea    0x18(%rbx),%rdi
  20:	83 c5 01             	add    $0x1,%ebp
  23:	48 89 f8             	mov    %rdi,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 20 00       	cmpb   $0x0,(%rax,%r12,1) <-- trapping instruction
  2f:	0f 85 4c 01 00 00    	jne    0x181
  35:	4c 3b 73 18          	cmp    0x18(%rbx),%r14
  39:	75 c3                	jne    0xfffffffe
  3b:	48 8d 7b 10          	lea    0x10(%rbx),%rdi
  3f:	48                   	rex.W


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

* Re: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-01  5:36 [syzbot] [ext4?] general protection fault in hrtimer_nanosleep syzbot
@ 2023-11-01 12:58 ` Thomas Gleixner
  2023-11-02 12:08   ` Aleksandr Nogikh
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2023-11-01 12:58 UTC (permalink / raw)
  To: syzbot, adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

On Tue, Oct 31 2023 at 22:36, syzbot wrote:
> general protection fault, probably for non-canonical address 0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range [0x00000001ffff8898-0x00000001ffff889f]
> CPU: 1 PID: 5308 Comm: syz-executor.4 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
> RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
> RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
> RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
> Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
> RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012
> RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
> RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
> RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
> R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
> FS:  0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0

So this dies in debugobjects::lookup_object()

hlist_for_each_entry()

>   10:	48 8b 1b             	mov    (%rbx),%rbx

Gets the next entry

>   13:	48 85 db             	test   %rbx,%rbx
>   16:	0f 84 9f 00 00 00    	je     0xbb

Checks for the termination condition (NULL pointer)

>   1c:	48 8d 7b 18          	lea    0x18(%rbx),%rdi

Calculates the address of obj->object

>   20:	83 c5 01             	add    $0x1,%ebp

cnt++;

>   23:	48 89 f8             	mov    %rdi,%rax
>   26:	48 c1 e8 03          	shr    $0x3,%rax

KASAN shadow address calculation

> * 2a:	42 80 3c 20 00       	cmpb   $0x0,(%rax,%r12,1) <-- trapping instruction

Kasan accesses 0xdffffc003ffff113 and dies.
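
The faulting address can be re-derived from the register dump. The
following is a minimal user-space sketch, assuming the generic KASAN
mapping of one shadow byte per 8 bytes of memory with the shadow offset
taken from R12 in the dump, and assuming 48-bit virtual addresses
(4-level paging). The helper and macro names are made up for
illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the report. */
#define REPORT_R12	0xdffffc0000000000ULL	/* KASAN shadow offset (assumed) */
#define REPORT_RBX	0x00000001ffff8880ULL	/* bogus "next" list entry */
#define REPORT_RDI	0x00000001ffff8898ULL	/* address of obj->object */
#define REPORT_FAULT	0xdffffc003ffff113ULL	/* address named in the GPF line */

/* Hypothetical helper: generic KASAN mapping, shadow = (addr >> 3) + offset. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> 3) + REPORT_R12;
}

/*
 * Hypothetical helper: with 48-bit virtual addresses, an address is
 * canonical only if bits 63..47 are all zero or all one.
 */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Replays the arithmetic of the disassembly; aborts if it disagrees. */
void check_against_report(void)
{
	uint64_t field = REPORT_RBX + 0x18;	/* lea 0x18(%rbx),%rdi */
	uint64_t shadow = shadow_addr(field);	/* shr $0x3,%rax; (%rax,%r12,1) */

	assert(field == REPORT_RDI);
	assert((field >> 3) == 0x3ffff113ULL);	/* RAX in the dump */
	assert(shadow == REPORT_FAULT);
	assert(!is_canonical_48(shadow));	/* hence #GP rather than #PF */
}
```

Under these assumptions the numbers line up with the dump: RBX + 0x18
gives RDI, RDI >> 3 gives RAX, and adding R12 gives the non-canonical
address from the first line of the report.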

RBX contains the pointer to the next object: 0x00000001ffff8880 which is
clearly a user space address, but I have no idea where that might come
from. It's obviously data corruption of unknown provenience.
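
For readers who do not have lib/debugobjects.c at hand, the loop that
the disassembly corresponds to can be sketched in user space as below.
This is a simplified stand-in, not the kernel source: the struct layout
is an assumption chosen to match the 0x18 offset seen in the
disassembly, and lookup_object_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's hlist node. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

enum debug_obj_state { ODEBUG_STATE_NONE, ODEBUG_STATE_INIT };

/* Assumed layout; only the fields needed for the walk are modelled. */
struct debug_obj {
	struct hlist_node node;		/* offset 0x00 */
	enum debug_obj_state state;	/* offset 0x10 */
	unsigned int astate;		/* offset 0x14 */
	void *object;			/* offset 0x18 on LP64 */
};

/* The lea 0x18(%rbx),%rdi in the disassembly relies on this offset. */
static_assert(offsetof(struct debug_obj, object) == 0x18,
	      "sketch assumes an LP64 layout");

/*
 * Walk one hash bucket until an entry tracks @addr. A corrupted
 * ->next pointer is followed blindly, which is how a user-space
 * value in RBX ends up being dereferenced.
 */
static struct debug_obj *lookup_object_sketch(struct hlist_node *first,
					      void *addr, int *cnt)
{
	struct hlist_node *pos;

	for (pos = first; pos; pos = pos->next) {	/* mov (%rbx),%rbx; test; je */
		/* node is the first member, so the cast is valid */
		struct debug_obj *obj = (struct debug_obj *)pos;

		(*cnt)++;				/* add $0x1,%ebp */
		if (obj->object == addr)		/* cmp 0x18(%rbx),%r14 */
			return obj;
	}
	return NULL;
}
```

Nothing in such a walk validates the next pointer, so whatever
overwrote the list entry is only noticed when KASAN checks the shadow
byte for obj->object.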

Unfortunately repro.syz does not hold up to its name and refuses to
reproduce.

Thanks,

        tglx


* Re: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-01 12:58 ` Thomas Gleixner
@ 2023-11-02 12:08   ` Aleksandr Nogikh
  2023-11-02 15:57     ` Thomas Gleixner
  2023-11-03 11:17     ` AW: " carsten.schmid
  0 siblings, 2 replies; 7+ messages in thread
From: Aleksandr Nogikh @ 2023-11-02 12:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: syzbot, adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

On Wed, Nov 1, 2023 at 1:58 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Oct 31 2023 at 22:36, syzbot wrote:
> > general protection fault, probably for non-canonical address 0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: probably user-memory-access in range [0x00000001ffff8898-0x00000001ffff889f]
> > CPU: 1 PID: 5308 Comm: syz-executor.4 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
> > RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
> > RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
> > RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
> > Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
> > RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012
> > RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
> > RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
> > RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
> > R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
> > R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
> > FS:  0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0
>
> So this dies in debugobjects::lookup_object()
>
> hlist_for_each_entry()
>
> >   10: 48 8b 1b                mov    (%rbx),%rbx
>
> Gets the next entry
>
> >   13: 48 85 db                test   %rbx,%rbx
> >   16: 0f 84 9f 00 00 00       je     0xbb
>
> Checks for the termination condition (NULL pointer)
>
> >   1c: 48 8d 7b 18             lea    0x18(%rbx),%rdi
>
> Calculates the address of obj->object
>
> >   20: 83 c5 01                add    $0x1,%ebp
>
> cnt++;
>
> >   23: 48 89 f8                mov    %rdi,%rax
> >   26: 48 c1 e8 03             shr    $0x3,%rax
>
> KASAN shadow address calculation
>
> > * 2a: 42 80 3c 20 00          cmpb   $0x0,(%rax,%r12,1) <-- trapping instruction
>
> Kasan accesses 0xdffffc003ffff113 and dies.
>
> RBX contains the pointer to the next object: 0x00000001ffff8880 which is
> clearly a user space address, but I have no idea where that might come
> from. It's obviously data corruption of unknown provenience.
>
> Unfortunately repro.syz does not hold up to its name and refuses to
> reproduce.

For me, on a locally built kernel (gcc 13.2.0) it didn't work either.

But, interestingly, it does reproduce using the syzbot-built kernel
shared via the "Downloadable assets" [1] in the original report. The
repro crashed the kernel in ~1 minute.

[1] https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md

[  125.919060][    C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130
[  125.921169][    C0] Read of size 8 at addr ffffc900048e7c60 by task kworker/0:1/9
[  125.923235][    C0]
[  125.923243][    C0] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
[  125.924546][    C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  125.926915][    C0] Workqueue: events nsim_dev_trap_report_work
[  125.929333][    C0]
[  125.929341][    C0] Call Trace:
[  125.929350][    C0]  <IRQ>
[  125.929356][    C0]  dump_stack_lvl+0xd9/0x1b0
[  125.931302][    C0]  print_report+0xc4/0x620
[  125.932115][    C0]  ? __virt_addr_valid+0x5e/0x2d0
[  125.933194][    C0]  kasan_report+0xda/0x110
[  125.934814][    C0]  ? rb_next+0x10a/0x130
[  125.936521][    C0]  ? rb_next+0x10a/0x130
[  125.936544][    C0]  rb_next+0x10a/0x130
[  125.936565][    C0]  timerqueue_del+0xd4/0x140
[  125.936590][    C0]  __remove_hrtimer+0x99/0x290
[  125.936613][    C0]  __hrtimer_run_queues+0x55b/0xc10
[  125.936638][    C0]  ? enqueue_hrtimer+0x310/0x310
[  125.936659][    C0]  ? ktime_get_update_offsets_now+0x3bc/0x610
[  125.936688][    C0]  hrtimer_interrupt+0x31b/0x800
[  125.936715][    C0]  __sysvec_apic_timer_interrupt+0x105/0x3f0
[  125.936737][    C0]  sysvec_apic_timer_interrupt+0x8e/0xc0
[  125.936755][    C0]  </IRQ>
[  125.936759][    C0]  <TASK>



>
> Thanks,
>
>         tglx

* Re: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-02 12:08   ` Aleksandr Nogikh
@ 2023-11-02 15:57     ` Thomas Gleixner
  2023-11-10  5:00       ` Aleksandr Nogikh
  2023-11-03 11:17     ` AW: " carsten.schmid
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2023-11-02 15:57 UTC (permalink / raw)
  To: Aleksandr Nogikh
  Cc: syzbot, adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

On Thu, Nov 02 2023 at 13:08, Aleksandr Nogikh wrote:
> On Wed, Nov 1, 2023 at 1:58 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Unfortunately repro.syz does not hold up to its name and refuses to
>> reproduce.
>
> For me, on a locally built kernel (gcc 13.2.0) it didn't work either.
>
> But, interestingly, it does reproduce using the syzbot-built kernel
> shared via the "Downloadable assets" [1] in the original report. The
> repro crashed the kernel in ~1 minute.
>
> [1] https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md
>
> [  125.919060][    C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130
> [  125.921169][    C0] Read of size 8 at addr ffffc900048e7c60 by task kworker/0:1/9
> [  125.923235][    C0]
> [  125.923243][    C0] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> [  125.924546][    C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [  125.926915][    C0] Workqueue: events nsim_dev_trap_report_work
> [  125.929333][    C0]
> [  125.929341][    C0] Call Trace:
> [  125.929350][    C0]  <IRQ>
> [  125.929356][    C0]  dump_stack_lvl+0xd9/0x1b0
> [  125.931302][    C0]  print_report+0xc4/0x620
> [  125.932115][    C0]  ? __virt_addr_valid+0x5e/0x2d0
> [  125.933194][    C0]  kasan_report+0xda/0x110
> [  125.934814][    C0]  ? rb_next+0x10a/0x130
> [  125.936521][    C0]  ? rb_next+0x10a/0x130
> [  125.936544][    C0]  rb_next+0x10a/0x130
> [  125.936565][    C0]  timerqueue_del+0xd4/0x140
> [  125.936590][    C0]  __remove_hrtimer+0x99/0x290
> [  125.936613][    C0]  __hrtimer_run_queues+0x55b/0xc10
> [  125.936638][    C0]  ? enqueue_hrtimer+0x310/0x310
> [  125.936659][    C0]  ? ktime_get_update_offsets_now+0x3bc/0x610
> [  125.936688][    C0]  hrtimer_interrupt+0x31b/0x800
> [  125.936715][    C0]  __sysvec_apic_timer_interrupt+0x105/0x3f0
> [  125.936737][    C0]  sysvec_apic_timer_interrupt+0x8e/0xc0
> [  125.936755][    C0]  </IRQ>
> [  125.936759][    C0]  <TASK>

Which is a completely different failure mode.

It explodes in the hrtimer interrupt when dequeuing an hrtimer for
expiry. That means the corresponding embedded rb_node is corrupted,
which points to random data corruption.

As you can reproduce (it still fails here with the provided assets),
does the failure change when you run it several times?

Thanks,

        tglx

* Re: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-02 15:57     ` Thomas Gleixner
@ 2023-11-10  5:00       ` Aleksandr Nogikh
  2023-11-10 17:08         ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Aleksandr Nogikh @ 2023-11-10  5:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: syzbot, adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

On Thu, Nov 2, 2023 at 8:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Nov 02 2023 at 13:08, Aleksandr Nogikh wrote:
> > On Wed, Nov 1, 2023 at 1:58 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Unfortunately repro.syz does not hold up to its name and refuses to
> >> reproduce.
> >
> > For me, on a locally built kernel (gcc 13.2.0) it didn't work either.
> >
> > But, interestingly, it does reproduce using the syzbot-built kernel
> > shared via the "Downloadable assets" [1] in the original report. The
> > repro crashed the kernel in ~1 minute.
> >
> > [1] https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md
> >
> > [  125.919060][    C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130
> > [  125.921169][    C0] Read of size 8 at addr ffffc900048e7c60 by task kworker/0:1/9
> > [  125.923235][    C0]
> > [  125.923243][    C0] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> > [  125.924546][    C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [  125.926915][    C0] Workqueue: events nsim_dev_trap_report_work
> > [  125.929333][    C0]
> > [  125.929341][    C0] Call Trace:
> > [  125.929350][    C0]  <IRQ>
> > [  125.929356][    C0]  dump_stack_lvl+0xd9/0x1b0
> > [  125.931302][    C0]  print_report+0xc4/0x620
> > [  125.932115][    C0]  ? __virt_addr_valid+0x5e/0x2d0
> > [  125.933194][    C0]  kasan_report+0xda/0x110
> > [  125.934814][    C0]  ? rb_next+0x10a/0x130
> > [  125.936521][    C0]  ? rb_next+0x10a/0x130
> > [  125.936544][    C0]  rb_next+0x10a/0x130
> > [  125.936565][    C0]  timerqueue_del+0xd4/0x140
> > [  125.936590][    C0]  __remove_hrtimer+0x99/0x290
> > [  125.936613][    C0]  __hrtimer_run_queues+0x55b/0xc10
> > [  125.936638][    C0]  ? enqueue_hrtimer+0x310/0x310
> > [  125.936659][    C0]  ? ktime_get_update_offsets_now+0x3bc/0x610
> > [  125.936688][    C0]  hrtimer_interrupt+0x31b/0x800
> > [  125.936715][    C0]  __sysvec_apic_timer_interrupt+0x105/0x3f0
> > [  125.936737][    C0]  sysvec_apic_timer_interrupt+0x8e/0xc0
> > [  125.936755][    C0]  </IRQ>
> > [  125.936759][    C0]  <TASK>
>
> Which is a completely different failure mode.
>
> It explodes in the hrtimer interrupt when dequeuing an hrtimer for
> expiry. That means the corresponding embedded rb_node is corrupted,
> which points to random data corruption.
>
> As you can reproduce (it still fails here with the provided assets),
> does the failure change when you run it several times?

Hmm, it's weird. Maybe I was very lucky that time.

The reproducer does work on the attached disk image, but definitely
not very often. I've just run it 10 times or so and got interleaved
BUG/KFENCE bug reports like this (twice):
https://pastebin.com/W0TkRsnw

These seem to be related to ext4 rather than hrtimers though.

-- 
Aleksandr

>
> Thanks,
>
>         tglx

* Re: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-10  5:00       ` Aleksandr Nogikh
@ 2023-11-10 17:08         ` Theodore Ts'o
  0 siblings, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2023-11-10 17:08 UTC (permalink / raw)
  To: Aleksandr Nogikh
  Cc: Thomas Gleixner, syzbot, adilger.kernel, linux-ext4,
	linux-fsdevel, linux-kernel, syzkaller-bugs

On Thu, Nov 09, 2023 at 09:00:18PM -0800, Aleksandr Nogikh wrote:
> 
> The reproducer does work on the attached disk image, but definitely
> not very often. I've just run it 10 times or so and got interleaved
> BUG/KFENCE bug reports like this (twice):
> https://pastebin.com/W0TkRsnw
> 
> These seem to be related to ext4 rather than hrtimers though.

So what would be nice is if there was a way to ask the syzkaller
tester to use a different config or to change the reproducer somehow
--- for example, is it *really* necessary to twiddle the bluetooth
subsystem, as demonstrated by the spew in the console?

I've certainly spent hours cutting down the reproducer to a simple C
program which is readable by humans, which makes it *clear* the syzbot
minimizer doesn't do a good job.  Why should a time-limited maintainer
spend hours trying to cut down the reproducer, when a robot should be
able to do that for us?  And since a bug often reproduces only via a
syzbot test, but not when run under KVM, we need a simple way to
trigger a test where things are as close as possible to whatever
syzbot is using.

Cheers,

						- Ted

* AW: [syzbot] [ext4?] general protection fault in hrtimer_nanosleep
  2023-11-02 12:08   ` Aleksandr Nogikh
  2023-11-02 15:57     ` Thomas Gleixner
@ 2023-11-03 11:17     ` carsten.schmid
  1 sibling, 0 replies; 7+ messages in thread
From: carsten.schmid @ 2023-11-03 11:17 UTC (permalink / raw)
  To: Aleksandr Nogikh, Thomas Gleixner
  Cc: syzbot, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	syzkaller-bugs@googlegroups.com, tytso@mit.edu

Hi,

> [  125.919060][    C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130
> [  125.921169][    C0] Read of size 8 at addr ffffc900048e7c60 by task kworker/0:1/9
> [  125.923235][    C0]
> [  125.923243][    C0] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> [  125.924546][    C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [  125.926915][    C0] Workqueue: events nsim_dev_trap_report_work
> [  125.929333][    C0]
> [  125.929341][    C0] Call Trace:
> [  125.929350][    C0]  <IRQ>
> [  125.929356][    C0]  dump_stack_lvl+0xd9/0x1b0
> [  125.931302][    C0]  print_report+0xc4/0x620
> [  125.932115][    C0]  ? __virt_addr_valid+0x5e/0x2d0
> [  125.933194][    C0]  kasan_report+0xda/0x110
> [  125.934814][    C0]  ? rb_next+0x10a/0x130
> [  125.936521][    C0]  ? rb_next+0x10a/0x130
> [  125.936544][    C0]  rb_next+0x10a/0x130
> [  125.936565][    C0]  timerqueue_del+0xd4/0x140
> [  125.936590][    C0]  __remove_hrtimer+0x99/0x290
> [  125.936613][    C0]  __hrtimer_run_queues+0x55b/0xc10
> [  125.936638][    C0]  ? enqueue_hrtimer+0x310/0x310
> [  125.936659][    C0]  ? ktime_get_update_offsets_now+0x3bc/0x610
> [  125.936688][    C0]  hrtimer_interrupt+0x31b/0x800
> [  125.936715][    C0]  __sysvec_apic_timer_interrupt+0x105/0x3f0
> [  125.936737][    C0]  sysvec_apic_timer_interrupt+0x8e/0xc0
> [  125.936755][    C0]  </IRQ>
> [  125.936759][    C0]  <TASK>

I have had sporadic, similar issues with 4.14 kernels (several patch levels: .147, .212, .247, .300) over the past 5 years, where the stack looked quite similar:

[  432.041880] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
[  432.048697] Modules linked in: intel_tfm_governor ecryptfs coretemp i2c_i801 sbi_apl snd_soc_skl sdw_cnl snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress snd_soc_skl_ipc xhci_pci xhci_hcd sdw_bus crc8 ahci snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core libahci snd_hda_core libata snd_pcm usbcore mei_me snd_timer scsi_mod usb_common snd mei soundcore fuse 8021q inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon spi_pxa2xx_platform pps_core
[  432.099672] CPU: 3 PID: 5729 Comm: dlt_segmented Tainted: G     U     O    4.14.244-apl #1
[  432.108909] task: 00000000504d2561 task.stack: 000000007d0046fd
[  432.115530] RIP: 0010:rb_erase_cached+0x31/0x3b0
[  432.120683] RSP: 0018:ffffa31d84f77d40 EFLAGS: 00010006
[  432.126517] RAX: 0000000000000001 RBX: ffffa31d84f77e30 RCX: 0000000000000000
[  432.134485] RDX: 0000000000000000 RSI: ffff9ed077c1bb10 RDI: ffffa31d84f77e30
[  432.142456] RBP: ffffa31d84f77d40 R08: ffffa31d84f77e30 R09: 0000a31d80a77c90
[  432.150426] R10: ffff9ed077c1bee0 R11: 0000000000000400 R12: ffff9ed077c1bb10
[  432.158394] R13: 0000000000000000 R14: ffff9ed077c1bac0 R15: 0000000000000000
[  432.166366] FS:  00007ff718cce700(0000) GS:ffff9ed077d80000(0000) knlGS:0000000000000000
[  432.175403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  432.181819] CR2: 00007ff7182ca3e4 CR3: 000000026175c000 CR4: 00000000003406a0
[  432.189790] Call Trace:
[  432.192526]  timerqueue_del+0x1d/0x40
[  432.196617]  __remove_hrtimer+0x37/0x70
[  432.200898]  hrtimer_try_to_cancel+0xa0/0x120
[  432.205769]  do_nanosleep+0xa9/0x180
[  432.209765]  ? kfree+0x169/0x180
[  432.213370]  hrtimer_nanosleep+0xbb/0x150
[  432.217849]  ? hrtimer_init+0x110/0x110
[  432.222134]  SyS_nanosleep+0x6d/0xa0
[  432.226126]  do_syscall_64+0x79/0x350
[  432.230218]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[  432.235861] RIP: 0033:0x7ff7199b7240
[  432.239850] RSP: 002b:00007ff718ccddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000023
[  432.248309] RAX: ffffffffffffffda RBX: 00007ff718ccde20 RCX: 00007ff7199b7240
[  432.256282] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ff718ccde20
[  432.264252] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  432.272222] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe333ec72e
[  432.280190] R13: 00007ffe333ec72f R14: 0000000000802000 R15: 00007ffe333ec730
[  432.288161] Code: 89 f8 4c 8b 4f 08 48 89 e5 4c 8b 57 10 74 0a 48 3b 7e 08 0f 84 a6 02 00 00 4d 85 d2 0f 84 28 02 00 00 4d 85 c9 0f 84 03 02 00 00 <49> 8b 51 10 4c 89 cf 4c 89 c8 48 85 d2 75 0b e9 65 02 00 00 48 
[  432.309346] RIP: rb_erase_cached+0x31/0x3b0 RSP: ffffa31d84f77d40

It looks like this is worth digging into.
Unfortunately, I was never able to reproduce it, and I still can't. So I can't help with the digging, but I wanted to point out that this does not seem to be tied to a specific kernel version.

Thanks
Carsten
>>
>> Thanks,
>>
>>         tglx
>>
