public inbox for linux-kernel@vger.kernel.org

* [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
@ 2026-02-17 20:55 syzbot
  2026-02-18  9:45 ` Niklas Cassel
  2026-02-19 22:44 ` Niklas Cassel
  0 siblings, 2 replies; 8+ messages in thread
From: syzbot @ 2026-02-17 20:55 UTC (permalink / raw)
  To: cassel, dlemoal, linux-ide, linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-ca4ee40b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c714adf37ddd/vmlinux-ca4ee40b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4d56cd9f6175/bzImage-ca4ee40b.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com

------------[ cut here ]------------
UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
CPU: 2 UID: 0 PID: 1282 Comm: kworker/2:1H Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events_highpri ata_scsi_deferred_qc_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
 __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
 ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
 ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
 process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
 process_scheduled_works kernel/workqueue.c:3358 [inline]
 worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
 kthread+0x370/0x450 kernel/kthread.c:467
 ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
---[ end trace ]---
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 2 UID: 0 PID: 1282 Comm: kworker/2:1H Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events_highpri ata_scsi_deferred_qc_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 vpanic+0x552/0x970 kernel/panic.c:650
 panic+0xd1/0xe0 kernel/panic.c:787
 check_panic_on_warn kernel/panic.c:524 [inline]
 check_panic_on_warn.cold+0x19/0x34 kernel/panic.c:519
 __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
 ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
 ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
 process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
 process_scheduled_works kernel/workqueue.c:3358 [inline]
 worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
 kthread+0x370/0x450 kernel/kthread.c:467
 ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-17 20:55 [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue syzbot
@ 2026-02-18  9:45 ` Niklas Cassel
  2026-02-19  1:33   ` Damien Le Moal
  2026-02-19 22:44 ` Niklas Cassel
  1 sibling, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2026-02-18  9:45 UTC (permalink / raw)
  To: syzbot; +Cc: dlemoal, linux-ide, linux-kernel, syzkaller-bugs

On Tue, Feb 17, 2026 at 12:55:35PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
> dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-ca4ee40b.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/c714adf37ddd/vmlinux-ca4ee40b.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/4d56cd9f6175/bzImage-ca4ee40b.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'

4210818301 is 0xfafbfcfd

0xfafbfcfd is ATA_TAG_POISON.

ATA_TAG_POISON is set by ata_qc_free(), so it appears that
ata_scsi_deferred_qc_work() is trying to issue a QC that has
already been freed.
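
The arithmetic above can be checked with a small user-space C sketch. This
is not kernel code: the macro value is simply copied from this mail, and
tag_is_poison() and safe_tag_bit() are hypothetical helper names used only
for illustration. It shows that 4210818301 and 0xfafbfcfd are the same
value, and why using that value as a shift count on a 64-bit type is
undefined (the count must be below 64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in this thread; the kernel has its own definition. */
#define ATA_TAG_POISON 0xfafbfcfdU

/* Compile-time check that the decimal and hex forms are the same number. */
static_assert(ATA_TAG_POISON == 4210818301U, "decimal and hex forms match");

/* Hypothetical helper: true if the tag carries the poison pattern. */
static bool tag_is_poison(unsigned int tag)
{
	return tag == ATA_TAG_POISON;
}

/*
 * Hypothetical helper: return the bit for a tag, or 0 if the shift count
 * would be out of range. Shifting a 64-bit value by 64 or more is undefined
 * behaviour in C, which is the condition UBSAN reported.
 */
static uint64_t safe_tag_bit(unsigned int tag)
{
	if (tag >= 64)
		return 0;
	return (uint64_t)1 << tag;
}
```

Calling safe_tag_bit(ATA_TAG_POISON) takes the guarded path instead of
performing the out-of-range shift.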


Kind regards,
Niklas


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-18  9:45 ` Niklas Cassel
@ 2026-02-19  1:33   ` Damien Le Moal
  2026-02-20  0:55     ` Niklas Cassel
  0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2026-02-19  1:33 UTC (permalink / raw)
  To: Niklas Cassel, syzbot; +Cc: linux-ide, linux-kernel, syzkaller-bugs

On 2/18/26 6:45 PM, Niklas Cassel wrote:
> On Tue, Feb 17, 2026 at 12:55:35PM -0800, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
>> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-ca4ee40b.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/c714adf37ddd/vmlinux-ca4ee40b.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/4d56cd9f6175/bzImage-ca4ee40b.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
>>
>> ------------[ cut here ]------------
>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
> 
> 4210818301 is 0xfafbfcfd
> 
> 0xfafbfcfd is ATA_TAG_POISON.
> 
> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
> ata_scsi_deferred_qc_work() is trying to issue a QC that has
> already been freed.

I checked the code but I fail to see any path that can lead to this happening.
I did more tests using qemu q35 machine as used by syzbot, and everything looks
fine. So not sure what is happening here. I will dig further.

-- 
Damien Le Moal
Western Digital Research


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-17 20:55 [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue syzbot
  2026-02-18  9:45 ` Niklas Cassel
@ 2026-02-19 22:44 ` Niklas Cassel
  1 sibling, 0 replies; 8+ messages in thread
From: Niklas Cassel @ 2026-02-19 22:44 UTC (permalink / raw)
  To: syzbot, syzkaller; +Cc: dlemoal, linux-ide, linux-kernel, syzkaller-bugs

Hello syzkaller folks,

syzkaller seems to have found a bug that it can reproduce very easily.

Looking at the dashboard for this bug:
https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff

It has so far been reproduced 4 times in 3 days.

However, there is no reproducer yet.

Any advice on how we can try to trigger this without an exact reproducer
available yet?


Kind regards,
Niklas


On Tue, Feb 17, 2026 at 12:55:35PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
> dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
> compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> 
> Unfortunately, I don't have any reproducer for this issue yet.


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-19  1:33   ` Damien Le Moal
@ 2026-02-20  0:55     ` Niklas Cassel
  2026-02-20  1:06       ` Damien Le Moal
  0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2026-02-20  0:55 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: syzbot, linux-ide, linux-kernel, syzkaller-bugs

On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
> >> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> >> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
> > 
> > 4210818301 is 0xfafbfcfd
> > 
> > 0xfafbfcfd is ATA_TAG_POISON.
> > 
> > ATA_TAG_POISON is set by ata_qc_free(), so it appears that
> > ata_scsi_deferred_qc_work() is trying to issue a QC that has
> > already been freed.
> 
> I checked the code but I fail to see any path that can lead to this happening.
> I did more tests using qemu q35 machine as used by syzbot, and everything looks
> fine. So not sure what is happening here. I will dig further.

Hello Damien,

My best guess:
since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
on ap->deferred_qc.

If it was an NCQ abort, ata_eh_set_pending() would have been called to
clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
appears that we did not get an error IRQ.

To me, that leaves a timeout as the most likely scenario.

I.e. SCSI EH is called without ata_eh_set_pending() having been called.
(Currently ata_eh_set_pending() is the function that clears
ap->deferred_qc)

If I look at ata_scsi_cmd_error_handler(), its loop will only break if:

if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)

If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
(because ATA_QCFLAG_ACTIVE is only set by qc_issue()).

Since ATA_QCFLAG_ACTIVE is not set, the loop ends with i == ATA_MAX_QUEUE, so
we will enter the else clause which calls:
scsi_eh_finish_cmd(scmd, &ap->eh_done_q);

That might potentially free the tag to the block layer to reuse,
while ap->deferred_qc is still set (with the same tag).

Possibly, the next time ata_scsi_qc_issue() is called, ap->deferred_qc is still
set, so it calls ata_qc_free(qc). Since ap->deferred_qc was never cleared, that
qc might have the same tag as ap->deferred_qc, because the block layer has now
reused the tag (since SCSI completed the command).

I would possibly have expected some kind of print from SCSI in this case.
(But since the else clause finishes the command normally, perhaps not?)

But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
which clears ap->deferred_qc.
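
To make the lookup failure concrete, here is a toy user-space model of the
reasoning above. It is not the libata code: the struct layouts, the flag
value, the queue depth and the names find_active_qc() and
handle_timed_out_cmd() are all assumptions made for illustration. It models
a deferred QC that was never issued (so the ACTIVE flag is clear), shows the
search falling through to the "not found" case, and sketches the kind of
clearing of the deferred pointer suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUE	32		/* assumed queue depth */
#define SKETCH_QCFLAG_ACTIVE	(1u << 0)	/* stands in for ATA_QCFLAG_ACTIVE */

struct sketch_qc {
	unsigned int flags;
	const void *scsicmd;		/* SCSI command this QC belongs to */
};

struct sketch_port {
	struct sketch_qc qcmd[SKETCH_MAX_QUEUE];
	struct sketch_qc *deferred_qc;	/* QC waiting to be issued, if any */
};

/* Return the index of the active QC owning scmd, or SKETCH_MAX_QUEUE. */
static int find_active_qc(struct sketch_port *ap, const void *scmd)
{
	int i;

	assert(ap != NULL);
	for (i = 0; i < SKETCH_MAX_QUEUE; i++) {
		struct sketch_qc *qc = &ap->qcmd[i];

		if ((qc->flags & SKETCH_QCFLAG_ACTIVE) && qc->scsicmd == scmd)
			break;
	}
	return i;
}

/*
 * Model of handling one timed-out command. Returns true if an active QC
 * was found. In the "not found" case the model also drops a stale deferred
 * pointer that still refers to this command.
 */
static bool handle_timed_out_cmd(struct sketch_port *ap, const void *scmd)
{
	if (find_active_qc(ap, scmd) < SKETCH_MAX_QUEUE)
		return true;

	if (ap->deferred_qc && ap->deferred_qc->scsicmd == scmd)
		ap->deferred_qc = NULL;
	return false;
}
```

In this model, a QC that is only deferred is never matched by the search, so
without the extra clearing step the port would keep pointing at a QC whose
command has already been finished.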

Another possibility... again, timed out commands will not have called
ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
which will queue delayed work, and the worker function scmd_eh_abort_handler()
will call scsi_eh_scmd_add(), which calls
scsi_host_set_state(shost, SHOST_RECOVERY).

We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
issue non-internal commands once EH is pending"), so that we will defer commands
even when EH is pending. But in the case of timeout, there will be no error IRQ,
so we will not do an early return in __ata_scsi_queuecmd(), so we could set
ap->deferred_qc up until the worker function scmd_eh_abort_handler() has called
scsi_host_set_state(shost, SHOST_RECOVERY).

Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
should handle this case.
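
The window described above can be illustrated with a very small state
model. Again, this is only a sketch under stated assumptions: the enum, the
struct and the functions sketch_queuecmd() and sketch_abort_handler() are
hypothetical and do not correspond to real kernel symbols. The model assumes
that a command is refused once the host is in recovery or EH is pending, and
that nothing else stops it from being stored as the deferred command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_host_state { SKETCH_RUNNING, SKETCH_RECOVERY };

struct sketch_host {
	enum sketch_host_state state;
	bool eh_pending;	/* would be set by an error IRQ in this model */
	bool deferred_set;	/* stands in for ap->deferred_qc != NULL */
};

/* Model of queueing: returns true if the command was stored as deferred. */
static bool sketch_queuecmd(struct sketch_host *h)
{
	assert(h != NULL);
	if (h->state == SKETCH_RECOVERY || h->eh_pending)
		return false;	/* early return, nothing is deferred */
	h->deferred_set = true;
	return true;
}

/* Model of the abort worker: from here on the host is in recovery. */
static void sketch_abort_handler(struct sketch_host *h)
{
	assert(h != NULL);
	h->state = SKETCH_RECOVERY;
}
```

With a timeout there is no error IRQ, so eh_pending stays false and
sketch_queuecmd() keeps accepting commands until sketch_abort_handler() has
run. That gap is the window in question.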

I would probably hack some QEMU to not send a reply, so that we will get block
layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
most likely problematic code to me.

Kind regards,
Niklas


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-20  0:55     ` Niklas Cassel
@ 2026-02-20  1:06       ` Damien Le Moal
  2026-02-20  9:17         ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2026-02-20  1:06 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: syzbot, linux-ide, linux-kernel, syzkaller-bugs

On 2/20/26 09:55, Niklas Cassel wrote:
> On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>
>>> 4210818301 is 0xfafbfcfd
>>>
>>> 0xfafbfcfd is ATA_TAG_POISON.
>>>
>>> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
>>> ata_scsi_deferred_qc_work() is trying to issue a QC that has
>>> already been freed.
>>
>> I checked the code but I fail to see any path that can lead to this happening.
>> I did more tests using qemu q35 machine as used by syzbot, and everything looks
>> fine. So not sure what is happening here. I will dig further.
> 
> Hello Damien,
> 
> 
> My best guess:
> since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
> on ap->deferred_qc.
> 
> If it was an NCQ abort, ata_eh_set_pending() would have been called to
> clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
> appears that we did not get an error IRQ.
> 
> To me, that leaves a timeout as the most likely scenario.

Good point. I think the timeout case was completely overlooked...
That should be fairly easy to debug: I just need to have the deferred work
do nothing to see the deferred qc time out.

Let me hack something and come up with a fix.

> 
> I.e. SCSI EH is called without ata_eh_set_pending() having been called.
> (Currently ata_eh_set_pending() is the function that clears
> ap->deferred_qc)
> 
> 
> 
> If I look at ata_scsi_cmd_error_handler() it will only break if:
> 
> if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)
> 
> If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
> (because ATA_QCFLAG_ACTIVE is only set by qc_issue()).
> 
> Since ATA_QCFLAG_ACTIVE is not set i == ATA_MAX_QUEUE, so we will enter the
> else clause which calls:
> scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
> 
> 
> That might potentially free the tag to the block layer to reuse,
> while ap->deferred_qc is still set (with the same tag).
> 
> Possibly, next time ata_scsi_qc_issue() is called, ap->deferred_qc is still set,
> so it calls ata_qc_free(qc), which, since it wasn't cleared, might have the same
> tag? because block layer has now reused the tag (since SCSI completed the
> command).
> 
> I would possibly have expected some kind of print from SCSI in this case.
> (But since the else clause finishes the command normally, perhaps not?)
> 
> But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
> which clears ap->deferred_qc.
> 
> 
> 
> Another possibility... again, timed out commands will not have called
> ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
> which will queue delayed work, and the worker function scmd_eh_abort_handler()
> will call scsi_eh_scmd_add(), which calls
> scsi_host_set_state(shost, SHOST_RECOVERY).
> 
> We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
> issue non-internal commands once EH is pending"), so that we will defer commands
> even when EH is pending. But in the case of timeout, there will be no error IRQ,
> so we will not do an early return in __ata_scsi_queuecmd(), so we could set
> ap->deferred_qc up until the worker function scmd_eh_abort_handler() has called
> scsi_host_set_state(shost, SHOST_RECOVERY).
> 
> Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
> should handle this case.
> 
> 
> I would probably hack some QEMU to not send a reply, so that we will get block
> layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
> most likely problematic code to me.
> 
> 
> Kind regards,
> Niklas
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-20  1:06       ` Damien Le Moal
@ 2026-02-20  9:17         ` Dmitry Vyukov
  2026-02-20  9:27           ` Niklas Cassel
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2026-02-20  9:17 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Niklas Cassel, syzbot, linux-ide, linux-kernel, syzkaller-bugs,
	syzkaller

On Fri, 20 Feb 2026 at 02:06, 'Damien Le Moal' via syzkaller-bugs
<syzkaller-bugs@googlegroups.com> wrote:
>
> On 2/20/26 09:55, Niklas Cassel wrote:
> > On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
> >>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> >>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
> >>>
> >>> 4210818301 is 0xfafbfcfd
> >>>
> >>> 0xfafbfcfd is ATA_TAG_POISON.
> >>>
> >>> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
> >>> ata_scsi_deferred_qc_work() is trying to issue a QC that has
> >>> already been freed.
> >>
> >> I checked the code but I fail to see any path that can lead to this happening.
> >> I did more tests using qemu q35 machine as used by syzbot, and everything looks
> >> fine. So not sure what is happening here. I will dig further.
> >
> > Hello Damien,
> >
> >
> > My best guess:
> > since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
> > on ap->deferred_qc.
> >
> > If it was an NCQ abort, ata_eh_set_pending() would have been called to
> > clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
> > appears that we did not get an error IRQ.
> >
> > To me, that leaves a timeout as the most likely scenario.
>
> Good point. I think the timeout case was completely overlooked...
> That should be fairly easy to debug: I just need to have the deferred work
> do nothing to see the deferred qc time out.
>
> Let me hack something and come up with a fix.
>
> >
> > I.e. SCSI EH is called without ata_eh_set_pending() having been called.
> > (Currently ata_eh_set_pending() is the function that clears
> > ap->deferred_qc)
> >
> >
> >
> > If I look at ata_scsi_cmd_error_handler() it will only break if:
> >
> > if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)
> >
> > If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
> > (because ATA_QCFLAG_ACTIVE is only set by qc_issue()).
> >
> > Since ATA_QCFLAG_ACTIVE is not set i == ATA_MAX_QUEUE, so we will enter the
> > else clause which calls:
> > scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
> >
> >
> > That might potentially free the tag to the block layer to reuse,
> > while ap->deferred_qc is still set (with the same tag).
> >
> > Possibly, next time ata_scsi_qc_issue() is called, ap->deferred_qc is still set,
> > so it calls ata_qc_free(qc), which, since it wasn't cleared, might have the same
> > tag? because block layer has now reused the tag (since SCSI completed the
> > command).
> >
> > I would possibly have expected some kind of print from SCSI in this case.
> > (But since the else clause finishes the command normally, perhaps not?)
> >
> > But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
> > which clears ap->deferred_qc.
> >
> >
> >
> > Another possibility... again, timed out commands will not have called
> > ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
> > which will queue delayed work, and the worker function scmd_eh_abort_handler()
> > will call scsi_eh_scmd_add(), which calls
> > scsi_host_set_state(shost, SHOST_RECOVERY).
> >
> > We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
> > issue non-internal commands once EH is pending"), so that we will defer commands
> > even when EH is pending. But in the case of timeout, there will be no error IRQ,
> > so we will not do an early return in __ata_scsi_queuecmd(), so we could set
> > ap->deferred_qc up until the worker function scmd_eh_abort_handler() has called
> > scsi_host_set_state(shost, SHOST_RECOVERY).
> >
> > Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
> > should handle this case.
> >
> >
> > I would probably hack some QEMU to not send a reply, so that we will get block
> > layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
> > most likely problematic code to me.

Hi,

Some info I can infer from these 4 crashes.

Some kind of race, or very rare timing, is likely to be involved.
Only 4 crashes is not much. Usually the fuzzer triggers them
more often.

The crash happens in a kworker, which makes it impossible to infer which
test programs may be involved.

In all 4 cases there is a preceding USB disconnect message:
[  644.391966][ T5992] usb 11-1: USB disconnect, device number 24
It may be related. These devices can be connected via USB, right?

Unfortunately, I cannot infer much more.
These USB device numbers may theoretically allow inferring the test
program, but I think it's currently not possible.

It may be possible to replay these logs for longer to see if they
trigger the crash.


* Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
  2026-02-20  9:17         ` Dmitry Vyukov
@ 2026-02-20  9:27           ` Niklas Cassel
  0 siblings, 0 replies; 8+ messages in thread
From: Niklas Cassel @ 2026-02-20  9:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Damien Le Moal, syzbot, linux-ide, linux-kernel, syzkaller-bugs,
	syzkaller

Hello Dmitry,

On Fri, Feb 20, 2026 at 10:17:05AM +0100, Dmitry Vyukov wrote:
> Some info I can infer from these 4 crashes.
> 
> Some kind of race, or very rare timing, is likely to be involved.
> Only 4 crashes is not much. Usually the fuzzer triggers them
> more often.
> 
> The crash happens in a kworker, which makes it impossible to infer which
> test programs may be involved.
> 
> In all 4 cases there is a preceding USB disconnect message:
> [  644.391966][ T5992] usb 11-1: USB disconnect, device number 24
> It may be related. These devices can be connected via USB, right?
> 
> Unfortunately, I cannot infer much more.
> These USB device numbers may theoretically allow inferring the test
> program, but I think it's currently not possible.
> 
> It may be possible to replay these logs for longer to see if they
> trigger the crash.

It seems that my suspicion that the bug occurs after a block layer timeout
was correct.

Damien managed to reproduce the bug and has sent a fix:
https://lore.kernel.org/linux-ide/20260220050053.390135-1-dlemoal@kernel.org/T/#t

Many thanks to syzbot for finding this bug that we failed to find
during review.


Kind regards,
Niklas
