From: Damien Le Moal <dlemoal@kernel.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: syzbot <syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com>,
linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue
Date: Fri, 20 Feb 2026 10:06:12 +0900
Message-ID: <d49ebdba-9925-414f-8889-877adbad6052@kernel.org>
In-Reply-To: <aZeubyGHf1GX0HPA@ryzen>

On 2/20/26 09:55, Niklas Cassel wrote:
> On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>
>>> 4210818301 is 0xfafbfcfd
>>>
>>> 0xfafbfcfd is ATA_TAG_POISON.
>>>
>>> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
>>> ata_scsi_deferred_qc_work() is trying to issue a QC that has
>>> already been freed.
>>
>> I checked the code but I fail to see any path that can lead to this happening.
>> I did more tests using qemu q35 machine as used by syzbot, and everything looks
>> fine. So not sure what is happening here. I will dig further.
>
> Hello Damien,
>
>
> My best guess:
> since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
> on ap->deferred_qc.
>
> If it was an NCQ abort, ata_eh_set_pending() would have been called to
> clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
> appears that we did not get an error IRQ.
>
> To me, that leaves a timeout as the most likely scenario.

Good point. I think the timeout case was completely overlooked...
That should be fairly easy to debug: I just need to have the deferred work
do nothing to see the deferred qc time out.
Let me hack something and come up with a fix.

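For reference, the UBSAN report quoted above can be reproduced in isolation.
The following is only a minimal userspace sketch, not libata code: the
constants are copied from include/linux/libata.h as I remember them, and
tag_is_valid() and tag_to_active_bit() are hypothetical helpers made up for
this illustration. It assumes the faulting line is a "1ULL << qc->tag" style
expression, which is what the 64-bit type in the report suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/linux/libata.h; verify against your tree. */
#define ATA_MAX_QUEUE	32
#define ATA_TAG_POISON	0xfafbfcfdU	/* 4210818301 in decimal */

/* Hypothetical helper: tags 0..31 are NCQ tags, 32 is the internal tag. */
static inline bool tag_is_valid(unsigned int tag)
{
	return tag <= ATA_MAX_QUEUE;
}

/*
 * Hypothetical helper: return the bit that would be set in a 64-bit
 * "active" mask for this tag, or 0 if the tag cannot be shifted safely.
 * Shifting a 64-bit value by 64 or more is undefined behavior in C,
 * which is exactly what UBSAN flags when the tag is ATA_TAG_POISON.
 */
static inline uint64_t tag_to_active_bit(unsigned int tag)
{
	if (!tag_is_valid(tag))
		return 0;
	return 1ULL << tag;
}
```

With such a check, a poisoned tag yields 0 instead of an undefined shift.
That only hides the symptom, of course: the real problem is that a freed qc
is being issued at all.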
>
> I.e. SCSI EH is called without ata_eh_set_pending() having been called.
> (Currently ata_eh_set_pending() is the function that clears
> ap->deferred_qc)
>
>
>
> If I look at ata_scsi_cmd_error_handler() it will only break if:
>
> if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)
>
> If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
> (because ATA_QCFLAG_ACTIVE is only set by qc_issue()).
>
> Since ATA_QCFLAG_ACTIVE is not set i == ATA_MAX_QUEUE, so we will enter the
> else clause which calls:
> scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
>
>
> That might potentially free the tag to the block layer to reuse,
> while ap->deferred_qc is still set (with the same tag).
>
> Possibly, next time ata_scsi_qc_issue() is called, ap->deferred_qc is still set,
> so it calls ata_qc_free(qc), which, since it wasn't cleared, might have the same
> tag? because block layer has now reused the tag (since SCSI completed the
> command).
>
> I would possibly have expected some kind of print from SCSI in this case.
> (But since the else clause finishes the command normally, perhaps not?)
>
> But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
> which clears ap->deferred_qc.
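
The lookup described above can be modeled in userspace to make the failure
mode concrete. This is a rough sketch under stated assumptions, not the
actual libata code: the *_model structures are simplified stand-ins for
struct scsi_cmnd, struct ata_queued_cmd and struct ata_port, and
clear_deferred_if_timed_out() is a hypothetical name for the kind of cleanup
suggested here, not an existing function.

```c
#include <stddef.h>

#define ATA_MAX_QUEUE		32
#define ATA_QCFLAG_ACTIVE	(1u << 0)

/* Simplified stand-ins for the kernel structures (illustration only). */
struct scsi_cmd_model { int id; };

struct qc_model {
	unsigned int flags;
	struct scsi_cmd_model *scsicmd;
};

struct port_model {
	struct qc_model qcmd[ATA_MAX_QUEUE];
	struct qc_model *deferred_qc;
};

/*
 * Model of the search in ata_scsi_cmd_error_handler(): return the index
 * of the ACTIVE qc owning scmd, or ATA_MAX_QUEUE if there is none.
 */
static inline int find_active_qc(const struct port_model *ap,
				 const struct scsi_cmd_model *scmd)
{
	int i;

	for (i = 0; i < ATA_MAX_QUEUE; i++) {
		const struct qc_model *qc = &ap->qcmd[i];

		if ((qc->flags & ATA_QCFLAG_ACTIVE) && qc->scsicmd == scmd)
			break;
	}
	return i;
}

/*
 * Hypothetical cleanup: if the timed-out command is the deferred one,
 * forget it so that it cannot be issued or freed later with a reused
 * tag. Returns 1 if the deferred qc was cleared, 0 otherwise.
 */
static inline int clear_deferred_if_timed_out(struct port_model *ap,
					      const struct scsi_cmd_model *scmd)
{
	if (ap->deferred_qc && ap->deferred_qc->scsicmd == scmd) {
		ap->deferred_qc = NULL;
		return 1;
	}
	return 0;
}
```

In this model, a deferred qc never has ATA_QCFLAG_ACTIVE set, so
find_active_qc() returns ATA_MAX_QUEUE for it and the command takes the
"finish normally" path while ap->deferred_qc still points at it. A real fix
would also have to release the qc and take the appropriate locks, which this
sketch does not attempt to show.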
>
>
>
> Another possibility... again, timed out commands will not have called
> ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
> which will queue delayed work, and the worker function scmd_eh_abort_handler()
> will call scsi_eh_scmd_add(), which calls
> scsi_host_set_state(shost, SHOST_RECOVERY).
>
> We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
> issue non-internal commands once EH is pending"), so that we will defer commands
> even when EH is pending. But in the case of timeout, there will be no error IRQ,
> so we will not do an early return in __ata_scsi_queuecmd(), so we could set
> qc->deferred_qc up until the worker function scmd_eh_abort_handler() has called
> scsi_host_set_state(shost, SHOST_RECOVERY).
>
> Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
> should handle this case.
>
>
> I would probably hack some QEMU to not send a reply, so that we will get block
> layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
> most likely problematic code to me.
>
>
> Kind regards,
> Niklas
>
--
Damien Le Moal
Western Digital Research