From: Damien Le Moal <dlemoal@kernel.org>
To: Guenter Roeck <linux@roeck-us.net>, Niklas Cassel <cassel@kernel.org>
Cc: linux-ide@vger.kernel.org
Subject: Re: [PATCH v2 1/2] ata: libata-eh: correctly handle deferred qc timeouts
Date: Fri, 6 Mar 2026 09:21:26 +0900
Message-ID: <cbc113d1-55a1-41f5-8481-25c707f2ef86@kernel.org>
In-Reply-To: <81f2f823-8e66-48d8-bf1c-07b0aa7d49c7@roeck-us.net>

On 3/6/26 09:14, Guenter Roeck wrote:
> On Fri, Mar 06, 2026 at 12:27:34AM +0100, Niklas Cassel wrote:
>> On 5 March 2026 18:59:08 CET, Guenter Roeck <linux@roeck-us.net> wrote:
>>> Hi,
>>>
>>> On Sat, Feb 21, 2026 at 07:14:38AM +0900, Damien Le Moal wrote:
>>>> A deferred qc may time out while waiting for the device queue to drain
>>>> so that it can be submitted. In such case, since the qc is not active,
>>>> ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(),
>>>> which frees the qc. But as the port deferred_qc field still references
>>>> this finished/freed qc, the deferred qc work may eventually attempt to
>>>> call ata_qc_issue() against this invalid qc, leading to errors such as
>>>> reported by UBSAN (syzbot run):
>>>>
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>> ...
>>>> Call Trace:
>>>> <TASK>
>>>> __dump_stack lib/dump_stack.c:94 [inline]
>>>> dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
>>>> ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
>>>> __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
>>>> ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
>>>> ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
>>>> process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
>>>> process_scheduled_works kernel/workqueue.c:3358 [inline]
>>>> worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
>>>> kthread+0x370/0x450 kernel/kthread.c:467
>>>> ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
>>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>>>> </TASK>
>>>>
>>>> Fix this by checking if the qc of a timed out SCSI command is a deferred
>>>> one, and in such case, clear the port deferred_qc field and finish the
>>>> SCSI command with DID_TIME_OUT.
>>>>
>>>> Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
>>>> Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>>> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
>>>> ---
>>>> drivers/ata/libata-eh.c | 22 +++++++++++++++++++---
>>>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>> index 72a22b6c9682..b373cceb95d2 100644
>>>> --- a/drivers/ata/libata-eh.c
>>>> +++ b/drivers/ata/libata-eh.c
>>>> @@ -640,12 +640,28 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>>>> set_host_byte(scmd, DID_OK);
>>>>
>>>> ata_qc_for_each_raw(ap, qc, i) {
>>>> - if (qc->flags & ATA_QCFLAG_ACTIVE &&
>>>> - qc->scsicmd == scmd)
>>>> + if (qc->scsicmd != scmd)
>>>> + continue;
>>>> + if ((qc->flags & ATA_QCFLAG_ACTIVE) ||
>>>> + qc == ap->deferred_qc)
>>>> break;
>>>> }
>>>>
>>>> - if (i < ATA_MAX_QUEUE) {
>>>> + if (qc == ap->deferred_qc) {
>>>
>>> An experimental AI code review agent tagged this patch with the following
>>> comment.
>>>
>>> If the `ata_qc_for_each_raw()` loop finishes without finding a matching `scmd`,
>>> `qc` will hold a pointer to the last element examined (`i == ATA_MAX_QUEUE`).
>>
>> I think the AI is wrong here.
>>
>> The last element assigned to qc will be the one at index ATA_MAX_QUEUE - 1.
>>
>
> I think that is what it means with "`qc` will hold a pointer to the
> last element examined". The "(`i == ATA_MAX_QUEUE`)" part is a bit
> confusing.
>
> I think what it is trying to say is that if i == ATA_MAX_QUEUE,
> qc would point to the last examined element, which would not
> have ATA_QCFLAG_ACTIVE set because otherwise it would have
> exited the loop. Yet, ap->deferred_qc could be set, and the
> if statement would be true even though i == ATA_MAX_QUEUE
> and there was no qc match.
>
>>
>>> If this last element happens to be `ap->deferred_qc`, the condition
>>> `qc == ap->deferred_qc` evaluates to true despite the loop not breaking on a
>>> match.
>>>
>
> That is pretty much repeating what I said above, without
> the confusing "(`i == ATA_MAX_QUEUE`)" part.
>
>>> Could this mistakenly intercept a command that completed normally after a SCSI
>>> timeout, returning a timeout error instead of success? Would this also
>>> incorrectly clear `ap->deferred_qc`, dropping the deferred command?
>>
>
> This part is beyond my understanding, primarily because I don't know
> what "qc->deferred" actually refers to.

There are 2 types of ATA commands: queueable ones (NCQ == Native Command
Queueing) and non-queueable ones (legacy/old ATA commands). The 2 types cannot
be mixed: while NCQ commands are on-going, you cannot issue a non-NCQ command,
and vice-versa. libata-scsi has always handled this with command requeueing,
but since the introduction of blk-mq, there has been a potential command
starvation issue for non-NCQ commands, which went unfixed for a long time.
We fixed that recently by keeping on hand any non-NCQ command that must wait for
on-going NCQ commands to complete first. This is ap->deferred_qc.
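
To make the rule above concrete, here is a minimal standalone sketch of the
deferral decision. It is not libata code: every sketch_* name is hypothetical,
and the choice to requeue new NCQ commands while a deferred command is waiting
is an assumption about how starvation is avoided, not something stated in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libata structures. */
struct sketch_qc {
	bool is_ncq;			/* true for a queueable (NCQ) command */
};

struct sketch_port {
	unsigned int ncq_inflight;	/* number of on-going NCQ commands */
	bool non_ncq_active;		/* a non-NCQ command is on-going */
	struct sketch_qc *deferred_qc;	/* non-NCQ command kept on hand */
};

enum sketch_action { SKETCH_ISSUE, SKETCH_DEFER, SKETCH_REQUEUE };

/* Decide what to do with a newly submitted command. */
static enum sketch_action sketch_submit(struct sketch_port *ap,
					struct sketch_qc *qc)
{
	assert(ap && qc);

	if (qc->is_ncq) {
		/*
		 * NCQ cannot be mixed with an on-going non-NCQ command.
		 * Assumption: it also must not overtake a waiting deferred
		 * command, otherwise that command could wait forever.
		 */
		if (ap->non_ncq_active || ap->deferred_qc)
			return SKETCH_REQUEUE;
		ap->ncq_inflight++;
		return SKETCH_ISSUE;
	}

	if (ap->ncq_inflight) {
		/* Only one command can be kept on hand in this sketch. */
		if (ap->deferred_qc)
			return SKETCH_REQUEUE;
		ap->deferred_qc = qc;
		return SKETCH_DEFER;
	}

	if (ap->non_ncq_active)
		return SKETCH_REQUEUE;
	ap->non_ncq_active = true;
	return SKETCH_ISSUE;
}

/*
 * One NCQ command completed. Once the queue has drained, the deferred
 * command is issued and the field is cleared, so the field is only set
 * while the command has not been issued yet.
 */
static struct sketch_qc *sketch_ncq_done(struct sketch_port *ap)
{
	struct sketch_qc *qc = NULL;

	assert(ap && ap->ncq_inflight > 0);
	ap->ncq_inflight--;
	if (!ap->ncq_inflight && ap->deferred_qc) {
		qc = ap->deferred_qc;
		ap->deferred_qc = NULL;
		ap->non_ncq_active = true;
	}
	return qc;
}
```

With one NCQ command on-going, submitting a non-NCQ command returns
SKETCH_DEFER and sets deferred_qc; when the NCQ command completes,
sketch_ncq_done() hands that same command back for issuing and clears the
field.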
>
>> I think the AI is partially wrong here.
>>
>> If you read the comment below the if (), we know that ap->deferred_qc is only set until that command has been issued. So if it is set, that qc has not been issued, so it can't have successfully completed.
>>
>> But... Since we don't verify that i < ATA_MAX_QUEUE, we might end up completing the deferred QC as a failed command, even though it did not time out...
>>
>> On NCQ error, we complete the deferred QC as a failed command.
>>
>> However, if there was a timeout of a command, which was not the deferred QC, but the deferred QC did not time out, I think it is wrong to complete the deferred QC as a failed command.
>>
>> So... I actually think that the change suggested by the AI is something we want.
>> (Especially after Damien's commit queued in for-next where we will not invoke error_handler() if there were no timed out commands.)
>>
>
> So should I send a patch, or do you want to handle it ?
> It might be better if you handle it since I don't know
> how to exactly describe the problem differently than the AI.

Send a patch. Write a commit message based on the information I sent in my
previous email. We can correct the commit message if needed.
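
For reference, the lookup problem discussed above can be reproduced outside the
kernel with a small sketch. This is not the libata code and not the eventual
patch: the eh_* names and EH_MAX_QUEUE are hypothetical, and the plain for loop
is only a simplified stand-in for ata_qc_for_each_raw(), assuming (as described
in this thread) that qc is left pointing at the last element examined when
nothing matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EH_MAX_QUEUE 4	/* hypothetical, small stand-in for ATA_MAX_QUEUE */

/* Hypothetical, simplified stand-ins for the libata structures. */
struct eh_qc {
	bool active;		/* stand-in for ATA_QCFLAG_ACTIVE */
	const void *scsicmd;	/* SCSI command this qc belongs to */
};

struct eh_port {
	struct eh_qc qcmd[EH_MAX_QUEUE];
	struct eh_qc *deferred_qc;
};

/*
 * Same shape as the loop in the quoted patch: returns whatever qc the loop
 * left behind and reports through *found whether the loop exited via break.
 */
static struct eh_qc *eh_lookup(struct eh_port *ap, const void *scmd,
			       bool *found)
{
	struct eh_qc *qc = NULL;
	int i;

	assert(ap && found);
	for (i = 0; i < EH_MAX_QUEUE; i++) {
		qc = &ap->qcmd[i];
		if (qc->scsicmd != scmd)
			continue;
		if (qc->active || qc == ap->deferred_qc)
			break;
	}
	*found = i < EH_MAX_QUEUE;
	return qc;
}

/* Check as in the quoted patch: only compares the leftover qc pointer. */
static bool eh_is_deferred_unguarded(struct eh_port *ap, const void *scmd)
{
	bool found;
	struct eh_qc *qc = eh_lookup(ap, scmd, &found);

	return qc == ap->deferred_qc;
}

/* Check with the additional "the loop really found a match" condition. */
static bool eh_is_deferred_guarded(struct eh_port *ap, const void *scmd)
{
	bool found;
	struct eh_qc *qc = eh_lookup(ap, scmd, &found);

	return found && qc == ap->deferred_qc;
}
```

If the deferred qc sits in the last slot and the timed out SCSI command matches
no slot at all, the unguarded check still reports the deferred qc as the timed
out one, while the guarded check does not. Both checks agree when the timed out
command really is the deferred one.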
>
> Thanks,
> Guenter

--
Damien Le Moal
Western Digital Research