public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-ide@vger.kernel.org, Niklas Cassel <cassel@kernel.org>
Subject: Re: [PATCH v2 1/2] ata: libata-eh: correctly handle deferred qc timeouts
Date: Fri, 6 Mar 2026 09:50:49 +0900	[thread overview]
Message-ID: <68325544-334d-45d6-b406-82cc1379df0e@kernel.org> (raw)
In-Reply-To: <68df8235-dc05-41e2-afdd-42f7143e26ec@roeck-us.net>

On 3/6/26 09:32, Guenter Roeck wrote:
> On 3/5/26 15:59, Damien Le Moal wrote:
>> On 3/6/26 02:59, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On Sat, Feb 21, 2026 at 07:14:38AM +0900, Damien Le Moal wrote:
>>>> A deferred qc may timeout while waiting for the device queue to drain
>>>> to be submitted. In such case, since the qc is not active,
>>>> ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(),
>>>> which frees the qc. But as the port deferred_qc field still references
>>>> this finished/freed qc, the deferred qc work may eventually attempt to
>>>> call ata_qc_issue() against this invalid qc, leading to errors such as
>>>> reported by UBSAN (syzbot run):
>>>>
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>> ...
>>>> Call Trace:
>>>>   <TASK>
>>>>   __dump_stack lib/dump_stack.c:94 [inline]
>>>>   dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
>>>>   ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
>>>>   __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
>>>>   ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
>>>>   ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
>>>>   process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
>>>>   process_scheduled_works kernel/workqueue.c:3358 [inline]
>>>>   worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
>>>>   kthread+0x370/0x450 kernel/kthread.c:467
>>>>   ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
>>>>   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>>>>   </TASK>
>>>>
>>>> Fix this by checking if the qc of a timed out SCSI command is a deferred
>>>> one, and in such case, clear the port deferred_qc field and finish the
>>>> SCSI command with DID_TIME_OUT.
>>>>
>>>> Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
>>>> Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>>> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
>>>> ---
>>>>   drivers/ata/libata-eh.c | 22 +++++++++++++++++++---
>>>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>> index 72a22b6c9682..b373cceb95d2 100644
>>>> --- a/drivers/ata/libata-eh.c
>>>> +++ b/drivers/ata/libata-eh.c
>>>> @@ -640,12 +640,28 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>>>>   		set_host_byte(scmd, DID_OK);
>>>>   
>>>>   		ata_qc_for_each_raw(ap, qc, i) {
>>>> -			if (qc->flags & ATA_QCFLAG_ACTIVE &&
>>>> -			    qc->scsicmd == scmd)
>>>> +			if (qc->scsicmd != scmd)
>>>> +				continue;
>>>> +			if ((qc->flags & ATA_QCFLAG_ACTIVE) ||
>>>> +			    qc == ap->deferred_qc)
>>>>   				break;
>>>>   		}
>>>>   
>>>> -		if (i < ATA_MAX_QUEUE) {
>>>> +		if (qc == ap->deferred_qc) {
>>>
>>> An experimental AI code review agent tagged this patch with the following
>>> comment.
>>
>> Thanks for that.
>>
>>>    If the `ata_qc_for_each_raw()` loop finishes without finding a matching `scmd`,
>>>    `qc` will hold a pointer to the last element examined (`i == ATA_MAX_QUEUE`).
>>>    If this last element happens to be `ap->deferred_qc`, the condition
>>>    `qc == ap->deferred_qc` evaluates to true despite the loop not breaking on a
>>>    match.
>>>
>>>    Could this mistakenly intercept a command that completed normally after a SCSI
>>>    timeout, returning a timeout error instead of success? Would this also
>>>    incorrectly clear `ap->deferred_qc`, dropping the deferred command?
>>>    Should we verify that the loop actually found a match, for instance by checking
>>>    `if (i < ATA_MAX_QUEUE && qc == ap->deferred_qc)`?
>>
>> Yeah. Something like this. But that condition cannot actually happen. i ==
>> ATA_MAX_QUEUE correspond to internal QCs, and these never go through the
>> deferred path.
>>
> 
> Pardon my ignorance, but doesn't i == ATA_MAX_QUEUE mean that qc points
> to the _previously_ examined qc, i.e., the one associated with i ==
> ATA_MAX_QUEUE - 1 ? Are you saying that this qc will always always be
> internal and never match ap->deferred_qc ?

Arg !!!! Yes ! I screwed up. Checking ata_qc_for_each_raw(), the tag is checked
before setting the qc pointer. So yes, you are absolutely correct, and we have a
problem.

>>> It does seem to be a real problem to me, but I don't know the code well
>>> enough to be sure. Please take a look and let me know if the problem is
>>> real. If so, I'll be happy to submit a patch to fix it. If not, please let
>>> me know what the agent is missing.
>>
>> See above. That is not a real problem, but it would still be good to check. I
>> think your change is fine. Car to send a proper patch ?
>>
> 
> I can try, but if the AI analysis is missing the point I don't really know
> how to describe/explain it. Can you give me some guidance ?

I think the AI is correct. That "if" without the index check can lead to false
positives and we can endup timeout-failing a deferred qc that has not timed out yet.

Send a patch please. As mentioned, we can touch-up the commit message if needed.

> 
> Thanks,
> Guenter
> 


-- 
Damien Le Moal
Western Digital Research

  reply	other threads:[~2026-03-06  0:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-20 22:14 [PATCH v2 0/2] ATA port deferred qc fixes Damien Le Moal
2026-02-20 22:14 ` [PATCH v2 1/2] ata: libata-eh: correctly handle deferred qc timeouts Damien Le Moal
2026-02-23 12:09   ` Hannes Reinecke
2026-02-23 17:48   ` Igor Pylypiv
2026-03-05 17:59   ` Guenter Roeck
2026-03-05 23:27     ` Niklas Cassel
2026-03-06  0:11       ` Damien Le Moal
2026-03-06  0:59         ` Damien Le Moal
2026-03-06  8:23           ` Niklas Cassel
2026-03-06  0:14       ` Guenter Roeck
2026-03-06  0:21         ` Damien Le Moal
2026-03-06  0:41           ` Guenter Roeck
2026-03-05 23:59     ` Damien Le Moal
2026-03-06  0:32       ` Guenter Roeck
2026-03-06  0:50         ` Damien Le Moal [this message]
2026-03-06  1:31           ` Guenter Roeck
2026-03-06  8:24           ` Niklas Cassel
2026-02-20 22:14 ` [PATCH v2 2/2] ata: libata-core: fix cancellation of a port deferred qc work Damien Le Moal
2026-02-23 12:09   ` Hannes Reinecke
2026-02-23 17:49   ` Igor Pylypiv
2026-02-24  0:39 ` [PATCH v2 0/2] ATA port deferred qc fixes Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68325544-334d-45d6-b406-82cc1379df0e@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=cassel@kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux@roeck-us.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox