All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Pylypiv <ipylypiv@google.com>
To: Niklas Cassel <cassel@kernel.org>
Cc: Damien Le Moal <dlemoal@kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Hannes Reinecke <hare@suse.de>,
	linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ata: libata-eh: Clear scsicmd->result when setting SAM_STAT_CHECK_CONDITION
Date: Fri, 6 Sep 2024 18:21:43 +0000	[thread overview]
Message-ID: <ZttIN8He8TOZ7Lct@google.com> (raw)
In-Reply-To: <Ztls4mim6Jky7E0S@ryzen.lan>

On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote:

Hi Niklas,

Thank you so much for a thorough review and coming up with a better patch!

> Hello Igor,
> 
> On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> > commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> > not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> > set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> > set_status_byte() which does not clear the scsicmd->result.
> 
> "which does not clear the scsicmd->result"
> 
> scsicmd->result is a combination of:
> -SCSI status byte
> -SCSI ML byte
> -host byte
> 
> Please be more specific of which byte(s) that you want to clear,
> both here and in other places in the commit message.
> 
> 
> > 
> > By not clearing the scsicmd->result we end up in a state where both
> > the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> > bytes are set.
> > 
> 
> 
> > The DID_TIME_OUT host byte is getting set a part of error handling:
> > 
> > ata_qc_complete()
> >     ata_qc_schedule_eh()
> >         blk_abort_request()
> >             WRITE_ONCE(req->deadline, jiffies);
> > 
> > blk_mq_timeout_work()
> >     blk_mq_check_expired()
> >         blk_mq_rq_timed_out()
> > 	    req->q->mq_ops->timeout() / scsi_timeout()
> >                 set_host_byte(scmd, DID_TIME_OUT);
> 
> I would have reorder your commit log and have this first in the commit log.
> 
> 
> > 
> > Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> > is confusing. Let's bring the old behavior back by setting scmd->result to
> > SAM_STAT_CHECK_CONDITION.
> 
> I think you are missing something very important in the commit log here:
> What is the user visible change before and after your change?
>

For ATA PT commands the 'result' value is being returned to user space so
not clearing the DID_TIME_OUT is user visible.

> Is there a difference is the error message in dmesg? If not, that should
> be mentioned as well.
> 

I haven't seen any dmesg errors reported for ATA PT hence I don't think there
is any difference in dmesg.

> 
> > 
> > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> > Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
> > ---
> >  drivers/ata/libata-eh.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index 214b935c2ced..4927b40e782f 100644
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
> >  		 */
> >  		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
> >  		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> > -			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> > +			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> 
> ata_eh_analyze_tf() will only be called on commands that are owned by EH
> (has ATA_QCFLAG_EH set).
> 
> Thus this command will end up in:
> ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() ->
> -> __ata_qc_complete() -> qc->complete_fn().
> 
> complete_fn will be (except in special cases): ata_scsi_qc_complete()
> If you look at ata_scsi_qc_complete(), it already clears the host byte:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696
>
> So could you please be more specific of what problem this change is fixing?
> Is it just that you think that it makes sense to clear the host byte earlier
> in the call chain?

I should have mentioned that the issue is specific to ATA PT commands.
The problem is that ata_scsi_qc_complete() is not clearing the host byte
for ATA PT commands causing the DID_TIME_OUT to be returned.

> 
> There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
> commands will not fetch sense data via ata_eh_request_sense(), so clearing
> the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
> to do a similar change to yours in all the different code paths that sets
> sense data ...which might actually be something that makes sense, but then
> I would expect a patch series that changes all the locations where we set
> sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
> ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
> libata-scsi: do not overwrite SCSI ML and status bytes")).
> 
> 
> See this patch to see all the places where we set sense data:
> https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775
> 
> 
> 
> 
> Side note:
> You might also be interested to know that a command that was sent via EH will
> be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()),
> and will thus end up in scsi_eh_flush_done_q(). A command that has sense data
> will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you
> will end up in this code path:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227
> 
> Which means that SCSI EH will set DID_TIME_OUT for any command that does
> not have (SCSI ML byte || SCSI status byte || host byte) set.
> 
> A command with sense data will have most often have CHECK_CONDITION set, but
> there is also CDL policy 0xD commands (which will have the SCSI ML byte set).
> So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed
> successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD
> commands which did have SK/ASC/ASCQ set.
> For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful
> commands via EH").
> 
> 
> Kind regards,
> Niklas

      parent reply	other threads:[~2024-09-06 18:21 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04 22:37 [PATCH] ata: libata-eh: Clear scsicmd->result when setting SAM_STAT_CHECK_CONDITION Igor Pylypiv
2024-09-04 23:27 ` Igor Pylypiv
2024-09-05  8:33 ` Niklas Cassel
2024-09-05  9:25   ` Niklas Cassel
2024-09-05 12:40     ` Niklas Cassel
2024-09-05 19:28       ` Niklas Cassel
2024-09-06 18:21   ` Igor Pylypiv [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZttIN8He8TOZ7Lct@google.com \
    --to=ipylypiv@google.com \
    --cc=cassel@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=hare@suse.de \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.