public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Yan <yanaijie@huawei.com>
To: John Garry <john.g.garry@oracle.com>,
	Xingui Yang <yangxingui@huawei.com>, <jejb@linux.ibm.com>,
	<martin.petersen@oracle.com>, <damien.lemoal@opensource.wdc.com>,
	<linux-ide@vger.kernel.org>, <hare@suse.com>, <hch@lst.de>
Cc: <linux-scsi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linuxarm@huawei.com>, <prime.zeng@hisilicon.com>,
	<kangfenglong@huawei.com>
Subject: Re: [PATCH V2] scsi: libsas: Directly kick-off EH when ATA device fell off
Date: Tue, 20 Dec 2022 10:39:01 +0800	[thread overview]
Message-ID: <a50ed139-c82f-6d07-ae76-a690cada2c90@huawei.com> (raw)
In-Reply-To: <565fcf28-ec53-8d74-00a3-94be8e5b60e4@oracle.com>

On 2022/12/19 17:23, John Garry wrote:
> On 16/12/2022 10:03, Xingui Yang wrote:
>> If the ATA device fell off, call sas_ata_device_link_abort() directly and
>> mark all outstanding QCs as failed and kick-off EH Immediately. This 
>> avoids
>> having to wait for block layer timeouts.
>>
>> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
>> ---
>> Changes to v1:
>> - Use dev_is_sata() to check ATA device type
>>   drivers/scsi/libsas/sas_discover.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c 
>> b/drivers/scsi/libsas/sas_discover.c
>> index d5bc1314c341..a12b65eb4a2a 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -362,6 +362,9 @@ static void sas_destruct_ports(struct asd_sas_port 
>> *port)
>>   void sas_unregister_dev(struct asd_sas_port *port, struct 
>> domain_device *dev)
>>   {
>> +    if (test_bit(SAS_DEV_GONE, &dev->state) && dev_is_sata(dev))
>> +        sas_ata_device_link_abort(dev, false);
> 
> Firstly, I think that there is a bug in sas_ata_device_link_abort() -> 
> ata_link_abort() code in that the host lock in not grabbed, as the 
> comment in ata_port_abort() mentions. Having said that, libsas had 
> already some dodgy host locking usage - specifically dropping the lock 
> for the queuing path (that's something else to be fixed up ... I think 
> that is due to queue command CB calling task_done() in some cases), but 
> I still think that sas_ata_device_link_abort() should be fixed (to grab 
> the host lock).
> 
> Secondly, this just seems like a half solution to the age-old problem - 
> that is, EH eventually kicking in only after 30 seconds when a disk is 
> removed with active IO. I say half solution as SAS disks still have this 
> issue for libsas. Can we instead push to try to solve both of them now?
> 
> There was a broad previous discussion on this:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-scsi/Ykqg0kr0F*2Fyzk2XW@infradead.org/__;JQ!!ACWV5N9M2RV99hQ!MwAZFXXIwuP0lv-kuUIJ0ekUiGBWlTBhU3oQjyOf_yuP1rHDJb8UKMzJjndXNQ-W1PQGJXzgc0bQUsHh4NGh21EOc50$ 
> 
> 
>  From that discussion, Hannes was doing some related prep work series, 
> but I don't think it got completed.

That discussion is not exactly the same with our issue. That discussion 
focused on whether one device's error handling can not suspend the other 
other devices's IO dispatching on the same host. That is something like 
parallelize the error handling for different device.

However what we are trying to resolve here is to shorten the timeout 
handling of a unplugged device. The scsi middle layer doesn't know the 
device is gone and still waiting for the IO until timeout kicks in and 
start the error handling. This made the applications stuck for a 
significant long time.But libsas knows that because it receives the phy 
down event, it knows that device will not come back and there is no need 
to wait for the timeout.

It's true that this is a half solution. I'd like to have a complete 
solution too. So we will try to solve both of them.

Thanks,
Jason

      parent reply	other threads:[~2022-12-20  2:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20221216100327.7386-1-yangxingui@huawei.com>
2022-12-19  9:23 ` [PATCH V2] scsi: libsas: Directly kick-off EH when ATA device fell off John Garry
2022-12-19 12:59   ` yangxingui
2022-12-19 14:53     ` John Garry
2022-12-20  2:34       ` yangxingui
2022-12-20  9:49       ` Jason Yan
2022-12-21  9:40         ` John Garry
2022-12-21 10:29           ` Jason Yan
2022-12-21  9:28       ` yangxingui
2022-12-19 15:28   ` Jason Yan
2022-12-19 15:55     ` John Garry
2022-12-19 23:00       ` Damien Le Moal
2022-12-20  8:43         ` John Garry
2022-12-19 22:59     ` Damien Le Moal
2022-12-20  2:39   ` Jason Yan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a50ed139-c82f-6d07-ae76-a690cada2c90@huawei.com \
    --to=yanaijie@huawei.com \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jejb@linux.ibm.com \
    --cc=john.g.garry@oracle.com \
    --cc=kangfenglong@huawei.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=martin.petersen@oracle.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=yangxingui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox