From: Damien Le Moal <dlemoal@kernel.org>
To: linan666@huaweicloud.com
Cc: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
	linan122@huawei.com, yukuai3@huawei.com, yi.zhang@huawei.com,
	houtao1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH] scsi: ata: Fix a race condition between scsi error handler and ahci interrupt
Date: Thu, 10 Aug 2023 11:49:50 +0900
Message-ID: <25c1aca7-d885-0fff-2639-bb68a7dff44f@kernel.org>
In-Reply-To: <20230810014848.2148316-1-linan666@huaweicloud.com>

On 8/10/23 10:48, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 

Please explain the problem first instead of starting with a function call
timeline, which cannot be analyzed without an explanation.

> interrupt                            scsi_eh
> 
> ahci_error_intr
>   =>ata_port_freeze
>     =>__ata_port_freeze
>       =>ahci_freeze (turn IRQ off)
>     =>ata_port_abort
>       =>ata_port_schedule_eh
>         =>shost->host_eh_scheduled++;
>         host_eh_scheduled = 1
>                                      scsi_error_handler
>                                        =>ata_scsi_error
>                                          =>ata_scsi_port_error_handler
>                                            =>ahci_error_handler
>                                            . =>sata_pmp_error_handler
>                                            .   =>ata_eh_thaw_port
>                                            .     =>ahci_thaw (turn IRQ on)
> ahci_error_intr                            .
>   =>ata_port_freeze                        .
>     =>__ata_port_freeze                    .
>       =>ahci_freeze (turn IRQ off)         .
>     =>ata_port_abort                       .
>       =>ata_port_schedule_eh               .
>         =>shost->host_eh_scheduled++;      .
>         host_eh_scheduled = 2              .
>                                            =>ata_std_end_eh
>                                              =>host->host_eh_scheduled = 0;
> 
> 'host_eh_scheduled' is 0 and scsi eh thread will not be scheduled again,
> and the ata port remain freeze and will never be enabled.
> 
> If EH thread is already running, no need to freeze port and schedule
> EH again.
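
As a rough illustration of the interleaving described above, here is a minimal
user-space sketch of the counter only. The struct and the model_* helpers are
hypothetical stand-ins, not the kernel code; they mirror just the
host_eh_scheduled++ and host_eh_scheduled = 0 steps from the quoted timeline,
and they assume the second interrupt lands after the thaw and before EH ends.

```c
#include <assert.h>

/* Hypothetical stand-in for the host field named in the timeline. */
struct model_host {
	int host_eh_scheduled;
};

/* Models the increment done when an error interrupt schedules EH. */
static void model_schedule_eh(struct model_host *h)
{
	h->host_eh_scheduled++;
}

/* Models the end of EH, which resets the counter unconditionally. */
static void model_end_eh(struct model_host *h)
{
	h->host_eh_scheduled = 0;
}

/* Replays the quoted interleaving and returns the final counter value. */
static int replay_timeline(void)
{
	struct model_host h = { 0 };

	model_schedule_eh(&h);	/* first error interrupt: counter is 1 */
	/* EH thread runs and thaws the port at this point */
	model_schedule_eh(&h);	/* second error interrupt: counter is 2 */
	assert(h.host_eh_scheduled == 2);
	model_end_eh(&h);	/* EH ends: the second request is discarded */

	return h.host_eh_scheduled;
}
```

In this model replay_timeline() returns 0, which matches the claim that
nothing is left to wake the EH thread for the second interrupt.
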
> 
> Reported-by: luojian <luojian5@huawei.com>
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  drivers/ata/libahci.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index e2bacedf28ef..0dfb0b807324 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -1840,9 +1840,17 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
>  
>  	/* okay, let's hand over to EH */
>  
> -	if (irq_stat & PORT_IRQ_FREEZE)
> +	if (irq_stat & PORT_IRQ_FREEZE) {
> +		/*
> +		 * EH already running, this may happen if the port is
> +		 * thawed in the EH. But we cannot freeze it again
> +		 * otherwise the port will never be thawed.
> +		 */
> +		if (ap->pflags & (ATA_PFLAG_EH_PENDING |
> +			ATA_PFLAG_EH_IN_PROGRESS))
> +			return;

This is definitely not correct, because EH may have been scheduled for a
non-fatal action such as a device revalidation or fetching sense data for
successfully completed commands. With this change, the port will NOT be frozen
when a hard error IRQ arrives while EH is waiting to start, that is, while EH
is still waiting for all in-flight commands to complete.
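
To make the objection concrete, here is a minimal user-space sketch of the
proposed check. Everything in it is an assumption for illustration: the
MODEL_PFLAG_* bits, struct model_port and both functions are hypothetical and
only mimic the shape of the test added by the patch, not the libata
implementation.

```c
#include <stdbool.h>

/* Model flag bits for illustration only; not copied from the libata headers. */
#define MODEL_PFLAG_EH_PENDING		(1u << 0)
#define MODEL_PFLAG_EH_IN_PROGRESS	(1u << 1)
#define MODEL_PFLAG_FROZEN		(1u << 2)

struct model_port {
	unsigned int pflags;
};

/*
 * Models an error interrupt that requires a freeze. When skip_if_eh_active
 * is true, the early return proposed by the patch is applied.
 */
static void model_error_intr(struct model_port *ap, bool skip_if_eh_active)
{
	if (skip_if_eh_active &&
	    (ap->pflags & (MODEL_PFLAG_EH_PENDING | MODEL_PFLAG_EH_IN_PROGRESS)))
		return;

	ap->pflags |= MODEL_PFLAG_FROZEN;
}

/*
 * EH is pending for a non-fatal action, then a hard error interrupt arrives.
 * Returns true if the port ends up frozen.
 */
static bool hard_error_while_eh_pending(bool skip_if_eh_active)
{
	struct model_port ap = { .pflags = 0 };

	ap.pflags |= MODEL_PFLAG_EH_PENDING;	/* e.g. a revalidate was queued */
	model_error_intr(&ap, skip_if_eh_active);

	return (ap.pflags & MODEL_PFLAG_FROZEN) != 0;
}
```

With the early return enabled, hard_error_while_eh_pending(true) reports that
the port is left unfrozen, while hard_error_while_eh_pending(false) freezes it
as expected.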

Furthermore, if you get an IRQ that requires the port to be frozen, it means
that a command failed. In that case, the drive is in an error state per the
ATA specs and stops all communication until a read log 10h command is issued.
So you should never see two error IRQs one after the other. If you do, it
very likely means that you have buggy hardware.

How do you get into this situation? What adapter and disk are you using?

>  		ata_port_freeze(ap);
> -	else if (fbs_need_dec) {
> +	} else if (fbs_need_dec) {
>  		ata_link_abort(link);
>  		ahci_fbs_dec_intr(ap);
>  	} else

-- 
Damien Le Moal
Western Digital Research

