linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] qla2xxx: prevent board_disable from running during EEH
@ 2015-04-30 12:13 Thadeu Lima de Souza Cascardo
  2015-06-26 16:33 ` [PATCH] [RESEND] " Mauricio Faria de Oliveira
  0 siblings, 1 reply; 4+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2015-04-30 12:13 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, JBottomley, qla2xxx-upstream,
	Thadeu Lima de Souza Cascardo, cascardo, mauricfo

Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx:
Disable adapter when we encounter a PCI disconnect.") has introduced a
code that disables the board, releasing some resources, when reading
0xffffffff.

In case this happens when there is an EEH, this read will trigger EEH
detection and set PCI channel offline. EEH will be able to recover the
card from this state by doing a reset, so it's a better option than
simply disabling the card.

Since eeh_check_failure will mark the channel as offline before
returning the read value, in case there really was an EEH, we can simply
check for pci_channel_offline, preventing the board_disable code from
running if it's true.

Without this patch, EEH code will try to access those same resources
that board_disable will try to free. This race can cause EEH recovery to
fail.

[  504.370577] EEH: Notify device driver to resume
[  504.370580] qla2xxx [0001:07:00.0]-9002:2: The device failed to resume I/O from slot/link_reset.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: <cascardo@debian.org>
Cc: <mauricfo@linux.vnet.ibm.com>
---
 drivers/scsi/qla2xxx/qla_isr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index a04a1b1..8132926 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -116,7 +116,7 @@ bool
 qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
 {
 	/* Check for PCI disconnection */
-	if (reg == 0xffffffff) {
+	if (reg == 0xffffffff && !pci_channel_offline(vha->hw->pdev)) {
 		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
 		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
 		    !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH] [RESEND] qla2xxx: prevent board_disable from running during EEH
  2015-04-30 12:13 [PATCH] qla2xxx: prevent board_disable from running during EEH Thadeu Lima de Souza Cascardo
@ 2015-06-26 16:33 ` Mauricio Faria de Oliveira
  2015-06-26 17:06   ` Himanshu Madhani
  2015-06-26 21:24   ` Joe Lawrence
  0 siblings, 2 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2015-06-26 16:33 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel, JBottomley, qla2xxx-upstream, cascardo, mauricfo

Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx:
Disable adapter when we encounter a PCI disconnect.") has introduced a
code that disables the board, releasing some resources, when reading
0xffffffff.

In case this happens when there is an EEH, this read will trigger EEH
detection and set PCI channel offline. EEH will be able to recover the
card from this state by doing a reset, so it's a better option than
simply disabling the card.

Since eeh_check_failure will mark the channel as offline before
returning the read value, in case there really was an EEH, we can simply
check for pci_channel_offline, preventing the board_disable code from
running if it's true.

Without this patch, EEH code will try to access those same resources
that board_disable will try to free. This race can cause EEH recovery to
fail.

[  504.370577] EEH: Notify device driver to resume
[  504.370580] qla2xxx [0001:07:00.0]-9002:2: The device failed to 
resume I/O from slot/link_reset.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: <cascardo@debian.org>
Cc: <mauricfo@linux.vnet.ibm.com>
---
  drivers/scsi/qla2xxx/qla_isr.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index a04a1b1..8132926 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -116,7 +116,7 @@ bool
  qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
  {
  	/* Check for PCI disconnection */
-	if (reg == 0xffffffff) {
+	if (reg == 0xffffffff && !pci_channel_offline(vha->hw->pdev)) {
  		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
  		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
  		    !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {
-- 
1.7.10.4




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] [RESEND] qla2xxx: prevent board_disable from running during EEH
  2015-06-26 16:33 ` [PATCH] [RESEND] " Mauricio Faria de Oliveira
@ 2015-06-26 17:06   ` Himanshu Madhani
  2015-06-26 21:24   ` Joe Lawrence
  1 sibling, 0 replies; 4+ messages in thread
From: Himanshu Madhani @ 2015-06-26 17:06 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira, linux-scsi
  Cc: linux-kernel, JBottomley@parallels.com, Dept-Eng QLA2xxx Upstream,
	cascardo@debian.org

[-- Attachment #1: Type: text/plain, Size: 2304 bytes --]


On 6/26/15, 9:33 AM, "Mauricio Faria de Oliveira"
<mauricfo@linux.vnet.ibm.com> wrote:

>Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx:
>Disable adapter when we encounter a PCI disconnect.") has introduced a
>code that disables the board, releasing some resources, when reading
>0xffffffff.
>
>In case this happens when there is an EEH, this read will trigger EEH
>detection and set PCI channel offline. EEH will be able to recover the
>card from this state by doing a reset, so it's a better option than
>simply disabling the card.
>
>Since eeh_check_failure will mark the channel as offline before
>returning the read value, in case there really was an EEH, we can simply
>check for pci_channel_offline, preventing the board_disable code from
>running if it's true.
>
>Without this patch, EEH code will try to access those same resources
>that board_disable will try to free. This race can cause EEH recovery to
>fail.
>
>[  504.370577] EEH: Notify device driver to resume
>[  504.370580] qla2xxx [0001:07:00.0]-9002:2: The device failed to
>resume I/O from slot/link_reset.
>
>Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
>Cc: <cascardo@debian.org>
>Cc: <mauricfo@linux.vnet.ibm.com>
>---
>  drivers/scsi/qla2xxx/qla_isr.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/scsi/qla2xxx/qla_isr.c
>b/drivers/scsi/qla2xxx/qla_isr.c
>index a04a1b1..8132926 100644
>--- a/drivers/scsi/qla2xxx/qla_isr.c
>+++ b/drivers/scsi/qla2xxx/qla_isr.c
>@@ -116,7 +116,7 @@ bool
>  qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
>  {
>  	/* Check for PCI disconnection */
>-	if (reg == 0xffffffff) {
>+	if (reg == 0xffffffff && !pci_channel_offline(vha->hw->pdev)) {
>  		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
>  		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
>  		    !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {
>-- 
>1.7.10.4
>
>

Looks good. 

Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com>

>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 5340 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] [RESEND] qla2xxx: prevent board_disable from running during EEH
  2015-06-26 16:33 ` [PATCH] [RESEND] " Mauricio Faria de Oliveira
  2015-06-26 17:06   ` Himanshu Madhani
@ 2015-06-26 21:24   ` Joe Lawrence
  1 sibling, 0 replies; 4+ messages in thread
From: Joe Lawrence @ 2015-06-26 21:24 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira, linux-scsi
  Cc: linux-kernel, JBottomley, qla2xxx-upstream, cascardo

On 06/26/2015 12:33 PM, Mauricio Faria de Oliveira wrote:
> Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx:
> Disable adapter when we encounter a PCI disconnect.") has introduced a
> code that disables the board, releasing some resources, when reading
> 0xffffffff.
> 
> In case this happens when there is an EEH, this read will trigger EEH
> detection and set PCI channel offline. EEH will be able to recover the
> card from this state by doing a reset, so it's a better option than
> simply disabling the card.
> 
> Since eeh_check_failure will mark the channel as offline before
> returning the read value, in case there really was an EEH, we can simply
> check for pci_channel_offline, preventing the board_disable code from
> running if it's true.
> 
> Without this patch, EEH code will try to access those same resources
> that board_disable will try to free. This race can cause EEH recovery to
> fail.
> 
> [  504.370577] EEH: Notify device driver to resume
> [  504.370580] qla2xxx [0001:07:00.0]-9002:2: The device failed to
> resume I/O from slot/link_reset.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> Cc: <cascardo@debian.org>
> Cc: <mauricfo@linux.vnet.ibm.com>
> ---
>  drivers/scsi/qla2xxx/qla_isr.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> b/drivers/scsi/qla2xxx/qla_isr.c
> index a04a1b1..8132926 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -116,7 +116,7 @@ bool
>  qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
>  {
>      /* Check for PCI disconnection */
> -    if (reg == 0xffffffff) {
> +    if (reg == 0xffffffff && !pci_channel_offline(vha->hw->pdev)) {
>          if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
>              !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
>              !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {

Hi Mauricio,

Re: signed-off-by chain -- I believe if you are (re)sending another
person's patch, you will need a:  "From: Thadeu Lima de Souza Cascardo
<cascardo@linux.vnet.ibm.com>"  tag at the top of the message body to
retain original authorship and then your own:  "Signed-off-by: Mauricio
Faria de Oliveira <mauricfo@linux.vnet.ibm.com>" tag below Thadeu's
since the patch has passed through you.

Re: the patch -- I did some work last year to harden board_disable
against various races, but without having EEH hardware available, I was
uncertain about EEH behavior.  For example:

pci_read
   EEH -> marks channel offline
pci_read returns ~0

When do the EEH error handler callbacks run?

Thanks,

-- Joe

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-06-26 21:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-30 12:13 [PATCH] qla2xxx: prevent board_disable from running during EEH Thadeu Lima de Souza Cascardo
2015-06-26 16:33 ` [PATCH] [RESEND] " Mauricio Faria de Oliveira
2015-06-26 17:06   ` Himanshu Madhani
2015-06-26 21:24   ` Joe Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).