stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: aacraid: Add a small delay after IOP reset
@ 2017-09-19 15:11 Guilherme G. Piccoli
  2017-09-19 15:37 ` Christoph Hellwig
  2017-09-21 16:19 ` Dave Carroll
  0 siblings, 2 replies; 12+ messages in thread
From: Guilherme G. Piccoli @ 2017-09-19 15:11 UTC (permalink / raw)
  To: aacraid, linux-scsi
  Cc: gpiccoli, RaghavaAditya.Renukunta, david.carroll, brking,
	dougmill, stable

Commit 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
status") changed the way driver checks if a reset succeeded. Now, after an
IOP reset, aacraid immediately start polling a register to verify the reset
is complete.

This behavior cause regressions on the reset path in PowerPC (at least).
Since the delay after the IOP reset was removed by the aforementioned patch,
the fact driver just starts to read a register instantly after the reset
was issued (by writing in another register) "corrupts" the reset procedure,
which ends up failing all the time.

The issue highly impacted kdump on PowerPC, since on kdump path we
proactively issue a reset in adapter (through the reset_devices kernel
parameter).

This patch (re-)adds a delay right after IOP reset is issued. Empirically
we measured that 3 seconds is enough, but for safety reasons we delay
for 5s (and since it was 30s before, 5s is still a small amount).

For reference, without this patch we observe the following messages
on kdump kernel boot process:

  [ 76.294] aacraid 0003:01:00.0: IOP reset failed
  [ 76.294] aacraid 0003:01:00.0: ARC Reset attempt failed
  [ 86.524] aacraid 0003:01:00.0: adapter kernel panic'd ff.
  [ 86.524] aacraid 0003:01:00.0: Controller reset type is 3
  [ 86.524] aacraid 0003:01:00.0: Issuing IOP reset
  [146.534] aacraid 0003:01:00.0: IOP reset failed
  [146.534] aacraid 0003:01:00.0: ARC Reset attempt failed

Fixes: 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset status")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/scsi/aacraid/src.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 48c2b2b34b72..0c9361c87ec8 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -740,6 +740,8 @@ static void aac_send_iop_reset(struct aac_dev *dev)
 	aac_set_intx_mode(dev);
 
 	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
+
+	msleep(5000);
 }
 
 static void aac_send_hardware_soft_reset(struct aac_dev *dev)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:11 [PATCH] scsi: aacraid: Add a small delay after IOP reset Guilherme G. Piccoli
@ 2017-09-19 15:37 ` Christoph Hellwig
  2017-09-19 15:49   ` Guilherme G. Piccoli
  2017-09-21 16:19 ` Dave Carroll
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2017-09-19 15:37 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: aacraid, linux-scsi, RaghavaAditya.Renukunta, david.carroll,
	brking, dougmill, stable

On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> +
> +	msleep(5000);

src_writel is a writel, and thus a posted MMIO write.  You'll need
to have to a read first to make it a reliable timing base.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:37 ` Christoph Hellwig
@ 2017-09-19 15:49   ` Guilherme G. Piccoli
  2017-09-19 15:52     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Guilherme G. Piccoli @ 2017-09-19 15:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aacraid, linux-scsi, RaghavaAditya.Renukunta, david.carroll,
	brking, dougmill, stable

On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
>> +
>> +	msleep(5000);
> 
> src_writel is a writel, and thus a posted MMIO write.  You'll need
> to have to a read first to make it a reliable timing base.
> 

Just for my full understanding - you're saying a readl BEFORE
src_writel() or AFTER src_writel() ?

I could add a read to some dummy register, but notice it was a sequence
of readl's on aac_is_ctrl_up_and_running() that caused the failure of
reset...

Thanks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:49   ` Guilherme G. Piccoli
@ 2017-09-19 15:52     ` Christoph Hellwig
  2017-09-19 15:58       ` Guilherme G. Piccoli
  2017-09-19 17:05       ` James Bottomley
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2017-09-19 15:52 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Christoph Hellwig, aacraid, linux-scsi, RaghavaAditya.Renukunta,
	david.carroll, brking, dougmill, stable

On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
> On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> > On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
> >>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> >> +
> >> +	msleep(5000);
> > 
> > src_writel is a writel, and thus a posted MMIO write.  You'll need
> > to have to a read first to make it a reliable timing base.
> > 
> 
> Just for my full understanding - you're saying a readl BEFORE
> src_writel() or AFTER src_writel() ?

AFTER.

> I could add a read to some dummy register, but notice it was a sequence
> of readl's on aac_is_ctrl_up_and_running() that caused the failure of
> reset...

Oh, ouch.  I guess in that case we'll need to do the writel and pray,
but that would need a big comment explaining what's going on there.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:52     ` Christoph Hellwig
@ 2017-09-19 15:58       ` Guilherme G. Piccoli
  2017-09-19 17:05       ` James Bottomley
  1 sibling, 0 replies; 12+ messages in thread
From: Guilherme G. Piccoli @ 2017-09-19 15:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aacraid, linux-scsi, RaghavaAditya.Renukunta, david.carroll,
	brking, dougmill, stable

On 09/19/2017 12:52 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
>> On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
>>> On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli wrote:
>>>>  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
>>>> +
>>>> +	msleep(5000);
>>>
>>> src_writel is a writel, and thus a posted MMIO write.  You'll need
>>> to have to a read first to make it a reliable timing base.
>>>
>>
>> Just for my full understanding - you're saying a readl BEFORE
>> src_writel() or AFTER src_writel() ?
> 
> AFTER.

Thanks!

> 
>> I could add a read to some dummy register, but notice it was a sequence
>> of readl's on aac_is_ctrl_up_and_running() that caused the failure of
>> reset...
> 
> Oh, ouch.  I guess in that case we'll need to do the writel and pray,
> but that would need a big comment explaining what's going on there.
> 

Heheh you're right!

But do you remember that quirk added on nvme? Basically, it was a
similar scenario in which some adapters weren't happy in poll a register
in nvme_wait_ready() right after we wrote in the Controller Config
register when disabling a controller.

Seems the same thing here, the controller is not being able to handle a
read right after some internal procedure (reset) were initiated.

If you have suggestion to improve the commit msg, let me know :)
Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:52     ` Christoph Hellwig
  2017-09-19 15:58       ` Guilherme G. Piccoli
@ 2017-09-19 17:05       ` James Bottomley
  2017-09-19 19:15         ` Guilherme G. Piccoli
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2017-09-19 17:05 UTC (permalink / raw)
  To: Christoph Hellwig, Guilherme G. Piccoli
  Cc: aacraid, linux-scsi, RaghavaAditya.Renukunta, david.carroll,
	brking, dougmill, stable

On Tue, 2017-09-19 at 08:52 -0700, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 12:49:21PM -0300, Guilherme G. Piccoli wrote:
> > 
> > On 09/19/2017 12:37 PM, Christoph Hellwig wrote:
> > > 
> > > On Tue, Sep 19, 2017 at 12:11:55PM -0300, Guilherme G. Piccoli
> > > wrote:
> > > > 
> > > >  	src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
> > > > +
> > > > +	msleep(5000);
> > > 
> > > src_writel is a writel, and thus a posted MMIO write.  You'll
> > > need
> > > to have to a read first to make it a reliable timing base.
> > > 
> > 
> > Just for my full understanding - you're saying a readl BEFORE
> > src_writel() or AFTER src_writel() ?
> 
> AFTER.

Actually, the whole problem sounds like a posted write.  Likely the
write that causes the reset doesn't get flushed until the read checking
if the reset has succeeded, which might explain the 100% initial
failure.  Why not throw away that first value if it's a failure and
then do your polled wait and timeout on the reset success.  We should
anyway be waiting some time for a reset to be issued, so even on non-
posted write systems we could see this problem intermittently.

James

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 17:05       ` James Bottomley
@ 2017-09-19 19:15         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 12+ messages in thread
From: Guilherme G. Piccoli @ 2017-09-19 19:15 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: aacraid, linux-scsi, RaghavaAditya.Renukunta, david.carroll,
	brking, dougmill, stable

On 09/19/2017 02:05 PM, James Bottomley wrote:
> Actually, the whole problem sounds like a posted write.  Likely the
> write that causes the reset doesn't get flushed until the read checking
> if the reset has succeeded, which might explain the 100% initial
> failure.  Why not throw away that first value if it's a failure and
> then do your polled wait and timeout on the reset success.  We should
> anyway be waiting some time for a reset to be issued, so even on non-
> posted write systems we could see this problem intermittently.
> 
> James
> 

Thanks for this suggestion James.

I tried to remove the sleep and did a dummy read to register using
readl() - issue reproduced.

I did expect that, since in aac_is_ctrl_up_and_running() we indeed read
a register and if it shows us reset is not complete, we wait and read it
again. So, we can think in this 1st read as a dummy one heheh

My theory here is that we're observing a failure similar to one we
already did in some specific NVMe adapters - the readl before some delay
(in nvme it was 2s) corrupts the adapter FW procedure. It's as if the
adapter doesn't like to deal with this read while the reset procedure is
ongoing. So, we wait a bit to issue a readl and everything goes fine.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-19 15:11 [PATCH] scsi: aacraid: Add a small delay after IOP reset Guilherme G. Piccoli
  2017-09-19 15:37 ` Christoph Hellwig
@ 2017-09-21 16:19 ` Dave Carroll
  2017-09-25 21:09   ` Guilherme G. Piccoli
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Carroll @ 2017-09-21 16:19 UTC (permalink / raw)
  To: Guilherme G. Piccoli, dl-esc-Aacraid Linux Driver,
	linux-scsi@vger.kernel.org
  Cc: Raghava Aditya Renukunta, brking@linux.vnet.ibm.com,
	dougmill@linux.vnet.ibm.com, stable@vger.kernel.org

> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
> Sent: Tuesday, September 19, 2017 9:12 AM
> To: dl-esc-Aacraid Linux Driver <aacraid@microsemi.com>; linux-
> scsi@vger.kernel.org
> Cc: gpiccoli@linux.vnet.ibm.com; Raghava Aditya Renukunta
> <RaghavaAditya.Renukunta@microsemi.com>; Dave Carroll
> <david.carroll@microsemi.com>; brking@linux.vnet.ibm.com;
> dougmill@linux.vnet.ibm.com; stable@vger.kernel.org
> Subject: [PATCH] scsi: aacraid: Add a small delay after IOP reset
> 
> Commit 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
> status") changed the way driver checks if a reset succeeded. Now, after an IOP
> reset, aacraid immediately start polling a register to verify the reset is complete.
> 
> This behavior cause regressions on the reset path in PowerPC (at least).
> Since the delay after the IOP reset was removed by the aforementioned patch,
> the fact driver just starts to read a register instantly after the reset was issued
> (by writing in another register) "corrupts" the reset procedure, which ends up
> failing all the time.
> 
> The issue highly impacted kdump on PowerPC, since on kdump path we
> proactively issue a reset in adapter (through the reset_devices kernel
> parameter).
> 
> This patch (re-)adds a delay right after IOP reset is issued. Empirically we
> measured that 3 seconds is enough, but for safety reasons we delay for 5s (and
> since it was 30s before, 5s is still a small amount).
> 
> For reference, without this patch we observe the following messages on kdump
> kernel boot process:
> 
>   [ 76.294] aacraid 0003:01:00.0: IOP reset failed
>   [ 76.294] aacraid 0003:01:00.0: ARC Reset attempt failed
>   [ 86.524] aacraid 0003:01:00.0: adapter kernel panic'd ff.
>   [ 86.524] aacraid 0003:01:00.0: Controller reset type is 3
>   [ 86.524] aacraid 0003:01:00.0: Issuing IOP reset
>   [146.534] aacraid 0003:01:00.0: IOP reset failed
>   [146.534] aacraid 0003:01:00.0: ARC Reset attempt failed
> 
> Fixes: 0e9973ed3382 ("scsi: aacraid: Add periodic checks to see IOP reset
> status")
> Cc: stable@vger.kernel.org # v4.13+
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
Acked-by: Dave Carroll <david.carroll@microsemi.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-21 16:19 ` Dave Carroll
@ 2017-09-25 21:09   ` Guilherme G. Piccoli
  2017-09-25 21:34     ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Guilherme G. Piccoli @ 2017-09-25 21:09 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org
  Cc: Dave Carroll, dl-esc-Aacraid Linux Driver,
	Raghava Aditya Renukunta, brking@linux.vnet.ibm.com,
	dougmill@linux.vnet.ibm.com, stable@vger.kernel.org,
	James.Bottomley, Martin K. Petersen

On 09/21/2017 01:19 PM, Dave Carroll wrote:
>> [...]
>> ---
> Acked-by: Dave Carroll <david.carroll@microsemi.com>
> 

Thanks Dave!

James/Martin, am I expected to send a v2 with some change? Perhaps with
Dave's ack?
Sorry to annoy, thanks in advance for any advice!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-25 21:09   ` Guilherme G. Piccoli
@ 2017-09-25 21:34     ` Martin K. Petersen
  2017-09-27 19:26       ` Dave Carroll
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2017-09-25 21:34 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-scsi@vger.kernel.org, Dave Carroll,
	dl-esc-Aacraid Linux Driver, Raghava Aditya Renukunta,
	brking@linux.vnet.ibm.com, dougmill@linux.vnet.ibm.com,
	stable@vger.kernel.org, James.Bottomley, Martin K. Petersen


Guilherme,

> James/Martin, am I expected to send a v2 with some change? Perhaps
> with Dave's ack?  Sorry to annoy, thanks in advance for any advice!

I was just about to mail Dave and ask for confirmation that your
interpretation of the controller behavior is correct.

Dave?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-25 21:34     ` Martin K. Petersen
@ 2017-09-27 19:26       ` Dave Carroll
  2017-09-28  1:42         ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Carroll @ 2017-09-27 19:26 UTC (permalink / raw)
  To: Martin K. Petersen, Guilherme G. Piccoli
  Cc: linux-scsi@vger.kernel.org, dl-esc-Aacraid Linux Driver,
	Raghava Aditya Renukunta, brking@linux.vnet.ibm.com,
	dougmill@linux.vnet.ibm.com, stable@vger.kernel.org,
	James.Bottomley@HansenPartnership.com


> 
> 
> Guilherme,
> 
> > James/Martin, am I expected to send a v2 with some change? Perhaps
> > with Dave's ack?  Sorry to annoy, thanks in advance for any advice!
> 
> I was just about to mail Dave and ask for confirmation that your interpretation
> of the controller behavior is correct.
> 
> Dave?
> 
 Hi Martin,

Previously we had used sleep to delay until the controller got its mind back, 
but early testing indicated it wasn't needed. I'm  good with this.

-Dave

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] scsi: aacraid: Add a small delay after IOP reset
  2017-09-27 19:26       ` Dave Carroll
@ 2017-09-28  1:42         ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2017-09-28  1:42 UTC (permalink / raw)
  To: Dave Carroll
  Cc: Martin K. Petersen, Guilherme G. Piccoli,
	linux-scsi@vger.kernel.org, dl-esc-Aacraid Linux Driver,
	Raghava Aditya Renukunta, brking@linux.vnet.ibm.com,
	dougmill@linux.vnet.ibm.com, stable@vger.kernel.org,
	James.Bottomley@HansenPartnership.com


Dave,

>> 
> Previously we had used sleep to delay until the controller got its
> mind back, but early testing indicated it wasn't needed. I'm good with
> this.

Applied to 4.14/scsi-fixes. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-09-28  1:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-19 15:11 [PATCH] scsi: aacraid: Add a small delay after IOP reset Guilherme G. Piccoli
2017-09-19 15:37 ` Christoph Hellwig
2017-09-19 15:49   ` Guilherme G. Piccoli
2017-09-19 15:52     ` Christoph Hellwig
2017-09-19 15:58       ` Guilherme G. Piccoli
2017-09-19 17:05       ` James Bottomley
2017-09-19 19:15         ` Guilherme G. Piccoli
2017-09-21 16:19 ` Dave Carroll
2017-09-25 21:09   ` Guilherme G. Piccoli
2017-09-25 21:34     ` Martin K. Petersen
2017-09-27 19:26       ` Dave Carroll
2017-09-28  1:42         ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).