linux-scsi.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: scameron@beardog.cce.hp.com
Cc: Tomas Henzl <thenzl@redhat.com>,
	stephenmcameron@gmail.com, mikem@beardog.cce.hp.com,
	linux-scsi@vger.kernel.org, scott.teel@hp.com
Subject: Re: [PATCH 03/11] hpsa: add 5 second delay after doorbell reset
Date: Sat, 30 Nov 2013 16:42:02 -0800
Message-ID: <1385858522.1815.1.camel@dabdike>
In-Reply-To: <20131108153128.GN31390@beardog.cce.hp.com>

On Fri, 2013-11-08 at 09:31 -0600, scameron@beardog.cce.hp.com wrote:
> On Fri, Nov 08, 2013 at 04:02:20PM +0100, Tomas Henzl wrote:
> > On 11/08/2013 03:44 PM, scameron@beardog.cce.hp.com wrote:
> > > On Fri, Nov 08, 2013 at 02:51:37PM +0100, Tomas Henzl wrote:
> > >> On 11/07/2013 05:45 PM, Stephen M. Cameron wrote:
> > >>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > >>>
> > >>> The hardware guys tell us that after initiating a software
> > >>> reset via the doorbell register we need to wait 5 seconds before
> > >>> attempting to talk to the board *at all*.  This means that we
> > >>> cannot watch the board to verify it transitions from "ready" to
> > >>> "not ready" then back to "ready", since this transition will
> > >>> most likely happen during those 5 seconds (though we can still
> > >>> verify the reset happens by watching the "driver version" field
> > >>> get cleared.)
> > >>>
> > >>> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > >>> ---
> > >>>  drivers/scsi/hpsa.c |   32 +++++++++++++++++++++++---------
> > >>>  1 files changed, 23 insertions(+), 9 deletions(-)
> > >>>
> > >>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > >>> index 20fc598..fff5fd3 100644
> > >>> --- a/drivers/scsi/hpsa.c
> > >>> +++ b/drivers/scsi/hpsa.c
> > >>> @@ -3781,6 +3781,13 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
> > >>>  		 */
> > >>>  		dev_info(&pdev->dev, "using doorbell to reset controller\n");
> > >>>  		writel(use_doorbell, vaddr + SA5_DOORBELL);
> > >>> +
> > >>> +		/* PMC hardware guys tell us we need a 5 second delay after
> > >>> +		 * doorbell reset and before any attempt to talk to the board
> > >>> +		 * at all to ensure that this actually works and doesn't fall
> > >>> +		 * over in some weird corner cases.
> > >>> +		 */
> > >>> +		msleep(5000);
> > >>>  	} else { /* Try to do it the PCI power state way */
> > >>>  
> > >>>  		/* Quoting from the Open CISS Specification: "The Power
> > >>> @@ -3977,15 +3984,22 @@ static int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
> > >>>  	   need a little pause here */
> > >>>  	msleep(HPSA_POST_RESET_PAUSE_MSECS);
> > >> I know it's complicated with a lot of different devices and fw versions,
> > >> but here^ we wait for 3 s. Isn't the method "wait 3 s, then wait for the
> > >> board to go not ready" a bit fragile? What if a board comes up faster?
> > >> If watching the "driver version" field works, why not use it
> > >> regardless of the reset method used?
> > > The "watching the driver version" thing is only there to catch if
> > > the firmware guys break things and turn the reset into a no-op
> > > (which happened with the PCI power management based reset and we
> > > didn't catch it for a year or so because we didn't have that check)
> > >
> > > We aren't supposed to look at the driver version field (or anything)
> > > until we first verify the scratch pad register says the firmware is
> > > ready.  In the case of those boards that use the "doorbell" reset,
> > > we aren't supposed to look at *anything* for the first five seconds.
> > >
> > > I have been bugging the firmware/hardware guys for a sane reset
> > > procedure that actually works reliably for years with no luck.
> > >
> > > For the SCSI over PCIe driver, being tired of this crap, I simply
> > > unconditionally reset the device on driver load every single time,
> > > and did this from the beginning.  This kind of forced the firmware
> > > and hardware guys to make the reset on that thing work reliably
> > > and quickly, and since I did that from the earliest days, they didn't
> > > have a chance to screw it up without it being caught immediately.
> > > For Smart Array, obviously it's too late for that approach.
> > 
> > OK, my question was really whether this:
> > msleep(HPSA_POST_RESET_PAUSE_MSECS);
> > just before waiting for the board to enter the BOARD_NOT_READY state
> > is dangerous: if the board is already back in the ready state within
> > the first 3 seconds, we will wait indefinitely for the not-ready state.
> > So shouldn't the test for the not-ready state be removed?
> > The mechanism works somehow now, and maybe it's better
> > not to touch it; I just wanted to draw your attention to that
> > potential problem.
> 
> Oh ok, I see.  Thanks, yes that does look questionable.  So you
> are suggesting that we skip the check for the transition from NOT READY
> to READY in the scratch pad register in all cases, since these
> ridiculous delay requirements prevent us from watching the board
> closely enough, which means we may miss such a transition.
> 
> Let me talk it over with Mike Miller, but it seems reasonable.

Is there a resolution on this?  It's holding up the patch series.

James
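The race raised above, and the alternative discussed in the thread (sit out the mandatory quiet period, require only READY, then confirm the reset via the driver version field), can be illustrated with a small user-space simulation. This is a minimal sketch, not hpsa code: `struct sim_board`, `read_scratchpad()`, `wait_for_state()` and `verify_reset()` are hypothetical, a simulated millisecond clock stands in for `msleep()`, and only the BOARD_NOT_READY/BOARD_READY names come from the thread (their values here are assumed). It also assumes the board reports NOT READY from the instant of the reset until a fixed time, then READY.

```c
#include <assert.h>
#include <stdbool.h>

/* State names from the thread; the numeric values are assumptions. */
enum board_state { BOARD_NOT_READY = 0, BOARD_READY = 1 };

/* Hypothetical simulated controller; times are ms since the reset. */
struct sim_board {
	int not_ready_until_ms;      /* NOT READY before this time, READY after */
	bool driver_version_cleared; /* false models a reset that was a no-op */
};

/* Simulated scratch pad read at time now_ms. */
static enum board_state read_scratchpad(const struct sim_board *b, int now_ms)
{
	return now_ms < b->not_ready_until_ms ? BOARD_NOT_READY : BOARD_READY;
}

/*
 * Poll for a state with a deadline instead of looping forever.
 * Returns the simulated time the state was seen, or -1 on timeout.
 */
static int wait_for_state(const struct sim_board *b, int start_ms,
			  enum board_state want, int timeout_ms, int poll_ms)
{
	assert(poll_ms > 0); /* a non-positive step would never terminate */
	for (int t = start_ms; t <= start_ms + timeout_ms; t += poll_ms)
		if (read_scratchpad(b, t) == want)
			return t;
	return -1;
}

/*
 * After a quiet period of quiet_ms (5000 for the doorbell reset), skip
 * the NOT READY check, wait (bounded) for READY, and only then look at
 * the driver version field to confirm the reset really happened.
 * Returns 0 on success, -1 if never ready, -2 if the reset was a no-op.
 */
static int verify_reset(const struct sim_board *b, int quiet_ms,
			int ready_timeout_ms)
{
	if (wait_for_state(b, quiet_ms, BOARD_READY, ready_timeout_ms, 100) < 0)
		return -1;
	if (!b->driver_version_cleared)
		return -2;
	return 0;
}
```

With a board that is NOT READY only for the first 1500 ms, `wait_for_state(&b, 3000, BOARD_NOT_READY, 30000, 100)` returns -1: the fixed 3 s pause has already missed the transition, and only the 30 s bound keeps the poll from spinning forever. `verify_reset(&b, 5000, 30000)` returns 0, because it never depends on observing NOT READY and instead relies on the cleared driver version field to catch a reset that silently did nothing.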




Thread overview: 23+ messages
2013-11-07 16:45 [PATCH 00/11] hpsa: minor fixes and cleanups Stephen M. Cameron
2013-11-07 16:45 ` [PATCH 01/11] hpsa: use workqueue instead of kernel thread for lockup detection Stephen M. Cameron
2013-12-02 18:00   ` James Bottomley
2013-12-04 16:31     ` scameron
2013-12-04 17:42       ` James Bottomley
2013-11-07 16:45 ` [PATCH 02/11] hpsa: do not attempt to flush the cache on locked up controllers Stephen M. Cameron
2013-11-07 16:45 ` [PATCH 03/11] hpsa: add 5 second delay after doorbell reset Stephen M. Cameron
2013-11-08 13:51   ` Tomas Henzl
2013-11-08 14:44     ` scameron
2013-11-08 15:02       ` Tomas Henzl
2013-11-08 15:31         ` scameron
2013-12-01  0:42           ` James Bottomley [this message]
2013-12-02 17:15             ` Mike Miller
2013-12-02 17:23               ` James Bottomley
2013-12-02 17:24                 ` Miller, Mike (OS Dev)
2013-11-07 16:45 ` [PATCH 04/11] hpsa: do not discard scsi status on aborted commands Stephen M. Cameron
2013-11-07 16:45 ` [PATCH 05/11] hpsa: remove unneeded include of seq_file.h Stephen M. Cameron
2013-11-07 16:45 ` [PATCH 06/11] hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl Stephen M. Cameron
2013-11-07 16:46 ` [PATCH 07/11] hpsa: add MSA 2040 to list of external target devices Stephen M. Cameron
2013-11-07 16:46 ` [PATCH 08/11] hpsa: cap CCISS_PASSTHRU at 20 concurrent commands Stephen M. Cameron
2013-11-07 16:46 ` [PATCH 09/11] hpsa: prevent stalled i/o Stephen M. Cameron
2013-11-07 16:46 ` [PATCH 10/11] hpsa: rename scsi prefetch field Stephen M. Cameron
2013-11-07 16:46 ` [PATCH 11/11] hpsa: enable unit attention reporting Stephen M. Cameron
