From: Vivek Goyal <vgoyal@in.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@osdl.org>,
	Neela.Kolli@engenio.com,
	"Miller, Mike (OS Dev)" <Mike.Miller@hp.com>,
	linux-scsi@vger.kernel.org, fastboot@lists.osdl.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
Date: Mon, 26 Jun 2006 14:18:25 -0400
Message-ID: <20060626181825.GI8985@in.ibm.com>
In-Reply-To: <m1veqnst2b.fsf@ebiederm.dsl.xmission.com>
On Mon, Jun 26, 2006 at 11:52:28AM -0600, Eric W. Biederman wrote:
> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
>
> > Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> > spec supports a reset message. Target 0x00 is the controller. We could
> > add this to the init routine to ensure the board is made sane again but
> > this would drastically increase init time under normal circumstances.
>
> Where does the init time penalty come from? How large is the
> init penalty? I suspect it is from waiting for the scsi disks to spin up.
> But I am just guessing in the dark.
>
> > And I suspect this is a hard reset, also. Not sure if that would
> > negatively impact kdump. If there were some condition we could test
> > against and perform the reset when that condition is met it would not
> > impact 99.9% of users.
>
> I am wondering if it is possible to look at the controller and
> see if it is in a bad state, (i.e. in some state besides just coming
> out of reset) and if so issue a reset. If this really is a long operation
> that would be the ideal way to handle it.
>
That's a good question. The MPT Fusion driver already does something like
this. It retrieves the state of the IOC and then checks whether a reset
is needed.
	/*
	 * Check to see if IOC got left/stuck in doorbell handshake
	 * grip of death. If so, hard reset the IOC.
	 */
	if (ioc_state & MPI_DOORBELL_ACTIVE) {
		statefault = 1;
		printk(MYIOC_s_WARN_FMT "Unexpected doorbell active!\n",
				ioc->name);
	}
But then the question is whether all the devices out there provide a way
to query something similar, that is, whether the controller has just come
out of reset or not.
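
To make the idea concrete, here is a rough sketch of a conditional reset
at init time. This is not cciss or MPT Fusion code. The ex_ctlr structure,
the EX_DOORBELL_ACTIVE bit and the ex_* helpers are all made up for
illustration, and the "register" is simulated with a plain struct field.
It only shows the shape: read the state first, and pay for the hard reset
only when the state looks stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit, for illustration only; not from any real spec. */
#define EX_DOORBELL_ACTIVE 0x08000000u

/* Hypothetical controller handle; a real driver would read an MMIO register. */
struct ex_ctlr {
	uint32_t doorbell;	/* simulated doorbell/status register */
	unsigned int resets;	/* number of hard resets issued so far */
};

/* Return the current (simulated) controller state. */
static uint32_t ex_read_state(const struct ex_ctlr *c)
{
	return c->doorbell;
}

/* Simulate a hard reset: clear the stale state and count the reset. */
static void ex_hard_reset(struct ex_ctlr *c)
{
	c->doorbell = 0;
	c->resets++;
}

/*
 * Reset the controller only if it was left in an unexpected state.
 * Returns true if a reset was issued, false if init could skip it.
 */
static bool ex_init_controller(struct ex_ctlr *c)
{
	if (ex_read_state(c) & EX_DOORBELL_ACTIVE) {
		ex_hard_reset(c);
		return true;
	}
	return false;
}
```

With something like this, a normal boot (clean state) never takes the
reset path, while a kdump boot that finds the stale bit set does.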
> If the amount of time is really user noticeable and testing for it
> is impossible then it is probably time to talk kernel command line
> options.
>
> Although it might simply be appropriate to handle commands completing
> you didn't start. I am not at all familiar with that particular piece
> of hardware so I can't make a good guess on what needs to happen there.
>
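
The alternative mentioned above, tolerating completions for commands this
kernel never started, could look roughly like the sketch below. It is
only an illustration under assumptions: the ex_queue structure, the
EX_MAX_CMDS depth and the tag scheme are hypothetical and do not describe
how cciss actually tracks commands.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_CMDS 8	/* hypothetical queue depth */

/* Hypothetical per-controller bookkeeping of outstanding commands. */
struct ex_queue {
	bool in_flight[EX_MAX_CMDS];	/* true if this kernel issued the tag */
	unsigned int spurious;		/* completions we never started */
};

/* Record that this kernel issued a command with the given tag. */
static bool ex_submit(struct ex_queue *q, uint32_t tag)
{
	if (tag >= EX_MAX_CMDS || q->in_flight[tag])
		return false;
	q->in_flight[tag] = true;
	return true;
}

/*
 * Handle a completion reported by the controller.
 * Returns true if it matched a command we issued; otherwise the
 * completion is counted as spurious and dropped instead of being
 * treated as a fatal error.
 */
static bool ex_complete(struct ex_queue *q, uint32_t tag)
{
	if (tag >= EX_MAX_CMDS || !q->in_flight[tag]) {
		q->spurious++;
		return false;
	}
	q->in_flight[tag] = false;
	return true;
}
```

Whether dropping such completions is safe depends on the hardware, which
is exactly the open question here.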
> > Thoughts, comments, flames?
>
> Good question.
>
> It is a bit of a pain but not too hard to setup a test environment
> so you can reproduce this if you are interested. Vivek should
> be the authority there.
>
Mike, I have a setup ready. I have a Compaq Smart Array 5300 controller
and can reproduce this issue consistently. I don't know much about this
device. Could you post a patch that resets the device during
initialization? I can test the fix and provide you with more data.
Thanks
Vivek