[RFC] [PATCH 1/2] introduce crashboot kernel command line parameter

public inbox for linux-kernel@vger.kernel.org

* [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter
@ 2006-06-23 21:01 Vivek Goyal
  2006-06-23 21:04 ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Vivek Goyal
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-23 21:01 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Fastboot mailing list, Linux SCSI Mailing list, Eric W. Biederman,
	Morton Andrew Morton, mike.miller


o Add kernel command line option "crashboot"

o This option is an indication to the kernel that it is booting in an
  unreliable environment where BIOS execution has possibly been skipped
  and devices are left operational or in an unknown state.

o The kernel, especially device drivers, can use this option to take special
  actions such as soft-resetting the device or relaxing some of the rules,
  to make sure the kernel can boot and device drivers can initialize in this
  environment.

o As of today this option is useful to Kdump. Kdump will pass this option
  to the second kernel to improve the reliability of kernel boot and
  device driver initialization.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 linux-2.6.17-1M-vivek/Documentation/kernel-parameters.txt |    9 ++++++
 linux-2.6.17-1M-vivek/include/linux/init.h                |    1 
 linux-2.6.17-1M-vivek/init/main.c                         |   19 ++++++++++++++
 3 files changed, 29 insertions(+)

diff -puN Documentation/kernel-parameters.txt~introduce-crash-boot-command-line-parameter Documentation/kernel-parameters.txt
--- linux-2.6.17-1M/Documentation/kernel-parameters.txt~introduce-crash-boot-command-line-parameter	2006-06-23 10:54:39.000000000 -0400
+++ linux-2.6.17-1M-vivek/Documentation/kernel-parameters.txt	2006-06-23 13:47:28.000000000 -0400
@@ -405,6 +405,15 @@ running once the system is up.
 			[KNL] Reserve a chunk of physical memory to
 			hold a kernel to switch to with kexec on panic.
 
+	crashboot	[KNL] Use this if kernel is booting in a potentially
+			unreliable environment. For ex. kdump, where new
+			kernel is booting after the first kernel crash and
+			BIOS has been skipped and devices are in unknown
+			state.
+
+			Device drivers might soft reset the devices before
+			doing further device initialization.
+
 	cs4232=		[HW,OSS]
 			Format: <io>,<irq>,<dma>,<dma2>,<mpuio>,<mpuirq>
 
diff -puN init/main.c~introduce-crash-boot-command-line-parameter init/main.c
--- linux-2.6.17-1M/init/main.c~introduce-crash-boot-command-line-parameter	2006-06-23 13:14:49.000000000 -0400
+++ linux-2.6.17-1M-vivek/init/main.c	2006-06-23 13:34:34.000000000 -0400
@@ -121,6 +121,17 @@ char saved_command_line[COMMAND_LINE_SIZ
 static char *execute_command;
 static char *ramdisk_execute_command;
 
+/*
+ * If set, indicates that kernel is booting in an unreliable environment.
+ * For ex. kdump situation where previous kernel has crashed, BIOS has been
+ * skipped and devices will be in unknown state.
+ *
+ * Device drivers can use it to know that underlying device is in unknown
+ * state and might even be finishing commands issued from previous kernel's
+ * context.
+ */
+unsigned int crash_boot;
+
 /* Setup configured maximum number of CPUs to activate */
 static unsigned int max_cpus = NR_CPUS;
 
@@ -150,6 +161,14 @@ static int __init maxcpus(char *str)
 
 __setup("maxcpus=", maxcpus);
 
+static int __init set_crash_boot(char *str)
+{
+	crash_boot = 1;
+	return 1;
+}
+
+__setup("crashboot", set_crash_boot);
+
 static char * argv_init[MAX_INIT_ARGS+2] = { "init", NULL, };
 char * envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, };
 static const char *panic_later, *panic_param;
diff -puN include/linux/init.h~introduce-crash-boot-command-line-parameter include/linux/init.h
--- linux-2.6.17-1M/include/linux/init.h~introduce-crash-boot-command-line-parameter	2006-06-23 15:00:25.000000000 -0400
+++ linux-2.6.17-1M-vivek/include/linux/init.h	2006-06-23 15:00:41.000000000 -0400
@@ -69,6 +69,7 @@ extern initcall_t __security_initcall_st
 
 /* Defined in init/main.c */
 extern char saved_command_line[];
+extern unsigned int crash_boot;
 
 /* used by init/main.c */
 extern void setup_arch(char **);
_
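
A minimal user-space sketch of the flag handling in the patch above, for
readers who want to experiment outside the kernel. It assumes a simplified
matching rule (the bare word must appear as a whole space-separated token);
the kernel's own __setup() matching may behave differently. The helper names
cmdline_has_flag() and parse_cmdline() are hypothetical and are not part of
the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's crash_boot flag. */
static unsigned int crash_boot;

/*
 * Return true if the bare word `opt` appears as a whole
 * space-separated token in `cmdline`.
 */
static bool cmdline_has_flag(const char *cmdline, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return true;
		p += len;	/* keep scanning after this partial match */
	}
	return false;
}

/* Mirrors set_crash_boot(): presence of the token sets the flag. */
static void parse_cmdline(const char *cmdline)
{
	crash_boot = cmdline_has_flag(cmdline, "crashboot") ? 1 : 0;
}
```

With this sketch, a command line such as "root=/dev/sda1 crashboot maxcpus=1"
sets the flag, while "nocrashboot" does not.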

* [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-23 21:01 [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Vivek Goyal
@ 2006-06-23 21:04 ` Vivek Goyal
  2006-06-24  6:55   ` Andrew Morton
  2006-06-23 21:30 ` [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Bernd Eckenfels
  2006-06-24  6:55 ` Andrew Morton
  2 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-23 21:04 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Fastboot mailing list, Linux SCSI Mailing list, Eric W. Biederman,
	Morton Andrew Morton, mike.miller


o cciss driver initialization fails and hits BUG() if underlying device
  was active during the driver initialization. Device might be active
  if previous kernel crashed and this kernel is booting after that using
  kdump.

kernel BUG at drivers/block/cciss.c:2141!
invalid opcode: 0000 [#1]
last sysfs file: /class/sound/timer/dev
CPU:    0
EIP:    0060:[<c59b9ab1>]    Not tainted VLI
EFLAGS: 00010296   (2.6.16-1.2122_FC5kdump #1)
EIP is at sendcmd+0x261/0x29c [cciss]
eax: 00000045   ebx: c3300000   ecx: c36b1cf8   edx: c59bd5c5
esi: 00000000   edi: 00001388   ebp: c3ad1400   esp: c36b1d00
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 687, threadinfo=c36b1000 task=c3760aa0)
Stack: <0>00246dc0 c14aef40 00000012 00000000 c4f00480 00000000 c4f00508 c14aef40
       c59b9da0 00000024 00000000 00000000 00000000 00000000 00000000 c1006ece
       00000246 c35e2c20 000000d0 c4fddc00 c36b1db0 c3ad1400 c12b2570 c4fddc00
Call Trace:
 [<c59b9da0>] cciss_getgeometry+0xa9/0x24a [cciss]
 [<c1006ece>] dma_alloc_coherent+0xb1/0xec
 [<c12b2570>] setup_IO_APIC+0x23/0xcb5
 [<c59bc07e>] cciss_init_one+0x5cd/0x9e7 [cciss]
 [<c10cdd6b>] pci_match_device+0x13/0xb3
 [<c11284ec>] __driver_attach+0x0/0x8b
 [<c10cde57>] pci_device_probe+0x36/0x57
 [<c1128439>] driver_probe_device+0x42/0x8b
 [<c112854f>] __driver_attach+0x63/0x8b
 [<c1127f35>] bus_for_each_dev+0x33/0x55
 [<c112839d>] driver_attach+0x11/0x13
 [<c11284ec>] __driver_attach+0x0/0x8b
 [<c1127c56>] bus_add_driver+0x64/0xfd
 [<c10cdfe6>] __pci_register_driver+0x7f/0xa1
 [<c1031ab2>] sys_init_module+0x1382/0x1514
 [<c11e0316>] do_page_fault+0x17d/0x5db
 [<c1002be9>] syscall_call+0x7/0xb
Code: 00 8b b8 d4 03 00 00 81 ff 81 01 00 00 7e 0f 56 68 0d d6 9b c5 e8 53 28 66 fb 58 5a eb 0d 8b 42 04 89 0c b8 ff 02 e9 72 fe ff ff <0f> 0b 5d 08 a0 d2 9b c5 e9 65 fe ff ff 83 bd d4 03 00 00 00 7e

o If crash_boot parameter is set, then ignore the completed command messages
  sent by device which have not been issued in the context of this kernel.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 linux-2.6.17-1M-vivek/drivers/block/cciss.c |    7 +++++++
 1 files changed, 7 insertions(+)

diff -puN drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix drivers/block/cciss.c
--- linux-2.6.17-1M/drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix	2006-06-23 14:04:55.000000000 -0400
+++ linux-2.6.17-1M-vivek/drivers/block/cciss.c	2006-06-23 14:08:12.000000000 -0400
@@ -1976,6 +1976,13 @@ static int add_sendcmd_reject(__u8 cmd, 
 			ctlr, complete);
 		/* not much we can do. */
 #ifdef CONFIG_CISS_SCSI_TAPE
+		/* We might get notification of completion of commands
+		 * which we never issued in this kernel if this boot is
+		 * taking place after previous kernel's crash. Simply
+		 * ignore the commands in this case.
+		 */
+		if (crash_boot)
+			return 0;
 		return 1;
 	}
 
_
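
The idea in this patch (ignore completions for commands that this kernel
never issued, but only when booting after a crash) can be sketched in user
space as follows. This is an illustration under stated assumptions, not the
cciss code: struct issued_cmds, was_issued() and classify_completion() are
hypothetical names, and commands are identified by a plain tag value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's crash_boot flag. */
static unsigned int crash_boot;

#define MAX_ISSUED 8

/* Tags of the commands this (simulated) kernel has issued so far. */
struct issued_cmds {
	unsigned long tags[MAX_ISSUED];
	size_t count;
};

enum completion_action {
	COMPLETION_OK,		/* matches a command we issued */
	COMPLETION_IGNORE,	/* stale completion from the crashed kernel */
	COMPLETION_ERROR	/* unexpected completion on a normal boot */
};

static bool was_issued(const struct issued_cmds *c, unsigned long tag)
{
	for (size_t i = 0; i < c->count; i++)
		if (c->tags[i] == tag)
			return true;
	return false;
}

/* Decide what to do with a completion the device has just reported. */
static enum completion_action
classify_completion(const struct issued_cmds *c, unsigned long tag)
{
	if (was_issued(c, tag))
		return COMPLETION_OK;
	if (crash_boot)
		return COMPLETION_IGNORE;
	return COMPLETION_ERROR;
}
```

On a normal boot an unknown tag is still treated as an error, so the relaxed
behaviour is confined to the crash-boot case.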

* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-23 21:04 ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Vivek Goyal
@ 2006-06-24  6:55   ` Andrew Morton
  2006-06-24 11:19     ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Andrew Morton @ 2006-06-24  6:55 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, fastboot, linux-scsi, ebiederm, mike.miller

On Fri, 23 Jun 2006 17:04:24 -0400
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> 
> o cciss driver initialization fails and hits BUG() if underlying device
>   was active during the driver initialization. Device might be active
>   if previous kernel crashed and this kernel is booting after that using
>   kdump.
> 
>
> ...
>
> o If crash_boot parameter is set, then ignore the completed command messages
>   sent by device which have not been issued in the context of this kernel.
> 
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> ---
> 
>  linux-2.6.17-1M-vivek/drivers/block/cciss.c |    7 +++++++
>  1 files changed, 7 insertions(+)
> 
> diff -puN drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix drivers/block/cciss.c
> --- linux-2.6.17-1M/drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix	2006-06-23 14:04:55.000000000 -0400
> +++ linux-2.6.17-1M-vivek/drivers/block/cciss.c	2006-06-23 14:08:12.000000000 -0400
> @@ -1976,6 +1976,13 @@ static int add_sendcmd_reject(__u8 cmd, 
>  			ctlr, complete);
>  		/* not much we can do. */
>  #ifdef CONFIG_CISS_SCSI_TAPE
> +		/* We might get notification of completion of commands
> +		 * which we never issued in this kernel if this boot is
> +		 * taking place after previous kernel's crash. Simply
> +		 * ignore the commands in this case.
> +		 */
> +		if (crash_boot)
> +			return 0;
>  		return 1;

Looks like this is working around a driver problem rather than fixing it
properly ;)


* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24  6:55   ` Andrew Morton
@ 2006-06-24 11:19     ` Vivek Goyal
  2006-06-24 11:30       ` Andrew Morton
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-24 11:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, fastboot, linux-scsi, ebiederm, mike.miller

On Fri, Jun 23, 2006 at 11:55:53PM -0700, Andrew Morton wrote:
> On Fri, 23 Jun 2006 17:04:24 -0400
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
> 
> > 
> > o cciss driver initialization fails and hits BUG() if underlying device
> >   was active during the driver initialization. Device might be active
> >   if previous kernel crashed and this kernel is booting after that using
> >   kdump.
> > 
> >
> > ...
> >
> > o If crash_boot parameter is set, then ignore the completed command messages
> >   sent by device which have not been issued in the context of this kernel.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> > ---
> > 
> >  linux-2.6.17-1M-vivek/drivers/block/cciss.c |    7 +++++++
> >  1 files changed, 7 insertions(+)
> > 
> > diff -puN drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix drivers/block/cciss.c
> > --- linux-2.6.17-1M/drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix	2006-06-23 14:04:55.000000000 -0400
> > +++ linux-2.6.17-1M-vivek/drivers/block/cciss.c	2006-06-23 14:08:12.000000000 -0400
> > @@ -1976,6 +1976,13 @@ static int add_sendcmd_reject(__u8 cmd, 
> >  			ctlr, complete);
> >  		/* not much we can do. */
> >  #ifdef CONFIG_CISS_SCSI_TAPE
> > +		/* We might get notification of completion of commands
> > +		 * which we never issued in this kernel if this boot is
> > +		 * taking place after previous kernel's crash. Simply
> > +		 * ignore the commands in this case.
> > +		 */
> > +		if (crash_boot)
> > +			return 0;
> >  		return 1;
> 
> Looks like this is working around a driver problem rather than fixing it
> properly ;)

That's true. It's more of a workaround for the problem. I think in all
such cases we should soft reset the device so that the device drops the
messages issued from the context of the previous kernel and starts afresh.

But it looks like not all devices provide a software reset facility
(or I can't find it in the source code or the limited documentation
available). Mike, can I soft reset this device?

I am facing a similar problem in the megaraid driver as well, where detailed
technical documentation is not available and I can't find a way to
soft reset the device.

Or is there a generic way to handle these situations? Fixing them driver
by driver is a long, painful process.

Thanks
Vivek

* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24 11:19     ` Vivek Goyal
@ 2006-06-24 11:30       ` Andrew Morton
  2006-06-24 12:08         ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Andrew Morton @ 2006-06-24 11:30 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, fastboot, linux-scsi, ebiederm, mike.miller

On Sat, 24 Jun 2006 07:19:54 -0400
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> On Fri, Jun 23, 2006 at 11:55:53PM -0700, Andrew Morton wrote:
> > On Fri, 23 Jun 2006 17:04:24 -0400
> > Vivek Goyal <vgoyal@in.ibm.com> wrote:
> > 
> > > 
> > > o cciss driver initialization fails and hits BUG() if underlying device
> > >   was active during the driver initialization. Device might be active
> > >   if previous kernel crashed and this kernel is booting after that using
> > >   kdump.
> > > 
> > >
> > > ...
> > >
> > > o If crash_boot parameter is set, then ignore the completed command messages
> > >   sent by device which have not been issued in the context of this kernel.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> > > ---
> > > 
> > >  linux-2.6.17-1M-vivek/drivers/block/cciss.c |    7 +++++++
> > >  1 files changed, 7 insertions(+)
> > > 
> > > diff -puN drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix drivers/block/cciss.c
> > > --- linux-2.6.17-1M/drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix	2006-06-23 14:04:55.000000000 -0400
> > > +++ linux-2.6.17-1M-vivek/drivers/block/cciss.c	2006-06-23 14:08:12.000000000 -0400
> > > @@ -1976,6 +1976,13 @@ static int add_sendcmd_reject(__u8 cmd, 
> > >  			ctlr, complete);
> > >  		/* not much we can do. */
> > >  #ifdef CONFIG_CISS_SCSI_TAPE
> > > +		/* We might get notification of completion of commands
> > > +		 * which we never issued in this kernel if this boot is
> > > +		 * taking place after previous kernel's crash. Simply
> > > +		 * ignore the commands in this case.
> > > +		 */
> > > +		if (crash_boot)
> > > +			return 0;
> > >  		return 1;
> > 
> > Looks like this is working around a driver problem rather than fixing it
> > properly ;)
> 
> That's true. Its more of a working around the problem. I think in all
> such cases we should soft reset the device so that device drops the messages
> issued from the context of previous kernel and starts afresh.

Sounds good.

> But looks like not all the devices provide software reset facility
> (Or I can't find it out from the source code or limited documentation
> available). Mike, can I soft reset this device?
> 
> I am facing similar problem in megaraid driver as well where detailed
> technical documentation is not available and I can't find a way to
> soft reset the device.

Megaraid has a maintainer who has documents and hardware engineers.

> Or is there a generic way to handle these situations? Fixing them driver
> by driver is a long painful process. 

Some generic way of whacking a PCI device via the standard PCI registers? 
Not that I know of.

* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24 11:30       ` Andrew Morton
@ 2006-06-24 12:08         ` Vivek Goyal
  2006-06-24 17:13           ` Eric W. Biederman
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-24 12:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, fastboot, linux-scsi, ebiederm, mike.miller,
	Neela.Kolli

On Sat, Jun 24, 2006 at 04:30:46AM -0700, Andrew Morton wrote:

[..]
> > > > 
> > > > diff -puN drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix drivers/block/cciss.c
> > > > --- linux-2.6.17-1M/drivers/block/cciss.c~cciss-initialization-issue-over-kdump-fix	2006-06-23 14:04:55.000000000 -0400
> > > > +++ linux-2.6.17-1M-vivek/drivers/block/cciss.c	2006-06-23 14:08:12.000000000 -0400
> > > > @@ -1976,6 +1976,13 @@ static int add_sendcmd_reject(__u8 cmd, 
> > > >  			ctlr, complete);
> > > >  		/* not much we can do. */
> > > >  #ifdef CONFIG_CISS_SCSI_TAPE
> > > > +		/* We might get notification of completion of commands
> > > > +		 * which we never issued in this kernel if this boot is
> > > > +		 * taking place after previous kernel's crash. Simply
> > > > +		 * ignore the commands in this case.
> > > > +		 */
> > > > +		if (crash_boot)
> > > > +			return 0;
> > > >  		return 1;
> > > 
> > > Looks like this is working around a driver problem rather than fixing it
> > > properly ;)
> > 
> > That's true. Its more of a working around the problem. I think in all
> > such cases we should soft reset the device so that device drops the messages
> > issued from the context of previous kernel and starts afresh.
> 
> Sounds good.
> 
> > But looks like not all the devices provide software reset facility
> > (Or I can't find it out from the source code or limited documentation
> > available). Mike, can I soft reset this device?
> > 
> > I am facing similar problem in megaraid driver as well where detailed
> > technical documentation is not available and I can't find a way to
> > soft reset the device.
> 
> Megaraid has a maintainer who has documents and hardware engineers.
> 

Well, the maintainer mentioned that they do not release more documents than
what is available on the LSI site. That site contains product specifications,
installation guides, user guides, etc., but not a technical document that
gives insight into the various registers and what a driver writer
can do with the device.

I have also sent mails regarding my problem to the linux-scsi list as well
as to people working on megaraid, but got no response. :-(


> > Or is there a generic way to handle these situations? Fixing them driver
> > by driver is a long painful process. 
> 
> Some generic way of whacking a PCI device via the standard PCI registers? 
> Not that I know of.

Somebody hinted at a PCI bus reset. But I think a PCI bus reset will
require firmware/BIOS to export a hook for software to initiate the
reset, and I don't think many platforms do that. In fact, I am not even
aware of one platform that does.

Thanks
Vivek  

* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24 12:08         ` Vivek Goyal
@ 2006-06-24 17:13           ` Eric W. Biederman
  2006-06-26  2:11             ` [Fastboot] " Maneesh Soni
  2006-06-26  9:09             ` Horms
  0 siblings, 2 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-24 17:13 UTC (permalink / raw)
  To: vgoyal
  Cc: Andrew Morton, linux-kernel, fastboot, linux-scsi, mike.miller,
	Neela.Kolli

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Sat, Jun 24, 2006 at 04:30:46AM -0700, Andrew Morton wrote:
>
>> > Or is there a generic way to handle these situations? Fixing them driver
>> > by driver is a long painful process. 
>> 
>> Some generic way of whacking a PCI device via the standard PCI registers? 
>> Not that I know of.
>
> Somebody hinted at a PCI bus reset. But I think a PCI bus reset will
> require firmware/BIOS to export a hook for software to initiate the
> reset, and I don't think many platforms do that. In fact, I am not even
> aware of one platform that does.

Not all pci busses support it but there is a standard pci bus reset bit
in pci bridges.
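
As a rough illustration of the bridge reset bit mentioned above: the Bridge
Control register of a PCI-to-PCI bridge sits at config offset 0x3e, and its
Secondary Bus Reset bit is 0x40 (PCI_BRIDGE_CONTROL and
PCI_BRIDGE_CTL_BUS_RESET in the kernel's pci_regs.h). The sketch below
toggles that bit in a simulated config space; struct fake_bridge and the
cfg_*/bridge_* helpers are hypothetical names, and no real hardware is
touched.

```c
#include <stdint.h>

#define PCI_BRIDGE_CONTROL       0x3e	/* 16-bit Bridge Control register */
#define PCI_BRIDGE_CTL_BUS_RESET 0x40	/* Secondary Bus Reset bit */

/* Simulated 256-byte configuration space of one PCI-to-PCI bridge. */
struct fake_bridge {
	uint8_t cfg[256];
};

/* Config space is little-endian; read a 16-bit register. */
static uint16_t cfg_read16(const struct fake_bridge *b, unsigned int off)
{
	return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

static void cfg_write16(struct fake_bridge *b, unsigned int off, uint16_t v)
{
	b->cfg[off] = (uint8_t)(v & 0xff);
	b->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Set the reset bit: devices behind the bridge are held in reset. */
static void bridge_assert_reset(struct fake_bridge *b)
{
	uint16_t ctl = cfg_read16(b, PCI_BRIDGE_CONTROL);

	cfg_write16(b, PCI_BRIDGE_CONTROL, ctl | PCI_BRIDGE_CTL_BUS_RESET);
}

/* Clear the reset bit, leaving all other control bits untouched. */
static void bridge_deassert_reset(struct fake_bridge *b)
{
	uint16_t ctl = cfg_read16(b, PCI_BRIDGE_CONTROL);

	cfg_write16(b, PCI_BRIDGE_CONTROL,
		    ctl & (uint16_t)~PCI_BRIDGE_CTL_BUS_RESET);
}
```

Real code would also need a delay between asserting and deasserting the bit,
and would have to restore the configuration of every device behind the
bridge afterwards; both are left out of this sketch.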

I don't know if it would help but it might make sense to have a config
option that can be used to mark drivers that are known to have problems,
in these scenarios.

CONFIG_BRITTLE_INIT perhaps?

It would at least make it easier for people to see which drivers
they don't want to use, and give people some incentive to fix things.

Eric

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24 17:13           ` Eric W. Biederman
@ 2006-06-26  2:11             ` Maneesh Soni
  2006-06-26 13:35               ` Vivek Goyal
  2006-06-26  9:09             ` Horms
  1 sibling, 1 reply; 38+ messages in thread
From: Maneesh Soni @ 2006-06-26  2:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

On Sat, Jun 24, 2006 at 11:13:44AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
> 
> > On Sat, Jun 24, 2006 at 04:30:46AM -0700, Andrew Morton wrote:
> >
> >> > Or is there a generic way to handle these situations? Fixing them driver
> >> > by driver is a long painful process. 
> >> 
> >> Some generic way of whacking a PCI device via the standard PCI registers? 
> >> Not that I know of.
> >
> > Somebody hinted at a PCI bus reset. But I think a PCI bus reset will
> > require firmware/BIOS to export a hook for software to initiate the
> > reset, and I don't think many platforms do that. In fact, I am not even
> > aware of one platform that does.
> 
> Not all pci busses support it but there is a standard pci bus reset bit
> in pci bridges.
> 
> I don't know if it would help but it might make sense to have a config
> option that can be used to mark drivers that are known to have problems,
> in these scenarios.
> 
> CONFIG_BRITTLE_INIT perhaps?
> 
> It would at least make it easier for people to see which drivers
> they don't want to use, and give people some incentive to fix things.
> 

Vivek, 

I think having something like what Eric suggested instead of crashboot= is
better. We can have this config option set for a kernel such as the dump
capture kernel (CONFIG_CRASH_DUMP=y). This should save some bytes on the
already longish kdump kernel boot parameters.

Thanks
Maneesh

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26  2:11             ` [Fastboot] " Maneesh Soni
@ 2006-06-26 13:35               ` Vivek Goyal
  2006-06-26 14:17                 ` Eric W. Biederman
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 13:35 UTC (permalink / raw)
  To: Maneesh Soni
  Cc: Eric W. Biederman, Andrew Morton, Neela.Kolli, linux-scsi,
	mike.miller, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
> On Sat, Jun 24, 2006 at 11:13:44AM -0600, Eric W. Biederman wrote:
> > Vivek Goyal <vgoyal@in.ibm.com> writes:
> > 
> > > On Sat, Jun 24, 2006 at 04:30:46AM -0700, Andrew Morton wrote:
> > >
> > >> > Or is there a generic way to handle these situations? Fixing them driver
> > >> > by driver is a long painful process. 
> > >> 
> > >> Some generic way of whacking a PCI device via the standard PCI registers? 
> > >> Not that I know of.
> > >
> > > Somebody hinted at a PCI bus reset. But I think a PCI bus reset will
> > > require firmware/BIOS to export a hook for software to initiate the
> > > reset, and I don't think many platforms do that. In fact, I am not even
> > > aware of one platform that does.
> > 
> > Not all pci busses support it but there is a standard pci bus reset bit
> > in pci bridges.
> > 
> > I don't know if it would help but it might make sense to have a config
> > option that can be used to mark drivers that are known to have problems,
> > in these scenarios.
> > 
> > CONFIG_BRITTLE_INIT perhaps?
> > 
> > It would at least make it easier for people to see which drivers
> > they don't want to use, and give people some incentive to fix things.
> > 
> 
> Vivek, 
> 
> I think having something like what Eric suggested instead of crashboot= is
> better. We can have this config option set for a kernel such as the dump
> capture kernel (CONFIG_CRASH_DUMP=y). This should save some bytes on the
> already longish kdump kernel boot parameters.
> 

Maneesh, keeping this code under a config option becomes a problem once we
have a relocatable kernel. At some point we have got to have a
relocatable kernel so that people don't have to build two kernels. In fact,
this is becoming a pain point for distros. That's the reason I thought
of making it a command line parameter.

I remember that a few months back Eric mentioned that he had patches for a
relocatable kernel ready for review for i386 and x86_64. Eric, do you have
any plans to post the patches for review?

Thanks
Vivek 

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 13:35               ` Vivek Goyal
@ 2006-06-26 14:17                 ` Eric W. Biederman
  2006-06-26 15:32                   ` Vivek Goyal
  2006-06-27  2:42                   ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Horms
  0 siblings, 2 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 14:17 UTC (permalink / raw)
  To: vgoyal
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
>
> Maneesh, Keeping this code under a config option becomes a problem when we
> will have a relocatable kernel. At some point of time we got to have
> relocatable kernel so that people don't have to build two kernels. In fact
> this is becoming a pain area for distros. That's the reason I thought
> of making it a command line parameter.

Ok. Even if we do this with a command line, we need to have a clean concept.
If the concept is "ignore devices with a brittle init routine", that is
comprehensible and potentially useful for reasons other than crash dumps.

If the concept is crashdump, it is a poorly defined concept and all of Andrew's
objections apply.

> I remember few months back, Eric had mentioned that he has got patches for
> relocatable kernel ready for review for i386 and x86_64. Eric, do you have
> any plans to post the patches for review?

I have some code that I keep intending to get to.  It has probably bit rotted
since I wrote it, but it shouldn't be too bad to clean up.
Unfortunately the whole crashdump thing is fairly low on my priority list.

Although I suspect a relocatable kernel is actually easier than the more
important task of moving IRQ initialization into init_IRQ on x86 and x86_64.

At least I have managed to remove 3 layers of indirection in the x86_64 irq
handling code recently :)

Eric

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 14:17                 ` Eric W. Biederman
@ 2006-06-26 15:32                   ` Vivek Goyal
  2006-06-26 16:00                     ` Eric W. Biederman
  2006-06-27  2:42                   ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Horms
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 15:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

On Mon, Jun 26, 2006 at 08:17:27AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
> 
> > On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
> >
> > Maneesh, Keeping this code under a config option becomes a problem when we
> > will have a relocatable kernel. At some point of time we got to have
> > relocatable kernel so that people don't have to build two kernels. In fact
> > this is becoming a pain area for distros. That's the reason I thought
> > of making it a command line parameter.
> 
> Ok. Even if we do this with a command line, we need to have a clean concept.
> If the concept is ignore devices with a brittle init routine that is comprehensible
> and potentially useful for other reasons than crash dumps.
> 

Looks like there are two problems to be solved.

- Framework/capability to mark and isolate the drivers, either at compile
  time or run time, which are not hardened enough to initialize properly
  when the underlying device is in operational or in unknown state.

- Actually hardening a driver to be able to initialize in a potentially
  unreliable environment.  

Solving the first problem will help more in terms of people knowing in advance
that certain drivers are known to have problems in specific environments, and
a user has the option of skipping the execution/compilation of those
drivers. (This is something close to what CONFIG_EXPERIMENTAL does.)

Second problem deals more with actually hardening the driver and not
skipping its compilation/execution.

I think people would like to change a driver's behaviour at run time.
For example, if they are booting in an unreliable environment they would
like to reset the device; otherwise they would skip that, as the BIOS has
already done it for them.

But it looks like not all devices have the capability to be reset from
software. In those cases one probably needs to put in some hooks, relax
the driver's consistency checks, etc. in the special boot environment.

Here I am trying to solve the second problem so that a driver comes to 
know that it is initializing in a special boot environment and it can
modify its behavior at run time. 
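
The run-time choice described above might look roughly like this in a
driver's init path. It is a user-space sketch under stated assumptions:
struct fake_dev, its fields and driver_init() are hypothetical, and "reset"
and "relaxed checks" are modelled as plain booleans rather than real
hardware operations.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's crash_boot flag. */
static unsigned int crash_boot;

/* Simulated device; all fields are illustrative only. */
struct fake_dev {
	bool can_soft_reset;	/* hardware offers a software reset */
	bool was_reset;		/* init performed a soft reset */
	bool strict_checks;	/* init enforces its consistency checks */
};

static void driver_init(struct fake_dev *d)
{
	d->was_reset = false;
	d->strict_checks = true;

	/* Normal boot: firmware left the device in a known state. */
	if (!crash_boot)
		return;

	if (d->can_soft_reset)
		d->was_reset = true;	   /* drop state left by the old kernel */
	else
		d->strict_checks = false;  /* tolerate stale device activity */
}
```

The point of the sketch is that the special handling is taken only when the
flag is set, so a normal boot keeps its strict behaviour.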

> If the concept is crashdump it is a poorly defined concept and all of Andrews
> objections apply.
> 

I think this parameter is generic enough and not limited to crashdumps.
If a user decides to implement a different scheme than kdump in kexec
on panic and boot a customized kernel, he can very well use this
parameter to make sure that next kernel is able to at least boot and not
panic in between.

Solving first problem will help in doing a plain kexec. We can simply
mark the drivers known to have problems and slowly people can fix those
drivers. Fixing the driver in this case is different because most likely
driver authors will provide a shutdown routine in the driver so that
device can be shutdown and then one can boot into the second kernel.  
Till then a user can happily skip the drivers known to have problems.

In summary, we have two problems to solve. Currently I am focused on
solving the second problem, which makes it possible to boot a kernel in an
unreliable environment, do some minimal specific operation, and then boot
back to the regular kernel. So I think just introducing a command line
parameter which drivers can use to determine that they are initializing in a
special environment solves it and is generic enough.

Options like CONFIG_BRITTLE_INIT, or skipping execution of a brittle driver
based on a command line option, seem to be the solution for the first
problem.
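
For the first problem, a run-time variant of the "mark and skip" idea could
be sketched as below. Nothing here exists in the kernel: struct fake_driver,
the brittle_init mark and init_drivers() are hypothetical, and the driver
names used with it are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver descriptor carrying a "brittle init" mark. */
struct fake_driver {
	const char *name;
	bool brittle_init;	/* known to misbehave on an active device */
	int (*init)(void);
};

/* Counts how many init routines actually ran. */
static int init_calls;

static int fake_init(void)
{
	init_calls++;
	return 0;
}

/*
 * Run each driver's init routine, skipping marked drivers when
 * skip_brittle is set. Returns the number of drivers initialized.
 */
static size_t init_drivers(const struct fake_driver *drv, size_t n,
			   bool skip_brittle)
{
	size_t started = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_brittle && drv[i].brittle_init)
			continue;
		if (drv[i].init() == 0)
			started++;
	}
	return started;
}
```

A compile-time option such as the proposed CONFIG_BRITTLE_INIT would instead
leave the marked drivers out of the build altogether.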

Please correct me if I am wrong. I know little about drivers.

> > I remember few months back, Eric had mentioned that he has got patches for
> > relocatable kernel ready for review for i386 and x86_64. Eric, do you have
> > any plans to post the patches for review?
> 
> I have some code that I keep intending to get to.  It has probably bit rotted
> since I wrote it, but it shouldn't be too bad to clean up.
> Unfortunately the whole crashdump thing is fairly low on my priority list.
> 

I am willing to work on it. Building from scratch always takes more time.
If you are willing, I will be more than happy to build on top of your patches.

Thanks
Vivek

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 15:32                   ` Vivek Goyal
@ 2006-06-26 16:00                     ` Eric W. Biederman
  2006-06-26 16:13                       ` Miller, Mike (OS Dev)
  2006-06-26 17:16                       ` Vivek Goyal
  0 siblings, 2 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 16:00 UTC (permalink / raw)
  To: vgoyal
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Jun 26, 2006 at 08:17:27AM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@in.ibm.com> writes:
>> 
>> > On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
>> >
>> > Maneesh, Keeping this code under a config option becomes a problem when we
>> > will have a relocatable kernel. At some point of time we got to have
>> > relocatable kernel so that people don't have to build two kernels. In fact
>> > this is becoming a pain area for distros. That's the reason I thought
>> > of making it a command line parameter.
>> 
>> Ok. Even if we do this with a command line, we need to have a clean concept.
>> If the concept is ignore devices with a brittle init routine that is
>> comprehensible and potentially useful for other reasons than crash dumps.
>> 
>
> Looks like there are two problems to be solved.
>
> - Framework/capability to mark and isolate the drivers, either at compile
>   time or run time, which are not hardened enough to initialize properly
>   when the underlying device is in operational or in unknown state.
>
> - Actually hardening a driver to be able to initialize in a potentially
>   unreliable environment.  
>
>
> Solving first problem will help more in terms of people knowing in advance
> that certain drivers are known to have problems in specific environment and
> a user has got the option of skipping the execution/compilation of those
> drivers. (This is something close to what CONFIG_EXPERIMENTAL does)
>
> Second problem deals more with actually hardening the driver and not
> skipping its compilation/execution.
>
> I think people would like to change a driver's behaviour at run time.
> For example if they are booting in a unreliable environment they would
> like to reset the device otherwise they would skip that as BIOS has
> already done that for them. 

In the general case the device reset does not hurt.  Yes there
is the case of the slow scsi probe.  But a lot of that appears
to be a poor implementation of the scsi probe.  So I can see a kernel
command line option to play fast and loose but we should be safe
and thorough by default.

The more code paths you introduce the harder code is to maintain
and test.  The earlier discussion suggested you cannot harden
some drivers.  We can take action against drivers like that simply
and easily.

Hacks in the driver initialization are a completely different story.

> But looks like not all devices have got the capability to be reset from
> software. In those cases probably one need to put some hooks, relax
> driver's consistency checks etc in special boot environment.

Forget the concept of a special boot environment.  A buggy BIOS or
rebooting after being in Windows can potentially have the same effect
as a kdump environment.

> Here I am trying to solve the second problem so that a driver comes to 
> know that it is initializing in a special boot environment and it can
> modify its behavior at run time. 

As Andrew said that encourages hacks.  
For the specific megaraid example it would be simple enough to
always ignore the condition and just print a warning.

There is no such thing as a special boot environment; there are only
quality of implementation differences.  And in a kexec on panic
scenario the quality of implementation is terrible.

>> If the concept is crashdump it is a poorly defined concept and all of Andrews
>> objections apply.
>> 
>
> I think this parameter is generic enough and not limited to crashdumps.
> If a user decides to implement a different scheme than kdump in kexec
> on panic and boot a customized kernel, he can very well use this
> parameter to make sure that next kernel is able to at least boot and not
> panic in between.

The name crashboot is certainly not generic enough to make
it clear what it means or to make it sound interesting outside
of a crashdump scenario.

> Solving first problem will help in doing a plain kexec. We can simply
> mark the drivers known to have problems and slowly people can fix those
> drivers. Fixing the driver in this case is different because most likely
> driver authors will provide a shutdown routine in the driver so that
> device can be shutdown and then one can boot into the second kernel.  
> Till then a user can happily skip the drivers known to have problems.
>
> In summary, we got two problems to solve. Currently I am focussed on
> solving second problem which enables boot a kernel in an unreliable
> environment and do some minimal specific operation and then boot back to
> regular kernel. So I think just introducing a command line parameter
> which drivers can use to determine that they are initializing in an
> special environment, solves it and is generic enough.
>
> Options like CONFIG_BRITTLE_INIT or skipping execution of brittle driver
> based on a command line option seems to be the solution for the first
> problem.

Among other things it is social engineering to solve the first
problem.

> Please correct me if I am wrong. I know little about drivers.
>
>
>> > I remember few months back, Eric had mentioned that he has got patches for
>> > relocatable kernel ready for review for i386 and x86_64. Eric, do you have
>> > any plans to post the patches for review?
>> 
>> I have some code that I keep intending to get to.  It has probably bit rotted
>> since I wrote it, but it shouldn't be too bad to clean up.
>> Unfortunately the whole crashdump thing is fairly low on my priority list.
>> 
>
> I am willing to work on it. Building from scratch always takes more time.
> If you are willing, I will be more than happy to build on top of your
> patches.

I will see what I can dig up.

Eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:00                     ` Eric W. Biederman
@ 2006-06-26 16:13                       ` Miller, Mike (OS Dev)
  2006-06-26 16:35                         ` Vivek Goyal
  2006-06-26 16:38                         ` Eric W. Biederman
  2006-06-26 17:16                       ` Vivek Goyal
  1 sibling, 2 replies; 38+ messages in thread
From: Miller, Mike (OS Dev) @ 2006-06-26 16:13 UTC (permalink / raw)
  To: Eric W. Biederman, vgoyal
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, fastboot,
	linux-kernel

All,
Sorry to come in late and top post. I've been out of the office and I'm
trying to get to the gist of this issue.
Exactly what is the problem? I'm not familiar with kdump so I don't have
a clue about what's going on. 
There are a couple of reset features supported by _some_ cciss
controllers. I'd have to go back to the open spec to see what's in the
public domain. We're trying to get the open spec updated and more
complete but we're waiting on the lawyers. :(

mikem

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:13                       ` Miller, Mike (OS Dev)
@ 2006-06-26 16:35                         ` Vivek Goyal
  2006-06-26 16:38                         ` Eric W. Biederman
  1 sibling, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 16:35 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Eric W. Biederman, Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 11:13:32AM -0500, Miller, Mike (OS Dev) wrote:
> All,
> Sorry to come in late and top post. I've been out of the office and I'm
> trying to get to the gist of this issue.
> Exactly what is the problem? I'm not familiar with kdump so I don't have
> a clue about what's going on. 

Hi Mike,

Kdump is a kernel crash dumping mechanism which is built on top of
kexec on panic functionality.

http://lse.sourceforge.net/kdump/

After a system crash, a second kernel boots from reserved memory
without going through the BIOS. This second kernel captures a memory
snapshot of the crashed kernel.

Devices are not shut down after the first kernel crashes, so while the second
kernel is initializing, a device might very well be operational and sending
interrupts. The moment a driver loads, the underlying device might send an
interrupt indicating completion of a command issued from the context of the
crashed kernel. The driver does not know anything about it and often crashes
or raises a BUG() as this is anomalous.

The ideal thing would probably be to soft reset the device before going ahead
with the rest of the initialization, so that the device flushes the messages
issued from the context of the previous kernel and lowers the interrupt line.

Hope this gives some context.

> There are a couple of reset features supported by _some_ cciss
> controllers. I'd have to go back to the open spec to see whats in the
> public domain. We're trying to get the open spec updated and more
> complete but we're waiting on the lawyers. :(
> 

Thanks
Vivek

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:13                       ` Miller, Mike (OS Dev)
  2006-06-26 16:35                         ` Vivek Goyal
@ 2006-06-26 16:38                         ` Eric W. Biederman
  2006-06-26 16:51                           ` Miller, Mike (OS Dev)
  1 sibling, 1 reply; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 16:38 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

"Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:

> All,
> Sorry to come in late and top post. I've been out of the office and I'm
> trying to get to the gist of this issue.
> Exactly what is the problem? I'm not familiar with kdump so I don't have
> a clue about what's going on. 
> There are a couple of reset features supported by _some_ cciss
> controllers. I'd have to go back to the open spec to see whats in the
> public domain. We're trying to get the open spec updated and more
> complete but we're waiting on the lawyers. :(

kdump, or taking crash dumps using the kexec on panic mechanism, could
be called a driver's worst nightmare.  In the latest distros this is
becoming the way crash dump style information is captured.

Because the initial kernel is broken we do a jump into another kernel
that is sufficient to record a crash dump.  That second kernel
initializes the hardware from whatever random state the first
kernel left the drivers in.  That first kernel is not permitted
to do any device shutdown activities.

The problem is that a command the running instance of the driver did
not initiate completes.  At least if I read Vivek's patch 2/2 correctly.

So we have three options.
- reset the card during initialization.
- handle the case of a command we did not initiate completing.
- mark the driver/card as impossibly hopeless for use in a crash
  dump scenario.
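
The second option above can be illustrated in user space. This is only a
minimal sketch, not cciss code: `struct ctlr`, `issue_command()`,
`handle_completion()` and `MAX_TAGS` are hypothetical names invented for the
illustration. It shows a completion handler that counts and ignores a
completion for a tag this driver instance never issued, instead of treating
it as fatal.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_TAGS 8 /* hypothetical queue depth, not taken from any real driver */

/* Hypothetical per-controller state for this driver instance. */
struct ctlr {
	bool outstanding[MAX_TAGS];     /* tags issued by this instance */
	unsigned int stale_completions; /* completions we did not initiate */
};

/* Record that this instance issued a command with the given tag. */
static bool issue_command(struct ctlr *c, unsigned int tag)
{
	if (tag >= MAX_TAGS || c->outstanding[tag])
		return false;
	c->outstanding[tag] = true;
	return true;
}

/*
 * Handle a completion reported by the device. Returns true if it matched
 * a command we issued. A completion for an unknown tag (for example one
 * left over from a previous kernel) is counted, warned about and ignored.
 */
static bool handle_completion(struct ctlr *c, unsigned int tag)
{
	if (tag >= MAX_TAGS || !c->outstanding[tag]) {
		c->stale_completions++;
		fprintf(stderr, "warning: ignoring completion for unknown tag %u\n",
			tag);
		return false;
	}
	c->outstanding[tag] = false;
	return true;
}
```

Whether ignoring is actually safe depends on the hardware; the sketch only
shows the bookkeeping, not what a real controller needs in order to be
quiesced.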

Eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:38                         ` Eric W. Biederman
@ 2006-06-26 16:51                           ` Miller, Mike (OS Dev)
  2006-06-26 17:04                             ` Vivek Goyal
                                               ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Miller, Mike (OS Dev) @ 2006-06-26 16:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

 

> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xmission.com] 
> Sent: Monday, June 26, 2006 11:38 AM
> To: Miller, Mike (OS Dev)
> Cc: vgoyal@in.ibm.com; Maneesh Soni; Andrew Morton; 
> Neela.Kolli@engenio.com; linux-scsi@vger.kernel.org; 
> fastboot@lists.osdl.org; linux-kernel@vger.kernel.org
> Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
> initialization issue fix
> 
> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
> 
> > All,
> > Sorry to come in late and top post. I've been out of the office and 
> > I'm trying to get to the gist of this issue.
> > Exactly what is the problem? I'm not familiar with kdump so I don't 
> > have a clue about what's going on.
> > There are a couple of reset features supported by _some_ cciss 
> > controllers. I'd have to go back to the open spec to see 
> whats in the 
> > public domain. We're trying to get the open spec updated and more 
> > complete but we're waiting on the lawyers. :(
> 
> 
> kdump or taking crash dumps using the kexec on panic 
> mechanism could be called a drivers worst nightmare.  In the 
> latest distros this is becoming the way crash dump style 
> information is captured.
> 
> Because the initial kernel is broken we do a jump into 
> another kernel that is sufficient to record a crash dump.  
> That second kernel initializes the hardware from whatever 
> random state the first kernel left the drivers in.  That 
> first kernel is not permitted to do any device shutdown activities.
> 
> The problem is that a command the running instance of the 
> driver did not initiate completes.  At least if I read Vivek 
> patch 2/2 correctly.
> 
> So we have three options.
> - reset the card during initialization.
> - handle the case of a command we did not initiate completing.
> - mark the driver/card as impossibly hopeless for use in a crash
>   dump scenario.
> 
> 
> Eric

Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
spec supports a reset message. Target 0x00 is the controller. We could
add this to the init routine to ensure the board is made sane again but
this would drastically increase init time under normal circumstances.
And I suspect this is a hard reset, also. Not sure if that would
negatively impact kdump. If there were some condition we could test
against and perform the reset when that condition is met it would not
impact 99.9% of users.

Thoughts, comments, flames?

mikem

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:51                           ` Miller, Mike (OS Dev)
@ 2006-06-26 17:04                             ` Vivek Goyal
  2006-06-26 17:24                               ` Andrew Morton
  2006-06-26 17:22                             ` Vivek Goyal
  2006-06-26 17:52                             ` Eric W. Biederman
  2 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 17:04 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Eric W. Biederman, Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 11:51:52AM -0500, Miller, Mike (OS Dev) wrote:
[..]
> > kdump or taking crash dumps using the kexec on panic 
> > mechanism could be called a drivers worst nightmare.  In the 
> > latest distros this is becoming the way crash dump style 
> > information is captured.
> > 
> > Because the initial kernel is broken we do a jump into 
> > another kernel that is sufficient to record a crash dump.  
> > That second kernel initializes the hardware from whatever 
> > random state the first kernel left the drivers in.  That 
> > first kernel is not permitted to do any device shutdown activities.
> > 
> > The problem is that a command the running instance of the 
> > driver did not initiate completes.  At least if I read Vivek 
> > patch 2/2 correctly.
> > 
> > So we have three options.
> > - reset the card during initialization.
> > - handle the case of a command we did not initiate completing.
> > - mark the driver/card as impossibly hopeless for use in a crash
> >   dump scenario.
> > 
> > 
> > Eric
> 
> Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> spec supports a reset message. Target 0x00 is the controller. We could
> add this to the init routine to ensure the board is made sane again but
> this would drastically increase init time under normal circumstances.
> And I suspect this is a hard reset, also. Not sure if that would
> negatively impact kdump. If there were some condition we could test
> against and perform the reset when that condition is met it would not
> impact 99.9% of users.

That's the precise reason for introducing the "crashboot" command line
parameter. Driver authors can check for this condition and reset
the device, and 99.9% of the users are not impacted.
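
As a rough user-space illustration of that check (not the code from patch
1/2, which hooks into the kernel's own parameter parsing): the helpers
`cmdline_has_option()` and `should_reset_on_init()` below are hypothetical
names, and the sketch assumes options are separated by single spaces. It
matches "crashboot" only as a whole word, so a string like "nocrashboot"
does not trigger the reset.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if opt appears in cmdline as a whole space-separated word. */
static bool cmdline_has_option(const char *cmdline, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts_word = (p == cmdline) || (p[-1] == ' ');
		bool ends_word = (p[len] == '\0') || (p[len] == ' ');

		if (starts_word && ends_word)
			return true;
		p++; /* keep scanning after a partial match */
	}
	return false;
}

/* Hypothetical driver policy: pay for the reset only on a crash boot. */
static bool should_reset_on_init(const char *cmdline)
{
	return cmdline_has_option(cmdline, "crashboot");
}
```

With this policy a normal boot skips the slow reset, and only a kernel
started with the option on its command line takes the longer path.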

Thanks
Vivek


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:04                             ` Vivek Goyal
@ 2006-06-26 17:24                               ` Andrew Morton
  0 siblings, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2006-06-26 17:24 UTC (permalink / raw)
  To: vgoyal
  Cc: Mike.Miller, ebiederm, maneesh, Neela.Kolli, linux-scsi, fastboot,
	linux-kernel

On Mon, 26 Jun 2006 13:04:40 -0400
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> > Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> > spec supports a reset message. Target 0x00 is the controller. We could
> > add this to the init routine to ensure the board is made sane again but
> > this would drastically increase init time under normal circumstances.
> > And I suspect this is a hard reset, also. Not sure if that would
> > negatively impact kdump. If there were some condition we could test
> > against and perform the reset when that condition is met it would not
> > impact 99.9% of users.
> 
> That's the precise reason of introducing the "crashboot" command line
> parameter. Driver authors can check against this condition and reset
> the device and 99.9% of the users are not impacted.

Yes, that is a legitimate use.

As long as there is indeed a noticeable downside to issuing the reset - if
it turns out that it just takes a few milliseconds then we'd be better off
doing the reset unconditionally.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:51                           ` Miller, Mike (OS Dev)
  2006-06-26 17:04                             ` Vivek Goyal
@ 2006-06-26 17:22                             ` Vivek Goyal
  2006-06-26 17:52                             ` Eric W. Biederman
  2 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 17:22 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Eric W. Biederman, Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 11:51:52AM -0500, Miller, Mike (OS Dev) wrote:
[..]
> > 
> > 
> > kdump or taking crash dumps using the kexec on panic 
> > mechanism could be called a drivers worst nightmare.  In the 
> > latest distros this is becoming the way crash dump style 
> > information is captured.
> > 
> > Because the initial kernel is broken we do a jump into 
> > another kernel that is sufficient to record a crash dump.  
> > That second kernel initializes the hardware from whatever 
> > random state the first kernel left the drivers in.  That 
> > first kernel is not permitted to do any device shutdown activities.
> > 
> > The problem is that a command the running instance of the 
> > driver did not initiate completes.  At least if I read Vivek 
> > patch 2/2 correctly.
> > 
> > So we have three options.
> > - reset the card during initialization.
> > - handle the case of a command we did not initiate completing.
> > - mark the driver/card as impossibly hopeless for use in a crash
> >   dump scenario.
> > 
> > 
> > Eric
> 
> Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> spec supports a reset message. Target 0x00 is the controller. We could
> add this to the init routine to ensure the board is made sane again but
> this would drastically increase init time under normal circumstances.
> And I suspect this is a hard reset, also. Not sure if that would
> negatively impact kdump.

As long as the driver is able to initialize the device and continue working,
kdump is not impacted whether it is a hard reset or a soft reset.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:51                           ` Miller, Mike (OS Dev)
  2006-06-26 17:04                             ` Vivek Goyal
  2006-06-26 17:22                             ` Vivek Goyal
@ 2006-06-26 17:52                             ` Eric W. Biederman
  2006-06-26 18:18                               ` Vivek Goyal
  2006-06-26 18:51                               ` Miller, Mike (OS Dev)
  2 siblings, 2 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 17:52 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

"Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:

> Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> spec supports a reset message. Target 0x00 is the controller. We could
> add this to the init routine to ensure the board is made sane again but
> this would drastically increase init time under normal circumstances.

Where does the init time penalty come from? How large is the
init penalty?  I suspect it is from waiting for the scsi disks to spin up.
But I am just guessing in the dark.

> And I suspect this is a hard reset, also. Not sure if that would
> negatively impact kdump. If there were some condition we could test
> against and perform the reset when that condition is met it would not
> impact 99.9% of users.

I am wondering if it is possible to look at the controller and
see if it is in a bad state (i.e. in some state besides just coming
out of reset) and if so issue a reset.  If this really is a long operation
that would be the ideal way to handle it.

If the amount of time is really user noticeable and testing for it
is impossible then it is probably time to talk kernel command line
options.

Although it might simply be appropriate to handle completions of commands
you didn't start.  I am not at all familiar with that particular piece
of hardware so I can't make a good guess on what needs to happen there.

> Thoughts, comments, flames?

Good question.

It is a bit of a pain but not too hard to set up a test environment
so you can reproduce this if you are interested.  Vivek should
be the authority there.

Eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:52                             ` Eric W. Biederman
@ 2006-06-26 18:18                               ` Vivek Goyal
  2006-06-26 18:51                               ` Miller, Mike (OS Dev)
  1 sibling, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 18:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miller, Mike (OS Dev), Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 11:52:28AM -0600, Eric W. Biederman wrote:
> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
> 
> > Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> > spec supports a reset message. Target 0x00 is the controller. We could
> > add this to the init routine to ensure the board is made sane again but
> > this would drastically increase init time under normal circumstances.
> 
> Where does the init time penalty come from? How large is the
> init penalty?  I suspect it is from waiting for the scsi disks to spin up.
> But I am just guessing in the dark.
> 
> > And I suspect this is a hard reset, also. Not sure if that would
> > negatively impact kdump. If there were some condition we could test
> > against and perform the reset when that condition is met it would not
> > impact 99.9% of users.
> 
> I am wondering if it is possible to look at the controller and
> see if it is in a bad state, (i.e. in some state besides just coming
> out of reset) and if so issue a reset.  If this really is a long operation
> that would be the ideal way to handle it.
> 

That's a good question. The MPT Fusion driver already does something like
this. It retrieves the state of the IOC and then checks whether a reset
is needed or not.

        /*
         *      Check to see if IOC got left/stuck in doorbell handshake
         *      grip of death.  If so, hard reset the IOC.
         */
        if (ioc_state & MPI_DOORBELL_ACTIVE) {
                statefault = 1;
                printk(MYIOC_s_WARN_FMT "Unexpected doorbell active!\n",
                                ioc->name);
        }

But then the question will be whether all the devices out there provide the
capability to query something similar, i.e. whether we have just come out of
the reset state or not.

> If the amount of time is really user noticeable and testing for it
> is impossible then it is probably time to talk kernel command line
> options.
>
> Although it might simply be appropriate to handle commands completing
> you didn't start.  I am not at all familiar with that particular piece
> of hardware so I can't make a good guess on what needs to happen there.
> 
> > Thoughts, comments, flames?
> 
> Good question.
> 
> It is a bit of a pain but not too hard to setup a test environment
> so you can reproduce this if you are interested.  Vivek should
> be the authority there.
> 

Mike, I have a setup ready with a Compaq Smart Array 5300 controller, and
I can reproduce this issue consistently. I don't know much about this
device. Could you post a patch that resets the device during
initialization? I can test the fix and provide you more data.

Thanks
Vivek 

* RE: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:52                             ` Eric W. Biederman
  2006-06-26 18:18                               ` Vivek Goyal
@ 2006-06-26 18:51                               ` Miller, Mike (OS Dev)
  2006-06-26 19:21                                 ` Eric W. Biederman
  2006-06-26 19:36                                 ` Vivek Goyal
  1 sibling, 2 replies; 38+ messages in thread
From: Miller, Mike (OS Dev) @ 2006-06-26 18:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xmission.com] 
> Sent: Monday, June 26, 2006 12:52 PM
> To: Miller, Mike (OS Dev)
> Cc: vgoyal@in.ibm.com; Maneesh Soni; Andrew Morton; 
> Neela.Kolli@engenio.com; linux-scsi@vger.kernel.org; 
> fastboot@lists.osdl.org; linux-kernel@vger.kernel.org
> Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
> initialization issue fix
> 
> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
> 
> > Thanks Eric, that helps me understand. Section 8.2.2 of the 
> open cciss 
> > spec supports a reset message. Target 0x00 is the 
> controller. We could 
> > add this to the init routine to ensure the board is made sane again 
> > but this would drastically increase init time under normal 
> circumstances.
> 
> Where does the init time penalty come from? How large is the 
> init penalty?  I suspect it is from waiting for the scsi 
> disks to spin up.
> But I am just guessing in the dark.

The penalty is in the firmware and self-test operations.

> 
> > And I suspect this is a hard reset, also. Not sure if that would 
> > negatively impact kdump. If there were some condition we could test 
> > against and perform the reset when that condition is met it 
> would not 
> > impact 99.9% of users.
> 
> I am wondering if it is possible to look at the controller 
> and see if it is in a bad state, (i.e. in some state besides 
> just coming out of reset) and if so issue a reset.  If this 
> really is a long operation that would be the ideal way to handle it.

It's not really in a bad state at this time, is it? Maybe some commands
hanging around.

> 
> If the amount of time is really user noticeable and testing 
> for it is impossible then it is probably time to talk kernel 
> command line options.

I was informed of the crashboot command line parameter. I can implement
that as a test.

> 
> Although it might simply be appropriate to handle commands 
> completing you didn't start.  I am not at all familiar with 
> that particular piece of hardware so I can't make a good 
> guess on what needs to happen there.

Not sure about doing this.

mikem

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 18:51                               ` Miller, Mike (OS Dev)
@ 2006-06-26 19:21                                 ` Eric W. Biederman
  2006-06-26 19:43                                   ` Vivek Goyal
  2006-06-26 21:24                                   ` Miller, Mike (OS Dev)
  2006-06-26 19:36                                 ` Vivek Goyal
  1 sibling, 2 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 19:21 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

"Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:

>> -----Original Message-----
>> From: Eric W. Biederman [mailto:ebiederm@xmission.com] 
>> Sent: Monday, June 26, 2006 12:52 PM
>> To: Miller, Mike (OS Dev)
>> Cc: vgoyal@in.ibm.com; Maneesh Soni; Andrew Morton; 
>> Neela.Kolli@engenio.com; linux-scsi@vger.kernel.org; 
>> fastboot@lists.osdl.org; linux-kernel@vger.kernel.org
>> Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
>> initialization issue fix
>> 
>> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
>> 
>> > Thanks Eric, that helps me understand. Section 8.2.2 of the 
>> open cciss 
>> > spec supports a reset message. Target 0x00 is the 
>> controller. We could 
>> > add this to the init routine to ensure the board is made sane again 
>> > but this would drastically increase init time under normal 
>> circumstances.
>> 
>> Where does the init time penalty come from? How large is the 
>> init penalty?  I suspect it is from waiting for the scsi 
>> disks to spin up.
>> But I am just guessing in the dark.
>
> The penalty is in the firmware and self-test operations.

Ok.  Reasonable. Roughly how long does that take? 1 millisecond? 1 second?
1 minute? 1 hour? 

>> > And I suspect this is a hard reset, also. Not sure if that would 
>> > negatively impact kdump. If there were some condition we could test 
>> > against and perform the reset when that condition is met it 
>> would not 
>> > impact 99.9% of users.
>> 
>> I am wondering if it is possible to look at the controller 
>> and see if it is in a bad state, (i.e. in some state besides 
>> just coming out of reset) and if so issue a reset.  If this 
>> really is a long operation that would be the ideal way to handle it.
>
> It's not really in a bad state at this time, is it? Maybe some commands
> hanging around.

Not bad as in broken.  But bad as in unexpected.  If it is just a matter
of outstanding commands we might even be able to just ask the adapter
to cancel all of them at initialization time.

>> If the amount of time is really user noticeable and testing 
>> for it is impossible then it is probably time to talk kernel 
>> command line options.
>
> I was informed of the crashboot command line parameter. I can implement
> that as a test.

Sounds like a start.

>> Although it might simply be appropriate to handle commands 
>> completing you didn't start.  I am not at all familiar with 
>> that particular piece of hardware so I can't make a good 
>> guess on what needs to happen there.
>
> Not sure about doing this.

Well I would certainly print a warning.

Eric


* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 19:21                                 ` Eric W. Biederman
@ 2006-06-26 19:43                                   ` Vivek Goyal
  2006-06-26 21:24                                   ` Miller, Mike (OS Dev)
  1 sibling, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 19:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miller, Mike (OS Dev), Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 01:21:04PM -0600, Eric W. Biederman wrote:

[..]
> >> 
> >> Where does the init time penalty come from? How large is the 
> >> init penalty?  I suspect it is from waiting for the scsi 
> >> disks to spin up.
> >> But I am just guessing in the dark.
> >
> > The penalty is in the firmware and self-test operations.
> 
> Ok.  Reasonable. Roughly how long does that take? 1 millisecond? 1 second?
> 1 minute? 1 hour? 
> 

IIRC, for MPT fusion, this delay was on the order of seconds.


> >> > And I suspect this is a hard reset, also. Not sure if that would 
> >> > negatively impact kdump. If there were some condition we could test 
> >> > against and perform the reset when that condition is met it 
> >> would not 
> >> > impact 99.9% of users.
> >> 
> >> I am wondering if it is possible to look at the controller 
> >> and see if it is in a bad state, (i.e. in some state besides 
> >> just coming out of reset) and if so issue a reset.  If this 
> >> really is a long operation that would be the ideal way to handle it.
> >
> > It's not really in a bad state at this time, is it? Maybe some commands
> > hanging around.
> 
> Not bad as in broken.  But bad as in unexpected.  If it is just a matter
> of outstanding commands we might even be able to just ask the adapter
> to cancel all of them at initialization time.
> 
> >> If the amount of time is really user noticeable and testing 
> >> for it is impossible then it is probably time to talk kernel 
> >> command line options.
> >
> > I was informed of the crashboot command line parameter. I can implement
> > that as a test.
> 
> Sounds like a start.
> 
> >> Although it might simply be appropriate to handle commands 
> >> completing you didn't start.  I am not at all familiar with 
> >> that particular piece of hardware so I can't make a good 
> >> guess on what needs to happen there.
> >
> > Not sure about doing this.
> 
> Well I would certainly print a warning.

cciss already prints a warning if it receives a command completion message
for a command the driver thinks it has not issued. But what do you do after
that? Simply ignore it and assume that everything is fine, or decide that
something is wrong with the device (during normal boot this should not
happen) and raise a BUG()?
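
The two policies can be compared in a small userspace model. This is only a
sketch: the tracker structure, the tolerate_stale switch and all function
names are hypothetical and are not the cciss driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_OUTSTANDING 8

enum completion_result { COMPLETION_OK, COMPLETION_IGNORED, COMPLETION_FATAL };

/* Hypothetical bookkeeping for commands this driver instance has issued. */
struct cmd_tracker {
	uint32_t tags[MAX_OUTSTANDING];
	bool     in_use[MAX_OUTSTANDING];
	bool     tolerate_stale; /* e.g. set when booting after a crash */
	unsigned stale_seen;     /* completions we never issued */
};

/* Record a newly issued command; returns false when the table is full. */
bool tracker_issue(struct cmd_tracker *t, uint32_t tag)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
		if (!t->in_use[i]) {
			t->tags[i] = tag;
			t->in_use[i] = true;
			return true;
		}
	}
	return false;
}

/* Match a completion against issued commands and apply the policy. */
enum completion_result tracker_complete(struct cmd_tracker *t, uint32_t tag)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
		if (t->in_use[i] && t->tags[i] == tag) {
			t->in_use[i] = false;
			return COMPLETION_OK;
		}
	}
	/* Always warn; only the follow-up action depends on the policy. */
	fprintf(stderr, "warning: completion for unknown tag 0x%x\n",
		(unsigned)tag);
	if (!t->tolerate_stale)
		return COMPLETION_FATAL; /* caller may treat this like BUG() */
	t->stale_seen++;
	return COMPLETION_IGNORED;
}
```

With tolerate_stale set, a leftover completion from the crashed kernel is
counted and dropped. With it clear, the caller gets COMPLETION_FATAL and can
keep the strict behaviour for normal boots.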

Thanks
Vivek 

* RE: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 19:21                                 ` Eric W. Biederman
  2006-06-26 19:43                                   ` Vivek Goyal
@ 2006-06-26 21:24                                   ` Miller, Mike (OS Dev)
  1 sibling, 0 replies; 38+ messages in thread
From: Miller, Mike (OS Dev) @ 2006-06-26 21:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	fastboot, linux-kernel

> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xmission.com] 
> Sent: Monday, June 26, 2006 2:21 PM
> To: Miller, Mike (OS Dev)
> Cc: vgoyal@in.ibm.com; Maneesh Soni; Andrew Morton; 
> Neela.Kolli@engenio.com; linux-scsi@vger.kernel.org; 
> fastboot@lists.osdl.org; linux-kernel@vger.kernel.org
> Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
> initialization issue fix
> 
> >> "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
> >> 
> >> > Thanks Eric, that helps me understand. Section 8.2.2 of the
> >> open cciss
> >> > spec supports a reset message. Target 0x00 is the
> >> controller. We could
> >> > add this to the init routine to ensure the board is made 
> sane again 
> >> > but this would drastically increase init time under normal
> >> circumstances.
> >> 
> >> Where does the init time penalty come from? How large is the init 
> >> penalty?  I suspect it is from waiting for the scsi disks 
> to spin up.
> >> But I am just guessing in the dark.
> >
> > The penalty is in the firmware and self-test operations.
> 
> Ok.  Reasonable. Roughly how long does that take? 1 millisecond? 1 second?
> 1 minute? 1 hour? 

Sorry, roughly 30 to 40 seconds. Maybe longer if the controller thinks
there's something wrong with the disks. Typically the disks are always
spinning so that delay is not an issue.

> 
> >> > And I suspect this is a hard reset, also. Not sure if that would 
> >> > negatively impact kdump. If there were some condition we 
> could test 
> >> > against and perform the reset when that condition is met it
> >> would not
> >> > impact 99.9% of users.
> >> 
> >> I am wondering if it is possible to look at the controller 
> and see if 
> >> it is in a bad state, (i.e. in some state besides just 
> coming out of 
> >> reset) and if so issue a reset.  If this really is a long 
> operation 
> >> that would be the ideal way to handle it.
> >
> > It's not really in a bad state at this time, is it? Maybe some 
> > commands hanging around.
> 
> Not bad as in broken.  But bad as in unexpected.  If it is 
> just a matter of outstanding commands we might even be able 
> to just ask the adapter to cancel all of them at initialization time.

We can't detect unexpected but we can discard everything at init.
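
As a rough illustration of "discard everything at init", the sketch below
drains a simulated completion FIFO before the driver starts issuing its own
commands. The FIFO model, the FIFO_EMPTY marker and the function names are
assumptions made for this example, not the cciss register interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_EMPTY 0xFFFFFFFFu /* hypothetical "nothing pending" marker */

/* Stand-in for a hardware completion FIFO. */
struct fake_fifo {
	const uint32_t *entries;
	size_t count;
	size_t next;
};

/* Pop one entry, or return FIFO_EMPTY when nothing is pending. */
uint32_t fifo_read(struct fake_fifo *f)
{
	assert(f != NULL);
	if (f->next >= f->count)
		return FIFO_EMPTY;
	return f->entries[f->next++];
}

/*
 * Discard pending completions left behind by a previous kernel.
 * Returns the number discarded, or -1 if `limit` entries were read
 * without ever seeing the empty marker (the device keeps producing).
 */
int drain_completions(struct fake_fifo *f, int limit)
{
	int discarded = 0;

	while (discarded < limit) {
		if (fifo_read(f) == FIFO_EMPTY)
			return discarded;
		discarded++;
	}
	return -1;
}
```

The bound matters: a device that is still actively completing old commands
would otherwise keep the init routine spinning forever.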

> >
> > I was informed of the crashboot command line parameter. I can 
> > implement that as a test.
> 
> Sounds like a start.
> 
> >> Although it might simply be appropriate to handle commands 
> completing 
> >> you didn't start.  I am not at all familiar with that particular 
> >> piece of hardware so I can't make a good guess on what needs to 
> >> happen there.
> >
> > Not sure about doing this.
> 
> Well I would certainly print a warning.
> 
> Eric
> 
> 

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 18:51                               ` Miller, Mike (OS Dev)
  2006-06-26 19:21                                 ` Eric W. Biederman
@ 2006-06-26 19:36                                 ` Vivek Goyal
  1 sibling, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 19:36 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Eric W. Biederman, Maneesh Soni, Andrew Morton, Neela.Kolli,
	linux-scsi, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 01:51:43PM -0500, Miller, Mike (OS Dev) wrote:

[..]
> > Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
> > initialization issue fix
> > 
> > "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> writes:
> > 
> > > Thanks Eric, that helps me understand. Section 8.2.2 of the 
> > open cciss 
> > > spec supports a reset message. Target 0x00 is the 
> > controller. We could 
> > > add this to the init routine to ensure the board is made sane again 
> > > but this would drastically increase init time under normal 
> > circumstances.
> > 
> > Where does the init time penalty come from? How large is the 
> > init penalty?  I suspect it is from waiting for the scsi 
> > disks to spin up.
> > But I am just guessing in the dark.
> 
> The penalty is in the firmware and self-test operations.
> 
> > 
> > > And I suspect this is a hard reset, also. Not sure if that would 
> > > negatively impact kdump. If there were some condition we could test 
> > > against and perform the reset when that condition is met it 
> > would not 
> > > impact 99.9% of users.
> > 
> > I am wondering if it is possible to look at the controller 
> > and see if it is in a bad state, (i.e. in some state besides 
> > just coming out of reset) and if so issue a reset.  If this 
> > really is a long operation that would be the ideal way to handle it.
> 
> It's not really in a bad state at this time, is it? Maybe some commands
> hanging around.

That's true. Only some commands are hanging around, and currently only a
warning is printed if CONFIG_CISS_SCSI_TAPE is not enabled. Otherwise, the
driver decides that there is no reason for it to receive a command
completion message for a command it never issued, and it runs into the
BUG().

                /* This will need changing for direct lookup completions */
                if (complete != c->busaddr) {
                        if (add_sendcmd_reject(cmd, ctlr, complete) != 0) {
                                BUG(); /* we are pretty much hosed if we get here. */
                        }
                        continue;

> 
> > 
> > If the amount of time is really user noticeable and testing 
> > for it is impossible then it is probably time to talk kernel 
> > command line options.
> 
> I was informed of the crashboot command line parameter. I can implement
> that as a test.
> 

Now as per Andrew's suggestion, I have changed crashboot to "reset_devices".

Please find the patch attached. The reset functionality of the cciss
driver still needs to be implemented.

Thanks
Vivek



o Introduce "reset_devices" command line option.

o Resetting the devices during driver initialization can be a costly
  operation in terms of time (especially for scsi devices). Drivers can use
  this option to learn that the user explicitly wants the devices to be
  reset during initialization.

o This option can be useful when the kernel is booting in an unreliable
  environment, for example during a kdump boot, where devices are in an
  unknown state and BIOS execution has been skipped.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 linux-2.6.17-1M-vivek/Documentation/kernel-parameters.txt |    3 ++
 linux-2.6.17-1M-vivek/include/linux/init.h                |    1 
 linux-2.6.17-1M-vivek/init/main.c                         |   20 ++++++++++++++
 3 files changed, 24 insertions(+)

diff -puN init/main.c~add-reset-devices-command-line-option init/main.c
--- linux-2.6.17-1M/init/main.c~add-reset-devices-command-line-option	2006-06-26 13:53:51.000000000 -0400
+++ linux-2.6.17-1M-vivek/init/main.c	2006-06-26 15:06:12.000000000 -0400
@@ -125,6 +125,18 @@ static char *ramdisk_execute_command;
 static unsigned int max_cpus = NR_CPUS;
 
 /*
+ * If set, this tells drivers to reset the underlying device before going
+ * ahead with initialization. Otherwise a driver might rely on the BIOS
+ * and skip the reset operation.
+ *
+ * This is useful if the kernel is booting in an unreliable environment,
+ * e.g. a kdump situation where the previous kernel has crashed, the BIOS
+ * has been skipped and devices will be in an unknown state.
+ */
+unsigned int reset_devices;
+EXPORT_SYMBOL(reset_devices);
+
+/*
  * Setup routine for controlling SMP activation
  *
  * Command-line option of "nosmp" or "maxcpus=0" will disable SMP
@@ -150,6 +162,14 @@ static int __init maxcpus(char *str)
 
 __setup("maxcpus=", maxcpus);
 
+static int __init set_reset_devices(char *str)
+{
+	reset_devices = 1;
+	return 1;
+}
+
+__setup("reset_devices", set_reset_devices);
+
 static char * argv_init[MAX_INIT_ARGS+2] = { "init", NULL, };
 char * envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, };
 static const char *panic_later, *panic_param;
diff -puN include/linux/init.h~add-reset-devices-command-line-option include/linux/init.h
--- linux-2.6.17-1M/include/linux/init.h~add-reset-devices-command-line-option	2006-06-26 14:44:22.000000000 -0400
+++ linux-2.6.17-1M-vivek/include/linux/init.h	2006-06-26 14:44:51.000000000 -0400
@@ -69,6 +69,7 @@ extern initcall_t __security_initcall_st
 
 /* Defined in init/main.c */
 extern char saved_command_line[];
+extern unsigned int reset_devices;
 
 /* used by init/main.c */
 extern void setup_arch(char **);
diff -puN Documentation/kernel-parameters.txt~add-reset-devices-command-line-option Documentation/kernel-parameters.txt
--- linux-2.6.17-1M/Documentation/kernel-parameters.txt~add-reset-devices-command-line-option	2006-06-26 14:45:01.000000000 -0400
+++ linux-2.6.17-1M-vivek/Documentation/kernel-parameters.txt	2006-06-26 14:47:20.000000000 -0400
@@ -1340,6 +1340,9 @@ running once the system is up.
 
 	reserve=	[KNL,BUGS] Force the kernel to ignore some iomem area
 
+	reset_devices	[KNL] Force drivers to reset the underlying device
+			during initialization.
+
 	resume=		[SWSUSP]
 			Specify the partition device for software suspend
 
_
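
The patch only introduces the flag; no driver consumes it yet. The userspace
sketch below models how the pieces might fit together. parse_cmdline() is a
simplified stand-in for boot-time option parsing (it matches whole
space-separated tokens, which differs in detail from the kernel's own
__setup() matching), and example_driver_init() is a hypothetical driver
entry point, not code from cciss.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the kernel global added by the patch. */
unsigned int reset_devices;

/*
 * Simplified model of boot-time parsing: set the flag when the
 * space-separated token "reset_devices" appears on the command line.
 */
void parse_cmdline(const char *cmdline)
{
	static const char opt[] = "reset_devices";
	const size_t optlen = sizeof(opt) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);
	reset_devices = 0;
	while (*p) {
		while (*p == ' ')
			p++;
		size_t len = strcspn(p, " ");
		if (len == optlen && strncmp(p, opt, optlen) == 0)
			reset_devices = 1;
		p += len;
	}
}

/* Hypothetical driver init: returns true if it took the slow reset path. */
bool example_driver_init(void)
{
	if (reset_devices) {
		/* A real driver would issue its controller reset and wait here. */
		return true;
	}
	/* Normal boot: trust the state the firmware left the device in. */
	return false;
}
```

The design point is that the driver does not try to work out why it is being
asked to reset; it only checks the flag and does what it is told.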

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 16:00                     ` Eric W. Biederman
  2006-06-26 16:13                       ` Miller, Mike (OS Dev)
@ 2006-06-26 17:16                       ` Vivek Goyal
  2006-06-26 17:31                         ` Andrew Morton
  2006-06-26 17:39                         ` Eric W. Biederman
  1 sibling, 2 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 17:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

On Mon, Jun 26, 2006 at 10:00:55AM -0600, Eric W. Biederman wrote:

[..]
> >
> > Looks like there are two problems to be solved.
> >
> > - Framework/capability to mark and isolate the drivers, either at compile
> >   time or run time, which are not hardened enough to initialize properly
> >   when the underlying device is in operational or in unknown state.
> >
> > - Actually hardening a driver to be able to initialize in a potentially
> >   unreliable environment.  
> >
> >
> > Solving first problem will help more in terms of people knowing in advance
> > that certain drivers are known to have problems in specific environments and
> > a user has got the option of skipping the execution/compilation of those
> > drivers. (This is something close to what CONFIG_EXPERIMENTAL does)
> >
> > Second problem deals more with actually hardening the driver and not
> > skipping its compilation/execution.
> >
> > I think people would like to change a driver's behaviour at run time.
> > For example if they are booting in a unreliable environment they would
> > like to reset the device otherwise they would skip that as BIOS has
> > already done that for them. 
> 
> In the general case the device reset does not hurt.

I think it does hurt.

- I have seen the case of the MPT fusion driver. It takes significantly more
  time to come up if we choose to reset the device during initialization.
  One of the reasons is that we wait in a tight loop for the controller to
  come up after a reset message.

- Long back we fixed ips driver and I remember that the maintainer had
  a similar issue with the reset of device. He did not want to reset the
  device in normal boot because otherwise it took significantly longer
  for the driver to initialize.

- Just now Mike also confirmed that resetting the device definitely
  hurts in terms of time.
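
To make the cost concrete, the wait after a reset message is essentially a
bounded polling loop like the sketch below. Everything here is hypothetical
(the STATUS_READY bit, the callback type, the fake device); it is a userspace
model, not the MPT fusion or ips code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY 0x1u /* hypothetical "controller is up" bit */

typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll the status source until it reports ready. Returns the number of
 * polls used, or -1 if the device did not come up within max_polls.
 */
int wait_for_ready(read_status_fn read_status, void *ctx, int max_polls)
{
	assert(read_status != NULL);
	for (int i = 1; i <= max_polls; i++) {
		if (read_status(ctx) & STATUS_READY)
			return i;
		/* A real driver would delay or sleep between polls here. */
	}
	return -1;
}

/* Test double: reports ready on the Nth read. */
struct fake_dev {
	int reads_until_ready;
};

uint32_t fake_read_status(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->reads_until_ready > 0)
		d->reads_until_ready--;
	return d->reads_until_ready == 0 ? STATUS_READY : 0;
}
```

Each poll in a real driver carries a delay, so a controller that needs many
polls after reset translates directly into seconds of extra init time. That
is the penalty a normal boot avoids by skipping the reset.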

>  Yes there
> is the case of the slow scsi probe.  But a lot of that appears
> to be a poor implementation of the scsi probe.  So I can see a kernel
> command line option to play fast and loose but we should be safe
> and thorough by default.
> 
> The more code paths you introduce the harder code is to maintain
> and test.  The earlier discussion suggested you cannot harden
> some drivers.  We can take action against drivers like that simply
> and easily.
> 
> Hacks in the driver initialization are a completely different story.
> 

So it is a matter of making a choice in case the device does not have a
software reset capability.

- Either try to make driver work through some hacks based on crashboot
  option.

- Or mark the driver unusable in kdump scenarios.

Even if one decides to go for second option, at least "crashboot" or
similar parameter will be required so that driver can choose whether
to reset the device or not during initialization due to significant
time penalty. 

Thanks
Vivek

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:16                       ` Vivek Goyal
@ 2006-06-26 17:31                         ` Andrew Morton
  2006-06-26 17:39                         ` Eric W. Biederman
  1 sibling, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2006-06-26 17:31 UTC (permalink / raw)
  To: vgoyal
  Cc: ebiederm, maneesh, Neela.Kolli, linux-scsi, mike.miller, fastboot,
	linux-kernel

On Mon, 26 Jun 2006 13:16:59 -0400
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> So it is a matter of making a choice in case the device does not have a
> software reset capability.
> 
> - Either try to make driver work through some hacks based on crashboot
>   option.
> 
> - Or mark the driver unusable in kdump scenarios.
> 
> Even if one decides to go for second option, at least "crashboot" or
> similar parameter will be required so that driver can choose whether
> to reset the device or not during initialization due to significant
> time penalty. 

yes, this does legitimise the `crashboot' option.

That being said, it's misnamed, I think.  It should be called
`reset_devices' or something.  Because that's what it does, and who
knows, there might be other reasons for wanting to reset devices.

See, it's more accurate.  We don't want drivers to be looking at some
global environmental thing and then independently working out what they
should be doing this time around.  We just want drivers to do what they're
told.

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:16                       ` Vivek Goyal
  2006-06-26 17:31                         ` Andrew Morton
@ 2006-06-26 17:39                         ` Eric W. Biederman
  2006-06-26 17:56                           ` James Bottomley
  1 sibling, 1 reply; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 17:39 UTC (permalink / raw)
  To: vgoyal
  Cc: Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi, mike.miller,
	fastboot, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Jun 26, 2006 at 10:00:55AM -0600, Eric W. Biederman wrote:

> I think it does hurt.
>
> - I have seen the case of the MPT fusion driver. It takes significantly more
>   time to come up if we choose to reset the device during initialization.
>   One of the reasons is that we wait in a tight loop for the controller to
>   come up after a reset message.
>
> - Long back we fixed ips driver and I remember that the maintainer had
>   a similar issue with the reset of device. He did not want to reset the
>   device in normal boot because otherwise it took significantly longer
>   for the driver to initialize.
>
> - Just now Mike also confirmed that resetting the device definitely
>   hurts in terms of time.

In the general case resets are trivial operations.  In scsi land 
things are different.  So a solution appropriate to that domain may
be appropriate.

Eric

* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:39                         ` Eric W. Biederman
@ 2006-06-26 17:56                           ` James Bottomley
  2006-06-26 18:23                             ` Eric W. Biederman
  0 siblings, 1 reply; 38+ messages in thread
From: James Bottomley @ 2006-06-26 17:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	mike.miller, fastboot, linux-kernel

On Mon, 2006-06-26 at 11:39 -0600, Eric W. Biederman wrote:
> In the general case resets are trivial operations.  In scsi land 
> things are different.  So a solution appropriate to that domain may
> be appropriate.

That's not necessarily true. You're talking about board level resets
here.  Some devices take quite a while to reboot after being reset ...
particularly the complex ones with internal operating system type
firmware ..

James



* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 17:56                           ` James Bottomley
@ 2006-06-26 18:23                             ` Eric W. Biederman
  0 siblings, 0 replies; 38+ messages in thread
From: Eric W. Biederman @ 2006-06-26 18:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: vgoyal, Maneesh Soni, Andrew Morton, Neela.Kolli, linux-scsi,
	mike.miller, fastboot, linux-kernel

James Bottomley <James.Bottomley@SteelEye.com> writes:

> On Mon, 2006-06-26 at 11:39 -0600, Eric W. Biederman wrote:
>> In the general case resets are trivial operations.  In scsi land 
>> things are different.  So a solution appropriate to that domain may
>> be appropriate.
>
> That's not necessarily true. You're talking about board level resets
> here.  Some devices take quite a while to reboot after being reset ...
> particularly the complex ones with internal operating system type
> firmware ..

Agreed.  I had forgotten about the firmware case as opposed to the device
case.  It is still true that we are mostly talking the scsi domain, when
we are talking about boards with their own OS's.

The important point is to find a way to harden drivers so the driver can
initialize when the device is in a fairly random state and work.  Resets are
the obvious way to get there.  There may be other cheaper ways, like
forcefully setting all of the registers into a known good state.

But I still stand behind the fact that for most devices a reset is a trivial
operation, that takes an insignificant amount of time.  Devices with slow
firmware and devices with big slow disks attached to them are not most devices.

So for most devices the advice can really be: just reset it already.  For
scsi devices, where we frequently have the weird slow reset case, we can
give better domain specific advice after a little more experience of what
has to be done.
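
The cheaper alternative mentioned above, forcing registers into a known good
state instead of doing a full reset, could look roughly like this. The
register indices, values and names are invented for the sketch; a real
device's programming manual would dictate the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register and the value it should hold before init proceeds. */
struct reg_init {
	size_t   index;
	uint32_t value;
};

/* Hypothetical known-good defaults for an imaginary device. */
static const struct reg_init known_good[] = {
	{ 0, 0x00000000u }, /* interrupt mask: everything masked */
	{ 1, 0x00000000u }, /* DMA control: engine stopped */
	{ 2, 0x00000001u }, /* mode: default operating mode */
};

#define NUM_KNOWN_GOOD (sizeof(known_good) / sizeof(known_good[0]))

/*
 * Write every table entry that fits inside the register window.
 * Returns the number of registers written.
 */
size_t force_known_state(volatile uint32_t *regs, size_t nregs)
{
	size_t written = 0;

	assert(regs != NULL);
	for (size_t i = 0; i < NUM_KNOWN_GOOD; i++) {
		if (known_good[i].index >= nregs)
			continue; /* outside the mapped window, skip */
		regs[known_good[i].index] = known_good[i].value;
		written++;
	}
	return written;
}
```

This only helps when all of the device's relevant state is visible through
registers. Controllers with their own firmware may hold state that no
register write can clear, which is where a real reset becomes unavoidable.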

Eric

* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 14:17                 ` Eric W. Biederman
  2006-06-26 15:32                   ` Vivek Goyal
@ 2006-06-27  2:42                   ` Horms
  1 sibling, 0 replies; 38+ messages in thread
From: Horms @ 2006-06-27  2:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Neela.Kolli, linux-scsi, mike.miller, fastboot,
	linux-kernel, Vivek Goyal

In article <m11wtcvw5k.fsf@ebiederm.dsl.xmission.com> you wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
> 
>> On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
>>
>> Maneesh, Keeping this code under a config option becomes a problem when we
>> will have a relocatable kernel. At some point of time we got to have
>> relocatable kernel so that people don't have to build two kernels. In fact
>> this is becoming a pain area for distros. That's the reason I thought
>> of making it a command line parameter.
> 
> Ok. Even if we do this with a command line, we need to have a clean concept.
> If the concept is ignore devices with a brittle init routine that is
> comprehensible and potentially useful for other reasons than crash dumps.
> 
> If the concept is crashdump it is a poorly defined concept and all of
> Andrew's objections apply.
> 
>> I remember few months back, Eric had mentioned that he has got patches for
>> relocatable kernel ready for review for i386 and x86_64. Eric, do you have
>> any plans to post the patches for review?
> 
> I have some code that I keep intending to get to.  It has probably bit
> rotted since I wrote it, but it shouldn't be too bad to clean up.
> Unfortunately the whole crashdump thing is fairly low on my priority
> list.

Hi Eric,

If you have some code to relocate the i386 and x86_64 kernels then I for
one would really like a chance to look over it.

-- 
Horms                                           
H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/


* Re: [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-24 17:13           ` Eric W. Biederman
  2006-06-26  2:11             ` [Fastboot] " Maneesh Soni
@ 2006-06-26  9:09             ` Horms
  2006-06-26 13:45               ` [Fastboot] " Vivek Goyal
  1 sibling, 1 reply; 38+ messages in thread
From: Horms @ 2006-06-26  9:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Neela.Kolli, linux-scsi, mike.miller, fastboot,
	linux-kernel, Zou, Nanhai

In article <m1veqqxyrb.fsf@ebiederm.dsl.xmission.com> you wrote:
> 
> Not all pci busses support it but there is a standard pci bus reset bit
> in pci bridges.
> 
> I don't know if it would help but it might make sense to have a config
> option that can be used to mark drivers that are known to have problems,
> in these scenarios.
> 
> CONFIG_BRITTLE_INIT perhaps?
> 
> It would at least make it easier for people to see which drivers
> they don't want to use, and give people some incentive to fix things.

I believe that MPT Fusion could go on that list.

http://permalink.gmane.org/gmane.linux.ports.ia64/14451
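
For readers unfamiliar with the "standard pci bus reset bit" mentioned in the
quoted text: in a PCI-to-PCI bridge this is the Secondary Bus Reset bit
(bit 6, mask 0x40) of the Bridge Control register at config offset 0x3e.
Linux names these PCI_BRIDGE_CONTROL and PCI_BRIDGE_CTL_BUS_RESET. What
follows is only a userspace sketch that flips the bit in a fake config-space
buffer. The buffer fake_cfg and the helper names are invented for
illustration, no hardware is touched, and the delays a real reset needs
between asserting and releasing the bit are left out.

```c
#include <stdint.h>

/* Register offset and bit mask as defined in Linux's pci_regs.h. */
#define PCI_BRIDGE_CONTROL        0x3e
#define PCI_BRIDGE_CTL_BUS_RESET  0x40

/* Hypothetical stand-in for a bridge's first 64 bytes of config space. */
static uint8_t fake_cfg[64];

/* Read a little-endian 16-bit register from the fake config space. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Write a little-endian 16-bit register to the fake config space. */
static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
	cfg[off] = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Set Secondary Bus Reset: devices behind the bridge are held in reset. */
static void bridge_assert_reset(uint8_t *cfg)
{
	uint16_t ctl = cfg_read16(cfg, PCI_BRIDGE_CONTROL);

	cfg_write16(cfg, PCI_BRIDGE_CONTROL, ctl | PCI_BRIDGE_CTL_BUS_RESET);
}

/* Clear Secondary Bus Reset: devices behind the bridge leave reset. */
static void bridge_release_reset(uint8_t *cfg)
{
	uint16_t ctl = cfg_read16(cfg, PCI_BRIDGE_CONTROL);

	cfg_write16(cfg, PCI_BRIDGE_CONTROL,
		    (uint16_t)(ctl & ~PCI_BRIDGE_CTL_BUS_RESET));
}
```

Calling bridge_assert_reset(fake_cfg) followed by
bridge_release_reset(fake_cfg) mimics one reset pulse on the secondary bus.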



-- 
Horms                                           
H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/



* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26  9:09             ` Horms
@ 2006-06-26 13:45               ` Vivek Goyal
  2006-06-27  2:30                 ` Horms
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2006-06-26 13:45 UTC (permalink / raw)
  To: Horms
  Cc: Eric W. Biederman, Andrew Morton, Neela.Kolli, linux-scsi,
	mike.miller, Zou, Nanhai, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 06:09:15PM +0900, Horms wrote:
> In article <m1veqqxyrb.fsf@ebiederm.dsl.xmission.com> you wrote:
> > 
> > Not all pci busses support it but there is a standard pci bus reset bit
> > in pci bridges.
> > 
> > I don't know if it would help but it might make sense to have a config
> > option that can be used to mark drivers that are known to have problems,
> > in these scenarios.
> > 
> > CONFIG_BRITTLE_INIT perhaps?
> > 
> > It would at least make it easier for people to see which drivers
> > they don't want to use, and give people some incentive to fix things.
> 
> I believe that MPT Fusion could go on that list.
> 
> http://permalink.gmane.org/gmane.linux.ports.ia64/14451
> 
>

The above link does not give details of the exact MPT Fusion initialization
problems seen on IA64. I ran into MPT Fusion initialization issues on
i386/x86_64 and posted a fix.

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9bf0a28c9a24e2cee5deecf89d118254374c75ba

You might want to download and check.

Thanks
Vivek


* Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
  2006-06-26 13:45               ` [Fastboot] " Vivek Goyal
@ 2006-06-27  2:30                 ` Horms
  0 siblings, 0 replies; 38+ messages in thread
From: Horms @ 2006-06-27  2:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Andrew Morton, Neela.Kolli, linux-scsi,
	mike.miller, Zou, Nanhai, fastboot, linux-kernel

On Mon, Jun 26, 2006 at 09:45:36AM -0400, Vivek Goyal wrote:
> On Mon, Jun 26, 2006 at 06:09:15PM +0900, Horms wrote:
> > In article <m1veqqxyrb.fsf@ebiederm.dsl.xmission.com> you wrote:
> > > 
> > > Not all pci busses support it but there is a standard pci bus reset bit
> > > in pci bridges.
> > > 
> > > I don't know if it would help but it might make sense to have a config
> > > option that can be used to mark drivers that are known to have problems,
> > > in these scenarios.
> > > 
> > > CONFIG_BRITTLE_INIT perhaps?
> > > 
> > > It would at least make it easier for people to see which drivers
> > > they don't want to use, and give people some incentive to fix things.
> > 
> > I believe that MPT Fusion could go on that list.
> > 
> > http://permalink.gmane.org/gmane.linux.ports.ia64/14451
> > 
> >
> 
> The above link does not give details of the exact MPT Fusion initialization
> problems seen on IA64. I ran into MPT Fusion initialization issues on
> i386/x86_64 and posted a fix.
> 
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9bf0a28c9a24e2cee5deecf89d118254374c75ba
> 
> You might want to download and check.

Thanks, I wasn't aware of that. I'll see if that helps on ia64.

-- 
Horms                                           
H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/



* Re: [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter
  2006-06-23 21:01 [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Vivek Goyal
  2006-06-23 21:04 ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Vivek Goyal
@ 2006-06-23 21:30 ` Bernd Eckenfels
  2006-06-23 22:39   ` Vivek Goyal
  2006-06-24  6:55 ` Andrew Morton
  2 siblings, 1 reply; 38+ messages in thread
From: Bernd Eckenfels @ 2006-06-23 21:30 UTC (permalink / raw)
  To: linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> wrote:
> +static int __init set_crash_boot(char *str)
> +{
> +       crash_boot = 1;
> +       return 1;
> +}

What about a printk? Maybe a combined one that shows some other settings
from the command line (init level, init shell, APIC settings, etc.).
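
To illustrate the suggestion, here is a minimal userspace sketch, not kernel
code. It scans a command line string for the bare token "crashboot", sets a
flag the way set_crash_boot() does, and prints a notice. The function name
parse_crashboot and the message text are invented for illustration; the
patch as posted prints nothing, and printf stands in for printk here.

```c
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the flag that the patch sets. */
static int crash_boot;

/*
 * Scan a space-separated command line for the exact token "crashboot".
 * Returns 1 and sets crash_boot when the token is found, 0 otherwise.
 * This only imitates the effect of the handler in the patch; it is not
 * the kernel's own command line parser.
 */
static int parse_crashboot(const char *cmdline)
{
	const size_t len = strlen("crashboot");
	const char *p = cmdline;

	crash_boot = 0;
	while (*p) {
		const char *end;

		/* Skip separators before the next token. */
		while (*p == ' ')
			p++;

		/* Find the end of the current token. */
		end = p;
		while (*end && *end != ' ')
			end++;

		if ((size_t)(end - p) == len &&
		    strncmp(p, "crashboot", len) == 0) {
			crash_boot = 1;
			/* Hypothetical notice; the posted patch has none. */
			printf("crashboot: assuming unreliable boot environment\n");
			return 1;
		}
		p = end;
	}
	return 0;
}
```

For example, parse_crashboot("root=/dev/sda1 crashboot") sets the flag and
prints the notice, while a longer token such as "nocrashboot" is not matched
because the comparison requires an exact token length.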

Gruss
Bernd


* Re: [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter
  2006-06-23 21:30 ` [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Bernd Eckenfels
@ 2006-06-23 22:39   ` Vivek Goyal
  0 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2006-06-23 22:39 UTC (permalink / raw)
  To: Bernd Eckenfels
  Cc: linux-kernel, Fastboot mailing list, Linux SCSI Mailing list,
	Eric W. Biederman, Morton Andrew Morton, mike.miller

On Fri, Jun 23, 2006 at 11:30:45PM +0200, Bernd Eckenfels wrote:
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
> > +static int __init set_crash_boot(char *str)
> > +{
> > +       crash_boot = 1;
> > +       return 1;
> > +}
> 
> What about a printk? Maybe a combined one that shows some other settings
> from the command line (init level, init shell, APIC settings, etc.).
> 

In your response, all the people copied on the mail got removed. I am adding
them back.

Maybe a printk; I am not sure. The user can always see the list of command
line options passed to the kernel in the console messages.

APIC settings are displayed if you pass apic=debug.

I am not sure what you mean by displaying the init level and init shell,
or how they relate to this parameter.

Thanks
Vivek


* Re: [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter
  2006-06-23 21:01 [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Vivek Goyal
  2006-06-23 21:04 ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Vivek Goyal
  2006-06-23 21:30 ` [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Bernd Eckenfels
@ 2006-06-24  6:55 ` Andrew Morton
  2 siblings, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2006-06-24  6:55 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, fastboot, linux-scsi, ebiederm, mike.miller

On Fri, 23 Jun 2006 17:01:21 -0400
Vivek Goyal <vgoyal@in.ibm.com> wrote:

> o Add kernel command line option "crashboot"
> 
> o This option is an indication to the kernel that kernel is booting in an
>   unreliable environment where possibly BIOS execution has been skipped
>   and devices are left operational or in unknown state.
> 
> o Kernel, especially device drivers can use this option to take special
>   actions like soft-resetting the device, relaxing some of the rules
>   to make sure kernel can boot/device driver can initialize in this
>   environment.
> 
> o As of today this option is useful to Kdump. Kdump will pass this option
>   to second kernel to improve the reliability of successful kernel boot/
>   device driver initialization.

It worries me that this will be used to work around driver problems rather
than fixing them properly.



Thread overview: 38+ messages
2006-06-23 21:01 [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Vivek Goyal
2006-06-23 21:04 ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Vivek Goyal
2006-06-24  6:55   ` Andrew Morton
2006-06-24 11:19     ` Vivek Goyal
2006-06-24 11:30       ` Andrew Morton
2006-06-24 12:08         ` Vivek Goyal
2006-06-24 17:13           ` Eric W. Biederman
2006-06-26  2:11             ` [Fastboot] " Maneesh Soni
2006-06-26 13:35               ` Vivek Goyal
2006-06-26 14:17                 ` Eric W. Biederman
2006-06-26 15:32                   ` Vivek Goyal
2006-06-26 16:00                     ` Eric W. Biederman
2006-06-26 16:13                       ` Miller, Mike (OS Dev)
2006-06-26 16:35                         ` Vivek Goyal
2006-06-26 16:38                         ` Eric W. Biederman
2006-06-26 16:51                           ` Miller, Mike (OS Dev)
2006-06-26 17:04                             ` Vivek Goyal
2006-06-26 17:24                               ` Andrew Morton
2006-06-26 17:22                             ` Vivek Goyal
2006-06-26 17:52                             ` Eric W. Biederman
2006-06-26 18:18                               ` Vivek Goyal
2006-06-26 18:51                               ` Miller, Mike (OS Dev)
2006-06-26 19:21                                 ` Eric W. Biederman
2006-06-26 19:43                                   ` Vivek Goyal
2006-06-26 21:24                                   ` Miller, Mike (OS Dev)
2006-06-26 19:36                                 ` Vivek Goyal
2006-06-26 17:16                       ` Vivek Goyal
2006-06-26 17:31                         ` Andrew Morton
2006-06-26 17:39                         ` Eric W. Biederman
2006-06-26 17:56                           ` James Bottomley
2006-06-26 18:23                             ` Eric W. Biederman
2006-06-27  2:42                   ` [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix Horms
2006-06-26  9:09             ` Horms
2006-06-26 13:45               ` [Fastboot] " Vivek Goyal
2006-06-27  2:30                 ` Horms
2006-06-23 21:30 ` [RFC] [PATCH 1/2] introduce crashboot kernel command line parameter Bernd Eckenfels
2006-06-23 22:39   ` Vivek Goyal
2006-06-24  6:55 ` Andrew Morton
