* kexec and aacraid broken
@ 2007-05-30  1:59 Yinghai Lu
  2007-05-30  2:13 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Yinghai Lu @ 2007-05-30  1:59 UTC (permalink / raw)
  To: Andrew Morton, Vivek Goyal, Eric W. Biederman, aacraid
  Cc: Linux Kernel Mailing List

latest tree, can not use kexec to load 2.6.22-rc3 at least.

got:

AAC0: adapter kernel panic'd fffffffd
AAC0: adapter kernel failed to start, init status=0

but can load 2.6.21.3

YH

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: kexec and aacraid broken
  2007-05-30  1:59 kexec and aacraid broken Yinghai Lu
@ 2007-05-30  2:13 ` Andrew Morton
  2007-05-30 11:44   ` Salyzyn, Mark
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-05-30  2:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Vivek Goyal, Eric W. Biederman, aacraid,
	Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On Tue, 29 May 2007 18:59:32 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> latest tree, can not use kexec to load 2.6.22-rc3 at least.
>
> got:
>
> AAC0: adapter kernel panic'd fffffffd
> AAC0: adapter kernel failed to start, init status=0

One of the two diffs below, I guess.  Please do a `patch -R -p1' of this
email and retest?

>
> but can load 2.6.21.3
>

Michal, can you please add this to the regression list?


commit 9e4d4a5d71d673901d9c1df5146ce545c2cc0cc0
Author: Salyzyn, Mark <mark_salyzyn@adaptec.com>
Date:   Tue May 1 11:43:06 2007 -0400

    [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers

    The kexec patch introduced a superfluous (and otherwise inert) reset of
    some adapters. The register can have a hardware default value that has
    zeros for the undefined interrupts. This patch refines the test of the
    interrupt enable register to focus on only the interrupts that affect
    the driver in order to detect if an incomplete shutdown of the Adapter
    had occurred (kdump).

    Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
index b6ee3c0..291cd14 100644
--- a/drivers/scsi/aacraid/rx.c
+++ b/drivers/scsi/aacraid/rx.c
@@ -542,7 +542,7 @@ int _aac_rx_init(struct aac_dev *dev)
 	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
 	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
 	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
-	if ((((status & 0xff) != 0xff) || reset_devices) &&
+	if ((((status & 0x0c) != 0x0c) || reset_devices) &&
 	  !aac_rx_restart_adapter(dev, 0))
 		++restart;
 	/*

commit a5694ec545a880f9d23463fddc894f5096cc68fa
Author: Salyzyn, Mark <mark_salyzyn@adaptec.com>
Date:   Mon Apr 30 13:22:24 2007 -0400

    [SCSI] aacraid: kexec fix (reset interrupt handler)

    Another layer on this onion also discovered by Duane, the
    interrupt enable handler also needed to be set ... The interrupt enable
    was called from within the synchronous command handler.

    Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
index 0c71315..b6ee3c0 100644
--- a/drivers/scsi/aacraid/rx.c
+++ b/drivers/scsi/aacraid/rx.c
@@ -539,6 +539,8 @@ int _aac_rx_init(struct aac_dev *dev)
 	}

 	/* Failure to reset here is an option ... */
+	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
+	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
 	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
 	if ((((status & 0xff) != 0xff) || reset_devices) &&
 	  !aac_rx_restart_adapter(dev, 0))

^ permalink raw reply related	[flat|nested] 19+ messages in thread
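
For readers following the first diff in the message above, here is a minimal userspace sketch (not driver code) of how narrowing the OIMR mask from 0xff to 0x0c changes the reset decision. The function names and the sample register value 0x0c are assumptions made for illustration; only the two masks and the reset_devices flag come from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old test from _aac_rx_init(): every bit of the interrupt mask register
 * must read back as 1, otherwise a reset is attempted.
 */
bool needs_reset_old(uint8_t oimr, bool reset_devices)
{
	return ((oimr & 0xff) != 0xff) || reset_devices;
}

/*
 * New test: only the two bits selected by 0x0c are examined, so zeros in
 * the undefined bit positions no longer force a reset.
 */
bool needs_reset_new(uint8_t oimr, bool reset_devices)
{
	return ((oimr & 0x0c) != 0x0c) || reset_devices;
}
```

With an assumed register value of 0x0c (the driver-relevant bits set, undefined bits zero), the old test asks for a reset and the new one does not. Both tests still ask for a reset whenever reset_devices is set.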
* RE: kexec and aacraid broken
  2007-05-30  2:13 ` Andrew Morton
@ 2007-05-30 11:44   ` Salyzyn, Mark
  2007-05-30 13:24     ` Vivek Goyal
  2007-05-30 21:22     ` Yinghai Lu
  0 siblings, 2 replies; 19+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 11:44 UTC (permalink / raw)
  To: Andrew Morton, Yinghai Lu
  Cc: Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List,
	linux-scsi, Michal Piotrowski

[-- Attachment #1: Type: text/plain, Size: 4219 bytes --]

I believe this issue is a result of the aacraid_commit_reset patch (as
posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
to the 2.6.22-rc3 tree.

This is the adapter taking longer than 3 minutes to start after a reset.
I seriously doubt either of these patches suggested below will have an
effect. And if they do, they are not root cause, one reduces the chances
that the card will be reset during initialization (thus applied would
likely mitigate this problem), the other prevents a panic when the
Adapter is reset (removed, would result in dogs and cats sleeping with
each other).

Please use kernel parameter aacraid.startup_timeout=540 (merely larger
than the default 180 seconds) when spawning the kexec or see if the
aacraid_commit_reset.patch resolves the issue to confirm my hunch.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
> Sent: Tuesday, May 29, 2007 10:14 PM
> To: Yinghai Lu
> Cc: Vivek Goyal; Eric W. Biederman; AACRAID; Linux Kernel Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski
> Subject: Re: kexec and aacraid broken
>
>
> On Tue, 29 May 2007 18:59:32 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
> > latest tree, can not use kexec to load 2.6.22-rc3 at least.
> >
> > got:
> >
> > AAC0: adapter kernel panic'd fffffffd
> > AAC0: adapter kernel failed to start, init status=0
>
> One of the two diffs below, I guess.  Please do a `patch -R -p1' of this
> email and retest?
>
> >
> > but can load 2.6.21.3
> >
>
> Michal, can you please add this to the regression list?
>
>
> commit 9e4d4a5d71d673901d9c1df5146ce545c2cc0cc0
> Author: Salyzyn, Mark <mark_salyzyn@adaptec.com>
> Date:   Tue May 1 11:43:06 2007 -0400
>
>     [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers
>
>     The kexec patch introduced a superfluous (and otherwise inert) reset of
>     some adapters. The register can have a hardware default value that has
>     zeros for the undefined interrupts. This patch refines the test of the
>     interrupt enable register to focus on only the interrupts that affect
>     the driver in order to detect if an incomplete shutdown of the Adapter
>     had occurred (kdump).
>
>     Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>     Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
>
> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
> index b6ee3c0..291cd14 100644
> --- a/drivers/scsi/aacraid/rx.c
> +++ b/drivers/scsi/aacraid/rx.c
> @@ -542,7 +542,7 @@ int _aac_rx_init(struct aac_dev *dev)
>  	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
>  	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
>  	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> -	if ((((status & 0xff) != 0xff) || reset_devices) &&
> +	if ((((status & 0x0c) != 0x0c) || reset_devices) &&
>  	  !aac_rx_restart_adapter(dev, 0))
>  		++restart;
>  	/*
> commit a5694ec545a880f9d23463fddc894f5096cc68fa
> Author: Salyzyn, Mark <mark_salyzyn@adaptec.com>
> Date:   Mon Apr 30 13:22:24 2007 -0400
>
>     [SCSI] aacraid: kexec fix (reset interrupt handler)
>
>     Another layer on this onion also discovered by Duane, the
>     interrupt enable handler also needed to be set ... The interrupt enable
>     was called from within the synchronous command handler.
>
>     Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>     Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
>
> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
> index 0c71315..b6ee3c0 100644
> --- a/drivers/scsi/aacraid/rx.c
> +++ b/drivers/scsi/aacraid/rx.c
> @@ -539,6 +539,8 @@ int _aac_rx_init(struct aac_dev *dev)
>  	}
>
>  	/* Failure to reset here is an option ... */
> +	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> +	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
>  	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
>  	if ((((status & 0xff) != 0xff) || reset_devices) &&
>  	  !aac_rx_restart_adapter(dev, 0))
>
>

[-- Attachment #2: aacraid_commit_reset.patch --]
[-- Type: application/octet-stream, Size: 3499 bytes --]

diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2007-05-16 10:29:25.697735367 -0400
+++ b/drivers/scsi/aacraid/aachba.c	2007-05-16 10:37:33.537128485 -0400
@@ -146,7 +146,7 @@
 static int nondasd = -1;
 static int dacmode = -1;
 
-static int commit = -1;
+int aac_commit = -1;
 int startup_timeout = 180;
 int aif_timeout = 120;
 
@@ -154,7 +154,7 @@
 MODULE_PARM_DESC(nondasd, "Control scanning of hba for nondasd devices. 0=off, 1=on");
 module_param(dacmode, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(dacmode, "Control whether dma addressing is using 64 bit DAC. 0=off, 1=on");
-module_param(commit, int, S_IRUGO|S_IWUSR);
+module_param_named(commit, aac_commit, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(commit, "Control whether a COMMIT_CONFIG is issued to the adapter for foreign arrays.\nThis is typically needed in systems that do not have a BIOS. 0=off, 1=on");
 module_param(startup_timeout, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(startup_timeout, "The duration of time in seconds to wait for adapter to have it's kernel up and\nrunning. This is typically adjusted for large systems that do not have a BIOS.");
@@ -173,6 +173,9 @@
 
 module_param(expose_physicals, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(expose_physicals, "Expose physical components of the arrays. -1=protect 0=off, 1=on");
+int aac_reset_devices = 0;
+module_param_named(reset_devices, aac_reset_devices, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(reset_devices, "Force an adapter reset at initialization.");
 
 static inline int aac_valid_context(struct scsi_cmnd *scsicmd,
 		struct fib *fibptr) {
@@ -246,7 +249,7 @@
 	aac_fib_complete(fibptr);
 	/* Send a CT_COMMIT_CONFIG to enable discovery of devices */
 	if (status >= 0) {
-		if ((commit == 1) || commit_flag) {
+		if ((aac_commit == 1) || commit_flag) {
 			struct aac_commit_config * dinfo;
 			aac_fib_init(fibptr);
 			dinfo = (struct aac_commit_config *) fib_data(fibptr);
@@ -261,7 +264,7 @@
 				    1, 1,
 				    NULL, NULL);
 			aac_fib_complete(fibptr);
-		} else if (commit == 0) {
+		} else if (aac_commit == 0) {
 			printk(KERN_WARNING
 			  "aac_get_config_status: Foreign device configurations are being ignored\n");
 		}
diff -ru a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
--- a/drivers/scsi/aacraid/aacraid.h	2007-05-16 10:29:25.697735367 -0400
+++ b/drivers/scsi/aacraid/aacraid.h	2007-05-16 10:37:33.538128354 -0400
@@ -1829,3 +1829,5 @@
 extern int startup_timeout;
 extern int aif_timeout;
 extern int expose_physicals;
+extern int aac_reset_devices;
+extern int aac_commit;
diff -ru a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
--- a/drivers/scsi/aacraid/rx.c	2007-05-16 10:29:25.699735113 -0400
+++ b/drivers/scsi/aacraid/rx.c	2007-05-16 10:37:33.539128223 -0400
@@ -488,6 +488,8 @@
 		return -EINVAL;
 	if (rx_readl(dev, MUnit.OMRx[0]) & KERNEL_PANIC)
 		return -ENODEV;
+	if (startup_timeout < 300)
+		startup_timeout = 300;
 	return 0;
 }
 
@@ -542,7 +544,7 @@
 	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
 	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
 	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
-	if ((((status & 0x0c) != 0x0c) || reset_devices) &&
+	if ((((status & 0x0c) != 0x0c) || aac_reset_devices || reset_devices) &&
 	  !aac_rx_restart_adapter(dev, 0))
 		++restart;
 	/*
@@ -594,6 +596,8 @@
 		}
 		msleep(1);
 	}
+	if (restart)
+		aac_commit = 1;
 	/*
 	 * Fill in the common function dispatch table.
 	 */

^ permalink raw reply	[flat|nested] 19+ messages in thread
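
The rx.c hunk in the attached patch raises startup_timeout to at least 300 seconds once the adapter restart has succeeded, which is the "minimum value check" this message relies on. A rough standalone sketch of that rule follows; the function name timeout_after_reset is hypothetical, and the real patch modifies the global startup_timeout in place rather than returning a value.

```c
/*
 * Hypothetical helper mirroring the floor applied after a successful
 * adapter restart: values below 300 seconds are raised to 300, larger
 * values (for example aacraid.startup_timeout=540) are left alone.
 */
int timeout_after_reset(int configured_seconds)
{
	if (configured_seconds < 300)
		return 300;
	return configured_seconds;
}
```

Under this rule the default of 180 seconds becomes 300 after a reset, while an explicit 540 from the command line is kept unchanged.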
* Re: kexec and aacraid broken
  2007-05-30 11:44   ` Salyzyn, Mark
@ 2007-05-30 13:24     ` Vivek Goyal
  2007-05-30 13:57       ` Salyzyn, Mark
  2007-05-30 21:22     ` Yinghai Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2007-05-30 13:24 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On Wed, May 30, 2007 at 07:44:02AM -0400, Salyzyn, Mark wrote:
> I believe this issue is a result of the aacraid_commit_reset patch (as
> posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
> to the 2.6.22-rc3 tree.
>
> This is the adapter taking longer than 3 minutes to start after a reset.
> I seriously doubt either of these patches suggested below will have an
> effect. And if they do, they are not root cause, one reduces the chances
> that the card will be reset during initialization (thus applied would
> likely mitigate this problem), the other prevents a panic when the
> Adapter is reset (removed, would result in dogs and cats sleeping with
> each other).
>
> Please use kernel parameter aacraid.startup_timeout=540 (merely larger
> than the default 180 seconds) when spawning the kexec or see if the
> aacraid_commit_reset.patch resolves the issue to confirm my hunch.
>

Hi Mark,

During a normal kexec (not kdump) adapter reset should not have taken
place at all. device_shutdown() routines should have taken care to
bring the device to a known sane state in first kernel so that second
kernel can initialize it without doing a reset.

With reset patch, now reset triggers on every kexec. Previously
that was not the case with kexec and adapter used to come up. I think
this needs to be looked into.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: kexec and aacraid broken
  2007-05-30 13:24     ` Vivek Goyal
@ 2007-05-30 13:57       ` Salyzyn, Mark
  2007-05-30 14:17         ` Vivek Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 13:57 UTC (permalink / raw)
  To: vgoyal
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

This is clouding the issue, Vivek.

There should be no harm, except to time, resetting the adapter. I do
want to optimize for boot time, but do not view this as a 'bug' if the
Adapter should reset during the initialization procedure. We need
instead to harden the driver to deal with Adapters that behave in an
untimely manner as a result of the reset since this generically deals
with all possible transitions (boot w/o BIOS, w/BIOS, kexec and kdump).

I will look into a possibility the driver is not performing the clean
shutdown as a result of a kexec, but that is a refinement and should not
be considered a fix for *this* reported problem; it merely moves the
problem to a kdump. The driver only disables the interrupts when the
driver is .remove'd (aac_remove_one) and not for .shutdown
(aac_shutdown). The latter merely tells the firmware to stop performing
builds if in progress, flush the cache, and all subsequent writes are
performed in write-through mode; it does not clear out the driver
resources and leaves that to the .remove function only. The failure of
.remove being called may be a result of this being a boot driver?

Also, the code:

	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
	if ((((status & 0x0c) != 0x0c) . . .

detects if the adapter's interrupts were disabled, as would happen on a
clean shutdown. Some of the Adapters can NOT disable their interrupts,
and some have a default state with the interrupts enabled. If the
Adapter still has active interrupts, then there is no telling what
transpired before and it is considered a safety measure to reset the
Adapter in these cases. I'd prefer to err on the side of resetting the
Adapter superfluously than deal with a condition where the Adapter could
be in an unknown state with a possibility of sustaining an outstanding
command and associated interrupt (which was the whole reason this code
was introduced).

In time I am sure, I will refine this code to incorporate Quirks for
adapters that have unusual conditions for the above stated interrupt and
remove the possible superfluous reset. Yinghai, can you please provide
the Adapter designation just in case it could be the first in this
refined list. I will NOT consider this refinement a bugfix for the same
reasons stated above.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Vivek Goyal [mailto:vgoyal@in.ibm.com]
> Sent: Wednesday, May 30, 2007 9:25 AM
> To: Salyzyn, Mark
> Cc: Andrew Morton; Yinghai Lu; Eric W. Biederman; Linux Kernel Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski
> Subject: Re: kexec and aacraid broken
>
>
> On Wed, May 30, 2007 at 07:44:02AM -0400, Salyzyn, Mark wrote:
> > I believe this issue is a result of the aacraid_commit_reset patch (as
> > posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
> > to the 2.6.22-rc3 tree.
> >
> > This is the adapter taking longer than 3 minutes to start after a reset.
> > I seriously doubt either of these patches suggested below will have an
> > effect. And if they do, they are not root cause, one reduces the chances
> > that the card will be reset during initialization (thus applied would
> > likely mitigate this problem), the other prevents a panic when the
> > Adapter is reset (removed, would result in dogs and cats sleeping with
> > each other).
> >
> > Please use kernel parameter aacraid.startup_timeout=540 (merely larger
> > than the default 180 seconds) when spawning the kexec or see if the
> > aacraid_commit_reset.patch resolves the issue to confirm my hunch.
> >
>
> Hi Mark,
>
> During a normal kexec (not kdump) adapter reset should not have taken
> place at all. device_shutdown() routines should have taken care to
> bring the device to a known sane state in first kernel so that second
> kernel can initialize it without doing a reset.
>
> With reset patch, now reset triggers on every kexec. Previously
> that was not the case with kexec and adapter used to come up. I think
> this needs to be looked into.
>
> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: kexec and aacraid broken
  2007-05-30 13:57       ` Salyzyn, Mark
@ 2007-05-30 14:17         ` Vivek Goyal
  2007-05-30 14:30           ` Salyzyn, Mark
  0 siblings, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2007-05-30 14:17 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On Wed, May 30, 2007 at 09:57:08AM -0400, Salyzyn, Mark wrote:
> This is clouding the issue, Vivek.
>
> There should be no harm, except to time, resetting the adapter. I do
> want to optimize for boot time, but do not view this as a 'bug' if the
> Adapter should reset during the initialization procedure. We need
> instead to harden the driver to deal with Adapters that behave in an
> untimely manner as a result of the reset since this generically deals
> with all possible transitions (boot w/o BIOS, w/BIOS, kexec and kdump).
>

Hi Mark,

I agree. We should make sure that we should be able to do a software reset
of adapters.

> I will look into a possibility the driver is not performing the clean
> shutdown as a result of a kexec, but that is a refinement and should not
> be considered a fix for *this* reported problem; it merely moves the
> problem to a kdump.

Agreed. I just wanted to bring out this point that right now we are
triggering software reset on every kexec and probably that is not required.
One can avoid it to save boot time. That was the whole purpose of kexec
(fastboot) project. But this is not a fix for this problem. We should
anyway be able to reset the device and should root cause this.

> The driver only disables the interrupts when the
> driver is .remove'd (aac_remove_one) and not for .shutdown
> (aac_shutdown). The latter merely tells the firmware to stop performing
> builds if in progress, flush the cache, and all subsequent writes are
> performed in write-through mode; it does not clear out the driver
> resources and leaves that to the .remove function only. The failure of
> .remove being called may be a result of this being a boot driver?
>
> Also, the code:
>
> 	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> 	if ((((status & 0x0c) != 0x0c) . . .
>
> detects if the adapter's interrupts were disabled, as would happen on a
> clean shutdown. Some of the Adapters can NOT disable their interrupts,
> and some have a default state with the interrupts enabled. If the
> Adapter still has active interrupts, then there is no telling what
> transpired before and it is considered a safety measure to reset the
> Adapter in these cases. I'd prefer to err on the side of resetting the
> Adapter superfluously than deal with a condition where the Adapter could
> be in an unknown state with a possibility of sustaining an outstanding
> command and associated interrupt (which was the whole reason this code
> was introduced).
>

So most likely if we start disabling the interrupts in .shutdown routine
we might skip resetting adapter on every kexec without any side effects?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: kexec and aacraid broken
  2007-05-30 14:17         ` Vivek Goyal
@ 2007-05-30 14:30           ` Salyzyn, Mark
  2007-05-30 15:59             ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
  2007-05-30 21:19             ` kexec and aacraid broken Yinghai Lu
  0 siblings, 2 replies; 19+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 14:30 UTC (permalink / raw)
  To: vgoyal
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

Vivek Goyal [mailto:vgoyal@in.ibm.com] writes:
> So most likely if we start disabling the interrupts
> in .shutdown routine we might skip resetting adapter
> on every kexec without any side effects?

Not that simple. The .shutdown would need to perform more resource
cleanups of the .remove call to prevent side effects. I need to move
some of the .remove activity into the .shutdown handler to make sure the
adapter is quiesced.

I will hold off on submitting any of these changes until they are
evaluated and tested; I am waiting for feedback from Yinghai on the
other mitigations that I feel are closer to the root cause.

Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-05-30 14:30           ` Salyzyn, Mark
@ 2007-05-30 15:59             ` Salyzyn, Mark
  2007-05-30 17:36               ` Yinghai Lu
  2007-06-01 11:08               ` Vivek Goyal
  2007-05-30 21:19             ` kexec and aacraid broken Yinghai Lu
  1 sibling, 2 replies; 19+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 15:59 UTC (permalink / raw)
  To: linux-scsi
  Cc: vgoyal, Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Michal Piotrowski, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2022 bytes --]

Moves quiesce, thread and interrupt shutdown into aacraid drivers'
.shutdown handler. This fix to the aac_shutdown handler will remove the
superfluous reset of the adapter during a (clean) kexec.

This fix may mitigate the active investigation 'kexec and aacraid
broken' but it is unlikely to affect the root cause (issue likely
present in both kexec and kdump). This patch reduces the chance the
problem will occur with a kexec. The fix for root cause is currently
expected to be the minimum value check to the aacraid.startup_timeout
driver variable after an adapter reset within aacraid_commit_reset.patch
submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.

This attached patch is against current scsi-misc-2.6

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patch attachments.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Salyzyn, Mark
> Sent: Wednesday, May 30, 2007 10:31 AM
> To: vgoyal@in.ibm.com
> Cc: Andrew Morton; Yinghai Lu; Eric W. Biederman; Linux Kernel Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski
> Subject: RE: kexec and aacraid broken
>
> Vivek Goyal [mailto:vgoyal@in.ibm.com] writes:
> > So most likely if we start disabling the interrupts
> > in .shutdown routine we might skip resetting adapter
> > on every kexec without any side effects?
>
> Not that simple. The .shutdown would need to perform more resource
> cleanups of the .remove call to prevent side effects. I need to move
> some of the .remove activity into the .shutdown handler to make sure the
> adapter is quiesced.
>
> I will hold off on submitting any of these changes until they are
> evaluated and tested; I am waiting for feedback from Yinghai on the
> other mitigations that I feel are closer to the root cause.
>
> Sincerely -- Mark Salyzyn

[-- Attachment #2: aacraid_shutdown.patch --]
[-- Type: application/octet-stream, Size: 1524 bytes --]

diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
--- a/drivers/scsi/aacraid/linit.c	2007-05-30 11:00:36.619831521 -0400
+++ b/drivers/scsi/aacraid/linit.c	2007-05-30 11:04:35.325867212 -0400
@@ -859,6 +859,14 @@
 	.emulated			= 1,
 };
 
+static void __aac_shutdown(struct aac_dev * aac)
+{
+	kthread_stop(aac->thread);
+	aac_send_shutdown(aac);
+	aac_adapter_disable_int(aac);
+	free_irq(aac->pdev->irq, aac);
+}
+
 static int __devinit aac_probe_one(struct pci_dev *pdev,
 		const struct pci_device_id *id)
 {
@@ -1011,10 +1019,7 @@
 	return 0;
 
  out_deinit:
-	kthread_stop(aac->thread);
-	aac_send_shutdown(aac);
-	aac_adapter_disable_int(aac);
-	free_irq(pdev->irq, aac);
+	__aac_shutdown(aac);
  out_unmap:
 	aac_fib_map_free(aac);
 	pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr, aac->comm_phys);
@@ -1034,7 +1039,8 @@
 {
 	struct Scsi_Host *shost = pci_get_drvdata(dev);
 	struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
-	aac_send_shutdown(aac);
+	scsi_block_requests(shost);
+	__aac_shutdown(aac);
 }
 
 static void __devexit aac_remove_one(struct pci_dev *pdev)
@@ -1044,16 +1050,12 @@
 
 	scsi_remove_host(shost);
 
-	kthread_stop(aac->thread);
-
-	aac_send_shutdown(aac);
-	aac_adapter_disable_int(aac);
+	__aac_shutdown(aac);
 	aac_fib_map_free(aac);
 	pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr,
 			aac->comm_phys);
 	kfree(aac->queues);
 
-	free_irq(pdev->irq, aac);
 	aac_adapter_ioremap(aac, 0);
 
 	kfree(aac->fibs);

^ permalink raw reply	[flat|nested] 19+ messages in thread
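
The patch above folds four teardown calls into one helper and has the .shutdown handler block new requests before calling it. The following userspace model is only a sketch of that ordering: every name prefixed with model_ and the trace array are hypothetical stand-ins, since the real functions need a live kernel and adapter.

```c
#include <stddef.h>

/* The steps taken by the patched .shutdown path, as recordable events. */
enum step { BLOCK_REQUESTS, STOP_THREAD, SEND_SHUTDOWN, DISABLE_INT, FREE_IRQ };

#define MAX_STEPS 8
enum step trace[MAX_STEPS];
size_t trace_len;

/* Append one step to the trace, ignoring overflow. */
void model_record(enum step s)
{
	if (trace_len < MAX_STEPS)
		trace[trace_len++] = s;
}

/* Stand-in for __aac_shutdown(): same four calls, same order as the patch. */
void model_aac_shutdown(void)
{
	model_record(STOP_THREAD);   /* kthread_stop(aac->thread) */
	model_record(SEND_SHUTDOWN); /* aac_send_shutdown(aac) */
	model_record(DISABLE_INT);   /* aac_adapter_disable_int(aac) */
	model_record(FREE_IRQ);      /* free_irq(aac->pdev->irq, aac) */
}

/* Stand-in for aac_shutdown() after the patch. */
void model_shutdown_handler(void)
{
	model_record(BLOCK_REQUESTS); /* scsi_block_requests(shost) */
	model_aac_shutdown();
}
```

Calling model_shutdown_handler() once leaves five entries in trace, beginning with BLOCK_REQUESTS. Before the patch the handler issued only the aac_send_shutdown() step, so the interrupt mask was left untouched and the next kernel saw an adapter that looked uncleanly shut down.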
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-05-30 15:59             ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
@ 2007-05-30 17:36               ` Yinghai Lu
  2007-06-01 11:08               ` Vivek Goyal
  1 sibling, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2007-05-30 17:36 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: linux-scsi, vgoyal, Andrew Morton, Eric W. Biederman,
	Michal Piotrowski, Linux Kernel Mailing List

On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> .shutdown handler. This fix to the aac_shutdown handler will remove the
> superfluous reset of the adapter during a (clean) kexec.
>
> This fix may mitigate the active investigation 'kexec and aacraid
> broken' but it is unlikely to affect the root cause (issue likely
> present in both kexec and kdump). This patch reduces the chance the
> problem will occur with a kexec. The fix for root cause is currently
> expected to be the minimum value check to the aacraid.startup_timeout
> driver variable after an adapter reset within aacraid_commit_reset.patch
> submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
>
> This attached patch is against current scsi-misc-2.6
>
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patch attachments.
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>
> Sincerely -- Mark Salyzyn
>

The kernel with this patch (4), even without

1. [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers
2. [SCSI] aacraid: kexec fix (reset interrupt handler)
3. aacraid_commit_reset.patch

can load another kernel with or without patches 1, 2, 3.

YH

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-05-30 15:59             ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
  2007-05-30 17:36               ` Yinghai Lu
@ 2007-06-01 11:08               ` Vivek Goyal
  2007-06-01 17:07                 ` Yinghai Lu
  1 sibling, 1 reply; 19+ messages in thread
From: Vivek Goyal @ 2007-06-01 11:08 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: linux-scsi, Andrew Morton, Yinghai Lu, Eric W. Biederman,
	Michal Piotrowski, Linux Kernel Mailing List

On Wed, May 30, 2007 at 11:59:13AM -0400, Salyzyn, Mark wrote:
> Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> .shutdown handler. This fix to the aac_shutdown handler will remove the
> superfluous reset of the adapter during a (clean) kexec.
>
> This fix may mitigate the active investigation 'kexec and aacraid
> broken' but it is unlikely to affect the root cause (issue likely
> present in both kexec and kdump). This patch reduces the chance the
> problem will occur with a kexec. The fix for root cause is currently
> expected to be the minimum value check to the aacraid.startup_timeout
> driver variable after an adapter reset within aacraid_commit_reset.patch
> submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
>
> This attached patch is against current scsi-misc-2.6
>
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patch attachments.
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>

Thanks Mark. This does fix the issue of unnecessary reset of aacraid
adapter over kexec on my machine.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-06-01 11:08               ` Vivek Goyal
@ 2007-06-01 17:07                 ` Yinghai Lu
  2007-06-01 17:34                   ` Salyzyn, Mark
  0 siblings, 1 reply; 19+ messages in thread
From: Yinghai Lu @ 2007-06-01 17:07 UTC (permalink / raw)
  To: vgoyal
  Cc: Salyzyn, Mark, linux-scsi, Andrew Morton, Eric W. Biederman,
	Michal Piotrowski, Linux Kernel Mailing List

On 6/1/07, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> On Wed, May 30, 2007 at 11:59:13AM -0400, Salyzyn, Mark wrote:
> > Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> > .shutdown handler. This fix to the aac_shutdown handler will remove the
> > superfluous reset of the adapter during a (clean) kexec.
> >
> > This fix may mitigate the active investigation 'kexec and aacraid
> > broken' but it is unlikely to affect the root cause (issue likely
> > present in both kexec and kdump). This patch reduces the chance the
> > problem will occur with a kexec. The fix for root cause is currently
> > expected to be the minimum value check to the aacraid.startup_timeout
> > driver variable after an adapter reset within aacraid_commit_reset.patch
> > submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
> >
> > This attached patch is against current scsi-misc-2.6
> >
> > ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> > handling of patch attachments.
> >
> > Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
> >
>
> Thanks Mark. This does fix the issue of unnecessary reset of aacraid
> adapter over kexec on my machine.
>

I'm a little confused about that.
This patch does a clean shutdown, so even if the next start has a tight
condition, it will not try to reset the adapter fw. Right, Mark?

Maybe the driver could be smart enough to find out whether it needs to
reset the adaptec fw.

YH

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] aacraid: fix shutdown handler to also disable interrupts. 2007-06-01 17:07 ` Yinghai Lu @ 2007-06-01 17:34 ` Salyzyn, Mark 0 siblings, 0 replies; 19+ messages in thread From: Salyzyn, Mark @ 2007-06-01 17:34 UTC (permalink / raw) To: Yinghai Lu, vgoyal Cc: linux-scsi, Andrew Morton, Eric W. Biederman, Michal Piotrowski, Linux Kernel Mailing List Yes, this patch makes sure that the Adapter is shut down correctly, and thus when the kexec driver loads, it does not automatically reset the adapter during initialization. This regression was a result of adding code to the driver to detect if the adapter needed a reset as a result of an unclean shutdown in order to deal with an issue that came up with kdump. Kdump does not issue a clean shutdown. As you see, it was the process of making the driver smarter to find out if it needed to reset the adaptec fw that triggered the problem. As noted before, please be advised to go through SUN channels. Upgrade your Drive(s), SES, Motherboard and Card Firmware to the latest versions; and make sure you are using compatible drives and drive bays to see if this problem dealing with the superfluous reset on your pre-release system goes away. You will be able to trigger this by trying to perform a kdump on the system, OR by reverting this patch and running your kexec test. The superfluous reset has yet to cause an issue with a released card beyond noticing a superfluous Firmware reset as Vivek has pointed out. Sincerely -- Mark Salyzyn From: Yinghai Lu [mailto:yhlu.kernel@gmail.com] sez: > On 6/1/07, Vivek Goyal <vgoyal@in.ibm.com> wrote: > > Thanks Mark. This does fix the issue of unnecessary reset of aacraid > > adapter over kexec on my machine. > i'm little confused about that. > this patch is some clear shutdown, so even next start will have tight > condition will not try to reset the adapter fw. right Mark? > Maybe the driver could be smart to find out if it need to > reset adaptec fw. 
> > YH ^ permalink raw reply [flat|nested] 19+ messages in thread
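The reset decision Mark describes above can be sketched in a few lines of C. This is only an illustration, not the driver's code: it assumes the 0x0c interrupt-mask test from the rx.c diff quoted earlier in the thread, it assumes that a set bit in the mask register means the interrupt is disabled, and the names `OIMR_DRIVER_MASK` and `needs_reset` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the interrupt-mask bits the driver cares about;
 * the value 0x0c is taken from the rx.c diff quoted in this thread. */
#define OIMR_DRIVER_MASK 0x0cu

static_assert(OIMR_DRIVER_MASK <= 0xffu, "mask must fit in the 8-bit register");

/* Returns true when the init path should attempt an adapter reset:
 * either the relevant interrupts were left enabled (an unclean shutdown,
 * as with kdump), or the caller forces it via reset_devices. */
bool needs_reset(uint8_t oimr, bool reset_devices)
{
    bool interrupts_left_enabled =
        (oimr & OIMR_DRIVER_MASK) != OIMR_DRIVER_MASK;

    return interrupts_left_enabled || reset_devices;
}
```

Under these assumptions, a clean .shutdown that masks the interrupts leaves the register in a state where `needs_reset` returns false, which would match the behaviour Vivek reports; a kdump, which never runs the shutdown handler, leaves the bits clear and the reset is attempted.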
* Re: kexec and aacraid broken 2007-05-30 14:30 ` Salyzyn, Mark 2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark @ 2007-05-30 21:19 ` Yinghai Lu 1 sibling, 0 replies; 19+ messages in thread From: Yinghai Lu @ 2007-05-30 21:19 UTC (permalink / raw) To: Salyzyn, Mark Cc: vgoyal, Andrew Morton, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote: > Vivek Goyal [mailto:vgoyal@in.ibm.com] writes: > > So most likely if we start disabling the interrupts > > in .shutdown routine we might skip resetting adapter > > on every kexec without any side effects? > > Not that simple. The .shutdown would need to perform more resource > cleanups of the .remove call to prevent side effects. I need to move > some of the .remove activity into the .shutdown handler to make sure the > adapter is quiesced. > > I will hold off on submitting any of these changes until they are > evaluated and tested; I am waiting for feedback from Yinghai on the > other mitigations that I feel are closer to the root cause. > 1. [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers 2. [SCSI] aacraid: kexec fix (reset interrupt handler) 3. aacraid_commit_reset.patch 4. [PATCH] aacraid: fix shutdown handler to also disable interrupts A kernel with patch 4 applied, even without patches 1, 2, and 3, can kexec another kernel with or without patches 1, 2, and 3. YH ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: kexec and aacraid broken 2007-05-30 11:44 ` Salyzyn, Mark 2007-05-30 13:24 ` Vivek Goyal @ 2007-05-30 21:22 ` Yinghai Lu 2007-05-30 21:49 ` Salyzyn, Mark 1 sibling, 1 reply; 19+ messages in thread From: Yinghai Lu @ 2007-05-30 21:22 UTC (permalink / raw) To: Salyzyn, Mark Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote: > I believe this issue is a result of the aacraid_commit_reset patch (as > posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated > to the 2.6.22-rc3 tree. > > This is the adapter taking longer than 3 minutes to start after a reset. > I seriously doubt either of these patches suggested below will have an > effect. And if they do, they are not root cause, one reduces the chances > that the card will be reset during initialization (thus applied would > likely mitigate this problem), the other prevents a panic when the > Adapter is reset (removed, would result in dogs and cats sleeping with > each other). > > Please use kernel parameter aacraid.startup_timeout=540 (merely larger > than the default 180 seconds) when spawning the kexec or see if the > aacraid_commit_reset.patch resolves the issue to confirm my hunch. > aacraid_commit_reset.patch is in the mainline already. YH ^ permalink raw reply [flat|nested] 19+ messages in thread
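The "minimum value check" on the startup timeout that Mark mentions elsewhere in the thread might look roughly like the sketch below. The 180-second default comes from the message above; everything else is an assumption made for illustration, including the function name `effective_startup_timeout` and the idea of passing the post-reset minimum as a parameter. The thread does not say what minimum the real patch uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Default startup timeout in seconds, as quoted in the thread. */
#define STARTUP_TIMEOUT_DEFAULT 180u

static_assert(STARTUP_TIMEOUT_DEFAULT > 0u, "default timeout must be positive");

/* Hypothetical helper: after an adapter reset, never wait less than
 * min_after_reset seconds, even if the configured value is smaller.
 * Without a reset, the configured value is used unchanged. */
unsigned int effective_startup_timeout(unsigned int configured,
                                       bool after_reset,
                                       unsigned int min_after_reset)
{
    if (after_reset && configured < min_after_reset)
        return min_after_reset;
    return configured;
}
```

For example, with the default of 180 seconds and an assumed post-reset minimum of 540 seconds, the helper would return 540 after a reset and 180 otherwise, which is roughly what passing aacraid.startup_timeout=540 by hand achieves.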
* RE: kexec and aacraid broken 2007-05-30 21:22 ` Yinghai Lu @ 2007-05-30 21:49 ` Salyzyn, Mark 2007-05-30 22:11 ` Yinghai Lu 0 siblings, 1 reply; 19+ messages in thread From: Salyzyn, Mark @ 2007-05-30 21:49 UTC (permalink / raw) To: Yinghai Lu Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski Yinghai Lu [mailto:yhlu.kernel@gmail.com] writes: > aacraid_commit_reset.patch is in the mainline already. But aacraid_commit_reset.patch is not in 2.6.22-rc3 (to which you report the issue). Does the aacraid_commit_reset.patch work to resolve this issue all by itself in the kexec'd kernel? Or alternatively did you try aacraid.startup_timeout=540 as one of the kernel parameters passed to the kexec'd kernel? The '[PATCH] aacraid: fix shutdown handler to also disable interrupts' patch (you refer to this as patch 4) is not to be in the picture because it will hide the root cause. I believe I have you correct in stating that this patch (4) resolves the problem... but I expect the problem to remain with kdump. Sincerely -- Mark Salyzyn ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: kexec and aacraid broken 2007-05-30 21:49 ` Salyzyn, Mark @ 2007-05-30 22:11 ` Yinghai Lu 2007-05-31 12:37 ` Salyzyn, Mark 0 siblings, 1 reply; 19+ messages in thread From: Yinghai Lu @ 2007-05-30 22:11 UTC (permalink / raw) To: Salyzyn, Mark Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote: > Yinghai Lu [mailto:yhlu.kernel@gmail.com] writes: > > aacraid_commit_reset.patch is in the mainline already. > > But aacraid_commit_reset.patch is not in 2.6.22-rc3 (to which you report > the issue). Does the aacraid_commit_reset.patch work to resolve this > issue all by itself in the kexec'd kernel? Or alternatively did you try > aacraid.startup_timeout=540 as one of the kernel parameters passed to > the kexec'd kernel? No, I still get the adapter kernel panic. > > The '[PATCH] aacraid: fix shutdown handler to also disable interrupts' > patch (you refer to this as patch 4) is not to be in the picture because > it will hide the root cause. I believe I have you correct in stating > that this patch (4) resolves the problem... but I expect the problem to > remain with kdump. Oh. Without patch (4), the latest kernel can still kexec to 2.6.21.3. I will try to load 2.6.22-rc1 etc. YH ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: kexec and aacraid broken 2007-05-30 22:11 ` Yinghai Lu @ 2007-05-31 12:37 ` Salyzyn, Mark 2007-05-31 19:59 ` Yinghai Lu 0 siblings, 1 reply; 19+ messages in thread From: Salyzyn, Mark @ 2007-05-31 12:37 UTC (permalink / raw) To: Yinghai Lu Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski > No, still get adapter kernel panic Which adapter are you using? Sincerely -- Mark Salyzyn ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: kexec and aacraid broken 2007-05-31 12:37 ` Salyzyn, Mark @ 2007-05-31 19:59 ` Yinghai Lu 2007-05-31 20:45 ` Salyzyn, Mark 0 siblings, 1 reply; 19+ messages in thread From: Yinghai Lu @ 2007-05-31 19:59 UTC (permalink / raw) To: Salyzyn, Mark Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski SUN coguar with 11731 YH On 5/31/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote: > > No, still get adapter kernel panic > > Which adapter are you using? > > Sincerely -- Mark Salyzyn > ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: kexec and aacraid broken 2007-05-31 19:59 ` Yinghai Lu @ 2007-05-31 20:45 ` Salyzyn, Mark 0 siblings, 0 replies; 19+ messages in thread From: Salyzyn, Mark @ 2007-05-31 20:45 UTC (permalink / raw) To: Yinghai Lu Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski Ahhhh, that explains why I am having troubles duping this issue thus far. This is prerelease Firmware on a yet to be released card and thus should not get any driver workarounds if this issue can be resolved in Firmware. If this can be duped on a released card with released Firmware, then the story changes of course; but still does not preclude a Firmware/Hardware/Drive Compatibility bug ;-} . Until then, please work this issue via SUN channels so that we get all the necessary card debug information for our teams to work this. I will ensure Adaptec will remain on top of this issue since it is clearly a problem with the Adapter Hardware interfacing. The adapter is not surviving an IOP_RESET and is going into an Adapter Firmware Kernel Panic or taking an excessively long period of time (in the testing thus far > 540 seconds) to complete its reset. Sincerely -- Mark Salyzyn Yinghai Lu [mailto:yhlu.kernel@gmail.com] sez: > SUN coguar with 11731 > > On 5/31/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote: > > > No, still get adapter kernel panic > > > > Which adapter are you using? ^ permalink raw reply [flat|nested] 19+ messages in thread
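The two failure modes Mark lists (firmware panic after the reset, or a reset that does not complete within the timeout) can be told apart with a simple polling loop. The sketch below is a userspace illustration under stated assumptions, not the driver's implementation: the status bit values `FW_UP_AND_RUNNING` and `FW_PANIC`, the `read_status_fn` callback, and the `fake_adapter` test double are all hypothetical, and each loop iteration stands in for one second of waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; the real firmware's encoding is not given here. */
#define FW_UP_AND_RUNNING 0x080u
#define FW_PANIC          0x100u

enum start_result { START_OK, START_PANIC, START_TIMEOUT };

/* Callback that returns the adapter's current status word. */
typedef uint32_t (*read_status_fn)(void *ctx);

/* Polls once per simulated second until the firmware reports up,
 * reports a panic, or timeout_secs polls have passed without either. */
enum start_result wait_for_firmware(read_status_fn read_status, void *ctx,
                                    unsigned int timeout_secs)
{
    assert(read_status != NULL);

    for (unsigned int elapsed = 0; elapsed <= timeout_secs; ++elapsed) {
        uint32_t status = read_status(ctx);

        if (status & FW_PANIC)
            return START_PANIC;
        if (status & FW_UP_AND_RUNNING)
            return START_OK;
        /* A real driver would sleep for a second here; this sketch only
         * counts polls so that it can run without hardware. */
    }
    return START_TIMEOUT;
}

/* Test double: reports final_status once more than ready_after polls
 * have been made, and 0 (still starting) before that. */
struct fake_adapter {
    unsigned int ready_after;
    unsigned int polls;
    uint32_t final_status;
};

uint32_t fake_read_status(void *ctx)
{
    struct fake_adapter *a = ctx;

    a->polls++;
    return (a->polls > a->ready_after) ? a->final_status : 0u;
}
```

With the fake adapter, a card that comes up after a few polls yields `START_OK`, a card that never comes up within 540 polls yields `START_TIMEOUT`, and a card that immediately reports the panic bit yields `START_PANIC`, which corresponds to the "adapter kernel panic'd" message in the original report.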