* Re: kexec and aacraid broken [not found] <86802c440705291859y39a4ca27uf5ddb84810f33510@mail.gmail.com> @ 2007-05-30 2:13 ` Andrew Morton 2007-05-30 11:44 ` Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Andrew Morton @ 2007-05-30 2:13 UTC (permalink / raw) To: Yinghai Lu Cc: Vivek Goyal, Eric W. Biederman, aacraid, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski On Tue, 29 May 2007 18:59:32 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote: > latest tree, can not use kexec to load 2.6.22-rc3 at least. > > got: > > AAC0: adapter kernel panic'd fffffffd > AAC0: adapter kernel failed to start, init status=0 One of the two diffs below, I guess. Please do a `patch -R -p1' of this email and retest? > > but can load 2.6.21.3 > Michal, can you please add this to the regression list? commit 9e4d4a5d71d673901d9c1df5146ce545c2cc0cc0 Author: Salyzyn, Mark <mark_salyzyn@adaptec.com> Date: Tue May 1 11:43:06 2007 -0400 [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers The kexec patch introduced a superfluous (and otherwise inert) reset of some adapters. The register can have a hardware default value that has zeros for the undefined interrupts. This patch refines the test of the interrupt enable register to focus on only the interrupts that affect the driver in order to detect if an incomplete shutdown of the Adapter had occurred (kdump). 
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c index b6ee3c0..291cd14 100644 --- a/drivers/scsi/aacraid/rx.c +++ b/drivers/scsi/aacraid/rx.c @@ -542,7 +542,7 @@ int _aac_rx_init(struct aac_dev *dev) dev->a_ops.adapter_sync_cmd = rx_sync_cmd; dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt; dev->OIMR = status = rx_readb (dev, MUnit.OIMR); - if ((((status & 0xff) != 0xff) || reset_devices) && + if ((((status & 0x0c) != 0x0c) || reset_devices) && !aac_rx_restart_adapter(dev, 0)) ++restart; /* commit a5694ec545a880f9d23463fddc894f5096cc68fa Author: Salyzyn, Mark <mark_salyzyn@adaptec.com> Date: Mon Apr 30 13:22:24 2007 -0400 [SCSI] aacraid: kexec fix (reset interrupt handler) Another layer on this onion also discovered by Duane, the interrupt enable handler also needed to be set ... The interrupt enable was called from within the synchronous command handler. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c index 0c71315..b6ee3c0 100644 --- a/drivers/scsi/aacraid/rx.c +++ b/drivers/scsi/aacraid/rx.c @@ -539,6 +539,8 @@ int _aac_rx_init(struct aac_dev *dev) } /* Failure to reset here is an option ... */ + dev->a_ops.adapter_sync_cmd = rx_sync_cmd; + dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt; dev->OIMR = status = rx_readb (dev, MUnit.OIMR); if ((((status & 0xff) != 0xff) || reset_devices) && !aac_rx_restart_adapter(dev, 0)) ^ permalink raw reply related [flat|nested] 28+ messages in thread
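The effect of narrowing the mask in the first diff can be sketched in stand-alone C. This is a minimal model and not driver code: `oimr_wants_restart()` and the two `OIMR_MASK_*` names are hypothetical, and the register value 0x0f used in the usage note is only an assumed example of "a hardware default value that has zeros for the undefined interrupts".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks taken from the two versions of the test in _aac_rx_init(). */
#define OIMR_MASK_OLD 0xffu /* before 9e4d4a5d: every bit must be set */
#define OIMR_MASK_NEW 0x0cu /* after 9e4d4a5d: only the two bits the test keeps */

/*
 * Hypothetical stand-alone model of the decision: restart the adapter
 * when the masked bits are not all set, or when a reset was requested
 * (reset_devices in the kernel).
 */
static bool oimr_wants_restart(uint8_t oimr, uint8_t mask, bool reset_requested)
{
	return ((oimr & mask) != mask) || reset_requested;
}
```

With an assumed register value of 0x0f, `oimr_wants_restart(0x0f, OIMR_MASK_OLD, false)` is true while `oimr_wants_restart(0x0f, OIMR_MASK_NEW, false)` is false, which is the superfluous reset the commit message describes.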
* RE: kexec and aacraid broken 2007-05-30 2:13 ` kexec and aacraid broken Andrew Morton @ 2007-05-30 11:44 ` Salyzyn, Mark 2007-05-30 13:24 ` Vivek Goyal 2007-05-30 21:22 ` Yinghai Lu 0 siblings, 2 replies; 28+ messages in thread From: Salyzyn, Mark @ 2007-05-30 11:44 UTC (permalink / raw) To: Andrew Morton, Yinghai Lu Cc: Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski [-- Attachment #1: Type: text/plain, Size: 4219 bytes --] I believe this issue is a result of the aacraid_commit_reset patch (as posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated to the 2.6.22-rc3 tree. This is the adapter taking longer than 3 minutes to start after a reset. I seriously doubt either of these patches suggested below will have an affect. And if they do, they are not root cause, one reduces the chances that the card will be reset during initialization (thus applied would likely mitigate this problem), the other prevents a panic when the Adapter is reset (removed, would result in dogs and cats sleeping with each other). Please use kernel parameter aacraid.startup_timeout=540 (merely larger than the default 180 seconds) when spawning the kexec or see if the aacraid_commit_reset.patch resolves the issue to confirm my hunch. Sincerely -- Mark Salyzyn > -----Original Message----- > From: Andrew Morton [mailto:akpm@linux-foundation.org] > Sent: Tuesday, May 29, 2007 10:14 PM > To: Yinghai Lu > Cc: Vivek Goyal; Eric W. Biederman; AACRAID; Linux Kernel > Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski > Subject: Re: kexec and aacraid broken > > > On Tue, 29 May 2007 18:59:32 -0700 "Yinghai Lu" > <yhlu.kernel@gmail.com> wrote: > > > latest tree, can not use kexec to load 2.6.22-rc3 at least. > > > > got: > > > > AAC0: adapter kernel panic'd fffffffd > > AAC0: adapter kernel failed to start, init status=0 > > One of the two diffs below, I guess. Please do a `patch -R > -p1' of this > email and retest? 
> > > > > but can load 2.6.21.3 > > > > Michal, can you please add this to the regression list? > > > > > commit 9e4d4a5d71d673901d9c1df5146ce545c2cc0cc0 > Author: Salyzyn, Mark <mark_salyzyn@adaptec.com> > Date: Tue May 1 11:43:06 2007 -0400 > > [SCSI] aacraid: superfluous adapter reset for IBM 8 > series ServeRAID controllers > > The kexec patch introduced a superfluous (and otherwise > inert) reset of > some adapters. The register can have a hardware default > value that has > zeros for the undefined interrupts. This patch refines > the test of the > interrupt enable register to focus on only the interrupts > that affect > the driver in order to detect if an incomplete shutdown > of the Adapter > had occurred (kdump). > > Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> > Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> > > diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c > index b6ee3c0..291cd14 100644 > --- a/drivers/scsi/aacraid/rx.c > +++ b/drivers/scsi/aacraid/rx.c > @@ -542,7 +542,7 @@ int _aac_rx_init(struct aac_dev *dev) > dev->a_ops.adapter_sync_cmd = rx_sync_cmd; > dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt; > dev->OIMR = status = rx_readb (dev, MUnit.OIMR); > - if ((((status & 0xff) != 0xff) || reset_devices) && > + if ((((status & 0x0c) != 0x0c) || reset_devices) && > !aac_rx_restart_adapter(dev, 0)) > ++restart; > /* > commit a5694ec545a880f9d23463fddc894f5096cc68fa > Author: Salyzyn, Mark <mark_salyzyn@adaptec.com> > Date: Mon Apr 30 13:22:24 2007 -0400 > > [SCSI] aacraid: kexec fix (reset interrupt handler) > > Another layer on this onion also discovered by Duane, the > interrupt enable handler also needed to be set ... The > interrupt enable > was called from within the synchronous command handler. 
> > Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> > Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> > > diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c > index 0c71315..b6ee3c0 100644 > --- a/drivers/scsi/aacraid/rx.c > +++ b/drivers/scsi/aacraid/rx.c > @@ -539,6 +539,8 @@ int _aac_rx_init(struct aac_dev *dev) > } > > /* Failure to reset here is an option ... */ > + dev->a_ops.adapter_sync_cmd = rx_sync_cmd; > + dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt; > dev->OIMR = status = rx_readb (dev, MUnit.OIMR); > if ((((status & 0xff) != 0xff) || reset_devices) && > !aac_rx_restart_adapter(dev, 0)) > > [-- Attachment #2: aacraid_commit_reset.patch --] [-- Type: application/octet-stream, Size: 3499 bytes --] diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c 2007-05-16 10:29:25.697735367 -0400 +++ b/drivers/scsi/aacraid/aachba.c 2007-05-16 10:37:33.537128485 -0400 @@ -146,7 +146,7 @@ static int nondasd = -1; static int dacmode = -1; -static int commit = -1; +int aac_commit = -1; int startup_timeout = 180; int aif_timeout = 120; @@ -154,7 +154,7 @@ MODULE_PARM_DESC(nondasd, "Control scanning of hba for nondasd devices. 0=off, 1=on"); module_param(dacmode, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(dacmode, "Control whether dma addressing is using 64 bit DAC. 0=off, 1=on"); -module_param(commit, int, S_IRUGO|S_IWUSR); +module_param_named(commit, aac_commit, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(commit, "Control whether a COMMIT_CONFIG is issued to the adapter for foreign arrays.\nThis is typically needed in systems that do not have a BIOS. 0=off, 1=on"); module_param(startup_timeout, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(startup_timeout, "The duration of time in seconds to wait for adapter to have it's kernel up and\nrunning. 
This is typically adjusted for large systems that do not have a BIOS."); @@ -173,6 +173,9 @@ module_param(expose_physicals, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(expose_physicals, "Expose physical components of the arrays. -1=protect 0=off, 1=on"); +int aac_reset_devices = 0; +module_param_named(reset_devices, aac_reset_devices, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(reset_devices, "Force an adapter reset at initialization."); static inline int aac_valid_context(struct scsi_cmnd *scsicmd, struct fib *fibptr) { @@ -246,7 +249,7 @@ aac_fib_complete(fibptr); /* Send a CT_COMMIT_CONFIG to enable discovery of devices */ if (status >= 0) { - if ((commit == 1) || commit_flag) { + if ((aac_commit == 1) || commit_flag) { struct aac_commit_config * dinfo; aac_fib_init(fibptr); dinfo = (struct aac_commit_config *) fib_data(fibptr); @@ -261,7 +264,7 @@ 1, 1, NULL, NULL); aac_fib_complete(fibptr); - } else if (commit == 0) { + } else if (aac_commit == 0) { printk(KERN_WARNING "aac_get_config_status: Foreign device configurations are being ignored\n"); } diff -ru a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h --- a/drivers/scsi/aacraid/aacraid.h 2007-05-16 10:29:25.697735367 -0400 +++ b/drivers/scsi/aacraid/aacraid.h 2007-05-16 10:37:33.538128354 -0400 @@ -1829,3 +1829,5 @@ extern int startup_timeout; extern int aif_timeout; extern int expose_physicals; +extern int aac_reset_devices; +extern int aac_commit; diff -ru a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c --- a/drivers/scsi/aacraid/rx.c 2007-05-16 10:29:25.699735113 -0400 +++ b/drivers/scsi/aacraid/rx.c 2007-05-16 10:37:33.539128223 -0400 @@ -488,6 +488,8 @@ return -EINVAL; if (rx_readl(dev, MUnit.OMRx[0]) & KERNEL_PANIC) return -ENODEV; + if (startup_timeout < 300) + startup_timeout = 300; return 0; } @@ -542,7 +544,7 @@ dev->a_ops.adapter_sync_cmd = rx_sync_cmd; dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt; dev->OIMR = status = rx_readb (dev, MUnit.OIMR); - if ((((status & 
0x0c) != 0x0c) || reset_devices) && + if ((((status & 0x0c) != 0x0c) || aac_reset_devices || reset_devices) && !aac_rx_restart_adapter(dev, 0)) ++restart; /* @@ -594,6 +596,8 @@ } msleep(1); } + if (restart) + aac_commit = 1; /* * Fill in the common function dispatch table. */ ^ permalink raw reply [flat|nested] 28+ messages in thread
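The two policy changes in aacraid_commit_reset.patch (a floor of 300 seconds on startup_timeout once the adapter has been restarted, and forcing commit to 1 after a restart) can be modelled outside the driver as below. This is a sketch under assumptions: `struct aac_params` and `apply_post_reset_policy()` are hypothetical names, and the real patch applies the two changes at different points in rx.c rather than in one helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two module parameters the patch touches. */
struct aac_params {
	int startup_timeout; /* seconds; the driver default is 180 */
	int commit;          /* -1 = unset, 0 = off, 1 = on */
};

/*
 * Model of the patch: after an adapter restart, wait at least 300 seconds
 * for the adapter kernel to come up, and issue COMMIT_CONFIG.
 */
static void apply_post_reset_policy(struct aac_params *p, bool adapter_was_reset)
{
	if (!adapter_was_reset)
		return;
	if (p->startup_timeout < 300)
		p->startup_timeout = 300;
	p->commit = 1;
}
```

A larger value given on the command line, such as the suggested aacraid.startup_timeout=540, is left alone by the floor.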
* Re: kexec and aacraid broken
  2007-05-30 11:44 ` Salyzyn, Mark
@ 2007-05-30 13:24 ` Vivek Goyal
  2007-05-30 13:57 ` Salyzyn, Mark
  2007-05-30 21:22 ` Yinghai Lu
  1 sibling, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2007-05-30 13:24 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
      Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On Wed, May 30, 2007 at 07:44:02AM -0400, Salyzyn, Mark wrote:
> I believe this issue is a result of the aacraid_commit_reset patch (as
> posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
> to the 2.6.22-rc3 tree.
>
> This is the adapter taking longer than 3 minutes to start after a reset.
> I seriously doubt either of these patches suggested below will have an
> affect. And if they do, they are not root cause, one reduces the chances
> that the card will be reset during initialization (thus applied would
> likely mitigate this problem), the other prevents a panic when the
> Adapter is reset (removed, would result in dogs and cats sleeping with
> each other).
>
> Please use kernel parameter aacraid.startup_timeout=540 (merely larger
> than the default 180 seconds) when spawning the kexec or see if the
> aacraid_commit_reset.patch resolves the issue to confirm my hunch.
>

Hi Mark,

During a normal kexec (not kdump) adapter reset should not have taken
place at all. device_shutdown() routines should have taken care to
bring the device to a known sane state in first kernel so that second
kernel can initialize it without doing a reset.

With reset patch, now reset triggers on every kexec. Previously
that was not the case with kexec and adapter used to come up. I think
this needs to be looked into.

Thanks
Vivek

^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: kexec and aacraid broken
  2007-05-30 13:24 ` Vivek Goyal
@ 2007-05-30 13:57 ` Salyzyn, Mark
  2007-05-30 14:17 ` Vivek Goyal
  0 siblings, 1 reply; 28+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 13:57 UTC (permalink / raw)
  To: vgoyal
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
      Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

This is clouding the issue, Vivek.

There should be no harm, except to time, resetting the adapter. I do
want to optimize for boot time, but do not view this as a 'bug' if the
Adapter should reset during the initialization procedure. We need
instead to harden the driver to deal with Adapters that behave in an
untimely manner as a result of the reset since this generically deals
with all possible transitions (boot w/o BIOS, w/BIOS, kexec and kdump).

I will look into a possibility the driver is not performing the clean
shutdown as a result of a kexec, but that is a refinement and should not
be considered a fix for *this* reported problem; it merely moves the
problem to a kdump. The driver only disables the interrupts when the
driver is .remove'd (aac_remove_one) and not for .shutdown
(aac_shutdown). The latter merely tells the firmware to stop performing
builds if in progress, flush the cache, and all subsequent writes are
performed in write-through mode; it does not clear out the driver
resources and leaves that to the .remove function only. The failure of
.remove being called may be a result of this being a boot driver?

Also, the code:

        dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
        if ((((status & 0x0c) != 0x0c) . . .

detects if the adapter's interrupts were disabled, as would happen on a
clean shutdown. Some of the Adapters can NOT disable their interrupts,
and some have a default state with the interrupts enabled. If the
Adapter still has active interrupts, then there is no telling what
transpired before and it is considered a safety measure to reset the
Adapter in these cases. I'd prefer to err on the side of resetting the
Adapter superfluously than deal with a condition where the Adapter could
be in an unknown state with a possibility of sustaining an outstanding
command and associated interrupt (which was the whole reason this code
was introduced).

In time I am sure, I will refine this code to incorporate Quirks for
adapters that have unusual conditions for the above stated interrupt and
remove the possible superfluous reset. Yinghai, can you please provide
the Adapter designation just in case it could be the first in this
refined list. I will NOT consider this refinement a bugfix for the same
reasons stated above.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Vivek Goyal [mailto:vgoyal@in.ibm.com]
> Sent: Wednesday, May 30, 2007 9:25 AM
> To: Salyzyn, Mark
> Cc: Andrew Morton; Yinghai Lu; Eric W. Biederman; Linux
> Kernel Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski
> Subject: Re: kexec and aacraid broken
>
>
> On Wed, May 30, 2007 at 07:44:02AM -0400, Salyzyn, Mark wrote:
> > I believe this issue is a result of the
> aacraid_commit_reset patch (as
> > posted for scsi-misc-2.6, enclosed to permit testing) not
> yet propagated
> > to the 2.6.22-rc3 tree.
> >
> > This is the adapter taking longer than 3 minutes to start
> after a reset.
> > I seriously doubt either of these patches suggested below
> will have an
> > affect. And if they do, they are not root cause, one
> reduces the chances
> > that the card will be reset during initialization (thus
> applied would
> > likely mitigate this problem), the other prevents a panic when the
> > Adapter is reset (removed, would result in dogs and cats
> sleeping with
> > each other).
> >
> > Please use kernel parameter aacraid.startup_timeout=540
> (merely larger
> > than the default 180 seconds) when spawning the kexec or see if the
> > aacraid_commit_reset.patch resolves the issue to confirm my hunch.
> >
>
> Hi Mark,
>
> During a normal kexec (not kdump) adapter reset should not have taken
> place at all. device_shutdown() routines should have taken care to
> bring the device to a known sane state in first kernel so that second
> kernel can initialize it without doing a reset.
>
> With reset patch, now reset triggers on every kexec. Previously
> that was not the case with kexec and adapter used to come up. I think
> this needs to be looked into.
>
> Thanks
> Vivek
>

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: kexec and aacraid broken
  2007-05-30 13:57 ` Salyzyn, Mark
@ 2007-05-30 14:17 ` Vivek Goyal
  2007-05-30 14:30 ` Salyzyn, Mark
  0 siblings, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2007-05-30 14:17 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
      Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On Wed, May 30, 2007 at 09:57:08AM -0400, Salyzyn, Mark wrote:
> This is clouding the issue, Vivek.
>
> There should be no harm, except to time, resetting the adapter. I do
> want to optimize for boot time, but do not view this as a 'bug' if the
> Adapter should reset during the initialization procedure. We need
> instead to harden the driver to deal with Adapters that behave in an
> untimely manner as a result of the reset since this generically deals
> with all possible transitions (boot w/o BIOS, w/BIOS, kexec and kdump).
>

Hi Mark,

I agree. We should make sure that we are able to do a software
reset of adapters.

> I will look into a possibility the driver is not performing the clean
> shutdown as a result of a kexec, but that is a refinement and should not
> be considered a fix for *this* reported problem; it merely moves the
> problem to a kdump.

Agreed. I just wanted to bring out this point that right now we are
triggering software reset on every kexec and probably that is not
required. One can avoid it to save boot time. That was the whole
purpose of kexec (fastboot) project.

But this is not a fix for this problem. We should anyway be able to
reset the device and should root cause this.

> The driver only disables the interrupts when the
> driver is .remove'd (aac_remove_one) and not for .shutdown
> (aac_shutdown). The later merely tells the firmware to stop performing
> builds if in progress, flush the cache, and all subsequent writes are
> performed in write-through mode; it does not clear out the driver
> resources and leaves that to the .remove function only. The failure of
> .remove being called may be a result of this being a boot driver?
>
> Also, the code:
>
> dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> if ((((status & 0x0c) != 0x0c) . . .
>
> detects if the adapter's interrupts were disabled, as would happen on a
> clean shutdown. Some of the Adapters can NOT disable their interrupts,
> and some have a default state with the interrupts enabled. If the
> Adapter still has active interrupts, then there is no telling what
> transpired before and it is considered a safety measure to reset the
> Adapter in these cases. I'd prefer to err on the side of resetting the
> Adapter superfluously than deal with a condition where the Adapter could
> be in an unknown state with a possibility of sustaining an outstanding
> command and associated interrupt (which was the whole reason this code
> was introduced).
>

So most likely if we start disabling the interrupts in .shutdown
routine we might skip resetting adapter on every kexec without any
side effects?

Thanks
Vivek

^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: kexec and aacraid broken
  2007-05-30 14:17 ` Vivek Goyal
@ 2007-05-30 14:30 ` Salyzyn, Mark
  2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
  2007-05-30 21:19 ` kexec and aacraid broken Yinghai Lu
  0 siblings, 2 replies; 28+ messages in thread
From: Salyzyn, Mark @ 2007-05-30 14:30 UTC (permalink / raw)
  To: vgoyal
  Cc: Andrew Morton, Yinghai Lu, Eric W. Biederman,
      Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

Vivek Goyal [mailto:vgoyal@in.ibm.com] writes:
> So most likely if we start disabling the interrupts
> in .shutdown routine we might skip resetting adapter
> on every kexec without any side affects?

Not that simple. The .shutdown would need to perform more resource
cleanups of the .remove call to prevent side effects. I need to move
some of the .remove activity into the .shutdown handler to make sure the
adapter is quiesced.

I will hold off on submitting any of these changes until they are
evaluated and tested; I am waiting for feedback from Yinghai on the
other mitigations that I feel are closer to the root cause.

Sincerely -- Mark Salyzyn

^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] aacraid: fix shutdown handler to also disable interrupts. 2007-05-30 14:30 ` Salyzyn, Mark @ 2007-05-30 15:59 ` Salyzyn, Mark 2007-05-30 17:36 ` Yinghai Lu ` (2 more replies) 2007-05-30 21:19 ` kexec and aacraid broken Yinghai Lu 1 sibling, 3 replies; 28+ messages in thread From: Salyzyn, Mark @ 2007-05-30 15:59 UTC (permalink / raw) To: linux-scsi Cc: vgoyal, Andrew Morton, Yinghai Lu, Eric W. Biederman, Michal Piotrowski, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 2022 bytes --] Moves quiesce, thread and interrupt shutdown into aacraid drivers' .shutdown handler. This fix to the aac_shutdown handler will remove the superfluous reset of the adapter during a (clean) kexec. This fix may mitigate the active investigation 'kexec and aacraid broken' but it is unlikely to affect the root cause (issue likely present in both kexec and kdump). This patch reduces the chance the problem will occur with a kexec. The fix for root cause is currently expected to be the minimum value check to the aacraid.startup_timeout driver variable after an adapter reset within aacraid_commit_reset.patch submitted on 05/22/2007 and awaiting testing by Yinghai to confirm. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> Sincerely -- Mark Salyzyn > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Salyzyn, Mark > Sent: Wednesday, May 30, 2007 10:31 AM > To: vgoyal@in.ibm.com > Cc: Andrew Morton; Yinghai Lu; Eric W. 
Biederman; Linux > Kernel Mailing List; linux-scsi@vger.kernel.org; Michal Piotrowski > Subject: RE: kexec and aacraid broken > > Vivek Goyal [mailto:vgoyal@in.ibm.com] writes: > > So most likely if we start disabling the interrupts > > in .shutdown routine we might skip resetting adapter > > on every kexec without any side affects? > > Not that simple. The .shutdown would need to perform more resource > cleanups of the .remove call to prevent side effects. I need to move > some of the .remove activity into the .shutdown handler to > make sure the > adapter is quiesced. > > I will hold off on submitting any of these changes until they are > evaluated and tested; I am waiting for feedback from Yinghai on the > other mitigations that I feel are closer to the root cause. > > Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_shutdown.patch --] [-- Type: application/octet-stream, Size: 1524 bytes --] diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c --- a/drivers/scsi/aacraid/linit.c 2007-05-30 11:00:36.619831521 -0400 +++ b/drivers/scsi/aacraid/linit.c 2007-05-30 11:04:35.325867212 -0400 @@ -859,6 +859,14 @@ .emulated = 1, }; +static void __aac_shutdown(struct aac_dev * aac) +{ + kthread_stop(aac->thread); + aac_send_shutdown(aac); + aac_adapter_disable_int(aac); + free_irq(aac->pdev->irq, aac); +} + static int __devinit aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) { @@ -1011,10 +1019,7 @@ return 0; out_deinit: - kthread_stop(aac->thread); - aac_send_shutdown(aac); - aac_adapter_disable_int(aac); - free_irq(pdev->irq, aac); + __aac_shutdown(aac); out_unmap: aac_fib_map_free(aac); pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr, aac->comm_phys); @@ -1034,7 +1039,8 @@ { struct Scsi_Host *shost = pci_get_drvdata(dev); struct aac_dev *aac = (struct aac_dev *)shost->hostdata; - aac_send_shutdown(aac); + scsi_block_requests(shost); + __aac_shutdown(aac); } static void __devexit aac_remove_one(struct pci_dev *pdev) 
@@ -1044,16 +1050,12 @@ scsi_remove_host(shost); - kthread_stop(aac->thread); - - aac_send_shutdown(aac); - aac_adapter_disable_int(aac); + __aac_shutdown(aac); aac_fib_map_free(aac); pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr, aac->comm_phys); kfree(aac->queues); - free_irq(pdev->irq, aac); aac_adapter_ioremap(aac, 0); kfree(aac->fibs); ^ permalink raw reply [flat|nested] 28+ messages in thread
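The ordering that aacraid_shutdown.patch gives the .shutdown path, and why the next kernel then skips the reset, can be illustrated with a toy model. Everything here is an assumption made for illustration: `struct fake_adapter` is not `struct aac_dev`, the helpers only flip flags where the driver calls scsi_block_requests(), kthread_stop(), aac_send_shutdown(), aac_adapter_disable_int() and free_irq(), and writing 0xff to the modelled OIMR is an assumed stand-in for what disabling interrupts does on rx-class hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for adapter and driver state; not the real struct aac_dev. */
struct fake_adapter {
	uint8_t oimr;           /* models MUnit.OIMR; a set bit masks an interrupt */
	bool requests_blocked;
	bool thread_running;
	bool firmware_quiesced;
	bool irq_registered;
};

/* Hypothetical running state: interrupts unmasked, thread and IRQ live. */
static struct fake_adapter running_adapter(void)
{
	struct fake_adapter a = { 0x00, false, true, false, true };
	return a;
}

/* .shutdown before the patch: only aac_send_shutdown(). */
static void shutdown_before_patch(struct fake_adapter *a)
{
	a->firmware_quiesced = true;
}

/* .shutdown after the patch: block requests, then the shared helper. */
static void shutdown_after_patch(struct fake_adapter *a)
{
	a->requests_blocked = true;  /* scsi_block_requests() */
	a->thread_running = false;   /* kthread_stop() */
	a->firmware_quiesced = true; /* aac_send_shutdown() */
	a->oimr = 0xff;              /* aac_adapter_disable_int(), assumed value */
	a->irq_registered = false;   /* free_irq() */
}

/* The test the next kernel applies in _aac_rx_init(), ignoring reset_devices. */
static bool next_kernel_restarts(const struct fake_adapter *a)
{
	return (a->oimr & 0x0c) != 0x0c;
}
```

In this model the pre-patch path leaves `next_kernel_restarts()` true after a kexec, while the post-patch path leaves it false, matching the behaviour described in the patch submission.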
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
@ 2007-05-30 17:36 ` Yinghai Lu
  2007-06-01 11:08 ` Vivek Goyal
  2007-06-07 17:21 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking Salyzyn, Mark
  2 siblings, 0 replies; 28+ messages in thread
From: Yinghai Lu @ 2007-05-30 17:36 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: linux-scsi, vgoyal, Andrew Morton, Eric W. Biederman,
      Michal Piotrowski, Linux Kernel Mailing List

On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> .shutdown handler. This fix to the aac_shutdown handler will remove the
> superfluous reset of the adapter during a (clean) kexec.
>
> This fix may mitigate the active investigation 'kexec and aacraid
> broken' but it is unlikely to affect the root cause (issue likely
> present in both kexec and kdump). This patch reduces the chance the
> problem will occur with a kexec. The fix for root cause is currently
> expected to be the minimum value check to the aacraid.startup_timeout
> driver variable after an adapter reset within aacraid_commit_reset.patch
> submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
>
> This attached patch is against current scsi-misc-2.6
>
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patch attachments.
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>
> Sincerely -- Mark Salyzyn
>

the kernel with this patch -4
and even without
1. [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers
2. [SCSI] aacraid: kexec fix (reset interrupt handler)
3. aacraid_commit_reset.patch
can load other kernel with or without patch 1,2,3

YH

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
  2007-05-30 17:36 ` Yinghai Lu
@ 2007-06-01 11:08 ` Vivek Goyal
  2007-06-01 17:07 ` Yinghai Lu
  2007-06-07 17:21 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking Salyzyn, Mark
  2 siblings, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2007-06-01 11:08 UTC (permalink / raw)
  To: Salyzyn, Mark
  Cc: linux-scsi, Andrew Morton, Yinghai Lu, Eric W. Biederman,
      Michal Piotrowski, Linux Kernel Mailing List

On Wed, May 30, 2007 at 11:59:13AM -0400, Salyzyn, Mark wrote:
> Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> .shutdown handler. This fix to the aac_shutdown handler will remove the
> superfluous reset of the adapter during a (clean) kexec.
>
> This fix may mitigate the active investigation 'kexec and aacraid
> broken' but it is unlikely to affect the root cause (issue likely
> present in both kexec and kdump). This patch reduces the chance the
> problem will occur with a kexec. The fix for root cause is currently
> expected to be the minimum value check to the aacraid.startup_timeout
> driver variable after an adapter reset within aacraid_commit_reset.patch
> submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
>
> This attached patch is against current scsi-misc-2.6
>
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patch attachments.
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>

Thanks Mark. This does fix the issue of unnecessary reset of aacraid
adapter over kexec on my machine.

Thanks
Vivek

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] aacraid: fix shutdown handler to also disable interrupts.
  2007-06-01 11:08 ` Vivek Goyal
@ 2007-06-01 17:07 ` Yinghai Lu
  2007-06-01 17:34 ` Salyzyn, Mark
  0 siblings, 1 reply; 28+ messages in thread
From: Yinghai Lu @ 2007-06-01 17:07 UTC (permalink / raw)
  To: vgoyal
  Cc: Salyzyn, Mark, linux-scsi, Andrew Morton, Eric W. Biederman,
      Michal Piotrowski, Linux Kernel Mailing List

On 6/1/07, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> On Wed, May 30, 2007 at 11:59:13AM -0400, Salyzyn, Mark wrote:
> > Moves quiesce, thread and interrupt shutdown into aacraid drivers'
> > .shutdown handler. This fix to the aac_shutdown handler will remove the
> > superfluous reset of the adapter during a (clean) kexec.
> >
> > This fix may mitigate the active investigation 'kexec and aacraid
> > broken' but it is unlikely to affect the root cause (issue likely
> > present in both kexec and kdump). This patch reduces the chance the
> > problem will occur with a kexec. The fix for root cause is currently
> > expected to be the minimum value check to the aacraid.startup_timeout
> > driver variable after an adapter reset within aacraid_commit_reset.patch
> > submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
> >
> > This attached patch is against current scsi-misc-2.6
> >
> > ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> > handling of patch attachments.
> >
> > Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
> >
>
> Thanks Mark. This does fix the issue of unnecessary reset of aacraid
> adapter over kexec on my machine.
>

I'm a little confused about that.
This patch does a clean shutdown, so the next start, even with the tight
condition, will not try to reset the adapter fw. Right, Mark?

Maybe the driver could be smart enough to find out if it needs to reset
the Adaptec fw.

YH

^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH] aacraid: fix shutdown handler to also disable interrupts. 2007-06-01 17:07 ` Yinghai Lu @ 2007-06-01 17:34 ` Salyzyn, Mark 0 siblings, 0 replies; 28+ messages in thread From: Salyzyn, Mark @ 2007-06-01 17:34 UTC (permalink / raw) To: Yinghai Lu, vgoyal Cc: linux-scsi, Andrew Morton, Eric W. Biederman, Michal Piotrowski, Linux Kernel Mailing List Yes, this patch makes sure that the Adapter is shut down correctly, and thus when the kexec driver loads, it does not automatically reset the adapter during initialization. This regression was a result of adding code to the driver to detect if the adapter needed a reset as a result of an unclean shutdown in order to deal with an issue that came up with kdump. Kdump does not issue a clean shutdown. As you see, it was the process of making the driver smarter to find out if it needed to reset the adaptec fw that triggered the problem. As noted before, please be advised to go through SUN channels. Upgrade your Drive(s), SES, Motherboard and Card Firmware to the latest versions; and make sure you are using compatible drives and drive bays to see if this problem dealing with the superfluous reset on your pre-release system goes away. You will be able to trigger this by trying to perform a kdump on the system, OR by reverting this patch and running your kexec test. The superfluous reset has yet to cause an issue with a released card beyond noticing a superfluous Firmware reset as Vivek has pointed out. Sincerely -- Mark Salyzyn From: Yinghai Lu [mailto:yhlu.kernel@gmail.com] sez: > On 6/1/07, Vivek Goyal <vgoyal@in.ibm.com> wrote: > > Thanks Mark. This does fix the issue of unnecessary reset of aacraid > > adapter over kexec on my machine. > i'm little confused about that. > this patch is some clear shutdown, so even next start will have tight > condition will not try to reset the adapter fw. right Mark? > Maybe the driver could be smart to find out if it need to > reset adaptec fw. 
> > YH ^ permalink raw reply [flat|nested] 28+ messages in thread
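For readers following the thread, the initialization-time test Mark describes can be sketched roughly as below. The mask 0x0c and the reset_devices flag come from the rx.c diff quoted at the top of the thread; the helper name aac_needs_reset, the macro name, and the standalone user-space form are illustrative assumptions, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask taken from the rx.c diff earlier in the thread: only the interrupt
 * enable bits that matter to the driver are examined. */
#define AAC_OIMR_DRIVER_MASK 0x0cu

/*
 * Hypothetical helper (not in the driver): decide whether initialization
 * should attempt an adapter restart.
 *
 * oimr          - value read from the interrupt mask register
 * reset_devices - non-zero when a reset is requested regardless of state
 */
static bool aac_needs_reset(uint8_t oimr, int reset_devices)
{
	/* Assumption: after a clean shutdown the driver-relevant interrupts
	 * are masked, so both bits read back as set. */
	bool masked = (oimr & AAC_OIMR_DRIVER_MASK) == AAC_OIMR_DRIVER_MASK;

	return !masked || reset_devices != 0;
}
```

With the older 0xff mask, a register value such as 0x0f (zeros in the undefined bits) would have requested a reset; with the narrower mask it does not.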
* [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking. 2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark 2007-05-30 17:36 ` Yinghai Lu 2007-06-01 11:08 ` Vivek Goyal @ 2007-06-07 17:21 ` Salyzyn, Mark 2007-06-11 20:17 ` [PATCH] aacraid: probe related code cleanup Salyzyn, Mark 2007-06-20 15:30 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking (take 2) Salyzyn, Mark 2 siblings, 2 replies; 28+ messages in thread From: Salyzyn, Mark @ 2007-06-07 17:21 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 977 bytes --] A customer running an application that issues SYNCHRONIZE_CACHE calls directly noticed the broad stroke of the current implementation in the aacraid driver: when multiple applications feed I/O to the storage, the issuing application stalls for long periods of time. By waiting only for outstanding WRITE commands, rather than all commands, to complete, and only for those within the range of the SYNCHRONIZE_CACHE call (which associate more tightly with the issuing application), before telling the Firmware to flush its dirty cache, we managed to reduce the stalling. The Firmware itself still flushes all the dirty cache associated with the array, ignoring the range; it just does so in a more timely manner. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. 
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_synch_range.patch --] [-- Type: application/octet-stream, Size: 2510 bytes --] diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c 2007-06-07 12:52:44.951750334 -0400 +++ b/drivers/scsi/aacraid/aachba.c 2007-06-07 13:04:34.564189741 -0400 @@ -1587,7 +1587,7 @@ COMMAND_COMPLETE << 8 | SAM_STAT_GOOD; else { struct scsi_device *sdev = cmd->device; - struct aac_dev *dev = (struct aac_dev *)sdev->host->hostdata; + struct aac_dev *dev = fibptr->dev; u32 cid = sdev_id(sdev); printk(KERN_WARNING "synchronize_callback: synchronize failed, status = %d\n", @@ -1618,6 +1618,9 @@ struct scsi_device *sdev = scsicmd->device; int active = 0; struct aac_dev *aac; + u64 lba = ((u64)scsicmd->cmnd[2] << 24) | (scsicmd->cmnd[3] << 16) | + (scsicmd->cmnd[4] << 8) | scsicmd->cmnd[5]; + u32 count = (scsicmd->cmnd[7] << 8) | scsicmd->cmnd[8]; unsigned long flags; /* @@ -1626,11 +1629,54 @@ */ spin_lock_irqsave(&sdev->list_lock, flags); list_for_each_entry(cmd, &sdev->cmd_list, list) - if (cmd != scsicmd && cmd->SCp.phase == AAC_OWNER_FIRMWARE) { + if (cmd->SCp.phase == AAC_OWNER_FIRMWARE) { + u64 cmnd_lba; + u32 cmnd_count; + + if (cmd->cmnd[0] == WRITE_6) { + cmnd_lba = ((cmd->cmnd[1] & 0x1F) << 16) | + (cmd->cmnd[2] << 8) | + cmd->cmnd[3]; + cmnd_count = cmd->cmnd[4]; + if (cmnd_count == 0) + cmnd_count = 256; + } else if (cmd->cmnd[0] == WRITE_16) { + cmnd_lba = ((u64)cmd->cmnd[2] << 56) | + ((u64)cmd->cmnd[3] << 48) | + ((u64)cmd->cmnd[4] << 40) | + ((u64)cmd->cmnd[5] << 32) | + ((u64)cmd->cmnd[6] << 24) | + (cmd->cmnd[7] << 16) | + (cmd->cmnd[8] << 8) | + cmd->cmnd[9]; + cmnd_count = (cmd->cmnd[10] << 24) | + (cmd->cmnd[11] << 16) | + (cmd->cmnd[12] << 8) | + cmd->cmnd[13]; + } else if (cmd->cmnd[0] == WRITE_12) { + cmnd_lba = ((u64)cmd->cmnd[2] << 24) | + (cmd->cmnd[3] << 16) | + (cmd->cmnd[4] << 8) | + cmd->cmnd[5]; + 
cmnd_count = (cmd->cmnd[6] << 24) | + (cmd->cmnd[7] << 16) | + (cmd->cmnd[8] << 8) | + cmd->cmnd[9]; + } else if (cmd->cmnd[0] == WRITE_10) { + cmnd_lba = ((u64)cmd->cmnd[2] << 24) | + (cmd->cmnd[3] << 16) | + (cmd->cmnd[4] << 8) | + cmd->cmnd[5]; + cmnd_count = (cmd->cmnd[7] << 8) | + cmd->cmnd[8]; + } else + continue; + if (((cmnd_lba + cmnd_count) < lba) || + (count && ((lba + count) < cmnd_lba))) + continue; ++active; break; } - spin_unlock_irqrestore(&sdev->list_lock, flags); /* ^ permalink raw reply [flat|nested] 28+ messages in thread
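The range filter in the patch above can be read as two steps: decode the start LBA and block count from each outstanding WRITE CDB, then skip the command if it lies entirely outside the SYNCHRONIZE_CACHE range. A minimal user-space sketch of that logic follows; the helper names (be16, be32, be64, write_extent, blocks_sync) are hypothetical, and the comparisons deliberately mirror the patch's strict "<" tests, so extents that merely touch the range are still waited for.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCSI opcodes for the WRITE family. */
#define WRITE_6  0x0a
#define WRITE_10 0x2a
#define WRITE_12 0xaa
#define WRITE_16 0x8a

/* Read big-endian fields out of a CDB. */
static uint32_t be16(const uint8_t *p)
{
	return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
}

static uint64_t be64(const uint8_t *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Decode start LBA and block count of a WRITE CDB; false if not a WRITE. */
static bool write_extent(const uint8_t *cdb, uint64_t *lba, uint32_t *count)
{
	switch (cdb[0]) {
	case WRITE_6:
		*lba = ((uint32_t)(cdb[1] & 0x1f) << 16) | be16(cdb + 2);
		*count = cdb[4] ? cdb[4] : 256;	/* 0 means 256 blocks */
		return true;
	case WRITE_10:
		*lba = be32(cdb + 2);
		*count = be16(cdb + 7);
		return true;
	case WRITE_12:
		*lba = be32(cdb + 2);
		*count = be32(cdb + 6);
		return true;
	case WRITE_16:
		*lba = be64(cdb + 2);
		*count = be32(cdb + 10);
		return true;
	default:
		return false;
	}
}

/*
 * Must this outstanding command complete before the flush is issued?
 * A sync_count of 0 is treated as "through the end of the medium".
 */
static bool blocks_sync(const uint8_t *cdb, uint64_t sync_lba,
			uint32_t sync_count)
{
	uint64_t lba;
	uint32_t count;

	if (!write_extent(cdb, &lba, &count))
		return false;		/* non-WRITE commands are ignored */
	if (lba + count < sync_lba)
		return false;		/* write ends before the range */
	if (sync_count && sync_lba + sync_count < lba)
		return false;		/* write starts after the range */
	return true;
}
```

For example, a WRITE(10) of 8 blocks at LBA 100 does not hold up a flush of blocks 0 to 49, but does hold up a flush of 16 blocks starting at LBA 96.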
* [PATCH] aacraid: probe related code cleanup 2007-06-07 17:21 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking Salyzyn, Mark @ 2007-06-11 20:17 ` Salyzyn, Mark 2007-06-20 15:30 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking (take 2) Salyzyn, Mark 1 sibling, 0 replies; 28+ messages in thread From: Salyzyn, Mark @ 2007-06-11 20:17 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 1197 bytes --] Sundry cleanups: 1) Use kzalloc instead of kmalloc. 2) Make sure probe worked before recalling the SCSI command to finalize processing. 3) _aac_probe_container2 and _aac_probe_container1 return value goes unused, change return to void. 4) Use a lower depth pointer reference to pick up the driver instance variable. 5) Although effectively unused except to fake for scsicmd validity, set the scsi_done in probe code to aac_probe_container_callback1 instead of the less valid dummy reference to _aac_probe_container1. 6) SCp.phase is set in aac_valid_context, drop setting up this value in caller when unnecessary. 7) take container target id at the beginning, rather than referencing scmd_id() to pick it up. There should be no side effects or functionality changes. This attached patch is against current scsi-misc-2.6, scsi-rc-fixes-2.6 & scsi-pending-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. 
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> drivers/scsi/aacraid/aachba.c | 64 ++++++++++++++++++++++++++++++--------------------------------- 1 file changed, 31 insertions(+), 33 deletions(-) [-- Attachment #2: aacraid_probe_cleanup.patch --] [-- Type: application/octet-stream, Size: 6423 bytes --] diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c 2007-06-11 15:15:38.908828462 -0400 +++ b/drivers/scsi/aacraid/aachba.c 2007-06-11 15:51:30.352826577 -0400 @@ -312,11 +312,10 @@ if (maximum_num_containers < MAXIMUM_NUM_CONTAINERS) maximum_num_containers = MAXIMUM_NUM_CONTAINERS; - fsa_dev_ptr = kmalloc(sizeof(*fsa_dev_ptr) * maximum_num_containers, + fsa_dev_ptr = kzalloc(sizeof(*fsa_dev_ptr) * maximum_num_containers, GFP_KERNEL); if (!fsa_dev_ptr) return -ENOMEM; - memset(fsa_dev_ptr, 0, sizeof(*fsa_dev_ptr) * maximum_num_containers); dev->fsa_dev = fsa_dev_ptr; dev->maximum_num_containers = maximum_num_containers; @@ -446,7 +445,7 @@ { struct fsa_dev_info *fsa_dev_ptr = ((struct aac_dev *)(scsicmd->device->host->hostdata))->fsa_dev; - if (fsa_dev_ptr[scmd_id(scsicmd)].valid) + if ((fsa_dev_ptr[scmd_id(scsicmd)].valid & 1)) return aac_scsi_cmd(scsicmd); scsicmd->result = DID_NO_CONNECT << 16; @@ -454,18 +453,18 @@ return 0; } -static int _aac_probe_container2(void * context, struct fib * fibptr) +static void _aac_probe_container2(void * context, struct fib * fibptr) { struct fsa_dev_info *fsa_dev_ptr; int (*callback)(struct scsi_cmnd *); struct scsi_cmnd * scsicmd = (struct scsi_cmnd *)context; - if (!aac_valid_context(scsicmd, fibptr)) - return 0; - fsa_dev_ptr = ((struct aac_dev *)(scsicmd->device->host->hostdata))->fsa_dev; + if (!aac_valid_context(scsicmd, fibptr)) + return; scsicmd->SCp.Status = 0; + fsa_dev_ptr = fibptr->dev->fsa_dev; if (fsa_dev_ptr) { struct aac_mount * dresp = (struct aac_mount *) fib_data(fibptr); fsa_dev_ptr += scmd_id(scsicmd); @@ -488,10 +487,11 @@ aac_fib_free(fibptr); 
callback = (int (*)(struct scsi_cmnd *))(scsicmd->SCp.ptr); scsicmd->SCp.ptr = NULL; - return (*callback)(scsicmd); + (*callback)(scsicmd); + return; } -static int _aac_probe_container1(void * context, struct fib * fibptr) +static void _aac_probe_container1(void * context, struct fib * fibptr) { struct scsi_cmnd * scsicmd; struct aac_mount * dresp; @@ -501,13 +501,14 @@ dresp = (struct aac_mount *) fib_data(fibptr); dresp->mnt[0].capacityhigh = 0; if ((le32_to_cpu(dresp->status) != ST_OK) || - (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) - return _aac_probe_container2(context, fibptr); + (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) { + _aac_probe_container2(context, fibptr); + return; + } scsicmd = (struct scsi_cmnd *) context; - scsicmd->SCp.phase = AAC_OWNER_MIDLEVEL; if (!aac_valid_context(scsicmd, fibptr)) - return 0; + return; aac_fib_init(fibptr); @@ -522,21 +523,18 @@ sizeof(struct aac_query_mount), FsaNormal, 0, 1, - (fib_callback) _aac_probe_container2, + _aac_probe_container2, (void *) scsicmd); /* * Check that the command queued to the controller */ - if (status == -EINPROGRESS) { + if (status == -EINPROGRESS) scsicmd->SCp.phase = AAC_OWNER_FIRMWARE; - return 0; - } - if (status < 0) { + else if (status < 0) { /* Inherit results from VM_NameServe, if any */ dresp->status = cpu_to_le32(ST_OK); - return _aac_probe_container2(context, fibptr); + _aac_probe_container2(context, fibptr); } - return 0; } static int _aac_probe_container(struct scsi_cmnd * scsicmd, int (*callback)(struct scsi_cmnd *)) @@ -561,7 +559,7 @@ sizeof(struct aac_query_mount), FsaNormal, 0, 1, - (fib_callback) _aac_probe_container1, + _aac_probe_container1, (void *) scsicmd); /* * Check that the command queued to the controller @@ -615,7 +613,7 @@ return -ENOMEM; } scsicmd->list.next = NULL; - scsicmd->scsi_done = (void (*)(struct scsi_cmnd*))_aac_probe_container1; + scsicmd->scsi_done = (void (*)(struct scsi_cmnd*))aac_probe_container_callback1; scsicmd->device = scsidev; 
scsidev->sdev_state = 0; @@ -1329,7 +1327,7 @@ if (!aac_valid_context(scsicmd, fibptr)) return; - dev = (struct aac_dev *)scsicmd->device->host->hostdata; + dev = fibptr->dev; cid = scmd_id(scsicmd); if (nblank(dprintk(x))) { @@ -1587,7 +1585,7 @@ COMMAND_COMPLETE << 8 | SAM_STAT_GOOD; else { struct scsi_device *sdev = cmd->device; - struct aac_dev *dev = (struct aac_dev *)sdev->host->hostdata; + struct aac_dev *dev = fibptr->dev; u32 cid = sdev_id(sdev); printk(KERN_WARNING "synchronize_callback: synchronize failed, status = %d\n", @@ -1694,7 +1692,7 @@ int aac_scsi_cmd(struct scsi_cmnd * scsicmd) { - u32 cid = 0; + u32 cid; struct Scsi_Host *host = scsicmd->device->host; struct aac_dev *dev = (struct aac_dev *)host->hostdata; struct fsa_dev_info *fsa_dev_ptr = dev->fsa_dev; @@ -1706,15 +1704,15 @@ * Test does not apply to ID 16, the pseudo id for the controller * itself. */ - if (scmd_id(scsicmd) != host->this_id) { - if ((scmd_channel(scsicmd) == CONTAINER_CHANNEL)) { - if((scmd_id(scsicmd) >= dev->maximum_num_containers) || + cid = scmd_id(scsicmd); + if (cid != host->this_id) { + if (scmd_channel(scsicmd) == CONTAINER_CHANNEL) { + if((cid >= dev->maximum_num_containers) || (scsicmd->device->lun != 0)) { scsicmd->result = DID_NO_CONNECT << 16; scsicmd->scsi_done(scsicmd); return 0; } - cid = scmd_id(scsicmd); /* * If the target container doesn't exist, it may have @@ -1777,7 +1775,7 @@ { struct inquiry_data inq_data; - dprintk((KERN_DEBUG "INQUIRY command, ID: %d.\n", scmd_id(scsicmd))); + dprintk((KERN_DEBUG "INQUIRY command, ID: %d.\n", cid)); memset(&inq_data, 0, sizeof (struct inquiry_data)); inq_data.inqd_ver = 2; /* claim compliance to SCSI-2 */ @@ -1789,7 +1787,7 @@ * Set the Vendor, Product, and Revision Level * see: <vendor>.c i.e. 
aac.c */ - if (scmd_id(scsicmd) == host->this_id) { + if (cid == host->this_id) { setinqstr(dev, (void *) (inq_data.inqd_vid), ARRAY_SIZE(container_types)); inq_data.inqd_pdt = INQD_PDT_PROC; /* Processor device */ aac_internal_transfer(scsicmd, &inq_data, 0, sizeof(inq_data)); @@ -2160,10 +2158,10 @@ if (!aac_valid_context(scsicmd, fibptr)) return; - dev = (struct aac_dev *)scsicmd->device->host->hostdata; - BUG_ON(fibptr == NULL); + dev = fibptr->dev; + srbreply = (struct aac_srb_reply *) fib_data(fibptr); scsicmd->sense_buffer[0] = '\0'; /* Initialize sense valid flag to false */ ^ permalink raw reply [flat|nested] 28+ messages in thread
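Item 3 of the cleanup list also explains why the (fib_callback) casts disappear in the diff: once the probe functions return void, their type matches the callback type directly. A small sketch of the idea, assuming a callback typedef of the shape shown; struct fib is left opaque here, and probe_done and send_request are hypothetical stand-ins rather than driver functions.

```c
#include <stddef.h>

struct fib;			/* opaque stand-in for the driver's FIB */

/* Assumed callback shape: nothing is returned to the caller. */
typedef void (*fib_callback)(void *context, struct fib *fibptr);

static int probe_calls;		/* bookkeeping for illustration only */

/* Returning void lets this function match fib_callback exactly. */
static void probe_done(void *context, struct fib *fibptr)
{
	(void)fibptr;
	if (context != NULL)
		++probe_calls;
}

/* Hypothetical stand-in for queuing a request with a completion callback. */
static void send_request(fib_callback cb, void *context)
{
	cb(context, NULL);	/* completes immediately in this sketch */
}
```

Passing probe_done to send_request needs no cast, so the compiler checks the signature instead of having the mismatch hidden.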
* [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking (take 2) 2007-06-07 17:21 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking Salyzyn, Mark 2007-06-11 20:17 ` [PATCH] aacraid: probe related code cleanup Salyzyn, Mark @ 2007-06-20 15:30 ` Salyzyn, Mark 2007-07-09 13:57 ` [PATCH] aacraid: add 51245, 51645 and 52245 adapters to documentation Salyzyn, Mark 1 sibling, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-06-20 15:30 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 1852 bytes --] There was some overlap with another patch (?) this one has not shown in scsi-pending-2.6. Modernized to apply cleanly and did some extra cleanup. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> drivers/scsi/aacraid/aachba.c | 63 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 55 insertions(+), 8 deletions(-) Sincerely -- Mark Salyzyn > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Salyzyn, Mark > Sent: Thursday, June 07, 2007 1:21 PM > To: linux-scsi@vger.kernel.org > Subject: [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking. > > > Customer running an application that issues SYNCHRONIZE_CACHE calls > directly noticed the broad stroke of the current implementation in the > aacraid driver resulting in multiple applications feeding I/O to the > storage causing the issuing application to stall for long periods of > time. By only waiting for the current WRITE commands, rather than all > commands, to complete; and those that are in range of the > SYNCHRONIZE_CACHE call that would associate more tightly with the > issuing application before telling the Firmware to flush it's dirty > cache, we managed to reduce the stalling. 
The Firmware itself still > flushes all the dirty cache associated with the array ignoring the > range, it just does so in a more timely manner. > > This attached patch is against current scsi-misc-2.6 > > ObligatoryDisclaimer: Please accept my condolences regarding Outlook's > handling of patch attachments. > > Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> > > Sincerely -- Mark Salyzyn > [-- Attachment #2: aacraid_synch_range2.patch --] [-- Type: application/octet-stream, Size: 3893 bytes --] diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c 2007-06-20 11:05:47.673609233 -0400 +++ b/drivers/scsi/aacraid/aachba.c 2007-06-20 11:21:33.655053285 -0400 @@ -1595,23 +1595,23 @@ if (!aac_valid_context(cmd, fibptr)) return; - dprintk((KERN_DEBUG "synchronize_callback[cpu %d]: t = %ld.\n", + dprintk((KERN_DEBUG "synchronize_callback[cpu %d]: t = %ld.\n", smp_processor_id(), jiffies)); BUG_ON(fibptr == NULL); synchronizereply = fib_data(fibptr); if (le32_to_cpu(synchronizereply->status) == CT_OK) - cmd->result = DID_OK << 16 | + cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_GOOD; else { struct scsi_device *sdev = cmd->device; struct aac_dev *dev = fibptr->dev; u32 cid = sdev_id(sdev); - printk(KERN_WARNING + printk(KERN_WARNING "synchronize_callback: synchronize failed, status = %d\n", le32_to_cpu(synchronizereply->status)); - cmd->result = DID_OK << 16 | + cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_CHECK_CONDITION; set_sense((u8 *)&dev->fsa_dev[cid].sense_data, HARDWARE_ERROR, @@ -1619,7 +1619,7 @@ ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0, 0, 0); memcpy(cmd->sense_buffer, &dev->fsa_dev[cid].sense_data, - min(sizeof(dev->fsa_dev[cid].sense_data), + min(sizeof(dev->fsa_dev[cid].sense_data), sizeof(cmd->sense_buffer))); } @@ -1637,6 +1637,9 @@ struct scsi_device *sdev = scsicmd->device; int active = 0; struct aac_dev *aac; + u64 lba = ((u64)scsicmd->cmnd[2] << 24) | 
(scsicmd->cmnd[3] << 16) | + (scsicmd->cmnd[4] << 8) | scsicmd->cmnd[5]; + u32 count = (scsicmd->cmnd[7] << 8) | scsicmd->cmnd[8]; unsigned long flags; /* @@ -1645,7 +1648,51 @@ */ spin_lock_irqsave(&sdev->list_lock, flags); list_for_each_entry(cmd, &sdev->cmd_list, list) - if (cmd != scsicmd && cmd->SCp.phase == AAC_OWNER_FIRMWARE) { + if (cmd->SCp.phase == AAC_OWNER_FIRMWARE) { + u64 cmnd_lba; + u32 cmnd_count; + + if (cmd->cmnd[0] == WRITE_6) { + cmnd_lba = ((cmd->cmnd[1] & 0x1F) << 16) | + (cmd->cmnd[2] << 8) | + cmd->cmnd[3]; + cmnd_count = cmd->cmnd[4]; + if (cmnd_count == 0) + cmnd_count = 256; + } else if (cmd->cmnd[0] == WRITE_16) { + cmnd_lba = ((u64)cmd->cmnd[2] << 56) | + ((u64)cmd->cmnd[3] << 48) | + ((u64)cmd->cmnd[4] << 40) | + ((u64)cmd->cmnd[5] << 32) | + ((u64)cmd->cmnd[6] << 24) | + (cmd->cmnd[7] << 16) | + (cmd->cmnd[8] << 8) | + cmd->cmnd[9]; + cmnd_count = (cmd->cmnd[10] << 24) | + (cmd->cmnd[11] << 16) | + (cmd->cmnd[12] << 8) | + cmd->cmnd[13]; + } else if (cmd->cmnd[0] == WRITE_12) { + cmnd_lba = ((u64)cmd->cmnd[2] << 24) | + (cmd->cmnd[3] << 16) | + (cmd->cmnd[4] << 8) | + cmd->cmnd[5]; + cmnd_count = (cmd->cmnd[6] << 24) | + (cmd->cmnd[7] << 16) | + (cmd->cmnd[8] << 8) | + cmd->cmnd[9]; + } else if (cmd->cmnd[0] == WRITE_10) { + cmnd_lba = ((u64)cmd->cmnd[2] << 24) | + (cmd->cmnd[3] << 16) | + (cmd->cmnd[4] << 8) | + cmd->cmnd[5]; + cmnd_count = (cmd->cmnd[7] << 8) | + cmd->cmnd[8]; + } else + continue; + if (((cmnd_lba + cmnd_count) < lba) || + (count && ((lba + count) < cmnd_lba))) + continue; ++active; break; } @@ -1674,7 +1721,7 @@ synchronizecmd->command = cpu_to_le32(VM_ContainerConfig); synchronizecmd->type = cpu_to_le32(CT_FLUSH_CACHE); synchronizecmd->cid = cpu_to_le32(scmd_id(scsicmd)); - synchronizecmd->count = + synchronizecmd->count = cpu_to_le32(sizeof(((struct aac_synchronize_reply *)NULL)->data)); /* @@ -1696,7 +1743,7 @@ return 0; } - printk(KERN_WARNING + printk(KERN_WARNING "aac_synchronize: aac_fib_send failed with 
status: %d.\n", status); aac_fib_complete(cmd_fibcontext); aac_fib_free(cmd_fibcontext); ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] aacraid: add 51245, 51645 and 52245 adapters to documentation. 2007-06-20 15:30 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking (take 2) Salyzyn, Mark @ 2007-07-09 13:57 ` Salyzyn, Mark 2007-07-23 14:13 ` [PATCH] aacraid: sysfs adapter reset/status format change Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-07-09 13:57 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 458 bytes --] Adding Adaptec 51245 (16 port), 51645 (20 port) and 52445 (28 port) Universal Serial RAID controllers to the aacraid documentation. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> Documentation/scsi/aacraid.txt | 3 +++ 1 file changed, 3 insertions(+) Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_voodoo244.patch --] [-- Type: application/octet-stream, Size: 673 bytes --] diff -ru a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt --- a/Documentation/scsi/aacraid.txt 2007-07-09 09:38:47.319012381 -0400 +++ b/Documentation/scsi/aacraid.txt 2007-07-09 09:47:03.383207866 -0400 @@ -50,6 +50,9 @@ 9005:0285:9005:02be Adaptec 31605 (Marauder160) 9005:0285:9005:02c3 Adaptec 51205 (Voodoo120) 9005:0285:9005:02c4 Adaptec 51605 (Voodoo160) + 9005:0285:9005:02ce Adaptec 51245 (Voodoo124) + 9005:0285:9005:02cf Adaptec 51645 (Voodoo164) + 9005:0285:9005:02d0 Adaptec 52445 (Voodoo244) 1011:0046:9005:0364 Adaptec 5400S (Mustang) 9005:0287:9005:0800 Adaptec Themisto (Jupiter) 9005:0200:9005:0200 Adaptec Themisto (Jupiter) ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] aacraid: sysfs adapter reset/status format change. 2007-07-09 13:57 ` [PATCH] aacraid: add 51245, 51645 and 52245 adapters to documentation Salyzyn, Mark @ 2007-07-23 14:13 ` Salyzyn, Mark 2007-07-26 18:20 ` [PATCH 1/1] aacraid: draw line in sand, sundry cleanup and version update Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-07-23 14:13 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 483 bytes --] Responses from nodes within the sysfs tree need to be newline terminated, so the Adapter status value reported by the reset adapter node is adjusted accordingly. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> drivers/scsi/aacraid/linit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_adapter_status_format_change.patch --] [-- Type: application/octet-stream, Size: 439 bytes --] diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c --- a/drivers/scsi/aacraid/linit.c 2007-07-23 09:53:06.852929239 -0400 +++ b/drivers/scsi/aacraid/linit.c 2007-07-23 10:08:10.347390939 -0400 @@ -822,7 +822,7 @@ tmp = aac_adapter_check_health(dev); if ((tmp == 0) && dev->in_reset) tmp = -EBUSY; - len = snprintf(buf, PAGE_SIZE, "0x%x", tmp); + len = snprintf(buf, PAGE_SIZE, "0x%x\n", tmp); return len; } ^ permalink raw reply [flat|nested] 28+ messages in thread
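The effect of the one-character change can be seen in a stand-alone sketch. The helper name format_status is hypothetical; only the format string is taken from the patch, and the cast to unsigned int is added here so that %x receives the type it expects.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a status value the way the patched show
 * routine does, with a trailing newline. Returns snprintf's result. */
static int format_status(char *buf, size_t size, int status)
{
	return snprintf(buf, size, "0x%x\n", (unsigned int)status);
}
```

A healthy adapter (status 0) therefore reads back as the four characters "0x0" plus a newline, which keeps shell tools such as cat from running the value into the next prompt.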
* [PATCH 1/1] aacraid: draw line in sand, sundry cleanup and version update 2007-07-23 14:13 ` [PATCH] aacraid: sysfs adapter reset/status format change Salyzyn, Mark @ 2007-07-26 18:20 ` Salyzyn, Mark 2007-07-27 14:29 ` [PATCH 1/1] aacraid: fix Sunrise Lake reset handling Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-07-26 18:20 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 741 bytes --] Minor unimportant cuttings from the floor bundled in with a version stamp update. Only controversial change is the dropping of Alan Cox copyright on the nark.c module since that file has no code written by him in it. This attached patch is against current scsi-misc-2.6 ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> drivers/scsi/aacraid/aachba.c | 3 +-- drivers/scsi/aacraid/aacraid.h | 6 +++--- drivers/scsi/aacraid/linit.c | 3 +-- drivers/scsi/aacraid/nark.c | 3 +-- drivers/scsi/aacraid/rkt.c | 2 +- 5 files changed, 7 insertions(+), 10 deletions(-) Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_cleanup_2449.patch --] [-- Type: application/octet-stream, Size: 3267 bytes --] diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c 2007-07-26 13:28:32.179279220 -0400 +++ b/drivers/scsi/aacraid/aachba.c 2007-07-26 14:11:31.762916390 -0400 @@ -194,8 +194,7 @@ struct scsi_device *device; if (unlikely(!scsicmd || !scsicmd->scsi_done )) { - dprintk((KERN_WARNING "aac_valid_context: scsi command corrupt\n")) -; + dprintk((KERN_WARNING "aac_valid_context: scsi command corrupt\n")); aac_fib_complete(fibptr); aac_fib_free(fibptr); return 0; diff -ru a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h --- a/drivers/scsi/aacraid/aacraid.h 2007-07-26 13:28:32.180279094 -0400 +++ b/drivers/scsi/aacraid/aacraid.h 2007-07-26 14:11:31.770915383 -0400 @@ 
-12,7 +12,7 @@ *----------------------------------------------------------------------------*/ #ifndef AAC_DRIVER_BUILD -# define AAC_DRIVER_BUILD 2447 +# define AAC_DRIVER_BUILD 2449 # define AAC_DRIVER_BRANCH "-ms" #endif #define MAXIMUM_NUM_CONTAINERS 32 @@ -1807,10 +1807,10 @@ * accounting for the fact capacity could be a 64 bit value * */ -static inline u32 cap_to_cyls(sector_t capacity, u32 divisor) +static inline unsigned int cap_to_cyls(sector_t capacity, unsigned divisor) { sector_div(capacity, divisor); - return (u32)capacity; + return capacity; } /* SCp.phase values */ diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c --- a/drivers/scsi/aacraid/linit.c 2007-07-26 13:28:32.183278715 -0400 +++ b/drivers/scsi/aacraid/linit.c 2007-07-26 14:11:31.772915132 -0400 @@ -1122,9 +1122,8 @@ static void aac_shutdown(struct pci_dev *dev) { struct Scsi_Host *shost = pci_get_drvdata(dev); - struct aac_dev *aac = (struct aac_dev *)shost->hostdata; scsi_block_requests(shost); - __aac_shutdown(aac); + __aac_shutdown((struct aac_dev *)shost->hostdata); } static void __devexit aac_remove_one(struct pci_dev *pdev) diff -ru a/drivers/scsi/aacraid/nark.c b/drivers/scsi/aacraid/nark.c --- a/drivers/scsi/aacraid/nark.c 2007-07-26 13:28:32.184278589 -0400 +++ b/drivers/scsi/aacraid/nark.c 2007-07-26 14:11:31.772915132 -0400 @@ -1,11 +1,10 @@ /* * Adaptec AAC series RAID controller driver - * (c) Copyright 2001 Red Hat Inc. <alan@redhat.com> * * based on the old aacraid driver that is.. * Adaptec aacraid device driver for Linux. * - * Copyright (c) 2000 Adaptec, Inc. (aacraid@adaptec.com) + * Copyright (c) 2006-2007 Adaptec, Inc. 
(aacraid@adaptec.com) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by diff -ru a/drivers/scsi/aacraid/rkt.c b/drivers/scsi/aacraid/rkt.c --- a/drivers/scsi/aacraid/rkt.c 2007-07-26 13:28:32.184278589 -0400 +++ b/drivers/scsi/aacraid/rkt.c 2007-07-26 14:11:31.780914125 -0400 @@ -5,7 +5,7 @@ * based on the old aacraid driver that is.. * Adaptec aacraid device driver for Linux. * - * Copyright (c) 2000 Adaptec, Inc. (aacraid@adaptec.com) + * Copyright (c) 2000-2007 Adaptec, Inc. (aacraid@adaptec.com) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 1/1] aacraid: fix Sunrise Lake reset handling 2007-07-26 18:20 ` [PATCH 1/1] aacraid: draw line in sand, sundry cleanup and version update Salyzyn, Mark @ 2007-07-27 14:29 ` Salyzyn, Mark 2007-08-02 19:38 ` [PATCH 1/1] aacraid: prevent panic on adapter resource failure Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-07-27 14:29 UTC (permalink / raw) To: linux-scsi, Linux Kernel Mailing List Cc: Yinghai Lu, Vivek Goyal, Eric W. Biederman [-- Attachment #1: Type: text/plain, Size: 3750 bytes --] The patch is *much* smaller than the description. I am attempting to answer to those that want to understand an issue that was reported in May this year. If a Sunrise Lake based card that requires an alternate reset mechanism is set up to ignore the commanded IOP_RESET it reports 0x00000010 (IOP_RESET ignored) instead of 0x3803000F (use alternate reset mechanism to reset all cores), and thus the reset platform function decides to switch to IOP_RESET_ALWAYS because the reset platform function parameters indicate that we *need* to reset the card. IOP_RESET_ALWAYS then responds with the 0x3803000F return code, but alas we treat this as an error instead of using the alternate reset mechanism (put a 0x03 into the register offset 0x38). The reset fails, but the fact that the IOP_RESET_ALWAYS command was issued has put the card in a purposeful shutdown state in preparation for the alternate hardware reset to be applied. Yuck. IOP_RESET is ignored in internal production cards, typically to ensure that we catch all adapter lockup issues without the driver progressing further, so this would not appear to be a field issue and thus this patch was destined to be only in the internal Adaptec source tree. IOP_RESET_ALWAYS is reserved for kexec/kdump/FirmwareUpdate/AutomatedTestFrames so we did not function as expected in any case. 
Also, in the past we have had OEMs specifically request that cards not be resettable after a BlinkLED/FirmwareAssert for one reason or another. To head off the possibility that the Sunrise Lake based cards would suffer a similar fate, we propose the enclosed fix.

Yinghai Lu of SUN had a pre-production card with IOP_RESET disabled when he reported an issue to the Linux kernel list back in May regarding a kexec problem resulting from this reset being ignored. His fix was to update the Firmware to one that did not ignore the IOP_RESET. Previous kernels did not attempt to reset the adapter, and that is why it surfaced as a regression in his hands.

The current list of aacraid based cards that use Sunrise Lake:

9005:0285:9005:02b5 Adaptec 5445
9005:0285:9005:02b6 Adaptec 5805
9005:0285:9005:02b7 Adaptec 5085
9005:0285:9005:02c3 Adaptec 51205
9005:0285:9005:02c4 Adaptec 51605
9005:0285:9005:02ce Adaptec 51245
9005:0285:9005:02cf Adaptec 51645
9005:0285:9005:02d0 Adaptec 52445
9005:0285:9005:02d1 Adaptec 5405
9005:0285:9005:02b8 ICP ICP5445SL
9005:0285:9005:02b9 ICP ICP5085SL
9005:0285:9005:02ba ICP ICP5805SL
9005:0285:9005:02c5 ICP ICP5125SL
9005:0285:9005:02c6 ICP ICP5165SL
9005:0285:108e:7aac SUN STK RAID REM
9005:0285:108e:0286 SUN STK RAID INT
9005:0285:108e:0287 SUN STK RAID EXT
9005:0285:108e:7aae SUN STK RAID EM

All of these are publicly released with IOP_RESET enabled, so there is no immediate need for this patch.

This attached patch is against July 11 2007 scsi-misc-2.6 and still applies today.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

/drivers/scsi/aacraid/rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Sincerely -- Mark Salyzyn

-----Original Message----- From: Yinghai Lu [mailto:yhlu.kernel@gmail.com] Sent: Tuesday, May 29, 2007 10:00 PM To: Andrew Morton; Vivek Goyal; Eric W.
Biederman; AACRAID Cc: Linux Kernel Mailing List Subject: kexec and aacraid broken latest tree, can not use kexec to load 2.6.22-rc3 at least. got: AAC0: adapter kernel panic'd fffffffd AAC0: adapter kernel failed to start, init status=0 but can load 2.6.21.3 YH [-- Attachment #2: aacraid_voodoo_reset.patch --] [-- Type: application/octet-stream, Size: 507 bytes --] diff -ru a//drivers/scsi/aacraid/rx.c b//drivers/scsi/aacraid/rx.c --- a//drivers/scsi/aacraid/rx.c 2007-07-11 11:26:25.091066761 -0400 +++ b//drivers/scsi/aacraid/rx.c 2007-07-11 11:28:31.961859496 -0400 @@ -472,7 +472,7 @@ else { bled = aac_adapter_sync_cmd(dev, IOP_RESET_ALWAYS, 0, 0, 0, 0, 0, 0, &var, NULL, NULL, NULL, NULL); - if (!bled && (var != 0x00000001)) + if (!bled && (var != 0x00000001) && (var != 0x3803000F)) bled = -EINVAL; } if (bled && (bled != -ETIMEDOUT)) ^ permalink raw reply [flat|nested] 28+ messages in thread
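The reply handling discussed in this message can be sketched as a small classifier. The numeric codes are the ones quoted above and in the diff; the macro names, the helper name check_reset_always_reply, and the reading of 0x00000001 as "accepted" (the existing code simply does not treat it as an error) are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Reply codes quoted in the message above (names are hypothetical). */
#define IOP_REPLY_OK            0x00000001u /* not treated as an error */
#define IOP_REPLY_IGNORED       0x00000010u /* IOP_RESET ignored */
#define IOP_REPLY_USE_ALTERNATE 0x3803000Fu /* use alternate reset mechanism */

/*
 * Hypothetical helper mirroring the patched test in rx.c: given the
 * status of the sync command (bled) and the reply word (var) returned
 * for IOP_RESET_ALWAYS, return 0 when the reset path may continue.
 */
static int check_reset_always_reply(int bled, uint32_t var)
{
	if (bled)
		return bled;		/* the command itself failed */
	if (var != IOP_REPLY_OK && var != IOP_REPLY_USE_ALTERNATE)
		return -EINVAL;		/* unexpected reply */
	return 0;
}
```

Before the patch, the 0x3803000F reply fell into the -EINVAL branch, which is the failure described in the first paragraph of the message.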
* [PATCH 1/1] aacraid: prevent panic on adapter resource failure 2007-07-27 14:29 ` [PATCH 1/1] aacraid: fix Sunrise Lake reset handling Salyzyn, Mark @ 2007-08-02 19:38 ` Salyzyn, Mark 2007-08-07 19:36 ` [PATCH 1/1] aacraid: default timeout for arrays too short Salyzyn, Mark 0 siblings, 1 reply; 28+ messages in thread From: Salyzyn, Mark @ 2007-08-02 19:38 UTC (permalink / raw) To: linux-scsi [-- Attachment #1: Type: text/plain, Size: 652 bytes --] If the driver fails to allocate the contiguous (DMAable) memory for system reasons, we fail to load the instance, but the cleanup code then tries to free the NULL allocation and we get a panic in pci_free_consistent(). This was reported against an older kernel; I hope it is still relevant for the latest one. This attached patch is against current scsi-misc-2.6. ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Signed-off-by: Mark Salyzyn <aacraid@adaptec.com> drivers/scsi/aacraid/linit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Sincerely -- Mark Salyzyn [-- Attachment #2: aacraid_fail_to_load_panic.patch --] [-- Type: application/octet-stream, Size: 561 bytes --] diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c --- a/drivers/scsi/aacraid/linit.c 2007-08-02 15:30:35.489215671 -0400 +++ b/drivers/scsi/aacraid/linit.c 2007-08-02 15:30:41.567415315 -0400 @@ -1110,7 +1110,9 @@ __aac_shutdown(aac); out_unmap: aac_fib_map_free(aac); - pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr, aac->comm_phys); + if (aac->comm_addr) + pci_free_consistent(aac->pdev, aac->comm_size, aac->comm_addr, + aac->comm_phys); kfree(aac->queues); aac_adapter_ioremap(aac, 0); kfree(aac->fibs); ^ permalink raw reply [flat|nested] 28+ messages in thread
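The pattern behind this fix, releasing a resource on the error path only if it was actually obtained, can be shown in a user-space sketch. Everything here is a stand-in: struct adapter, release_comm_area and cleanup are hypothetical names, and free() merely plays the role of a release routine that, by assumption, must not be handed an unallocated area.

```c
#include <stdlib.h>

/* Minimal stand-in for the adapter state touched by the cleanup path. */
struct adapter {
	void *comm_addr;	/* NULL when the area was never allocated */
	int   frees;		/* counts release calls, for illustration */
};

/* Stand-in for a release routine that must not be handed NULL. */
static void release_comm_area(struct adapter *a)
{
	free(a->comm_addr);
	a->comm_addr = NULL;
	a->frees++;
}

/* Error-path cleanup: release the area only if it was allocated. */
static void cleanup(struct adapter *a)
{
	if (a->comm_addr)
		release_comm_area(a);
}
```

The guard makes the cleanup path safe to reach from any failure point, including the one where the allocation itself was the step that failed.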
* [PATCH 1/1] aacraid: default timeout for arrays too short
From: Salyzyn, Mark @ 2007-08-07 19:36 UTC
To: linux-scsi

The default SCSI timeout is 30 seconds for a logical device. The
aacraid based controllers currently have a 35 second timeout for the
array. We are bumping up the default SCSI timeout for array devices,
which typically manage many physical disks, to 45 seconds to provide a
small margin to permit the controller to do what it is designed for.

We have not observed any bad side-effects either way because no
significant actions are taken by the aacraid timeout handler except to
take advantage of the quiesced state to allow completion of all
outstanding commands in the controller to provide a poor man's
guarantee of delivery. This is merely a preferential decision to reduce
the number of timeout reports in the system logs to only the more
serious conditions.

This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patch attachments.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

 drivers/scsi/aacraid/linit.c |    6 ++++++
 1 file changed, 6 insertions(+)

Sincerely -- Mark Salyzyn

[-- Attachment #2: aacraid_array_timeout_too_short.patch --]
[-- Type: application/octet-stream, Size: 607 bytes --]

diff -ru a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
--- a/drivers/scsi/aacraid/linit.c	2007-08-07 14:50:42.087439732 -0400
+++ b/drivers/scsi/aacraid/linit.c	2007-08-07 14:55:54.973530300 -0400
@@ -420,6 +420,12 @@
 		unsigned num_one = 0;
 		unsigned depth;
 
+		/*
+		 * Firmware has an individual device recovery time typically
+		 * of 35 seconds, give us a margin.
+		 */
+		if (sdev->timeout < (45 * HZ))
+			sdev->timeout = 45 * HZ;
 		__shost_for_each_device(dev, host) {
 			if (dev->tagged_supported && (dev->type == TYPE_DISK) &&
 				(sdev_channel(dev) == CONTAINER_CHANNEL))
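Editorial sketch: the patch only ever raises the per-device timeout to 45 seconds; a larger value set by an administrator is left alone. The userspace model below shows that floor. SKETCH_HZ is an assumed tick rate chosen for illustration (the kernel's HZ depends on configuration), and array_timeout is a hypothetical helper, not a driver function.

```c
/* Assumed tick rate, for illustration only. */
#define SKETCH_HZ          250UL
/* Floor from the patch: 45 seconds, above the firmware's typical 35. */
#define ARRAY_TIMEOUT_SECS 45UL

/*
 * Hypothetical helper: given a timeout in ticks, return a value that is
 * at least 45 seconds' worth of ticks. Larger timeouts pass through.
 */
unsigned long array_timeout(unsigned long timeout)
{
	if (timeout < ARRAY_TIMEOUT_SECS * SKETCH_HZ)
		timeout = ARRAY_TIMEOUT_SECS * SKETCH_HZ;
	return timeout;
}
```

With these assumptions the 30 second default becomes 45 seconds, while a 60 second timeout is returned unchanged.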
* [PATCH 1/1] aacraid: Add documentation for new Adaptec, SMC and SUN cards
From: Salyzyn, Mark @ 2007-09-04 16:55 UTC
To: linux-scsi

Add the SMC LP, SUN EM and Adaptec 5405 cards to the aacraid
documentation list of supported products. These cards are picked up
with family match, so no associated code changes.

This attached patch is against current scsi-misc-2.6.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patch attachments.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

 Documentation/scsi/aacraid.txt |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Sincerely -- Mark Salyzyn

[-- Attachment #2: aacraid_SMC_SUN.patch --]
[-- Type: application/octet-stream, Size: 1607 bytes --]

diff -ru a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt
--- a/Documentation/scsi/aacraid.txt	2007-09-04 12:40:17.761048273 -0400
+++ b/Documentation/scsi/aacraid.txt	2007-09-04 12:50:05.785727810 -0400
@@ -38,10 +38,8 @@
 	9005:0286:9005:02ac	Adaptec 1800 (Typhoon44)
 	9005:0285:9005:02b5	Adaptec 5445 (Voodoo44)
 	9005:0285:15d9:02b5	SMC AOC-USAS-S4i
-	9005:0285:15d9:02c9	SMC AOC-USAS-S4iR
 	9005:0285:9005:02b6	Adaptec 5805 (Voodoo80)
 	9005:0285:15d9:02b6	SMC AOC-USAS-S8i
-	9005:0285:15d9:02ca	SMC AOC-USAS-S8iR
 	9005:0285:9005:02b7	Adaptec 5085 (Voodoo08)
 	9005:0285:9005:02bb	Adaptec 3405 (Marauder40LP)
 	9005:0285:9005:02bc	Adaptec 3805 (Marauder80LP)
@@ -50,9 +48,14 @@
 	9005:0285:9005:02be	Adaptec 31605 (Marauder160)
 	9005:0285:9005:02c3	Adaptec 51205 (Voodoo120)
 	9005:0285:9005:02c4	Adaptec 51605 (Voodoo160)
+	9005:0285:15d9:02c9	SMC AOC-USAS-S4iR
+	9005:0285:15d9:02ca	SMC AOC-USAS-S8iR
 	9005:0285:9005:02ce	Adaptec 51245 (Voodoo124)
 	9005:0285:9005:02cf	Adaptec 51645 (Voodoo164)
 	9005:0285:9005:02d0	Adaptec 52445 (Voodoo244)
+	9005:0285:9005:02d1	Adaptec 5405 (Voodoo40)
+	9005:0285:15d9:02d2	SMC AOC-USAS-S8i-LP
+	9005:0285:15d9:02d3	SMC AOC-USAS-S8iR-LP
 	1011:0046:9005:0364	Adaptec 5400S (Mustang)
 	9005:0287:9005:0800	Adaptec Themisto (Jupiter)
 	9005:0200:9005:0200	Adaptec Themisto (Jupiter)
@@ -103,6 +106,7 @@
 	9005:0285:108e:7aac	SUN STK RAID REM (Voodoo44 Coyote)
 	9005:0285:108e:0286	SUN STK RAID INT (Cougar)
 	9005:0285:108e:0287	SUN STK RAID EXT (Prometheus)
+	9005:0285:108e:7aae	SUN STK RAID EM (Narvi)
 
 People
 -------------------------
* Re: kexec and aacraid broken
From: Yinghai Lu @ 2007-05-30 21:19 UTC
To: Salyzyn, Mark
Cc: vgoyal, Andrew Morton, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> Vivek Goyal [mailto:vgoyal@in.ibm.com] writes:
> > So most likely if we start disabling the interrupts
> > in .shutdown routine we might skip resetting adapter
> > on every kexec without any side effects?
>
> Not that simple. The .shutdown would need to perform more resource
> cleanups of the .remove call to prevent side effects. I need to move
> some of the .remove activity into the .shutdown handler to make sure the
> adapter is quiesced.
>
> I will hold off on submitting any of these changes until they are
> evaluated and tested; I am waiting for feedback from Yinghai on the
> other mitigations that I feel are closer to the root cause.
>

1. [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers
2. [SCSI] aacraid: kexec fix (reset interrupt handler)
3. aacraid_commit_reset.patch
4. [PATCH] aacraid: fix shutdown handler to also disable interrupts

A kernel with patch 4 applied, even without patches 1, 2 and 3, can
kexec another kernel, whether or not that kernel has patches 1, 2 and 3.

YH
* Re: kexec and aacraid broken
From: Yinghai Lu @ 2007-05-30 21:22 UTC
To: Salyzyn, Mark
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> I believe this issue is a result of the aacraid_commit_reset patch (as
> posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
> to the 2.6.22-rc3 tree.
>
> This is the adapter taking longer than 3 minutes to start after a reset.
> I seriously doubt either of these patches suggested below will have an
> effect. And if they do, they are not root cause, one reduces the chances
> that the card will be reset during initialization (thus applied would
> likely mitigate this problem), the other prevents a panic when the
> Adapter is reset (removed, would result in dogs and cats sleeping with
> each other).
>
> Please use kernel parameter aacraid.startup_timeout=540 (merely larger
> than the default 180 seconds) when spawning the kexec or see if the
> aacraid_commit_reset.patch resolves the issue to confirm my hunch.
>

aacraid_commit_reset.patch is in the mainline already.

YH
* RE: kexec and aacraid broken
From: Salyzyn, Mark @ 2007-05-30 21:49 UTC
To: Yinghai Lu
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

Yinghai Lu [mailto:yhlu.kernel@gmail.com] writes:
> aacraid_commit_reset.patch is in the mainline already.

But aacraid_commit_reset.patch is not in 2.6.22-rc3 (against which you
report the issue). Does the aacraid_commit_reset.patch work to resolve
this issue all by itself in the kexec'd kernel? Or alternatively did you
try aacraid.startup_timeout=540 as one of the kernel parameters passed
to the kexec'd kernel?

The '[PATCH] aacraid: fix shutdown handler to also disable interrupts'
patch (you refer to this as patch 4) is not to be in the picture because
it will hide the root cause. I believe I understand you correctly as
stating that this patch (4) resolves the problem... but I expect the
problem to remain with kdump.

Sincerely -- Mark Salyzyn
* Re: kexec and aacraid broken
From: Yinghai Lu @ 2007-05-30 22:11 UTC
To: Salyzyn, Mark
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

On 5/30/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> Yinghai Lu [mailto:yhlu.kernel@gmail.com] writes:
> > aacraid_commit_reset.patch is in the mainline already.
>
> But aacraid_commit_reset.patch is not in 2.6.22-rc3 (to which you report
> the issue). Does the aacraid_commit_reset.patch work to resolve this
> issue all by itself in the kexec'd kernel? Or alternatively did you try
> aacraid.startup_timeout=540 as one of the kernel parameters passed to
> the kexec'd kernel?

No, I still get the adapter kernel panic.

>
> The '[PATCH] aacraid: fix shutdown handler to also disable interrupts'
> patch (you refer to this as patch 4) is not to be in the picture because
> it will hide the root cause. I believe I have you correct in stating
> that this patch (4) resolves the problem... but I expect the problem to
> remain with kdump.

Oh. Without patch (4), the latest kernel can still kexec to 2.6.21.3.
I will try to load 2.6.22-rc1 etc.

YH
* RE: kexec and aacraid broken
From: Salyzyn, Mark @ 2007-05-31 12:37 UTC
To: Yinghai Lu
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

> No, still get adapter kernel panic

Which adapter are you using?

Sincerely -- Mark Salyzyn
* Re: kexec and aacraid broken
From: Yinghai Lu @ 2007-05-31 19:59 UTC
To: Salyzyn, Mark
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

SUN Cougar with 11731

YH

On 5/31/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> > No, still get adapter kernel panic
>
> Which adapter are you using?
>
> Sincerely -- Mark Salyzyn
>
* RE: kexec and aacraid broken
From: Salyzyn, Mark @ 2007-05-31 20:45 UTC
To: Yinghai Lu
Cc: Andrew Morton, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List, linux-scsi, Michal Piotrowski

Ahhhh, that explains why I am having trouble duping this issue thus far.

This is prerelease Firmware on a yet to be released card and thus should
not get any driver workarounds if this issue can be resolved in
Firmware. If this can be duped on a released card with released
Firmware, then the story changes of course; but that still does not
preclude a Firmware/Hardware/Drive Compatibility bug ;-} .

Until then, please work this issue via SUN channels so that we get all
the necessary card debug information for our teams to work this. I will
ensure Adaptec will remain on top of this issue since it is clearly a
problem with the Adapter Hardware interfacing. The adapter is not
surviving an IOP_RESET and is either going into an Adapter Firmware
Kernel Panic or taking an excessively long period of time (more than
540 seconds in the testing thus far) to complete its reset.

Sincerely -- Mark Salyzyn

Yinghai Lu [mailto:yhlu.kernel@gmail.com] sez:
> SUN coguar with 11731
>
> On 5/31/07, Salyzyn, Mark <mark_salyzyn@adaptec.com> wrote:
> > > No, still get adapter kernel panic
> >
> > Which adapter are you using?
Thread overview: 28+ messages
[not found] <86802c440705291859y39a4ca27uf5ddb84810f33510@mail.gmail.com>
2007-05-30 2:13 ` kexec and aacraid broken Andrew Morton
2007-05-30 11:44 ` Salyzyn, Mark
2007-05-30 13:24 ` Vivek Goyal
2007-05-30 13:57 ` Salyzyn, Mark
2007-05-30 14:17 ` Vivek Goyal
2007-05-30 14:30 ` Salyzyn, Mark
2007-05-30 15:59 ` [PATCH] aacraid: fix shutdown handler to also disable interrupts Salyzyn, Mark
2007-05-30 17:36 ` Yinghai Lu
2007-06-01 11:08 ` Vivek Goyal
2007-06-01 17:07 ` Yinghai Lu
2007-06-01 17:34 ` Salyzyn, Mark
2007-06-07 17:21 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking Salyzyn, Mark
2007-06-11 20:17 ` [PATCH] aacraid: probe related code cleanup Salyzyn, Mark
2007-06-20 15:30 ` [PATCH] aacraid: add SCSI SYNCHONIZE_CACHE range checking (take 2) Salyzyn, Mark
2007-07-09 13:57 ` [PATCH] aacraid: add 51245, 51645 and 52245 adapters to documentation Salyzyn, Mark
2007-07-23 14:13 ` [PATCH] aacraid: sysfs adapter reset/status format change Salyzyn, Mark
2007-07-26 18:20 ` [PATCH 1/1] aacraid: draw line in sand, sundry cleanup and version update Salyzyn, Mark
2007-07-27 14:29 ` [PATCH 1/1] aacraid: fix Sunrise Lake reset handling Salyzyn, Mark
2007-08-02 19:38 ` [PATCH 1/1] aacraid: prevent panic on adapter resource failure Salyzyn, Mark
2007-08-07 19:36 ` [PATCH 1/1] aacraid: default timeout for arrays too short Salyzyn, Mark
2007-09-04 16:55 ` [PATCH 1/1] aacraid: Add documentation for new Adaptec, SMC and SUN cards Salyzyn, Mark
2007-05-30 21:19 ` kexec and aacraid broken Yinghai Lu
2007-05-30 21:22 ` Yinghai Lu
2007-05-30 21:49 ` Salyzyn, Mark
2007-05-30 22:11 ` Yinghai Lu
2007-05-31 12:37 ` Salyzyn, Mark
2007-05-31 19:59 ` Yinghai Lu
2007-05-31 20:45 ` Salyzyn, Mark