* [PATCH] aacraid: [Fastboot] Panics for AACRAID driverduring'insmod' for kexec test [take 4]
From: Salyzyn, Mark @ 2007-04-03 15:58 UTC
To: James Bottomley; +Cc: Judith Lebzelter, vgoyal, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3026 bytes --]

I will do you one better, James: I will slip in a little cleanup in sa.c
(support file for the old PPC based ARC cards), where I discovered the
restart platform function was ALSO left unset, which could result in
similar pain of null pointer discovery.

Please note: the issue Judith ran into, where the card took longer than
3 minutes to initialize because of a problem drive, may require extending
the timeout (the insmod parameter aacraid.startup_timeout=540 may do the
trick). Extending the timeout may be a fact of life, given that the
adapter restart normally occurs at BIOS load, long before the driver
instantiates, which gives the problem drives time to settle; if this is
the case, a small, lower-priority follow-up hardening patch can help
users who find adding the insmod parameter repugnant in order to support
kexec and kdump in the face of problem drives. Problem drives may have
led to the need to get a kernel dump ...

You will find enclosed the pristine patch based on the initial patch,
dropping the static function, and adding the three missing platform
function initializations.

Attached is the patch I feel will address this interrupt issue. As an
added 'perk' I have also added the code to detect if the controller was
previously initialized for interrupted operations by ANY operating
system, should the reset_devices kernel parameter not be set and we are
dealing with a naïve kexec without the addition of this kernel parameter.
The reset handler is also improved. Related to reset operations, but not
pertinent specifically to this issue, I have also altered the handling
somewhat so that we reset the adapter if we feel it is taking too long
(three minutes) to start up.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patches.

This attached patch is against current scsi-misc-2.6 MINUS the initial
version of this patch and the first patch that sets the missing platform
function related to this discussion.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

---

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Tuesday, April 03, 2007 10:52 AM
> To: Salyzyn, Mark
> Cc: Judith Lebzelter; vgoyal@in.ibm.com
> Subject: RE: [PATCH] aacraid: [Fastboot] Panics for AACRAID
> driverduring'insmod' for kexec test.
>
> On Tue, 2007-04-03 at 09:30 -0400, Salyzyn, Mark wrote:
> > 0x48 status code means the Firmware is trying to boot the Kernel. This
> > phase is most likely blocked because of the hard drive failure as you
> > suspected; the kernel is not declared up and running until after the
> > drives have spun up, and a problem drive could be tricking the Firmware
> > into a recovery loop holding things back ...
>
> I'm constructing what I hope will be the last pre 2.6.21 merge tree ...
> do you have a clean patch with the two necessary fixes for the panic you
> can send to the list?
>
> James

[-- Attachment #2: aacraid_kexec_4.patch --]
[-- Type: application/octet-stream, Size: 3794 bytes --]

diff -ru a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
--- a/drivers/scsi/aacraid/rx.c	2007-04-03 11:31:40.288114365 -0400
+++ b/drivers/scsi/aacraid/rx.c	2007-04-03 11:34:12.560873530 -0400
@@ -467,16 +467,19 @@
 	if (bled)
 		printk(KERN_ERR "%s%d: adapter kernel panic'd %x.\n",
 			dev->name, dev->id, bled);
-	else
+	else {
 		bled = aac_adapter_sync_cmd(dev, IOP_RESET_ALWAYS,
 		  0, 0, 0, 0, 0, 0, &var, NULL, NULL, NULL, NULL);
-	if (bled)
+		if (!bled && (var != 0x00000001))
+			bled = -EINVAL;
+	}
+	if (bled && (bled != -ETIMEDOUT))
 		bled = aac_adapter_sync_cmd(dev, IOP_RESET,
 		  0, 0, 0, 0, 0, 0, &var, NULL, NULL, NULL, NULL);
 
-	if (bled)
+	if (bled && (bled != -ETIMEDOUT))
 		return -EINVAL;
-	if (var == 0x3803000F) { /* USE_OTHER_METHOD */
+	if (bled || (var == 0x3803000F)) { /* USE_OTHER_METHOD */
 		rx_writel(dev, MUnit.reserved2, 3);
 		msleep(5000); /* Delay 5 seconds */
 		var = 0x00000001;
@@ -526,6 +529,7 @@
 {
 	unsigned long start;
 	unsigned long status;
+	int restart = 0;
 	int instance = dev->id;
 	const char * name = dev->name;
 
@@ -534,15 +538,21 @@
 		goto error_iounmap;
 	}
 
+	/* Failure to reset here is an option ... */
+	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
+	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
+	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
+	if ((((status & 0xff) != 0xff) || reset_devices) &&
+	  !aac_rx_restart_adapter(dev, 0))
+		++restart;
 	/*
 	 * Check to see if the board panic'd while booting.
 	 */
 	status = rx_readl(dev, MUnit.OMRx[0]);
 	if (status & KERNEL_PANIC) {
-		if ((status = aac_rx_check_health(dev)) <= 0)
-			goto error_iounmap;
-		if (aac_rx_restart_adapter(dev, status))
+		if (aac_rx_restart_adapter(dev, aac_rx_check_health(dev)))
 			goto error_iounmap;
+		++restart;
 	}
 	/*
 	 * Check to see if the board failed any self tests.
@@ -565,11 +575,23 @@
 	 */
 	while (!((status = rx_readl(dev, MUnit.OMRx[0])) & KERNEL_UP_AND_RUNNING))
 	{
-		if(time_after(jiffies, start+startup_timeout*HZ)) {
+		if ((restart &&
+		  (status & (KERNEL_PANIC|SELF_TEST_FAILED|MONITOR_PANIC))) ||
+		  time_after(jiffies, start+HZ*startup_timeout)) {
 			printk(KERN_ERR "%s%d: adapter kernel failed to start, init status = %lx.\n",
 					dev->name, instance, status);
 			goto error_iounmap;
 		}
+		if (!restart &&
+		  ((status & (KERNEL_PANIC|SELF_TEST_FAILED|MONITOR_PANIC)) ||
+		  time_after(jiffies, start + HZ *
+		  ((startup_timeout > 60)
+		    ? (startup_timeout - 60)
+		    : (startup_timeout / 2))))) {
+			if (likely(!aac_rx_restart_adapter(dev, aac_rx_check_health(dev))))
+				start = jiffies;
+			++restart;
+		}
 		msleep(1);
 	}
 	/*
diff -ru a/drivers/scsi/aacraid/sa.c b/drivers/scsi/aacraid/sa.c
--- a/drivers/scsi/aacraid/sa.c	2007-04-03 11:17:32.770117249 -0400
+++ b/drivers/scsi/aacraid/sa.c	2007-04-03 11:35:39.427898381 -0400
@@ -5,7 +5,7 @@
  * based on the old aacraid driver that is..
  * Adaptec aacraid device driver for Linux.
  *
- * Copyright (c) 2000 Adaptec, Inc. (aacraid@adaptec.com)
+ * Copyright (c) 2000-2007 Adaptec, Inc. (aacraid@adaptec.com)
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -258,6 +258,11 @@
 			NULL, NULL, NULL, NULL, NULL);
 }
 
+static int aac_sa_restart_adapter(struct aac_dev *dev, int bled)
+{
+	return -EINVAL;
+}
+
 /**
  *	aac_sa_check_health
  *	@dev: device to check if healthy
@@ -367,6 +372,7 @@
 	dev->a_ops.adapter_notify = aac_sa_notify_adapter;
 	dev->a_ops.adapter_sync_cmd = sa_sync_cmd;
 	dev->a_ops.adapter_check_health = aac_sa_check_health;
+	dev->a_ops.adapter_restart = aac_sa_restart_adapter;
 	dev->a_ops.adapter_intr = aac_sa_intr;
 	dev->a_ops.adapter_ioremap = aac_sa_ioremap;
 
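Note on the polling-loop hunk in rx.c: the point at which the patched loop
attempts its one restart is derived from startup_timeout. What follows is
a minimal userspace sketch of that arithmetic only, assuming time is
counted in whole seconds rather than jiffies. The helper name
early_restart_after() is hypothetical and does not exist in the driver;
only the conditional expression is taken from the patch.

```c
/* Illustrative only: early_restart_after() is a hypothetical helper,
 * not a function in the aacraid driver. It evaluates the expression
 * that the patch multiplies by HZ inside the polling loop. */
int early_restart_after(int startup_timeout)
{
	return (startup_timeout > 60) ? (startup_timeout - 60)
				      : (startup_timeout / 2);
}
```

Assuming a startup_timeout of 180 seconds (the three minutes referred to
in the message), this yields 120; with the suggested
aacraid.startup_timeout=540 it yields 480.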
* Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driverduring'insmod' for kexec test [take 4]
From: James Bottomley @ 2007-04-03 16:09 UTC
To: Salyzyn, Mark; +Cc: Judith Lebzelter, vgoyal, linux-scsi

On Tue, 2007-04-03 at 11:58 -0400, Salyzyn, Mark wrote:
> I will do you one better, James, I will slip in a little cleanup in
> sa.c (support file for the old PPC based ARC cards) where I discovered
> the restart platform function was ALSO left unset which could result
> in similar pain of null pointer discovery.

Actually, for 2.6.21-rc5 could I just have the two strict bug fixes for
the potential oops you spotted ... I shouldn't really be adding feature
changes or cleanups at this stage of the kernel release.

I can put all the rest into scsi-misc for post 2.6.21

Thanks,

James
* [PATCH] aacraid: [Fastboot] Panics for AACRAIDdriverduring'insmod' for kexec test [take 5]
From: Salyzyn, Mark @ 2007-04-03 16:14 UTC
To: James Bottomley; +Cc: Judith Lebzelter, vgoyal, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2028 bytes --]

Dropped the portion of the diff with references to sa.c.

Attached is the patch I feel will address this interrupt issue. As an
added 'perk' I have also added the code to detect if the controller was
previously initialized for interrupted operations by ANY operating
system, should the reset_devices kernel parameter not be set and we are
dealing with a naïve kexec without the addition of this kernel parameter.
The reset handler is also improved. Related to reset operations, but not
pertinent specifically to this issue, I have also altered the handling
somewhat so that we reset the adapter if we feel it is taking too long
(three minutes) to start up.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patches.

This attached patch is against current scsi-misc-2.6 MINUS the initial
version of this patch and the first patch that sets the missing platform
function related to this discussion.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

---

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Tuesday, April 03, 2007 12:10 PM
> To: Salyzyn, Mark
> Cc: Judith Lebzelter; vgoyal@in.ibm.com; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] aacraid: [Fastboot] Panics for
> AACRAIDdriverduring'insmod' for kexec test [take 4]
>
> On Tue, 2007-04-03 at 11:58 -0400, Salyzyn, Mark wrote:
> > I will do you one better, James, I will slip in a little cleanup in
> > sa.c (support file for the old PPC based ARC cards) where I discovered
> > the restart platform function was ALSO left unset which could result
> > in similar pain of null pointer discovery.
>
> Actually, for 2.6.21-rc5 could I just have the two strict bug fixes for
> the potential oops you spotted ... I shouldn't really be adding feature
> changes or cleanups at this stage of the kernel release.
>
> I can put all the rest into scsi-misc for post 2.6.21
>
> Thanks,
>
> James

[-- Attachment #2: aacraid_kexec_5.patch --]
[-- Type: application/octet-stream, Size: 2670 bytes --]

diff -ru a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
--- a/drivers/scsi/aacraid/rx.c	2007-04-03 11:31:40.288114365 -0400
+++ b/drivers/scsi/aacraid/rx.c	2007-04-03 11:34:12.560873530 -0400
@@ -467,16 +467,19 @@
 	if (bled)
 		printk(KERN_ERR "%s%d: adapter kernel panic'd %x.\n",
 			dev->name, dev->id, bled);
-	else
+	else {
 		bled = aac_adapter_sync_cmd(dev, IOP_RESET_ALWAYS,
 		  0, 0, 0, 0, 0, 0, &var, NULL, NULL, NULL, NULL);
-	if (bled)
+		if (!bled && (var != 0x00000001))
+			bled = -EINVAL;
+	}
+	if (bled && (bled != -ETIMEDOUT))
 		bled = aac_adapter_sync_cmd(dev, IOP_RESET,
 		  0, 0, 0, 0, 0, 0, &var, NULL, NULL, NULL, NULL);
 
-	if (bled)
+	if (bled && (bled != -ETIMEDOUT))
 		return -EINVAL;
-	if (var == 0x3803000F) { /* USE_OTHER_METHOD */
+	if (bled || (var == 0x3803000F)) { /* USE_OTHER_METHOD */
 		rx_writel(dev, MUnit.reserved2, 3);
 		msleep(5000); /* Delay 5 seconds */
 		var = 0x00000001;
@@ -526,6 +529,7 @@
 {
 	unsigned long start;
 	unsigned long status;
+	int restart = 0;
 	int instance = dev->id;
 	const char * name = dev->name;
 
@@ -534,15 +538,21 @@
 		goto error_iounmap;
 	}
 
+	/* Failure to reset here is an option ... */
+	dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
+	dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
+	dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
+	if ((((status & 0xff) != 0xff) || reset_devices) &&
+	  !aac_rx_restart_adapter(dev, 0))
+		++restart;
 	/*
 	 * Check to see if the board panic'd while booting.
 	 */
 	status = rx_readl(dev, MUnit.OMRx[0]);
 	if (status & KERNEL_PANIC) {
-		if ((status = aac_rx_check_health(dev)) <= 0)
-			goto error_iounmap;
-		if (aac_rx_restart_adapter(dev, status))
+		if (aac_rx_restart_adapter(dev, aac_rx_check_health(dev)))
 			goto error_iounmap;
+		++restart;
 	}
 	/*
 	 * Check to see if the board failed any self tests.
@@ -565,11 +575,23 @@
 	 */
 	while (!((status = rx_readl(dev, MUnit.OMRx[0])) & KERNEL_UP_AND_RUNNING))
 	{
-		if(time_after(jiffies, start+startup_timeout*HZ)) {
+		if ((restart &&
+		  (status & (KERNEL_PANIC|SELF_TEST_FAILED|MONITOR_PANIC))) ||
+		  time_after(jiffies, start+HZ*startup_timeout)) {
 			printk(KERN_ERR "%s%d: adapter kernel failed to start, init status = %lx.\n",
 					dev->name, instance, status);
 			goto error_iounmap;
 		}
+		if (!restart &&
+		  ((status & (KERNEL_PANIC|SELF_TEST_FAILED|MONITOR_PANIC)) ||
+		  time_after(jiffies, start + HZ *
+		  ((startup_timeout > 60)
+		    ? (startup_timeout - 60)
+		    : (startup_timeout / 2))))) {
+			if (likely(!aac_rx_restart_adapter(dev, aac_rx_check_health(dev))))
+				start = jiffies;
+			++restart;
+		}
 		msleep(1);
 	}
 	/*
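Note on the final hunk: the patched wait loop makes one of four decisions
on each pass. The sketch below restates that decision as a pure function
so it can be read outside the driver. It is an approximation under stated
assumptions, not the driver code: the bit values are assumed placeholders
(the real definitions are not shown in this thread), poll_step() and
enum poll_action are hypothetical names, and time is in whole seconds
instead of jiffies.

```c
/* Assumed placeholder bit values, for illustration only; the real
 * definitions live in the driver headers and are not shown here. */
#define KERNEL_UP_AND_RUNNING 0x80u
#define KERNEL_PANIC          0x100u
#define SELF_TEST_FAILED      0x04u
#define MONITOR_PANIC         0x20u
#define FAULT_BITS (KERNEL_PANIC | SELF_TEST_FAILED | MONITOR_PANIC)

enum poll_action { POLL_DONE, POLL_FAIL, POLL_RESTART, POLL_WAIT };

/* Hypothetical helper: the decision taken on one pass of the loop.
 * elapsed and startup_timeout are in seconds; restarted is non-zero
 * once a restart has already been attempted. */
enum poll_action poll_step(unsigned status, int restarted,
			   long elapsed, long startup_timeout)
{
	long early = (startup_timeout > 60) ? (startup_timeout - 60)
					    : (startup_timeout / 2);

	if (status & KERNEL_UP_AND_RUNNING)
		return POLL_DONE;	/* loop condition no longer holds */
	if ((restarted && (status & FAULT_BITS)) || elapsed > startup_timeout)
		return POLL_FAIL;	/* corresponds to goto error_iounmap */
	if (!restarted && ((status & FAULT_BITS) || elapsed > early))
		return POLL_RESTART;	/* the single restart attempt */
	return POLL_WAIT;		/* msleep(1) and poll again */
}
```

In the patch itself, a successful restart also resets start to the
current jiffies, so the full timeout applies again afterwards; the sketch
leaves that bookkeeping to the caller.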
* [PATCH] aacraid: Correct SMC products in aacraid.txt
From: Salyzyn, Mark @ 2007-04-04 19:49 UTC
To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

Correct a spelling mistake for the SMC product names (replace 'B' with
'R') in the Documentation/scsi/aacraid.txt file. This is a follow-up to
a documentation patch '[PATCH] aacraid: Add SMC and SUN products to
README' submitted and accepted to scsi-misc-2.6 on March 27 2007.

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patches.

This attached patch is against current scsi-misc-2.6

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

---

Sincerely -- Mark Salyzyn

[-- Attachment #2: aacraid_SMC_documentation.patch --]
[-- Type: application/octet-stream, Size: 764 bytes --]

diff -ru a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt
--- a/Documentation/scsi/aacraid.txt	2007-04-04 15:23:04.575334394 -0400
+++ b/Documentation/scsi/aacraid.txt	2007-04-04 15:40:08.366873972 -0400
@@ -38,10 +38,10 @@
 	9005:0286:9005:02ac	Adaptec 1800 (Typhoon44)
 	9005:0285:9005:02b5	Adaptec 5445 (Voodoo44)
 	9005:0285:15d9:02b5	SMC AOC-USAS-S4i
-	9005:0285:15d9:02c9	SMC AOC-USAS-S4iB
+	9005:0285:15d9:02c9	SMC AOC-USAS-S4iR
 	9005:0285:9005:02b6	Adaptec 5805 (Voodoo80)
 	9005:0285:15d9:02b6	SMC AOC-USAS-S8i
-	9005:0285:15d9:02ca	SMC AOC-USAS-S8iB
+	9005:0285:15d9:02ca	SMC AOC-USAS-S8iR
 	9005:0285:9005:02b7	Adaptec 5085 (Voodoo08)
 	9005:0285:9005:02bb	Adaptec 3405 (Marauder40LP)
 	9005:0285:9005:02bc	Adaptec 3805 (Marauder80LP)
* Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driverduring'insmod' for kexec test [take 4]
From: Judith Lebzelter @ 2007-04-03 16:54 UTC
To: Salyzyn, Mark; +Cc: James Bottomley, Judith Lebzelter, vgoyal, linux-scsi

Hi Mark,

I was going to try and test this patch rather than the last, but I am
getting this compile error again, where line 640 is the beginning of
function aac_rx_init():

  CC [M]  drivers/scsi/aacraid/rx.o
drivers/scsi/aacraid/rx.c: In function '_aac_rx_init':
drivers/scsi/aacraid/rx.c:640: warning: ISO C90 forbids mixed declarations and code
drivers/scsi/aacraid/rx.c:649: error: expected declaration or statement at end of input
drivers/scsi/aacraid/rx.c:649: warning: control reaches end of non-void function
make[3]: *** [drivers/scsi/aacraid/rx.o] Error 1
make[2]: *** [drivers/scsi/aacraid] Error 2
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2

I applied it to the scsi-misc tree I pulled yesterday after removing the
old patch.

Judith

On Tue, Apr 03, 2007 at 11:58:17AM -0400, Salyzyn, Mark wrote:
> I will do you one better, James, I will slip in a little cleanup in
> sa.c (support file for the old PPC based ARC cards) where I discovered
> the restart platform function was ALSO left unset which could result in
> similar pain of null pointer discovery.
>
> Please note: The issue Judith ran into, where the card took longer than
> 3 minutes to initialize because of a problem drive may require the
> extension of the timeout to address (insmod parameter
> aacraid.startup_timeout=540 may do the trick). Extending the timeout
> may have been a fact of life given that the restart of the adapter
> normally occurs on BIOS load long before the driver instantiates
> settling the problem drives; if this is the case a small and lower
> priority follow-up hardening patch can help the users that find adding
> the insmod parameter repugnant in order to support kexec and kdump in
> the face of problem drives. Problem drives may have led to the need to
> get a kernel dump ...
>
> You will find enclosed the pristine patch based on the initial patch,
> dropping the static function, and adding the three missing platform
> function initializations.
>
> Attached is the patch I feel will address this interrupt issue. As an
> added 'perk' I have also added the code to detect if the controller was
> previously initialized for interrupted operations by ANY operating
> system should the reset_devices kernel parameter not be set and we are
> dealing with a naïve kexec without the addition of this kernel
> parameter. The reset handler is also improved. Related to reset
> operations, but not pertinent specifically to this issue, I have also
> altered the handling somewhat so that we reset the adapter if we feel
> it is taking too long (three minutes) to start up.
>
> ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
> handling of patches.
>
> This attached patch is against current scsi-misc-2.6 MINUS the initial
> version of this patch and the first patch that sets the missing
> platform function related to this discussion.
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>
> ---
>
> Sincerely -- Mark Salyzyn
>
> > -----Original Message-----
> > From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> > Sent: Tuesday, April 03, 2007 10:52 AM
> > To: Salyzyn, Mark
> > Cc: Judith Lebzelter; vgoyal@in.ibm.com
> > Subject: RE: [PATCH] aacraid: [Fastboot] Panics for AACRAID
> > driverduring'insmod' for kexec test.
> >
> > On Tue, 2007-04-03 at 09:30 -0400, Salyzyn, Mark wrote:
> > > 0x48 status code means the Firmware is trying to boot the Kernel.
> > > This phase is most likely blocked because of the hard drive failure
> > > as you suspected; the kernel is not declared up and running until
> > > after the drives have spun up, and a problem drive could be
> > > tricking the Firmware into a recovery loop holding things back ...
> >
> > I'm constructing what I hope will be the last pre 2.6.21 merge
> > tree ... do you have a clean patch with the two necessary fixes for
> > the panic you can send to the list?
> >
> > James
Thread overview: 5+ messages
[not found] <1175611901.3645.3.camel@mulgrave.il.steeleye.com>
2007-04-03 15:58 ` [PATCH] aacraid: [Fastboot] Panics for AACRAID driverduring'insmod' for kexec test [take 4] Salyzyn, Mark
2007-04-03 16:09 ` James Bottomley
2007-04-03 16:14 ` [PATCH] aacraid: [Fastboot] Panics for AACRAIDdriverduring'insmod' for kexec test [take 5] Salyzyn, Mark
2007-04-04 19:49 ` [PATCH] aacraid: Correct SMC products in aacraid.txt Salyzyn, Mark
2007-04-03 16:54 ` [PATCH] aacraid: [Fastboot] Panics for AACRAID driverduring'insmod' for kexec test [take 4] Judith Lebzelter