* Re: 2.5.63-mm2
From: Andrew Morton @ 2003-03-05 7:40 UTC
To: Mark Wong; +Cc: linux-scsi

Mark Wong <markw@osdl.org> wrote:
>
> It appears something is conflicting with the old Adaptec AIC7xxx. My
> system halts when it attempts to probe the devices (I think it's that.)
> So I started using the new AIC7xxx driver and all is well. I don't see
> any messages on the console that point to any cause. Is there
> someplace I can look for a clue to the problem?
>
> I actually didn't realize I was using the old driver and have no qualms
> about not using it, but if it'll help someone else, I can help gather
> information.
>

Well aic7xxx_old fails for me in 2.5.62, as well as later kernels.
Always in the same way.  It gets stuck at boot:

(gdb) bt
#0  wait_for_completion (x=0xc8dbfea8) at kernel/sched.c:1408
#1  0xc024295d in scsi_wait_req (SRpnt=0xf7e22a00, cmnd=0xc8dbfeec, buffer=0xc03fce60, bufflen=36, timeout=6000,
    retries=3) at drivers/scsi/scsi.c:673
#2  0xc024839f in scsi_probe_lun (sreq=0xf7e22a00, inq_result=0xc03fce60 "", bflags=0xc8dbff20)
    at drivers/scsi/scsi_scan.c:1050
#3  0xc024877e in scsi_probe_and_add_lun (host=0xf7e92600, q=0xc8dbff80, channel=0, id=2, lun=0, bflagsp=0xc8dbff54)
    at drivers/scsi/scsi_scan.c:1372
#4  0xc0248a8e in scsi_scan_target (shost=0xf7e92600, q=0xc8dbff80, channel=0, id=2) at drivers/scsi/scsi_scan.c:1835
#5  0xc0248b7e in scsi_scan_host (shost=0xf7e92600) at drivers/scsi/scsi_scan.c:1892
#6  0xc0243b9e in scsi_add_host (shost=0xf7e92600, dev=0x0) at drivers/scsi/hosts.c:314
#7  0xc0243fbd in scsi_register_host (shost_tp=<incomplete type>) at drivers/scsi/hosts.c:513
#8  0xc03b3259 in init_this_scsi_driver () at drivers/scsi/scsi_module.c:38
#9  0xc03a07d5 in do_initcalls () at init/main.c:472
#10 0xc03a0803 in do_basic_setup () at init/main.c:497
#11 0xc01050a7 in init (unused=0x0) at init/main.c:535
* Re: 2.5.63-mm2
From: Mike Anderson @ 2003-03-05 17:38 UTC
To: Andrew Morton; +Cc: Mark Wong, linux-scsi

I can also reproduce the problem on my system now that I have switched from
the new AIC7xxx driver to the old one. I am looking at the problem now.

Andrew Morton [akpm@digeo.com] wrote:
> Mark Wong <markw@osdl.org> wrote:
> >
> > It appears something is conflicting with the old Adaptec AIC7xxx. My
> > system halts when it attempts to probe the devices (I think it's that.)
> > So I started using the new AIC7xxx driver and all is well. I don't see
> > any messages on the console that point to any cause. Is there
> > someplace I can look for a clue to the problem?
> >
> > I actually didn't realize I was using the old driver and have no qualms
> > about not using it, but if it'll help someone else, I can help gather
> > information.
> >
>
> Well aic7xxx_old fails for me in 2.5.62, as well as later kernels.
> Always in the same way.  It gets stuck at boot:
>
> (gdb) bt
> [ ... ]

-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63-mm2
From: Mike Anderson @ 2003-03-05 18:52 UTC
To: Andrew Morton, Mark Wong, linux-scsi

The patch below fixed the problem on my system. I had my list empty
checks reversed if aborting and bus device reset failed. The condition
that causes the error handler to run is still unknown. I will look at it
when I get a chance.

Mike Anderson [andmike@us.ibm.com] wrote:
> I can also reproduce the problem on my system now that I have switched from
> the new AIC7xxx driver to the old one. I am looking at the problem now.
>
> Andrew Morton [akpm@digeo.com] wrote:
> [ ... ]

-andmike
--
Michael Anderson
andmike@us.ibm.com

=====
name: 00_scsi_error_ready_devs-1.diff
version: 2003-03-05.10:39:28-0800
against: 2.5.63

 scsi_error.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
=====

===== drivers/scsi/scsi_error.c 1.38 vs edited =====
--- 1.38/drivers/scsi/scsi_error.c	Sat Feb 22 08:17:01 2003
+++ edited/drivers/scsi/scsi_error.c	Wed Mar  5 10:14:22 2003
@@ -1490,9 +1490,9 @@
 				  struct list_head *work_q,
 				  struct list_head *done_q)
 {
-	if (scsi_eh_bus_device_reset(shost, work_q, done_q))
-		if (scsi_eh_bus_reset(shost, work_q, done_q))
-			if (scsi_eh_host_reset(work_q, done_q))
+	if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
+		if (!scsi_eh_bus_reset(shost, work_q, done_q))
+			if (!scsi_eh_host_reset(work_q, done_q))
 				scsi_eh_offline_sdevs(work_q, done_q);
 }
 
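A minimal sketch of the escalation ladder the patch touches, assuming (as the corrected code implies) that each recovery helper returns nonzero once the work queue is empty. The work queue is modelled as a plain counter, and `struct ladder`, `eh_step()`, `ready_devs()` and `ready_devs_buggy()` are hypothetical stand-ins for illustration, not the real scsi_error.c helpers.

```c
#include <stdbool.h>

/* Hypothetical model of the error handler's state: a count of commands
 * still on the work queue, plus bookkeeping for inspection. */
struct ladder {
	int pending;    /* commands still needing recovery */
	int steps_run;  /* escalation steps attempted so far */
	bool offlined;  /* set when the devices are taken offline */
};

/* Hypothetical recovery step: clears up to `cleared` pending commands and,
 * like the helpers in the patched code (assumed), returns nonzero when the
 * work queue is empty afterwards. */
static int eh_step(struct ladder *l, int cleared)
{
	l->steps_run++;
	l->pending = (cleared >= l->pending) ? 0 : l->pending - cleared;
	return l->pending == 0;
}

/* Fixed ordering: escalate to the next, heavier reset only while commands
 * remain, and offline the devices if even the host reset leaves some.
 * bdr, bus and host say how many commands each step would recover. */
static void ready_devs(struct ladder *l, int bdr, int bus, int host)
{
	if (!eh_step(l, bdr))			/* bus device reset */
		if (!eh_step(l, bus))		/* bus reset */
			if (!eh_step(l, host))	/* host reset */
				l->offlined = true;
}

/* Pre-patch ordering with the checks inverted: it stops escalating as soon
 * as a step fails to empty the queue, and offlines the devices only when
 * every step reports an empty queue. */
static void ready_devs_buggy(struct ladder *l, int bdr, int bus, int host)
{
	if (eh_step(l, bdr))
		if (eh_step(l, bus))
			if (eh_step(l, host))
				l->offlined = true;
}
```

With one pending command that only the host reset can clear (the pattern Patrick reports later in this thread), `ready_devs()` runs all three steps and leaves the devices online, while `ready_devs_buggy()` gives up after the failed bus device reset with the command still pending.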
* Re: 2.5.63-mm2
From: Patrick Mansfield @ 2003-03-05 21:33 UTC
To: Andrew Morton, Mark Wong, linux-scsi

On Wed, Mar 05, 2003 at 10:52:31AM -0800, Mike Anderson wrote:
> The patch below fixed the problem on my system. I had my list empty
> checks reversed if aborting and bus device reset failed. The condition
> that causes the error handler to run is still unknown. I will look at it
> when I get a chance.

Mike -

With your patch, I am able to boot again using the feral driver with an
isp1020 (on a NUMAQ system).

Though I still do not know why the feral driver gets a timeout but qlogicisp
does not. It is apparently a read that times out while mounting root for
the first time (read-only), so this is not the first read sent to the drive
(the partition code should already have sent a read). It could be the
queue_depth settings (qlogicisp sets it to 1; I have feral setting it to 16).

On boot (with SCSI error logging on) I see:

[ ... ]
TCP: Hash tables configured (established 524288 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
isp6: Loop ID 125, AL_PA 0x1, Port ID 0x1, Loop State 0x2, Topology 'Private Loop'
Error handler scsi_eh_0 waking up
Error handler scsi_eh_0 waking up
scsi_eh_prt_fail_stats: 0:0:0:0 cmds failed: 0, cancel: 1
Total of 1 commands on 1 devices require eh work
scsi_eh_0: aborting cmd:0xc3f4e600
scsi_eh_0: aborting cmd failed:0xc3f4e600
scsi_eh_0: Sending BDR sdev: 0xc3fba600
isp0: Interrupting Mailbox Command (0x17) Timeout
isp0: Mailbox Command 'ABORT TARGET' failed (TIMEOUT)
scsi_eh_0: BDR failed sdev:0xc3fba600
scsi_eh_0: Sending BRST chan: 0
scsi_try_bus_reset: Snd Bus RST
isp0: Interrupting Mailbox Command (0x18) Timeout
isp0: Mailbox Command 'BUS RESET' failed (TIMEOUT)
scsi_eh_0: BRST failed chan: 0
scsi_eh_0: Sending HRST
scsi_try_host_reset: Snd Host RST
isp0: Differential Mode
isp0: Ultra Mode Capable
isp0: Board Type 1040B, Chip Revision 0x5, loaded F/W Revision 4.66.0
isp0: Last F/W revision was 4.40.0
scsi_eh_done scmd: c3f4e600 result: 2
scsi_send_eh_cmnd: scmd: c3f4e600, rtn:2002
scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
scsi_eh_tur: scmd c3f4e600 rtn 2002
scsi_eh_0: flush retry cmd: c3f4e600
scsi_restart_operations: waking up host to restart
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 112k freed
INIT: version 2.78 booting
[ ... ]

--
Patrick Mansfield
* Re: 2.5.63-mm2
From: Mike Anderson @ 2003-03-05 22:01 UTC
To: Patrick Mansfield; +Cc: Andrew Morton, Mark Wong, linux-scsi

Thanks for trying this, Patrick. I have added your configuration to my list
of available systems that display error handling signatures, for future
testing.

Patrick Mansfield [patmans@us.ibm.com] wrote:
> On Wed, Mar 05, 2003 at 10:52:31AM -0800, Mike Anderson wrote:
> > The patch below fixed the problem on my system. I had my list empty
> > checks reversed if aborting and bus device reset failed. The condition
> > that causes the error handler to run is still unknown. I will look at it
> > when I get a chance.
>
> Mike -
>
> With your patch, I am able to boot again using the feral driver with an
> isp1020 (on a NUMAQ system).
>
> Though I still do not know why the feral driver gets a timeout but qlogicisp
> does not. It is apparently a read that times out while mounting root for
> the first time (read-only), so this is not the first read sent to the drive
> (the partition code should already have sent a read). It could be the
> queue_depth settings (qlogicisp sets it to 1; I have feral setting it to 16).
>

-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.5.63-mm2
From: Matthew Jacob @ 2003-03-06 7:38 UTC
To: Patrick Mansfield; +Cc: Andrew Morton, Mark Wong, linux-scsi

> Though I still do not know why the feral driver gets a timeout but qlogicisp
> does not. It is apparently a read that times out while mounting root for
> the first time (read-only), so this is not the first read sent to the drive

The command in question that's timing out is "ABOUT FIRMWARE". If at reset
you see the pattern "ISP " in mailbox registers 1, 2, 3, you're supposed to
be at the hard PROM, i.e., no firmware has been loaded and set running by
system boot procedures. Unfortunately, on some platforms the absence of this
pattern means that the card is neither in hard reset state (i.e., running
out of its PROM) nor actually running f/w. Hence the timeout on the command.

I've always considered this a relatively minor buglet, as it doesn't occur
on *that* many systems, but I should probably fix it.

-matt
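A rough sketch of the reset-time decision Matthew describes. The register encoding is an assumption made purely for illustration (two ASCII characters per 16-bit mailbox register, high byte first, with "ISP" padded by spaces to fill registers 1 to 3); it is not taken from the isp driver, and `struct mbox_snapshot`, `at_rom_monitor()` and `choose_action()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of outgoing mailbox registers 1, 2 and 3, as read
 * right after a chip reset (a real driver reads these from hardware). */
struct mbox_snapshot {
	uint16_t mbox[3];
};

/* ASSUMED signature layout: "ISP" padded with spaces to six bytes, two
 * ASCII characters per register, high byte first. */
static const unsigned char rom_signature[6] = { 'I', 'S', 'P', ' ', ' ', ' ' };

enum fw_action {
	LOAD_FIRMWARE,          /* card is sitting in its PROM */
	QUERY_RUNNING_FIRMWARE  /* assume firmware is already running */
};

/* True if every mailbox register matches its slice of the signature. */
static bool at_rom_monitor(const struct mbox_snapshot *s)
{
	for (int i = 0; i < 3; i++) {
		unsigned char hi = (unsigned char)(s->mbox[i] >> 8);
		unsigned char lo = (unsigned char)(s->mbox[i] & 0xff);

		if (hi != rom_signature[2 * i] || lo != rom_signature[2 * i + 1])
			return false;
	}
	return true;
}

/* Without the signature, the card is assumed to be running firmware and is
 * queried (for example with ABOUT FIRMWARE). On the platforms described in
 * the mail above, the card is in neither state, so that query times out. */
static enum fw_action choose_action(const struct mbox_snapshot *s)
{
	return at_rom_monitor(s) ? LOAD_FIRMWARE : QUERY_RUNNING_FIRMWARE;
}
```

The sketch keeps the limitation Matthew points out: a missing signature is read as "firmware is running", so on an affected platform it would still choose to query firmware that is not there.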