All of lore.kernel.org
 help / color / mirror / Atom feed
* sym53c8xx parity errors on SuSE 9.1's hwscan?
@ 2004-09-22 23:16 Matthias Andree
  2004-09-23  2:39 ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Matthias Andree @ 2004-09-22 23:16 UTC (permalink / raw)
  To: linux-scsi; +Cc: Matthew Wilcox

Greetings,

SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
tool to scan for hardware.

Whenever this tool, probing the PCI bus, hits my Tekram DC-390U or
DC-310U, the box logs a SCSI parity error, some timed out abort
messages, then the usual reset escalation; device reset (times out), bus
reset (times out), finally HBA reset (succeeds). The whole procedure
takes about 1 to 2 minutes before the bus is usable again.

If the machine is idle and has warm caches, I may occasionally see just
the parity error message, so it may look a bit like a race in sym53c8xx
or perhaps the hardware.

The problem seems to persist through current versions, although they can
_usually_ just fix up a phase error and continue immediately.

The problem did not exist when I was running 2.6.7 on SuSE 8.2.

I find it a bit intimidating that user-space (albeit with root
permissions) causes "SCSI" parity errors, and given the 2.6.9 logging
towards the end of the mail, I am wondering if SuSE's hwinfo stuff
triggers some race condition or manages to bypass the SCSI phase state
machine or if the probe confuses the chip. I haven't yet managed to
isolate (with strace) the cause.

Is there a useful debug setting for sym53c8xx that could shed some light
on what the user-space has attempted that led to the SCSI parity error?

Log from SuSE's 2.6.5-7.108-default during system boot-up:

Sep 22 15:15:09 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 15:15:39 merlin kernel: sym0:1:0: ABORT operation started.
Sep 22 15:15:44 merlin kernel: sym0:1:0: ABORT operation timed-out.
Sep 22 15:15:44 merlin kernel: sym0:1:0: ABORT operation started.
Sep 22 15:15:49 merlin kernel: sym0:1:0: ABORT operation timed-out.
Sep 22 15:15:49 merlin kernel: sym0:1:0: DEVICE RESET operation started.
Sep 22 15:15:54 merlin kernel: sym0:1:0: DEVICE RESET operation timed-out.
Sep 22 15:15:54 merlin kernel: sym0:1:0: BUS RESET operation started.
Sep 22 15:15:54 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 15:15:54 merlin kernel: sym0: SCSI BUS has been reset.
Sep 22 15:15:54 merlin kernel: sym0:1:0: BUS RESET operation complete.

and a while later:

Sep 22 15:18:32 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 15:18:32 merlin kernel: sym0: interrupted SCRIPT address not found.
Sep 22 15:18:32 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 15:18:32 merlin kernel: sym0: SCSI BUS has been reset.

2.6.9-rc2-mm1, although otherwise not useful for me (UDP networking)
logs this instead:

Sep 22 14:00:05 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 14:00:05 merlin kernel: sym0: SCSI phase error fixup: CCB already dequeued.
Sep 22 14:00:05 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 14:00:05 merlin kernel: sym0: SCSI BUS has been reset.

-- 
Matthias Andree

Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-10-25  8:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-22 23:16 sym53c8xx parity errors on SuSE 9.1's hwscan? Matthias Andree
2004-09-23  2:39 ` Matthew Wilcox
2004-09-23  7:42   ` Olaf Hering
2004-09-23  8:51     ` Matthias Andree
2004-09-23  8:45   ` Matthias Andree
2004-10-25  8:16   ` Matthias Andree

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.