* Serverworks OSB4 in impossible state
@ 2002-06-10 15:52 Martin Wilck
2002-06-10 16:41 ` Daniela Engert
2002-06-12 8:58 ` Alan Cox
0 siblings, 2 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-10 15:52 UTC (permalink / raw)
To: osb4-bug; +Cc: Linux Kernel mailing list, Martin Wilck
[-- Attachment #1: Type: text/plain, Size: 1574 bytes --]
Hello,
I know a similar problem was discussed here a short while ago.
However, we have a situation here where we can reproduce the problem
reliably. This is a RedHat 2.4.18-4 kernel.
We have a CD with a corrupt last block. If we try to read this block in
PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
attachment.
The machine has only a CDROM (Mitsumi FX 4830T) attached to the IDE bus
as /dev/hdc. We used no IDE-related boot parameters.
If we read the block in DMA mode (with dd), the machine stalls with the
"impossible state" message.
A PCI bus scan reveals that the IO register (dma_base+2) does indeed
contain 0xa5 (bit 0 set), which leads to the panic. Normally a read of
that register returns 0xa0.
We see in our PCI bus scan that a successful DMA of 4096 bytes was
carried out ~23ms before the stall condition. Another 4096 byte request
was scheduled but never seen. Between the successful DMA and the stall
condition we see nothing but a few timer interrupts.
Then an IDE interrupt occurs, which leads immediately to the panic.
The CD-ROM drive certainly reports some sort of error, as in the PIO
case, when trying to access the last block. This seems to be the
(indirect) reason why the bus master bit in (dma_base+2) remains set
long after the DMA is finished.
Any ideas/comments?
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
[-- Attachment #2: Kernel error messages in PIO-mode --]
[-- Type: text/plain, Size: 3573 bytes --]
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:48 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
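A rough cross-check of the sector numbers in the end_request lines above. It assumes that these messages count 512-byte units while the CD uses 2048-byte logical blocks; the helper name and constants are illustrative only and are not taken from the kernel.

```python
# Assumption: end_request reports sectors in 512-byte units,
# and the CD-ROM uses 2048-byte logical blocks.
KERNEL_SECTOR_SIZE = 512
CD_BLOCK_SIZE = 2048


def locate_sector(sector):
    """Return (byte_offset, cd_block, offset_within_block) for a kernel sector."""
    byte_offset = sector * KERNEL_SECTOR_SIZE
    cd_block, within = divmod(byte_offset, CD_BLOCK_SIZE)
    return byte_offset, cd_block, within


for sector in (1307712, 1307716):
    offset, block, within = locate_sector(sector)
    print(f"sector {sector}: byte offset {offset}, CD block {block}, +{within} bytes")
```

Under that assumption the two failing sectors fall on consecutive CD blocks, 326928 and 326929, so a read such as dd if=/dev/hdc bs=2048 skip=326928 count=2 would cover both.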
[-- Attachment #3: dmesg --]
[-- Type: text/plain, Size: 15690 bytes --]
ACPI table found: RSDT v1 [PTLTD RSDT 1540.1]
__va_range(0xbfefc0f9, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefc0f9, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [FSC D1309 1540.1]
__va_range(0xbfefeef8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefeef8, 0x50): idx=8 mapped at ffff6000
ACPI table found: SPCR v1 [PTLTD $UCRTBL$ 1540.1]
__va_range(0xbfefef48, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [PTLTD APIC 1540.1]
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x6] enabled[1])
CPU 0 (0x0600) enabledProcessor #6 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 1 (0x0000) enabledProcessor #0 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0002] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0003] id[0x7] enabled[1])
CPU 3 (0x0700) enabledProcessor #7 Unknown CPU [15:2] APIC version 16
IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x3] address[0xfec10000] global_irq_base[0x10])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1] trigger[0x1])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x3] trigger[0x3])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0xbfefefd8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefefd8, 0x28): idx=8 mapped at ffff6000
ACPI table found: BOOT v1 [PTLTD $SBFTBL$ 1540.1]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: FSCD1309 Product ID: PRIMERGY APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC10000.
Processors: 4
Kernel command line: ro root=/dev/sda2
Initializing CPU#0
Detected 2395.457 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4771.02 BogoMIPS
Memory: 3098776k/3145728k available (1232k kernel code, 46496k reserved, 842k data, 304k init, 2228160k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 65536 (order: 7, 524288 bytes)
Buffer cache hash table entries: 262144 (order: 8, 1048576 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU0: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
per-CPU timeslice cutoff: 1462.93 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 2/1 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#2.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU2: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 3/7 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#3.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU3: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Total of 4 processors activated (19123.40 BogoMIPS).
cpu_sibling_map[0] = 3
cpu_sibling_map[1] = 2
cpu_sibling_map[2] = 1
cpu_sibling_map[3] = 0
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 3-0, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 02000000
....... : arbitration: 02
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 00F 0F 0 0 0 0 0 1 1 39
02 00F 0F 0 0 0 0 0 1 1 31
03 00F 0F 0 0 0 0 0 1 1 41
04 00F 0F 0 0 0 0 0 1 1 49
05 00F 0F 0 0 0 0 0 1 1 51
06 00F 0F 0 0 0 0 0 1 1 59
07 00F 0F 0 0 0 0 0 1 1 61
08 00F 0F 0 0 0 0 0 1 1 69
09 00F 0F 1 1 0 1 0 1 1 71
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 00F 0F 0 0 0 0 0 1 1 79
0d 00F 0F 0 0 0 0 0 1 1 81
0e 00F 0F 0 0 0 0 0 1 1 89
0f 00F 0F 0 0 0 0 0 1 1 91
IO APIC #3......
.... register #00: 03000000
....... : physical APIC id: 03
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 03000000
....... : arbitration: 03
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 00F 0F 1 1 0 1 0 1 1 99
0e 00F 0F 1 1 0 1 0 1 1 A1
0f 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ29 -> 1:13
IRQ30 -> 1:14
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2395.2498 MHz.
..... host bus clock speed is 99.8019 MHz.
cpu: 0, clocks: 998019, slice: 199603
CPU0<T0:998016,T1:798400,D:13,S:199603,C:998019>
cpu: 2, clocks: 998019, slice: 199603
cpu: 3, clocks: 998019, slice: 199603
cpu: 1, clocks: 998019, slice: 199603
CPU1<T0:998016,T1:598800,D:10,S:199603,C:998019>
CPU2<T0:998016,T1:399200,D:7,S:199603,C:998019>
CPU3<T0:998016,T1:199600,D:4,S:199603,C:998019>
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfd9aa, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI->APIC IRQ transform: (B0,I5,P0) -> 30
PCI->APIC IRQ transform: (B0,I15,P0) -> 9
PCI->APIC IRQ transform: (B2,I10,P0) -> 29
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
VFS: Diskquotas version dquot_6.5.0 initialized
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
SvrWks CSB5: simplex device: DMA forced
ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:pio, hdb:pio
SvrWks CSB5: simplex device: DMA forced
ide1: BM-DMA at 0x1808-0x180f, BIOS settings: hdc:pio, hdd:pio
hdc: FX4830T, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 32768 buckets, 256Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 220k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
sym53c8xx: at PCI bus 2, device 10, function 0
sym53c8xx: 53c1010-66 detected with Symbios NVRAM
sym53c1010-66-0: rev 0x1 on pci bus 2 device 10 function 0 irq 29
sym53c1010-66-0: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-66-0: on-chip RAM at 0xfe000000
sym53c1010-66-0: restart (scsi reset).
sym53c1010-66-0: handling phase mismatch from SCRIPTS.
sym53c1010-66-0: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
blk: queue f7fd6e18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SEAGATE Model: ST318451LC Rev: 7500
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7fd6c18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SEAGATE Model: ST318451LC Rev: 7500
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7fd6a18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SDR Model: GEM318 Rev: 0
Type: Processor ANSI SCSI revision: 02
blk: queue f7308e18, I/O limit 4095Mb (mask 0xffffffff)
sym53c1010-66-0-<0,0>: tagged command queue depth set to 8
sym53c1010-66-0-<1,0>: tagged command queue depth set to 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sym53c1010-66-0-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sda: 35843671 512-byte hdwr sectors (18352 MB)
Partition check:
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sym53c1010-66-0-<1,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sdb: 35843671 512-byte hdwr sectors (18352 MB)
sdb:
Journalled Block Device driver loaded
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 304k freed
Adding Swap: 1702848k swap-space (priority -1)
Adding Swap: 1694816k swap-space (priority -2)
Adding Swap: 1702848k swap-space (priority -3)
Adding Swap: 1702848k swap-space (priority -4)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xf8bc5000, IRQ 9
usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,2), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:05:29:74:92, IRQ 30.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdc: DMA disabled
[-- Attachment #4: /proc/ide/ide1/hdc/settings --]
[-- Type: text/plain, Size: 1199 bytes --]
name value min max mode
---- ----- --- --- ----
breada_readahead 4 0 127 rw
current_speed 66 0 69 rw
dsc_overlap 0 0 1 rw
file_readahead 0 0 2097151 rw
ide_scsi 0 0 1 rw
init_speed 66 0 69 rw
io_32bit 0 0 3 rw
keepsettings 0 0 1 rw
max_kb_per_request 64 1 127 rw
nice1 1 0 1 rw
number 2 0 3 rw
pio_mode write-only 0 255 w
slow 0 0 1 rw
unmaskirq 0 0 1 rw
using_dma 1 0 1 rw
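The current_speed and init_speed values above can be read as ATA transfer-mode codes. The sketch below assumes the settings file prints the raw transfer-mode byte in decimal and uses the standard SET FEATURES encoding; the function name is made up for illustration.

```python
# Transfer-mode encoding as used by ATA SET FEATURES
# (the same layout as the Linux XFER_* constants).
def describe_xfer_mode(code):
    """Translate a numeric transfer-mode code into a readable name."""
    if code >= 0x40:
        return f"UDMA mode {code - 0x40}"
    if code >= 0x20:
        return f"multiword DMA mode {code - 0x20}"
    if code >= 0x10:
        return f"single-word DMA mode {code - 0x10}"
    if code >= 0x08:
        return f"PIO mode {code - 0x08}"
    return "PIO default"


print(describe_xfer_mode(66))  # the value reported for hdc
```

Read this way, 66 (0x42) is UDMA mode 2, which is consistent with the "UDMA(33)" reported for hdc in the dmesg attachment.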
[-- Attachment #5: /proc/ide/svwks --]
[-- Type: text/plain, Size: 785 bytes --]
ServerWorks OSB4/CSB5/CSB6
ServerWorks CSB5 Chipset (rev 93)
------------------------------- General Status ---------------------------------
--------------- Primary Channel ---------------- Secondary Channel -------------
disabled disabled
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled: no no yes no
UDMA enabled: no no yes no
UDMA enabled: 0 0 2 0
DMA enabled: 2 2 2 2
PIO enabled: ? ? 4 ?
* Re: Serverworks OSB4 in impossible state
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
@ 2002-06-10 16:41 ` Daniela Engert
2002-06-11 7:22 ` Martin Wilck
2002-06-12 8:58 ` Alan Cox
1 sibling, 1 reply; 18+ messages in thread
From: Daniela Engert @ 2002-06-10 16:41 UTC (permalink / raw)
To: Martin Wilck; +Cc: Linux Kernel mailing list
Hello Martin,
On 10 Jun 2002 17:52:58 +0200, Martin Wilck wrote:
>We have a CD with a corrupt last block. If we try to read this block in
>PIO mode (hdparm -d 0 /dev/hdc) , we get errors like in the first
>attachment.
The error code returned is "check condition" with a sense key of 3
"medium error". The most appropriate driver action would have been to
issue a "request sense" command to learn the precise error and retry
only in case of a good chance of a recoverable problem - but that's a
different story.
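For reference, a minimal sketch of how the status and error bytes from the log decode. It assumes the usual ATA status bit layout and the ATAPI convention that the upper nibble of the error register carries the sense key; the tables are abbreviated and the function names are illustrative.

```python
# ATA status register bits (abbreviated to the ones the kernel prints here).
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x01: "Error",
}

# SCSI/ATAPI sense keys (subset).
SENSE_KEYS = {
    0x0: "no sense",
    0x1: "recovered error",
    0x2: "not ready",
    0x3: "medium error",
    0x4: "hardware error",
    0x5: "illegal request",
    0x6: "unit attention",
    0xB: "aborted command",
}


def decode_status(status):
    """List the names of all bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_atapi_error(error):
    """Return the sense key name held in the upper nibble of the error byte."""
    return SENSE_KEYS.get(error >> 4, "other")


print(decode_status(0x51), decode_atapi_error(0x30), decode_atapi_error(0x50))
```

With these tables, status=0x51 gives DriveReady, SeekComplete and Error, error=0x30 gives sense key 3 (medium error), and the error=0x50 lines in the same log would correspond to sense key 5 (illegal request).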
>If we read the block in DMA mode (with dd), the machine stalls with the
>"impossible state" message.
>
>A PCI bus scan reveals that the IO register (dma_base+2) contains indeed
>0xa5 (bit 0 set), which leads to the panic. Normally the read on that
>register returns 0xa0.
The interesting bits of the DMA status register are bits 0 through 2. A
value of 5 indicates the condition "interrupt from unit, DMA state
machine active". This is a valid status! It basically means the unit
issued an interrupt before the PRD table was exhausted. This makes sense
because the CD-ROM unit fails to transfer the amount of data described
by the PRD table because of the non-recoverable read error.
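The two register values from the report can be decoded bit by bit. This is only a sketch, assuming the standard bus-master IDE status layout (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 per-drive DMA capable, bit 7 simplex); the names are illustrative.

```python
# Bus-master IDE status register layout (assumed standard SFF-8038i bits).
BM_STATUS_BITS = {
    0x01: "DMA active",
    0x02: "DMA error",
    0x04: "interrupt",
    0x20: "drive 0 DMA capable",
    0x40: "drive 1 DMA capable",
    0x80: "simplex",
}


def decode_bm_status(value):
    """List the names of all bits set in a bus-master status byte."""
    return [name for bit, name in BM_STATUS_BITS.items() if value & bit]


for value in (0xA0, 0xA5):
    print(hex(value), decode_bm_status(value))
```

Under that layout 0xa0 is just "simplex, drive 0 DMA capable", and 0xa5 adds "interrupt" and "DMA active", i.e. the low three bits hold the value 5.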
>We see in our PCI bus scan that a successful DMA of 4096 bytes was
>carried out ~23ms before the stall condition. Another 4096 byte request
>was scheduled but never seen. Between the successful DMA and the stall
>condition we see nothing but a few timer interrupts.
>Then an IDE interrupt occurs, which leads immediately to the panic.
What you describe makes sense (the next DMA transfer is scheduled but
never carried out by the CD-ROM unit) except for the panic, of course.
The correct driver action in this case would be to stop the DMA engine
and issue a reset of the state machines involved (both on the host and
the unit side).
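As a sketch of the kind of decision described here, the following maps a bus-master status byte to a driver action. It is a hypothetical policy for illustration, not the logic of the 2.4 driver; the bit positions assume the standard bus-master IDE layout and the action strings are invented.

```python
# Bus-master status bits (assumed standard layout).
BM_ACTIVE = 0x01
BM_ERROR = 0x02
BM_INTERRUPT = 0x04


def classify_dma_completion(bm_status):
    """Hypothetical policy: map a bus-master status byte to a driver action."""
    if bm_status & BM_ERROR:
        # The host side itself reported a transfer problem.
        return "host error: stop DMA and fail the request"
    if not (bm_status & BM_INTERRUPT):
        # The unit has not signalled completion yet.
        return "no interrupt from unit: keep waiting"
    if bm_status & BM_ACTIVE:
        # Unit interrupted before the PRD table was exhausted.
        return "short transfer: stop DMA, reset host and unit state machines"
    return "normal completion"


print(classify_dma_completion(0xA5))
```

In this sketch the 0xa5 value from the report lands in the "short transfer" branch, which calls for recovery instead of a stall.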
>Any ideas/comments?
I hope this clears up things a little ...
Ciao,
Dani
* Re: Serverworks OSB4 in impossible state
2002-06-10 16:41 ` Daniela Engert
@ 2002-06-11 7:22 ` Martin Wilck
2002-06-11 7:45 ` Daniela Engert
0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-11 7:22 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:
> The interesting bits of the DMA status register are bits 0 through 2. A
> value of 5 indicates the condition "interrupt from unit, DMA state
> machine active". This is a valid status! It basically means the unit
> issued an interrupt before the PRD table was exhausted. This makes sense
> because the CD-ROM unit fails to transfer the amount of data described
> by the PRD table because of the non-recoverable read error.
Shouldn't the error bit be set too? (But that wouldn't make any
difference with the current driver ...)
> What you describe makes sense (the next DMA transfer is scheduled but
> never carried out by the CD-ROM unit) except for the panic, of course.
> The correct driver action in this case would be to stop the DMA engine
> and issue a reset of the state machines involved (both on the host and
> the unit side).
The message, the comments in the code, and what Alan wrote here:
http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
suggest that trying to recover from this condition is extremely
dangerous (note that the kernel doesn't even panic(), because
a sync() may kill a disk, the comments say).
Anyway, thanks a lot for your insightful comments.
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:22 ` Martin Wilck
@ 2002-06-11 7:45 ` Daniela Engert
2002-06-11 8:37 ` Martin Wilck
2002-06-11 11:25 ` Martin Wilck
0 siblings, 2 replies; 18+ messages in thread
From: Daniela Engert @ 2002-06-11 7:45 UTC (permalink / raw)
To: Martin Wilck; +Cc: Linux Kernel mailing list
On 11 Jun 2002 09:22:24 +0200, Martin Wilck wrote:
>On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:
>> The interesting bits of the DMA status register are bits 0 through 2. A
>> value of 5 indicates the condition "interrupt from unit, DMA state
>> machine active". This is a valid status! It basically means the unit
>> issued an interrupt before the PRD table was exhausted. This makes sense
>> because the CD-ROM unit fails to transfer the amount of data described
>> by the PRD table because of the non-recoverable read error.
>
>Shouldn't the error bit be set too? (But that wouldn't make any
>difference with the current driver ...)
No, it shouldn't. The error happens on the unit side of the bus, not on
the host side. Thus it is correct that only the CD-ROM unit reports an
error and the host does *not*.
>> What you describe makes sense (the next DMA transfer is scheduled but
>> never carried out by the CD-ROM unit) except for the panic, of course.
>> The correct driver action in this case would be to stop the DMA engine
>> and issue a reset of the state machines involved (both on the host and
>> the unit side).
>
>The message, the comments in the code, and what Alan wrote here:
>http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
>suggest that trying to recover from this condition is extremely
>dangerous (note that the kernel doesn't even panic(), because
>a sync() may kill a disk, the comments say).
I'm aware of all of that. By pure chance I have a machine with an OSB4
sitting on my desk for a couple of days. Maybe I can find a defective
CD-ROM to test it with my driver and see if it manages to recover from
errors like these. Hopefully, the PCI tracer gives some more insight.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:45 ` Daniela Engert
@ 2002-06-11 8:37 ` Martin Wilck
2002-06-11 11:25 ` Martin Wilck
1 sibling, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-11 8:37 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Tue, 2002-06-11 at 09:45, Daniela Engert wrote:
> I'm aware of all of that. By pure chance I have a machine with an OSB4
> sitting on my desk for a couple of days. Maybe I can find a defective
> CD-ROM to test it with my driver and see if it manages to recover from
> errors like these. Hopefully, the PCI tracer gives some more insight.
Do you have a custom version of the driver (because you write "my
driver")? If yes, can you send it, so that I can test it, too?
Can you point me to any reference material on the web?
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:45 ` Daniela Engert
2002-06-11 8:37 ` Martin Wilck
@ 2002-06-11 11:25 ` Martin Wilck
2002-06-11 21:27 ` Chris Wedgwood
2002-06-13 11:50 ` Daniela Engert
1 sibling, 2 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-11 11:25 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list, Alan Cox
[Alan, I am cc'ing you on this because I read elsewhere that you want
osb4-bug@ide.cabal.tm to be forwarded to you, and that address still
bounces].
I have tried the following:
- comment out the code that stalls the machine when the condition in
question is encountered.
- run dd over a couple of good blocks on the CD.
- run dd over the corrupted blocks. This now leads to errors very
similar to those in the PIO case.
- reenable DMA with hdparm, because it is automatically disabled by the
ide-cd driver if an error occurs (why is that? the error has nothing to
do with DMA here).
- repeat the first dd command on the good blocks and compare the
results.
The results are identical, so I cannot confirm the "4 byte shift" Alan
has been talking about. Of course this is a CD-ROM only scenario, so
I can't say anything about hard disks.
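A comparison of two dd dumps along these lines could be sketched as follows. It assumes the suspected corruption would show up as the same data displaced by a few bytes in either direction; neither the direction nor the exact nature of the shift is specified in this thread, and the function name is made up.

```python
def find_shift(reference, candidate, max_shift=16):
    """Compare two equal-length dumps.

    Returns 0 if they are identical, +n if the candidate holds the
    reference data n bytes later, -n if n bytes earlier, and None if
    no shift up to max_shift explains the difference.
    """
    if reference == candidate:
        return 0
    for n in range(1, max_shift + 1):
        if candidate[n:] == reference[:-n]:
            return n
        if candidate[:-n] == reference[n:]:
            return -n
    return None


good = bytes(range(64))
shifted = b"\x00" * 4 + good[:-4]  # synthetic example of a 4-byte displacement
print(find_shift(good, good), find_shift(good, shifted))
```

The inputs would be the two files written by the dd runs over the good blocks, read in binary mode; the synthetic data here only demonstrates the check.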
Is it possible that the 4-byte shift occurs only with some particular
(older?) version of the chipset?
In any case, the condition that usually causes Linux to stall is
indeed a perfectly valid condition for DMA when the device transfers
less data than it's supposed to. I doubt that hanging the system
without more detailed checks is the right measure to take there.
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 11:25 ` Martin Wilck
@ 2002-06-11 21:27 ` Chris Wedgwood
2002-06-12 7:24 ` Martin Wilck
2002-06-13 11:50 ` Daniela Engert
1 sibling, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2002-06-11 21:27 UTC (permalink / raw)
To: Martin Wilck; +Cc: Daniela Engert, Linux Kernel mailing list, Alan Cox
On Tue, Jun 11, 2002 at 01:25:25PM +0200, Martin Wilck wrote:
> Is it possible that the 4-byte shift occurs only with some
> particular (older?) version of the chipset?
Maybe.
I have an oldish OSB4 here and, beating on it with only the CDROM
(disks are all SCSI), I never seem to see this problem:
00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Flags: bus master, medium devsel, latency 48
00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Flags: bus master, medium devsel, latency 48
I think what is really required is input from ServerWorks/Broadcom
about this.
--cw
* Re: Serverworks OSB4 in impossible state
2002-06-11 21:27 ` Chris Wedgwood
@ 2002-06-12 7:24 ` Martin Wilck
0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-12 7:24 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linux Kernel mailing list
On Tue, 2002-06-11 at 23:27, Chris Wedgwood wrote:
> I have an oldish OSB4 here and beating on it only with the CDROM
> (disks are all SCSI) I don't ever seem to see this problem:
Was that in UDMA33 mode? You also need a broken CD to trigger it (we
happen to have a CD burner that produces broken CDs).
> I think what is really required is input from ServerWorks/Broadcom
> about this.
Yeah, we are in contact with them.
Thanks,
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-11 11:25 ` Martin Wilck
2002-06-11 21:27 ` Chris Wedgwood
@ 2002-06-13 11:50 ` Daniela Engert
2002-06-13 11:59 ` Martin Wilck
2002-06-13 23:48 ` Re[2]: " Nerijus Baliunas
1 sibling, 2 replies; 18+ messages in thread
From: Daniela Engert @ 2002-06-13 11:50 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list
Hi,
as promised I've conducted a test similar to Martin's to check the
behaviour of a Serverworks ROSB4 IDE controller in case of an aborted
ATAPI DMA transfer (probably due to a media error). In fact, I've done
this by comparing it with a known-to-be-good system, a dual processor
Intel BX based board with a PIIX4 IDE controller chip.
The following trace shows how it should be:
- lines 158-172: setup DMA transfer, send command packet
- lines 173-174: the DMA engine loads the first (of multiple)
PRD entry
- the actual DMA induced memory writes are not shown here
- line 175: IRQ14 is acknowledged
- lines 176-181: gather unit and DMA status
- lines 182-210: issue "request sense" and get sense status
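To make the status reads in the traces easier to follow, here is a minimal sketch that decodes an ATA status byte (the values read from port 1F7 or 3F6) into the flag names the kernel log uses earlier in this thread. The function name is hypothetical, and only the five flags that actually occur in these traces are decoded.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ATA status register bits. */
#define ATA_STAT_ERR  0x01 /* Error */
#define ATA_STAT_DRQ  0x08 /* DataRequest */
#define ATA_STAT_DSC  0x10 /* SeekComplete */
#define ATA_STAT_DRDY 0x40 /* DriveReady */
#define ATA_STAT_BSY  0x80 /* Busy */

/* Format the set flags as "{ Flag Flag }" into buf and return buf. */
static char *ata_status_str(uint8_t st, char *buf, size_t len)
{
    snprintf(buf, len, "{%s%s%s%s%s }",
             (st & ATA_STAT_BSY)  ? " Busy" : "",
             (st & ATA_STAT_DRDY) ? " DriveReady" : "",
             (st & ATA_STAT_DSC)  ? " SeekComplete" : "",
             (st & ATA_STAT_DRQ)  ? " DataRequest" : "",
             (st & ATA_STAT_ERR)  ? " Error" : "");
    return buf;
}
```

Under this decoding, 0x58 means the drive is ready to accept or deliver data, 0x51 means the command ended with an error, and 0x50 means it completed without one.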
CD-ROM read error on Intel PIIX4:
______Time_______Burst_BE#__Wait___Command_Address__Data____
158 20.089ms . 1011 . I/OWri 000001F6 ..B0....
159 1.656us . 1011 . I/ORd 000003F6 ..50....
160 10.08us . 0000 . I/OWri 0000F004 00EAC800
161 812.7ns . 1110 . I/OWri 0000F000 ......08
162 752.5ns . 1011 . I/OWri 0000F002 ..46....
163 3.100us . 1110 . I/OWri 000001F4 ......FF
164 4.214us . 1101 . I/OWri 000001F5 ....FF..
165 4.244us . 1101 . I/OWri 000001F1 ....01..
166 4.214us . 0111 . I/OWri 000001F7 A0......
167 5.027us . 1011 . I/ORd 000003F6 ..58....
168 5.448us . 1011 . I/OWri 0000F002 ..46....
169 752.5ns . 1110 . I/OWri 0000F000 ......09
170 1.204us . 0000 . I/OWri 000001F0 00000028
171 903.0ns . 0000 . I/OWri 000001F0 0000440D
172 903.0ns . 0000 . I/OWri 000001F0 0000001F
173 3.673s Start 0000 . MemRd 00EAC800 006E3000
174 30.1ns B 0000 . MemRd 00EAC800 0000D000
175 648.99ms . 1110 . IntAck ........ ......76
176 5.779us . 0111 . I/ORd 000001F7 51......
177 11.47us . 0111 . I/ORd 000001F7 51......
178 1.324us . 1011 . I/ORd 000001F2 ..03....
179 1.957us . 1110 . I/OWri 0000F000 ......08
180 812.7ns . 1011 . I/ORd 0000F002 ..44....
181 1.355us . 1101 . I/ORd 000001F1 ....30..
182 9.361us . 1011 . I/OWri 000001F6 ..B0....
183 1.806us . 1011 . I/ORd 000003F6 ..51....
184 9.301us . 1110 . I/OWri 000001F4 ......12
185 4.274us . 1101 . I/OWri 000001F5 ....00..
186 4.244us . 1101 . I/OWri 000001F1 ....00..
187 4.214us . 0111 . I/OWri 000001F7 A0......
188 4.906us . 1011 . I/ORd 000003F6 ..58....
189 6.020us . 0000 . I/OWri 000001F0 00000003
190 903.0ns . 0000 . I/OWri 000001F0 00000012
191 903.0ns . 0000 . I/OWri 000001F0 00000000
192 258.17us . 1110 . IntAck ........ ......76
193 3.431us . 0111 . I/ORd 000001F7 58......
194 10.08us . 0111 . I/ORd 000001F7 58......
195 1.204us . 1011 . I/ORd 000001F2 ..02....
196 1.535us . 1101 . I/ORd 000001F5 ....00..
197 1.174us . 1110 . I/ORd 000001F4 ......12
198 10.20us . 1100 . I/ORd 000001F0 ....0070
199 1.475us . 1100 . I/ORd 000001F0 ....0003
200 632.1ns . 1100 . I/ORd 000001F0 ....0000
201 632.1ns . 1100 . I/ORd 000001F0 ....0A00
202 602.0ns . 1100 . I/ORd 000001F0 ....0000
203 632.1ns . 1100 . I/ORd 000001F0 ....0000
204 602.0ns . 1100 . I/ORd 000001F0 ....0611
205 632.1ns . 1100 . I/ORd 000001F0 ....0000
206 602.0ns . 1100 . I/ORd 000001F0 ....0000
207 12.79us . 1110 . IntAck ........ ......76
208 3.401us . 0111 . I/ORd 000001F7 50......
209 9.361us . 0111 . I/ORd 000001F7 50......
210 1.234us . 1011 . I/ORd 000001F2 ..03....
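The nine 16-bit reads from port 1F0 in lines 198-206 are the 18 bytes of "request sense" data. A minimal sketch of how they could be decoded follows, assuming the fixed sense data format and the usual low-byte-first order of the ATA data register. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields of interest in fixed-format sense data. */
struct sense_summary {
    uint8_t response_code; /* byte 0, low 7 bits */
    uint8_t sense_key;     /* byte 2, low 4 bits */
    uint8_t asc;           /* byte 12: additional sense code */
    uint8_t ascq;          /* byte 13: additional sense code qualifier */
};

/* Unpack 16-bit data-register reads (low byte first) and pick out the fields.
 * Returns 0 on success, -1 if the word count is out of range. */
static int sense_from_words(const uint16_t *words, size_t nwords,
                            struct sense_summary *out)
{
    uint8_t bytes[18];
    size_t i;

    if (nwords < 7 || nwords > 9)
        return -1;
    for (i = 0; i < nwords; i++) {
        bytes[2 * i] = (uint8_t)(words[i] & 0xff);
        bytes[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
    out->response_code = bytes[0] & 0x7f;
    out->sense_key = bytes[2] & 0x0f;
    out->asc = bytes[12];
    out->ascq = bytes[13];
    return 0;
}
```

Fed with the words from the trace (0070, 0003, 0000, 0A00, 0000, 0000, 0611, 0000, 0000), this yields sense key 3 with ASC/ASCQ 11h/06h. As far as I know, that is a medium error with an unrecovered CIRC read error, which fits the assumption of a media error.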
And here is the same with the ROSB4. This time, some of the
DMA writes are shown. After loading the second PRD entry
which describes a memory region of 7800h bytes, 3000h bytes
are transferred before IRQ14 is asserted. The IRQ14 INTACK
cycle is the last transaction on the PCI bus ever, the
machine is completely frozen!
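For readers unfamiliar with the PRD table, here is a minimal sketch of how the two entries fetched in the trace below (the MemRd cycles at 00EF2800 and 00EF2808) could be decoded. It assumes the standard bus master IDE layout: a 32-bit base address, a byte count in the low 16 bits of the second dword (0 meaning 64 KiB), and an end-of-table flag in bit 31. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One physical region descriptor as the DMA engine reads it from memory. */
struct prd_entry {
    uint32_t base;        /* physical address of the memory region */
    uint32_t flags_count; /* bit 31: end of table, bits 15:0: byte count */
};

/* Byte count of one entry; a stored count of 0 stands for 64 KiB. */
static uint32_t prd_byte_count(const struct prd_entry *e)
{
    uint32_t n = e->flags_count & 0xffffu;
    return n ? n : 0x10000u;
}

/* Nonzero if this entry is flagged as the last one in the table. */
static int prd_is_last(const struct prd_entry *e)
{
    return (e->flags_count & 0x80000000u) != 0;
}

/* Sum the byte counts up to and including the end-of-table entry,
 * looking at no more than max entries. */
static uint32_t prd_table_total(const struct prd_entry *table, size_t max)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < max; i++) {
        total += prd_byte_count(&table[i]);
        if (prd_is_last(&table[i]))
            break;
    }
    return total;
}
```

For the entries in the trace, (00B08000, 00008000) and (00B10000, 80007800), this gives 8000h + 7800h = F800h bytes in total, with the second entry marked as the last.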
CD-ROM read error on ServerWorks ROSB4 revision 0:
______Time_______Burst_BE#__Wait___Command_Address__Data____
51316 297.63us . 1011 . I/OWri 000001F6 ..B0....
51317 1.530us . 1011 . I/ORd 000003F6 ..50....
51318 6.300us . 0000 . I/OWri 00005404 00EF2800
51319 450ns . 1110 . I/OWri 00005400 ......08
51320 450ns . 1011 . I/OWri 00005402 ..66....
51321 1.440us . 1110 . I/OWri 000001F4 ......FF
51322 3.480us . 1101 . I/OWri 000001F5 ....FF..
51323 3.480us . 1101 . I/OWri 000001F1 ....01..
51324 3.510us . 0111 . I/OWri 000001F7 A0......
51325 4.470us . 1011 . I/ORd 000003F6 ..58....
51326 4.620us . 1011 . I/OWri 00005402 ..66....
51327 660ns . 0000 . I/OWri 000001F0 00000028
51328 420ns . 0000 . I/OWri 000001F0 0000F80D
51329 420ns . 0000 . I/OWri 000001F0 0000001F
51330 1.290us . 1011 . I/ORd 000003F6 ..D0....
51331 3.660us . 1110 . I/OWri 00005400 ......09
51332 1.290us . 0000 . MemRd 00EF2800 00B08000
51333 630ns . 0000 . MemRd 00EF2804 00008000
51334 166.11us Start 0000 . MemWri 00B08000 7BC0728C
51335 30ns B 0000 . MemWri 00B08000 285DA7D0
51336 30ns B 0000 . MemWri 00B08000 9FAE557A
51337 30ns B 0000 . MemWri 00B08000 B3F88165
51338 30ns B 0000 . MemWri 00B08000 BDFD7823
51339 30ns B 0000 . MemWri 00B08000 42ED22D0
51340 30ns B 0000 . MemWri 00B08000 7BA5743F
51341 30ns B 0000 . MemWri 00B08000 6B5897BA
51342 780ns Start 0000 . MemWri 00B08020 ACF1D36B
..
..
59518 930ns Start 0000 . MemWri 00B0FFE0 845971B8
59519 30ns B 0000 . MemWri 00B0FFE0 7E325F95
59520 30ns B 0000 . MemWri 00B0FFE0 7ADA36D0
59521 30ns B 0000 . MemWri 00B0FFE0 96BD435C
59522 30ns B 0000 . MemWri 00B0FFE0 4ED88CB0
59523 30ns B 0000 . MemWri 00B0FFE0 2E1CCAF7
59524 30ns B 0000 . MemWri 00B0FFE0 FC8782B3
59525 30ns B 0000 . MemWri 00B0FFE0 9C0A2335
59526 780ns . 0000 . MemRd 00EF2808 00B10000
59527 630ns . 0000 . MemRd 00EF280C 80007800
59528 1.2518ms Start 0000 . MemWri 00B10000 E85C33CD
59529 30ns B 0000 . MemWri 00B10000 AD2F9613
59530 30ns B 0000 . MemWri 00B10000 D8BEC924
59531 30ns B 0000 . MemWri 00B10000 E273C0BD
59532 30ns B 0000 . MemWri 00B10000 DC655F5E
59533 30ns B 0000 . MemWri 00B10000 69B3087B
59534 30ns B 0000 . MemWri 00B10000 369B26D1
59535 30ns B 0000 . MemWri 00B10000 9A8C47DF
59536 780ns Start 0000 . MemWri 00B10020 3F026EA5
..
..
62592 750ns Start 0000 . MemWri 00B12FE0 367016E1
62593 30ns B 0000 . MemWri 00B12FE0 35654905
62594 30ns B 0000 . MemWri 00B12FE0 9968FF02
62595 30ns B 0000 . MemWri 00B12FE0 9ABB5CAE
62596 30ns B 0000 . MemWri 00B12FE0 D32DF135
62597 30ns B 0000 . MemWri 00B12FE0 7A03326A
62598 30ns B 0000 . MemWri 00B12FE0 86CCE8BF
62599 30ns B 0000 . MemWri 00B12FE0 D4E66D21
62600 1.176s . 1110 . IntAck ........ ......76
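As a cross-check of the numbers in this trace, the command packet written to port 1F0 in lines 51327-51329 can be decoded. The sketch below assumes the packet is a READ(10) command (opcode 28h) sent as three 32-bit writes in little-endian byte order, and that the disc uses 2048-byte data blocks. The names are hypothetical.

```c
#include <stdint.h>

#define READ10_OPCODE 0x28
#define CD_BLOCK_SIZE 2048u /* assumption: 2048-byte data blocks */

/* Hypothetical summary of a READ(10) command packet. */
struct read10 {
    uint32_t lba;    /* first logical block */
    uint16_t blocks; /* number of blocks requested */
};

/* Rebuild the 12 packet bytes from three 32-bit port writes and decode them.
 * Returns 0 on success, -1 if the opcode is not READ(10). */
static int read10_from_dwords(const uint32_t dw[3], struct read10 *out)
{
    uint8_t cdb[12];
    int i;

    for (i = 0; i < 3; i++) {
        cdb[4 * i + 0] = (uint8_t)(dw[i] & 0xff);
        cdb[4 * i + 1] = (uint8_t)((dw[i] >> 8) & 0xff);
        cdb[4 * i + 2] = (uint8_t)((dw[i] >> 16) & 0xff);
        cdb[4 * i + 3] = (uint8_t)((dw[i] >> 24) & 0xff);
    }
    if (cdb[0] != READ10_OPCODE)
        return -1;
    /* LBA and transfer length are big-endian inside the packet. */
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8) | cdb[5];
    out->blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
    return 0;
}

/* Number of bytes the command asks for under the block-size assumption. */
static uint32_t read10_bytes(const struct read10 *r)
{
    return (uint32_t)r->blocks * CD_BLOCK_SIZE;
}
```

Under these assumptions the packet requests 31 blocks starting at block 0DF8h, that is F800h bytes, which matches the PRD table. The trace shows 8000h + 3000h = B000h bytes (22 blocks) written to memory before the interrupt, so the transfer stopped 9 blocks short.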
My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
controller.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 11:50 ` Daniela Engert
@ 2002-06-13 11:59 ` Martin Wilck
2002-06-13 12:04 ` Daniela Engert
2002-06-13 23:48 ` Re[2]: " Nerijus Baliunas
1 sibling, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-13 11:59 UTC (permalink / raw)
To: Daniela Engert; +Cc: Alan Cox, Linux Kernel mailing list
On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:
> And here is the same with the ROSB4. This time, some of the
> DMA writes are shown. After loading the second PRD entry
> which describes a memory region of 7800h bytes, 3000h bytes
> are transferred before IRQ14 is asserted. The IRQ14 INTACK
> cycle is the last transaction on the PCI bus ever, the
> machine is completely frozen!
You say (dma_base+2) is never read?
Was that a Linux system? If yes, I assume you never saw "OSB4 in
impossible state ..." ?
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 11:59 ` Martin Wilck
@ 2002-06-13 12:04 ` Daniela Engert
2002-06-13 18:27 ` rico-linux-kernel
0 siblings, 1 reply; 18+ messages in thread
From: Daniela Engert @ 2002-06-13 12:04 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list
On 13 Jun 2002 13:59:06 +0200, Martin Wilck wrote:
>On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:
>> are transferred before IRQ14 is asserted. The IRQ14 INTACK
>> cycle is the last transaction on the PCI bus ever, the
>> machine is completely frozen!
>
>You say (dma_base+2) is never read?
Exactly. I checked this twice; the PCI tracer was configured to gather
*all* PCI bus events.
>Was that a Linux system?
No, I think this doesn't matter here at all, because the hardware
stalls completely - full stop.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 12:04 ` Daniela Engert
@ 2002-06-13 18:27 ` rico-linux-kernel
0 siblings, 0 replies; 18+ messages in thread
From: rico-linux-kernel @ 2002-06-13 18:27 UTC (permalink / raw)
To: dani; +Cc: linux-kernel
Thanks for investing time on the logic analyser, Dani. My experience
is slightly different.
I have several mainboards (Tyan S1867) with older chipsets from
ServerWorks (f.k.a. Reliance). The IDE controller (OSB4 rev 0) is used
daily with ATAPI CDRW drives in UDMA(33) mode. The system handles
read/write errors without problems.
The system will lock solid when both IDE channels are accessed,
and either one is using DMA. Since I want DMA, I simply abandon the
secondary channel.
I have spare machines available for quack medical experiments.
Select boot-time info...
Linux version 2.4.17 (rico@pc2) (gcc version 2.95.3 20010315 (release)) #1 SMP Mon Dec 31 11:51:33 CST 2001
ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
ServerWorks OSB4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfcb0-0xfcb7, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xfcb8-0xfcbf, BIOS settings: hdc:pio, hdd:pio
hda: PLEXTOR CD-R PX-W2410A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 40X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re[2]: Serverworks OSB4 in impossible state
2002-06-13 11:50 ` Daniela Engert
2002-06-13 11:59 ` Martin Wilck
@ 2002-06-13 23:48 ` Nerijus Baliunas
1 sibling, 0 replies; 18+ messages in thread
From: Nerijus Baliunas @ 2002-06-13 23:48 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Thu, 13 Jun 2002 13:50:25 +0200 (CDT) Daniela Engert <dani@ngrt.de> wrote:
> My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
> controller.
How can I find revision? I have a problem with (Seagate) hdds, but lspci -v
only shows:
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 51)
Subsystem: ServerWorks OSB4 South Bridge
Flags: bus master, medium devsel, latency 0
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
Flags: bus master, medium devsel, latency 64
I/O ports at 2000 [size=16]
Regards,
Nerijus
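On the revision question: the revision ID is the byte at offset 08h of the PCI configuration space, and as far as I can tell lspci prints "(rev NN)" only when that byte is nonzero, so the IDE function above would be revision 0. The kernel also prints it at boot ("ServerWorks OSB4: chipset revision 0" in the log quoted elsewhere in this thread). A minimal sketch for reading it from a raw configuration-space dump (for example the bytes shown by lspci -x) follows. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets in the standard PCI configuration header. */
#define PCI_VENDOR_ID_OFF   0x00
#define PCI_DEVICE_ID_OFF   0x02
#define PCI_REVISION_ID_OFF 0x08

/* Hypothetical summary of the identification fields. */
struct pci_ident {
    uint16_t vendor;
    uint16_t device;
    uint8_t revision;
};

/* Extract vendor, device and revision from a config-space dump.
 * Returns 0 on success, -1 if the dump is too short. */
static int pci_ident_from_config(const uint8_t *cfg, size_t len,
                                 struct pci_ident *out)
{
    if (len <= PCI_REVISION_ID_OFF)
        return -1;
    /* 16-bit fields are stored little-endian. */
    out->vendor = (uint16_t)(cfg[PCI_VENDOR_ID_OFF] |
                             (cfg[PCI_VENDOR_ID_OFF + 1] << 8));
    out->device = (uint16_t)(cfg[PCI_DEVICE_ID_OFF] |
                             (cfg[PCI_DEVICE_ID_OFF + 1] << 8));
    out->revision = cfg[PCI_REVISION_ID_OFF];
    return 0;
}
```

To my knowledge an OSB4 IDE function identifies itself as vendor 1166h, device 0211h; treat those two values as an assumption and compare them against your own dump.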
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
2002-06-10 16:41 ` Daniela Engert
@ 2002-06-12 8:58 ` Alan Cox
2002-06-12 8:47 ` Martin Wilck
1 sibling, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-06-12 8:58 UTC (permalink / raw)
To: Martin Wilck; +Cc: osb4-bug, Linux Kernel mailing list, Martin Wilck
Triggering the check on csb5/csb6 would be a bug - maybe an extra
test is needed there as CSB5/6 are fine
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-12 8:58 ` Alan Cox
@ 2002-06-12 8:47 ` Martin Wilck
2002-06-12 9:14 ` Alan Cox
0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-12 8:47 UTC (permalink / raw)
To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list
On Wed, 2002-06-12 at 10:58, Alan Cox wrote:
> Triggering the check on csb5/csb6 would be a bug - maybe an extra
> test is needed there as CSB5/6 are fine
Currently the stall is triggered if the DMA engine active bit is set, no
further conditions.
Would you concur that it would be reasonable to trigger only if
- the chipset version is < CSB5,
- the drive is a hard disk,
- and the drive did not report an error?
(I am not certain about the last condition, but from the descriptions
of the 4-byte-shift problem I have seen I infer that there was no drive
error condition involved).
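The three conditions can be sketched as a standalone predicate. This is only a model of the proposal, not driver code: the enums and the function name are hypothetical, and "the drive did not report an error" is modelled as "ready and seek-complete set, with busy, data-request and error all clear".

```c
#include <stdbool.h>
#include <stdint.h>

#define BM_STAT_ACTIVE 0x01 /* bus master status: DMA engine still active */

/* ATA status register bits. */
#define ATA_STAT_ERR  0x01
#define ATA_STAT_DRQ  0x08
#define ATA_STAT_DSC  0x10
#define ATA_STAT_DRDY 0x40
#define ATA_STAT_BSY  0x80

#define ATA_DRIVE_READY (ATA_STAT_DRDY | ATA_STAT_DSC)
#define ATA_BAD_STAT    (ATA_STAT_BSY | ATA_STAT_DRQ | ATA_STAT_ERR)

/* Hypothetical identifiers, ordered so that "< CHIPSET_CSB5" means OSB4. */
enum chipset_gen { CHIPSET_OSB4, CHIPSET_CSB5, CHIPSET_CSB6 };
enum media_type { MEDIA_DISK, MEDIA_CDROM };

/* True only when all proposed conditions for the 4-byte-shift check hold. */
static bool suspect_4byte_shift(enum chipset_gen chip, enum media_type media,
                                uint8_t bm_status, uint8_t ata_status)
{
    bool no_drive_error =
        (ata_status & (ATA_DRIVE_READY | ATA_BAD_STAT)) == ATA_DRIVE_READY;

    return chip < CHIPSET_CSB5 &&
           media == MEDIA_DISK &&
           (bm_status & BM_STAT_ACTIVE) != 0 &&
           no_drive_error;
}
```

With these rules, the CD-ROM case from the start of the thread (bus master status 0xa5, drive status 0x51) no longer triggers the stall.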
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-12 8:47 ` Martin Wilck
@ 2002-06-12 9:14 ` Alan Cox
2002-06-12 10:30 ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-06-12 9:14 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list
> Would you concur that it would be reasonable to trigger only if
>
> - the chipset version is < CSB5,
> - the drive is a hard disk,
> - and the drive did not report an error?
>
> (I am not certain about the last condition, but from the descriptions
> of the 4-byte-shift problem I have seen I infer that there was no drive
> error condition involved).
Entirely agreed
^ permalink raw reply [flat|nested] 18+ messages in thread
* OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
2002-06-12 9:14 ` Alan Cox
@ 2002-06-12 10:30 ` Martin Wilck
2002-06-12 20:35 ` Christian Zoffoli
0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-12 10:30 UTC (permalink / raw)
To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list
On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> Entirely agreed
I propose this patch to remedy the problem.
I don't know how to test if the drive is a Seagate drive, and
I think we don't want to do that, because it would end up in yet another
blacklist.
I cannot test if this behaves correctly on machines that do expose the
4-byte shift bug - it would be great if somebody could test that.
Martin
--- drivers/ide/serverworks.c.orig Tue Jun 11 11:24:59 2002
+++ drivers/ide/serverworks.c Wed Jun 12 12:00:36 2002
@@ -547,7 +547,13 @@
ide_hwif_t *hwif = HWIF(drive);
unsigned long dma_base = hwif->dma_base;
- if(inb(dma_base+0x02)&1)
+ /* If it's a disk on the OSB4, the DMA engine is still on,
+ and the device reports no error status, we are probably
+ facing the "4 byte shift" problem */
+ if(drive->media == ide_disk &&
+ hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE &&
+ inb(dma_base+0x02)&1 &&
+ OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
{
#if 0
int i;
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
2002-06-12 10:30 ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
@ 2002-06-12 20:35 ` Christian Zoffoli
0 siblings, 0 replies; 18+ messages in thread
From: Christian Zoffoli @ 2002-06-12 20:35 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list
Martin Wilck wrote:
> On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> > Entirely agreed
>
> I propose this patch to remedy the problem.
>
> I don't know how to test if the drive is a Seagate drive, and
> I think we don't want to do that, because it would end up in yet another
> blacklist.
>
> I cannot test if this behaves correctly on machines that do expose the
> 4-byte shift bug - it would be great if somebody could test that.
>
> Martin
>
> --- drivers/ide/serverworks.c.orig Tue Jun 11 11:24:59 2002
> +++ drivers/ide/serverworks.c Wed Jun 12 12:00:36 2002
> @@ -547,7 +547,13 @@
> ide_hwif_t *hwif = HWIF(drive);
> unsigned long dma_base = hwif->dma_base;
>
> - if(inb(dma_base+0x02)&1)
> + /* If it's a disk on the OSB4, the DMA engine is still on,
> + and the device reports no error status, we are probably
> + facing the "4 byte shift" problem */
> + if(drive->media == ide_disk &&
> + hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE &&
> + inb(dma_base+0x02)&1 &&
> + OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
> {
> #if 0
> int i;
>
>
It works for me. I have a Supermicro 370DE6 (ServerWorks HE-SL) and a
Maxtor HD (5T030H3).
Christian
^ permalink raw reply [flat|nested] 18+ messages in thread