* Serverworks OSB4 in impossible state
@ 2002-06-10 15:52 Martin Wilck
2002-06-10 16:41 ` Daniela Engert
2002-06-12 8:58 ` Alan Cox
0 siblings, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-10 15:52 UTC (permalink / raw)
To: osb4-bug; +Cc: Linux Kernel mailing list, Martin Wilck
[-- Attachment #1: Type: text/plain, Size: 1574 bytes --]
Hello,
I know a similar problem was discussed here a short while ago.
However we have here a situation where we can reproduce the problem
reliably. This is a RedHat 2.4.18-4 kernel.
We have a CD with a corrupt last block. If we try to read this block in
PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
attachment.
The machine has only a CDROM (Mitsumi FX 4830T) attached to the IDE bus
as /dev/hdc. We used no IDE-related boot parameters.
If we read the block in DMA mode (with dd), the machine stalls with the
"impossible state" message.
A PCI bus scan reveals that the I/O register (dma_base+2) does indeed
contain 0xa5 (bit 0 set), which leads to the panic. Normally a read of
that register returns 0xa0.
We see in our PCI bus scan that a successful DMA of 4096 bytes was
carried out ~23ms before the stall condition. Another 4096 byte request
was scheduled but never seen. Between the successful DMA and the stall
condition we see nothing but a few timer interrupts.
Then an IDE interrupt occurs, which leads immediately to the panic.
The CD-ROM drive certainly reports some sort of error, as in the PIO
case, when trying to access the last block. This seems to be the
(indirect) reason why the bus master bit in (dma_base+2) remains set
long after the DMA is finished.
Any ideas/comments?
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
[-- Attachment #2: Kernel error messages in PIO-mode --]
[-- Type: text/plain, Size: 3573 bytes --]
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:48 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
[-- Attachment #3: dmesg --]
[-- Type: text/plain, Size: 15690 bytes --]
ACPI table found: RSDT v1 [PTLTD RSDT 1540.1]
__va_range(0xbfefc0f9, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefc0f9, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [FSC D1309 1540.1]
__va_range(0xbfefeef8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefeef8, 0x50): idx=8 mapped at ffff6000
ACPI table found: SPCR v1 [PTLTD $UCRTBL$ 1540.1]
__va_range(0xbfefef48, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [PTLTD APIC 1540.1]
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x6] enabled[1])
CPU 0 (0x0600) enabledProcessor #6 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 1 (0x0000) enabledProcessor #0 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0002] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Unknown CPU [15:2] APIC version 16
LAPIC (acpi_id[0x0003] id[0x7] enabled[1])
CPU 3 (0x0700) enabledProcessor #7 Unknown CPU [15:2] APIC version 16
IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x3] address[0xfec10000] global_irq_base[0x10])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1] trigger[0x1])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x3] trigger[0x3])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0xbfefefd8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefefd8, 0x28): idx=8 mapped at ffff6000
ACPI table found: BOOT v1 [PTLTD $SBFTBL$ 1540.1]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: FSCD1309 Product ID: PRIMERGY APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC10000.
Processors: 4
Kernel command line: ro root=/dev/sda2
Initializing CPU#0
Detected 2395.457 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4771.02 BogoMIPS
Memory: 3098776k/3145728k available (1232k kernel code, 46496k reserved, 842k data, 304k init, 2228160k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 65536 (order: 7, 524288 bytes)
Buffer cache hash table entries: 262144 (order: 8, 1048576 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU0: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
per-CPU timeslice cutoff: 1462.93 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 2/1 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#2.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU2: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 3/7 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#3.
CPU: After generic, caps: 3febfbff 00000000 00000000 00000000
CPU: Common caps: 3febfbff 00000000 00000000 00000000
CPU3: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Total of 4 processors activated (19123.40 BogoMIPS).
cpu_sibling_map[0] = 3
cpu_sibling_map[1] = 2
cpu_sibling_map[2] = 1
cpu_sibling_map[3] = 0
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 3-0, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 02000000
....... : arbitration: 02
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 00F 0F 0 0 0 0 0 1 1 39
02 00F 0F 0 0 0 0 0 1 1 31
03 00F 0F 0 0 0 0 0 1 1 41
04 00F 0F 0 0 0 0 0 1 1 49
05 00F 0F 0 0 0 0 0 1 1 51
06 00F 0F 0 0 0 0 0 1 1 59
07 00F 0F 0 0 0 0 0 1 1 61
08 00F 0F 0 0 0 0 0 1 1 69
09 00F 0F 1 1 0 1 0 1 1 71
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 00F 0F 0 0 0 0 0 1 1 79
0d 00F 0F 0 0 0 0 0 1 1 81
0e 00F 0F 0 0 0 0 0 1 1 89
0f 00F 0F 0 0 0 0 0 1 1 91
IO APIC #3......
.... register #00: 03000000
....... : physical APIC id: 03
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 03000000
....... : arbitration: 03
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 000 00 1 0 0 0 0 0 0 00
07 000 00 1 0 0 0 0 0 0 00
08 000 00 1 0 0 0 0 0 0 00
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 00F 0F 1 1 0 1 0 1 1 99
0e 00F 0F 1 1 0 1 0 1 1 A1
0f 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ29 -> 1:13
IRQ30 -> 1:14
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2395.2498 MHz.
..... host bus clock speed is 99.8019 MHz.
cpu: 0, clocks: 998019, slice: 199603
CPU0<T0:998016,T1:798400,D:13,S:199603,C:998019>
cpu: 2, clocks: 998019, slice: 199603
cpu: 3, clocks: 998019, slice: 199603
cpu: 1, clocks: 998019, slice: 199603
CPU1<T0:998016,T1:598800,D:10,S:199603,C:998019>
CPU2<T0:998016,T1:399200,D:7,S:199603,C:998019>
CPU3<T0:998016,T1:199600,D:4,S:199603,C:998019>
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfd9aa, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI->APIC IRQ transform: (B0,I5,P0) -> 30
PCI->APIC IRQ transform: (B0,I15,P0) -> 9
PCI->APIC IRQ transform: (B2,I10,P0) -> 29
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
VFS: Diskquotas version dquot_6.5.0 initialized
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
SvrWks CSB5: simplex device: DMA forced
ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:pio, hdb:pio
SvrWks CSB5: simplex device: DMA forced
ide1: BM-DMA at 0x1808-0x180f, BIOS settings: hdc:pio, hdd:pio
hdc: FX4830T, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 32768 buckets, 256Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 220k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
sym53c8xx: at PCI bus 2, device 10, function 0
sym53c8xx: 53c1010-66 detected with Symbios NVRAM
sym53c1010-66-0: rev 0x1 on pci bus 2 device 10 function 0 irq 29
sym53c1010-66-0: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-66-0: on-chip RAM at 0xfe000000
sym53c1010-66-0: restart (scsi reset).
sym53c1010-66-0: handling phase mismatch from SCRIPTS.
sym53c1010-66-0: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
blk: queue f7fd6e18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SEAGATE Model: ST318451LC Rev: 7500
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7fd6c18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SEAGATE Model: ST318451LC Rev: 7500
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7fd6a18, I/O limit 4095Mb (mask 0xffffffff)
Vendor: SDR Model: GEM318 Rev: 0
Type: Processor ANSI SCSI revision: 02
blk: queue f7308e18, I/O limit 4095Mb (mask 0xffffffff)
sym53c1010-66-0-<0,0>: tagged command queue depth set to 8
sym53c1010-66-0-<1,0>: tagged command queue depth set to 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sym53c1010-66-0-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sda: 35843671 512-byte hdwr sectors (18352 MB)
Partition check:
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sym53c1010-66-0-<1,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sdb: 35843671 512-byte hdwr sectors (18352 MB)
sdb:
Journalled Block Device driver loaded
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 304k freed
Adding Swap: 1702848k swap-space (priority -1)
Adding Swap: 1694816k swap-space (priority -2)
Adding Swap: 1702848k swap-space (priority -3)
Adding Swap: 1702848k swap-space (priority -4)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xf8bc5000, IRQ 9
usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,2), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:05:29:74:92, IRQ 30.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdc: DMA disabled
[-- Attachment #4: /proc/ide/ide1/hdc/settings --]
[-- Type: text/plain, Size: 1199 bytes --]
name                    value           min             max             mode
----                    -----           ---             ---             ----
breada_readahead        4               0               127             rw
current_speed           66              0               69              rw
dsc_overlap             0               0               1               rw
file_readahead          0               0               2097151         rw
ide_scsi                0               0               1               rw
init_speed              66              0               69              rw
io_32bit                0               0               3               rw
keepsettings            0               0               1               rw
max_kb_per_request      64              1               127             rw
nice1                   1               0               1               rw
number                  2               0               3               rw
pio_mode                write-only      0               255             w
slow                    0               0               1               rw
unmaskirq               0               0               1               rw
using_dma               1               0               1               rw
[-- Attachment #5: /proc/ide/svwks --]
[-- Type: text/plain, Size: 785 bytes --]
ServerWorks OSB4/CSB5/CSB6
ServerWorks CSB5 Chipset (rev 93)
------------------------------- General Status ---------------------------------
--------------- Primary Channel ---------------- Secondary Channel -------------
                disabled                         disabled
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled:    no               no              yes               no
UDMA enabled:   no               no              yes               no
UDMA enabled:   0                0               2                 0
DMA enabled:    2                2               2                 2
PIO enabled:    ?                ?               4                 ?
* Re: Serverworks OSB4 in impossible state
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
@ 2002-06-10 16:41 ` Daniela Engert
2002-06-11 7:22 ` Martin Wilck
2002-06-12 8:58 ` Alan Cox
1 sibling, 1 reply; 31+ messages in thread
From: Daniela Engert @ 2002-06-10 16:41 UTC (permalink / raw)
To: Martin Wilck; +Cc: Linux Kernel mailing list
Hello Martin,
On 10 Jun 2002 17:52:58 +0200, Martin Wilck wrote:
>We have a CD with a corrupt last block. If we try to read this block in
>PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
>attachment.
The error code returned is "check condition" with a sense key of 3
"medium error". The most appropriate driver action would have been to
issue a "request sense" command to learn the precise error and retry
only in case of a good chance of a recoverable problem - but that's a
different story.
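
The decoding above can be reproduced from the logged register values. The following is a minimal sketch, assuming the usual ATA status bit layout and the ATAPI convention that the upper four bits of the error register carry the sense key. The function names are made up for illustration, and only the three status names that appear in the log are confirmed by this thread.

```python
# ATA status register bits. DriveReady, SeekComplete and Error are the
# names printed in the log above; the others are assumed from the ATA layout.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# SCSI/ATAPI sense keys (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]


def sense_key(error_register):
    """Extract the sense key from the upper nibble of an ATAPI error byte."""
    return error_register >> 4


print(decode_status(0x51))          # status byte from the log
print(SENSE_KEYS[sense_key(0x30)])  # error byte from cdrom_decode_status
```

Under the same layout, the error=0x50 lines in the log would carry sense key 5.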
>If we read the block in DMA mode (with dd), the machine stalls with the
>"impossible state" message.
>
>A PCI bus scan reveals that the I/O register (dma_base+2) does indeed
>contain 0xa5 (bit 0 set), which leads to the panic. Normally a read of
>that register returns 0xa0.
The interesting bits of the DMA status register are bits 0 through 2. A
value of 5 indicates the condition "interrupt from unit, DMA state
machine active". This is a valid status! It basically means the unit
issued an interrupt before the PRD table was exhausted. This makes sense
because the CD-ROM unit fails to transfer the amount of data described
by the PRD table because of the non-recoverable read error.
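
As an illustration of how 0xa5 and 0xa0 break down, here is a small sketch. It assumes the standard bus-master IDE status layout (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 per-drive DMA capable, bit 7 simplex). The constant and function names are hypothetical, and the wording for the combinations other than "interrupt + active" is a paraphrase, not taken from this thread.

```python
# Bit masks for the bus-master IDE status register at dma_base+2
# (assumed standard layout; names are illustrative only).
BM_ACTIVE = 0x01      # DMA state machine still active
BM_ERROR = 0x02       # host-side transfer error
BM_INTERRUPT = 0x04   # the unit raised an interrupt
BM_DRIVE0_DMA = 0x20  # drive 0 marked DMA capable
BM_DRIVE1_DMA = 0x40  # drive 1 marked DMA capable
BM_SIMPLEX = 0x80     # simplex controller

# Meaning of the (interrupt, active) combinations.
CONDITIONS = {
    (False, False): "no interrupt pending, DMA state machine idle",
    (False, True): "DMA in progress, no interrupt yet",
    (True, False): "interrupt from unit, PRD table exhausted",
    (True, True): "interrupt from unit, DMA state machine active",
}


def describe_bm_status(value):
    """Describe bits 0 through 2 of a bus-master status byte."""
    irq = bool(value & BM_INTERRUPT)
    active = bool(value & BM_ACTIVE)
    text = CONDITIONS[(irq, active)]
    if value & BM_ERROR:
        text += " (host error bit set)"
    return text


for value in (0xA0, 0xA5):
    print(hex(value), hex(value & 0x07), describe_bm_status(value))
```

In both values the upper bits (simplex, drive 0 DMA capable) are identical; only the low three bits differ.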
>We see in our PCI bus scan that a successful DMA of 4096 bytes was
>carried out ~23ms before the stall condition. Another 4096 byte request
>was scheduled but never seen. Between the successful DMA and the stall
>condition we see nothing but a few timer interrupts.
>Then an IDE interrupt occurs, which leads immediately to the panic.
What you see makes sense (the next DMA transfer is scheduled but never
carried out by the CD-ROM unit) except for the panic, of course. The
correct driver action in this case would be to stop the DMA engine and
issue a reset of the state machines involved (both on the host and
the unit side).
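
The host-side half of that sequence can be sketched against a toy register model. This is not driver code: FakeBusMaster and stop_dma are hypothetical names, and the model assumes a controller that follows the generic bus-master IDE convention (clearing the start bit in the command register stops the engine and drops the active bit; the error and interrupt bits are cleared by writing 1 to them). Whether the ServerWorks part really behaves this way is exactly what is in question here, and the reset on the unit side is not modelled.

```python
BM_START = 0x01       # command register (dma_base+0): start/stop bit
BM_ACTIVE = 0x01      # status register (dma_base+2): engine active
BM_CLEAR_BITS = 0x06  # status register: error and interrupt, write 1 to clear


class FakeBusMaster:
    """Toy model of the bus-master command and status registers."""

    def __init__(self, status):
        self.command = BM_START  # a transfer has been started
        self.status = status

    def write_command(self, value):
        self.command = value
        if not value & BM_START:
            # Assumed behaviour: stopping the engine clears "active".
            self.status &= ~BM_ACTIVE

    def write_status(self, value):
        # Only the write-1-to-clear bits are modelled.
        self.status &= ~(value & BM_CLEAR_BITS)


def stop_dma(bm):
    """Stop the engine, latch the status, then clear interrupt and error."""
    bm.write_command(bm.command & ~BM_START)
    latched = bm.status
    bm.write_status(latched | BM_CLEAR_BITS)
    return latched


bm = FakeBusMaster(0xA5)
latched = stop_dma(bm)
print(hex(latched), hex(bm.status))
```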
>Any ideas/comments?
I hope this clears up things a little ...
Ciao,
Dani
* Re: Serverworks OSB4 in impossible state
2002-06-10 16:41 ` Daniela Engert
@ 2002-06-11 7:22 ` Martin Wilck
2002-06-11 7:45 ` Daniela Engert
0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-11 7:22 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:
> The interesting bits of the DMA status register are bits 0 through 2. A
> value of 5 indicates the condition "interrupt from unit, DMA state
> machine active". This is a valid status! It basically means the unit
> issued an interrupt before the PRD table was exhausted. This makes sense
> because the CD-ROM unit fails to transfer the amount of data described
> by the PRD table because of the non-recoverable read error.
Shouldn't the error bit be set too? (But that wouldn't make any
difference with the current driver ...)
> What you see makes sense (the next DMA transfer is scheduled but never
> carried out by the CD-ROM unit) except for the panic, of course. The
> correct driver action in this case would be to stop the DMA engine and
> issue a reset of the state machines involved (both on the host and
> the unit side).
The message, the comments in the code, and what Alan wrote here:
http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
suggest that trying to recover from this condition is extremely
dangerous (note that the kernel doesn't even panic(), because
a sync() may kill a disk, the comments say).
Anyway, thanks a lot for your insightful comments.
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:22 ` Martin Wilck
@ 2002-06-11 7:45 ` Daniela Engert
2002-06-11 8:37 ` Martin Wilck
2002-06-11 11:25 ` Martin Wilck
0 siblings, 2 replies; 31+ messages in thread
From: Daniela Engert @ 2002-06-11 7:45 UTC (permalink / raw)
To: Martin Wilck; +Cc: Linux Kernel mailing list
On 11 Jun 2002 09:22:24 +0200, Martin Wilck wrote:
>On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:
>> The interesting bits of the DMA status register are bits 0 through 2. A
>> value of 5 indicates the condition "interrupt from unit, DMA state
>> machine active". This is a valid status! It basically means the unit
>> issued an interrupt before the PRD table was exhausted. This makes sense
>> because the CD-ROM unit fails to transfer the amount of data described
>> by the PRD table because of the non-recoverable read error.
>
>Shouldn't the error bit be set too? (But that wouldn't make any
>difference with the current driver ...)
No, it shouldn't. The error is happening on the unit side and not on the
host side of the bus. Thus it is correct that the host is *not*
reporting an error (none occurred on its side); only the CD-ROM unit is.
>> What you see makes sense (the next DMA transfer is scheduled but never
>> carried out by the CD-ROM unit) except for the panic, of course. The
>> correct driver action in this case would be to stop the DMA engine and
>> issue a reset of the state machines involved (both on the host and
>> the unit side).
>
>The message, the comments in the code, and what Alan wrote here:
>http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
>suggest that trying to recover from this condition is extremely
>dangerous (note that the kernel doesn't even panic(), because
>a sync() may kill a disk, the comments say).
I'm aware of all of that. By pure chance I have a machine with an OSB4
sitting on my desk for a couple of days. Maybe I can find a defective
CD-ROM to test it with my driver and see if it manages to recover from
errors like these. Hopefully, the PCI tracer will give some more insight.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:45 ` Daniela Engert
@ 2002-06-11 8:37 ` Martin Wilck
2002-06-11 11:25 ` Martin Wilck
1 sibling, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-11 8:37 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Tue, 2002-06-11 at 09:45, Daniela Engert wrote:
> I'm aware of all of that. By pure chance I have a machine with an OSB4
> sitting on my desk for a couple of days. Maybe I can find a defective
> CD-ROM to test it with my driver and see if it manages to recover from
> errors like these. Hopefully, the PCI tracer will give some more insight.
Do you have a custom version of the driver (because you write "my
driver")? If yes, can you send it, so that I can test it, too?
Can you point me to any reference material on the web?
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 7:45 ` Daniela Engert
2002-06-11 8:37 ` Martin Wilck
@ 2002-06-11 11:25 ` Martin Wilck
2002-06-11 21:27 ` Chris Wedgwood
2002-06-13 11:50 ` Daniela Engert
1 sibling, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-11 11:25 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list, Alan Cox
[Alan, I am cc'ing you on this because I read elsewhere that you want
osb4-bug@ide.cabal.tm to be forwarded to you, and that address still
bounces].
I have tried the following:
- comment out the code that stalls the machine when the condition in
question is encountered.
- run dd over a couple of good blocks on the CD.
- run dd over the corrupted blocks. This now leads to errors very
similar to those in the PIO case.
- re-enable DMA with hdparm, because it is automatically disabled by the
ide-cd driver if an error occurs (why is that? The error has nothing to
do with DMA here).
- repeat the first dd command on the good blocks and compare the
results.
The results are identical, thus I cannot verify the "4 byte shift" Alan
has been talking about. Of course this is a CD-ROM only scenario, thus
I can't tell anything about hard disks.
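
The comparison step can be sketched as follows. This is only an illustration under stated assumptions: the two dd reads are taken to be available as byte strings of equal length, and a "4 byte shift" is taken to mean that, from some offset on, the second read contains the same data four bytes later. The function names are hypothetical.

```python
def first_difference(a, b):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def looks_shifted(reference, sample, shift=4):
    """True if sample equals reference up to some offset and then carries
    the same data delayed by `shift` bytes (assumed corruption pattern)."""
    start = first_difference(reference, sample)
    if start is None or len(reference) != len(sample):
        return False
    tail = len(reference) - start - shift
    if tail <= 0:
        return False
    return sample[start + shift:] == reference[start:start + tail]


good = bytes(range(64))
shifted = good[:16] + b"\xff" * 4 + good[16:60]
print(first_difference(good, good))
print(first_difference(good, shifted), looks_shifted(good, shifted))
```

In the test described above, the first check alone would have been enough, since the two reads were identical.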
Is it possible that the 4-byte shift occurs only with some particular
(older?) version of the chipset?
In any case, the condition that usually causes Linux to stall is
indeed a perfectly valid condition for DMA when the device transfers
less data than it's supposed to. I doubt that hanging the system
without more detailed checks is the right measure to take there.
Martin
* Re: Serverworks OSB4 in impossible state
2002-06-11 11:25 ` Martin Wilck
@ 2002-06-11 21:27 ` Chris Wedgwood
2002-06-12 7:24 ` Martin Wilck
2002-06-13 11:50 ` Daniela Engert
1 sibling, 1 reply; 31+ messages in thread
From: Chris Wedgwood @ 2002-06-11 21:27 UTC (permalink / raw)
To: Martin Wilck; +Cc: Daniela Engert, Linux Kernel mailing list, Alan Cox
On Tue, Jun 11, 2002 at 01:25:25PM +0200, Martin Wilck wrote:
> Is it possible that the 4-byte shift occurs only with some
> particular (older?) version of the chipset?
Maybe.
I have an oldish OSB4 here and beating on it only with the CDROM
(disks are all SCSI) I don't ever seem to see this problem:
00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Flags: bus master, medium devsel, latency 48
00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Flags: bus master, medium devsel, latency 48
I think what is really required is input from ServerWorks/Broadcom
about this.
--cw
* Re: Serverworks OSB4 in impossible state
2002-06-11 21:27 ` Chris Wedgwood
@ 2002-06-12 7:24 ` Martin Wilck
0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-12 7:24 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linux Kernel mailing list
On Tue, 2002-06-11 at 23:27, Chris Wedgwood wrote:
> I have an oldish OSB4 here and beating on it only with the CDROM
> (disks are all SCSI) I don't ever seem to see this problem:
UDMA33 mode? You need to have a broken CD (we happen to have a CD burner
that generates broken CDs)
> I think what is really required is input from ServerWorks/Broadcom
> about this.
Yeah, we are in contact with them.
Thanks,
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-11 11:25 ` Martin Wilck
2002-06-11 21:27 ` Chris Wedgwood
@ 2002-06-13 11:50 ` Daniela Engert
2002-06-13 11:59 ` Martin Wilck
2002-06-13 23:48 ` Re[2]: " Nerijus Baliunas
1 sibling, 2 replies; 31+ messages in thread
From: Daniela Engert @ 2002-06-13 11:50 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list
Hi,
as promised I've conducted a test similar to Martin's to check the
behaviour of a Serverworks ROSB4 IDE controller in case of an aborted
ATAPI DMA transfer (probably due to a media error). In fact, I've done
this by comparing it with a known-to-be-good system, a dual processor
Intel BX based board with a PIIX4 IDE controller chip.
The following trace shows how it should be:
- lines 158-172: setup DMA transfer, send command packet
- lines 173-174: the DMA engine loads the first (of multiple)
PRD entry
- the actual DMA induced memory writes are not shown here
- line 175: IRQ14 is acknowledged
- lines 176-181: gather unit and DMA status
- lines 182-210: issue "request sense" and get sense status
CD-ROM read error on Intel PIIX4:
______Time_______Burst_BE#__Wait___Command_Address__Data____
158 20.089ms . 1011 . I/OWri 000001F6 ..B0....
159 1.656us . 1011 . I/ORd 000003F6 ..50....
160 10.08us . 0000 . I/OWri 0000F004 00EAC800
161 812.7ns . 1110 . I/OWri 0000F000 ......08
162 752.5ns . 1011 . I/OWri 0000F002 ..46....
163 3.100us . 1110 . I/OWri 000001F4 ......FF
164 4.214us . 1101 . I/OWri 000001F5 ....FF..
165 4.244us . 1101 . I/OWri 000001F1 ....01..
166 4.214us . 0111 . I/OWri 000001F7 A0......
167 5.027us . 1011 . I/ORd 000003F6 ..58....
168 5.448us . 1011 . I/OWri 0000F002 ..46....
169 752.5ns . 1110 . I/OWri 0000F000 ......09
170 1.204us . 0000 . I/OWri 000001F0 00000028
171 903.0ns . 0000 . I/OWri 000001F0 0000440D
172 903.0ns . 0000 . I/OWri 000001F0 0000001F
173 3.673s Start 0000 . MemRd 00EAC800 006E3000
174 30.1ns B 0000 . MemRd 00EAC800 0000D000
175 648.99ms . 1110 . IntAck ........ ......76
176 5.779us . 0111 . I/ORd 000001F7 51......
177 11.47us . 0111 . I/ORd 000001F7 51......
178 1.324us . 1011 . I/ORd 000001F2 ..03....
179 1.957us . 1110 . I/OWri 0000F000 ......08
180 812.7ns . 1011 . I/ORd 0000F002 ..44....
181 1.355us . 1101 . I/ORd 000001F1 ....30..
182 9.361us . 1011 . I/OWri 000001F6 ..B0....
183 1.806us . 1011 . I/ORd 000003F6 ..51....
184 9.301us . 1110 . I/OWri 000001F4 ......12
185 4.274us . 1101 . I/OWri 000001F5 ....00..
186 4.244us . 1101 . I/OWri 000001F1 ....00..
187 4.214us . 0111 . I/OWri 000001F7 A0......
188 4.906us . 1011 . I/ORd 000003F6 ..58....
189 6.020us . 0000 . I/OWri 000001F0 00000003
190 903.0ns . 0000 . I/OWri 000001F0 00000012
191 903.0ns . 0000 . I/OWri 000001F0 00000000
192 258.17us . 1110 . IntAck ........ ......76
193 3.431us . 0111 . I/ORd 000001F7 58......
194 10.08us . 0111 . I/ORd 000001F7 58......
195 1.204us . 1011 . I/ORd 000001F2 ..02....
196 1.535us . 1101 . I/ORd 000001F5 ....00..
197 1.174us . 1110 . I/ORd 000001F4 ......12
198 10.20us . 1100 . I/ORd 000001F0 ....0070
199 1.475us . 1100 . I/ORd 000001F0 ....0003
200 632.1ns . 1100 . I/ORd 000001F0 ....0000
201 632.1ns . 1100 . I/ORd 000001F0 ....0A00
202 602.0ns . 1100 . I/ORd 000001F0 ....0000
203 632.1ns . 1100 . I/ORd 000001F0 ....0000
204 602.0ns . 1100 . I/ORd 000001F0 ....0611
205 632.1ns . 1100 . I/ORd 000001F0 ....0000
206 602.0ns . 1100 . I/ORd 000001F0 ....0000
207 12.79us . 1110 . IntAck ........ ......76
208 3.401us . 0111 . I/ORd 000001F7 50......
209 9.361us . 0111 . I/ORd 000001F7 50......
210 1.234us . 1011 . I/ORd 000001F2 ..03....
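For readers following the trace by hand: the command packet written to 1F0 in lines 170-172 and the sense data read back in lines 198-206 can be decoded mechanically. Below is a minimal C sketch of that decoding. It assumes the standard READ(10) packet layout and fixed-format sense data; the struct and function names are invented for illustration and do not come from any driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in a READ(10) command packet (opcode 0x28). */
struct read10 {
    uint32_t lba;     /* start block, big-endian in packet bytes 2..5 */
    uint16_t blocks;  /* transfer length, big-endian in packet bytes 7..8 */
};

/* Fields of interest in fixed-format sense data. */
struct sense {
    uint8_t key;      /* sense key, low nibble of byte 2 */
    uint8_t asc;      /* additional sense code, byte 12 */
    uint8_t ascq;     /* additional sense code qualifier, byte 13 */
};

/* The tracer shows three 32-bit writes to the data port; the drive sees
 * them as 12 packet bytes, least significant byte of each dword first. */
static void packet_from_dwords(const uint32_t dw[3], uint8_t pkt[12])
{
    for (size_t i = 0; i < 12; i++)
        pkt[i] = (uint8_t)(dw[i / 4] >> (8 * (i % 4)));
}

/* Returns 0 on success, -1 if the packet is not a READ(10). */
static int decode_read10(const uint8_t pkt[12], struct read10 *out)
{
    if (pkt[0] != 0x28)
        return -1;
    out->lba = ((uint32_t)pkt[2] << 24) | ((uint32_t)pkt[3] << 16) |
               ((uint32_t)pkt[4] << 8) | (uint32_t)pkt[5];
    out->blocks = (uint16_t)(((unsigned)pkt[7] << 8) | pkt[8]);
    return 0;
}

/* Sense data arrives as 16-bit PIO reads, low byte first.
 * Returns 0 on success, -1 if too short or not fixed-format sense. */
static int decode_sense(const uint16_t *w, size_t nwords, struct sense *out)
{
    if (nwords < 7)
        return -1;
    if ((w[0] & 0x7f) != 0x70 && (w[0] & 0x7f) != 0x71)
        return -1;
    out->key  = (uint8_t)(w[1] & 0x0f);  /* byte 2 */
    out->asc  = (uint8_t)(w[6] & 0xff);  /* byte 12 */
    out->ascq = (uint8_t)(w[6] >> 8);    /* byte 13 */
    return 0;
}
```

Fed with the dwords 00000028, 0000440D, 0000001F from the PIIX4 trace, this gives a READ(10) of 1Fh (31) blocks starting at LBA 0D44h. The nine sense words decode to sense key 3 (medium error) with ASC/ASCQ 11h/06h, which agrees with the 30h read from the error register in line 181.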
And here is the same with the ROSB4. This time, some of the
DMA writes are shown. After loading the second PRD entry
which describes a memory region of 7800h bytes, 3000h bytes
are transferred before IRQ14 is asserted. The IRQ14 INTACK
cycle is the last transaction on the PCI bus ever, the
machine is completely frozen!
CD-ROM read error on ServerWorks ROSB4 revision 0:
______Time_______Burst_BE#__Wait___Command_Address__Data____
51316 297.63us . 1011 . I/OWri 000001F6 ..B0....
51317 1.530us . 1011 . I/ORd 000003F6 ..50....
51318 6.300us . 0000 . I/OWri 00005404 00EF2800
51319 450ns . 1110 . I/OWri 00005400 ......08
51320 450ns . 1011 . I/OWri 00005402 ..66....
51321 1.440us . 1110 . I/OWri 000001F4 ......FF
51322 3.480us . 1101 . I/OWri 000001F5 ....FF..
51323 3.480us . 1101 . I/OWri 000001F1 ....01..
51324 3.510us . 0111 . I/OWri 000001F7 A0......
51325 4.470us . 1011 . I/ORd 000003F6 ..58....
51326 4.620us . 1011 . I/OWri 00005402 ..66....
51327 660ns . 0000 . I/OWri 000001F0 00000028
51328 420ns . 0000 . I/OWri 000001F0 0000F80D
51329 420ns . 0000 . I/OWri 000001F0 0000001F
51330 1.290us . 1011 . I/ORd 000003F6 ..D0....
51331 3.660us . 1110 . I/OWri 00005400 ......09
51332 1.290us . 0000 . MemRd 00EF2800 00B08000
51333 630ns . 0000 . MemRd 00EF2804 00008000
51334 166.11us Start 0000 . MemWri 00B08000 7BC0728C
51335 30ns B 0000 . MemWri 00B08000 285DA7D0
51336 30ns B 0000 . MemWri 00B08000 9FAE557A
51337 30ns B 0000 . MemWri 00B08000 B3F88165
51338 30ns B 0000 . MemWri 00B08000 BDFD7823
51339 30ns B 0000 . MemWri 00B08000 42ED22D0
51340 30ns B 0000 . MemWri 00B08000 7BA5743F
51341 30ns B 0000 . MemWri 00B08000 6B5897BA
51342 780ns Start 0000 . MemWri 00B08020 ACF1D36B
..
..
59518 930ns Start 0000 . MemWri 00B0FFE0 845971B8
59519 30ns B 0000 . MemWri 00B0FFE0 7E325F95
59520 30ns B 0000 . MemWri 00B0FFE0 7ADA36D0
59521 30ns B 0000 . MemWri 00B0FFE0 96BD435C
59522 30ns B 0000 . MemWri 00B0FFE0 4ED88CB0
59523 30ns B 0000 . MemWri 00B0FFE0 2E1CCAF7
59524 30ns B 0000 . MemWri 00B0FFE0 FC8782B3
59525 30ns B 0000 . MemWri 00B0FFE0 9C0A2335
59526 780ns . 0000 . MemRd 00EF2808 00B10000
59527 630ns . 0000 . MemRd 00EF280C 80007800
59528 1.2518ms Start 0000 . MemWri 00B10000 E85C33CD
59529 30ns B 0000 . MemWri 00B10000 AD2F9613
59530 30ns B 0000 . MemWri 00B10000 D8BEC924
59531 30ns B 0000 . MemWri 00B10000 E273C0BD
59532 30ns B 0000 . MemWri 00B10000 DC655F5E
59533 30ns B 0000 . MemWri 00B10000 69B3087B
59534 30ns B 0000 . MemWri 00B10000 369B26D1
59535 30ns B 0000 . MemWri 00B10000 9A8C47DF
59536 780ns Start 0000 . MemWri 00B10020 3F026EA5
..
..
62592 750ns Start 0000 . MemWri 00B12FE0 367016E1
62593 30ns B 0000 . MemWri 00B12FE0 35654905
62594 30ns B 0000 . MemWri 00B12FE0 9968FF02
62595 30ns B 0000 . MemWri 00B12FE0 9ABB5CAE
62596 30ns B 0000 . MemWri 00B12FE0 D32DF135
62597 30ns B 0000 . MemWri 00B12FE0 7A03326A
62598 30ns B 0000 . MemWri 00B12FE0 86CCE8BF
62599 30ns B 0000 . MemWri 00B12FE0 D4E66D21
62600 1.176s . 1110 . IntAck ........ ......76
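The two PRD entries fetched in the trace above (lines 51332-51333 and 59526-59527) can be decoded to see how much of the transfer was still outstanding when the bus went quiet. This is a small C sketch assuming the usual bus master IDE descriptor layout (32-bit base address, byte count in bits 15:0 with 0 meaning 64 KiB, end-of-table flag in bit 31). The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One physical region descriptor as fetched by the DMA engine. */
struct prd_entry {
    uint32_t base;  /* physical start address of the region */
    uint32_t ctrl;  /* bits 15:0 byte count, bit 31 end of table */
};

/* Size of the region; a count field of 0 encodes 64 KiB. */
static uint32_t prd_bytes(const struct prd_entry *e)
{
    uint32_t n = e->ctrl & 0xFFFFu;
    return n ? n : 0x10000u;
}

/* Non-zero if this is the last entry of the table. */
static int prd_is_last(const struct prd_entry *e)
{
    return (int)((e->ctrl >> 31) & 1u);
}

/* Total bytes described by the table, stopping at the end-of-table
 * entry or after max entries, whichever comes first. */
static uint32_t prd_table_bytes(const struct prd_entry *t, size_t max)
{
    uint32_t total = 0;
    for (size_t i = 0; i < max; i++) {
        total += prd_bytes(&t[i]);
        if (prd_is_last(&t[i]))
            break;
    }
    return total;
}

/* Bytes of entry e not yet written, given the address the next
 * memory write would have gone to. */
static uint32_t prd_remaining(const struct prd_entry *e, uint32_t next_addr)
{
    return e->base + prd_bytes(e) - next_addr;
}
```

With the values from the trace, the table { 00B08000/00008000, 00B10000/80007800 } describes F800h bytes in total, i.e. 31 CD sectors of 2048 bytes, matching the transfer length 1Fh in the command packet. The last burst starts at 00B12FE0 and carries eight dwords, so the next write would have gone to 00B13000, leaving 4800h bytes of the second region untransferred.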
My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
controller.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 11:50 ` Daniela Engert
@ 2002-06-13 11:59 ` Martin Wilck
2002-06-13 12:04 ` Daniela Engert
2002-06-13 23:48 ` Re[2]: " Nerijus Baliunas
1 sibling, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-13 11:59 UTC (permalink / raw)
To: Daniela Engert; +Cc: Alan Cox, Linux Kernel mailing list
On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:
> And here is the same with the ROSB4. This time, some of the
> DMA writes are shown. After loading the second PRD entry
> which describes a memory region of 7800h bytes, 3000h bytes
> are transferred before IRQ14 is asserted. The IRQ14 INTACK
> cycle is the last transaction on the PCI bus ever, the
> machine is completely frozen!
You say (dma_base+2) is never read?
Was that a Linux system? If yes, I assume you never saw "OSB4 in
impossible state ..." ?
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 11:59 ` Martin Wilck
@ 2002-06-13 12:04 ` Daniela Engert
2002-06-13 18:27 ` rico-linux-kernel
0 siblings, 1 reply; 31+ messages in thread
From: Daniela Engert @ 2002-06-13 12:04 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list
On 13 Jun 2002 13:59:06 +0200, Martin Wilck wrote:
>On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:
>> are transferred before IRQ14 is asserted. The IRQ14 INTACK
>> cycle is the last transaction on the PCI bus ever, the
>> machine is completely frozen!
>
>You say (dma_base+2) is never read?
Exactly. I've checked this twice, the PCI tracer was configured to gather
*all* PCI bus events.
>Was that a Linux system?
No, I think this doesn't matter here at all, because the hardware
stalls completely - full stop.
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-13 12:04 ` Daniela Engert
@ 2002-06-13 18:27 ` rico-linux-kernel
0 siblings, 0 replies; 31+ messages in thread
From: rico-linux-kernel @ 2002-06-13 18:27 UTC (permalink / raw)
To: dani; +Cc: linux-kernel
Thanks for investing time on the logic analyser, Dani. My experience
is slightly different.
I have several mainboards (Tyan S1867) with older chipsets from
ServerWorks (f.k.a. Reliance). The IDE controller (OSB4 rev 0) is used
daily with ATAPI CDRW drives in UDMA(33) Mode. System handles read/write
errors without problem.
The system will lock solid when both IDE channels are accessed,
and either one is using DMA. Since I want DMA, I simply abandon the
secondary channel.
I have spare machines available for quack medical experiments.
Select boot-time info...
Linux version 2.4.17 (rico@pc2) (gcc version 2.95.3 20010315 (release)) #1 SMP Mon Dec 31 11:51:33 CST 2001
ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
ServerWorks OSB4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfcb0-0xfcb7, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xfcb8-0xfcbf, BIOS settings: hdc:pio, hdd:pio
hda: PLEXTOR CD-R PX-W2410A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 40X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re[2]: Serverworks OSB4 in impossible state
2002-06-13 11:50 ` Daniela Engert
2002-06-13 11:59 ` Martin Wilck
@ 2002-06-13 23:48 ` Nerijus Baliunas
1 sibling, 0 replies; 31+ messages in thread
From: Nerijus Baliunas @ 2002-06-13 23:48 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Thu, 13 Jun 2002 13:50:25 +0200 (CDT) Daniela Engert <dani@ngrt.de> wrote:
> My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
> controller.
How can I find revision? I have a problem with (Seagate) hdds, but lspci -v
only shows:
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 51)
Subsystem: ServerWorks OSB4 South Bridge
Flags: bus master, medium devsel, latency 0
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
Flags: bus master, medium devsel, latency 64
I/O ports at 2000 [size=16]
Regards,
Nerijus
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
2002-06-10 16:41 ` Daniela Engert
@ 2002-06-12 8:58 ` Alan Cox
2002-06-12 8:47 ` Martin Wilck
1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-12 8:58 UTC (permalink / raw)
To: Martin Wilck; +Cc: osb4-bug, Linux Kernel mailing list, Martin Wilck
Triggering the check on csb5/csb6 would be a bug - maybe an extra
test is needed there as CSB5/6 are fine
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-12 8:58 ` Alan Cox
@ 2002-06-12 8:47 ` Martin Wilck
2002-06-12 9:14 ` Alan Cox
0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-12 8:47 UTC (permalink / raw)
To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list
On Wed, 2002-06-12 at 10:58, Alan Cox wrote:
> Triggering the check on csb5/csb6 would be a bug - maybe an extra
> test is needed there as CSB5/6 are fine
Currently the stall is triggered if the DMA engine active bit is set, no
further conditions.
Would you concur that it would be reasonable to trigger only if
- the chipset version is < CSB5,
- the drive is a hard disk,
- and the drive did not report an error?
(I am not certain about the last condition, but from the descriptions
of the 4-byte-shift problem I have seen I infer that there was no drive
error condition involved).
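The three proposed conditions, together with the existing "DMA engine still active" test, can be written as a single predicate. The following is a self-contained C sketch of that logic only. The enums, the struct and the function name are hypothetical stand-ins and not kernel identifiers; the two bit masks are the bus master status "active" bit and the ATA status ERR bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for what the driver knows about the hardware,
 * ordered so that older chipsets compare lower. */
enum chipset { CHIP_OSB4, CHIP_CSB5, CHIP_CSB6 };
enum media   { MEDIA_DISK, MEDIA_CDROM };

#define BM_STATUS_ACTIVE 0x01  /* bit 0 of the register at dma_base+2 */
#define ATA_STATUS_ERR   0x01  /* bit 0 of the drive status register */

/* Snapshot taken when the DMA transfer is being torn down. */
struct dma_end_state {
    enum chipset chip;
    enum media   media;
    uint8_t      bm_status;     /* value read from dma_base+2 */
    uint8_t      drive_status;  /* value read from the status register */
};

/* True only if all conditions proposed above hold: pre-CSB5 chipset,
 * hard disk, DMA engine still active, and no drive error reported. */
static bool looks_like_4byte_shift(const struct dma_end_state *s)
{
    return s->chip < CHIP_CSB5 &&
           s->media == MEDIA_DISK &&
           (s->bm_status & BM_STATUS_ACTIVE) != 0 &&
           (s->drive_status & ATA_STATUS_ERR) == 0;
}
```

Under this test the CD-ROM case from the start of the thread (bus master status a5h, drive status 51h) no longer stalls the machine, while a hard disk on an OSB4 that ends a transfer with the engine still active and a clean status is still caught.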
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Serverworks OSB4 in impossible state
2002-06-12 8:47 ` Martin Wilck
@ 2002-06-12 9:14 ` Alan Cox
2002-06-12 10:30 ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-12 9:14 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list
> Would you concur that it would be reasonable to trigger only if
>
> - the chipset version is < CSB5,
> - the drive is a hard disk,
> - and the drive did not report an error?
>
> (I am not certain about the last condition, but from the descriptions
> of the 4-byte-shift problem I have seen I infer that there was no drive
> error condition involved).
Entirely agreed
^ permalink raw reply [flat|nested] 31+ messages in thread
* OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
2002-06-12 9:14 ` Alan Cox
@ 2002-06-12 10:30 ` Martin Wilck
2002-06-12 20:35 ` Christian Zoffoli
0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-12 10:30 UTC (permalink / raw)
To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list
On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> Entirely agreed
I propose this patch to remedy the problem.
I don't know how to test if the drive is a seagate drive, and
I think we don't want to do that, because it would end up in yet another
blacklist.
I cannot test if this behaves correctly on machines that do expose the
4-byte shift bug - it would be great if somebody could test that.
Martin
--- drivers/ide/serverworks.c.orig Tue Jun 11 11:24:59 2002
+++ drivers/ide/serverworks.c Wed Jun 12 12:00:36 2002
@@ -547,7 +547,13 @@
ide_hwif_t *hwif = HWIF(drive);
unsigned long dma_base = hwif->dma_base;
- if(inb(dma_base+0x02)&1)
+ /* If it's a disk on the OSB4, the DMA engine is still on,
+ and the device reports no error status, we are probably
+ facing the "4 byte shift" problem */
+ if(drive->media == ide_disk &&
+ hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE &&
+ inb(dma_base+0x02)&1 &&
+ OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
{
#if 0
int i;
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
2002-06-12 10:30 ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
@ 2002-06-12 20:35 ` Christian Zoffoli
0 siblings, 0 replies; 31+ messages in thread
From: Christian Zoffoli @ 2002-06-12 20:35 UTC (permalink / raw)
To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list
Martin Wilck wrote:
> On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> > Entirely agreed
>
> I propose this patch to remedy the problem.
>
> I don't know how to test if the drive is a seagate drive, and
> I think we don't want to do that, because it would end up in yet another
> blacklist.
>
> I cannot test if this behaves correctly on machines that do expose the
> 4-byte shift bug - it would be great if somebody could test that.
>
> Martin
>
> --- drivers/ide/serverworks.c.orig Tue Jun 11 11:24:59 2002
> +++ drivers/ide/serverworks.c Wed Jun 12 12:00:36 2002
> @@ -547,7 +547,13 @@
> ide_hwif_t *hwif = HWIF(drive);
> unsigned long dma_base = hwif->dma_base;
>
> - if(inb(dma_base+0x02)&1)
> + /* If it's a disk on the OSB4, the DMA engine is still on,
> + and the device reports no error status, we are probably
> + facing the "4 byte shift" problem */
> + if(drive->media == ide_disk &&
> + hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE &&
> + inb(dma_base+0x02)&1 &&
> + OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
> {
> #if 0
> int i;
>
>
It works for me. I have a Supermicro 370DE6 (ServerWorks HE-SL) and a
Maxtor HD (5T030H3).
Christian
^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <1030002761.32380.27.camel@pluto.unixpac.com.au>]
* Re: ServerWorks OSB4 in impossible state
[not found] <1030002761.32380.27.camel@pluto.unixpac.com.au>
@ 2002-08-22 8:35 ` Martin Wilck
2002-08-22 8:51 ` Andre Hedrick
0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-08-22 8:35 UTC (permalink / raw)
To: Gonzalo Servat; +Cc: Alan Cox, Linux Kernel mailing list
On Thu, 2002-08-22 at 09:52, Gonzalo Servat wrote:
> Do you have any suggestions on how I can work around this problem? It's
> been driving me nuts all day! (I bet it's driven people nuts for
> weeks...). Do you think your patch (as posted on
> http://linux-kernel.skylab.org/20020609/msg00935.html) may help my
> situation? If so, what kernel does it apply to? I looked up
> serverworks.c in a 2.4.19-rc3 tree to see if the patch would apply
> cleanly but it won't because line 547 is different to yours.
It should be fairly easy to adapt the patch, all you need is modify
the line
if(inb(dma_base+0x02)&1)
in svwks_dmaproc() to the more complex condition test in the patch.
Alan, I understood that you wanted to apply this patch. What happened to it?
Do you want me to resubmit it?
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 8:35 ` ServerWorks OSB4 in impossible state Martin Wilck
@ 2002-08-22 8:51 ` Andre Hedrick
2002-08-22 12:02 ` Martin Wilck
0 siblings, 1 reply; 31+ messages in thread
From: Andre Hedrick @ 2002-08-22 8:51 UTC (permalink / raw)
To: Martin Wilck; +Cc: Gonzalo Servat, Alan Cox, Linux Kernel mailing list
The problem is we need a special DMA engine for this broken puppy.
I am trying to remember the rule for forming the dma-table, and it is not
nice. The 4 byte issue is a direct result of building an SG list which is
not compatible with the hardware.
508 + 4 is okay but 510 + 2 is not.
Now I have to remember why :-/
IIRC, we have to have 4 byte boundaries on the list.
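Taking the rule exactly as recalled above (every element of the list must sit on a 4 byte boundary), a check over a scatter-gather list might look like the following C sketch. This is only an illustration of the stated rule, not the driver's actual DMA table code and not a confirmed description of the hardware; the struct and function names are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather segment: bus address and length in bytes. */
struct sg_seg {
    uint32_t addr;
    uint32_t len;
};

/* True if every segment starts on a 4 byte boundary and has a length
 * that is a multiple of 4, so no segment boundary splits a dword. */
static bool sg_list_dword_aligned(const struct sg_seg *sg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((sg[i].addr & 3u) != 0 || (sg[i].len & 3u) != 0)
            return false;
    }
    return true;
}
```

With this check a 512 byte sector split as 508 + 4 passes, while the same sector split as 510 + 2 is rejected, which matches the two examples given. A caller that gets false would have to rebuild or bounce the list, or fall back to PIO for that request.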
This is where I need some extra help: doing something like the trm290,
but for all of OSB4, because parsing out the broken engines based on ASIC
revisions is darn near impossible.
Big Problem -- Big Hammer.
Tough if it tanks some of the performance, but it is better than the
deadlocks we are getting now.
Yeah I expect to take heat for this one from ServerWorks and it may cost
me later, but nobody else has got the guts to press the issue for the
correct solution.
Then again if we solve this correctly I have "ends justify means"
argument.
Cheers,
On 22 Aug 2002, Martin Wilck wrote:
> On Thu, 2002-08-22 at 09:52, Gonzalo Servat wrote:
>
> > Do you have any suggestions on how I can work around this problem? It's
> > been driving me nuts all day! (I bet it's driven people nuts for
> > weeks...). Do you think your patch (as posted on
> > http://linux-kernel.skylab.org/20020609/msg00935.html) may help my
> > situation? If so, what kernel does it apply to? I looked up
> > serverworks.c in a 2.4.19-rc3 tree to see if the patch would apply
> > cleanly but it won't because line 547 is different to yours.
>
> It should be fairly easy to adapt the patch, all you need is modify
> the line
> if(inb(dma_base+0x02)&1)
>
> in svwks_dmaproc() to the more complex condition test in the patch.
>
> Alan, I understood that you wanted to apply this patch. What happened to it?
> Do you want me to resubmit it?
>
> Martin
>
> --
> Martin Wilck Phone: +49 5251 8 15113
> Fujitsu Siemens Computers Fax: +49 5251 8 20409
> Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
> D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 8:51 ` Andre Hedrick
@ 2002-08-22 12:02 ` Martin Wilck
2002-08-22 16:45 ` Tomas Szepe
2002-08-22 17:58 ` Alan Cox
0 siblings, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-08-22 12:02 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Gonzalo Servat, Alan Cox, Linux Kernel mailing list
On Thu, 2002-08-22 at 10:51, Andre Hedrick wrote:
> The problem is we need a special DMA engine for this broken puppy.
You certainly have much more insight into the problem than I.
I wonder if (something like) the simple patch I submitted before can
be a temporary solution nevertheless. Please correct me if one of the
following statements is wrong:
1) The "4 byte shift" issue does not affect the CSB5 series.
2) The tested condition inb(dma_base+0x02)&1 is valid if the
device doing the DMA reported an error status. Only if the
device reports success is there an indication of the "4 byte shift".
3) The "4 byte shift" problem matters not for read-only devices like
CD-ROMS; at least it is no reason to stall the computer if it occurs
because data corruption is not an issue.
If these assertions are true, the patch I sent will at least prevent
people's machines from stalling unnecessarily. Even if one or more are
false, the remaining correct condition test(s) will narrow the set
of machines that are stalled unnecessarily.
> 508 + 4 is okay but 510 + 2 is not.
>
> Now I have to remember why :-/
You sure have to go for the right solution.
But if my patch was applied, ServerWorks chip sets would cause less
grief to people until you have figured it out.
> Yeah I expect to take heat for this one from ServerWorks and it may cost
> me later, but nobody else has got the guts to press the issue for the
> correct solution.
Let me know if we can help. I have no personal contacts to ServerWorks,
but we are a large customer of them and may be able to exert some
additional pressure. The current situation (IDE DMA must be disabled)
is hardly acceptable for us anyway.
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 12:02 ` Martin Wilck
@ 2002-08-22 16:45 ` Tomas Szepe
2002-08-22 17:48 ` Andre Hedrick
2002-08-22 17:59 ` Alan Cox
2002-08-22 17:58 ` Alan Cox
1 sibling, 2 replies; 31+ messages in thread
From: Tomas Szepe @ 2002-08-22 16:45 UTC (permalink / raw)
To: Martin Wilck
Cc: Andre Hedrick, Gonzalo Servat, Alan Cox,
Linux Kernel mailing list
> > Yeah I expect to take heat for this one from ServerWorks and it may cost
> > me later, but nobody else has got the guts to press the issue for the
> > correct solution.
>
> Let me know if we can help. I have no personal contacts to ServerWorks,
> but we are a large customer of them and may be able to exert some
> additional pressure. The current situation (IDE DMA must be disabled)
> is hardly acceptable for us anyway.
AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
even in DMA modes. How's the code there then? Is it dangerous to use?
T.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 16:45 ` Tomas Szepe
@ 2002-08-22 17:48 ` Andre Hedrick
2002-08-22 17:59 ` Alan Cox
1 sibling, 0 replies; 31+ messages in thread
From: Andre Hedrick @ 2002-08-22 17:48 UTC (permalink / raw)
To: Tomas Szepe
Cc: Martin Wilck, Gonzalo Servat, Alan Cox, Linux Kernel mailing list
It took some time to figure this out with the ASIC architect.
Since there is not an easy way to determine which of the extremely early
SB's had the issue, it is suggested to hit it with a hammer on the DMA
table building.
On Thu, 22 Aug 2002, Tomas Szepe wrote:
> > > Yeah I expect to take heat for this one from ServerWorks and it may cost
> > > me later, but nobody else has got the guts to press the issue for the
> > > correct solution.
> >
> > Let me know if we can help. I have no personal contacts to ServerWorks,
> > but we are a large customer of them and may be able to exert some
> > additional pressure. The current situation (IDE DMA must be disabled)
> > is hardly acceptable for us anyway.
>
> AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> even in DMA modes. How's the code there then? Is it dangerous to use?
>
> T.
>
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 16:45 ` Tomas Szepe
2002-08-22 17:48 ` Andre Hedrick
@ 2002-08-22 17:59 ` Alan Cox
2002-08-22 18:14 ` Tomas Szepe
1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-08-22 17:59 UTC (permalink / raw)
To: Tomas Szepe
Cc: Martin Wilck, Andre Hedrick, Gonzalo Servat,
Linux Kernel mailing list
On Thu, 2002-08-22 at 17:45, Tomas Szepe wrote:
> AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> even in DMA modes. How's the code there then? Is it dangerous to use?
Most of them work all the time (most OSB4, all CSB5, all CSB6)
All of them work all the time with most drives
Some of them do horrible things in UDMA with some drives (timing
patterns I guess)
All of the OSB4 do MWDMA fine.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 17:59 ` Alan Cox
@ 2002-08-22 18:14 ` Tomas Szepe
0 siblings, 0 replies; 31+ messages in thread
From: Tomas Szepe @ 2002-08-22 18:14 UTC (permalink / raw)
To: Alan Cox
Cc: Martin Wilck, Andre Hedrick, Gonzalo Servat,
Linux Kernel mailing list
> > AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> > even in DMA modes. How's the code there then? Is it dangerous to use?
>
> Most of them work all the time (most OSB4, all CSB5, all CSB6)
> All of them work all the time with most drives
> Some of them do horrible things in UDMA with some drives (timing
> patterns I guess)
>
> All of the OSB4 do MWDMA fine.
Oh it's not such a big problem then. If it tells you/Andre anything,
the controller I've run into trouble with seems to be (output from
2.4.19-pre2):
00:0f.1 IDE interface: Relience Computer: Unknown device 0211 (prog-if 8a [Master SecP PriP])
Flags: bus master, medium devsel, latency 64
I/O ports at 1880 [size=16]
00: 66 11 11 02 45 01 00 02 00 8a 01 01 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 81 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
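The hex dump above is the start of the device's PCI configuration space, and the revision the driver prints comes from the byte at offset 08h. As a small illustration, here is a C sketch that pulls the identification fields out of the first 12 bytes, assuming the standard PCI configuration header layout (little-endian vendor ID at 00h, device ID at 02h, revision at 08h, class code bytes at 09h-0Bh). The struct and function names are invented for this example.

```c
#include <stdint.h>

/* Identification fields from the first 12 bytes of PCI config space. */
struct pci_ident {
    uint16_t vendor;      /* offset 00h, little-endian */
    uint16_t device;      /* offset 02h, little-endian */
    uint8_t  revision;    /* offset 08h */
    uint8_t  prog_if;     /* offset 09h */
    uint8_t  subclass;    /* offset 0Ah */
    uint8_t  base_class;  /* offset 0Bh */
};

/* Decode the fields from a raw dump such as "lspci -x" prints. */
static struct pci_ident pci_ident_from_config(const uint8_t cfg[12])
{
    struct pci_ident id;

    id.vendor     = (uint16_t)(cfg[0] | ((unsigned)cfg[1] << 8));
    id.device     = (uint16_t)(cfg[2] | ((unsigned)cfg[3] << 8));
    id.revision   = cfg[8];
    id.prog_if    = cfg[9];
    id.subclass   = cfg[10];
    id.base_class = cfg[11];
    return id;
}
```

For the first row of the dump (66 11 11 02 45 01 00 02 00 8a 01 01) this yields vendor 1166h, device 0211h, revision 00h, prog-if 8ah and class 01h/01h (mass storage, IDE), in line with the "chipset revision 0" and "prog-if 8a" lines printed by the kernel and lspci.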
(This is what they put into the HP NetServer E800, which is otherwise a nice
machine -- With these we can get up to 8 NICs to work w/o IRQ sharing. Ideal
for building routers, except if we were to put SCSI drives everywhere, we'd
have nothing to eat soon enough.)
So far we've been ok as 2.4.19-pre2 indeed appears to work just fine in UDMA2.
T.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 12:02 ` Martin Wilck
2002-08-22 16:45 ` Tomas Szepe
@ 2002-08-22 17:58 ` Alan Cox
2002-08-22 18:58 ` Martin Wilck
1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-08-22 17:58 UTC (permalink / raw)
To: Martin Wilck; +Cc: Andre Hedrick, Gonzalo Servat, Linux Kernel mailing list
On Thu, 2002-08-22 at 13:02, Martin Wilck wrote:
> 1) The "4 byte shift" issue does not affect the CSB5 series.
True (not a rule the -ac tree knows about right now, but one that the
next tree will, subject to time constraints).
> 2) The tested condition inb(dma_base+0x02)&1 is valid if the
> device doing the DMA reported an error status. Only if the
> device reports success is there an indication of the "4 byte shift".
True
> 3) The "4 byte shift" problem matters not for read-only devices like
> CD-ROMS; at least it is no reason to stall the computer if it occurs
> because data corruption is not an issue.
True (-ac knows about this)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: ServerWorks OSB4 in impossible state
2002-08-22 17:58 ` Alan Cox
@ 2002-08-22 18:58 ` Martin Wilck
0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-08-22 18:58 UTC (permalink / raw)
To: Alan Cox; +Cc: Andre Hedrick, Gonzalo Servat, Linux Kernel mailing list
On Thu, 2002-08-22 at 19:58, Alan Cox wrote:
> > 2) The tested condition inb(dma_base+0x02)&1 is valid if the
> > device doing the DMA reported an error status. Only if the
> > device reports success is there an indication of the "4 byte shift".
>
> True
This condition is easy to test, right? My patch tested for
OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT)
Why not put that in the code?
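For reference, this is roughly what that test evaluates to. The sketch below reproduces the status bit definitions and the OK_STAT macro as I remember them from the 2.4 include/linux/ide.h, so treat the exact values as an assumption and check them against your tree. GET_STAT() is replaced by a plain parameter, and drive_reports_success() is a made-up wrapper, so that the example stands alone.

```c
#include <stdint.h>

/* ATA status register bits (assumed to match 2.4 include/linux/ide.h). */
#define ERR_STAT    0x01
#define DRQ_STAT    0x08
#define SEEK_STAT   0x10
#define READY_STAT  0x40
#define BUSY_STAT   0x80

#define BAD_R_STAT  (BUSY_STAT | ERR_STAT)
#define BAD_STAT    (BAD_R_STAT | DRQ_STAT)
#define DRIVE_READY (READY_STAT | SEEK_STAT)

/* True if all "good" bits are set and none of the "bad" bits are. */
#define OK_STAT(stat, good, bad) (((stat) & ((good) | (bad))) == (good))

/* Hypothetical wrapper: takes the status byte directly instead of
 * reading it from the drive with GET_STAT(). */
static int drive_reports_success(uint8_t status)
{
    return OK_STAT(status, DRIVE_READY, BAD_STAT);
}
```

A status of 50h (DriveReady SeekComplete) passes, whereas 51h, the value logged for the broken CD ({ DriveReady SeekComplete Error }), fails, as do 58h with DRQ still set and anything with BUSY set.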
Martin
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20020613112932.C2B8C10A1B@mail.medav.de>]
* Re: Serverworks OSB4 in impossible state
[not found] <20020613112932.C2B8C10A1B@mail.medav.de>
@ 2002-06-13 12:52 ` Martin Wilck
0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-13 12:52 UTC (permalink / raw)
To: Daniela Engert; +Cc: Linux Kernel mailing list
On Thu, 2002-06-13 at 14:32, Daniela Engert wrote:
> I have no idea if the same is happening in case of an aborted ATA DMA
> transfer (I have no bad disk around), but at least I will disable ATAPI
> DMA transfers in my driver in case of early revision (whatever this is)
> OSB4 systems - possibly on all OSB4 systems. According to your
> experiences, the CSB5 and later seem to be fine.
Sorry, bad wording. I meant "OSB4" as opposed to "CSB5/6".
--
Martin Wilck Phone: +49 5251 8 15113
Fujitsu Siemens Computers Fax: +49 5251 8 20409
Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn http://www.fujitsu-siemens.com/primergy
* Serverworks OSB4 in impossible state.
@ 2002-06-03 17:40 Steven Timm
2002-06-04 0:29 ` Alan Cox
0 siblings, 1 reply; 31+ messages in thread
From: Steven Timm @ 2002-06-03 17:40 UTC (permalink / raw)
To: linux-kernel
Configuration: Supermicro 370DLE motherboard, 2x 1 GHz Pentium III,
Red Hat 7.1 plus the 2.4.18-4 kernel as shipped by Red Hat,
three IBM disks (hda=20 GB, hdc and hdd=40 GB), hdb=CD-ROM.
This system and 100-some others like it have had DMA problems of
some kind with every kernel version and with system disks from
three different vendors, but they were fairly stable with the
2.4.9 kernel and IBM system disks, and also with the 2.2.19
kernel and IBM system disks.
Now with 2.4.18 we get the following error, and the
system hangs:
Serverworks OSB4 in impossible state.
Disable UDMA or if you are using Seagate then try switching disk types
on this controller. Please report this event to osb4-bug@ide.cabal.tm
OSB4: continuing might cause disk corruption.
This is the only one of 60 machines thus configured that has
had the error thus far.
Two points:
1) The E-mail address in that kernel debug message doesn't exist.
E-mail bounces back from it.
2) What is causing the hang and are there any hopes to
fix it in software this time? Last year when I came to the kernel
list with problems very similar, the consensus was that this
is actually broken hardware in the OSB4 chipset...but obviously
it is possible for at least some kernels to run quasi-normally
on this hardware... what changed between 2.4.9 and 2.4.18 so
it doesn't anymore?
Steve Timm
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
* Re: Serverworks OSB4 in impossible state.
2002-06-03 17:40 Steven Timm
@ 2002-06-04 0:29 ` Alan Cox
2002-06-03 18:11 ` kwijibo
0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-04 0:29 UTC (permalink / raw)
To: Steven Timm; +Cc: linux-kernel
On Mon, 2002-06-03 at 18:40, Steven Timm wrote:
> Serverworks OSB4 in impossible state.
> Disable UDMA or if you are using Seagate then try switching disk types
> on this controller. Please report this event to osb4-bug@ide.cabal.tm
> OSB4: continuing might cause disk corruption.
>
> This is the only one of 60 machines thus configured that has
> had the error thus far.
>
> Two points:
> 1) The E-mail address in that kernel debug message doesn't exist.
> E-mail bounces back from it.
Oops I'll go fix that small detail. It should have been forwarded to me.
> 2) What is causing the hang and are there any hopes to
> fix it in software this time? Last year when I came to the kernel
> list with problems very similar, the consensus was that this
> is actually broken hardware in the OSB4 chipset...but obviously
> it is possible for at least some kernels to run quasi-normally
> on this hardware... what changed between 2.4.9 and 2.4.18 so
> it doesn't anymore?
The code traps out when it sees the I/O complete and it turns out that
the DMA engine flags say the engine is still running. In this state we
kill the box because we know the next I/O will be written 4 bytes skewed
with the last 4 bytes of the previous I/O apparently repeated at the
start.
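To make the skew concrete, here is a toy model of the symptom only. It is not driver code: osb4_model_skew() and OSB4_SKEW are made-up names, and the model simply assumes the layout described above, with the last 4 bytes of the previous transfer landing first and the new data following 4 bytes late.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OSB4_SKEW 4  /* size of the shift described above, in bytes */

/*
 * Fill out[0..len-1] with what the model says ends up on disk when a
 * transfer of len bytes from next[] follows a previous transfer prev[].
 * The final OSB4_SKEW bytes of next[] are lost in this model.
 */
void osb4_model_skew(const unsigned char *prev, size_t prev_len,
                     const unsigned char *next, size_t len,
                     unsigned char *out)
{
    assert(prev_len >= OSB4_SKEW && len >= OSB4_SKEW);

    /* Tail of the previous I/O is repeated at the start. */
    memcpy(out, prev + prev_len - OSB4_SKEW, OSB4_SKEW);
    /* The intended data follows, shifted by OSB4_SKEW bytes. */
    memcpy(out + OSB4_SKEW, next, len - OSB4_SKEW);
}
```

For a previous transfer "ABCDEFGH" and an intended write of "12345678", the model yields "EFGH1234", which is why continuing after this state on a writable disk risks corruption.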
I took it up with the Serverworks guys at the time, but they were not
able to duplicate the problem or provide advice. Since we could verify
this across an entire rendering farm, it was clearly not a weird one-off
bug. It also doesn't appear to be a Linux bug (but maybe one day I'll be
proved wrong).
If you drop the drives to MWDMA2, you'll see only slightly lower
performance and solid behaviour.
Alan
* Re: Serverworks OSB4 in impossible state.
2002-06-04 0:29 ` Alan Cox
@ 2002-06-03 18:11 ` kwijibo
0 siblings, 0 replies; 31+ messages in thread
From: kwijibo @ 2002-06-03 18:11 UTC (permalink / raw)
To: Alan Cox; +Cc: Steven Timm, linux-kernel
I had this same problem and posted to the list a couple
of weeks ago, but it never got any response. The only
thing I have on the IDE bus is a CD-ROM; the rest is SCSI. I could
mount the CD drive with no problem, but once I tried to read
any data from it I would get the 'impossible state' error. I can
reproduce this at any time, so I don't know why the Serverworks
people can't. Just have them go buy a Dell PowerEdge 1650
and use the CD-ROM. This was with 2.4.18. I found a workaround,
however: I just turned off DMA and it worked fine
again. I guess DMA is turned on by default. Turning DMA off on a
hard drive would hurt performance, though, so I'm not sure what
you could do there.
Steven
Alan Cox wrote:
>On Mon, 2002-06-03 at 18:40, Steven Timm wrote:
>
>
>>Serverworks OSB4 in impossible state.
>>Disable UDMA or if you are using Seagate then try switching disk types
>>on this controller. Please report this event to osb4-bug@ide.cabal.tm
>>OSB4: continuing might cause disk corruption.
>>
>>This is the only one of 60 machines thus configured that has
>>had the error thus far.
>>
>>Two points:
>>1) The E-mail address in that kernel debug message doesn't exist.
>>E-mail bounces back from it.
>>
>>
>
>Oops I'll go fix that small detail. It should have been forwarded to me.
>
>
>
>>2) What is causing the hang and are there any hopes to
>>fix it in software this time? Last year when I came to the kernel
>>list with problems very similar, the consensus was that this
>>is actually broken hardware in the OSB4 chipset...but obviously
>>it is possible for at least some kernels to run quasi-normally
>>on this hardware... what changed between 2.4.9 and 2.4.18 so
>>it doesn't anymore?
>>
>>
>
>The code traps out when it sees the I/O complete and it turns out that
>the DMA engine flags say the engine is still running. In this state we
>kill the box because we know the next I/O will be written 4 bytes skewed
>with the last 4 bytes of the previous I/O apparently repeated at the
>start.
>
>I took it up with the Serverworks guys at the time, but they were not
>able to duplicate the problem and provide advice. Since we could verify
>this across an entire rendering farm it was clearly not a weird one off
>bug. It also doesn't appear to be a Linux bug (but maybe one day I'll be
>proved wrong).
>
>If you drop the drives to MWDMA2 you'll see only slightly lower
>performance and solid behaviour
>
>Alan
Thread overview: 31+ messages
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
2002-06-10 16:41 ` Daniela Engert
2002-06-11 7:22 ` Martin Wilck
2002-06-11 7:45 ` Daniela Engert
2002-06-11 8:37 ` Martin Wilck
2002-06-11 11:25 ` Martin Wilck
2002-06-11 21:27 ` Chris Wedgwood
2002-06-12 7:24 ` Martin Wilck
2002-06-13 11:50 ` Daniela Engert
2002-06-13 11:59 ` Martin Wilck
2002-06-13 12:04 ` Daniela Engert
2002-06-13 18:27 ` rico-linux-kernel
2002-06-13 23:48 ` Re[2]: " Nerijus Baliunas
2002-06-12 8:58 ` Alan Cox
2002-06-12 8:47 ` Martin Wilck
2002-06-12 9:14 ` Alan Cox
2002-06-12 10:30 ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
2002-06-12 20:35 ` Christian Zoffoli
[not found] <1030002761.32380.27.camel@pluto.unixpac.com.au>
2002-08-22 8:35 ` ServerWorks OSB4 in impossible state Martin Wilck
2002-08-22 8:51 ` Andre Hedrick
2002-08-22 12:02 ` Martin Wilck
2002-08-22 16:45 ` Tomas Szepe
2002-08-22 17:48 ` Andre Hedrick
2002-08-22 17:59 ` Alan Cox
2002-08-22 18:14 ` Tomas Szepe
2002-08-22 17:58 ` Alan Cox
2002-08-22 18:58 ` Martin Wilck
[not found] <20020613112932.C2B8C10A1B@mail.medav.de>
2002-06-13 12:52 ` Serverworks " Martin Wilck
-- strict thread matches above, loose matches on Subject: below --
2002-06-03 17:40 Steven Timm
2002-06-04 0:29 ` Alan Cox
2002-06-03 18:11 ` kwijibo