Serverworks OSB4 in impossible state

public inbox for linux-kernel@vger.kernel.org

* Serverworks OSB4 in impossible state
@ 2002-06-10 15:52 Martin Wilck
  2002-06-10 16:41 ` Daniela Engert
  2002-06-12  8:58 ` Alan Cox
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-10 15:52 UTC
  To: osb4-bug; +Cc: Linux Kernel mailing list, Martin Wilck

[-- Attachment #1: Type: text/plain, Size: 1574 bytes --]


Hello,

I know a similar problem was discussed here a short while ago.
However, here we have a situation where we can reproduce the problem
reliably. This is a RedHat 2.4.18-4 kernel.

We have a CD with a corrupt last block. If we try to read this block in
PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
attachment (kernel error messages in PIO mode).

The machine has only a CDROM (Mitsumi FX 4830T) attached to the IDE bus
as /dev/hdc. We used no IDE-related boot parameters.

If we read the block in DMA mode (with dd), the machine stalls with the
"impossible state" message.

A PCI bus scan reveals that the I/O register at dma_base+2 does indeed
contain 0xa5 (bit 0 set), which leads to the panic. Normally, a read of
that register returns 0xa0.

We see in our PCI bus scan that a successful DMA of 4096 bytes was
carried out ~23 ms before the stall condition. Another 4096-byte request
was scheduled but never appeared in the trace. Between the successful DMA
and the stall condition we see nothing but a few timer interrupts.
Then an IDE interrupt occurs, which leads immediately to the panic.

The CD-ROM drive certainly reports some sort of error, as in the PIO
case, when trying to access the last block. This seems to be the
(indirect) reason why the bus master active bit in dma_base+2 remains
set long after the DMA has finished.

Any ideas/comments?

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

[-- Attachment #2: Kernel error messages in PIO-mode --]
[-- Type: text/plain, Size: 3573 bytes --]

Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:48 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716

[-- Attachment #3: dmesg --]
[-- Type: text/plain, Size: 15690 bytes --]

ACPI table found: RSDT v1 [PTLTD    RSDT   1540.1]
__va_range(0xbfefc0f9, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefc0f9, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [FSC    D1309    1540.1]
__va_range(0xbfefeef8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefeef8, 0x50): idx=8 mapped at ffff6000
ACPI table found: SPCR v1 [PTLTD  $UCRTBL$ 1540.1]
__va_range(0xbfefef48, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [PTLTD  	 APIC   1540.1]
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x6] enabled[1])
CPU 0 (0x0600) enabledProcessor #6 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 1 (0x0000) enabledProcessor #0 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0002] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0003] id[0x7] enabled[1])
CPU 3 (0x0700) enabledProcessor #7 Unknown CPU [15:2] APIC version 16

IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x3] address[0xfec10000] global_irq_base[0x10])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1] trigger[0x1])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x3] trigger[0x3])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0xbfefefd8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefefd8, 0x28): idx=8 mapped at ffff6000
ACPI table found: BOOT v1 [PTLTD  $SBFTBL$ 1540.1]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: FSCD1309 Product ID: PRIMERGY     APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC10000.
Processors: 4
Kernel command line: ro root=/dev/sda2
Initializing CPU#0
Detected 2395.457 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4771.02 BogoMIPS
Memory: 3098776k/3145728k available (1232k kernel code, 46496k reserved, 842k data, 304k init, 2228160k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 65536 (order: 7, 524288 bytes)
Buffer cache hash table entries: 262144 (order: 8, 1048576 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU0: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
per-CPU timeslice cutoff: 1462.93 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 2/1 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#2.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU2: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 3/7 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#3.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU3: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Total of 4 processors activated (19123.40 BogoMIPS).
cpu_sibling_map[0] = 3
cpu_sibling_map[1] = 2
cpu_sibling_map[2] = 1
cpu_sibling_map[3] = 0
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 3-0, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 02000000
.......     : arbitration: 02
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 00F 0F  0    0    0   0   0    1    1    39
 02 00F 0F  0    0    0   0   0    1    1    31
 03 00F 0F  0    0    0   0   0    1    1    41
 04 00F 0F  0    0    0   0   0    1    1    49
 05 00F 0F  0    0    0   0   0    1    1    51
 06 00F 0F  0    0    0   0   0    1    1    59
 07 00F 0F  0    0    0   0   0    1    1    61
 08 00F 0F  0    0    0   0   0    1    1    69
 09 00F 0F  1    1    0   1   0    1    1    71
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 00F 0F  0    0    0   0   0    1    1    79
 0d 00F 0F  0    0    0   0   0    1    1    81
 0e 00F 0F  0    0    0   0   0    1    1    89
 0f 00F 0F  0    0    0   0   0    1    1    91

IO APIC #3......
.... register #00: 03000000
.......    : physical APIC id: 03
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 03000000
.......     : arbitration: 03
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 000 00  1    0    0   0   0    0    0    00
 02 000 00  1    0    0   0   0    0    0    00
 03 000 00  1    0    0   0   0    0    0    00
 04 000 00  1    0    0   0   0    0    0    00
 05 000 00  1    0    0   0   0    0    0    00
 06 000 00  1    0    0   0   0    0    0    00
 07 000 00  1    0    0   0   0    0    0    00
 08 000 00  1    0    0   0   0    0    0    00
 09 000 00  1    0    0   0   0    0    0    00
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 000 00  1    0    0   0   0    0    0    00
 0d 00F 0F  1    1    0   1   0    1    1    99
 0e 00F 0F  1    1    0   1   0    1    1    A1
 0f 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ29 -> 1:13
IRQ30 -> 1:14
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2395.2498 MHz.
..... host bus clock speed is 99.8019 MHz.
cpu: 0, clocks: 998019, slice: 199603
CPU0<T0:998016,T1:798400,D:13,S:199603,C:998019>
cpu: 2, clocks: 998019, slice: 199603
cpu: 3, clocks: 998019, slice: 199603
cpu: 1, clocks: 998019, slice: 199603
CPU1<T0:998016,T1:598800,D:10,S:199603,C:998019>
CPU2<T0:998016,T1:399200,D:7,S:199603,C:998019>
CPU3<T0:998016,T1:199600,D:4,S:199603,C:998019>
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfd9aa, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI->APIC IRQ transform: (B0,I5,P0) -> 30
PCI->APIC IRQ transform: (B0,I15,P0) -> 9
PCI->APIC IRQ transform: (B2,I10,P0) -> 29
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
VFS: Diskquotas version dquot_6.5.0 initialized
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
SvrWks CSB5: simplex device: DMA forced
    ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:pio, hdb:pio
SvrWks CSB5: simplex device: DMA forced
    ide1: BM-DMA at 0x1808-0x180f, BIOS settings: hdc:pio, hdd:pio
hdc: FX4830T, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 32768 buckets, 256Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 220k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
sym53c8xx: at PCI bus 2, device 10, function 0
sym53c8xx: 53c1010-66 detected with Symbios NVRAM
sym53c1010-66-0: rev 0x1 on pci bus 2 device 10 function 0 irq 29
sym53c1010-66-0: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-66-0: on-chip RAM at 0xfe000000
sym53c1010-66-0: restart (scsi reset).
sym53c1010-66-0: handling phase mismatch from SCRIPTS.
sym53c1010-66-0: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
blk: queue f7fd6e18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST318451LC        Rev: 7500
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue f7fd6c18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST318451LC        Rev: 7500
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue f7fd6a18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SDR       Model: GEM318            Rev: 0   
  Type:   Processor                          ANSI SCSI revision: 02
blk: queue f7308e18, I/O limit 4095Mb (mask 0xffffffff)
sym53c1010-66-0-<0,0>: tagged command queue depth set to 8
sym53c1010-66-0-<1,0>: tagged command queue depth set to 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sym53c1010-66-0-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sda: 35843671 512-byte hdwr sectors (18352 MB)
Partition check:
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sym53c1010-66-0-<1,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sdb: 35843671 512-byte hdwr sectors (18352 MB)
 sdb:
Journalled Block Device driver loaded
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 304k freed
Adding Swap: 1702848k swap-space (priority -1)
Adding Swap: 1694816k swap-space (priority -2)
Adding Swap: 1702848k swap-space (priority -3)
Adding Swap: 1702848k swap-space (priority -4)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xf8bc5000, IRQ 9
usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,2), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:05:29:74:92, IRQ 30.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdc: DMA disabled

[-- Attachment #4: /proc/ide/ide1/hdc/settings --]
[-- Type: text/plain, Size: 1199 bytes --]

name			value		min		max		mode
----			-----		---		---		----
breada_readahead        4               0               127             rw
current_speed           66              0               69              rw
dsc_overlap             0               0               1               rw
file_readahead          0               0               2097151         rw
ide_scsi                0               0               1               rw
init_speed              66              0               69              rw
io_32bit                0               0               3               rw
keepsettings            0               0               1               rw
max_kb_per_request      64              1               127             rw
nice1                   1               0               1               rw
number                  2               0               3               rw
pio_mode                write-only      0               255             w
slow                    0               0               1               rw
unmaskirq               0               0               1               rw
using_dma               1               0               1               rw

[-- Attachment #5: /proc/ide/svwks --]
[-- Type: text/plain, Size: 785 bytes --]


                             ServerWorks OSB4/CSB5/CSB6

                            ServerWorks CSB5 Chipset (rev 93)
------------------------------- General Status ---------------------------------
--------------- Primary Channel ---------------- Secondary Channel -------------
                disabled                         disabled
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled:    no               no              yes               no 
UDMA enabled:   no               no              yes               no 
UDMA enabled:   0                0               2                 0
DMA enabled:    2                2               2                 2
PIO  enabled:   ?                ?               4                 ?



* Re: Serverworks OSB4 in impossible state
  2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
@ 2002-06-10 16:41 ` Daniela Engert
  2002-06-11  7:22   ` Martin Wilck
  2002-06-12  8:58 ` Alan Cox
  1 sibling, 1 reply; 31+ messages in thread
From: Daniela Engert @ 2002-06-10 16:41 UTC
  To: Martin Wilck; +Cc: Linux Kernel mailing list

Hello Martin,

On 10 Jun 2002 17:52:58 +0200, Martin Wilck wrote:

>We have a CD with a corrupt last block. If we try to read this block in
>PIO mode (hdparm -d 0 /dev/hdc) , we get errors like in the first
>attachment.

The error code returned is "check condition" with sense key 3, "medium
error". The most appropriate driver action would have been to issue a
REQUEST SENSE command to learn the precise error and to retry only if
there is a good chance that the problem is recoverable - but that's a
different story.

>If we read the block in DMA mode (with dd), the machine stalls with the
>"impossible state" message.
>
>A PCI bus scan reveals that the IO register (dma_base+2) contains indeed
>0xa5 (bit 0 set), which leads to the panic. Normally the read on that
>register returns 0xa0.

The interesting bits of the DMA status register are bits 0 through 2. A
value of 5 indicates the condition "interrupt from unit, DMA state
machine active". This is a valid status! It basically means the unit
issued an interrupt before the PRD table was exhausted. This makes sense
because the CD-ROM unit fails to transfer the amount of data described
by the PRD table due to the non-recoverable read error.

>We see in our PCI bus scan that a successful DMA of 4096 bytes was
>carried out ~23ms before the stall condition. Another 4096 byte request
>was scheduled but never seen. Between the successful DMA and the stall
>condition we see nothing but a few timer interrupts.
>Then an IDE interrupt occurs, which leads immediately to the panic.

What you describe makes sense (the next DMA transfer is scheduled but
never carried out by the CD-ROM unit), except for the panic, of course.
The correct driver action in this case would be to stop the DMA engine
and reset the state machines involved (on both the host and the unit
side).

>Any ideas/comments?

I hope this clears up things a little ...

Ciao,
  Dani


* Re: Serverworks OSB4 in impossible state
  2002-06-10 16:41 ` Daniela Engert
@ 2002-06-11  7:22   ` Martin Wilck
  2002-06-11  7:45     ` Daniela Engert
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-11  7:22 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:

> The interesting bits of the DMA status register are bits 0 through 2. A
> value of 5 indicates the condition "interrupt from unit, DMA state
> machine active". This is a valid status! It basically means the unit
> issued an interrupt before the PRD table was exhausted. This makes sense
> because the CD-ROM unit fails to transfer the amount of data described
> by the PRD table due to the non-recoverable read error.

Shouldn't the error bit be set too? (But that wouldn't make any
difference with the current driver ...)

> What you describe makes sense (the next DMA transfer is scheduled but
> never carried out by the CD-ROM unit), except for the panic, of course.
> The correct driver action in this case would be to stop the DMA engine
> and reset the state machines involved (on both the host and the unit
> side).

The message, the comments in the code, and what Alan wrote here:
http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
suggest that trying to recover from this condition is extremely
dangerous (note that the kernel doesn't even panic(), because
a sync() may kill a disk, the comments say).

Anyway, thanks a lot for your insightful comments.
Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy


* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:22   ` Martin Wilck
@ 2002-06-11  7:45     ` Daniela Engert
  2002-06-11  8:37       ` Martin Wilck
  2002-06-11 11:25       ` Martin Wilck
  0 siblings, 2 replies; 31+ messages in thread
From: Daniela Engert @ 2002-06-11  7:45 UTC
  To: Martin Wilck; +Cc: Linux Kernel mailing list

On 11 Jun 2002 09:22:24 +0200, Martin Wilck wrote:

>On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:

>> The interesting bits of the DMA status register are bits 0 through 2. A
>> value of 5 indicates the condition "interrupt from unit, DMA state
>> machine active". This is a valid status! It basically means the unit
>> issued an interrupt before the PRD table was exhausted. This makes sense
>> because the CD-ROM unit fails to transfer the amount of data described
>> by the PRD table due to the non-recoverable read error.
>
>Shouldn't the error bit be set too? (But that wouldn't make any
>difference with the current driver ...)

No, it shouldn't. The error happens on the unit side of the bus, not on
the host side. So it is correct that the host does *not* report an
error; only the CD-ROM unit does.

>> What you describe makes sense (the next DMA transfer is scheduled but
>> never carried out by the CD-ROM unit), except for the panic, of course.
>> The correct driver action in this case would be to stop the DMA engine
>> and reset the state machines involved (on both the host and the unit
>> side).
>
>The message, the comments in the code, and what Alan wrote here:
>http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
>suggest that trying to recover from this condition is extremely
>dangerous (note that the kernel doesn't even panic(), because
>a sync() may kill a disk, the comments say).

I'm aware of all of that. By pure chance I have a machine with an OSB4
sitting on my desk for a couple of days. Maybe I can find a defective
CD-ROM to test it with my driver and see if it manages to recover from
errors like these. Hopefully, the PCI tracer will give some more insight.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11

* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:45     ` Daniela Engert
@ 2002-06-11  8:37       ` Martin Wilck
  2002-06-11 11:25       ` Martin Wilck
  1 sibling, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-11  8:37 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Tue, 2002-06-11 at 09:45, Daniela Engert wrote:

> I'm aware of all of that. By pure chance I have a machine with an OSB4
> sitting on my desk for a couple of days. Maybe I can find a defective
> CD-ROM to test it with my driver and see if it manages to recover from
> errors like these. Hopefully, the PCI tracer will give some more insight.

Do you have a custom version of the driver (you wrote "my driver")? If
so, could you send it so that I can test it, too?

Can you point me to any reference material on the web?

Martin
-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy


* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:45     ` Daniela Engert
  2002-06-11  8:37       ` Martin Wilck
@ 2002-06-11 11:25       ` Martin Wilck
  2002-06-11 21:27         ` Chris Wedgwood
  2002-06-13 11:50         ` Daniela Engert
  1 sibling, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-11 11:25 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list, Alan Cox

[Alan, I am cc'ing you on this because I read elsewhere that you want 
osb4-bug@ide.cabal.tm to be forwarded to you, and that address still
bounces]. 

I have tried the following:

- comment out the code that stalls the machine when the condition in
  question is encountered.
- run dd over a couple of good blocks on the CD.
- run dd over the corrupted blocks. This leads now to very similar
  errors as in the PIO case.
- re-enable DMA with hdparm, because the ide-cd driver disables it
  automatically when an error occurs (why? The error has nothing to
  do with DMA here).
- repeat the first dd command on the good blocks and compare the
  results.

The results are identical, so I cannot confirm the "4 byte shift" Alan
has been talking about. Of course, this is a CD-ROM-only scenario, so
I can't say anything about hard disks.

Is it possible that the 4-byte shift occurs only with some particular
(older?) version of the chipset? 

In any case, the condition that usually causes Linux to stall is 
indeed a perfectly valid condition for DMA when the device transfers
less data than it's supposed to. I doubt that hanging the system 
without more detailed checks is the right measure to take there.
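Martin's point matches the completion table in the SFF-8038i bus-master IDE specification, as I read it: an interrupt with the active bit still set means the device finished before the PRD table was used up, and the spec lists that as a valid completion. The sketch below classifies a raw byte read from dma_base+2 under that reading. The macro, struct and enum names are illustrative assumptions, not identifiers from drivers/ide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low bits of the bus-master IDE status register (dma_base + 2), as
 * laid out in SFF-8038i. Bits 5-7 (DMA capable, simplex) are ignored.
 * Names are hypothetical, chosen for this sketch. */
#define BM_STATUS_ACTIVE    0x01u /* engine has not finished the PRD table */
#define BM_STATUS_ERROR     0x02u /* error while transferring to/from memory */
#define BM_STATUS_INTERRUPT 0x04u /* device has asserted its interrupt */

struct bm_status {
    bool active;
    bool error;
    bool interrupt;
};

enum bm_completion {
    BM_IN_PROGRESS,   /* no IRQ, active: transfer still running */
    BM_DONE_NORMAL,   /* IRQ, idle: PRD table used up exactly */
    BM_DONE_SHORT,    /* IRQ, active: device sent less than the PRDs allow */
    BM_PRD_TOO_SMALL, /* no IRQ, idle, no error: PRDs ran out first */
    BM_DMA_ERROR      /* no IRQ, idle, error bit set */
};

void bm_decode(uint8_t raw, struct bm_status *out)
{
    assert(out != NULL);
    out->active = (raw & BM_STATUS_ACTIVE) != 0;
    out->error = (raw & BM_STATUS_ERROR) != 0;
    out->interrupt = (raw & BM_STATUS_INTERRUPT) != 0;
}

enum bm_completion bm_classify(uint8_t raw)
{
    struct bm_status s;

    bm_decode(raw, &s);
    if (s.interrupt)
        return s.active ? BM_DONE_SHORT : BM_DONE_NORMAL;
    if (s.active)
        return BM_IN_PROGRESS;
    return s.error ? BM_DMA_ERROR : BM_PRD_TOO_SMALL;
}
```

Under this reading, the 0xa5 reported at the start of the thread has both bit 0 (active) and bit 2 (interrupt) set, so it classifies as BM_DONE_SHORT: a short transfer rather than an impossible state.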

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: Serverworks OSB4 in impossible state
  2002-06-11 11:25       ` Martin Wilck
@ 2002-06-11 21:27         ` Chris Wedgwood
  2002-06-12  7:24           ` Martin Wilck
  2002-06-13 11:50         ` Daniela Engert
  1 sibling, 1 reply; 31+ messages in thread
From: Chris Wedgwood @ 2002-06-11 21:27 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Daniela Engert, Linux Kernel mailing list, Alan Cox

On Tue, Jun 11, 2002 at 01:25:25PM +0200, Martin Wilck wrote:

    Is it possible that the 4-byte shift occurs only with some
    particular (older?) version of the chipset?

Maybe.

I have an oldish OSB4 here, and beating on it with only the CD-ROM
(the disks are all SCSI), I never seem to see this problem:

00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 48

00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 48

I think what is really required is input from ServerWorks/Broadcom
about this.

  --cw

* Re: Serverworks OSB4 in impossible state
  2002-06-11 21:27         ` Chris Wedgwood
@ 2002-06-12  7:24           ` Martin Wilck
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-12  7:24 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linux Kernel mailing list

On Tue, 2002-06-11 at 23:27, Chris Wedgwood wrote:

> I have an oldish OSB4 here and beating on it only with the CDROM
> (disks are all SCSI) I don't ever seem to see this problem:

UDMA33 mode? You need a broken CD (we happen to have a CD burner
that generates broken CDs).

> I think what is really required is input from ServerWorks/Broadcom
> about this.

Yeah, we are in contact with them.
Thanks,
Martin


-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: Serverworks OSB4 in impossible state
  2002-06-11 11:25       ` Martin Wilck
  2002-06-11 21:27         ` Chris Wedgwood
@ 2002-06-13 11:50         ` Daniela Engert
  2002-06-13 11:59           ` Martin Wilck
  2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
  1 sibling, 2 replies; 31+ messages in thread
From: Daniela Engert @ 2002-06-13 11:50 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list

Hi,

as promised, I've conducted a test similar to Martin's to check the
behaviour of a Serverworks ROSB4 IDE controller in case of an aborted
ATAPI DMA transfer (probably due to a media error). In fact, I've done
this by comparing it with a known-to-be-good system, a dual processor
Intel BX based board with a PIIX4 IDE controller chip.

The following trace shows how it should be:

 - lines 158-172: setup DMA transfer, send command packet
 - lines 173-174: the DMA engine loads the first (of multiple)
                  PRD entry
 - the actual DMA induced memory writes are not shown here
 - line  175:     IRQ14 is acknowledged
 - lines 176-181: gather unit and DMA status
 - lines 182-210: issue "request sense" and get sense status


CD-ROM read error on Intel PIIX4:

______Time_______Burst_BE#__Wait___Command_Address__Data____
  158  20.089ms  .     1011    .   I/OWri  000001F6 ..B0....
  159	1.656us  .     1011    .   I/ORd   000003F6 ..50....
  160	10.08us  .     0000    .   I/OWri  0000F004 00EAC800
  161	812.7ns  .     1110    .   I/OWri  0000F000 ......08
  162	752.5ns  .     1011    .   I/OWri  0000F002 ..46....
  163	3.100us  .     1110    .   I/OWri  000001F4 ......FF
  164	4.214us  .     1101    .   I/OWri  000001F5 ....FF..
  165	4.244us  .     1101    .   I/OWri  000001F1 ....01..
  166	4.214us  .     0111    .   I/OWri  000001F7 A0......
  167	5.027us  .     1011    .   I/ORd   000003F6 ..58....
  168	5.448us  .     1011    .   I/OWri  0000F002 ..46....
  169	752.5ns  .     1110    .   I/OWri  0000F000 ......09
  170	1.204us  .     0000    .   I/OWri  000001F0 00000028
  171	903.0ns  .     0000    .   I/OWri  000001F0 0000440D
  172	903.0ns  .     0000    .   I/OWri  000001F0 0000001F
  173	 3.673s  Start 0000    .   MemRd   00EAC800 006E3000
  174	 30.1ns  B     0000    .   MemRd   00EAC800 0000D000
  175  648.99ms  .     1110    .   IntAck  ........ ......76
  176	5.779us  .     0111    .   I/ORd   000001F7 51......
  177	11.47us  .     0111    .   I/ORd   000001F7 51......
  178	1.324us  .     1011    .   I/ORd   000001F2 ..03....
  179	1.957us  .     1110    .   I/OWri  0000F000 ......08
  180	812.7ns  .     1011    .   I/ORd   0000F002 ..44....
  181	1.355us  .     1101    .   I/ORd   000001F1 ....30..
  182	9.361us  .     1011    .   I/OWri  000001F6 ..B0....
  183	1.806us  .     1011    .   I/ORd   000003F6 ..51....
  184	9.301us  .     1110    .   I/OWri  000001F4 ......12
  185	4.274us  .     1101    .   I/OWri  000001F5 ....00..
  186	4.244us  .     1101    .   I/OWri  000001F1 ....00..
  187	4.214us  .     0111    .   I/OWri  000001F7 A0......
  188	4.906us  .     1011    .   I/ORd   000003F6 ..58....
  189	6.020us  .     0000    .   I/OWri  000001F0 00000003
  190	903.0ns  .     0000    .   I/OWri  000001F0 00000012
  191	903.0ns  .     0000    .   I/OWri  000001F0 00000000
  192  258.17us  .     1110    .   IntAck  ........ ......76
  193	3.431us  .     0111    .   I/ORd   000001F7 58......
  194	10.08us  .     0111    .   I/ORd   000001F7 58......
  195	1.204us  .     1011    .   I/ORd   000001F2 ..02....
  196	1.535us  .     1101    .   I/ORd   000001F5 ....00..
  197	1.174us  .     1110    .   I/ORd   000001F4 ......12
  198	10.20us  .     1100    .   I/ORd   000001F0 ....0070
  199	1.475us  .     1100    .   I/ORd   000001F0 ....0003
  200	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  201	632.1ns  .     1100    .   I/ORd   000001F0 ....0A00
  202	602.0ns  .     1100    .   I/ORd   000001F0 ....0000
  203	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  204	602.0ns  .     1100    .   I/ORd   000001F0 ....0611
  205	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  206	602.0ns  .     1100    .   I/ORd   000001F0 ....0000
  207	12.79us  .     1110    .   IntAck  ........ ......76
  208	3.401us  .     0111    .   I/ORd   000001F7 50......
  209	9.361us  .     0111    .   I/ORd   000001F7 50......
  210	1.234us  .     1011    .   I/ORd   000001F2 ..03....
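The sense data read back at trace lines 198-206 (nine 16-bit reads, matching the 0x12 = 18-byte allocation length in the REQUEST SENSE packet at lines 189-191) can be decoded to check the media-error guess. A sketch, assuming fixed-format sense data and x86 PIO byte order (the first byte of each pair in the low half of each word); the helper names are hypothetical. Under those assumptions the words give response code 0x70, sense key 3 (MEDIUM ERROR) and ASC/ASCQ 11h/06h, which, as far as I know, the SCSI/MMC tables list as a CIRC unrecovered error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of fixed-format sense data used here (hypothetical struct). */
struct sense_info {
    uint8_t response_code; /* byte 0, bits 6:0; 0x70 = current error */
    uint8_t sense_key;     /* byte 2, bits 3:0 */
    uint8_t asc;           /* byte 12: additional sense code */
    uint8_t ascq;          /* byte 13: additional sense code qualifier */
};

/* Rebuild a byte stream from 16-bit data-register reads, assuming the
 * first byte of each pair arrives in the low half of the word. */
void sense_bytes_from_words(const uint16_t *words, size_t nwords,
                            uint8_t *bytes, size_t nbytes)
{
    assert(words != NULL && bytes != NULL);
    assert(nbytes >= 2 * nwords);
    for (size_t i = 0; i < nwords; i++) {
        bytes[2 * i] = (uint8_t)(words[i] & 0xffu);
        bytes[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
}

/* Returns 0 on success, -1 if the buffer is too short for ASC/ASCQ. */
int sense_decode(const uint8_t *bytes, size_t n, struct sense_info *out)
{
    assert(bytes != NULL && out != NULL);
    if (n < 14)
        return -1;
    out->response_code = (uint8_t)(bytes[0] & 0x7fu);
    out->sense_key = (uint8_t)(bytes[2] & 0x0fu);
    out->asc = bytes[12];
    out->ascq = bytes[13];
    return 0;
}
```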


And here is the same with the ROSB4. This time, some of the
DMA writes are shown. After loading the second PRD entry
which describes a memory region of 7800h bytes, 3000h bytes
are transferred before IRQ14 is asserted. The IRQ14 INTACK
cycle is the last transaction on the PCI bus ever, the
machine is completely frozen!

CD-ROM read error on ServerWorks ROSB4 revision 0:

______Time_______Burst_BE#__Wait___Command_Address__Data____
51316  297.63us  .     1011    .   I/OWri  000001F6 ..B0....
51317	1.530us  .     1011    .   I/ORd   000003F6 ..50....
51318	6.300us  .     0000    .   I/OWri  00005404 00EF2800
51319	  450ns  .     1110    .   I/OWri  00005400 ......08
51320	  450ns  .     1011    .   I/OWri  00005402 ..66....
51321	1.440us  .     1110    .   I/OWri  000001F4 ......FF
51322	3.480us  .     1101    .   I/OWri  000001F5 ....FF..
51323	3.480us  .     1101    .   I/OWri  000001F1 ....01..
51324	3.510us  .     0111    .   I/OWri  000001F7 A0......
51325	4.470us  .     1011    .   I/ORd   000003F6 ..58....
51326	4.620us  .     1011    .   I/OWri  00005402 ..66....
51327	  660ns  .     0000    .   I/OWri  000001F0 00000028
51328	  420ns  .     0000    .   I/OWri  000001F0 0000F80D
51329	  420ns  .     0000    .   I/OWri  000001F0 0000001F
51330	1.290us  .     1011    .   I/ORd   000003F6 ..D0....
51331	3.660us  .     1110    .   I/OWri  00005400 ......09
51332	1.290us  .     0000    .   MemRd   00EF2800 00B08000
51333	  630ns  .     0000    .   MemRd   00EF2804 00008000
51334  166.11us  Start 0000    .   MemWri  00B08000 7BC0728C
51335	   30ns  B     0000    .   MemWri  00B08000 285DA7D0
51336	   30ns  B     0000    .   MemWri  00B08000 9FAE557A
51337	   30ns  B     0000    .   MemWri  00B08000 B3F88165
51338	   30ns  B     0000    .   MemWri  00B08000 BDFD7823
51339	   30ns  B     0000    .   MemWri  00B08000 42ED22D0
51340	   30ns  B     0000    .   MemWri  00B08000 7BA5743F
51341	   30ns  B     0000    .   MemWri  00B08000 6B5897BA
51342	  780ns  Start 0000    .   MemWri  00B08020 ACF1D36B
  ..
  ..
59518	  930ns  Start 0000    .   MemWri  00B0FFE0 845971B8
59519	   30ns  B     0000    .   MemWri  00B0FFE0 7E325F95
59520	   30ns  B     0000    .   MemWri  00B0FFE0 7ADA36D0
59521	   30ns  B     0000    .   MemWri  00B0FFE0 96BD435C
59522	   30ns  B     0000    .   MemWri  00B0FFE0 4ED88CB0
59523	   30ns  B     0000    .   MemWri  00B0FFE0 2E1CCAF7
59524	   30ns  B     0000    .   MemWri  00B0FFE0 FC8782B3
59525	   30ns  B     0000    .   MemWri  00B0FFE0 9C0A2335
59526	  780ns  .     0000    .   MemRd   00EF2808 00B10000
59527	  630ns  .     0000    .   MemRd   00EF280C 80007800
59528  1.2518ms  Start 0000    .   MemWri  00B10000 E85C33CD
59529	   30ns  B     0000    .   MemWri  00B10000 AD2F9613
59530	   30ns  B     0000    .   MemWri  00B10000 D8BEC924
59531	   30ns  B     0000    .   MemWri  00B10000 E273C0BD
59532	   30ns  B     0000    .   MemWri  00B10000 DC655F5E
59533	   30ns  B     0000    .   MemWri  00B10000 69B3087B
59534	   30ns  B     0000    .   MemWri  00B10000 369B26D1
59535	   30ns  B     0000    .   MemWri  00B10000 9A8C47DF
59536	  780ns  Start 0000    .   MemWri  00B10020 3F026EA5
  ..
  ..
62592	  750ns  Start 0000    .   MemWri  00B12FE0 367016E1
62593	   30ns  B     0000    .   MemWri  00B12FE0 35654905
62594	   30ns  B     0000    .   MemWri  00B12FE0 9968FF02
62595	   30ns  B     0000    .   MemWri  00B12FE0 9ABB5CAE
62596	   30ns  B     0000    .   MemWri  00B12FE0 D32DF135
62597	   30ns  B     0000    .   MemWri  00B12FE0 7A03326A
62598	   30ns  B     0000    .   MemWri  00B12FE0 86CCE8BF
62599	   30ns  B     0000    .   MemWri  00B12FE0 D4E66D21
62600	 1.176s  .     1110    .   IntAck  ........ ......76
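The numbers in this trace can be cross-checked by decoding the command packet (trace lines 51327-51329) and the two PRD entries fetched at 51332-51333 and 59526-59527. A sketch, assuming the SFF-8038i PRD layout (a 32-bit buffer address, then a dword whose bits 15:0 hold the byte count, 0 meaning 64 KiB, and whose bit 31 marks the last entry), x86 little-endian byte order on the data port, and 2048-byte CD sectors. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRD_EOT 0x80000000u /* bit 31 of the second dword: last entry */

struct prd {
    uint32_t addr;        /* physical address of the buffer */
    uint32_t flags_count; /* bit 31 = EOT, bits 15:0 = byte count */
};

/* Byte count of one entry; a count field of 0 means 64 KiB. */
uint32_t prd_bytes(const struct prd *p)
{
    uint32_t count;

    assert(p != NULL);
    count = p->flags_count & 0xffffu;
    return count ? count : 0x10000u;
}

/* Total size described by the table, stopping at the EOT entry. */
uint32_t prd_table_bytes(const struct prd *table, size_t n)
{
    uint32_t total = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += prd_bytes(&table[i]);
        if (table[i].flags_count & PRD_EOT)
            break;
    }
    return total;
}

struct read10 {
    uint32_t lba;    /* packet bytes 2-5, big-endian */
    uint16_t blocks; /* packet bytes 7-8, big-endian */
};

/* Rebuild the 12-byte ATAPI packet from three 32-bit writes to the
 * data port (first byte in the low byte) and decode it as READ(10).
 * Returns -1 if the opcode is not 0x28. */
int read10_from_dwords(const uint32_t *dw, struct read10 *out)
{
    uint8_t cdb[12];

    assert(dw != NULL && out != NULL);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            cdb[4 * i + j] = (uint8_t)(dw[i] >> (8 * j));
    if (cdb[0] != 0x28)
        return -1;
    out->lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
               (uint32_t)cdb[4] << 8 | cdb[5];
    out->blocks = (uint16_t)(cdb[7] << 8 | cdb[8]);
    return 0;
}
```

With the trace values, the packet is a READ(10) of 31 blocks at LBA 0xDF8, the PRD table covers 0x8000 + 0x7800 = 0xF800 = 31 * 2048 bytes, and the 0x8000 + 0x3000 = 0xB000 bytes written before IRQ14 amount to 22 sectors.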


My conclusion: don't do ATAPI DMA on a ServerWorks ROSB4 revision 0 IDE
controller.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11

* Re: Serverworks OSB4 in impossible state
  2002-06-13 11:50         ` Daniela Engert
@ 2002-06-13 11:59           ` Martin Wilck
  2002-06-13 12:04             ` Daniela Engert
  2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-13 11:59 UTC (permalink / raw)
  To: Daniela Engert; +Cc: Alan Cox, Linux Kernel mailing list

On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:

> And here is the same with the ROSB4. This time, some of the
> DMA writes are shown. After loading the second PRD entry
> which describes a memory region of 7800h bytes, 3000h bytes
> are transferred before IRQ14 is asserted. The IRQ14 INTACK
> cycle is the last transaction on the PCI bus ever, the
> machine is completely frozen!

You say (dma_base+2) is never read?
Was that a Linux system? If yes, I assume you never saw "OSB4 in
impossible state ..." ?

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: Serverworks OSB4 in impossible state
  2002-06-13 11:59           ` Martin Wilck
@ 2002-06-13 12:04             ` Daniela Engert
  2002-06-13 18:27               ` rico-linux-kernel
  0 siblings, 1 reply; 31+ messages in thread
From: Daniela Engert @ 2002-06-13 12:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list

On 13 Jun 2002 13:59:06 +0200, Martin Wilck wrote:

>On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:

>> are transferred before IRQ14 is asserted. The IRQ14 INTACK
>> cycle is the last transaction on the PCI bus ever, the
>> machine is completely frozen!
>
>You say (dma_base+2) is never read?

Exactly. I checked this twice; the PCI tracer was configured to gather
*all* PCI bus events.

>Was that a Linux system?

No, I think this doesn't matter here at all, because the hardware
stalls completely - full stop.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11

* Re: Serverworks OSB4 in impossible state
  2002-06-13 12:04             ` Daniela Engert
@ 2002-06-13 18:27               ` rico-linux-kernel
  0 siblings, 0 replies; 31+ messages in thread
From: rico-linux-kernel @ 2002-06-13 18:27 UTC (permalink / raw)
  To: dani; +Cc: linux-kernel

Thanks for investing time on the logic analyser, Dani.  My experience
is slightly different.

I have several mainboards (Tyan S1867) with older chipsets from
ServerWorks (f.k.a. Reliance).  The IDE controller (OSB4 rev 0) is used
daily with ATAPI CDRW drives in UDMA(33) mode.  The system handles
read/write errors without problems.

The system will lock solid when both IDE channels are accessed and
either one is using DMA.  Since I want DMA, I simply abandon the
secondary channel.

I have spare machines available for quack medical experiments.

Select boot-time info...

Linux version 2.4.17 (rico@pc2) (gcc version 2.95.3 20010315 (release)) #1 SMP Mon Dec 31 11:51:33 CST 2001
ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
ServerWorks OSB4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfcb0-0xfcb7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xfcb8-0xfcbf, BIOS settings: hdc:pio, hdd:pio
hda: PLEXTOR CD-R PX-W2410A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 40X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12

* Re[2]: Serverworks OSB4 in impossible state
  2002-06-13 11:50         ` Daniela Engert
  2002-06-13 11:59           ` Martin Wilck
@ 2002-06-13 23:48           ` Nerijus Baliunas
  1 sibling, 0 replies; 31+ messages in thread
From: Nerijus Baliunas @ 2002-06-13 23:48 UTC (permalink / raw)
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Thu, 13 Jun 2002 13:50:25 +0200 (CDT) Daniela Engert <dani@ngrt.de> wrote:

> My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
> controller.

How can I find the revision? I have a problem with (Seagate) HDDs, but lspci -v
only shows:

00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 51)
        Subsystem: ServerWorks OSB4 South Bridge
        Flags: bus master, medium devsel, latency 0


00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 2000 [size=16]


Regards,
Nerijus

* Re: Serverworks OSB4 in impossible state
  2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
  2002-06-10 16:41 ` Daniela Engert
@ 2002-06-12  8:58 ` Alan Cox
  2002-06-12  8:47   ` Martin Wilck
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-12  8:58 UTC (permalink / raw)
  To: Martin Wilck; +Cc: osb4-bug, Linux Kernel mailing list, Martin Wilck

Triggering the check on CSB5/CSB6 would be a bug. Maybe an extra
test is needed there, as CSB5/6 are fine.

* Re: Serverworks OSB4 in impossible state
  2002-06-12  8:58 ` Alan Cox
@ 2002-06-12  8:47   ` Martin Wilck
  2002-06-12  9:14     ` Alan Cox
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-12  8:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list

On Wed, 2002-06-12 at 10:58, Alan Cox wrote:
> Triggering the check on csb5/csb6 would be a bug - maybe an extra 
> test is needed there as CSB5/6 are fine

Currently the stall is triggered whenever the DMA engine's active bit is
set, with no further conditions.

Would you concur that it would be reasonable to trigger only if

- the chipset version is < CSB5,
- the drive is a hard disk,
- and the drive did not report an error?

(I am not certain about the last condition, but from the descriptions 
of the 4-byte-shift problem I have seen I infer that there was no drive
error condition involved).
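Taken together, the three proposed conditions and the existing active-bit test form a single predicate. A minimal sketch with hypothetical types; in the driver the inputs would come from the drive's media type, the controller's PCI device ID and the drive status register, as the patch later in this thread does with `drive->media`, `PCI_DEVICE_ID_SERVERWORKS_OSB4IDE` and `OK_STAT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's own fields. */
enum media_kind { MEDIA_DISK, MEDIA_CDROM, MEDIA_OTHER };

struct dma_end_state {
    enum media_kind media;
    bool pre_csb5;             /* OSB4, i.e. older than CSB5 */
    uint8_t bm_status;         /* byte read from dma_base + 2 */
    bool drive_reported_error; /* drive status shows an error */
};

/* True only when all conditions hold: a disk on a pre-CSB5 chipset,
 * bus master still active (bit 0), and no error from the drive. */
bool suspect_4byte_shift(const struct dma_end_state *s)
{
    assert(s != NULL);
    return s->media == MEDIA_DISK &&
           s->pre_csb5 &&
           (s->bm_status & 0x01u) != 0 &&
           !s->drive_reported_error;
}
```

Martin's CD-ROM case (media is a CD-ROM and the drive reports an error) yields false, so the stall would no longer trigger for it.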

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: Serverworks OSB4 in impossible state
  2002-06-12  8:47   ` Martin Wilck
@ 2002-06-12  9:14     ` Alan Cox
  2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-12  9:14 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list

> Would you concur that it would be reasonable to trigger only if
> 
> - the chipset version is < CSB5,
> - the drive is a hard disk,
> - and the drive did not report an error?
> 
> (I am not certain about the last condition, but from the descriptions 
> of the 4-byte-shift problem I have seen I infer that there was no drive
> error condition involved).

Entirely agreed

* OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
  2002-06-12  9:14     ` Alan Cox
@ 2002-06-12 10:30       ` Martin Wilck
  2002-06-12 20:35         ` Christian Zoffoli
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-06-12 10:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list

On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> Entirely agreed

I propose this patch to remedy the problem.

I don't know how to test whether the drive is a Seagate drive, and I
don't think we want to, because it would end up as yet another
blacklist.

I cannot test whether this behaves correctly on machines that do exhibit
the 4-byte shift bug; it would be great if somebody could test that.

Martin

--- drivers/ide/serverworks.c.orig	Tue Jun 11 11:24:59 2002
+++ drivers/ide/serverworks.c	Wed Jun 12 12:00:36 2002
@@ -547,7 +547,13 @@
 			ide_hwif_t *hwif		= HWIF(drive);
 			unsigned long dma_base		= hwif->dma_base;
 	
-			if(inb(dma_base+0x02)&1)
+			/* If it's a disk on the OSB4, the DMA engine is still on,
+			   and the device reports no error status, we are probably
+			   facing the "4 byte shift" problem */
+			if(drive->media == ide_disk && 
+			   hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE && 
+			   inb(dma_base+0x02)&1 &&
+			   OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
 			{
 #if 0		
 				int i;


-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
  2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
@ 2002-06-12 20:35         ` Christian Zoffoli
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Zoffoli @ 2002-06-12 20:35 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list

Martin Wilck wrote:
> On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
> > Entirely agreed
> 
> I propose this patch to remedy the problem.
> 
> I don't know how to test if the drive is a seagate drive, and
> I think we don't want to do that, because it would end up in yet another
> blacklist.
> 
> I cannot test if this behaves correctly on machines that do expose the
> 4-byte shift bug - it would be great if somebody could test that.
> 
> Martin
> 
> --- drivers/ide/serverworks.c.orig	Tue Jun 11 11:24:59 2002
> +++ drivers/ide/serverworks.c	Wed Jun 12 12:00:36 2002
> @@ -547,7 +547,13 @@
>  			ide_hwif_t *hwif		= HWIF(drive);
>  			unsigned long dma_base		= hwif->dma_base;
>  	
> -			if(inb(dma_base+0x02)&1)
> +			/* If it's a disk on the OSB4, the DMA engine is still on,
> +			   and the device reports no error status, we are probably
> +			   facing the "4 byte shift" problem */
> +			if(drive->media == ide_disk && 
> +			   hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE && 
> +			   inb(dma_base+0x02)&1 &&
> +			   OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
>  			{
>  #if 0		
>  				int i;
> 
> 


It works for me. I have a Supermicro 370DE6 (ServerWorks HE-SL) and a
Maxtor HD (5T030H3).


Christian

[parent not found: <1030002761.32380.27.camel@pluto.unixpac.com.au>]

* Re: ServerWorks OSB4 in impossible state
       [not found] <1030002761.32380.27.camel@pluto.unixpac.com.au>
@ 2002-08-22  8:35 ` Martin Wilck
  2002-08-22  8:51   ` Andre Hedrick
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Wilck @ 2002-08-22  8:35 UTC (permalink / raw)
  To: Gonzalo Servat; +Cc: Alan Cox, Linux Kernel mailing list

On Thu, 2002-08-22 at 09:52, Gonzalo Servat wrote:

> Do you have any suggestions on how I can work around this problem? It's
> been driving me nuts all day! (I bet it's driven people nuts for
> weeks...). Do you think your patch (as posted on
> http://linux-kernel.skylab.org/20020609/msg00935.html) may help my
> situation? If so, what kernel does it apply to? I looked up
> serverworks.c in a 2.4.19-rc3 tree to see if the patch would apply
> cleanly but it won't because line 547 is different to yours.

It should be fairly easy to adapt the patch; all you need to do is modify
the line
			if(inb(dma_base+0x02)&1)

in svwks_dmaproc() to the more complex condition test in the patch.

Alan, I understood that you wanted to apply this patch. What happened to it?
Do you want me to resubmit it?

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: ServerWorks OSB4 in impossible state
  2002-08-22  8:35 ` ServerWorks OSB4 in impossible state Martin Wilck
@ 2002-08-22  8:51   ` Andre Hedrick
  2002-08-22 12:02     ` Martin Wilck
  0 siblings, 1 reply; 31+ messages in thread
From: Andre Hedrick @ 2002-08-22  8:51 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Gonzalo Servat, Alan Cox, Linux Kernel mailing list

The problem is we need a special DMA engine for this broken puppy.

I am trying to remember the rule for forming the DMA table, and it is not
nice.  The 4-byte issue is a direct result of building an SG list that is
not compatible with the hardware.

508 + 4 is okay but 510 + 2 is not.

Now I have to remember why :-/

IIRC, we have to have 4 byte boundaries on the list.
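Andre's rule of thumb ("508 + 4 is okay but 510 + 2 is not") is consistent with a requirement that every scatter-gather segment start and end on a 4-byte boundary. A sketch of such a check; since Andre himself is still trying to remember the exact rule, both the rule and the names here are assumptions, not a confirmed ServerWorks constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather segment. */
struct sg_seg {
    uint32_t addr; /* physical start address */
    uint32_t len;  /* length in bytes */
};

/* Assumed rule: each segment's start address and length must both be
 * multiples of 4, so every segment also ends on a 4-byte boundary. */
bool osb4_sg_aligned(const struct sg_seg *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if ((segs[i].addr | segs[i].len) & 3u)
            return false;
    return true;
}
```

A split into 508 + 4 bytes passes this check, while 510 + 2 fails it on both segments.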

This is where I need some extra help: doing something like the trm290
driver does, but for all of the OSB4, because picking out the broken
engines based on ASIC revisions is darn near impossible.

Big Problem -- Big Hammer.

Tough if it tanks some of the performance, but it is better than the
deadlocks we are getting now.

Yeah I expect to take heat for this one from ServerWorks and it may cost
me later, but nobody else has got the guts to press the issue for the
correct solution.

Then again, if we solve this correctly, I have the "ends justify the means"
argument.

Cheers,

On 22 Aug 2002, Martin Wilck wrote:

> On Thu, 2002-08-22 at 09:52, Gonzalo Servat wrote:
> 
> > Do you have any suggestions on how I can work around this problem? It's
> > been driving me nuts all day! (I bet it's driven people nuts for
> > weeks...). Do you think your patch (as posted on
> > http://linux-kernel.skylab.org/20020609/msg00935.html) may help my
> > situation? If so, what kernel does it apply to? I looked up
> > serverworks.c in a 2.4.19-rc3 tree to see if the patch would apply
> > cleanly but it won't because line 547 is different to yours.
> 
> It should be fairly easy to adapt the patch, all you need is modify 
> the line
> 			if(inb(dma_base+0x02)&1)
> 
> in svwks_dmaproc() to the more complex condition test in the patch.
> 
> Alan, I understood you to wanted apply this patch - what happened to it,
> do you want me to resubmit it?
> 
> Martin

Andre Hedrick
LAD Storage Consulting Group

* Re: ServerWorks OSB4 in impossible state
  2002-08-22  8:51   ` Andre Hedrick
@ 2002-08-22 12:02     ` Martin Wilck
  2002-08-22 16:45       ` Tomas Szepe
  2002-08-22 17:58       ` Alan Cox
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Wilck @ 2002-08-22 12:02 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Gonzalo Servat, Alan Cox, Linux Kernel mailing list

On Thu, 2002-08-22 at 10:51, Andre Hedrick wrote:

> The problem is we need a special DMA engine for this broken puppy.

You certainly have much more insight into the problem than I. 
I wonder if (something like) the simple patch I submitted before can
be a temporary solution nevertheless. Please correct me if one of the
following statements is wrong:

1) The "4 byte shift" issue does not affect the CSB5 series.
2) The tested condition inb(dma_base+0x02)&1 is valid if the
   device doing the DMA reported an error status. Only if the
   device reports success is there an indication of the "4 byte shift".
3) The "4 byte shift" problem does not matter for read-only devices like
   CD-ROMs; at least, it is no reason to stall the computer when it occurs,
   because data corruption is not an issue.

If these assertions are true, the patch I sent will at least prevent
people's machines from stalling unnecessarily. Even if one or more are
false, the remaining correct condition tests will narrow the set
of machines that are stalled unnecessarily.

> 508 + 4 is okay but 510 + 2 is not.
> 
> Now I have to remember why :-/

You certainly have to go for the right solution.
But if my patch were applied, ServerWorks chipsets would cause less
grief to people until you have figured it out.

> Yeah I expect to take heat for this one from ServerWorks and it may cost
> me later, but nobody else has got the guts to press the issue for the
> correct solution.

Let me know if we can help. I have no personal contacts with ServerWorks,
but we are a large customer of theirs and may be able to exert some
additional pressure. The current situation (IDE DMA must be disabled)
is hardly acceptable for us anyway.

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

* Re: ServerWorks OSB4 in impossible state
  2002-08-22 12:02     ` Martin Wilck
@ 2002-08-22 16:45       ` Tomas Szepe
  2002-08-22 17:48         ` Andre Hedrick
  2002-08-22 17:59         ` Alan Cox
  2002-08-22 17:58       ` Alan Cox
  1 sibling, 2 replies; 31+ messages in thread
From: Tomas Szepe @ 2002-08-22 16:45 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Andre Hedrick, Gonzalo Servat, Alan Cox,
	Linux Kernel mailing list

> > Yeah I expect to take heat for this one from ServerWorks and it may cost
> > me later, but nobody else has got the guts to press the issue for the
> > correct solution.
> 
> Let me know if we can help. I have no personal contacts to ServerWorks,
> but we are a large customer of them and may be able to exert some
> additional pressure. The current situation (IDE DMA must be disabled)
> is hardly acceptable for us anyway.

AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
even in DMA modes. How's the code there then? Is it dangerous to use?

T.

* Re: ServerWorks OSB4 in impossible state
  2002-08-22 16:45       ` Tomas Szepe
@ 2002-08-22 17:48         ` Andre Hedrick
  2002-08-22 17:59         ` Alan Cox
  1 sibling, 0 replies; 31+ messages in thread
From: Andre Hedrick @ 2002-08-22 17:48 UTC (permalink / raw)
  To: Tomas Szepe
  Cc: Martin Wilck, Gonzalo Servat, Alan Cox, Linux Kernel mailing list


It took some time to figure this out with the ASIC architect.
Since there is no easy way to determine which of the extremely early
SBs had the issue, the suggestion is to hit the DMA table building
with a hammer.
 
On Thu, 22 Aug 2002, Tomas Szepe wrote:

> > > Yeah I expect to take heat for this one from ServerWorks and it may cost
> > > me later, but nobody else has got the guts to press the issue for the
> > > correct solution.
> > 
> > Let me know if we can help. I have no personal contacts to ServerWorks,
> > but we are a large customer of them and may be able to exert some
> > additional pressure. The current situation (IDE DMA must be disabled)
> > is hardly acceptable for us anyway.
> 
> AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> even in DMA modes. How's the code there then? Is it dangerous to use?
> 
> T.
> 

Andre Hedrick
LAD Storage Consulting Group

* Re: ServerWorks OSB4 in impossible state
  2002-08-22 16:45       ` Tomas Szepe
  2002-08-22 17:48         ` Andre Hedrick
@ 2002-08-22 17:59         ` Alan Cox
  2002-08-22 18:14           ` Tomas Szepe
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-08-22 17:59 UTC (permalink / raw)
  To: Tomas Szepe
  Cc: Martin Wilck, Andre Hedrick, Gonzalo Servat,
	Linux Kernel mailing list

On Thu, 2002-08-22 at 17:45, Tomas Szepe wrote:
> AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> even in DMA modes. How's the code there then? Is it dangerous to use?

Most of them work all the time (most OSB4, all CSB5, all CSB6).
All of them work all the time with most drives.
Some of them do horrible things in UDMA with some drives (timing
patterns, I guess).

All of the OSB4 do MWDMA fine.

* Re: ServerWorks OSB4 in impossible state
  2002-08-22 17:59         ` Alan Cox
@ 2002-08-22 18:14           ` Tomas Szepe
  0 siblings, 0 replies; 31+ messages in thread
From: Tomas Szepe @ 2002-08-22 18:14 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martin Wilck, Andre Hedrick, Gonzalo Servat,
	Linux Kernel mailing list

> > AFAIK 2.4.18 as well as 2.4.19-preEARLY seemed to work flawlessly w/ OSB4
> > even in DMA modes. How's the code there then? Is it dangerous to use?
> 
> Most of them work all the time (most OSB4, all CSB5, all CSB6).
> All of them work all the time with most drives.
> Some of them do horrible things in UDMA with some drives (timing
> patterns, I guess).
> 
> All of the OSB4 do MWDMA fine.

Oh, it's not such a big problem then. In case it tells you/Andre anything,
the controller I've run into trouble with seems to be this one (output from
2.4.19-pre2):

00:0f.1 IDE interface: Relience Computer: Unknown device 0211 (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 1880 [size=16]
00: 66 11 11 02 45 01 00 02 00 8a 01 01 00 40 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 81 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
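As a worked example, the config-space dump above can be decoded by hand. The C sketch below assumes only the standard PCI header layout and the SFF-8038i convention that BAR4 (offset 0x20) holds the bus-master IDE register block; the struct and function names are hypothetical. Read that way, the vendor is 0x1166 (ServerWorks, which lspci labels "Relience Computer" here), the device 0x0211, the revision 0, and the bus-master block sits at I/O port 0x1880, matching "I/O ports at 1880 [size=16]". The bus-master status register, whose bit 0 is the "engine active" flag behind the "impossible state" check, would then be port 0x1882 for the primary channel and 0x188a for the secondary one.

```c
#include <stdint.h>

/* Fields of interest from a PCI IDE function's configuration header
 * (hypothetical struct for this example). */
struct ide_cfg {
    uint16_t vendor;
    uint16_t device;
    uint8_t revision;
    uint8_t prog_if;
    uint16_t bm_base;   /* bus-master IDE register block, taken from BAR4 */
};

/* PCI configuration space is little-endian. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* cfg must hold at least the first 0x24 bytes of configuration space. */
static struct ide_cfg decode_ide_cfg(const uint8_t *cfg)
{
    struct ide_cfg c;

    c.vendor   = (uint16_t)(cfg[0x00] | (cfg[0x01] << 8));
    c.device   = (uint16_t)(cfg[0x02] | (cfg[0x03] << 8));
    c.revision = cfg[0x08];
    c.prog_if  = cfg[0x09];
    /* BAR4 lives at offset 0x20. Bit 0 set marks an I/O BAR, and the
     * low two bits are flags rather than address bits, so mask them. */
    c.bm_base  = (uint16_t)(le32(&cfg[0x20]) & ~0x3u);
    return c;
}

/* The bus-master status register is at offset 2 of each channel's
 * 8-byte half of the block: primary first, secondary second. */
static uint16_t bm_status_port(uint16_t bm_base, int secondary)
{
    return (uint16_t)(bm_base + (secondary ? 8 : 0) + 2);
}
```

The two channels share one 16-byte block, which is consistent with the size=16 lspci reports for this BAR.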

(This is what they put into the HP NetServer E800, which is otherwise a nice
machine: with these we can get up to 8 NICs to work without IRQ sharing. Ideal
for building routers, except that if we were to put SCSI drives everywhere, we'd
soon have nothing left to eat.)

So far we've been OK, as 2.4.19-pre2 does appear to work just fine in UDMA2.

T.

* Re: ServerWorks OSB4 in impossible state
  2002-08-22 12:02     ` Martin Wilck
  2002-08-22 16:45       ` Tomas Szepe
@ 2002-08-22 17:58       ` Alan Cox
  2002-08-22 18:58         ` Martin Wilck
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-08-22 17:58 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Andre Hedrick, Gonzalo Servat, Linux Kernel mailing list

On Thu, 2002-08-22 at 13:02, Martin Wilck wrote:
> 1) The "4 byte shift" issue does not affect the CSB5 series.

True (not a rule the -ac tree knows about right now, but one that the
next tree will, subject to time constraints).

> 2) The tested condition inb(dma_base+0x02)&1 is valid if the
>    device doing the DMA reported an error status. Only if the
>    device reports success is there an indication of the "4 byte shift".

True

> 3) The "4 byte shift" problem does not matter for read-only devices like
>    CD-ROMs; at least it is no reason to stall the computer if it occurs,
>    because data corruption is not an issue.

True (-ac knows about this)
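Martin's three points, all of which Alan accepts, can be read as a small decision rule for when the driver should actually stop the machine. A minimal C sketch of that reading follows; it is not the -ac or mainline driver code, and the enum, the function and its boolean inputs are hypothetical stand-ins for the chip type, bit 0 of the bus-master status register after the completion interrupt, the drive's reported status, and whether the device is read-only.

```c
#include <stdbool.h>

/* Hypothetical labels for the ServerWorks south bridges in this thread. */
enum sw_southbridge { SW_OSB4, SW_CSB5, SW_CSB6 };

/*
 * Return true only when a "bus master still active after the completion
 * interrupt" condition should be treated as the OSB4 4-byte-shift hazard:
 *   1) the CSB5 series is not affected, only the OSB4;
 *   2) if the drive reported an error, the stuck active bit is explained
 *      by the aborted transfer and says nothing about a shift;
 *   3) on a read-only device the shift cannot corrupt stored data, so it
 *      is no reason to stop the machine.
 */
static bool osb4_should_halt(enum sw_southbridge chip,
                             bool bm_active_after_irq,
                             bool drive_reported_ok,
                             bool read_only_device)
{
    if (chip != SW_OSB4)          /* rule 1 */
        return false;
    if (!bm_active_after_irq)     /* normal completion */
        return false;
    if (!drive_reported_ok)       /* rule 2 */
        return false;
    if (read_only_device)         /* rule 3 */
        return false;
    return true;
}
```

Under this rule, a CD-ROM read that fails with an error status, as in the report that started this thread, would no longer stop the machine.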



* Re: ServerWorks OSB4 in impossible state
  2002-08-22 17:58       ` Alan Cox
@ 2002-08-22 18:58         ` Martin Wilck
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-08-22 18:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andre Hedrick, Gonzalo Servat, Linux Kernel mailing list

On Thu, 2002-08-22 at 19:58, Alan Cox wrote:

> > 2) The tested condition inb(dma_base+0x02)&1 is valid if the
> >    device doing the DMA reported an error status. Only if the
> >    device reports success is there an indication of the "4 byte shift".
> 
> True

This condition is easy to test, right? My patch tested for
   OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT)
Why not put that in the code?
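The status test Martin proposes can be spelled out. In this C sketch the ATA status bit values are the standard ones, and DRIVE_READY, BAD_STAT and OK_STAT are reproduced, to the best of our knowledge, from the 2.4 <linux/ide.h> definitions; GET_STAT() (which reads the drive's status register in the kernel) is replaced by a plain parameter, and drive_reported_success() is a hypothetical helper name. OK_STAT passes only when all the "good" bits are set and none of the "bad" ones are, so a status with ERR, DRQ or BUSY set counts as a failed command.

```c
#include <stdbool.h>
#include <stdint.h>

/* ATA status register bits. */
#define ERR_STAT    0x01
#define DRQ_STAT    0x08
#define SEEK_STAT   0x10
#define READY_STAT  0x40
#define BUSY_STAT   0x80

/* Composite masks, mirroring (as we understand them) the 2.4 headers. */
#define DRIVE_READY (READY_STAT | SEEK_STAT)
#define BAD_R_STAT  (BUSY_STAT | ERR_STAT)
#define BAD_STAT    (BAD_R_STAT | DRQ_STAT)

/* True when every "good" bit is set and every "bad" bit is clear;
 * bits outside both masks are ignored. */
#define OK_STAT(stat, good, bad) (((stat) & ((good) | (bad))) == (good))

/* Hypothetical helper: status is the value GET_STAT() would return. */
static bool drive_reported_success(uint8_t status)
{
    return OK_STAT(status, DRIVE_READY, BAD_STAT);
}
```

For example, 0x50 (READY and SEEK set) passes, while 0x51 (the same with ERR set) fails; a bit outside both masks, such as 0x04, does not affect the result.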

Martin

* Re: Serverworks OSB4 in impossible state
       [not found] <20020613112932.C2B8C10A1B@mail.medav.de>
@ 2002-06-13 12:52 ` Martin Wilck
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Wilck @ 2002-06-13 12:52 UTC (permalink / raw)
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Thu, 2002-06-13 at 14:32, Daniela Engert wrote:

> I have no idea if the same is happening in case of an aborted ATA DMA
> transfer (I have no bad disk around), but at least I will disable ATAPI
> DMA transfers in my driver in case of early revision (whatever this is)
> OSB4 systems - possibly on all OSB4 systems. According to your
> experiences, the CSB5 and later seem to be fine.

Sorry, bad wording. I meant "OSB4" as opposed to "CSB5/6".

* Serverworks OSB4 in impossible state.
@ 2002-06-03 17:40 Steven Timm
  2002-06-04  0:29 ` Alan Cox
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Timm @ 2002-06-03 17:40 UTC (permalink / raw)
  To: linux-kernel

Configuration:  Supermicro 370DLE motherboard, 2x1GHz Pentium III,
Red Hat 7.1 plus 2.4.18-4 kernel as shipped by Red Hat,
three IBM disks, hda=20GB, hdc,hdd=40GB, hdb=cdrom.

This system and 100-some others like it have had some kind
of DMA problem with every kernel version and with system disks
from three different vendors... but it was pretty stable with the
2.4.9 kernel and IBM system disks, and also with the 2.2.19
kernel and IBM system disks.

Now with 2.4.18 we get the following error, and the
system hangs:

Serverworks OSB4 in impossible state.
Disable UDMA or if you are using Seagate then try switching disk types
on this controller. Please report this event to osb4-bug@ide.cabal.tm
OSB4: continuing might cause disk corruption.

This is the only one of 60 machines thus configured that has
had the error thus far.

Two points:
1) The E-mail address in that kernel debug message doesn't exist.
E-mail bounces back from it.

2) What is causing the hang and are there any hopes to
fix it in software this time?  Last year when I came to the kernel
list with problems very similar, the consensus was that this
is actually broken hardware in the OSB4 chipset...but obviously
it is possible for at least some kernels to run quasi-normally
on this hardware... what changed between 2.4.9 and 2.4.18 so
it doesn't anymore?

Steve Timm

------------------------------------------------------------------
Steven C. Timm (630) 840-8525  timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

* Re: Serverworks OSB4 in impossible state.
  2002-06-03 17:40 Steven Timm
@ 2002-06-04  0:29 ` Alan Cox
  2002-06-03 18:11   ` kwijibo
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2002-06-04  0:29 UTC (permalink / raw)
  To: Steven Timm; +Cc: linux-kernel

On Mon, 2002-06-03 at 18:40, Steven Timm wrote:
> Serverworks OSB4 in impossible state.
> Disable UDMA or if you are using Seagate then try switching disk types
> on this controller. Please report this event to osb4-bug@ide.cabal.tm
> OSB4: continuing might cause disk corruption.
> 
> This is the only one of 60 machines thus configured that has
> had the error thus far.
> 
> Two points:
> 1) The E-mail address in that kernel debug message doesn't exist.
> E-mail bounces back from it.

Oops I'll go fix that small detail. It should have been forwarded to me.

> 2) What is causing the hang and are there any hopes to
> fix it in software this time?  Last year when I came to the kernel
> list with problems very similar, the consensus was that this
> is actually broken hardware in the OSB4 chipset...but obviously
> it is possible for at least some kernels to run quasi-normally
> on this hardware... what changed between 2.4.9 and 2.4.18 so
> it doesn't anymore?

The code traps out when it sees the I/O complete and it turns out that
the DMA engine flags say the engine is still running. In this state we
kill the box because we know the next I/O will be written 4 bytes skewed
with the last 4 bytes of the previous I/O apparently repeated at the
start.
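To make the failure mode concrete, the corruption Alan describes can be modelled on plain buffers. This C sketch encodes only our reading of "written 4 bytes skewed with the last 4 bytes of the previous I/O apparently repeated at the start"; it says nothing about what happens inside the chip, and both the function name and the assumption that the last 4 bytes of the new data are lost are ours.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OSB4_SKEW 4  /* bytes, per the description above */

/*
 * Model (our reading, not a hardware specification) of what lands on
 * disk when the next write is skewed: the stale tail of the previous
 * transfer appears first, and the new data follows OSB4_SKEW bytes late,
 * so its final OSB4_SKEW bytes fall off the end of the buffer.
 * Requires prev_len >= OSB4_SKEW and len >= OSB4_SKEW; out holds len
 * bytes and must not overlap the inputs.
 */
static void osb4_model_skewed_write(const uint8_t *prev, size_t prev_len,
                                    const uint8_t *next, size_t len,
                                    uint8_t *out)
{
    /* Last bytes of the previous I/O repeated at the start. */
    memcpy(out, prev + prev_len - OSB4_SKEW, OSB4_SKEW);
    /* New data shifted by OSB4_SKEW bytes. */
    memcpy(out + OSB4_SKEW, next, len - OSB4_SKEW);
}
```

With a previous transfer ending in 5 6 7 8 and new data 10 11 12 13 14 15 16 17, the modelled result is 5 6 7 8 10 11 12 13: the write completes without any error indication, but every byte on disk is wrong.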

I took it up with the Serverworks guys at the time, but they were not
able to duplicate the problem and provide advice. Since we could verify
this across an entire rendering farm, it was clearly not a weird one-off
bug. It also doesn't appear to be a Linux bug (but maybe one day I'll be
proved wrong).

If you drop the drives to MWDMA2, you'll see only slightly lower
performance and solid behaviour.

Alan

* Re: Serverworks OSB4 in impossible state.
  2002-06-04  0:29 ` Alan Cox
@ 2002-06-03 18:11   ` kwijibo
  0 siblings, 0 replies; 31+ messages in thread
From: kwijibo @ 2002-06-03 18:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Steven Timm, linux-kernel

I had this same problem and I posted to the list a couple
of weeks ago, but it never got any response.  The only
thing I have on IDE is a CD-ROM; the rest is SCSI.  I could
mount the CD drive with no problem, but once I tried to read
any data from it I would get the 'impossible state' error.  I can
reproduce this at any time; I don't know how the Serverworks
people can't.  Just have them go buy a Dell PowerEdge 1650
and use the CD-ROM.  This was with 2.4.18.  I found a
workaround for it, however: I just turned off DMA and it worked fine
again.  I guess it is turned on by default.  Turning DMA off on a
hard drive could suck, though; not sure what you could do there.

Steven

Alan Cox wrote:

>On Mon, 2002-06-03 at 18:40, Steven Timm wrote:
>
>>Serverworks OSB4 in impossible state.
>>Disable UDMA or if you are using Seagate then try switching disk types
>>on this controller. Please report this event to osb4-bug@ide.cabal.tm
>>OSB4: continuing might cause disk corruption.
>>
>>This is the only one of 60 machines thus configured that has
>>had the error thus far.
>>
>>Two points:
>>1) The E-mail address in that kernel debug message doesn't exist.
>>E-mail bounces back from it.
>
>Oops I'll go fix that small detail. It should have been forwarded to me.
>
>>2) What is causing the hang and are there any hopes to
>>fix it in software this time?  Last year when I came to the kernel
>>list with problems very similar, the consensus was that this
>>is actually broken hardware in the OSB4 chipset...but obviously
>>it is possible for at least some kernels to run quasi-normally
>>on this hardware... what changed between 2.4.9 and 2.4.18 so
>>it doesn't anymore?
>
>The code traps out when it sees the I/O complete and it turns out that
>the DMA engine flags say the engine is still running. In this state we
>kill the box because we know the next I/O will be written 4 bytes skewed
>with the last 4 bytes of the previous I/O apparently repeated at the
>start.
>
>I took it up with the Serverworks guys at the time, but they were not
>able to duplicate the problem and provide advice. Since we could verify
>this across an entire rendering farm, it was clearly not a weird one-off
>bug. It also doesn't appear to be a Linux bug (but maybe one day I'll be
>proved wrong).
>
>If you drop the drives to MWDMA2, you'll see only slightly lower
>performance and solid behaviour.
>
>Alan




end of thread

Thread overview: 31+ messages
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
2002-06-10 16:41 ` Daniela Engert
2002-06-11  7:22   ` Martin Wilck
2002-06-11  7:45     ` Daniela Engert
2002-06-11  8:37       ` Martin Wilck
2002-06-11 11:25       ` Martin Wilck
2002-06-11 21:27         ` Chris Wedgwood
2002-06-12  7:24           ` Martin Wilck
2002-06-13 11:50         ` Daniela Engert
2002-06-13 11:59           ` Martin Wilck
2002-06-13 12:04             ` Daniela Engert
2002-06-13 18:27               ` rico-linux-kernel
2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
2002-06-12  8:58 ` Alan Cox
2002-06-12  8:47   ` Martin Wilck
2002-06-12  9:14     ` Alan Cox
2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
2002-06-12 20:35         ` Christian Zoffoli
     [not found] <1030002761.32380.27.camel@pluto.unixpac.com.au>
2002-08-22  8:35 ` ServerWorks OSB4 in impossible state Martin Wilck
2002-08-22  8:51   ` Andre Hedrick
2002-08-22 12:02     ` Martin Wilck
2002-08-22 16:45       ` Tomas Szepe
2002-08-22 17:48         ` Andre Hedrick
2002-08-22 17:59         ` Alan Cox
2002-08-22 18:14           ` Tomas Szepe
2002-08-22 17:58       ` Alan Cox
2002-08-22 18:58         ` Martin Wilck
     [not found] <20020613112932.C2B8C10A1B@mail.medav.de>
2002-06-13 12:52 ` Serverworks " Martin Wilck
  -- strict thread matches above, loose matches on Subject: below --
2002-06-03 17:40 Steven Timm
2002-06-04  0:29 ` Alan Cox
2002-06-03 18:11   ` kwijibo
