public inbox for linux-kernel@vger.kernel.org
* Serverworks OSB4 in impossible state
@ 2002-06-10 15:52 Martin Wilck
  2002-06-10 16:41 ` Daniela Engert
  2002-06-12  8:58 ` Alan Cox
  0 siblings, 2 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-10 15:52 UTC
  To: osb4-bug; +Cc: Linux Kernel mailing list, Martin Wilck

[-- Attachment #1: Type: text/plain, Size: 1574 bytes --]


Hello,

I know a similar problem was discussed here a short while ago.
However we have here a situation where we can reproduce the problem 
reliably. This is a RedHat 2.4.18-4 kernel.

We have a CD with a corrupt last block. If we try to read this block in
PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
attachment.

The machine has only a CDROM (Mitsumi FX 4830T) attached to the IDE bus
as /dev/hdc. We used no IDE-related boot parameters.

If we read the block in DMA mode (with dd), the machine stalls with the
"impossible state" message.

A PCI bus scan reveals that the I/O register (dma_base+2) indeed contains
0xa5 (bit 0 set), which leads to the panic. Normally a read of that
register returns 0xa0.

We see in our PCI bus scan that a successful DMA of 4096 bytes was
carried out ~23ms before the stall condition. Another 4096 byte request
was scheduled but never seen. Between the successful DMA and the stall
condition we see nothing but a few timer interrupts.
Then an IDE interrupt occurs, which leads immediately to the panic.

The CD-ROM drive certainly reports some sort of error, as in the PIO
case, when trying to access the last block. This seems to be the
(indirect) reason why the bus master bit in (dma_base+2) remains set
long after the DMA is finished.

Any ideas/comments?

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy





[-- Attachment #2: Kernel error messages in PIO-mode --]
[-- Type: text/plain, Size: 3573 bytes --]

Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:40 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:12:40 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:49 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:12:56 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:06 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:09 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:09 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:13 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:17 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:21 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:21 pdb0384c kernel: hdc: command error: error=0x50
Jun 10 13:13:21 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307712
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:25 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:29 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:33 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:36 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:36 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:41 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:44 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jun 10 13:13:48 pdb0384c kernel: hdc: cdrom_decode_status: error=0x30
Jun 10 13:13:48 pdb0384c kernel: hdc: ATAPI reset complete
Jun 10 13:13:48 pdb0384c kernel: end_request: I/O error, dev 16:00 (hdc), sector 1307716

[-- Attachment #3: dmesg --]
[-- Type: text/plain, Size: 15690 bytes --]

ACPI table found: RSDT v1 [PTLTD    RSDT   1540.1]
__va_range(0xbfefc0f9, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefc0f9, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [FSC    D1309    1540.1]
__va_range(0xbfefeef8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefeef8, 0x50): idx=8 mapped at ffff6000
ACPI table found: SPCR v1 [PTLTD  $UCRTBL$ 1540.1]
__va_range(0xbfefef48, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [PTLTD  	 APIC   1540.1]
__va_range(0xbfefef48, 0x90): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x6] enabled[1])
CPU 0 (0x0600) enabledProcessor #6 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 1 (0x0000) enabledProcessor #0 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0002] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0003] id[0x7] enabled[1])
CPU 3 (0x0700) enabledProcessor #7 Unknown CPU [15:2] APIC version 16

IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x3] address[0xfec10000] global_irq_base[0x10])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x1] trigger[0x1])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x3] trigger[0x3])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0xbfefefd8, 0x24): idx=8 mapped at ffff6000
__va_range(0xbfefefd8, 0x28): idx=8 mapped at ffff6000
ACPI table found: BOOT v1 [PTLTD  $SBFTBL$ 1540.1]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: FSCD1309 Product ID: PRIMERGY     APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC10000.
Processors: 4
Kernel command line: ro root=/dev/sda2
Initializing CPU#0
Detected 2395.457 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4771.02 BogoMIPS
Memory: 3098776k/3145728k available (1232k kernel code, 46496k reserved, 842k data, 304k init, 2228160k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 65536 (order: 7, 524288 bytes)
Buffer cache hash table entries: 262144 (order: 8, 1048576 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU0: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
per-CPU timeslice cutoff: 1462.93 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 2/1 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#2.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU2: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Booting processor 3/7 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 4784.12 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#3.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU3: Intel(R) XEON(TM) CPU 2.40GHz stepping 04
Total of 4 processors activated (19123.40 BogoMIPS).
cpu_sibling_map[0] = 3
cpu_sibling_map[1] = 2
cpu_sibling_map[2] = 1
cpu_sibling_map[3] = 0
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 3-0, 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 02000000
.......     : arbitration: 02
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 00F 0F  0    0    0   0   0    1    1    39
 02 00F 0F  0    0    0   0   0    1    1    31
 03 00F 0F  0    0    0   0   0    1    1    41
 04 00F 0F  0    0    0   0   0    1    1    49
 05 00F 0F  0    0    0   0   0    1    1    51
 06 00F 0F  0    0    0   0   0    1    1    59
 07 00F 0F  0    0    0   0   0    1    1    61
 08 00F 0F  0    0    0   0   0    1    1    69
 09 00F 0F  1    1    0   1   0    1    1    71
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 00F 0F  0    0    0   0   0    1    1    79
 0d 00F 0F  0    0    0   0   0    1    1    81
 0e 00F 0F  0    0    0   0   0    1    1    89
 0f 00F 0F  0    0    0   0   0    1    1    91

IO APIC #3......
.... register #00: 03000000
.......    : physical APIC id: 03
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 03000000
.......     : arbitration: 03
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 000 00  1    0    0   0   0    0    0    00
 02 000 00  1    0    0   0   0    0    0    00
 03 000 00  1    0    0   0   0    0    0    00
 04 000 00  1    0    0   0   0    0    0    00
 05 000 00  1    0    0   0   0    0    0    00
 06 000 00  1    0    0   0   0    0    0    00
 07 000 00  1    0    0   0   0    0    0    00
 08 000 00  1    0    0   0   0    0    0    00
 09 000 00  1    0    0   0   0    0    0    00
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 000 00  1    0    0   0   0    0    0    00
 0d 00F 0F  1    1    0   1   0    1    1    99
 0e 00F 0F  1    1    0   1   0    1    1    A1
 0f 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ29 -> 1:13
IRQ30 -> 1:14
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2395.2498 MHz.
..... host bus clock speed is 99.8019 MHz.
cpu: 0, clocks: 998019, slice: 199603
CPU0<T0:998016,T1:798400,D:13,S:199603,C:998019>
cpu: 2, clocks: 998019, slice: 199603
cpu: 3, clocks: 998019, slice: 199603
cpu: 1, clocks: 998019, slice: 199603
CPU1<T0:998016,T1:598800,D:10,S:199603,C:998019>
CPU2<T0:998016,T1:399200,D:7,S:199603,C:998019>
CPU3<T0:998016,T1:199600,D:4,S:199603,C:998019>
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfd9aa, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI->APIC IRQ transform: (B0,I5,P0) -> 30
PCI->APIC IRQ transform: (B0,I15,P0) -> 9
PCI->APIC IRQ transform: (B2,I10,P0) -> 29
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
VFS: Diskquotas version dquot_6.5.0 initialized
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
SvrWks CSB5: simplex device: DMA forced
    ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:pio, hdb:pio
SvrWks CSB5: simplex device: DMA forced
    ide1: BM-DMA at 0x1808-0x180f, BIOS settings: hdc:pio, hdd:pio
hdc: FX4830T, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 32768 buckets, 256Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 220k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
sym53c8xx: at PCI bus 2, device 10, function 0
sym53c8xx: 53c1010-66 detected with Symbios NVRAM
sym53c1010-66-0: rev 0x1 on pci bus 2 device 10 function 0 irq 29
sym53c1010-66-0: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-66-0: on-chip RAM at 0xfe000000
sym53c1010-66-0: restart (scsi reset).
sym53c1010-66-0: handling phase mismatch from SCRIPTS.
sym53c1010-66-0: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
blk: queue f7fd6e18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST318451LC        Rev: 7500
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue f7fd6c18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SEAGATE   Model: ST318451LC        Rev: 7500
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue f7fd6a18, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: SDR       Model: GEM318            Rev: 0   
  Type:   Processor                          ANSI SCSI revision: 02
blk: queue f7308e18, I/O limit 4095Mb (mask 0xffffffff)
sym53c1010-66-0-<0,0>: tagged command queue depth set to 8
sym53c1010-66-0-<1,0>: tagged command queue depth set to 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sym53c1010-66-0-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sda: 35843671 512-byte hdwr sectors (18352 MB)
Partition check:
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sym53c1010-66-0-<1,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sdb: 35843671 512-byte hdwr sectors (18352 MB)
 sdb:
Journalled Block Device driver loaded
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 304k freed
Adding Swap: 1702848k swap-space (priority -1)
Adding Swap: 1694816k swap-space (priority -2)
Adding Swap: 1702848k swap-space (priority -3)
Adding Swap: 1702848k swap-space (priority -4)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xf8bc5000, IRQ 9
usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,2), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport0: PC-style at 0x378 [PCSPP,TRISTATE,EPP]
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:05:29:74:92, IRQ 30.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
hdc: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdc: DMA disabled

[-- Attachment #4: /proc/ide/ide1/hdc/settings --]
[-- Type: text/plain, Size: 1199 bytes --]

name			value		min		max		mode
----			-----		---		---		----
breada_readahead        4               0               127             rw
current_speed           66              0               69              rw
dsc_overlap             0               0               1               rw
file_readahead          0               0               2097151         rw
ide_scsi                0               0               1               rw
init_speed              66              0               69              rw
io_32bit                0               0               3               rw
keepsettings            0               0               1               rw
max_kb_per_request      64              1               127             rw
nice1                   1               0               1               rw
number                  2               0               3               rw
pio_mode                write-only      0               255             w
slow                    0               0               1               rw
unmaskirq               0               0               1               rw
using_dma               1               0               1               rw

[-- Attachment #5: /proc/ide/svwks --]
[-- Type: text/plain, Size: 785 bytes --]


                             ServerWorks OSB4/CSB5/CSB6

                            ServerWorks CSB5 Chipset (rev 93)
------------------------------- General Status ---------------------------------
--------------- Primary Channel ---------------- Secondary Channel -------------
                disabled                         disabled
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled:    no               no              yes               no 
UDMA enabled:   no               no              yes               no 
UDMA enabled:   0                0               2                 0
DMA enabled:    2                2               2                 2
PIO  enabled:   ?                ?               4                 ?



* Re: Serverworks OSB4 in impossible state
  2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
@ 2002-06-10 16:41 ` Daniela Engert
  2002-06-11  7:22   ` Martin Wilck
  2002-06-12  8:58 ` Alan Cox
  1 sibling, 1 reply; 18+ messages in thread
From: Daniela Engert @ 2002-06-10 16:41 UTC
  To: Martin Wilck; +Cc: Linux Kernel mailing list

Hello Martin,

On 10 Jun 2002 17:52:58 +0200, Martin Wilck wrote:

>We have a CD with a corrupt last block. If we try to read this block in
>PIO mode (hdparm -d 0 /dev/hdc), we get errors like those in the first
>attachment.

The error code returned is "check condition" with a sense key of 3
"medium error". The most appropriate driver action would have been to
issue a "request sense" command to learn the precise error and retry
only in case of a good chance of a recoverable problem - but that's a
different story.
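
As a rough illustration of how the values in the first attachment decode,
here is a minimal Python sketch. It assumes the standard ATA status register
layout and the ATAPI error register layout with the sense key in the upper
four bits; the function and table names are made up for this example and are
not taken from the kernel.

```python
# ATA status register bits, named as the 2.4 IDE driver prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# A few SCSI/ATAPI sense keys (only a partial table, for illustration).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
}


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_atapi_error(error):
    """Return (sense_key, name) from an ATAPI error register value."""
    key = (error >> 4) & 0x0F  # sense key lives in bits 7..4
    return key, SENSE_KEYS.get(key, "other")


print(decode_status(0x51))       # the status value from the log
print(decode_atapi_error(0x30))  # cdrom_decode_status error value
print(decode_atapi_error(0x50))  # "command error" error value
```

With these assumptions, status=0x51 gives the same three flags the kernel
prints, and error=0x30 gives sense key 3.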

>If we read the block in DMA mode (with dd), the machine stalls with the
>"impossible state" message.
>
>A PCI bus scan reveals that the I/O register (dma_base+2) indeed contains
>0xa5 (bit 0 set), which leads to the panic. Normally a read of that
>register returns 0xa0.

The interesting bits of the DMA status register are bits 0 through 2. A
value of 5 indicates the condition "interrupt from unit, DMA state
machine active". This is a valid status! It basically means the unit
issued an interrupt before the PRD table was exhausted. This makes sense:
the CD-ROM unit fails to transfer the amount of data described by the
PRD table because of the non-recoverable read error.
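
The decoding can be sketched in a few lines of Python. This assumes the
usual bus master IDE status register layout (bit 0 active, bit 1 error,
bit 2 interrupt, bits 5 and 6 DMA-capable flags, bit 7 simplex); the
constant and function names are invented for the example.

```python
BM_ACTIVE = 0x01      # bus master engine still running
BM_ERROR = 0x02       # host-side error while moving data
BM_INTERRUPT = 0x04   # the drive raised its interrupt line
BM_DRIVE0_DMA = 0x20  # drive 0 flagged as DMA capable
BM_DRIVE1_DMA = 0x40  # drive 1 flagged as DMA capable
BM_SIMPLEX = 0x80     # only one channel may do DMA at a time

_NAMES = (
    (BM_ACTIVE, "active"),
    (BM_ERROR, "error"),
    (BM_INTERRUPT, "interrupt"),
    (BM_DRIVE0_DMA, "drive 0 DMA capable"),
    (BM_DRIVE1_DMA, "drive 1 DMA capable"),
    (BM_SIMPLEX, "simplex"),
)


def decode_bm_status(value):
    """List the flags set in a bus master status byte."""
    return [name for mask, name in _NAMES if value & mask]


def classify(value):
    """Interpret the active/interrupt pair of a status byte."""
    active = bool(value & BM_ACTIVE)
    interrupt = bool(value & BM_INTERRUPT)
    if interrupt and active:
        return "interrupt before the PRD table was exhausted"
    if interrupt:
        return "interrupt with PRD table exhausted"
    if active:
        return "transfer in progress"
    return "no transfer active, no interrupt pending"


print(decode_bm_status(0xA5), "->", classify(0xA5))
print(decode_bm_status(0xA0), "->", classify(0xA0))
```

Under this layout, 0xa5 differs from the usual 0xa0 only in the active and
interrupt bits, and the error bit stays clear.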

>We see in our PCI bus scan that a successful DMA of 4096 bytes was
>carried out ~23ms before the stall condition. Another 4096 byte request
>was scheduled but never seen. Between the successful DMA and the stall
>condition we see nothing but a few timer interrupts.
>Then an IDE interrupt occurs, which leads immediately to the panic.

What you describe makes sense (the next DMA transfer is scheduled but
never carried out by the CD-ROM unit), except for the panic, of course.
The correct driver action in this case would be to stop the DMA engine
and issue a reset of the state machines involved (on both the host and
the unit side).
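
The host-side half of that action can be modelled as a toy Python
simulation. This is only a sketch under assumptions: FakeBusMaster is a
hypothetical stand-in for the command register (offset 0) and status
register (offset 2), with the error and interrupt bits treated as
write-1-to-clear. It does not reset the drive, and it says nothing about
whether the sequence is safe on real OSB4 hardware.

```python
class FakeBusMaster:
    """Hypothetical model of the bus master command and status registers."""

    def __init__(self, command=0x00, status=0x00):
        self.regs = {0: command, 2: status}

    def inb(self, offset):
        return self.regs[offset]

    def outb(self, value, offset):
        if offset == 0:
            self.regs[0] = value & 0xFF
            if not value & 0x01:
                # Clearing the start bit stops the engine: "active" drops.
                self.regs[2] &= ~0x01
        elif offset == 2:
            old = self.regs[2]
            keep = old & 0x81               # simplex and active: read-only
            rw = value & 0x60               # DMA-capable bits: read/write
            w1c = old & 0x06 & ~value       # error/interrupt: write 1 clears
            self.regs[2] = keep | rw | w1c


def stop_dma(bm):
    """Stop the engine, acknowledge interrupt and error, return old status."""
    before = bm.inb(2)
    bm.outb(bm.inb(0) & ~0x01, 0)   # clear the start bit
    bm.outb(bm.inb(2) | 0x06, 2)    # write 1s to clear interrupt and error
    return before


bm = FakeBusMaster(command=0x09, status=0xA5)  # started, reading from device
old = stop_dma(bm)
print(hex(old), hex(bm.inb(0)), hex(bm.inb(2)))
```

In this model the status register returns to 0xa0 afterwards, the value
reported as normal in the original message.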

>Any ideas/comments?

I hope this clears up things a little ...

Ciao,
  Dani




* Re: Serverworks OSB4 in impossible state
  2002-06-10 16:41 ` Daniela Engert
@ 2002-06-11  7:22   ` Martin Wilck
  2002-06-11  7:45     ` Daniela Engert
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-11  7:22 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:

> The interesting bits of the DMA status register are bits 0 through 2. A
> value of 5 indicates the condition "interrupt from unit, DMA state
> machine active". This is a valid status! It basically means the unit
> issued an interrupt before the PRD table was exhausted. This makes sense:
> the CD-ROM unit fails to transfer the amount of data described by the
> PRD table because of the non-recoverable read error.

Shouldn't the error bit be set too? (But that wouldn't make any
difference with the current driver ...)

> What you describe makes sense (the next DMA transfer is scheduled but
> never carried out by the CD-ROM unit), except for the panic, of course.
> The correct driver action in this case would be to stop the DMA engine
> and issue a reset of the state machines involved (on both the host and
> the unit side).

The message, the comments in the code, and what Alan wrote here:
http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
suggest that trying to recover from this condition is extremely
dangerous (note that the kernel doesn't even panic(), because
a sync() may kill a disk, the comments say).

Anyway, thanks a lot for your insightful comments.
Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:22   ` Martin Wilck
@ 2002-06-11  7:45     ` Daniela Engert
  2002-06-11  8:37       ` Martin Wilck
  2002-06-11 11:25       ` Martin Wilck
  0 siblings, 2 replies; 18+ messages in thread
From: Daniela Engert @ 2002-06-11  7:45 UTC
  To: Martin Wilck; +Cc: Linux Kernel mailing list

On 11 Jun 2002 09:22:24 +0200, Martin Wilck wrote:

>On Mon, 2002-06-10 at 18:41, Daniela Engert wrote:

>> The interesting bits of the DMA status register are bits 0 through 2. A
>> value of 5 indicates the condition "interrupt from unit, DMA state
>> machine active". This is a valid status! It basically means the unit
>> issued an interrupt before the PRD table was exhausted. This makes sense:
>> the CD-ROM unit fails to transfer the amount of data described by the
>> PRD table because of the non-recoverable read error.
>
>Shouldn't the error bit be set too? (But that wouldn't make any
>difference with the current driver ...)

No, it shouldn't. The error happens on the unit side of the bus, not on
the host side. So it is correct that only the CD-ROM unit reports an
error and the host does *not*.

>> What you describe makes sense (the next DMA transfer is scheduled but
>> never carried out by the CD-ROM unit), except for the panic, of course.
>> The correct driver action in this case would be to stop the DMA engine
>> and issue a reset of the state machines involved (on both the host and
>> the unit side).
>
>The message, the comments in the code, and what Alan wrote here:
>http://groups.google.com/groups?hl=de&lr=&threadm=linux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%40boxer.fnal.gov&rnum=2&prev=/groups%3Fq%3Dosb4-bug%2540ide.cabal.tm%26hl%3Dde%26lr%3D%26selm%3Dlinux.kernel.Pine.LNX.4.31.0206031234370.12103-100000%2540boxer.fnal.gov%26rnum%3D2
>suggest that trying to recover from this condition is extremely
>dangerous (note that the kernel doesn't even panic(), because
>a sync() may kill a disk, the comments say).

I'm aware of all of that. By pure chance I have a machine with an OSB4
sitting on my desk for a couple of days. Maybe I can find a defective
CD-ROM to test it with my driver and see if it manages to recover from
errors like these. Hopefully, the PCI tracer gives some more insight.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11




* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:45     ` Daniela Engert
@ 2002-06-11  8:37       ` Martin Wilck
  2002-06-11 11:25       ` Martin Wilck
  1 sibling, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-11  8:37 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Tue, 2002-06-11 at 09:45, Daniela Engert wrote:

> I'm aware of all of that. By pure chance I have a machine with an OSB4
> sitting on my desk for a couple of days. Maybe I can find a defective
> CD-ROM to test it with my driver and see if it manages to recover from
> errors like these. Hopefully, the PCI tracer gives some more insight.

Do you have a custom version of the driver (because you write "my
driver")? If yes, can you send it, so that I can test it, too?

Can you point me to any reference material on the web?

Martin
-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: Serverworks OSB4 in impossible state
  2002-06-11  7:45     ` Daniela Engert
  2002-06-11  8:37       ` Martin Wilck
@ 2002-06-11 11:25       ` Martin Wilck
  2002-06-11 21:27         ` Chris Wedgwood
  2002-06-13 11:50         ` Daniela Engert
  1 sibling, 2 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-11 11:25 UTC
  To: Daniela Engert; +Cc: Linux Kernel mailing list, Alan Cox

[Alan, I am cc'ing you on this because I read elsewhere that you want 
osb4-bug@ide.cabal.tm to be forwarded to you, and that address still
bounces]. 

I have tried the following:

- comment out the code that stalls the machine when the condition in
  question is encountered.
- run dd over a couple of good blocks on the CD.
- run dd over the corrupted blocks. This now leads to errors very
  similar to those in the PIO case.
- reenable DMA with hdparm, because it is automatically disabled by the
  ide-cd driver if an error occurs (why is that? The error has nothing
  to do with DMA here).
- repeat the first dd command on the good blocks and compare the
  results.

The results are identical, so I cannot verify the "4 byte shift" Alan
has been talking about. Of course this is a CD-ROM-only scenario, so
I can't say anything about hard disks.
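
For anyone who wants to repeat the comparison, here is a minimal sketch of
how two reads of the same blocks could be checked for a displacement. It
assumes that "4 byte shift" means the second read contains the same data
moved by a few bytes; that reading of the bug is my assumption, and
find_shift() is a hypothetical helper, not anything from the driver.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the smallest offset s in [0, max_shift] for which the data in
 * `got` equals the data in `ref` displaced by s bytes, i.e.
 * got[i + s] == ref[i] for every i that still fits in the buffer.
 * Returns 0 when the two reads are identical and -1 when no such
 * offset exists.
 */
int find_shift(const unsigned char *ref, const unsigned char *got,
               size_t len, size_t max_shift)
{
    for (size_t s = 0; s <= max_shift && s < len; s++) {
        if (memcmp(ref, got + s, len - s) == 0)
            return (int)s;
    }
    return -1;
}
```

A result of 0 corresponds to what I see here (identical reads); a result
of 4 would be the pattern to look for on an affected machine.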

Is it possible that the 4-byte shift occurs only with some particular
(older?) version of the chipset? 

In any case, the condition that usually causes Linux to stall is 
indeed a perfectly valid condition for DMA when the device transfers
less data than it's supposed to. I doubt that hanging the system 
without more detailed checks is the right measure to take there.
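
To make this concrete, here is a small sketch that decodes the bus master
status byte read from (dma_base+2). The bit positions follow the standard
bus master IDE register layout; the macro and function names are
hypothetical and chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus master IDE status register bits (register at dma_base + 2). */
#define BM_STAT_ACTIVE   0x01  /* DMA engine has not finished its PRD table */
#define BM_STAT_ERROR    0x02  /* bus error during the transfer */
#define BM_STAT_IRQ      0x04  /* the device raised its interrupt */
#define BM_STAT_DRV0_DMA 0x20  /* drive 0 is DMA capable */
#define BM_STAT_DRV1_DMA 0x40  /* drive 1 is DMA capable */
#define BM_STAT_SIMPLEX  0x80  /* only one channel may do DMA at a time */

/*
 * Interrupt pending while the engine is still active: the device ended
 * the transfer before the memory regions described by the PRD table
 * were used up. This is the "short transfer" case discussed above.
 */
bool bm_short_transfer(uint8_t status)
{
    return (status & BM_STAT_IRQ) && (status & BM_STAT_ACTIVE);
}
```

With the value 0xa5 reported at the start of this thread,
bm_short_transfer() returns true: both the interrupt bit and the active
bit are set.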

Martin
 
-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: Serverworks OSB4 in impossible state
  2002-06-11 11:25       ` Martin Wilck
@ 2002-06-11 21:27         ` Chris Wedgwood
  2002-06-12  7:24           ` Martin Wilck
  2002-06-13 11:50         ` Daniela Engert
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2002-06-11 21:27 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Daniela Engert, Linux Kernel mailing list, Alan Cox

On Tue, Jun 11, 2002 at 01:25:25PM +0200, Martin Wilck wrote:

    Is it possible that the 4-byte shift occurs only with some
    particular (older?) version of the chipset?

Maybe.

I have an oldish OSB4 here and beating on it only with the CDROM
(disks are all SCSI) I don't ever seem to see this problem:

00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 48

00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        Flags: bus master, medium devsel, latency 48

I think what is really required is input from ServerWorks/Broadcom
about this.



  --cw


* Re: Serverworks OSB4 in impossible state
  2002-06-11 21:27         ` Chris Wedgwood
@ 2002-06-12  7:24           ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2002-06-12  7:24 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linux Kernel mailing list

On Tue, 2002-06-11 at 23:27, Chris Wedgwood wrote:

> I have an oldish OSB4 here and beating on it only with the CDROM
> (disks are all SCSI) I don't ever seem to see this problem:

UDMA33 mode? You need to have a broken CD (we happen to have a CD burner
that generates broken CDs).

> I think what is really required is input from ServerWorks/Broadcom
> about this.

Yeah, we are in contact with them.
Thanks,
Martin


-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: Serverworks OSB4 in impossible state
  2002-06-12  8:58 ` Alan Cox
@ 2002-06-12  8:47   ` Martin Wilck
  2002-06-12  9:14     ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-12  8:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list

On Wed, 2002-06-12 at 10:58, Alan Cox wrote:
> Triggering the check on csb5/csb6 would be a bug - maybe an extra 
> test is needed there as CSB5/6 are fine

Currently the stall is triggered whenever the DMA engine active bit is
set, with no further conditions.

Would you concur that it would be reasonable to trigger only if

- the chipset version is < CSB5,
- the drive is a hard disk,
- and the drive did not report an error?

(I am not certain about the last condition, but from the descriptions 
of the 4-byte-shift problem I have seen I infer that there was no drive
error condition involved).
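
As a sketch of what I have in mind (all type and field names here are
hypothetical and only for illustration, this is not the driver code), the
three conditions would be combined with the existing active-bit test
roughly like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enumerations, ordered so that older chipsets compare lower. */
enum chipset { CHIP_OSB4, CHIP_CSB5, CHIP_CSB6 };
enum media   { MEDIA_DISK, MEDIA_CDROM };

struct xfer_state {
    enum chipset chip;        /* which ServerWorks south bridge */
    enum media   media;       /* type of the attached drive */
    uint8_t      bm_status;   /* value read from dma_base + 2 */
    bool         drive_error; /* drive reported an error status */
};

/*
 * Stall only if the DMA engine is still active AND the chipset is older
 * than CSB5 AND the drive is a hard disk AND the drive reported no error.
 */
bool should_stall(const struct xfer_state *s)
{
    return (s->bm_status & 0x01) &&
           s->chip < CHIP_CSB5 &&
           s->media == MEDIA_DISK &&
           !s->drive_error;
}
```

With this, the CD-ROM case from my first mail (active bit set, drive
reporting an error) would no longer stall the machine.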

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: Serverworks OSB4 in impossible state
  2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
  2002-06-10 16:41 ` Daniela Engert
@ 2002-06-12  8:58 ` Alan Cox
  2002-06-12  8:47   ` Martin Wilck
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-06-12  8:58 UTC (permalink / raw)
  To: Martin Wilck; +Cc: osb4-bug, Linux Kernel mailing list, Martin Wilck

Triggering the check on csb5/csb6 would be a bug - maybe an extra 
test is needed there as CSB5/6 are fine


* Re: Serverworks OSB4 in impossible state
  2002-06-12  8:47   ` Martin Wilck
@ 2002-06-12  9:14     ` Alan Cox
  2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-06-12  9:14 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list

> Would you concur that it would be reasonable to trigger only if
> 
> - the chipset version is < CSB5,
> - the drive is a hard disk,
> - and the drive did not report an error?
> 
> (I am not certain about the last condition, but from the descriptions 
> of the 4-byte-shift problem I have seen I infer that there was no drive
> error condition involved).

Entirely agreed



* OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
  2002-06-12  9:14     ` Alan Cox
@ 2002-06-12 10:30       ` Martin Wilck
  2002-06-12 20:35         ` Christian Zoffoli
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-12 10:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: osb4-bug, Linux Kernel mailing list

On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
 > Entirely agreed

I propose this patch to remedy the problem.

I don't know how to test whether the drive is a Seagate drive, and
I think we don't want to do that, because it would end up in yet another
blacklist.

I cannot test if this behaves correctly on machines that do expose the
4-byte shift bug - it would be great if somebody could test that.

Martin

--- drivers/ide/serverworks.c.orig	Tue Jun 11 11:24:59 2002
+++ drivers/ide/serverworks.c	Wed Jun 12 12:00:36 2002
@@ -547,7 +547,13 @@
 			ide_hwif_t *hwif		= HWIF(drive);
 			unsigned long dma_base		= hwif->dma_base;
 	
-			if(inb(dma_base+0x02)&1)
+			/* If it's a disk on the OSB4, the DMA engine is still on,
+			   and the device reports no error status, we are probably
+			   facing the "4 byte shift" problem */
+			if(drive->media == ide_disk && 
+			   hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE && 
+			   inb(dma_base+0x02)&1 &&
+			   OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
 			{
 #if 0		
 				int i;


-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy







* Re: OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state)
  2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
@ 2002-06-12 20:35         ` Christian Zoffoli
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Zoffoli @ 2002-06-12 20:35 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, osb4-bug, Linux Kernel mailing list

Martin Wilck wrote:
> On Wed, 2002-06-12 at 11:14, Alan Cox wrote:
>  > Entirely agreed
> 
> I propose this patch to remedy the problem.
> 
> I don't know how to test if the drive is a seagate drive, and
> I think we don't want to do that, because it would end up in yet another
> blacklist.
> 
> I cannot test if this behaves correctly on machines that do expose the
> 4-byte shift bug - it would be great if somebody could test that.
> 
> Martin
> 
> --- drivers/ide/serverworks.c.orig	Tue Jun 11 11:24:59 2002
> +++ drivers/ide/serverworks.c	Wed Jun 12 12:00:36 2002
> @@ -547,7 +547,13 @@
>  			ide_hwif_t *hwif		= HWIF(drive);
>  			unsigned long dma_base		= hwif->dma_base;
>  	
> -			if(inb(dma_base+0x02)&1)
> +			/* If it's a disk on the OSB4, the DMA engine is still on,
> +			   and the device reports no error status, we are probably
> +			   facing the "4 byte shift" problem */
> +			if(drive->media == ide_disk && 
> +			   hwif->pci_dev->device == PCI_DEVICE_ID_SERVERWORKS_OSB4IDE && 
> +			   inb(dma_base+0x02)&1 &&
> +			   OK_STAT (GET_STAT(), DRIVE_READY, BAD_STAT))
>  			{
>  #if 0		
>  				int i;
> 
> 


It works for me. I have a Supermicro 370DE6 (ServerWorks HE-SL) and a
Maxtor HD (5T030H3).


Christian



* Re: Serverworks OSB4 in impossible state
  2002-06-11 11:25       ` Martin Wilck
  2002-06-11 21:27         ` Chris Wedgwood
@ 2002-06-13 11:50         ` Daniela Engert
  2002-06-13 11:59           ` Martin Wilck
  2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
  1 sibling, 2 replies; 18+ messages in thread
From: Daniela Engert @ 2002-06-13 11:50 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list

Hi,

As promised, I've conducted a test similar to Martin's to check the
behaviour of a Serverworks ROSB4 IDE controller in case of an aborted
ATAPI DMA transfer (probably due to a media error). In fact, I've done
this by comparing it with a known-to-be-good system, a dual processor
Intel BX based board with a PIIX4 IDE controller chip.

The following trace shows how it should be:

 - lines 158-172: setup DMA transfer, send command packet
 - lines 173-174: the DMA engine loads the first (of multiple)
                  PRD entry
 - the actual DMA induced memory writes are not shown here
 - line  175:     IRQ14 is acknowledged
 - lines 176-181: gather unit and DMA status
 - lines 182-210: issue "request sense" and get sense status


CD-ROM read error on Intel PIIX4:

______Time_______Burst_BE#__Wait___Command_Address__Data____
  158  20.089ms  .     1011    .   I/OWri  000001F6 ..B0....
  159	1.656us  .     1011    .   I/ORd   000003F6 ..50....
  160	10.08us  .     0000    .   I/OWri  0000F004 00EAC800
  161	812.7ns  .     1110    .   I/OWri  0000F000 ......08
  162	752.5ns  .     1011    .   I/OWri  0000F002 ..46....
  163	3.100us  .     1110    .   I/OWri  000001F4 ......FF
  164	4.214us  .     1101    .   I/OWri  000001F5 ....FF..
  165	4.244us  .     1101    .   I/OWri  000001F1 ....01..
  166	4.214us  .     0111    .   I/OWri  000001F7 A0......
  167	5.027us  .     1011    .   I/ORd   000003F6 ..58....
  168	5.448us  .     1011    .   I/OWri  0000F002 ..46....
  169	752.5ns  .     1110    .   I/OWri  0000F000 ......09
  170	1.204us  .     0000    .   I/OWri  000001F0 00000028
  171	903.0ns  .     0000    .   I/OWri  000001F0 0000440D
  172	903.0ns  .     0000    .   I/OWri  000001F0 0000001F
  173	 3.673s  Start 0000    .   MemRd   00EAC800 006E3000
  174	 30.1ns  B     0000    .   MemRd   00EAC800 0000D000
  175  648.99ms  .     1110    .   IntAck  ........ ......76
  176	5.779us  .     0111    .   I/ORd   000001F7 51......
  177	11.47us  .     0111    .   I/ORd   000001F7 51......
  178	1.324us  .     1011    .   I/ORd   000001F2 ..03....
  179	1.957us  .     1110    .   I/OWri  0000F000 ......08
  180	812.7ns  .     1011    .   I/ORd   0000F002 ..44....
  181	1.355us  .     1101    .   I/ORd   000001F1 ....30..
  182	9.361us  .     1011    .   I/OWri  000001F6 ..B0....
  183	1.806us  .     1011    .   I/ORd   000003F6 ..51....
  184	9.301us  .     1110    .   I/OWri  000001F4 ......12
  185	4.274us  .     1101    .   I/OWri  000001F5 ....00..
  186	4.244us  .     1101    .   I/OWri  000001F1 ....00..
  187	4.214us  .     0111    .   I/OWri  000001F7 A0......
  188	4.906us  .     1011    .   I/ORd   000003F6 ..58....
  189	6.020us  .     0000    .   I/OWri  000001F0 00000003
  190	903.0ns  .     0000    .   I/OWri  000001F0 00000012
  191	903.0ns  .     0000    .   I/OWri  000001F0 00000000
  192  258.17us  .     1110    .   IntAck  ........ ......76
  193	3.431us  .     0111    .   I/ORd   000001F7 58......
  194	10.08us  .     0111    .   I/ORd   000001F7 58......
  195	1.204us  .     1011    .   I/ORd   000001F2 ..02....
  196	1.535us  .     1101    .   I/ORd   000001F5 ....00..
  197	1.174us  .     1110    .   I/ORd   000001F4 ......12
  198	10.20us  .     1100    .   I/ORd   000001F0 ....0070
  199	1.475us  .     1100    .   I/ORd   000001F0 ....0003
  200	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  201	632.1ns  .     1100    .   I/ORd   000001F0 ....0A00
  202	602.0ns  .     1100    .   I/ORd   000001F0 ....0000
  203	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  204	602.0ns  .     1100    .   I/ORd   000001F0 ....0611
  205	632.1ns  .     1100    .   I/ORd   000001F0 ....0000
  206	602.0ns  .     1100    .   I/ORd   000001F0 ....0000
  207	12.79us  .     1110    .   IntAck  ........ ......76
  208	3.401us  .     0111    .   I/ORd   000001F7 50......
  209	9.361us  .     0111    .   I/ORd   000001F7 50......
  210	1.234us  .     1011    .   I/ORd   000001F2 ..03....
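
The nine 16-bit data register reads in lines 198-206 are the sense data
returned by "request sense". A minimal sketch of how they unpack, assuming
the usual fixed-format sense layout and that the low byte of each word
comes first (the struct and function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

struct sense_info {
    uint8_t response_code; /* byte 0, low 7 bits */
    uint8_t sense_key;     /* byte 2, low nibble */
    uint8_t asc;           /* byte 12, additional sense code */
    uint8_t ascq;          /* byte 13, additional sense code qualifier */
};

/* Unpack 16-bit data register words into a byte stream, low byte first. */
void words_to_bytes(const uint16_t *words, size_t nwords, uint8_t *out)
{
    for (size_t i = 0; i < nwords; i++) {
        out[2 * i]     = (uint8_t)(words[i] & 0xff);
        out[2 * i + 1] = (uint8_t)(words[i] >> 8);
    }
}

/* Returns 0 on success, -1 if the buffer is too short to hold ASC/ASCQ. */
int decode_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    if (len < 14)
        return -1;
    out->response_code = buf[0] & 0x7f;
    out->sense_key     = buf[2] & 0x0f;
    out->asc           = buf[12];
    out->ascq          = buf[13];
    return 0;
}
```

Fed with the words 0070 0003 0000 0A00 0000 0000 0611 0000 0000 from the
trace, this yields response code 70h, sense key 3 (medium error), and
ASC/ASCQ 11h/06h. The sense key matches the upper nibble of the error
register value 30h read in line 181.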


And here is the same with the ROSB4. This time, some of the
DMA writes are shown. After loading the second PRD entry
which describes a memory region of 7800h bytes, 3000h bytes
are transferred before IRQ14 is asserted. The IRQ14 INTACK
cycle is the last transaction on the PCI bus ever, the
machine is completely frozen!

CD-ROM read error on ServerWorks ROSB4 revision 0:

______Time_______Burst_BE#__Wait___Command_Address__Data____
51316  297.63us  .     1011    .   I/OWri  000001F6 ..B0....
51317	1.530us  .     1011    .   I/ORd   000003F6 ..50....
51318	6.300us  .     0000    .   I/OWri  00005404 00EF2800
51319	  450ns  .     1110    .   I/OWri  00005400 ......08
51320	  450ns  .     1011    .   I/OWri  00005402 ..66....
51321	1.440us  .     1110    .   I/OWri  000001F4 ......FF
51322	3.480us  .     1101    .   I/OWri  000001F5 ....FF..
51323	3.480us  .     1101    .   I/OWri  000001F1 ....01..
51324	3.510us  .     0111    .   I/OWri  000001F7 A0......
51325	4.470us  .     1011    .   I/ORd   000003F6 ..58....
51326	4.620us  .     1011    .   I/OWri  00005402 ..66....
51327	  660ns  .     0000    .   I/OWri  000001F0 00000028
51328	  420ns  .     0000    .   I/OWri  000001F0 0000F80D
51329	  420ns  .     0000    .   I/OWri  000001F0 0000001F
51330	1.290us  .     1011    .   I/ORd   000003F6 ..D0....
51331	3.660us  .     1110    .   I/OWri  00005400 ......09
51332	1.290us  .     0000    .   MemRd   00EF2800 00B08000
51333	  630ns  .     0000    .   MemRd   00EF2804 00008000
51334  166.11us  Start 0000    .   MemWri  00B08000 7BC0728C
51335	   30ns  B     0000    .   MemWri  00B08000 285DA7D0
51336	   30ns  B     0000    .   MemWri  00B08000 9FAE557A
51337	   30ns  B     0000    .   MemWri  00B08000 B3F88165
51338	   30ns  B     0000    .   MemWri  00B08000 BDFD7823
51339	   30ns  B     0000    .   MemWri  00B08000 42ED22D0
51340	   30ns  B     0000    .   MemWri  00B08000 7BA5743F
51341	   30ns  B     0000    .   MemWri  00B08000 6B5897BA
51342	  780ns  Start 0000    .   MemWri  00B08020 ACF1D36B
  ..
  ..
59518	  930ns  Start 0000    .   MemWri  00B0FFE0 845971B8
59519	   30ns  B     0000    .   MemWri  00B0FFE0 7E325F95
59520	   30ns  B     0000    .   MemWri  00B0FFE0 7ADA36D0
59521	   30ns  B     0000    .   MemWri  00B0FFE0 96BD435C
59522	   30ns  B     0000    .   MemWri  00B0FFE0 4ED88CB0
59523	   30ns  B     0000    .   MemWri  00B0FFE0 2E1CCAF7
59524	   30ns  B     0000    .   MemWri  00B0FFE0 FC8782B3
59525	   30ns  B     0000    .   MemWri  00B0FFE0 9C0A2335
59526	  780ns  .     0000    .   MemRd   00EF2808 00B10000
59527	  630ns  .     0000    .   MemRd   00EF280C 80007800
59528  1.2518ms  Start 0000    .   MemWri  00B10000 E85C33CD
59529	   30ns  B     0000    .   MemWri  00B10000 AD2F9613
59530	   30ns  B     0000    .   MemWri  00B10000 D8BEC924
59531	   30ns  B     0000    .   MemWri  00B10000 E273C0BD
59532	   30ns  B     0000    .   MemWri  00B10000 DC655F5E
59533	   30ns  B     0000    .   MemWri  00B10000 69B3087B
59534	   30ns  B     0000    .   MemWri  00B10000 369B26D1
59535	   30ns  B     0000    .   MemWri  00B10000 9A8C47DF
59536	  780ns  Start 0000    .   MemWri  00B10020 3F026EA5
  ..
  ..
62592	  750ns  Start 0000    .   MemWri  00B12FE0 367016E1
62593	   30ns  B     0000    .   MemWri  00B12FE0 35654905
62594	   30ns  B     0000    .   MemWri  00B12FE0 9968FF02
62595	   30ns  B     0000    .   MemWri  00B12FE0 9ABB5CAE
62596	   30ns  B     0000    .   MemWri  00B12FE0 D32DF135
62597	   30ns  B     0000    .   MemWri  00B12FE0 7A03326A
62598	   30ns  B     0000    .   MemWri  00B12FE0 86CCE8BF
62599	   30ns  B     0000    .   MemWri  00B12FE0 D4E66D21
62600	 1.176s  .     1110    .   IntAck  ........ ......76
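
The PRD arithmetic behind these numbers can be sketched as follows. The
entry layout (32-bit base address, then a 32-bit word holding the byte
count in its low 16 bits and the end-of-table flag in bit 31) is the
standard bus master IDE format; the names are hypothetical and only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct prd_entry {
    uint32_t base;        /* physical base address of the memory region */
    uint32_t flags_count; /* bit 31: end of table; bits 15..0: byte count */
};

/* Size of the region in bytes; a stored count of 0 denotes 64 KiB. */
uint32_t prd_byte_count(const struct prd_entry *e)
{
    uint32_t n = e->flags_count & 0xffffu;
    return n ? n : 0x10000u;
}

/* True if this entry is the last one in the PRD table. */
bool prd_is_last(const struct prd_entry *e)
{
    return (e->flags_count & 0x80000000u) != 0;
}

/*
 * Bytes of the region that were never written, given the address just
 * past the last DMA write seen on the bus.
 */
uint32_t prd_bytes_missing(const struct prd_entry *e, uint32_t next_addr)
{
    return prd_byte_count(e) - (next_addr - e->base);
}
```

For the second entry (base 00B10000, second word 80007800) the region is
7800h bytes and is flagged as the last one. The final burst starts at
00B12FE0 and carries eight dwords, so the writes end at 00B13000: 3000h
bytes transferred, 4800h bytes missing.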


My conclusion: don't do ATAPI DMA on a ServerWorks ROSB4 revision 0 IDE
controller.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11




* Re: Serverworks OSB4 in impossible state
  2002-06-13 11:50         ` Daniela Engert
@ 2002-06-13 11:59           ` Martin Wilck
  2002-06-13 12:04             ` Daniela Engert
  2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
  1 sibling, 1 reply; 18+ messages in thread
From: Martin Wilck @ 2002-06-13 11:59 UTC (permalink / raw)
  To: Daniela Engert; +Cc: Alan Cox, Linux Kernel mailing list

On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:

> And here is the same with the ROSB4. This time, some of the
> DMA writes are shown. After loading the second PRD entry
> which describes a memory region of 7800h bytes, 3000h bytes
> are transferred before IRQ14 is asserted. The IRQ14 INTACK
> cycle is the last transaction on the PCI bus ever, the
> machine is completely frozen!

You say (dma_base+2) is never read?
Was that a Linux system? If yes, I assume you never saw "OSB4 in
impossible state ..." ?

Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy






^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Serverworks OSB4 in impossible state
  2002-06-13 11:59           ` Martin Wilck
@ 2002-06-13 12:04             ` Daniela Engert
  2002-06-13 18:27               ` rico-linux-kernel
  0 siblings, 1 reply; 18+ messages in thread
From: Daniela Engert @ 2002-06-13 12:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Alan Cox, Linux Kernel mailing list

On 13 Jun 2002 13:59:06 +0200, Martin Wilck wrote:

>On Thu, 2002-06-13 at 13:50, Daniela Engert wrote:

>> are transferred before IRQ14 is asserted. The IRQ14 INTACK
>> cycle is the last transaction on the PCI bus ever, the
>> machine is completely frozen!
>
>You say (dma_base+2) is never read?

Exactly. I checked this twice; the PCI tracer was configured to gather
*all* PCI bus events.

>Was that a Linux system?

No. I think this doesn't matter here at all, because the hardware
stalls completely - full stop.

Ciao,
  Dani

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gräfenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11




* Re: Serverworks OSB4 in impossible state
  2002-06-13 12:04             ` Daniela Engert
@ 2002-06-13 18:27               ` rico-linux-kernel
  0 siblings, 0 replies; 18+ messages in thread
From: rico-linux-kernel @ 2002-06-13 18:27 UTC (permalink / raw)
  To: dani; +Cc: linux-kernel

Thanks for investing time on the logic analyser, Dani.  My experience
is slightly different.

I have several mainboards (Tyan S1867) with older chipsets from
ServerWorks (f.k.a. Reliance).  The IDE controller (OSB4 rev 0) is used
daily with ATAPI CDRW drives in UDMA(33) mode.  The system handles
read/write errors without problems.

The system will lock solid when both IDE channels are accessed,
and either one is using DMA.  Since I want DMA, I simply abandon the
secondary channel.

I have spare machines available for quack medical experiments.

Select boot-time info...

Linux version 2.4.17 (rico@pc2) (gcc version 2.95.3 20010315 (release)) #1 SMP Mon Dec 31 11:51:33 CST 2001
ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
ServerWorks OSB4: chipset revision 0
ServerWorks OSB4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfcb0-0xfcb7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xfcb8-0xfcbf, BIOS settings: hdc:pio, hdd:pio
hda: PLEXTOR CD-R PX-W2410A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 40X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12


* Re[2]: Serverworks OSB4 in impossible state
  2002-06-13 11:50         ` Daniela Engert
  2002-06-13 11:59           ` Martin Wilck
@ 2002-06-13 23:48           ` Nerijus Baliunas
  1 sibling, 0 replies; 18+ messages in thread
From: Nerijus Baliunas @ 2002-06-13 23:48 UTC (permalink / raw)
  To: Daniela Engert; +Cc: Linux Kernel mailing list

On Thu, 13 Jun 2002 13:50:25 +0200 (CDT) Daniela Engert <dani@ngrt.de> wrote:

> My conclusion: don't do ATAPI DMA on a serverworks ROSB4 revision 0 IDE
> controller.

How can I find the revision? I have a problem with (Seagate) HDDs, but
lspci -v only shows:

00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 51)
        Subsystem: ServerWorks OSB4 South Bridge
        Flags: bus master, medium devsel, latency 0


00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 64
        I/O ports at 2000 [size=16]


Regards,
Nerijus



Thread overview: 18+ messages
2002-06-10 15:52 Serverworks OSB4 in impossible state Martin Wilck
2002-06-10 16:41 ` Daniela Engert
2002-06-11  7:22   ` Martin Wilck
2002-06-11  7:45     ` Daniela Engert
2002-06-11  8:37       ` Martin Wilck
2002-06-11 11:25       ` Martin Wilck
2002-06-11 21:27         ` Chris Wedgwood
2002-06-12  7:24           ` Martin Wilck
2002-06-13 11:50         ` Daniela Engert
2002-06-13 11:59           ` Martin Wilck
2002-06-13 12:04             ` Daniela Engert
2002-06-13 18:27               ` rico-linux-kernel
2002-06-13 23:48           ` Re[2]: " Nerijus Baliunas
2002-06-12  8:58 ` Alan Cox
2002-06-12  8:47   ` Martin Wilck
2002-06-12  9:14     ` Alan Cox
2002-06-12 10:30       ` OSB4 PATCH (was: Re: Serverworks OSB4 in impossible state) Martin Wilck
2002-06-12 20:35         ` Christian Zoffoli
