* Xen / Dell 2850 PERC 4e/Di lock up
@ 2005-07-19 17:15 Shane Chen
  2005-07-19 17:59 ` Will DeHaan
  0 siblings, 1 reply; 8+ messages in thread
From: Shane Chen @ 2005-07-19 17:15 UTC (permalink / raw)
  To: xen-devel

I've read the posts where other users have been reporting Dell
boxes hanging.  I can readily reproduce the symptom, and hopefully
provide some extra info.

This is for a Xen project for work.  I have it running on a
development box with different hardware with no problems whatsoever. 
However, for the rollout stage, Dell 2850's are to be used w/ the PERC
4e/Di controller (new megaraid driver).

I've tried xen-2.0-testing, 2.0.6 and 2.0.4 (all compiled from source), and
the boxes "lock up" exactly the same way.  Basically, I (re)boot the
box to the 2.6.11.10-xen0 (or 2.6.11.12-xen0) kernel, ssh in, issue
`vgscan; vgchange -ay; mkfs.ext3 /dev/lvm/device` and it'll proceed
briefly then appear to be completely locked up (network dies),
keyboard doesn't respond (can't toggle caps lock), etc.

However, one time I just decided to let it sit there (over lunch) and
upon returning, was able to log in using the console.  Examining the
dmesg, I noticed:

Jul 18 12:36:18 xen0 megaraid: aborting-17662 cmd=2a <c=2 t=0 l=0>
Jul 18 12:36:18 xen0 megaraid abort: scsi cmd:17662, do now own

repeated over and over again.
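
A minimal sketch of how the repeated abort messages could be tallied from a saved log. It assumes that `cmd=` is the SCSI opcode in hex and that `c`, `t` and `l` stand for channel, target and LUN; the thread does not state either. If that reading is right, `2a` is the WRITE(10) opcode, which would fit the mkfs.ext3 run. The function name `summarize_aborts` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "aborting" line quoted above. Field meanings (hex opcode,
# channel/target/LUN) are an assumption, not something the thread confirms.
ABORT_RE = re.compile(
    r"megaraid: aborting-(?P<serial>\d+) cmd=(?P<opcode>[0-9a-fA-F]+) "
    r"<c=(?P<channel>\d+) t=(?P<target>\d+) l=(?P<lun>\d+)>"
)

def summarize_aborts(lines):
    """Count abort messages per (opcode, channel, target, lun)."""
    counts = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            key = (int(m["opcode"], 16), int(m["channel"]),
                   int(m["target"]), int(m["lun"]))
            counts[key] += 1
    return counts

# The one abort line quoted in the message above.
sample = ["Jul 18 12:36:18 xen0 megaraid: aborting-17662 cmd=2a <c=2 t=0 l=0>"]

for (opcode, channel, target, lun), n in summarize_aborts(sample).items():
    print(f"opcode=0x{opcode:02x} c={channel} t={target} l={lun}: {n} abort(s)")
```

Running it over the full syslog would show whether every abort hits the same channel and target or whether they are spread across devices.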

I tried disabling USB (via nousb boot param), and it locks up exactly
the same way.

Booting the normal vmlinuz-2.6.12-gentoo-r4 kernel and performing the
same steps does not lock up the box.

Please let me know if there's any additional info you'd like me to
provide or anything else you'd like me to try.  If this issue cannot
be resolved, then I'll obviously have to scrap the Xen project.  And
having played with UMLs, I'd rather not have to touch that if possible
at all.

Thanks,
Shane

# uname -a
Linux xen0 2.6.11.12-xen0 #3 Thu Jul 14 11:16:17 PDT 2005 i686
Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux


# lspci -v
0000:00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, fast devsel, latency 0
       Capabilities: [40] #09 [4105]

0000:00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI
Express Port A (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
       Memory behind bridge: dfc00000-dfefffff
       Prefetchable memory behind bridge: 00000000d8000000-00000000d8000000
       Secondary status: SERR
       Capabilities: [50] Power Management version 2
       Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
       Capabilities: [64] #10 [0041]

0000:00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express
Port B (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
       Capabilities: [50] Power Management version 2
       Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
       Capabilities: [64] #10 [0041]

0000:00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1
(rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=00, secondary=05, subordinate=07, sec-latency=0
       I/O behind bridge: 0000d000-0000efff
       Memory behind bridge: df700000-dfbfffff
       Secondary status: SERR
       Expansion ROM at 0000d000 [disabled] [size=8K]
       Capabilities: [50] Power Management version 2
       Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
       Capabilities: [64] #10 [0041]

0000:00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C
(rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=00, secondary=08, subordinate=0a, sec-latency=0
       Memory behind bridge: df600000-df6fffff
       Secondary status: SERR
       Capabilities: [50] Power Management version 2
       Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
       Capabilities: [64] #10 [0041]

0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R)
USB UHCI Controller #1 (rev 02) (prog-if 00 [UHCI])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, medium devsel, latency 0, IRQ 16
       I/O ports at bce0 [size=32]

0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R)
USB UHCI Controller #2 (rev 02) (prog-if 00 [UHCI])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, medium devsel, latency 0, IRQ 19
       I/O ports at bcc0 [size=32]

0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R)
USB UHCI #3 (rev 02) (prog-if 00 [UHCI])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, medium devsel, latency 0, IRQ 18
       I/O ports at bca0 [size=32]

0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R)
USB2 EHCI Controller (rev 02) (prog-if 20 [EHCI])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, medium devsel, latency 0, IRQ 23
       Memory at dff00000 (32-bit, non-prefetchable)
       Capabilities: [50] Power Management version 2
       Capabilities: [58] #0a [20a0]

0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
(prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=00, secondary=0b, subordinate=0b, sec-latency=32
       I/O behind bridge: 0000c000-0000cfff
       Memory behind bridge: df400000-df5fffff
       Prefetchable memory behind bridge: d0000000-d7ffffff

0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC
Interface Bridge (rev 02)
       Flags: bus master, medium devsel, latency 0

0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R)
IDE Controller (rev 02) (prog-if 8a [Master SecP PriP])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, medium devsel, latency 0
       I/O ports at <unassigned>
       I/O ports at <unassigned>
       I/O ports at <unassigned>
       I/O ports at <unassigned>
       I/O ports at fc00 [size=16]
       Memory at 40000000 (32-bit, non-prefetchable) [size=1K]

0000:01:00.0 PCI bridge: Intel Corporation 80332 [Dobson] I/O
processor (rev 06) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
       Memory behind bridge: dfd00000-dfefffff
       Prefetchable memory behind bridge: 00000000d8000000-00000000d8000000
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:01:00.2 PCI bridge: Intel Corporation 80332 [Dobson] I/O
processor (rev 06) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=01, secondary=03, subordinate=03, sec-latency=64
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID
controller 4 (rev 06)
       Subsystem: Dell PowerEdge Expandable RAID Controller 4e/Di
       Flags: bus master, stepping, 66Mhz, medium devsel, latency 64, IRQ 38
       Memory at d80f0000 (32-bit, prefetchable) [size=dfe00000]
       Memory at dfdc0000 (32-bit, non-prefetchable) [size=256K]
       Expansion ROM at 00020000 [disabled]
       Capabilities: [c0] Power Management version 2
       Capabilities: [d0] Message Signalled Interrupts: 64bit+
Queue=0/1 Enable-
       Capabilities: [e0] PCI-X non-bridge device.

0000:05:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI
Bridge A (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=05, secondary=06, subordinate=06, sec-latency=32
       I/O behind bridge: 0000e000-0000efff
       Memory behind bridge: dfa00000-dfbfffff
       Expansion ROM at 0000e000 [disabled] [size=4K]
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:05:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI
Bridge B (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=05, secondary=07, subordinate=07, sec-latency=32
       I/O behind bridge: 0000d000-0000dfff
       Memory behind bridge: df800000-df9fffff
       Expansion ROM at 0000d000 [disabled] [size=4K]
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:06:07.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit
Ethernet Controller (rev 05)
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 48
       Memory at dfae0000 (32-bit, non-prefetchable)
       I/O ports at ecc0 [size=64]
       Capabilities: [dc] Power Management version 2
       Capabilities: [e4] PCI-X non-bridge device.

0000:07:08.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit
Ethernet Controller (rev 05)
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 49
       Memory at df8e0000 (32-bit, non-prefetchable)
       I/O ports at dcc0 [size=64]
       Capabilities: [dc] Power Management version 2
       Capabilities: [e4] PCI-X non-bridge device.

0000:08:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI
Bridge A (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=08, secondary=09, subordinate=09, sec-latency=64
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:08:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI
Bridge B (rev 09) (prog-if 00 [Normal decode])
       Flags: bus master, fast devsel, latency 0
       Bus: primary=08, secondary=0a, subordinate=0a, sec-latency=64
       Capabilities: [44] #10 [0071]
       Capabilities: [5c] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
       Capabilities: [6c] Power Management version 2
       Capabilities: [d8]
0000:0b:0d.0 VGA compatible controller: ATI Technologies Inc Radeon
RV100 QY [Radeon 7000/VE] (prog-if 00 [VGA])
       Subsystem: Dell: Unknown device 016d
       Flags: bus master, VGA palette snoop, stepping, medium devsel,
latency 32, IRQ 18
       Memory at d0000000 (32-bit, prefetchable)
       I/O ports at cc00 [size=256]
       Memory at df4f0000 (32-bit, non-prefetchable) [size=64K]
       Capabilities: [50] Power Management version 2


# cat /proc/interrupts
          CPU0
 1:          8        Phys-irq  i8042
 8:          2        Phys-irq  rtc
 14:         29        Phys-irq  ide0
 16:      41014        Phys-irq  uhci_hcd
 18:        931        Phys-irq  uhci_hcd
 19:         10        Phys-irq  uhci_hcd
 23:         24        Phys-irq  ehci_hcd
 38:       1848        Phys-irq  megaraid
 48:      14018        Phys-irq  eth0
128:          1     Dynamic-irq  misdirect
129:          0     Dynamic-irq  ctrl-if
130:      78781     Dynamic-irq  timer
131:          0     Dynamic-irq  console
132:          0     Dynamic-irq  net-be-dbg
NMI:          0
ERR:          0
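
For comparing interrupt counts between runs, the listing above can be parsed with a few lines of Python. This is only a sketch that assumes the single-CPU layout shown here (one count column); `parse_interrupts` is a hypothetical helper name.

```python
def parse_interrupts(text):
    """Return {irq: (count, kind, handler)} for a single-CPU /proc/interrupts listing."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the CPU header row and the NMI/ERR summary rows.
        if not sep or not label.strip().isdigit() or len(fields) < 3:
            continue
        table[int(label)] = (int(fields[0]), fields[1], " ".join(fields[2:]))
    return table

# A few rows copied from the listing above.
SAMPLE = """\
          CPU0
 16:      41014        Phys-irq  uhci_hcd
 38:       1848        Phys-irq  megaraid
 48:      14018        Phys-irq  eth0
130:      78781     Dynamic-irq  timer
NMI:          0
"""

# Show physical IRQs only, busiest first.
physical = {irq: row for irq, row in parse_interrupts(SAMPLE).items()
            if row[1] == "Phys-irq"}
for irq, (count, _, handler) in sorted(physical.items(),
                                       key=lambda kv: kv[1][0], reverse=True):
    print(f"IRQ {irq:3d}  {count:6d}  {handler}")
```

Taking two snapshots a few seconds apart and subtracting the counts would show which interrupt line is busy while the box appears hung.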


# gcc -v
Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5-20050130/specs
Configured with:
/var/tmp/portage/gcc-3.3.5.20050130-r1/work/gcc-3.3.5/configure
--enable-version-specific-runtime-libs --prefix=/usr
--bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.3.5-20050130
--includedir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5-20050130/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5-20050130
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5-20050130/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5-20050130/info
--with-gxx-include-dir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5-20050130/include/g++-v3
--host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec
--enable-nls --without-included-gettext --with-system-zlib
--disable-checking --disable-werror --disable-libunwind-exceptions
--disable-multilib --disable-libgcj --enable-languages=c,c++,f77
--enable-shared --enable-threads=posix --enable-__cxa_atexit
--enable-clocale=gnu
Thread model: posix
gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1,
ssp-3.3.5.20050130-1, pie-8.7.7.1)

# xm dmesg
ERROR: cannot use unconfigured serial port COM1
 __  __            ____    ___     _            _   _
 \ \/ /___ _ __   |___ \  / _ \   | |_ ___  ___| |_(_)_ __   __ _
  \  // _ \ '_ \    __) || | | |__| __/ _ \/ __| __| | '_ \ / _` |
  /  \  __/ | | |  / __/ | |_| |__| ||  __/\__ \ |_| | | | | (_| |
 /_/\_\___|_| |_| |_____(_)___/    \__\___||___/\__|_|_| |_|\__, |
                                                            |___/
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 2.0-testing (root@(none)) (gcc version 3.3.5-20050130
(Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) Thu Jul
14 11:26:28 PDT 2005
 Latest ChangeSet: information unavailable

(XEN) Physical RAM map:
(XEN)  0000000000000000 - 00000000000a0000 (usable)
(XEN)  0000000000100000 - 000000003ffc0000 (usable)
(XEN)  000000003ffc0000 - 000000003ffcfc00 (ACPI data)
(XEN)  000000003ffcfc00 - 000000003ffff000 (reserved)
(XEN)  00000000e0000000 - 00000000fec90000 (reserved)
(XEN)  00000000fed00000 - 00000000fed00400 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffb00000 - 0000000100000000 (reserved)
(XEN) System RAM: 1023MB (1047936kB)
(XEN) Xen heap: 10MB (10764kB)
(XEN) CPU0: Before vendor init, caps: bfebfbff 20100000 00000000, vendor = 0
(XEN) CPU#0: Physical ID: 0, Logical ID: 0
(XEN) CPU caps: bfebfbff 20100000 00000000 00000000
(XEN) found SMP MP-table at 000fe710
(XEN) ACPI: RSDP (v000 DELL                                      ) @ 0x000fd650
(XEN) ACPI: RSDT (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd664
(XEN) ACPI: FADT (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd6b0
(XEN) ACPI: MADT (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd724
(XEN) ACPI: SPCR (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd7cc
(XEN) ACPI: HPET (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd81c
(XEN) ACPI: MCFG (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000a) @ 0x000fd854
(XEN) ACPI: DSDT (v001 DELL   PE BKC   0x00000001 MSFT 0x0100000e) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 Unknown CPU [15:4] APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
(XEN) Processor #6 Unknown CPU [15:4] APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
(XEN) Processor #1 Unknown CPU [15:4] APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
(XEN) Processor #7 Unknown CPU [15:4] APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
(XEN) Using ACPI for processor (LAPIC) configuration information
(XEN) Intel MultiProcessor Specification v1.4
(XEN)     Virtual Wire compatibility mode.
(XEN) OEM ID: DELL     Product ID: PE 016D      APIC at: 0xFEE00000
(XEN) I/O APIC #8 Version 32 at 0xFEC00000.
(XEN) I/O APIC #9 Version 32 at 0xFEC80000.
(XEN) I/O APIC #10 Version 32 at 0xFEC83000.
(XEN) I/O APIC #11 Version 32 at 0xFEC84000.
(XEN) Enabling APIC mode: Flat. Using 4 I/O APICs
(XEN) Processors: 4
(XEN) Using scheduler: Borrowed Virtual Time (bvt)
(XEN) Initializing CPU#0
(XEN) Detected 3192.149 MHz processor.
(XEN) CPU0: Before vendor init, caps: bfebfbff 20100000 00000000, vendor = 0
(XEN) CPU#0: Physical ID: 0, Logical ID: 0
(XEN) CPU caps: bfebfbff 20100000 00000000 00000000
(XEN) CPU0 booted
(XEN) enabled ExtINT on CPU#0
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) masked ExtINT on CPU#1
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU1: Before vendor init, caps: bfebfbff 20100000 00000000, vendor = 0
(XEN) CPU#1: Physical ID: 0, Logical ID: 1
(XEN) CPU caps: bfebfbff 20100000 00000000 00000000
(XEN) CPU1 has booted.
(XEN) Booting processor 2/6 eip 90000
(XEN) Initializing CPU#2
(XEN) masked ExtINT on CPU#2
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU2: Before vendor init, caps: bfebfbff 20100000 00000000, vendor = 0
(XEN) CPU#2: Physical ID: 3, Logical ID: 0
(XEN) CPU caps: bfebfbff 20100000 00000000 00000000
(XEN) CPU2 has booted.
(XEN) Booting processor 3/7 eip 90000
(XEN) Initializing CPU#3
(XEN) masked ExtINT on CPU#3
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU3: Before vendor init, caps: bfebfbff 20100000 00000000, vendor = 0
(XEN) CPU#3: Physical ID: 3, Logical ID: 1
(XEN) CPU caps: bfebfbff 20100000 00000000 00000000
(XEN) CPU3 has booted.
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) Setting 8 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 8 ... ok.
(XEN) Setting 9 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 9 ... ok.
(XEN) Setting 10 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 10 ... ok.
(XEN) Setting 11 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 11 ... ok.
(XEN) init IO_APIC IRQs
(XEN) ..TIMER: vector=0x41 pin1=2 pin2=0
(XEN) Using local APIC timer interrupts.
(XEN) Calibrating APIC timer for CPU0...
(XEN) ..... CPU speed is 3192.1249 MHz.
(XEN) ..... Bus speed is 199.5077 MHz.
(XEN) ..... bus_scale = 0x0000CC4F
(XEN) checking TSC synchronization across CPUs: passed.
(XEN) Time init:
(XEN) .... System Time: 954414204ns
(XEN) .... cpu_freq:    00000000:BE445520
(XEN) .... scale:       00000001:40C97903
(XEN) .... Wall Clock:  1121792577s 170000us
(XEN) PCI: PCI BIOS revision 2.10 entry at 0xfbf0e, last bus=11
(XEN) PCI: Using configuration type 1
(XEN) PCI: Probing PCI hardware
(XEN) PCI: Probing PCI hardware (bus 00)
(XEN) PCI: Ignoring BAR0-3 of IDE controller 00:1f.1
(XEN) Transparent bridge - PCI device 8086:244e
(XEN) PCI: Using IRQ router PIIX/ICH [8086/24d0] at 00:1f.0
(XEN) PCI->APIC IRQ transform: (B0,I2,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I4,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I5,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I6,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I29,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I29,P1) -> 19
(XEN) PCI->APIC IRQ transform: (B0,I29,P2) -> 18
(XEN) PCI->APIC IRQ transform: (B0,I29,P3) -> 23
(XEN) PCI->APIC IRQ transform: (B2,I14,P0) -> 38
(XEN) PCI->APIC IRQ transform: (B6,I7,P0) -> 48
(XEN) PCI->APIC IRQ transform: (B7,I8,P0) -> 49
(XEN) PCI->APIC IRQ transform: (B11,I13,P0) -> 18
(XEN) mtrr: v2.0 (20020519)
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen-ELF header found:
'GUEST_OS=linux,GUEST_VER=2.6,XEN_VER=2.0,VIRT_BASE=0xC0000000,LOADER=generic,PT_MODE_WRITABLE'
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Kernel image:  00c00000->00ef931c
(XEN)  Initrd image:  00000000->00000000
(XEN)  Dom0 alloc.:   01000000->11000000
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c04223b4
(XEN)  Init. ramdisk: c0423000->c0423000
(XEN)  Phys-Mach map: c0423000->c0463000
(XEN)  Page tables:   c0463000->c0466000
(XEN)  Start info:    c0466000->c0467000
(XEN)  Boot stack:    c0467000->c0468000
(XEN)  TOTAL:         c0000000->c0800000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Scrubbing DOM0 RAM: ...done.
(XEN) Scrubbing Free RAM: ...........done.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
input to Xen).
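
The "PCI->APIC IRQ transform" lines in the boot log above show which devices end up on the same interrupt line. A minimal sketch that groups them, assuming B, I and P are the PCI bus, device (slot) and interrupt pin printed in decimal; `shared_irqs` is a hypothetical name.

```python
import re
from collections import defaultdict

TRANSFORM_RE = re.compile(
    r"PCI->APIC IRQ transform: \(B(\d+),I(\d+),P(\d+)\) -> (\d+)"
)

def shared_irqs(log_lines):
    """Map each IRQ to the (bus, device, pin) tuples routed to it; keep only shared IRQs."""
    by_irq = defaultdict(list)
    for line in log_lines:
        m = TRANSFORM_RE.search(line)
        if m:
            bus, dev, pin, irq = map(int, m.groups())
            by_irq[irq].append((bus, dev, pin))
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

# The transform lines from the xm dmesg output above.
SAMPLE = """\
(XEN) PCI->APIC IRQ transform: (B0,I2,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I4,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I5,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I6,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I29,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I29,P1) -> 19
(XEN) PCI->APIC IRQ transform: (B0,I29,P2) -> 18
(XEN) PCI->APIC IRQ transform: (B0,I29,P3) -> 23
(XEN) PCI->APIC IRQ transform: (B2,I14,P0) -> 38
(XEN) PCI->APIC IRQ transform: (B6,I7,P0) -> 48
(XEN) PCI->APIC IRQ transform: (B7,I8,P0) -> 49
(XEN) PCI->APIC IRQ transform: (B11,I13,P0) -> 18
"""

for irq, devs in sorted(shared_irqs(SAMPLE.splitlines()).items()):
    # Print bus:device in hex so the entries line up with the lspci output.
    names = ", ".join(f"{bus:02x}:{dev:02x} pin {pin}" for bus, dev, pin in devs)
    print(f"IRQ {irq}: {names}")
```

On this log, IRQ 16 and IRQ 18 each carry more than one device, while the RAID controller at 02:0e (IRQ 38) has its line to itself.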


* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-19 17:15 Xen / Dell 2850 PERC 4e/Di lock up Shane Chen
@ 2005-07-19 17:59 ` Will DeHaan
  2005-07-19 18:08   ` Shane Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Will DeHaan @ 2005-07-19 17:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Shane Chen

On Tue, 19 Jul 2005, Shane Chen wrote:

> This is for a Xen project for work.  I have it running on a
> development box with different hardware with no problems whatsoever.
> However, for the rollout stage, Dell 2850's are to be used w/ the PERC
> 4e/Di controller (new megaraid driver).
>
> I've tried xen-2.0-testing, 2.0.6 and 2.0.4 (all compiled from source), and
> the boxes "lock up" exactly the same way.

Dell's PERC4e/Di controllers with firmware older than 516A are known to 
lock up due to overly aggressive cache memory timing. The megaraid driver 
is suspect of course but ensure your firmware is current.


 	-- Will


* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-19 17:59 ` Will DeHaan
@ 2005-07-19 18:08   ` Shane Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Shane Chen @ 2005-07-19 18:08 UTC (permalink / raw)
  To: xen-devel

On 7/19/05, Will DeHaan <will@aggro.us> wrote:
> Dell's PERC4e/Di controllers with firmware older than 516A are known to
> lock up due to overly aggressive cache memory timing. The megaraid driver
> is suspect of course but ensure your firmware is current.

megaraid: fw version:[516A] bios version:[H418]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
  Vendor: PE/PV     Model: 1x6 SCSI BP       Rev: 1.0
  Type:   Processor                          ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[0]: scanning scsi channel 2 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD 0 RAID5  279G  Rev: 516A
  Type:   Direct-Access                      ANSI SCSI revision: 02
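
When several boxes need the same check, the firmware and BIOS strings can be pulled out of the driver banner above. A small sketch; `firmware_versions` is a hypothetical helper, and it makes no attempt to order version strings, since the thread does not say how PERC firmware versions sort.

```python
import re

# Matches the megaraid banner line shown above.
FW_RE = re.compile(
    r"megaraid: fw version:\[(?P<fw>[^\]]+)\] bios version:\[(?P<bios>[^\]]+)\]"
)

def firmware_versions(log_text):
    """Return (firmware, bios) from a megaraid banner, or None if absent."""
    m = FW_RE.search(log_text)
    return (m["fw"], m["bios"]) if m else None

banner = "megaraid: fw version:[516A] bios version:[H418]"
fw, bios = firmware_versions(banner)
print(f"firmware={fw} bios={bios}")
```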


Shane


* RE: Xen / Dell 2850 PERC 4e/Di lock up
@ 2005-07-19 19:19 Ian Pratt
  2005-07-19 21:39 ` Shane Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Ian Pratt @ 2005-07-19 19:19 UTC (permalink / raw)
  To: Shane Chen, xen-devel

 

> I've tried xen-2.0-testing, 2.0.6 and 2.0.4 (all compiled 
> from source), and the boxes "lock up" exactly the same way.  
> Basically, I (re)boot the box to the 2.6.11.10-xen0 (or 
> 2.6.11.12-xen0) kernel, ssh in, issue `vgscan; vgchange -ay; 
> mkfs.ext3 /dev/lvm/device` and it'll proceed briefly then 
> appear to be completely locked up (network dies), keyboard 
> doesn't respond (can't toggle caps lock), etc.

Have you tried 2.0-testing with 'noirqbalance'?

Ian


* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-19 19:19 Ian Pratt
@ 2005-07-19 21:39 ` Shane Chen
  2005-07-20 11:02   ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Shane Chen @ 2005-07-19 21:39 UTC (permalink / raw)
  To: xen-devel

On 7/19/05, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> Have you tried 2.0-testing with 'noirqbalance'?

I have to admit that I posted before I had found the noirqbalance
thread.  I've since compiled a new testing kernel
(dea74c466948d94a181bf0009c44ea51  xen-2.0-testing-src.tgz) and tried
noirqbalance (and did some more troubleshooting; more below).

First, the noirqbalance does help.  The box definitely does not
completely lock up.  But it would still sort of pause briefly.  I
guess you could say that it stutters.  However, I'm not completely
sure I managed to get it working because I never saw "XEN: Platform
quirk -- Disabling IRQ balancing/affinity" when I did `xm dmesg`.  So
then the curious question for me is then why it didn't completely hang
when I tried the same thing (multiple times).

Second, it turns out that I lied about the nousb thing.  The kernel I
had compiled builds USB as modules, so even if you pass it nousb,
the coldplug script on Gentoo loads all of the USB modules
regardless.

If I manually unload the USB modules, the lockup goes away completely
(it does not even stutter).  Since I don't need USB for my Xen project,
I've gone ahead and disabled it in the BIOS.
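
Because nousb did not keep the modules from loading, it is worth confirming what is actually loaded before each test run. A minimal sketch that scans /proc/modules-style text; the module list is an assumption (the two host-controller drivers seen in /proc/interrupts earlier in the thread, plus usbcore), and `loaded_usb_modules` is a hypothetical name.

```python
# uhci_hcd and ehci_hcd appear in the /proc/interrupts listing earlier in
# the thread; usbcore is the core USB module. Extend the tuple as needed.
USB_MODULES = ("uhci_hcd", "ehci_hcd", "usbcore")

def loaded_usb_modules(proc_modules_text, names=USB_MODULES):
    """Return the entries of `names` that appear in /proc/modules-style text."""
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines() if line.strip()}
    return [name for name in names if name in loaded]

# Illustrative input only: sizes and addresses are made up, not from the thread.
SAMPLE = """\
uhci_hcd 32016 0 - Live 0xd0890000
usbcore 112644 2 uhci_hcd, Live 0xd0850000
megaraid_mbox 33424 3 - Live 0xd0830000
"""

print(loaded_usb_modules(SAMPLE))
```

On a live system the input would be the contents of /proc/modules; an empty result means none of the listed USB modules is loaded.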

I'll be glad to provide additional information or try additional things
and give feedback.

Thanks,
Shane


* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-19 21:39 ` Shane Chen
@ 2005-07-20 11:02   ` Keir Fraser
  2005-07-20 17:24     ` David H
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2005-07-20 11:02 UTC (permalink / raw)
  To: Shane Chen; +Cc: xen-devel


On 19 Jul 2005, at 22:39, Shane Chen wrote:

> However, I'm not completely
> sure I managed to get it working because I never saw "XEN: Platform
> quirk -- Disabling IRQ balancing/affinity" when I did `xm dmesg`.  So
> then the curious question for me is then why it didn't completely hang
> when I tried the same thing (multiple times).

Only the unstable tree prints that message, and only when it
automatically detects a buggy chipset and applies the fix (not if you
manually specify noirqbalance).

  -- Keir


* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-20 11:02   ` Keir Fraser
@ 2005-07-20 17:24     ` David H
  2005-07-21 17:04       ` Shane Chen
  0 siblings, 1 reply; 8+ messages in thread
From: David H @ 2005-07-20 17:24 UTC (permalink / raw)
  To: xen-devel

I am seeing the same behavior on a similar system.  As previously
discussed, on these servers domain0 hangs under I/O load.  The results
of my testing with 2.0-testing and the latest unstable are as follows.

2.0-testing:
basic xen:          Hangs under load, never* comes back  *(waited 12 hours)
with noirqbalance:  Seems to fix the problem (network throughput ~5%
                    lower than with "nousb")
with nousb:         Reliably running for weeks

3.0-unstable:
basic xen:          Hangs under load
with noirqbalance:  Long delays under load but does not hang (peak
                    network throughput greater than 2x 2.0-testing, but
                    the delays lower the average throughput)
with nousb:         Reliably running for one day (peak network
                    throughput greater than 2.5x 2.0-testing, with
                    average throughput slightly greater than 2x)

I am testing by copying a 1.5GB file using scp.  This will reliably
and quickly hang both Xen versions without noirqbalance or nousb.  The
same file is being copied each time between the same two systems.  For
each test, a script copies the file 10 times or until the domain
hangs.  The systems are connected by a GigE switch.  The Xen versions
are from yesterday's tarballs.
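
The test loop described above could look roughly like the following. This is a sketch and not the script actually used: the timeout value, the function names and the example paths in the comments are all assumptions, and a timed-out or failed scp is simply treated as "the domain hung".

```python
import subprocess

def scp_once(src, dst, timeout_s=600):
    """Copy once with scp; a timeout, a non-zero exit or a missing scp counts as failure."""
    try:
        result = subprocess.run(["scp", "-q", src, dst], timeout=timeout_s)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False

def run_copy_test(copy_once, attempts=10):
    """Call copy_once up to `attempts` times; return how many copies succeeded
    before the first failure."""
    for done in range(attempts):
        if not copy_once():
            return done
    return attempts

# Dry run with a stand-in for scp: two good copies, then a simulated hang.
outcomes = iter([True, True, False])
print(run_copy_test(lambda: next(outcomes)))

# Real use would pass scp_once with site-specific (hypothetical) arguments:
#   run_copy_test(lambda: scp_once("bigfile.bin", "user@dom0:/tmp/"))
```

Keeping the copy step injectable makes it easy to swap scp for another load generator and compare the same ten-run count across the noirqbalance and nousb configurations.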

This system will be available for testing for the remainder of the
week.  Please let me know if there is anything I can do to help
resolve this problem.

I would also like to take a moment to thank everyone involved in this
project.  Xen is truly amazing and getting better all the time.

David



* Re: Xen / Dell 2850 PERC 4e/Di lock up
  2005-07-20 17:24     ` David H
@ 2005-07-21 17:04       ` Shane Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Shane Chen @ 2005-07-21 17:04 UTC (permalink / raw)
  To: xen-devel

On 7/20/05, David H <davidh.davidh@gmail.com> wrote:
> This system will be available for testing for the remainder of the
> week.  Please let me know if there is anything I can do to help
> resolve this problem.
> I would also like to take a moment to thank everyone involved in this
> project.  Xen is truly amazing and getting better all the time.

Ditto.

I am also willing to help in any capacity to try and squelch this. 
The system that I'm working on will also be available for about a week
or so.

Shane

