From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Gael Le Mignot <gael@pilotsystems.net>
Cc: linux-kernel@vger.kernel.org, Ian Campbell <Ian.Campbell@citrix.com>
Subject: Re: Kernel bug on Xen DOM0 with SAS, swraid, lvm
Date: Mon, 01 Nov 2010 16:15:34 -0400
Message-ID: <4CCF1FE6.3090804@goop.org>
In-Reply-To: <plop87eib6msf5.fsf@aoskar.kilobug.org>

 On 10/31/2010 08:57 AM, Gael Le Mignot wrote:
> Hello,
>
> We are using the following setup:
>
> - A  Xen DOM0  with  several  Xen DOMUs,  all  running Debian  GNU/Linux
>   stable, with kernel 2.6.26-2-xen-amd64 ;

Which dom0 kernel are you using?  If it's a Debian kernel then you should
probably get in touch with them to report the problem, and/or xen-devel.

    J


> - Disks are either SATA or SAS disks (2 SAS disks for performance and 4
>   SATA disks for bulk storage), all paired in software RAID1, and we use
>   LVM on top of that.
>
> We usually have no problem with this setup, which has been running for
> almost two years, except today we had a kernel BUG (I'll include hardware
> details and syslog trace afterwards).
>
> After this kernel BUG, most operations were still working, but several
> disk-related ones like "cat /proc/mdstat" and "lvs" were hanging
> indefinitely, and starting a Xen VM would fail with a timeout. Doing a
> reboot fixed everything.
>
> Since  it  included  a  "BUG:  unable  to  handle  kernel  NULL  pointer
> dereference at 0000000000000000", I guess it  is a kernel bug, not a bug
> in Xen or a hardware problem. I know the kernel is a bit old, but we use
> stable Debian on production servers.
>
> If the kernel is too old for the bug report to be useful, feel free to
> ignore it, but when in doubt, I prefer to submit it. I can provide
> additional details if needed.
>
> -------
> Here is the syslog output for the problem:
> -------
>
> Oct 31 08:09:13 thelma kernel: [8150666.955924] mptscsih: ioc2: attempting task abort! (sc=ffff880039a546c0)
> Oct 31 08:09:13 thelma kernel: [8150666.955967] sd 2:0:5:0: [sdi] CDB: Read(10): 28 00 2d b4 bc af 00 00 08 00
> Oct 31 08:09:16 thelma kernel: [8150669.309069] mptbase: ioc2: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
> Oct 31 08:09:16 thelma kernel: [8150669.517393] mptscsih: ioc2: task abort: SUCCESS (sc=ffff880039a546c0)
> Oct 31 08:09:23 thelma kernel: [8150676.065107] mptbase: ioc2: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000)
> Oct 31 08:09:26 thelma kernel: [8150679.516012] mptscsih: ioc2: attempting task abort! (sc=ffff880039a546c0)
> Oct 31 08:09:26 thelma kernel: [8150679.516053] sd 2:0:5:0: [sdi] CDB: Test Unit Ready: 00 00 00 00 00 00
> Oct 31 08:09:27 thelma kernel: [8150680.558931] mptbase: ioc2: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000)
> Oct 31 08:09:30 thelma kernel: [8150683.809175] mptsas: ioc2: removing sata device, channel 0, id 7, phy 6
> Oct 31 08:09:30 thelma kernel: [8150683.809216]  port-2:5: mptsas: ioc2: delete port (5)
> Oct 31 08:09:30 thelma kernel: [8150683.867264] mptscsih: ioc2: task abort: SUCCESS (sc=ffff880039a546c0)
> Oct 31 08:09:30 thelma kernel: [8150683.867302] mptscsih: ioc2: attempting task abort! (sc=ffff88003a8eae80)
> Oct 31 08:09:30 thelma kernel: [8150683.867335] scsi 2:0:5:0: [sdi] CDB: Read(10): 28 00 3a 87 82 c7 00 00 08 00
> Oct 31 08:09:30 thelma kernel: [8150683.867434] mptscsih: ioc2: task abort: SUCCESS (sc=ffff88003a8eae80)
> Oct 31 08:09:30 thelma kernel: [8150683.867472] mptscsih: ioc2: attempting bus reset! (sc=ffff880039a546c0)
> Oct 31 08:09:30 thelma kernel: [8150683.867504] scsi 2:0:5:0: [sdi] CDB: Read(10): 28 00 2d b4 bc af 00 00 08 00
> Oct 31 08:09:30 thelma kernel: [8150683.867615] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> Oct 31 08:09:30 thelma kernel: [8150683.867672] IP: [<ffffffffa006459a>] :mptscsih:mptscsih_bus_reset+0x95/0x104
> Oct 31 08:09:30 thelma kernel: [8150683.867733] PGD 3fddd067 PUD 3dd51067 PMD 0
> Oct 31 08:09:30 thelma kernel: [8150683.867771] Oops: 0000 [1] SMP
> Oct 31 08:09:30 thelma kernel: [8150683.867804] CPU 0
> Oct 31 08:09:30 thelma kernel: [8150683.867831] Modules linked in: tcp_diag inet_diag xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ipv6 8021q bridge raid1 \
> md_mod loop parport_pc parport psmouse pcspkr serio_raw i2c_i801 i2c_core rng_core i5000_edac edac_core shpchp pci_hotplug container button joydev evdev ext3 jbd mbc\
> ache dm_mirror dm_log dm_snapshot dm_mod ide_disk ata_generic libata dock usbhid hid ff_memless sd_mod piix floppy ide_pci_generic ide_core ehci_hcd uhci_hcd e1000e \
> mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
> Oct 31 08:09:30 thelma kernel: [8150683.868277] Pid: 860, comm: scsi_eh_2 Not tainted 2.6.26-2-xen-amd64 #1
> Oct 31 08:09:30 thelma kernel: [8150683.868310] RIP: e030:[<ffffffffa006459a>]  [<ffffffffa006459a>] :mptscsih:mptscsih_bus_reset+0x95/0x104
> Oct 31 08:09:30 thelma kernel: [8150683.868370] RSP: e02b:ffff88003d89fe00  EFLAGS: 00010246
> Oct 31 08:09:30 thelma kernel: [8150683.868400] RAX: ffff880032ac8002 RBX: ffff880039a546c0 RCX: 000000000000000a
> Oct 31 08:09:30 thelma kernel: [8150683.868449] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff805aaab0
> Oct 31 08:09:30 thelma kernel: [8150683.868497] RBP: ffff88003d85f5e8 R08: 0000000000000001 R09: 00000000ffffff20
> Oct 31 08:09:30 thelma kernel: [8150683.868546] R10: 0000000000000000 R11: 0000010b7494482f R12: ffff88003dfe7008
> Oct 31 08:09:30 thelma kernel: [8150683.868595] R13: ffff88003dfe7000 R14: 0000000000000071 R15: ffff88003d85f000
> Oct 31 08:09:30 thelma kernel: [8150683.868646] FS:  00007fd31f1d8770(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
> Oct 31 08:09:30 thelma kernel: [8150683.868696] CS:  e033 DS: 0000 ES: 0000
> Oct 31 08:09:30 thelma kernel: [8150683.868724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Oct 31 08:09:30 thelma kernel: [8150683.868773] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Oct 31 08:09:30 thelma kernel: [8150683.868821] Process scsi_eh_2 (pid: 860, threadinfo ffff88003d89e000, task ffff88003fe62040)
> Oct 31 08:09:30 thelma kernel: [8150683.870876] Stack:  00003a3a00000000 ffff880039a546c0 ffff880039a546c0 0000000000002003
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  ffff88003d89fee0 ffffffffa002027f ffff880039a546c0 ffff88003d89fec8
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  0000000000000000 ffffffffa0020f65 ffff88003d89fed0 ffff88003d89fec8
> Oct 31 08:09:30 thelma kernel: [8150683.870876] Call Trace:
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffffa002027f>] ? :scsi_mod:scsi_try_bus_reset+0x4c/0xb4
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffffa0020f65>] ? :scsi_mod:scsi_eh_ready_devs+0x3ce/0x5ac
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffffa0021870>] ? :scsi_mod:scsi_error_handler+0x312/0x4b7
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff80221555>] ? __wake_up_common+0x41/0x74
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffffa002155e>] ? :scsi_mod:scsi_error_handler+0x0/0x4b7
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff8023f527>] ? kthread+0x47/0x74
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff802282ec>] ? schedule_tail+0x27/0x5c
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff8020be28>] ? child_rip+0xa/0x12
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff8023f4e0>] ? kthread+0x0/0x74
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  [<ffffffff8020be1e>] ? child_rip+0x0/0x12
> Oct 31 08:09:30 thelma kernel: [8150683.870876]
> Oct 31 08:09:30 thelma kernel: [8150683.870876]
> Oct 31 08:09:30 thelma kernel: [8150683.870876] Code: 00 00 00 48 8b 03 b9 28 00 00 00 48 8b 90 88 00 00 00 41 8a 85 98 00 00 00 84 c0 74 0e 31 c9 3c 02 0f 94 c1 8d \
> 0c cd 02 00 00 00 <48> 8b 02 45 31 c9 45 31 c0 be 04 00 00 00 48 89 ef 0f b6 50 0b
> Oct 31 08:09:30 thelma kernel: [8150683.870876] RIP  [<ffffffffa006459a>] :mptscsih:mptscsih_bus_reset+0x95/0x104
> Oct 31 08:09:30 thelma kernel: [8150683.870876]  RSP <ffff88003d89fe00>
> Oct 31 08:09:30 thelma kernel: [8150683.870876] CR2: 0000000000000000
> Oct 31 08:09:30 thelma kernel: [8150683.876328] ---[ end trace 2f918c612367ed7e ]---
>
>
> -----
> Hardware information
> -----
>
> Hardware is a dual quad-core Xeon, with 64 GB of ECC RAM (no ECC errors in
> the IPMI log), with LSI SAS controllers.
>
> Here is the lspci output:
>
> 00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
> 00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
> 00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
> 00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1)
> 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
> 00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
> 00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
> 00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
> 00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
> 00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
> 00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
> 00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
> 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> 00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
> 00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
> 00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
> 00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
> 00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
> 00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
> 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
> 01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
> 01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
> 02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
> 02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
> 03:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
> 03:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
> 04:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
> 05:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
> 07:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
> 0b:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
>
> Regards,
>

