From: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table.c:583
Date: Mon, 14 Jun 2010 14:26:39 +0200
Message-ID: <4C161FFF.4050102@nets.rwth-aachen.de>
In-Reply-To: <alpine.DEB.2.00.1006141156170.3401@kaball-desktop>

Hi,

On 14.06.2010 12:57, Stefano Stabellini wrote:
> On Mon, 14 Jun 2010, Arnd Hannemann wrote:
>> Hi,
>>
>> we get regular but hard-to-reproduce kernel panics (see below; it takes a day or two of starting domUs to trigger one)
>> with the latest "xen/stable-2.6.32.x" git tree.
>>
>> Any idea, anyone?
>>
> 
> this CS from origin/xen/dom0/gntdev should fix your problem:
> 
> sstabellini@kaball-desktop:~/xensource/linux-pvops-latest$ git show ad469f0da31bc16b945f9a06710b9d45434d0091
> commit ad469f0da31bc16b945f9a06710b9d45434d0091
> Author: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
> Date:   Wed Jun 9 12:34:02 2010 -0700
> 
>     xen/gntdev: use spinlocks rather than rwsem for locking
>     
>     The mmu notifier mechanism calls its callbacks with an rcu lock,
>     which disables preemption.  This means we cannot use any blocking
>     synchronization for locking.
>     
>     Convert all the rwsemas to plain spinlocks.  This requires that
>     the memory allocation and copying to/from userspace be split
>     from the actual datastructure updates since they can't be done
>     under spinlock.
>     
>     Signed-off-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
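
For readers who do not know the pattern the quoted commit message describes, here is a minimal userspace sketch. It is not the gntdev code: `struct mapping`, `map_add` and `map_count` are hypothetical names, and a C11 `atomic_flag` stands in for a kernel spinlock. The only point it illustrates is that work which may block (allocation, copying) is done before the lock is taken, so that only the short list update runs under the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; not the structure used by gntdev. */
struct mapping {
    struct mapping *next;
    int index;
    char payload[64];
};

static struct mapping *map_list;                 /* shared list head */
static atomic_flag map_lock = ATOMIC_FLAG_INIT;  /* stand-in for a spinlock */

static void lock(void)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&map_lock, memory_order_acquire))
        ;
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&map_lock, memory_order_release);
}

/* Returns 0 on success, -1 if the allocation fails. */
int map_add(int index, const char *src, size_t len)
{
    assert(src != NULL);

    /* Step 1: allocation and copying happen outside the lock,
     * because they may take a long time (or sleep, in a kernel). */
    struct mapping *m = malloc(sizeof(*m));
    if (m == NULL)
        return -1;
    if (len > sizeof(m->payload))
        len = sizeof(m->payload);
    memset(m->payload, 0, sizeof(m->payload));
    memcpy(m->payload, src, len);
    m->index = index;

    /* Step 2: only the pointer update runs under the lock. */
    lock();
    m->next = map_list;
    map_list = m;
    unlock();
    return 0;
}

/* Counts the entries; the walk is short and never blocks. */
int map_count(void)
{
    int n = 0;
    lock();
    for (struct mapping *m = map_list; m != NULL; m = m->next)
        n++;
    unlock();
    return n;
}
```

In a kernel the same split applies to copy_from_user() and sleeping allocations, which must not run while a spinlock is held.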

Unfortunately, this patch does not seem to help. We get a very similar
backtrace after one hour of stress testing with a script that starts and
stops domUs in a loop.

Maybe the problem is the hypervisor itself?
We are currently running 4.0.1-rc2-pre. We updated from 4.0.0 because of what we believed was the same
problem, though we had no working netconsole back then.

Jun 14 14:07:22 vmhost2 [ 2418.542425] ------------[ cut here ]------------
Jun 14 14:07:22 vmhost2 [ 2418.542475] kernel BUG at drivers/xen/grant-table.c:583!
Jun 14 14:07:22 vmhost2 [ 2418.542515] invalid opcode: 0000 [#1] SMP
Jun 14 14:07:22 vmhost2 [ 2418.542574] last sysfs file: /sys/devices/virtual/net/br0/bridge/topology_change_detected
Jun 14 14:07:22 vmhost2 [ 2418.542640] Modules linked in: netconsole raid0 md_mod rtc_cmos rtc_core rtc_lib ipv6 thermal processor thermal_sys hwmon pl2303 button acpi_processor usbserial sr_mod evdev cdrom
Jun 14 14:07:22 vmhost2 [ 2418.542937]
Jun 14 14:07:22 vmhost2 [ 2418.542970] Pid: 0, comm: swapper Not tainted (2.6.32.15-xen4.0.0-dom0-stefano #2) System Product Name
Jun 14 14:07:22 vmhost2 [ 2418.543034] EIP: 0061:[<c120f170>] EFLAGS: 00010282 CPU: 0
Jun 14 14:07:22 vmhost2 [ 2418.543077] EIP is at gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.543117] EAX: ffffffea EBX: c153be84 ECX: 00000001 EDX: 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543158] ESI: 00007ff0 EDI: 00000013 EBP: c290e660 ESP: c153be50
Jun 14 14:07:22 vmhost2 [ 2418.543199]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Jun 14 14:07:22 vmhost2 [ 2418.543239] Process swapper (pid: 0, ti=c153a000 task=c1543760 task.ti=c153a000)
Jun 14 14:07:22 vmhost2 [ 2418.543297] Stack:
Jun 14 14:07:22 vmhost2 [ 2418.543329]  00000000 00213784 c2904dc0 0002c233 ec233000 ecf85bec 00000013 ec233000
Jun 14 14:07:22 vmhost2 [ 2418.543461] <0> 00000000 ebd6e000 00000000 00000013 c1350000 13784001 00000000 0002c233
Jun 14 14:07:22 vmhost2 [ 2418.543616] <0> 00000000 c1628284 c155b978 c1628284 00560014 c12200c1 00000001 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543797] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.543838]  [<c1350000>] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.543882]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.543925]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.543967]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.544009]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.544053]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544094]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544147]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.544190]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.544234]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.544275]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.544316]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.544359]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.544401]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.544444]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
Jun 14 14:07:22 vmhost2 [ 2418.544483] Code: 8d 5c 24 34 c1 e0 0c 83 c8 01 89 44 24 34 8b 44 24 0c c7 44 24 40 00 00 00 00 89 44 24 3c e8 b8 1e df ff 85 c0 0f 84 2c ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 8b 54 24 04 8b 44 24 0c e8
Jun 14 14:07:22 vmhost2 [ 2418.545277] EIP: [<c120f170>] gnttab_copy_grant_page+0x1f0/0x260 SS:ESP 0069:c153be50
Jun 14 14:07:22 vmhost2 [ 2418.545597] ---[ end trace f877a40240218318 ]---
Jun 14 14:07:22 vmhost2 [ 2418.545669] Kernel panic - not syncing: Fatal exception in interrupt
Jun 14 14:07:22 vmhost2 [ 2418.545746] Pid: 0, comm: swapper Tainted: G      D    2.6.32.15-xen4.0.0-dom0-stefano #2
Jun 14 14:07:22 vmhost2 [ 2418.545840] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.545912]  [<c141d3b5>] ? panic+0x42/0xe1
Jun 14 14:07:22 vmhost2 [ 2418.545986]  [<c100cc56>] ? oops_end+0x96/0xa0
Jun 14 14:07:22 vmhost2 [ 2418.546060]  [<c100a73f>] ? do_invalid_op+0x7f/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546135]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546223]  [<c10741e4>] ? __alloc_pages_nodemask+0xe4/0x5b0
Jun 14 14:07:22 vmhost2 [ 2418.546303]  [<c1006197>] ? xen_force_evtchn_callback+0x17/0x30
Jun 14 14:07:22 vmhost2 [ 2418.546380]  [<c1006a98>] ? check_events+0x8/0xc
Jun 14 14:07:22 vmhost2 [ 2418.546455]  [<c141faa6>] ? error_code+0x66/0x6c
Jun 14 14:07:22 vmhost2 [ 2418.546530]  [<c100a6c0>] ? do_invalid_op+0x0/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546606]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546687]  [<c1350000>] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.546763]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.546839]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.546915]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.546993]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.547070]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547145]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547222]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.547299]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.547385]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.547463]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.547537]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.547615]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.547690]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.547766]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
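
As a reading aid for the trace above, the sketch below decodes two of its fields. EAX holds ffffffea, which is -22 when read as a signed 32-bit value, and 22 is EINVAL on Linux. The byte marked <0f> in the Code: line, followed by 0b, is the x86 ud2 instruction, which is what BUG() executes and what produces "invalid opcode". Whether EAX is the status value checked at grant-table.c:583 is an assumption, since the source line is not shown in this trace. The helper names `reg_as_signed` and `faulting_insn_is_ud2` are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reinterpret a 32-bit register value, given as hex text from an oops,
 * as a signed integer. memcpy avoids relying on an out-of-range cast. */
int32_t reg_as_signed(const char *hex)
{
    uint32_t raw = (uint32_t)strtoul(hex, NULL, 16);
    int32_t value;
    memcpy(&value, &raw, sizeof(value));
    return value;
}

/* Look for the byte marked <xx> in a "Code:" line and check whether it
 * and the next byte are 0f 0b (ud2).
 * Returns 1 if ud2, 0 if another instruction, -1 if no marker is found. */
int faulting_insn_is_ud2(const char *code)
{
    const char *mark = strchr(code, '<');
    unsigned int first, second;

    if (mark == NULL)
        return -1;
    if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
        return -1;
    return first == 0x0f && second == 0x0b;
}
```

Calling `reg_as_signed("ffffffea")` yields -22, and passing the tail of the Code: line (`"ff ff ff <0f> 0b eb fe"`) to `faulting_insn_is_ud2` yields 1.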

Best regards,
Arnd
