From: Arnd Hannemann
Subject: Re: xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table.c:583
Date: Mon, 14 Jun 2010 14:44:36 +0200
Message-ID: <4C162434.5050708@nets.rwth-aachen.de>
References: <4C15E000.7060509@nets.rwth-aachen.de> <4C161FFF.4050102@nets.rwth-aachen.de>
In-reply-to: <4C161FFF.4050102@nets.rwth-aachen.de>
To: Stefano Stabellini
Cc: "xen-devel@lists.xensource.com"

On 14.06.2010 14:26, Arnd Hannemann wrote:
> Hi,
>
> On 14.06.2010 12:57, Stefano Stabellini wrote:
>> On Mon, 14 Jun 2010, Arnd Hannemann wrote:
>>> Hi,
>>>
>>> we have regular but hard to reproduce (wait for a day or two starting domUs) kernel panics (see below) with latest
>>> "xen/stable-2.6.32.x" git tree.
>>>
>>> Any idea, anyone?
>>>
>>
>> this CS from origin/xen/dom0/gntdev should fix your problem:
>>
>> sstabellini@kaball-desktop:~/xensource/linux-pvops-latest$ git show ad469f0da31bc16b945f9a06710b9d45434d0091
>> commit ad469f0da31bc16b945f9a06710b9d45434d0091
>> Author: Stefano Stabellini
>> Date:   Wed Jun 9 12:34:02 2010 -0700
>>
>>     xen/gntdev: use spinlocks rather than rwsem for locking
>>
>>     The mmu notifier mechanism calls its callbacks with an rcu lock,
>>     which disables preemption. This means we cannot use any blocking
>>     synchronization for locking.
>>
>>     Convert all the rwsemas to plain spinlocks. This requires that
>>     the memory allocation and copying to/from userspace be split
>>     from the actual datastructure updates since they can't be done
>>     under spinlock.
>>
>>     Signed-off-by: Stefano Stabellini
>>     Signed-off-by: Jeremy Fitzhardinge
>>
>
> Unfortunately, this patch does not seem to help.
> We get a very similar backtrace after one hour of stress testing with a script
> starting and stopping domUs in a loop.
>
> Maybe the problem is the hypervisor itself?
> We are currently using 4.0.1-rc2-pre (we updated from 4.0.0 because of what we believed was the same
> problem; we had no working netconsole back then, though).

FYI: I got lucky and reproduced the error within only 15 minutes. Hypervisor version:

(XEN) Xen version 4.0.1-rc3-pre (samsel@umic-mesh.net) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) Mon Jun 14 12:43:49 CEST 2010
(XEN) Latest ChangeSet: Fri Jun 11 14:04:36 2010 +0100 21203:3903d95733f7

Traceback below:

Jun 14 14:38:14 vmhost2 [ 201.636188] ------------[ cut here ]------------
Jun 14 14:38:14 vmhost2 [ 201.636272] kernel BUG at drivers/xen/grant-table.c:583!
Jun 14 14:38:14 vmhost2 [ 201.636345] invalid opcode: 0000 [#1] SMP
Jun 14 14:38:14 vmhost2 [ 201.636503] last sysfs file: /sys/devices/virtual/net/br0/bridge/topology_change_detected
Jun 14 14:38:14 vmhost2 [ 201.636596] Modules linked in: netconsole raid0 md_mod rtc_cmos rtc_core rtc_lib thermal processor ipv6 thermal_sys hwmon button acpi_processor sr_mod pl2303 cdrom usbserial evdev
Jun 14 14:38:14 vmhost2 [ 201.637553]
Jun 14 14:38:14 vmhost2 [ 201.637619] Pid: 0, comm: swapper Not tainted (2.6.32.15-xen4.0.0-dom0-stefano #2) System Product Name
Jun 14 14:38:14 vmhost2 [ 201.637715] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
Jun 14 14:38:14 vmhost2 [ 201.637792] EIP is at gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [ 201.637864] EAX: ffffffea EBX: c153be84 ECX: 00000001 EDX: 00000000
Jun 14 14:38:14 vmhost2 [ 201.637937] ESI: 00007ff0 EDI: 0000000f EBP: c290d120 ESP: c153be50
Jun 14 14:38:14 vmhost2 [ 201.638022] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Jun 14 14:38:14 vmhost2 [ 201.638096] Process swapper (pid: 0, ti=c153a000 task=c1543760 task.ti=c153a000)
Jun 14 14:38:14 vmhost2 [ 201.638187] Stack:
Jun 14 14:38:14 vmhost2 [ 201.638251] 00000000 00213e1c c28f20c0 0002c189 ec189000 ecd95944 0000000f ec189000
Jun 14 14:38:14 vmhost2 [ 201.638634] <0> 00000000 eb406000 00000000 0000000f ece40000 13e1c001 00000000 0002c189
Jun 14 14:38:14 vmhost2 [ 201.639115] <0> 00000000 c1627a8c c16277c8 c1627a8c 000068c4 c12200c1 00000000 ebce8000
Jun 14 14:38:14 vmhost2 [ 201.639655] Call Trace:
Jun 14 14:38:14 vmhost2 [ 201.639729] [] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:38:14 vmhost2 [ 201.639805] [] ? process_backlog+0x90/0xa0
Jun 14 14:38:14 vmhost2 [ 201.639882] [] ? tasklet_action+0x9e/0xb0
Jun 14 14:38:14 vmhost2 [ 201.639956] [] ? __do_softirq+0x88/0x110
Jun 14 14:38:14 vmhost2 [ 201.640032] [] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:38:14 vmhost2 [ 201.640108] [] ? do_softirq+0x3d/0x40
Jun 14 14:38:14 vmhost2 [ 201.640184] [] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:38:14 vmhost2 [ 201.640261] [] ? xen_do_upcall+0x7/0xc
Jun 14 14:38:14 vmhost2 [ 201.640336] [] ? hypercall_page+0x3a7/0x1010
Jun 14 14:38:14 vmhost2 [ 201.640411] [] ? xen_safe_halt+0xf/0x20
Jun 14 14:38:14 vmhost2 [ 201.640486] [] ? xen_idle+0x1c/0x30
Jun 14 14:38:14 vmhost2 [ 201.640560] [] ? cpu_idle+0x3a/0x60
Jun 14 14:38:14 vmhost2 [ 201.640635] [] ? start_kernel+0x2c6/0x2cb
Jun 14 14:38:14 vmhost2 [ 201.640710] [] ? unknown_bootoption+0x0/0x190
Jun 14 14:38:14 vmhost2 [ 201.640786] [] ? xen_start_kernel+0x624/0x62c
Jun 14 14:38:14 vmhost2 [ 201.640857] Code: 8d 5c 24 34 c1 e0 0c 83 c8 01 89 44 24 34 8b 44 24 0c c7 44 24 40 00 00 00 00 89 44 24 3c e8 b8 1e df ff 85 c0 0f 84 2c ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 8b 54 24 04 8b 44 24 0c e8
Jun 14 14:38:14 vmhost2 [ 201.643843] EIP: [] gnttab_copy_grant_page+0x1f0/0x260 SS:ESP 0069:c153be50
Jun 14 14:38:14 vmhost2 [ 201.644028] ---[ end trace af6399fb7ba91a18 ]---
Jun 14 14:38:14 vmhost2 [ 201.644098] Kernel panic - not syncing: Fatal exception in interrupt
Jun 14 14:38:14 vmhost2 [ 201.644173] Pid: 0, comm: swapper Tainted: G D 2.6.32.15-xen4.0.0-dom0-stefano #2
Jun 14 14:38:14 vmhost2 [ 201.644265] Call Trace:
Jun 14 14:38:14 vmhost2 [ 201.644336] [] ? panic+0x42/0xe1
Jun 14 14:38:14 vmhost2 [ 201.644408] [] ? oops_end+0x96/0xa0
Jun 14 14:38:14 vmhost2 [ 201.644481] [] ? do_invalid_op+0x7f/0x90
Jun 14 14:38:14 vmhost2 [ 201.644555] [] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [ 201.644632] [] ? br_nf_pre_routing_finish+0x0/0x310
Jun 14 14:38:14 vmhost2 [ 201.644709] [] ? nf_hook_slow+0x62/0xe0
Jun 14 14:38:14 vmhost2 [ 201.644784] [] ? __alloc_pages_nodemask+0xe4/0x5b0
Jun 14 14:38:14 vmhost2 [ 201.644860] [] ? handle_IRQ_event+0x5d/0xc0
Jun 14 14:38:14 vmhost2 [ 201.644935] [] ? error_code+0x66/0x6c
Jun 14 14:38:14 vmhost2 [ 201.645009] [] ? dev_graft_qdisc+0x5b/0x70
Jun 14 14:38:14 vmhost2 [ 201.645083] [] ? do_invalid_op+0x0/0x90
Jun 14 14:38:14 vmhost2 [ 201.645157] [] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [ 201.645234] [] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:38:14 vmhost2 [ 201.645308] [] ? process_backlog+0x90/0xa0
Jun 14 14:38:14 vmhost2 [ 201.645382] [] ? tasklet_action+0x9e/0xb0
Jun 14 14:38:14 vmhost2 [ 201.645455] [] ? __do_softirq+0x88/0x110
Jun 14 14:38:14 vmhost2 [ 201.645529] [] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:38:14 vmhost2 [ 201.645604] [] ? do_softirq+0x3d/0x40
Jun 14 14:38:14 vmhost2 [ 201.645677] [] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:38:14 vmhost2 [ 201.645754] [] ? xen_do_upcall+0x7/0xc
Jun 14 14:38:14 vmhost2 [ 201.645830] [] ? hypercall_page+0x3a7/0x1010
Jun 14 14:38:14 vmhost2 [ 201.645904] [] ? xen_safe_halt+0xf/0x20
Jun 14 14:38:14 vmhost2 [ 201.645989] [] ? xen_idle+0x1c/0x30
Jun 14 14:38:14 vmhost2 [ 201.646063] [] ? cpu_idle+0x3a/0x60
Jun 14 14:38:14 vmhost2 [ 201.646139] [] ? start_kernel+0x2c6/0x2cb
Jun 14 14:38:14 vmhost2 [ 201.646213] [] ? unknown_bootoption+0x0/0x190
Jun 14 14:38:14 vmhost2 [ 201.646288] [] ? xen_start_kernel+0x624/0x62c