From mboxrd@z Thu Jan 1 00:00:00 1970
From: Teck Choon Giam
Subject: Re: BUG at xen4.1/kernel 2.6.32.35 at a CentOS 5.5 when starting a VM
Date: Wed, 13 Apr 2011 00:41:24 +0800
Message-ID:
References: <4D933ADB.8060106@alog.com.br> <4D9F247C.6030801@alog.com.br> <4D9F805E.2020806@alog.com.br> <20110412105946.GA23127@dumpdata.com> <4DA43548.1090104@alog.com.br>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To: <4DA43548.1090104@alog.com.br>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Gerd Jakobovitsch
Cc: xen-devel@lists.xensource.com, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On Tue, Apr 12, 2011 at 7:19 PM, Gerd Jakobovitsch wrote:
> Yes. It is the same scenario from previous bug.
>
>
> On 04/12/2011 07:59 AM, Konrad Rzeszutek Wilk wrote:
>>
>> On Fri, Apr 08, 2011 at 06:38:38PM -0300, Gerd Jakobovitsch wrote:
>>>
>>> One more follow-up:
>>>
>>> Another kernel bug report, with no kernel debug activated:
>>
>> This is just with the guest starting right? Not running for a long time?

Since you are running NFS-related storage, note that the latest
xen/stable-2.6.32 commit has an NFS-related fix backported:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a
Are you able to run your test with it?

>> >>> r2b16ch2x28p2 kernel: [ 3243.777796] CR2: 00000000000002f0 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.761622] BUG: unable to >>> handle kernel NULL pointer dereference at 00000000000002f0 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.761781] IP: >>> [] blktap_device_end_request+0x4e/0x6c >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.761892] PGD 710a3067 >>> PUD 724c6067 PMD 0 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.762076] Oops: 0000 [#1] = SMP >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.762212] last sysfs >>> file: /sys/devices/vbd-6-51712/statistics/wr_sect >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.762271] CPU 5 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.762363] Modules linked >>> in: bnx2 xt_mac bridge stp nfs fscache nfs_acl auth_rpcgss >>> arptable_filter arp_tables xt_esp ipt_ah xt_physdev xt_multiport >>> lockd sunrpc bonding dm_multipath megaraid_sas [last unloaded: bnx2] >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763350] Pid: 7781, >>> comm: tapdisk2 Not tainted 2.6.32.36 #5 PowerEdge M610 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763410] RIP: >>> e030:[] =A0[] >>> blktap_device_end_request+0x4e/0x6c >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763519] RSP: >>> e02b:ffff88006ed49cf8 =A0EFLAGS: 00010046 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763574] RAX: >>> 0000000000000000 RBX: ffff88005e6fc3e0 RCX: ffffffff811a3de6 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763635] RDX: >>> ffff88005e6fc3e0 RSI: ffffffff8149bed6 RDI: ffff880070f42178 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763694] RBP: >>> ffff880070f42010 R08: ffffffff81661840 R09: 00000001002c9435 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.763971] R10: >>> 0000000000000000 R11: ffff88005e6378f0 R12: ffff88005e6378f0 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.764244] R13: >>> ffff880070f42000 R14: 0000000000000000 R15: 0000000000000001 >>> Apr =A08 18:41:58 r2b16ch2x28p2 
kernel: [ 3243.764522] FS: >>> 00007fcfd34b0730(0000) GS:ffff880015fe7000(0000) >>> knlGS:0000000000000000 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.765011] CS: =A0e033 DS: >>> 0000 ES: 0000 CR0: 000000008005003b >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.765280] CR2: >>> 00000000000002f0 CR3: 000000006ed46000 CR4: 0000000000002660 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.765554] DR0: >>> 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.765829] DR3: >>> 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.766105] Process >>> tapdisk2 (pid: 7781, threadinfo ffff88006ed48000, task >>> ffff880079486270) >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.766602] Stack: >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.766868] >>> ffff88007a81dee0 ffff880079486270 0000000000000000 0000000000000001 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.767266]<0> >>> 0000000000000001 ffffffff8121e0b9 0000000000000003 0000000000000000 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.767967]<0> >>> ffff880070f42000 0000bda50000bda7 ffff88005e6fc3e0 fffffffd00000000 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.768921] Call Trace: >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.769193] >>> [] ? blktap_ring_ioctl+0x159/0x290 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.769474] >>> [] ? error_exit+0x2a/0x60 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.769750] >>> [] ? retint_restore_args+0x5/0x6 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.770030] >>> [] ? hypercall_page+0x3aa/0x1001 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.770308] >>> [] ? hypercall_page+0x3aa/0x1001 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.770592] >>> [] ? selinux_file_ioctl+0x0/0x3d >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.770874] >>> [] ? 
xen_force_evtchn_callback+0x9/0xa >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.771157] >>> [] ? check_events+0x12/0x20 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.771435] >>> [] ? selinux_file_ioctl+0x0/0x3d >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.771715] >>> [] ? xen_restore_fl_direct_end+0x0/0x1 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.771993] >>> [] ? xen_spin_lock_slow+0xb7/0xf8 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.772275] >>> [] ? vfs_ioctl+0x55/0x6b >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.772554] >>> [] ? do_vfs_ioctl+0x492/0x4e5 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.772834] >>> [] ? xen_restore_fl_direct_end+0x0/0x1 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.773116] >>> [] ? sys_ioctl+0x51/0x70 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.773392] >>> [] ? system_call_fastpath+0x16/0x1b >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.773671] Code: e8 39 f6 >>> ff ff 49 8b 44 24 40 48 8b b8 f0 02 00 00 e8 e6 d6 27 00 4c 89 e7 41 >>> 8b 54 24 60 44 89 f6 e8 b9 59 f8 ff 49 8b 44 24 40<48> =A08b b8 f0 02 >>> 00 00 e8 02 14 df ff 66 90 ff 14 25 78 8d 65 81 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.777198] RIP >>> [] blktap_device_end_request+0x4e/0x6c >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.777524] >>> =A0RSP >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.777796] CR2: >>> 00000000000002f0 >>> Apr =A08 18:41:58 r2b16ch2x28p2 kernel: [ 3243.778067] ---[ end trace >>> a71b80c14de09da1 ]--- >>> >>> On 04/08/2011 12:44 PM, Teck Choon Giam wrote: >>>> >>>> On Fri, Apr 8, 2011 at 11:06 PM, Gerd Jakobovitsch >>>> > =A0wrote: >>>> >>>> =A0 =A0On 03/30/2011 11:44 PM, Teck Choon Giam wrote: >>>>> >>>>> =A0 =A0On Wed, Mar 30, 2011 at 10:14 PM, Gerd >>>>> Jakobovitsch =A0 =A0 wrot= e: >>>>>> >>>>>> =A0 =A0Hello all, >>>>>> >>>>>> =A0 =A0I used to run xen4.0 kernel 2.6.32.24 over CentOS 5.5, with a >>>>>> relative success, but the bug at mmu.c appeared once at a 
while. The= refore, >>>>>> I'm looking for a more stable option. >>>>>> =A0 =A0I compiled and ran the newly released xen 4.1, with kernel PV= OPS >>>>>> 2.6.32.35 over CentOS 5.5. When trying to start a VM, the following = bugs >>>>>> appeared at dmesg. After that, xl and xm commands do not longer resp= ond: >>>>>> >>>>>> =A0 =A0[ =A0145.749573] =A0 alloc irq_desc for 2209 on node -1 >>>>>> =A0 =A0[ =A0145.749581] =A0 alloc kstat_irqs on node -1 >>>>>> =A0 =A0[ =A0145.883515] block tda: sector-size: 512 capacity: 262144 >>>>>> =A0 =A0[ =A0145.889952] general protection fault: 0000 [#1] SMP >>>>>> =A0 =A0[ =A0145.890109] last sysfs file: /sys/block/tda/removable >>>>>> =A0 =A0[ =A0145.890164] CPU 7 >>>>>> =A0 =A0[ =A0145.890252] Modules linked in: bridge stp nfs fscache nf= s_acl >>>>>> auth_rpcgss arptable_filter arp_tables xt_esp ipt_ah xt_physdev xt_m= ultiport >>>>>> lockd sunrpc bonding dm_multipath bnx2 megaraid_sas >>>>>> =A0 =A0[ =A0145.891125] Pid: 5179, comm: tapdisk2 Not tainted 2.6.32= .35 #1 >>>>>> PowerEdge M610 >>>>>> =A0 =A0[ =A0145.891184] RIP: e030:[] =A0[] >>>>>> blktap_device_end_request+0x4e/0x63 >>>>>> =A0 =A0[ =A0145.891296] RSP: e02b:ffff880064061cd8 =A0EFLAGS: 000100= 46 >>>>>> =A0 =A0[ =A0145.891351] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007d264690 = RCX: >>>>>> 0000000000000028 >>>>>> =A0 =A0[ =A0145.891410] RDX: 0000000000000000 RSI: 0000000000000000 = RDI: >>>>>> 0000000000000000 >>>>>> =A0 =A0[ =A0145.891469] RBP: ffff880064061cf8 R08: 0000000064061c98 = R09: >>>>>> ffff88007da42948 >>>>>> =A0 =A0[ =A0145.891528] R10: ffffea0000000008 R11: 0000000001f60080 = R12: >>>>>> ffff88007da427f8 >>>>>> =A0 =A0[ =A0145.891587] R13: ffff88007c75f398 R14: 0000000000000000 = R15: >>>>>> ffff88007c75f3a8 >>>>>> =A0 =A0[ =A0145.891651] FS: =A000007ff33d9a4730(0000) >>>>>> GS:ffff8800189e5000(0000) knlGS:0000000000000000 >>>>>> =A0 =A0[ =A0145.891714] CS: =A0e033 DS: 0000 ES: 0000 CR0: 000000008= 005003b >>>>>> =A0 =A0[ =A0145.891771] CR2: 0000000002594cc8 
CR3: 000000007be61000 = CR4: >>>>>> 0000000000002660 >>>>>> =A0 =A0[ =A0145.891830] DR0: 0000000000000000 DR1: 0000000000000000 = DR2: >>>>>> 0000000000000000 >>>>>> =A0 =A0[ =A0145.891890] DR3: 0000000000000000 DR6: 00000000ffff0ff0 = DR7: >>>>>> 0000000000000400 >>>>>> =A0 =A0[ =A0145.892171] Process tapdisk2 (pid: 5179, threadinfo >>>>>> ffff880064060000, task ffff88007c272d60) >>>>>> =A0 =A0[ =A0145.892669] Stack: >>>>>> =A0 =A0[ =A0145.892934] =A0ffff88007c272d60 0000000000000000 0000000= 000000000 >>>>>> 0000000000000000 >>>>>> =A0 =A0[ =A0145.893334]<0> =A0 ffff880064061e88 ffffffff812815ae >>>>>> ffff880064061e58 ffffffff811d234f >>>>>> =A0 =A0[ =A0145.894035]<0> =A0 ffff88007e9bbfc0 ffff88007c75f398 >>>>>> 00000001ffffffff 0000000000000000 >>>>>> =A0 =A0[ =A0145.895015] Call Trace: >>>>>> =A0 =A0[ =A0145.895286] =A0[] blktap_ring_ioctl+0x= 183/0x2d8 >>>>>> =A0 =A0[ =A0145.895566] =A0[] ? inode_has_perm+0x7= 7/0x89 >>>>>> =A0 =A0[ =A0145.895844] =A0[] ? inode_has_perm+0x7= 7/0x89 >>>>>> =A0 =A0[ =A0145.896124] =A0[] ? _raw_spin_lock+0x7= 7/0x12f >>>>>> =A0 =A0[ =A0145.896403] =A0[] ? _raw_spin_unlock+0= xab/0xb2 >>>>>> =A0 =A0[ =A0145.896682] =A0[] ? _spin_unlock+0x9/0= xb >>>>>> =A0 =A0[ =A0145.896958] =A0[] ? _raw_spin_lock+0x7= 7/0x12f >>>>>> =A0 =A0[ =A0145.897234] =A0[] ? 
file_has_perm+0xb4= /0xc6 >>>>>> =A0 =A0[ =A0145.897513] =A0[] vfs_ioctl+0x5e/0x77 >>>>>> =A0 =A0[ =A0145.897786] =A0[] do_vfs_ioctl+0x484/0= x4d5 >>>>>> =A0 =A0[ =A0145.898060] =A0[] sys_ioctl+0x57/0x7a >>>>>> =A0 =A0[ =A0145.898338] =A0[] system_call_fastpath= +0x16/0x1b >>>>>> =A0 =A0[ =A0145.898614] Code: e8 5f f4 ff ff 49 8b 44 24 40 48 8b b8= 80 03 >>>>>> 00 00 e8 64 75 2a 00 41 8b 54 24 60 44 89 f6 4c 89 e7 e8 b5 89 f7 ff= 49 8b >>>>>> 44 24 40<48> =A0 8b b8 80 03 00 00 e8 23 74 2a 00 5b 41 5c 41 5d 41 = 5e c9 c3 >>>>>> =A0 =A0[ =A0145.902008] RIP =A0[] >>>>>> blktap_device_end_request+0x4e/0x63 >>>>>> =A0 =A0[ =A0145.902321] =A0RSP >>>>>> =A0 =A0[ =A0145.902585] ---[ end trace 2800cfa5aa85ca0a ]--- >>>>>> =A0 =A0[ =A0262.100689] BUG: spinlock lockup on CPU#4, vol_id/5181, >>>>>> ffff88007c75f520 >>>>>> =A0 =A0[ =A0262.100965] Pid: 5181, comm: vol_id Tainted: G =A0 =A0 = =A0D >>>>>> =A02.6.32.35 #1 >>>>>> =A0 =A0[ =A0262.101232] Call Trace: >>>>>> =A0 =A0[ =A0262.101497] =A0[] _raw_spin_lock+0x101= /0x12f >>>>>> =A0 =A0[ =A0262.101762] =A0[] _spin_lock_irq+0x1e/= 0x20 >>>>>> =A0 =A0[ =A0262.102028] =A0[] __make_request+0x5e/= 0x402 >>>>>> =A0 =A0[ =A0262.102294] =A0[] ? >>>>>> xen_restore_fl_direct_end+0x0/0x1 >>>>>> =A0 =A0[ =A0262.102563] =A0[] >>>>>> generic_make_request+0x258/0x2f4 >>>>>> =A0 =A0[ =A0262.102832] =A0[] ? bio_init+0x18/0x32 >>>>>> =A0 =A0[ =A0262.103099] =A0[] submit_bio+0xd0/0xd9 >>>>>> =A0 =A0[ =A0262.103366] =A0[] submit_bh+0xf7/0x11a >>>>>> =A0 =A0[ =A0262.103631] =A0[] >>>>>> block_read_full_page+0x246/0x264 >>>>>> =A0 =A0[ =A0262.103898] =A0[] ? blkdev_get_block+0= x0/0x4d >>>>>> =A0 =A0[ =A0262.104165] =A0[] ? _spin_unlock_irq+0= x1e/0x20 >>>>>> =A0 =A0[ =A0262.104433] =A0[] ? 
>>>>>> add_to_page_cache_locked+0xa0/0xca >>>>>> =A0 =A0[ =A0262.104702] =A0[] blkdev_readpage+0x13= /0x15 >>>>>> =A0 =A0[ =A0262.104972] =A0[] >>>>>> __do_page_cache_readahead+0x144/0x177 >>>>>> =A0 =A0[ =A0262.105240] =A0[] ondemand_readahead+0= x126/0x18e >>>>>> =A0 =A0[ =A0262.105507] =A0[] >>>>>> page_cache_sync_readahead+0x38/0x3a >>>>>> =A0 =A0[ =A0262.105778] =A0[] >>>>>> generic_file_aio_read+0x24c/0x5c1 >>>>>> =A0 =A0[ =A0262.106045] =A0[] do_sync_read+0xe2/0x= 126 >>>>>> =A0 =A0[ =A0262.106315] =A0[] ? >>>>>> autoremove_wake_function+0x0/0x38 >>>>>> =A0 =A0[ =A0262.106584] =A0[] ? >>>>>> selinux_file_permission+0x5c/0x10e >>>>>> =A0 =A0[ =A0262.106854] =A0[] ? >>>>>> security_file_permission+0x11/0x13 >>>>>> =A0 =A0[ =A0262.107120] =A0[] vfs_read+0xab/0x167 >>>>>> =A0 =A0[ =A0262.107385] =A0[] sys_read+0x47/0x70 >>>>>> =A0 =A0[ =A0262.107652] =A0[] system_call_fastpath= +0x16/0x1b >>>>>> =A0 =A0[ =A0262.107918] sending NMI to all CPUs: >>>>>> =A0 =A0[ =A0262.108189] BUG: unable to handle kernel paging request = at >>>>>> ffffffffff5fb310 >>>>>> =A0 =A0[ =A0262.108526] IP: [] >>>>>> flat_send_IPI_mask+0x6a/0xc0 >>>>>> =A0 =A0[ =A0262.108832] PGD 1003067 PUD 1004067 PMD 18b7067 PTE 0 >>>>>> =A0 =A0[ =A0262.109235] Oops: 0002 [#2] SMP >>>>>> =A0 =A0[ =A0262.109565] last sysfs file: /sys/class/blktap2/blktap1/= dev >>>>>> =A0 =A0[ =A0262.109830] CPU 4 >>>>>> =A0 =A0[ =A0262.110121] Modules linked in: bridge stp nfs fscache nf= s_acl >>>>>> auth_rpcgss arptable_filter arp_tables xt_esp ipt_ah xt_physdev xt_m= ultiport >>>>>> lockd sunrpc bonding dm_multipath bnx2 megaraid_sas >>>>>> =A0 =A0[ =A0262.111520] Pid: 5181, comm: vol_id Tainted: G =A0 =A0 = =A0D >>>>>> =A02.6.32.35 #1 PowerEdge M610 >>>>>> =A0 =A0[ =A0262.112008] RIP: e030:[] =A0[] >>>>>> flat_send_IPI_mask+0x6a/0xc0 >>>>>> =A0 =A0[ =A0262.112535] RSP: e02b:ffff88006778f968 =A0EFLAGS: 000100= 86 >>>>>> =A0 =A0[ =A0262.112800] RAX: 00000000ff000000 RBX: ffffffff81790060 = RCX: >>>>>> 00000000000160a0 
>>>>>> =A0 =A0[ =A0262.113068] RDX: ffff88001898e000 RSI: 0000000000000002 = RDI: >>>>>> ffffffff81816020 >>>>>> =A0 =A0[ =A0262.113337] RBP: ffff88006778f988 R08: 0000000000000000 = R09: >>>>>> 0000000000000004 >>>>>> =A0 =A0[ =A0262.113605] R10: 0000000000000002 R11: 0000000000000004 = R12: >>>>>> 0000000000000002 >>>>>> =A0 =A0[ =A0262.113877] R13: 0000000000000800 R14: 00000000000000ff = R15: >>>>>> 0000000000000000 >>>>>> =A0 =A0[ =A0262.114149] FS: =A000007fa78bcc5710(0063) >>>>>> GS:ffff88001898e000(0000) knlGS:0000000000000000 >>>>>> =A0 =A0[ =A0262.114636] CS: =A0e033 DS: 0000 ES: 0000 CR0: 000000008= 005003b >>>>>> =A0 =A0[ =A0262.114902] CR2: ffffffffff5fb310 CR3: 00000000641b4000 = CR4: >>>>>> 0000000000002660 >>>>>> =A0 =A0[ =A0262.115171] DR0: 0000000000000000 DR1: 0000000000000000 = DR2: >>>>>> 0000000000000000 >>>>>> =A0 =A0[ =A0262.115438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 = DR7: >>>>>> 0000000000000400 >>>>>> =A0 =A0[ =A0262.115707] Process vol_id (pid: 5181, threadinfo >>>>>> ffff88006778e000, task ffff88007db86250) >>>>>> =A0 =A0[ =A0262.116194] Stack: >>>>>> =A0 =A0[ =A0262.116451] =A00000000000000000 0000000076e9ecd0 0000000= 000000000 >>>>>> 0000000076e9ecd0 >>>>>> =A0 =A0[ =A0262.116825]<0> =A0 ffff88006778f998 ffffffff8102c841 >>>>>> ffff88006778f9b8 ffffffff81029f0d >>>>>> =A0 =A0[ =A0262.117485]<0> =A0 ffff88007c75f520 ffff88007c75f520 >>>>>> ffff88006778f9f8 ffffffff81219eb3 >>>>>> =A0 =A0[ =A0262.118396] Call Trace: >>>>>> =A0 =A0[ =A0262.118657] =A0[] flat_send_IPI_all+0x= 1a/0x56 >>>>>> =A0 =A0[ =A0262.118925] =A0[] >>>>>> arch_trigger_all_cpu_backtrace+0x45/0x66 >>>>>> =A0 =A0[ =A0262.119195] =A0[] _raw_spin_lock+0x106= /0x12f >>>>>> =A0 =A0[ =A0262.119463] =A0[] _spin_lock_irq+0x1e/= 0x20 >>>>>> =A0 =A0[ =A0262.119730] =A0[] __make_request+0x5e/= 0x402 >>>>>> =A0 =A0[ =A0262.119996] =A0[] ? 
>>>>>> xen_restore_fl_direct_end+0x0/0x1 >>>>>> =A0 =A0[ =A0262.120264] =A0[] >>>>>> generic_make_request+0x258/0x2f4 >>>>>> =A0 =A0[ =A0262.120532] =A0[] ? bio_init+0x18/0x32 >>>>>> =A0 =A0[ =A0262.120799] =A0[] submit_bio+0xd0/0xd9 >>>>>> =A0 =A0[ =A0262.121066] =A0[] submit_bh+0xf7/0x11a >>>>>> =A0 =A0[ =A0262.121333] =A0[] >>>>>> block_read_full_page+0x246/0x264 >>>>>> =A0 =A0[ =A0262.121602] =A0[] ? blkdev_get_block+0= x0/0x4d >>>>>> =A0 =A0[ =A0262.121870] =A0[] ? _spin_unlock_irq+0= x1e/0x20 >>>>>> =A0 =A0[ =A0262.122137] =A0[] ? >>>>>> add_to_page_cache_locked+0xa0/0xca >>>>>> =A0 =A0[ =A0262.127766] =A0[] blkdev_readpage+0x13= /0x15 >>>>>> =A0 =A0[ =A0262.128025] =A0[] >>>>>> __do_page_cache_readahead+0x144/0x177 >>>>>> =A0 =A0[ =A0262.128288] =A0[] ondemand_readahead+0= x126/0x18e >>>>>> =A0 =A0[ =A0262.128548] =A0[] >>>>>> page_cache_sync_readahead+0x38/0x3a >>>>>> =A0 =A0[ =A0262.128810] =A0[] >>>>>> generic_file_aio_read+0x24c/0x5c1 >>>>>> =A0 =A0[ =A0262.129070] =A0[] do_sync_read+0xe2/0x= 126 >>>>>> =A0 =A0[ =A0262.129329] =A0[] ? >>>>>> autoremove_wake_function+0x0/0x38 >>>>>> =A0 =A0[ =A0262.129590] =A0[] ? >>>>>> selinux_file_permission+0x5c/0x10e >>>>>> =A0 =A0[ =A0262.129851] =A0[] ? 
>>>>>> security_file_permission+0x11/0x13
>>>>>>    [  262.130110]  [] vfs_read+0xab/0x167
>>>>>>    [  262.130368]  [] sys_read+0x47/0x70
>>>>>>    [  262.130624]  [] system_call_fastpath+0x16/0x1b
>>>>>>    [  262.130883] Code: 8b 05 b4 95 7e 00 83 fe 02 44 8b 68 34 75 0a
>>>>>> ff 90 58 01 00 00 eb 0e f3 90 8b 04 25 00 b3 5f ff f6 c4 10 75 f2 44 89 f0
>>>>>> c1 e0 18<89> 04 25 10 b3 5f ff 41 83 fc 02 74 08 44 89 e0 44 09 e8 eb 06
>>>>>>    [  262.133866] RIP  []
>>>>>> flat_send_IPI_mask+0x6a/0xc0
>>>>>>    [  262.134164]  RSP
>>>>>>    [  262.134419] CR2: ffffffffff5fb310
>>>>>>    [  262.134673] ---[ end trace 2800cfa5aa85ca0b ]---
>>>>>>
>>>>>    Can you try to recompile your PVOPS kernel with
>>>>> CONFIG_DEBUG_PAGEALLOC=y?
>>>>>
>>>>>    You can read more about this BUG at
>>>>>
>>>>>    http://lists.xensource.com/archives/html/xen-devel/2011-03/msg01756.html
>>>>>
>>>>>    I initially hit this BUG sometime in Dec 2010:
>>>>>
>>>>>    http://lists.xensource.com/archives/html/xen-devel/2010-12/msg01501.html
>>>>>
>>>>>    Thanks.
>>>>>
>>>>>    Kindest regards,
>>>>>    Giam Teck Choon
>>>>
>>>>    Sorry for the delayed answer. The problem I'm facing now is not
>>>>    related to the mmu bug; I am still seeing that one on systems
>>>>    with xen 4.0.2 / kernel 2.6.32.24. Newer kernels have new bugs
>>>>    that are much more troublesome, since I cannot run a single VM
>>>>    instance.
>>>>
>>>>    With DEBUG_PAGEALLOC added, the main difference is that the system
>>>>    reboots shortly after trying to start up a VM:
>>>>
>>>>
>>>> Ok. Sorry, I didn't read your log message carefully previously.
>>>>
>>>>
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: I/O queue driver: lio
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: received 'attach'
>>>>    message (uuid = 0)
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: sending 'attach
>>>>    response' message (uuid = 0)
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: received 'open'
>>>>    message (uuid = 0)
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: block-aio
>>>>    open('/storage5/linux-centos-5-64b-base-7253/hda')
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]:
>>>>    open(/storage5/linux-centos-5-64b-base-7253/hda) with O_DIRECT
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: Image size:
>>>>    pre sector_shift  [134217728]     post sector_shift [262144]
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: opened image
>>>>    /storage5/linux-centos-5-64b-base-rip/hda (1 users, state:
>>>>    0x00000001, type: 0)
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: VBD CHAIN:
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]:
>>>>    /storage5/linux-centos-5-64b-base/hda: 0
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15563]: sending 'open
>>>>    response' message (uuid = 0)
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.644887] block tda:
>>>>    sector-size: 512 capacity: 262144
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657328] general
>>>>    protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657379] last sysfs
>>>>    file: /sys/block/tda/removable
>>>>
>>>>
>>>> Just curious: what type of storage are you using for your VMs?
>>>> >>>> >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657400] CPU 0 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657421] Modules >>>> =A0 =A0linked in: nfs fscache nfs_acl auth_rpcgss bridge stp ocfs2 >>>> =A0 =A0ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager >>>> =A0 =A0ocfs2_stackglue configfs arptable_filter arp_tables xt_esp ipt_= ah >>>> =A0 =A0xt_physdev xt_multiport dm_round_robin lockd sunrpc crc32c bond= ing >>>> =A0 =A0iscsi_tcp libiscsi_tcp bnx2i libiscsi scsi_transport_iscsi cnic >>>> =A0 =A0uio dm_multipath bnx2 megaraid_sas >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657736] Pid: 155= 66, >>>> =A0 =A0comm: tapdisk2 Not tainted 2.6.32.36 #4 PowerEdge M610 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657763] RIP: >>>> =A0 =A0e030:[] =A0[] >>>> =A0 =A0blktap_device_end_request+0x4e/0x63 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657808] RSP: >>>> =A0 =A0e02b:ffff88006da5dcd8 =A0EFLAGS: 00010046 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657833] RAX: >>>> =A0 =A06b6b6b6b6b6b6b6b RBX: ffff88006566c000 RCX: 0000000000000000 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657860] RDX: >>>> =A0 =A00000000000000000 RSI: 0000000000000000 RDI: ffff88006d8c7980 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.657887] RBP: >>>> =A0 =A0ffff88006da5dcf8 R08: ffffffff817e66a0 R09: ffff88007d775790 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.658136] R10: >>>> =A0 =A0ffffffff810ccfe4 R11: ffff8800280d8f60 R12: ffff88006720e7f8 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.658384] R13: >>>> =A0 =A0ffff88006d8c77e0 R14: 0000000000000000 R15: ffff88006d8c77f0 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.658635] FS: >>>> 00007f86d75a3730(0000) GS:ffff8800280c7000(0000) >>>> =A0 =A0knlGS:0000000000000000 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.659107] CS: =A0e= 033 DS: >>>> =A0 =A00000 ES: 0000 
CR0: 000000008005003b >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.659351] CR2: >>>> =A0 =A00000000045614ed8 CR3: 000000006d911000 CR4: 0000000000002660 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.659600] DR0: >>>> =A0 =A00000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.659847] DR3: >>>> =A0 =A00000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.660100] Process >>>> =A0 =A0tapdisk2 (pid: 15566, threadinfo ffff88006da5c000, task >>>> =A0 =A0ffff88007d7750f0) >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.660574] Stack: >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.660807] >>>> ffff88007d7750f0 0000000000000000 0000000000000000 >>>> 0000000000000000 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.661074]<0> >>>> =A0 =A0ffff88006da5de88 ffffffff8129f2c0 ffffffff8100fedd 000000016da5= ddc8 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.661574]<0> >>>> =A0 =A000000000ffffffff ffff88006d8c77e0 00000001816fab27 000000000000= 0000 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.662287] Call Tra= ce: >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.662522] >>>> [] blktap_ring_ioctl+0x183/0x2d8 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.662767] >>>> [] ? xen_force_evtchn_callback+0xd/0xf >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.663012] >>>> [] ? inode_has_perm+0xa1/0xb3 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.663260] >>>> [] ? xen_restore_fl_direct_end+0x0/0x1 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.663509] >>>> [] ? lock_release+0x1b8/0x1c3 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.663756] >>>> [] ? _raw_spin_unlock+0xab/0xb2 >>>> =A0 =A0Apr =A08 12:00:43 r2b16ch2x28p2 kernel: [ 8879.663999] >>>> [] ? 
_spin_unlock+0x26/0x2a
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.664248]
>>>>    [] ? aio_read_evt+0x87/0x13a
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.664493]
>>>>    [] ? aio_read_evt+0x11c/0x13a
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.664740]
>>>>    [] ? _raw_spin_lock+0x77/0x12f
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.664986]
>>>>    [] ? file_has_perm+0xb4/0xc6
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.665237]
>>>>    [] vfs_ioctl+0x5e/0x77
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.665480]
>>>>    [] do_vfs_ioctl+0x484/0x4d5
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.665726]
>>>>    [] sys_ioctl+0x57/0x7a
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.665970]
>>>>    [] system_call_fastpath+0x16/0x1b
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.666214] Code: e8 61
>>>>    f4 ff ff 49 8b 44 24 40 48 8b b8 70 04 00 00 e8 79 67 2b 00 41 8b
>>>>    54 24 60 44 89 f6 4c 89 e7 e8 76 3b f7 ff 49 8b 44 24 40<48> 8b
>>>>    b8 70 04 00 00 e8 1d 65 2b 00 5b 41 5c 41 5d 41 5e c9 c3
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.667320] RIP
>>>>    [] blktap_device_end_request+0x4e/0x63
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.667577]  RSP
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 kernel: [ 8879.668069] ---[ end
>>>>    trace da218b929afc63f7 ]---
>>>>    Apr  8 12:00:43 r2b16ch2x28p2 tapdisk2[15584]: I/O queue driver: lio
>>>>    Apr  8 12:00:48 r2b16ch2x28p2 tap-ctl:
>>>>    tap-err:tap_ctl_read_message: failure reading message
>>>>    Apr  8 12:00:48 r2b16ch2x28p2 tap-ctl:
>>>>    tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message
>>>>
>>>>
>>>> This looks like a blktap/blktap2 driver issue to me, so you are
>>>> right: this is a different BUG from the one I encountered, as the
>>>> logs are different.
>>>> Sorry, I didn't read your log carefully before replying
>>>> previously. Have you tried using normal LVM for your VM to
>>>> reproduce the BUG? I guess you are using different storage.
>>>>
>>>> Thanks.
>>>>
>>>> Kindest regards,
>>>> Giam Teck Choon
>>>
>>> --
>>>
>>> *Gerd Jakobovitsch*
>>> *Product Engineer*
>>> ---------------------------------------------------------
>>> *ALOG Data Centers do Brasil*
>>> *Excellence in Hosting Projects*
>>> Rua Dr. Miguel Couto, 58 -- 01008-010 -- São Paulo - SP
>>> Phone: (11) 3524-4970 / (11) 7152-0815
>>> *http://www.alog.com.br*
>>>
>>> *"How are our services? Click here and tell us. We want to
>>> hear your opinion!"*
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
>
>