From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Kernel panic with 2.6.32-30 under network activity
Date: Tue, 15 Mar 2011 23:20:18 -0400
Message-ID: <20110316032018.GC7905@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olivier Hanesse
Cc: xen-devel@lists.xensource.com, Xen Users
List-Id: xen-devel@lists.xenproject.org

On Thu, Mar 10, 2011 at 12:25:55PM +0100, Olivier Hanesse wrote:
> Hello,
>
> I've got several kernel panics on a domU under network activity (multiple
> rsync using rsh). I didn't manage to reproduce it manually, but it happened
> 5 times during the last month.

Does it happen all the time?

> Each time, it is the same kernel trace.
>
> I am using Debian 5.0.8 with kernel/hypervisor :
>
> ii linux-image-2.6.32-bpo.5-amd64 2.6.32-30~bpo50+1 Linux 2.6.32 for 64-bit PCs
> ii xen-hypervisor-4.0-amd64 4.0.1-2 The Xen Hypervisor on AMD64
>
> Here is the trace :
>
> [469390.126691] alignment check: 0000 [#1] SMP

alignment check? Was there anything else in the log before this? Was there
anything in the Dom0 log?

> [469390.126711] last sysfs file: /sys/devices/virtual/net/lo/operstate
> [469390.126718] CPU 0
> [469390.126725] Modules linked in: snd_pcsp xen_netfront snd_pcm evdev snd_timer snd soundcore snd_page_alloc ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod xen_blkfront thermal_sys
> [469390.126772] Pid: 22077, comm: rsh Not tainted 2.6.32-bpo.5-amd64 #1
> [469390.126779] RIP: e030:[] [] eth_header+0x61/0x9c
> [469390.126795] RSP: e02b:ffff88001ec3f9b8 EFLAGS: 00050286
> [469390.126802] RAX: 00000000090f0900 RBX: 0000000000000008 RCX: ffff88001ecd0cee
> [469390.126811] RDX: 0000000000000800 RSI: 000000000000000e RDI: ffff88001ecd0cee
> [469390.126820] RBP: ffff8800029016d0 R08: 0000000000000000 R09: 0000000000000034
> [469390.126829] R10: 000000000000000e R11: ffffffff81255821 R12: ffff880002935144
> [469390.126838] R13: 0000000000000034 R14: ffff88001fe80000 R15: ffff88001fe80000
> [469390.126851] FS: 00007f340c2276e0(0000) GS:ffff880002f4d000(0000) knlGS:0000000000000000
> [469390.126860] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [469390.126867] CR2: 00007fffb8f33a8c CR3: 000000001d875000 CR4: 0000000000002660
> [469390.126877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [469390.126886] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [469390.126895] Process rsh (pid: 22077, threadinfo ffff88001ec3e000, task ffff88001ea61530)
> [469390.126904] Stack:
> [469390.126908] 0000000000000000 0000000000000000 ffff88001ecd0cfc ffff88001f1a4ae8
> [469390.126921] <0> ffff880002935100 ffff880002935140 0000000000000000 ffffffff81255a20
> [469390.126937] <0> 0000000000000000 ffffffff8127743d 0000000000000000 ffff88001ecd0cfc
> [469390.126954] Call Trace:
> [469390.126963] [] ? neigh_resolve_output+0x1ff/0x284
> [469390.126974] [] ? ip_finish_output2+0x1d6/0x22b
> [469390.126983] [] ? ip_queue_xmit+0x311/0x386
> [469390.126994] [] ? xen_force_evtchn_callback+0x9/0xa
> [469390.127003] [] ? check_events+0x12/0x20
> [469390.127013] [] ? tcp_transmit_skb+0x648/0x687
> [469390.127022] [] ? check_events+0x12/0x20
> [469390.127031] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127040] [] ? tcp_write_xmit+0x874/0x96c
> [469390.127049] [] ? __tcp_push_pending_frames+0x22/0x53
> [469390.127059] [] ? tcp_close+0x176/0x3d0
> [469390.127069] [] ? inet_release+0x4e/0x54
> [469390.127079] [] ? sock_release+0x19/0x66
> [469390.127087] [] ? sock_close+0x22/0x26
> [469390.127097] [] ? __fput+0x100/0x1af
> [469390.127106] [] ? filp_close+0x5b/0x62
> [469390.127116] [] ? put_files_struct+0x64/0xc1
> [469390.127127] [] ? _spin_lock_irq+0x7/0x22
> [469390.127135] [] ? do_exit+0x236/0x6c6
> [469390.127144] [] ? __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.127154] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127163] [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.127173] [] ? do_group_exit+0x76/0x9d
> [469390.127183] [] ? get_signal_to_deliver+0x318/0x343
> [469390.127193] [] ? do_notify_resume+0x87/0x73f
> [469390.127202] [] ? page_fault+0x25/0x30
> [469390.127211] [] ? error_exit+0x2a/0x60
> [469390.127219] [] ? retint_restore_args+0x5/0x6
> [469390.127228] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.127240] [] ? __put_user_4+0x1d/0x30
> [469390.128009] [] ? int_signal+0x12/0x17
> [469390.128009] Code: 89 e8 86 e0 66 89 47 0c 48 85 ed 75 07 49 8b ae 20 02 00 00 8b 45 00 4d 85 e4 89 47 06 66 8b 45 04 66 89 47 0a 74 12 41 8b 04 24 <89> 07 66 41 8b 44 24 04 66 89 47 04 eb 18 41 f6 86 60 01 00 00
> [469390.128009] RIP [] eth_header+0x61/0x9c
> [469390.128009] RSP
> [469390.128009] ---[ end trace dd6b1396ef9d9a96 ]---
> [469390.128009] Kernel panic - not syncing: Fatal exception in interrupt
> [469390.128009] Pid: 22077, comm: rsh Tainted: G D 2.6.32-bpo.5-amd64 #1
> [469390.128009] Call Trace:
> [469390.128009] [] ? panic+0x86/0x143
> [469390.128009] [] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009] [] ? _spin_unlock_irqrestore+0xd/0xe
> [469390.128009] [] ? release_console_sem+0x17e/0x1af
> [469390.128009] [] ? oops_end+0xa7/0xb4
> [469390.128009] [] ? do_alignment_check+0x88/0x92
> [469390.128009] [] ? alignment_check+0x25/0x30
> [469390.128009] [] ? neigh_resolve_output+0x0/0x284
> [469390.128009] [] ? eth_header+0x61/0x9c
> [469390.128009] [] ? eth_header+0x24/0x9c
> [469390.128009] [] ? neigh_resolve_output+0x1ff/0x284
> [469390.128009] [] ? ip_finish_output2+0x1d6/0x22b
> [469390.128009] [] ? ip_queue_xmit+0x311/0x386
> [469390.128009] [] ? xen_force_evtchn_callback+0x9/0xa
> [469390.128009] [] ? check_events+0x12/0x20
> [469390.128009] [] ? tcp_transmit_skb+0x648/0x687
> [469390.128009] [] ? check_events+0x12/0x20
> [469390.128009] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009] [] ? tcp_write_xmit+0x874/0x96c
> [469390.128009] [] ? __tcp_push_pending_frames+0x22/0x53
> [469390.128009] [] ? tcp_close+0x176/0x3d0
> [469390.128009] [] ? inet_release+0x4e/0x54
> [469390.128009] [] ? sock_release+0x19/0x66
> [469390.128009] [] ? sock_close+0x22/0x26
> [469390.128009] [] ? __fput+0x100/0x1af
> [469390.128009] [] ? filp_close+0x5b/0x62
> [469390.128009] [] ? put_files_struct+0x64/0xc1
> [469390.128009] [] ? _spin_lock_irq+0x7/0x22
> [469390.128009] [] ? do_exit+0x236/0x6c6
> [469390.128009] [] ? __raw_callee_save_xen_pud_val+0x11/0x1e
> [469390.128009] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009] [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [469390.128009] [] ? do_group_exit+0x76/0x9d
> [469390.128009] [] ? get_signal_to_deliver+0x318/0x343
> [469390.128009] [] ? do_notify_resume+0x87/0x73f
> [469390.128009] [] ? page_fault+0x25/0x30
> [469390.128009] [] ? error_exit+0x2a/0x60
> [469390.128009] [] ? retint_restore_args+0x5/0x6
> [469390.128009] [] ? xen_restore_fl_direct_end+0x0/0x1
> [469390.128009] [] ? __put_user_4+0x1d/0x30
> [469390.128009] [] ? int_signal+0x12/0x17
>
> I found another post, which may be the same bug (same kernel, network
> activity ... ) :
>
> http://jira.mongodb.org/browse/SERVER-2383
>
> Any ideas ?

None.. What type of CPU do you have? Are you pinning your guest to a specific
CPU?