* Re: Networking-related crash? [not found] <608c44bf0912090546s446bf973ne408e99661fdc56f@mail.gmail.com> @ 2009-12-09 15:11 ` Avi Kivity 2009-12-09 20:36 ` Adam Huffman 2009-12-09 20:54 ` Eric Dumazet 0 siblings, 2 replies; 8+ messages in thread From: Avi Kivity @ 2009-12-09 15:11 UTC (permalink / raw) To: Adam Huffman; +Cc: kvm, netdev On 12/09/2009 03:46 PM, Adam Huffman wrote: > I've been seeing lots of crashes on a new Dell Precision T7500, > running the KVM in Fedora 12. Finally managed to capture an Oops, > which is shown below (hand-transcribed): > > BUG: unable to handle kernel paging request at 0000000000200200 > IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f > PGD 332d0e067 PUD 33453c067 PMD 0 > Oops: 0002 [#1] SMP > last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map > CPU 4 > Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE > iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6 > ip6table_filter ip6 > _tables ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_analog > nouveau snd_hda_intel snd_hda_codec ttm drm_kms_helper snd_hwdep > snd_seq drm sn > d_seq_device snd_pcm firewire_ohci i2c_i801 snd_timer ppdev > firewire_core snd i2c_algo_bit iTCO_wdt crc_itu_t parport_pc i2c_core > soundcore parport > iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas > mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded: > speedstep_lib] > Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1 > Precision WorkStation T7500 > RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] > destroy_conntrack+0x82/0x11f > RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 > RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f > RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 > RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 > R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 > R13: ffffffffa029adcc R14: 0000000000000000 R15: 
ffff880632866c38 > FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) knlGS:0000000000000000 > CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 > CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task ffff880634945e00) > Stack: > ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 > <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00 > <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00 > Call Trace: > <IRQ> > [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d > [<ffffffff81374092>] skb_release_head_state+0x95/0xd7 > [<ffffffff81373dd3>] __kfree_skb+0x16/0x81 > [<ffffffff81373ed7>] kfree_skb+0x6a/0x72 > [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6] > [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6] > [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6] > [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427 > [<ffffffff8137bf1b>] napi_skb_finish+0x29/0x3d > [<ffffffff8137c37a>] napi_gro_receive+0x2f/0x34 > [<ffffffffa0084fad>] tg3_poll+0x6c6/0x8c3 [tg3] > [<ffffffff8137c4b0>] net_rx_action+0xaf/0x1c9 > [<ffffffff81379cfe>] ? list-add_tail+0x15/0x17 > [<ffffffff81057614>] __do_softirq+0xdd/0x1ad > [<ffffffff81026936>] ? apic_write+0x16/0x18 > [<ffffffff81012eac>] call_softirq+0x1c/0x30 > [<ffffffff810143fb>] do_softirq+0x47/0x8d > [<ffffffff81057326>] irq_exit+0x44/0x86 > [<ffffffff8141ecd5>] do_IRQ+0xa5/0xbc > [<ffffffff810126d3>] ret_from_intr+0x0/0x11 > <EOI> > [<ffffffffa02437bb>] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm] > [<ffffffffa02437aa>] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm] > [<ffffffffa02395e3>] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm] > [<ffffffff81108adc>] ? vfs_ioctl+0x22/0x87 > [<ffffffff81109038>] ? do_vfs_ioctl+0x47b/0x4c1 > [<ffffffff811090d4>] ? 
sys_ioctl+0x56/0x79
> [<ffffffff81012093>] ? stub_clone+0x13/0x20
> [<ffffffff81011cf2>] ? system_call_fastpath+0x16/0x1b
> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01<48>
> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
> RIP [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f
> RSP<ffffc90000803bf0>
> CR2: 0000000000200200
>

Looks unrelated to kvm - softirq happened to trigger during a kvm ioctl.
Fault looks like list poison. Copying netdev.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: Networking-related crash? 2009-12-09 15:11 ` Networking-related crash? Avi Kivity @ 2009-12-09 20:36 ` Adam Huffman 2009-12-09 20:54 ` Eric Dumazet 1 sibling, 0 replies; 8+ messages in thread From: Adam Huffman @ 2009-12-09 20:36 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, netdev On Wed, Dec 9, 2009 at 3:11 PM, Avi Kivity <avi@redhat.com> wrote: > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 0000000000200200 >> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map >> CPU 4 >> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE >> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6 >> ip6table_filter ip6 >> _tables ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_analog >> nouveau snd_hda_intel snd_hda_codec ttm drm_kms_helper snd_hwdep >> snd_seq drm sn >> d_seq_device snd_pcm firewire_ohci i2c_i801 snd_timer ppdev >> firewire_core snd i2c_algo_bit iTCO_wdt crc_itu_t parport_pc i2c_core >> soundcore parport >> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas >> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded: >> speedstep_lib] >> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1 >> Precision WorkStation T7500 >> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >> 
R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >> knlGS:0000000000000000 >> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >> ffff880634945e00) >> Stack: >> ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 >> <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00 >> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00 >> Call Trace: >> <IRQ> >> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d >> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7 >> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81 >> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72 >> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6] >> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6] >> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427 >> [<ffffffff8137bf1b>] napi_skb_finish+0x29/0x3d >> [<ffffffff8137c37a>] napi_gro_receive+0x2f/0x34 >> [<ffffffffa0084fad>] tg3_poll+0x6c6/0x8c3 [tg3] >> [<ffffffff8137c4b0>] net_rx_action+0xaf/0x1c9 >> [<ffffffff81379cfe>] ? list-add_tail+0x15/0x17 >> [<ffffffff81057614>] __do_softirq+0xdd/0x1ad >> [<ffffffff81026936>] ? apic_write+0x16/0x18 >> [<ffffffff81012eac>] call_softirq+0x1c/0x30 >> [<ffffffff810143fb>] do_softirq+0x47/0x8d >> [<ffffffff81057326>] irq_exit+0x44/0x86 >> [<ffffffff8141ecd5>] do_IRQ+0xa5/0xbc >> [<ffffffff810126d3>] ret_from_intr+0x0/0x11 >> <EOI> >> [<ffffffffa02437bb>] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm] >> [<ffffffffa02437aa>] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm] >> [<ffffffffa02395e3>] ? kvm_vcpu_ioctl+0xfd/0x556 [kvm] >> [<ffffffff81108adc>] ? 
vfs_ioctl+0x22/0x87
>> [<ffffffff81109038>] ? do_vfs_ioctl+0x47b/0x4c1
>> [<ffffffff811090d4>] ? sys_ioctl+0x56/0x79
>> [<ffffffff81012093>] ? stub_clone+0x13/0x20
>> [<ffffffff81011cf2>] ? system_call_fastpath+0x16/0x1b
>> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
>> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01<48>
>> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
>> RIP [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f
>> RSP<ffffc90000803bf0>
>> CR2: 0000000000200200
>>
>
> Looks unrelated to kvm - softirq happened to trigger during a kvm ioctl.
> Fault looks like list poison. Copying netdev.
>

Disabling VT-d support in the BIOS seems to have stopped the crashes.
At least it's been running without crashing for several hours now,
while it would only last minutes before.
* Re: Networking-related crash? 2009-12-09 15:11 ` Networking-related crash? Avi Kivity 2009-12-09 20:36 ` Adam Huffman @ 2009-12-09 20:54 ` Eric Dumazet 2009-12-10 11:01 ` Patrick McHardy 1 sibling, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2009-12-09 20:54 UTC (permalink / raw) To: Avi Kivity; +Cc: Adam Huffman, kvm, netdev, Patrick McHardy Le 09/12/2009 16:11, Avi Kivity a écrit : > On 12/09/2009 03:46 PM, Adam Huffman wrote: >> I've been seeing lots of crashes on a new Dell Precision T7500, >> running the KVM in Fedora 12. Finally managed to capture an Oops, >> which is shown below (hand-transcribed): >> >> BUG: unable to handle kernel paging request at 0000000000200200 >> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >> PGD 332d0e067 PUD 33453c067 PMD 0 >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map >> CPU 4 >> Modules linked in: tun bridge stp llc sunrpc ipt_MASQUERADE >> iptable_nat nf_nat ipt_LOG xt_physdev ip6t_REJECT nf_conntrack_ipv6 >> ip6table_filter ip6 >> _tables ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_analog >> nouveau snd_hda_intel snd_hda_codec ttm drm_kms_helper snd_hwdep >> snd_seq drm sn >> d_seq_device snd_pcm firewire_ohci i2c_i801 snd_timer ppdev >> firewire_core snd i2c_algo_bit iTCO_wdt crc_itu_t parport_pc i2c_core >> soundcore parport >> iTCO_vendor_support tg3 snd_page_alloc shpchp dcdbas wmi mptsas >> mptscsih mptbase scsi_transport_sas megaraid_sas [last_unloaded: >> speedstep_lib] >> Pid: 1759, comm: qemu-kvm Not tainted 2.6.31.6-162.fc12.x86_64 #1 >> Precision WorkStation T7500 >> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >> destroy_conntrack+0x82/0x11f >> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >> R10: 00000002f54189d5 R11: 
0000000000000001 R12: ffffffff819a92e0 >> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >> knlGS:0000000000000000 >> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >> ffff880634945e00) >> Stack: >> ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 >> <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00 >> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00 >> Call Trace: >> <IRQ> >> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d >> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7 >> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81 >> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72 >> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6] >> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6] >> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6] >> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427 >> [<ffffffff8137bf1b>] napi_skb_finish+0x29/0x3d >> [<ffffffff8137c37a>] napi_gro_receive+0x2f/0x34 >> [<ffffffffa0084fad>] tg3_poll+0x6c6/0x8c3 [tg3] >> [<ffffffff8137c4b0>] net_rx_action+0xaf/0x1c9 >> [<ffffffff81379cfe>] ? list-add_tail+0x15/0x17 >> [<ffffffff81057614>] __do_softirq+0xdd/0x1ad >> [<ffffffff81026936>] ? apic_write+0x16/0x18 >> [<ffffffff81012eac>] call_softirq+0x1c/0x30 >> [<ffffffff810143fb>] do_softirq+0x47/0x8d >> [<ffffffff81057326>] irq_exit+0x44/0x86 >> [<ffffffff8141ecd5>] do_IRQ+0xa5/0xbc >> [<ffffffff810126d3>] ret_from_intr+0x0/0x11 >> <EOI> >> [<ffffffffa02437bb>] ? kvm_arch_vcpu_ioctl_run+0x84b/0xb34 [kvm] >> [<ffffffffa02437aa>] ? kvm_arch_vcpu_ioctl_run+0x83a/0xb34 [kvm] >> [<ffffffffa02395e3>] ? 
kvm_vcpu_ioctl+0xfd/0x556 [kvm]
>> [<ffffffff81108adc>] ? vfs_ioctl+0x22/0x87
>> [<ffffffff81109038>] ? do_vfs_ioctl+0x47b/0x4c1
>> [<ffffffff811090d4>] ? sys_ioctl+0x56/0x79
>> [<ffffffff81012093>] ? stub_clone+0x13/0x20
>> [<ffffffff81011cf2>] ? system_call_fastpath+0x16/0x1b
>> Code: c7 00 a6 9a 81 e8 23 04 08 00 48 89 df e8 68 29 00 00 f6 43 78
>> 08 75 24 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01<48>
>> 89 02 75 04 48 89 50 08 48 c7 43 10 00 02 20 00 65 8b 14 25
>> RIP [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f
>> RSP<ffffc90000803bf0>
>> CR2: 0000000000200200
>>
>
> Looks unrelated to kvm - softirq happened to trigger during a kvm
> ioctl. Fault looks like list poison. Copying netdev.
>

crash in :

   48 8b 43 08             mov    0x8(%rbx),%rax
   a8 01                   test   $0x1,%al
   48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
   75 04                   jne    1f
   48 89 50 08             mov    %rdx,0x8(%rax)
1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)

        if (!nf_ct_is_confirmed(ct)) {
                BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
                hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
        }
        NF_CT_STAT_INC(net, delete);
* Re: Networking-related crash? 2009-12-09 20:54 ` Eric Dumazet @ 2009-12-10 11:01 ` Patrick McHardy 2009-12-10 15:18 ` Adam Huffman 0 siblings, 1 reply; 8+ messages in thread From: Patrick McHardy @ 2009-12-10 11:01 UTC (permalink / raw) To: Eric Dumazet Cc: Avi Kivity, Adam Huffman, kvm, netdev, Netfilter Development Mailinglist Eric Dumazet wrote: > Le 09/12/2009 16:11, Avi Kivity a écrit : >> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>> I've been seeing lots of crashes on a new Dell Precision T7500, >>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>> which is shown below (hand-transcribed): >>> >>> BUG: unable to handle kernel paging request at 0000000000200200 >>> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >>> PGD 332d0e067 PUD 33453c067 PMD 0 >>> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >>> destroy_conntrack+0x82/0x11f >>> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >>> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >>> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >>> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >>> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >>> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >>> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >>> knlGS:0000000000000000 >>> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >>> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >>> ffff880634945e00) >>> Stack: >>> ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 >>> <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00 >>> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00 >>> Call Trace: >>> <IRQ> >>> 
[<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d
>>> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7
>>> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81
>>> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72
>>> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6]
>>> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6]
>>> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427
>>> ...
>>>
> crash in :
>    48 8b 43 08             mov    0x8(%rbx),%rax
>    a8 01                   test   $0x1,%al
>    48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
>    75 04                   jne    1f
>    48 89 50 08             mov    %rdx,0x8(%rax)
> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>
>         if (!nf_ct_is_confirmed(ct)) {
>                 BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>                 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
>         }
>         NF_CT_STAT_INC(net, delete);

I can't spot the problem. Adam, please send me your .config file.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Networking-related crash? 2009-12-10 11:01 ` Patrick McHardy @ 2009-12-10 15:18 ` Adam Huffman 2009-12-14 13:16 ` Patrick McHardy 0 siblings, 1 reply; 8+ messages in thread From: Adam Huffman @ 2009-12-10 15:18 UTC (permalink / raw) To: Patrick McHardy Cc: Eric Dumazet, Avi Kivity, kvm, netdev, Netfilter Development Mailinglist [-- Attachment #1: Type: text/plain, Size: 3210 bytes --] On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy <kaber@trash.net> wrote: > Eric Dumazet wrote: >> Le 09/12/2009 16:11, Avi Kivity a écrit : >>> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>>> I've been seeing lots of crashes on a new Dell Precision T7500, >>>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>>> which is shown below (hand-transcribed): >>>> >>>> BUG: unable to handle kernel paging request at 0000000000200200 >>>> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >>>> PGD 332d0e067 PUD 33453c067 PMD 0 >>>> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >>>> destroy_conntrack+0x82/0x11f >>>> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >>>> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >>>> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >>>> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >>>> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >>>> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >>>> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >>>> knlGS:0000000000000000 >>>> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >>>> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >>>> ffff880634945e00) >>>> Stack: >>>> ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 >>>> <0> ffffc90000803c30 
ffffffff81374092 ffffc90000803c30 ffff880632866c00
>>>> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00
>>>> Call Trace:
>>>> <IRQ>
>>>> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d
>>>> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7
>>>> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81
>>>> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72
>>>> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6]
>>>> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427
>>>> ...
>>>>
>> crash in :
>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>    a8 01                   test   $0x1,%al
>>    48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
>>    75 04                   jne    1f
>>    48 89 50 08             mov    %rdx,0x8(%rax)
>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>
>>         if (!nf_ct_is_confirmed(ct)) {
>>                 BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>                 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
>>         }
>>         NF_CT_STAT_INC(net, delete);
>
>
> I can't spot the problem. Adam, please send me your .config file.
>
>

It's the standard Fedora .config, which is attached.

As I stated in another message, the oops seems related to VT-d. With
that disabled, the machine has been stable for nearly a day now.

[-- Attachment #2: .config --]
[-- Type: application/x-config, Size: 97986 bytes --]
* Re: Networking-related crash? 2009-12-10 15:18 ` Adam Huffman @ 2009-12-14 13:16 ` Patrick McHardy 2009-12-14 13:46 ` Adam Huffman 2010-01-05 16:57 ` Adam Huffman 0 siblings, 2 replies; 8+ messages in thread From: Patrick McHardy @ 2009-12-14 13:16 UTC (permalink / raw) To: Adam Huffman Cc: Eric Dumazet, Avi Kivity, kvm, netdev, Netfilter Development Mailinglist Adam Huffman wrote: > On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy <kaber@trash.net> wrote: >> Eric Dumazet wrote: >>> Le 09/12/2009 16:11, Avi Kivity a écrit : >>>> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>>>> I've been seeing lots of crashes on a new Dell Precision T7500, >>>>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>>>> which is shown below (hand-transcribed): >>>>> >>>>> BUG: unable to handle kernel paging request at 0000000000200200 >>>>> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >>>>> PGD 332d0e067 PUD 33453c067 PMD 0 >>>>> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >>>>> destroy_conntrack+0x82/0x11f >>>>> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >>>>> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >>>>> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >>>>> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >>>>> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >>>>> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >>>>> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >>>>> knlGS:0000000000000000 >>>>> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >>>>> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >>>>> ffff880634945e00) >>>>> Stack: >>>>> ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2 >>>>> <0> 
ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00
>>>>> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00
>>>>> Call Trace:
>>>>> <IRQ>
>>>>> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d
>>>>> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7
>>>>> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81
>>>>> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72
>>>>> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6]
>>>>> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427
>>>>> ...
>>>>>
>>> crash in :
>>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>>    a8 01                   test   $0x1,%al
>>>    48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
>>>    75 04                   jne    1f
>>>    48 89 50 08             mov    %rdx,0x8(%rax)
>>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>
>>>         if (!nf_ct_is_confirmed(ct)) {
>>>                 BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>                 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
>>>         }
>>>         NF_CT_STAT_INC(net, delete);
>>
>> I can't spot the problem. Adam, please send me your .config file.
>>
>>
>
> It's the standard Fedora .config, which is attached.
>
> As I stated in another message, the oops seems related to VT-d. With
> that disabled, the machine has been stable for nearly a day now.

That probably only affects the timing of some race. Please also
send me the IPv6 ruleset used on that machine. Thanks.
* Re: Networking-related crash? 2009-12-14 13:16 ` Patrick McHardy @ 2009-12-14 13:46 ` Adam Huffman 2010-01-05 16:57 ` Adam Huffman 1 sibling, 0 replies; 8+ messages in thread From: Adam Huffman @ 2009-12-14 13:46 UTC (permalink / raw) To: Patrick McHardy Cc: Eric Dumazet, Avi Kivity, kvm, netdev, Netfilter Development Mailinglist On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy <kaber@trash.net> wrote: > Adam Huffman wrote: >> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy <kaber@trash.net> wrote: >>> Eric Dumazet wrote: >>>> Le 09/12/2009 16:11, Avi Kivity a écrit : >>>>> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>>>>> I've been seeing lots of crashes on a new Dell Precision T7500, >>>>>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>>>>> which is shown below (hand-transcribed): >>>>>> >>>>>> BUG: unable to handle kernel paging request at 0000000000200200 >>>>>> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >>>>>> PGD 332d0e067 PUD 33453c067 PMD 0 >>>>>> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >>>>>> destroy_conntrack+0x82/0x11f >>>>>> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >>>>>> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >>>>>> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >>>>>> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >>>>>> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >>>>>> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >>>>>> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >>>>>> knlGS:0000000000000000 >>>>>> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >>>>>> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >>>>>> ffff880634945e00) >>>>>> Stack: >>>>>> 
ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2
>>>>>> <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00
>>>>>> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00
>>>>>> Call Trace:
>>>>>> <IRQ>
>>>>>> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d
>>>>>> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7
>>>>>> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81
>>>>>> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72
>>>>>> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6]
>>>>>> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>>> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>>> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427
>>>>>> ...
>>>>>>
>>>> crash in :
>>>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>>>    a8 01                   test   $0x1,%al
>>>>    48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
>>>>    75 04                   jne    1f
>>>>    48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>>         if (!nf_ct_is_confirmed(ct)) {
>>>>                 BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>                 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
>>>>         }
>>>>         NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>>
>>>
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d. With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.
>

Again, it's the Fedora 12 default.
Here you go:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
* Re: Networking-related crash? 2009-12-14 13:16 ` Patrick McHardy 2009-12-14 13:46 ` Adam Huffman @ 2010-01-05 16:57 ` Adam Huffman 1 sibling, 0 replies; 8+ messages in thread From: Adam Huffman @ 2010-01-05 16:57 UTC (permalink / raw) To: Patrick McHardy Cc: Eric Dumazet, Avi Kivity, kvm, netdev, Netfilter Development Mailinglist On Mon, Dec 14, 2009 at 1:16 PM, Patrick McHardy <kaber@trash.net> wrote: > Adam Huffman wrote: >> On Thu, Dec 10, 2009 at 11:01 AM, Patrick McHardy <kaber@trash.net> wrote: >>> Eric Dumazet wrote: >>>> Le 09/12/2009 16:11, Avi Kivity a écrit : >>>>> On 12/09/2009 03:46 PM, Adam Huffman wrote: >>>>>> I've been seeing lots of crashes on a new Dell Precision T7500, >>>>>> running the KVM in Fedora 12. Finally managed to capture an Oops, >>>>>> which is shown below (hand-transcribed): >>>>>> >>>>>> BUG: unable to handle kernel paging request at 0000000000200200 >>>>>> IP: [<ffffffff8139aab7>] destroy_conntrack+0x82/0x11f >>>>>> PGD 332d0e067 PUD 33453c067 PMD 0 >>>>>> RIP: 0010:[<ffffffff8139aab7>] [<ffffffff8139aab7>] >>>>>> destroy_conntrack+0x82/0x11f >>>>>> RSP: 0018:ffffc90000803bf0 EFLAGS: 00010202 >>>>>> RAX: 0000000080000001 RBX: ffffffff816fb1a0 RCX: 000000000000752f >>>>>> RDX: 0000000000200200 RSI: 0000000000000011 RDI: ffffffff816fb1a0 >>>>>> RBP: ffffc90000803c00 R08: ffff880336699438 R09: 0000000000aaa5e0 >>>>>> R10: 00000002f54189d5 R11: 0000000000000001 R12: ffffffff819a92e0 >>>>>> R13: ffffffffa029adcc R14: 0000000000000000 R15: ffff880632866c38 >>>>>> FS: 00007fdd34b17710(0000) GS:ffffc90000800000(0000) >>>>>> knlGS:0000000000000000 >>>>>> CS: 0010 DS: 002B ES: 002B CR0: 0000000080050033 >>>>>> CR2: 0000000000200200 CR3: 00000003349c0000 CR4: 00000000000026e0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>> Process qemu-kvm (pid: 1759, threadinfo ffff88062e9e8000, task >>>>>> ffff880634945e00) >>>>>> Stack: >>>>>> 
ffff880632866c00 ffff880634640c30 ffffc90000803c10 ffffffff813989c2
>>>>>> <0> ffffc90000803c30 ffffffff81374092 ffffc90000803c30 ffff880632866c00
>>>>>> <0> ffffc90000803c50 ffffffff81373dd3 0000000200000000 ffff880632866c00
>>>>>> Call Trace:
>>>>>> <IRQ>
>>>>>> [<ffffffff813989c2>] nf_conntrack_destroy+0x1b/0x1d
>>>>>> [<ffffffff81374092>] skb_release_head_state+0x95/0xd7
>>>>>> [<ffffffff81373dd3>] __kfree_skb+0x16/0x81
>>>>>> [<ffffffff81373ed7>] kfree_skb+0x6a/0x72
>>>>>> [<ffffffffa029adcc>] ip6_mc_input+0x220/0x230 [ipv6]
>>>>>> [<ffffffffa029a3d1>] ip6_rcv_finish+0x27/0x2b [ipv6]
>>>>>> [<ffffffffa029a763>] ipv6_rcv+0x38e/0x3e5 [ipv6]
>>>>>> [<ffffffff8137bd91>] netif_receive_skb+0x402/0x427
>>>>>> ...
>>>>>>
>>>> crash in :
>>>>    48 8b 43 08             mov    0x8(%rbx),%rax
>>>>    a8 01                   test   $0x1,%al
>>>>    48 89 02                mov    %rax,(%rdx)    << HERE >>  RDX=0x200200 (LIST_POISON2)
>>>>    75 04                   jne    1f
>>>>    48 89 50 08             mov    %rdx,0x8(%rax)
>>>> 1: 48 c7 43 10 00 02 20    movq   $0x200200,0x10(%rbx)
>>>>
>>>>         if (!nf_ct_is_confirmed(ct)) {
>>>>                 BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>>>>                 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);    << HERE >>
>>>>         }
>>>>         NF_CT_STAT_INC(net, delete);
>>>
>>> I can't spot the problem. Adam, please send me your .config file.
>>>
>>>
>>
>> It's the standard Fedora .config, which is attached.
>>
>> As I stated in another message, the oops seems related to VT-d. With
>> that disabled, the machine has been stable for nearly a day now.
>
> That probably only affects the timing of some race. Please also
> send me the IPv6 ruleset used on that machine. Thanks.
>

Just to note that if I disable IPv6 completely, the machine is stable -
certainly compared with the crashes after a few minutes when IPv6 is
enabled.