* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-03 16:07 UTC
To: kvm; +Cc: linux-mm, linux-kernel, netdev, netfilter-devel

On 03/06/11 23:50, Bernhard Held wrote:
> On 03.06.2011 15:38, Brad Campbell wrote:
>> On 02/06/11 07:03, CaT wrote:
>>> On Wed, Jun 01, 2011 at 07:52:33PM +0800, Brad Campbell wrote:
>>>> Unfortunately the only interface that is mentioned by name anywhere
>>>> in my firewall is $DMZ (which is ppp0 and not part of any bridge).
>>>>
>>>> All of the nat/dnat and other horrible hacks are based on IP addresses.
>>>
>>> Damn. Not referencing the bridge interfaces at all stopped our host from
>>> going down in flames when we passed it a few packets. These are two
>>> of the oopses we got from it. Whilst the kernel here is .35 we got the
>>> same issue from a range of kernels. Seems related.
>>
>> Well, I tried sending an explanatory message to netdev, netfilter &
>> cc'd to kvm, but it appears not to have made it to kvm or netfilter,
>> and the cc to netdev has not elicited a response. My resend to
>> netfilter seems to have dropped into the bit bucket also.
> Just another reference 3.5 months ago:
> http://www.spinics.net/lists/netfilter-devel/msg17239.html

<waves hands around shouting "I have a reproducible test case for this
and don't mind patching and crashing the machine to get it fixed">

Attempted to add netfilter-devel to the cc this time.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Bart De Schuymer @ 2011-06-06 20:10 UTC
To: Brad Campbell; +Cc: kvm, linux-mm, linux-kernel, netdev, netfilter-devel

Hi Brad,

This has probably nothing to do with ebtables, so please rmmod in case
it's loaded.
A few questions I didn't directly see an answer to in the threads I
scanned...
I'm assuming you actually use the bridging firewall functionality. So,
what iptables modules do you use? Can you reduce your iptables rules to
a core that triggers the bug?
Or does it get triggered even with an empty set of firewall rules?
Are you using a stock .35 kernel or is it patched?
Is this something I can trigger on a poor guy's laptop or does it
require specialized hardware (I'm catching up on qemu/kvm...)?

cheers,
Bart

PS: I'm not sure if we should keep CC-ing everybody, netfilter-devel
together with kvm should probably do fine.

On 3/06/2011 18:07, Brad Campbell wrote:
> On 03/06/11 23:50, Bernhard Held wrote:
>> On 03.06.2011 15:38, Brad Campbell wrote:
>>> On 02/06/11 07:03, CaT wrote:
>>>> On Wed, Jun 01, 2011 at 07:52:33PM +0800, Brad Campbell wrote:
>>>>> Unfortunately the only interface that is mentioned by name anywhere
>>>>> in my firewall is $DMZ (which is ppp0 and not part of any bridge).
>>>>>
>>>>> All of the nat/dnat and other horrible hacks are based on IP
>>>>> addresses.
>>>>
>>>> Damn. Not referencing the bridge interfaces at all stopped our host
>>>> from going down in flames when we passed it a few packets. These are
>>>> two of the oopses we got from it. Whilst the kernel here is .35 we
>>>> got the same issue from a range of kernels. Seems related.
>>>
>>> Well, I tried sending an explanatory message to netdev, netfilter &
>>> cc'd to kvm, but it appears not to have made it to kvm or netfilter,
>>> and the cc to netdev has not elicited a response. My resend to
>>> netfilter seems to have dropped into the bit bucket also.
>> Just another reference 3.5 months ago:
>> http://www.spinics.net/lists/netfilter-devel/msg17239.html
>
> <waves hands around shouting "I have a reproducible test case for this
> and don't mind patching and crashing the machine to get it fixed">
>
> Attempted to add netfilter-devel to the cc this time.
> --
> To unsubscribe from this list: send the line "unsubscribe
> netfilter-devel" in the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Bart De Schuymer
www.artinalgorithms.be

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Eric Dumazet @ 2011-06-06 20:23 UTC
To: Bart De Schuymer
Cc: Brad Campbell, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Monday, 06 June 2011 at 22:10 +0200, Bart De Schuymer wrote:
> Hi Brad,
>
> This has probably nothing to do with ebtables, so please rmmod in case
> it's loaded.
> A few questions I didn't directly see an answer to in the threads I
> scanned...
> I'm assuming you actually use the bridging firewall functionality. So,
> what iptables modules do you use? Can you reduce your iptables rules to
> a core that triggers the bug?
> Or does it get triggered even with an empty set of firewall rules?
> Are you using a stock .35 kernel or is it patched?
> Is this something I can trigger on a poor guy's laptop or does it
> require specialized hardware (I'm catching up on qemu/kvm...)?
>

Keep netdev, as this most probably is a networking bug.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-07 3:33 UTC
To: Bart De Schuymer; +Cc: kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07/06/11 04:10, Bart De Schuymer wrote:
> Hi Brad,
>
> This has probably nothing to do with ebtables, so please rmmod in case
> it's loaded.
> A few questions I didn't directly see an answer to in the threads I
> scanned...
> I'm assuming you actually use the bridging firewall functionality. So,
> what iptables modules do you use? Can you reduce your iptables rules to
> a core that triggers the bug?
> Or does it get triggered even with an empty set of firewall rules?
> Are you using a stock .35 kernel or is it patched?
> Is this something I can trigger on a poor guy's laptop or does it
> require specialized hardware (I'm catching up on qemu/kvm...)?

Not specialised hardware as such, I've just not been able to reproduce
it outside of this specific operating scenario.

I can't trigger it with empty firewall rules as it relies on a DNAT to
occur. If I try it directly to the internal IP address (as I have to
without netfilter loaded) then of course nothing fails.

It's a pain in the bum as a fault, but it's one I can easily reproduce
as long as I use the same set of circumstances.

I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
on that then I'll attempt to pare down the IPTABLES rules to a bare
minimum.

It is nothing to do with ebtables as I don't compile it. I'm not really
sure about "bridging firewall" functionality. I just use a couple of
hand coded bash scripts to set the tables up.

brad@srv:~$ lsmod
Module Size Used by
xt_iprange 1637 1
xt_DSCP 2077 2
xt_length 1216 1
xt_CLASSIFY 1091 26
sch_sfq 6681 4
xt_CHECKSUM 1229 2
ipt_REJECT 2277 1
ipt_MASQUERADE 1759 7
ipt_REDIRECT 1133 1
xt_recent 8223 2
xt_state 1226 5
iptable_nat 3993 1
nf_nat 16773 3 ipt_MASQUERADE,ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4 11868 8 iptable_nat,nf_nat
nf_conntrack 60962 5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1417 1 nf_conntrack_ipv4
xt_TCPMSS 2567 2
xt_tcpmss 1469 0
xt_tcpudp 2467 56
iptable_mangle 1487 1
pppoe 9574 2
pppox 2188 1 pppoe
iptable_filter 1442 1
ip_tables 16762 3 iptable_nat,iptable_mangle,iptable_filter
x_tables 20462 17 xt_iprange,xt_DSCP,xt_length,xt_CLASSIFY,xt_CHECKSUM,ipt_REJECT,ipt_MASQUERADE,ipt_REDIRECT,xt_recent,xt_state,iptable_nat,xt_TCPMSS,xt_tcpmss,xt_tcpudp,iptable_mangle,iptable_filter,ip_tables
ppp_generic 24243 6 pppoe,pppox
slhc 5293 1 ppp_generic
cls_u32 6468 6
sch_htb 14432 2
deflate 1937 0
zlib_deflate 21228 1 deflate
des_generic 16135 0
cbc 2721 0
ecb 1975 0
crypto_blkcipher 13645 2 cbc,ecb
sha1_generic 2095 0
md5 4001 0
hmac 2977 0
crypto_hash 14519 3 sha1_generic,md5,hmac
cryptomgr 2636 0
aead 6137 1 cryptomgr
crypto_algapi 15289 9 deflate,des_generic,cbc,ecb,crypto_blkcipher,hmac,crypto_hash,cryptomgr,aead
af_key 27372 0
fuse 66747 1
w83627ehf 32052 0
hwmon_vid 2867 1 w83627ehf
vhost_net 16802 6
powernow_k8 12932 0
mperf 1263 1 powernow_k8
kvm_amd 53431 24
kvm 235155 1 kvm_amd
pl2303 12732 1
xhci_hcd 62865 0
i2c_piix4 8391 0
k10temp 3183 0
usbserial 34452 3 pl2303
usb_storage 37887 1
usb_libusual 10999 1 usb_storage
ohci_hcd 18105 0
ehci_hcd 33641 0
ahci 20748 4
usbcore 130936 7 pl2303,xhci_hcd,usbserial,usb_storage,usb_libusual,ohci_hcd,ehci_hcd
libahci 21202 1 ahci
sata_mv 26939 0
megaraid_sas 71659 14

Nat Table (external ip substituted for xxx.xxx.xxx.xxx)

Chain PREROUTING (policy ACCEPT 1761K packets, 152M bytes)
pkts bytes target prot opt in out source destination
5 210 DNAT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:1195 to:192.168.253.199
6 252 DNAT udp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx udp dpt:1195 to:192.168.253.199
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25001 to:192.168.253.199:465
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25000 to:192.168.253.199:993
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:25001 to:192.168.253.199:465
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:25000 to:192.168.253.199:993
2 142 DNAT 47 -- ppp0 * 0.0.0.0/0 0.0.0.0/0 to:192.168.253.199
18 880 DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:1723 to:192.168.253.199
0 0 DNAT 47 -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx to:192.168.253.199
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:1723 to:192.168.253.199
2969 149K DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 to:192.168.253.198
20 1280 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:443 to:192.168.253.198
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:3101 to:192.168.253.197
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:3101 to:192.168.253.197
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx tcp dpt:4101 to:192.168.253.197
44528 2718K REDIRECT tcp -- !ppp0 * 0.0.0.0/0 !192.168.0.0/16 tcp dpt:80 redir ports 8080
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:3724 to:192.168.2.107
596K 33M DNAT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpts:2001:2030 to:10.99.99.2
1420K 119M DNAT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpts:2001:2030 to:10.99.99.2
7483 449K DNAT all -- !ppp0 * 0.0.0.0/0 xxx.xxx.xxx.xxx to:192.168.2.1

Mangle Table

Chain INPUT (policy ACCEPT 270K packets, 17M bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 170K packets, 12M bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 2205K packets, 166M bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * * 0.0.0.0/0 192.168.254.3
6 360 ACCEPT all -- * * 0.0.0.0/0 xxx.xxx.xxx.xxx
20424 2120K MASQUERADE all -- * ppp0 192.168.0.0/16 !192.168.0.0/16
0 0 MASQUERADE all -- * ppp0 10.0.0.0/24 0.0.0.0/0
3 204 MASQUERADE all -- * * 192.168.2.0/24 10.8.0.0/24
1418K 128M MASQUERADE all -- * * 10.99.99.0/24 0.0.0.0/0
68248 4095K MASQUERADE all -- * * 192.168.253.0/24 10.8.0.0/16
13305 2405K MASQUERADE all -- * * 192.168.253.0/24 !192.168.0.0/16

Chain PREROUTING (policy ACCEPT 278M packets, 293G bytes)
pkts bytes target prot opt in out source destination
169 55528 CHECKSUM udp -- br1 * 0.0.0.0/0 0.0.0.0/0 udp dpt:67 CHECKSUM fill

Chain INPUT (policy ACCEPT 180M packets, 250G bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 253M packets, 223G bytes)
pkts bytes target prot opt in out source destination
165 54182 CHECKSUM udp -- * br1 0.0.0.0/0 0.0.0.0/0 udp spt:67 CHECKSUM fill
51 3712 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:53 CLASSIFY set 1:20
85274 6454K CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp dpt:53 CLASSIFY set 1:20
187 257K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:81 CLASSIFY set 1:20
25M 1180M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x3F/0x10 state ESTABLISHED length 40:100 CLASSIFY set 1:15
728K 67M CLASSIFY icmp -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:15
231 23484 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:2401 CLASSIFY set 1:15
65636 5610K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 CLASSIFY set 1:10
2018 315K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:22 CLASSIFY set 1:10
80 10092 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:3389 CLASSIFY set 1:10
26063 8910K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 CLASSIFY set 1:15
932K 131M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 CLASSIFY set 1:15
3511 267K CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp dpt:123 CLASSIFY set 1:10
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:20 CLASSIFY set 1:15
3 180 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:20 CLASSIFY set 1:15
94058 38M CLASSIFY 47 -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:10
1086K 183M CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1194 CLASSIFY set 1:10
1086K 183M TOS udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1194 TOS set 0x10/0x3f
48817 10M CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1195 CLASSIFY set 1:10
48817 10M TOS udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1195 TOS set 0x10/0x3f
94058 38M CLASSIFY 47 -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:15
106 7207 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:1863 CLASSIFY set 1:15
188K 34M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 CLASSIFY set 1:15
51541 3327K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpts:6660:6669 CLASSIFY set 1:15
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spts:2021:2030 CLASSIFY set 1:15
85 4944 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:19999 CLASSIFY set 1:15
208K 86M CLASSIFY udp -- * * 0.0.0.0/0 0.0.0.0/0 source IP range 192.168.2.80-192.168.2.120 CLASSIFY set 1:10
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:12345 CLASSIFY set 1:15
1 80 CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:12345 CLASSIFY set 1:15

Default table

Chain INPUT (policy ACCEPT 176M packets, 247G bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:4569
1187K 582M ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:1194
2 577 ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:1195
28 1224 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:3389
230 12372 tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW recent: SET name: DEFAULT side: source
3 180 DROP tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW recent: UPDATE seconds: 300 hit_count: 4 name: DEFAULT side: source
1750 143K ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
3 144 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:113
120 6090 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:81
36094 29M ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
1456K 1706M ACCEPT all -- ppp0 * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
31047 2334K REJECT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp option=!2 reject-with tcp-reset
552K 60M ACCEPT all -- !ppp0 * 0.0.0.0/0 0.0.0.0/0 state NEW
13552 1207K ACCEPT icmp -- ppp0 * 0.0.0.0/0 0.0.0.0/0
5712 392K DROP all -- ppp0 * 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
pkts bytes target prot opt in out source destination
1207K 68M TCPMSS tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS clamp to PMTU

Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
pkts bytes target prot opt in out source destination
31675 1895K TCPMSS tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS clamp to PMTU
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Patrick McHardy @ 2011-06-07 13:30 UTC
To: Brad Campbell
Cc: Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07.06.2011 05:33, Brad Campbell wrote:
> On 07/06/11 04:10, Bart De Schuymer wrote:
>> Hi Brad,
>>
>> This has probably nothing to do with ebtables, so please rmmod in case
>> it's loaded.
>> A few questions I didn't directly see an answer to in the threads I
>> scanned...
>> I'm assuming you actually use the bridging firewall functionality. So,
>> what iptables modules do you use? Can you reduce your iptables rules to
>> a core that triggers the bug?
>> Or does it get triggered even with an empty set of firewall rules?
>> Are you using a stock .35 kernel or is it patched?
>> Is this something I can trigger on a poor guy's laptop or does it
>> require specialized hardware (I'm catching up on qemu/kvm...)?
>
> Not specialised hardware as such, I've just not been able to reproduce
> it outside of this specific operating scenario.

The last similar problem we've had was related to the 32/64 bit compat
code. Are you running 32 bit userspace on a 64 bit kernel?

> I can't trigger it with empty firewall rules as it relies on a DNAT to
> occur. If I try it directly to the internal IP address (as I have to
> without netfilter loaded) then of course nothing fails.
>
> It's a pain in the bum as a fault, but it's one I can easily reproduce
> as long as I use the same set of circumstances.
>
> I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
> on that then I'll attempt to pare down the IPTABLES rules to a bare
> minimum.
>
> It is nothing to do with ebtables as I don't compile it. I'm not really
> sure about "bridging firewall" functionality. I just use a couple of
> hand coded bash scripts to set the tables up.

From one of your previous mails:

> # CONFIG_BRIDGE_NF_EBTABLES is not set

How about CONFIG_BRIDGE_NETFILTER?
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-07 14:40 UTC
To: Patrick McHardy
Cc: Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07/06/11 21:30, Patrick McHardy wrote:
> On 07.06.2011 05:33, Brad Campbell wrote:
>> On 07/06/11 04:10, Bart De Schuymer wrote:
>>> Hi Brad,
>>>
>>> This has probably nothing to do with ebtables, so please rmmod in case
>>> it's loaded.
>>> A few questions I didn't directly see an answer to in the threads I
>>> scanned...
>>> I'm assuming you actually use the bridging firewall functionality. So,
>>> what iptables modules do you use? Can you reduce your iptables rules to
>>> a core that triggers the bug?
>>> Or does it get triggered even with an empty set of firewall rules?
>>> Are you using a stock .35 kernel or is it patched?
>>> Is this something I can trigger on a poor guy's laptop or does it
>>> require specialized hardware (I'm catching up on qemu/kvm...)?
>>
>> Not specialised hardware as such, I've just not been able to reproduce
>> it outside of this specific operating scenario.
>
> The last similar problem we've had was related to the 32/64 bit compat
> code. Are you running 32 bit userspace on a 64 bit kernel?

No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel.

Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is
current git

>> I can't trigger it with empty firewall rules as it relies on a DNAT to
>> occur. If I try it directly to the internal IP address (as I have to
>> without netfilter loaded) then of course nothing fails.
>>
>> It's a pain in the bum as a fault, but it's one I can easily reproduce
>> as long as I use the same set of circumstances.
>>
>> I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
>> on that then I'll attempt to pare down the IPTABLES rules to a bare
>> minimum.
>>
>> It is nothing to do with ebtables as I don't compile it. I'm not really
>> sure about "bridging firewall" functionality. I just use a couple of
>> hand coded bash scripts to set the tables up.
>
> From one of your previous mails:
>
>> # CONFIG_BRIDGE_NF_EBTABLES is not set
>
> How about CONFIG_BRIDGE_NETFILTER?
>

It was compiled in.

With the following table set I was able to reproduce the problem on
3.0-rc2. Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified

root@srv:~# iptables-save
# Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
*filter
:INPUT ACCEPT [978:107619]
:FORWARD ACCEPT [142:7068]
:OUTPUT ACCEPT [1659:291870]
-A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT
-A INPUT -i ppp0 -j DROP
COMMIT
# Completed on Tue Jun 7 22:11:30 2011
# Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
*nat
:PREROUTING ACCEPT [813:49170]
:INPUT ACCEPT [91:7090]
:OUTPUT ACCEPT [267:20731]
:POSTROUTING ACCEPT [296:22281]
-A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.253.198
COMMIT
# Completed on Tue Jun 7 22:11:30 2011
# Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
*mangle
:PREROUTING ACCEPT [2729:274392]
:INPUT ACCEPT [2508:262976]
:FORWARD ACCEPT [142:7068]
:OUTPUT ACCEPT [1674:293701]
:POSTROUTING ACCEPT [2131:346411]
-A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
COMMIT
# Completed on Tue Jun 7 22:11:30 2011

I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
the address the way I was doing it, so that's a no-go for me.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Patrick McHardy @ 2011-06-07 15:35 UTC
To: Brad Campbell
Cc: Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07.06.2011 16:40, Brad Campbell wrote:
> On 07/06/11 21:30, Patrick McHardy wrote:
>> On 07.06.2011 05:33, Brad Campbell wrote:
>>> On 07/06/11 04:10, Bart De Schuymer wrote:
>>>> Hi Brad,
>>>>
>>>> This has probably nothing to do with ebtables, so please rmmod in case
>>>> it's loaded.
>>>> A few questions I didn't directly see an answer to in the threads I
>>>> scanned...
>>>> I'm assuming you actually use the bridging firewall functionality. So,
>>>> what iptables modules do you use? Can you reduce your iptables rules to
>>>> a core that triggers the bug?
>>>> Or does it get triggered even with an empty set of firewall rules?
>>>> Are you using a stock .35 kernel or is it patched?
>>>> Is this something I can trigger on a poor guy's laptop or does it
>>>> require specialized hardware (I'm catching up on qemu/kvm...)?
>>>
>>> Not specialised hardware as such, I've just not been able to reproduce
>>> it outside of this specific operating scenario.
>>
>> The last similar problem we've had was related to the 32/64 bit compat
>> code. Are you running 32 bit userspace on a 64 bit kernel?
>
> No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel.
>
> Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is
> current git
>
>
>>> I can't trigger it with empty firewall rules as it relies on a DNAT to
>>> occur. If I try it directly to the internal IP address (as I have to
>>> without netfilter loaded) then of course nothing fails.
>>>
>>> It's a pain in the bum as a fault, but it's one I can easily reproduce
>>> as long as I use the same set of circumstances.
>>>
>>> I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
>>> on that then I'll attempt to pare down the IPTABLES rules to a bare
>>> minimum.
>>>
>>> It is nothing to do with ebtables as I don't compile it. I'm not really
>>> sure about "bridging firewall" functionality. I just use a couple of
>>> hand coded bash scripts to set the tables up.
>>
>> From one of your previous mails:
>>
>>> # CONFIG_BRIDGE_NF_EBTABLES is not set
>>
>> How about CONFIG_BRIDGE_NETFILTER?
>>
>
> It was compiled in.
>
> With the following table set I was able to reproduce the problem on
> 3.0-rc2. Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified

Which kernel was the last version without this problem?

> root@srv:~# iptables-save
> # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
> *filter
> :INPUT ACCEPT [978:107619]
> :FORWARD ACCEPT [142:7068]
> :OUTPUT ACCEPT [1659:291870]
> -A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT
> -A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT
> -A INPUT -i ppp0 -j DROP
> COMMIT
> # Completed on Tue Jun 7 22:11:30 2011
> # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
> *nat
> :PREROUTING ACCEPT [813:49170]
> :INPUT ACCEPT [91:7090]
> :OUTPUT ACCEPT [267:20731]
> :POSTROUTING ACCEPT [296:22281]
> -A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.253.198
> COMMIT
> # Completed on Tue Jun 7 22:11:30 2011
> # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011
> *mangle
> :PREROUTING ACCEPT [2729:274392]
> :INPUT ACCEPT [2508:262976]
> :FORWARD ACCEPT [142:7068]
> :OUTPUT ACCEPT [1674:293701]
> :POSTROUTING ACCEPT [2131:346411]
> -A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
> COMMIT
> # Completed on Tue Jun 7 22:11:30 2011

The main suspects would be NAT and TCPMSS. Did you also try whether
the crash occurs with only one of these rules?

> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
> the address the way I was doing it, so that's a no-go for me.

That's really weird since you're apparently not using any bridge
netfilter features. It shouldn't have any effect besides changing
at which point ip_tables is invoked. How are your network devices
configured (specifically any bridges)?
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Eric Dumazet @ 2011-06-07 18:31 UTC
To: Patrick McHardy
Cc: Brad Campbell, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Tuesday, 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:

> The main suspects would be NAT and TCPMSS. Did you also try whether
> the crash occurs with only one of these rules?
>
> > I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
> > the address the way I was doing it, so that's a no-go for me.
>
> That's really weird since you're apparently not using any bridge
> netfilter features. It shouldn't have any effect besides changing
> at which point ip_tables is invoked. How are your network devices
> configured (specifically any bridges)?

Something in the kernel does

	u16 *ptr = addr (given by kmalloc())

	ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.

I checked arch/x86/lib/memmove_64.S and it seems fine.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Patrick McHardy @ 2011-06-07 22:57 UTC
To: Eric Dumazet
Cc: Brad Campbell, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07.06.2011 20:31, Eric Dumazet wrote:
> On Tuesday, 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:
>
>> The main suspects would be NAT and TCPMSS. Did you also try whether
>> the crash occurs with only one of these rules?
>>
>>> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
>>> the address the way I was doing it, so that's a no-go for me.
>>
>> That's really weird since you're apparently not using any bridge
>> netfilter features. It shouldn't have any effect besides changing
>> at which point ip_tables is invoked. How are your network devices
>> configured (specifically any bridges)?
>
> Something in the kernel does
>
> u16 *ptr = addr (given by kmalloc())
>
> ptr[-1] = 0;
>
> Could be an off-by-one error in a memmove()/memcpy() or loop...
>
> I can't see a network issue here.

So far me neither, but netfilter appears to trigger the bug.

> I checked arch/x86/lib/memmove_64.S and it seems fine.

I was thinking it might be a missing skb_make_writable() combined
with vhost_net specifics in the netfilter code (TCPMSS and NAT are
both suspect), but was unable to find anything. I also went through
the dst_metrics() conversion to see whether anything could cause
problems with the bridge fake_rttable, but also nothing so far.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-08 0:18 UTC
To: Patrick McHardy
Cc: Eric Dumazet, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 08/06/11 06:57, Patrick McHardy wrote:
> On 07.06.2011 20:31, Eric Dumazet wrote:
>> On Tuesday, 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:
>>
>>> The main suspects would be NAT and TCPMSS. Did you also try whether
>>> the crash occurs with only one of these rules?
>>>
>>>> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
>>>> the address the way I was doing it, so that's a no-go for me.
>>>
>>> That's really weird since you're apparently not using any bridge
>>> netfilter features. It shouldn't have any effect besides changing
>>> at which point ip_tables is invoked. How are your network devices
>>> configured (specifically any bridges)?
>>
>> Something in the kernel does
>>
>> u16 *ptr = addr (given by kmalloc())
>>
>> ptr[-1] = 0;
>>
>> Could be an off-by-one error in a memmove()/memcpy() or loop...
>>
>> I can't see a network issue here.
>
> So far me neither, but netfilter appears to trigger the bug.

Would it help if I tried some older kernels? This issue only surfaced
for me recently as I only installed the VM's in question about 12 weeks
ago and have only just started really using them in anger. I could try
reproducing it on progressively older kernels to see if I can find one
that works and then bisect from there.
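The bisection proposed here is a binary search over an ordered history, which is why it needs only about log2(N) build-and-crash cycles for N candidate revisions. A minimal sketch in C follows, assuming the history is indexed from a known-good revision to a known-bad one and that every revision after the first bad one is also bad. The names `first_bad`, `is_bad_fn` and `example_is_bad` are hypothetical; in practice the predicate is "build this kernel, boot it, and see whether the DNAT traffic panics the host".

```c
#include <assert.h>
#include <stddef.h>

/* Predicate: returns nonzero if the revision at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index);

/*
 * Find the first bad revision between a known-good index and a
 * known-bad index. `tests_run`, if not NULL, is incremented once per
 * revision that had to be tested.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad,
			unsigned *tests_run)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (tests_run)
			(*tests_run)++;
		if (is_bad(mid))
			bad = mid;   /* bug present: culprit is at or before mid */
		else
			good = mid;  /* bug absent: culprit is after mid */
	}
	return bad;
}

/* Stand-in predicate for the sketch: pretend revision 37 introduced the bug. */
static int example_is_bad(size_t index)
{
	return index >= 37;
}
```

With 100 candidate revisions between the good and bad endpoints, `first_bad(0, 100, example_is_bad, &n)` identifies the culprit after at most seven tests, which matters when each test means a crash and possibly a long fsck.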
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Eric Dumazet @ 2011-06-08 3:59 UTC
To: Brad Campbell
Cc: Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Wednesday, 08 June 2011 at 08:18 +0800, Brad Campbell wrote:
> On 08/06/11 06:57, Patrick McHardy wrote:
> > On 07.06.2011 20:31, Eric Dumazet wrote:
> >> On Tuesday, 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:
> >>
> >>> The main suspects would be NAT and TCPMSS. Did you also try whether
> >>> the crash occurs with only one of these rules?
> >>>
> >>>> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
> >>>> the address the way I was doing it, so that's a no-go for me.
> >>>
> >>> That's really weird since you're apparently not using any bridge
> >>> netfilter features. It shouldn't have any effect besides changing
> >>> at which point ip_tables is invoked. How are your network devices
> >>> configured (specifically any bridges)?
> >>
> >> Something in the kernel does
> >>
> >> u16 *ptr = addr (given by kmalloc())
> >>
> >> ptr[-1] = 0;
> >>
> >> Could be an off-by-one error in a memmove()/memcpy() or loop...
> >>
> >> I can't see a network issue here.
> >
> > So far me neither, but netfilter appears to trigger the bug.
>
> Would it help if I tried some older kernels? This issue only surfaced
> for me recently as I only installed the VM's in question about 12 weeks
> ago and have only just started really using them in anger. I could try
> reproducing it on progressively older kernels to see if I can find one
> that works and then bisect from there.

Well, a bisection definitely should help, but needs a lot of time in
your case.

Could you try the following patch, because this is the 'usual suspect' I had
yesterday:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 46cbd28..9f548f9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}
 
+#if 0
 	if (fastpath &&
 	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
 		memmove(skb->head + size, skb_shinfo(skb),
@@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		off = nhead;
 		goto adjust_others;
 	}
-
+#endif
 	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
 	if (!data)
 		goto nodata;
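The branch that this patch disables tries to grow headroom inside the existing allocation by sliding the payload and the trailing shared-info block forward, instead of allocating a new buffer. A rough user-space model of that idea is sketched below. It is a toy under stated assumptions, not the kernel function: `struct toy_buf`, `toy_expand_in_place` and the field names are hypothetical, `alloc_size` stands in for what ksize() reports, and only the headroom case is modelled. It does show why the size check and the two memmove() lengths have to be exactly right, since any mistake there writes outside the object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer layout: [ payload area | trailer | unused slack ]. */
struct toy_buf {
	unsigned char *head; /* start of the allocation */
	size_t alloc_size;   /* usable bytes in the allocation */
	size_t data_len;     /* bytes in the payload area, starting at head */
	size_t trailer_len;  /* bytes of trailer stored right after the payload */
};

/*
 * Try to gain `nhead` bytes of headroom without reallocating.
 * Returns 1 and shifts payload and trailer in place if the allocation
 * has enough slack. Returns 0 and leaves the buffer untouched if a
 * fresh allocation would be needed.
 */
static int toy_expand_in_place(struct toy_buf *b, size_t nhead)
{
	size_t new_size = nhead + b->data_len;

	if (new_size + b->trailer_len > b->alloc_size)
		return 0;

	/* Trailer first, then payload; the regions can overlap, hence memmove(). */
	memmove(b->head + new_size, b->head + b->data_len, b->trailer_len);
	memmove(b->head + nhead, b->head, b->data_len);
	b->data_len = new_size;
	return 1;
}
```

Disabling the real fast path forces every expansion through the allocate-and-copy route, so if the crash had gone away with the patch applied, the in-place move would have been the likely culprit.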
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-08 17:02 UTC
To: Eric Dumazet
Cc: Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 08/06/11 11:59, Eric Dumazet wrote:

> Well, a bisection definitely should help, but needs a lot of time in
> your case.

Yes. compile, test, crash, walk out to the other building to press
reset, lather, rinse, repeat.

I need a reset button on the end of a 50M wire, or a hardware watchdog!

Actually it's not so bad. If I turn off slub debugging the kernel panics
and reboots itself.

This.. :
[ 2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
[ 2.913066] netconsole: device eth0 not up yet, forcing it
[ 3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
[ 3.660118] Switching to clocksource tsc
[ 63.200273] r8169 0000:03:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168e-1.fw (-2)
[ 63.223513] r8169 0000:03:00.0: eth0: link down
[ 63.223556] r8169 0000:03:00.0: eth0: link down

..is slowing down reboots considerably. 3.0-rc does _not_ like some
timing hardware in my machine. Having said that, at least it does not
randomly panic on SCSI like 2.6.39 does.

Ok, I've ruled out TCPMSS. Found out where it was being set and neutered
it. I've replicated it with only the single DNAT rule.

> Could you try the following patch, because this is the 'usual suspect' I had
> yesterday:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 46cbd28..9f548f9 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>  	}
>
> +#if 0
>  	if (fastpath &&
>  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
>  		memmove(skb->head + size, skb_shinfo(skb),
> @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		off = nhead;
>  		goto adjust_others;
>  	}
> -
> +#endif
>  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
>  	if (!data)
>  		goto nodata;

Nope.. that's not it. <sigh> That might have changed the characteristic
of the fault slightly, but unfortunately I got caught with a couple of
fsck's, so I only got to test it 3 times tonight.

It's unfortunate that this is a production system, so I can only take it
down between about 9pm and 1am. That would normally be pretty
productive, except that an fsck of a 14TB ext4 can take 30 minutes if it
panics at the wrong time.

I'm out of time tonight, but I'll have a crack at some bisection
tomorrow night. Now I just have to go back far enough that it works, and
be near enough not to have to futz around with /proc /sys or drivers.

I really, really, really appreciate you guys helping me with this. It
has been driving me absolutely bonkers. If I'm ever in the same town as
any of you, dinner and drinks are on me.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Eric Dumazet @ 2011-06-08 21:22 UTC
To: Brad Campbell
Cc: Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Thursday, 09 June 2011 at 01:02 +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
>
> > Well, a bisection definitely should help, but needs a lot of time in
> > your case.
>
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
>
> I need a reset button on the end of a 50M wire, or a hardware watchdog!
>
> Actually it's not so bad. If I turn off slub debugging the kernel panics
> and reboots itself.
>
> This.. :
> [ 2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
> [ 2.913066] netconsole: device eth0 not up yet, forcing it
> [ 3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
> [ 3.660118] Switching to clocksource tsc
> [ 63.200273] r8169 0000:03:00.0: eth0: unable to load firmware patch
> rtl_nic/rtl8168e-1.fw (-2)
> [ 63.223513] r8169 0000:03:00.0: eth0: link down
> [ 63.223556] r8169 0000:03:00.0: eth0: link down
>
> ..is slowing down reboots considerably. 3.0-rc does _not_ like some
> timing hardware in my machine. Having said that, at least it does not
> randomly panic on SCSI like 2.6.39 does.
>
> Ok, I've ruled out TCPMSS. Found out where it was being set and neutered
> it. I've replicated it with only the single DNAT rule.
>
> > Could you try the following patch, because this is the 'usual suspect' I had
> > yesterday:
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 46cbd28..9f548f9 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> >  	}
> >
> > +#if 0
> >  	if (fastpath &&
> >  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
> >  		memmove(skb->head + size, skb_shinfo(skb),
> > @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		off = nhead;
> >  		goto adjust_others;
> >  	}
> > -
> > +#endif
> >  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
> >  	if (!data)
> >  		goto nodata;
>
> Nope.. that's not it. <sigh> That might have changed the characteristic
> of the fault slightly, but unfortunately I got caught with a couple of
> fsck's, so I only got to test it 3 times tonight.
>
> It's unfortunate that this is a production system, so I can only take it
> down between about 9pm and 1am. That would normally be pretty
> productive, except that an fsck of a 14TB ext4 can take 30 minutes if it
> panics at the wrong time.
>
> I'm out of time tonight, but I'll have a crack at some bisection
> tomorrow night. Now I just have to go back far enough that it works, and
> be near enough not to have to futz around with /proc /sys or drivers.
>
> I really, really, really appreciate you guys helping me with this. It
> has been driving me absolutely bonkers. If I'm ever in the same town as
> any of you, dinner and drinks are on me.

Hmm, I wonder if kmemcheck could help you, but it's slow as hell, so not
appropriate for production :(
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Simon Horman @ 2011-06-10 2:52 UTC
To: Brad Campbell
Cc: Eric Dumazet, Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
>
> >Well, a bisection definitely should help, but needs a lot of time in
> >your case.
>
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
>
> I need a reset button on the end of a 50M wire, or a hardware watchdog!

Not strictly on-topic, but in situations where I have machines
that either don't have lights-out facilities or have broken ones
I find network-controlled power switches to be very useful.

At one point I would have needed an 8000km long wire to the reset switch :-)
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Mark Lord @ 2011-06-10 12:37 UTC
To: Simon Horman
Cc: Brad Campbell, Eric Dumazet, Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 11-06-09 10:52 PM, Simon Horman wrote:
> On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
>> On 08/06/11 11:59, Eric Dumazet wrote:
>>
>>> Well, a bisection definitely should help, but needs a lot of time in
>>> your case.
>>
>> Yes. compile, test, crash, walk out to the other building to press
>> reset, lather, rinse, repeat.
>>
>> I need a reset button on the end of a 50M wire, or a hardware watchdog!

Something many of us don't realize is that nearly all Intel chipsets
have a built-in hardware watchdog timer. This includes chipsets for
consumer desktop boards as well as the big iron server stuff.

It's the "i8xx_tco" driver in the kernel that enables use of them:

modprobe i8xx_tco

Cheers
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Henrique de Moraes Holschuh @ 2011-06-10 16:43 UTC
To: Mark Lord
Cc: Simon Horman, Brad Campbell, Eric Dumazet, Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On Fri, 10 Jun 2011, Mark Lord wrote:
> Something many of us don't realize is that nearly all Intel chipsets
> have a built-in hardware watchdog timer. This includes chipset for
> consumer desktop boards as well as the big iron server stuff.
>
> It's the "i8xx_tco" driver in the kernel enables use of them:

That's the old module name, but yes, it is very useful in desktops and
laptops (when it works). Server-class hardware will have a baseboard
management unit that can really power-cycle the system instead of just
rebooting.

And test it first before you depend on it triggering at a remote location,
as the firmware might cause the Intel chipset watchdog to actually hang the
box instead of causing a proper reboot (happens on the IBM thinkpad T43, for
example).

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Avi Kivity @ 2011-06-12 15:38 UTC
To: Simon Horman
Cc: Brad Campbell, Eric Dumazet, Patrick McHardy, Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 06/10/2011 05:52 AM, Simon Horman wrote:
> At one point I would have need an 8000km long wire to the reset switch :-)

Even more off-topic, there has been a case when a 200,000,000 km long
wire to the reset button was needed. IIRC they got away with a watchdog.

--
error compiling committee.c: too many arguments to function
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-07 23:43 UTC
To: Patrick McHardy
Cc: Bart De Schuymer, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 07/06/11 23:35, Patrick McHardy wrote:

> The main suspects would be NAT and TCPMSS. Did you also try whether
> the crash occurs with only one of these rules?

To be honest I'm actually having trouble finding where TCPMSS is
actually set in that ruleset. This is a production machine so I can only
take it down after about 9PM at night. I'll have another crack at it
tonight.

>> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
>> the address the way I was doing it, so that's a no-go for me.
>
> That's really weird since you're apparently not using any bridge
> netfilter features. It shouldn't have any effect besides changing
> at which point ip_tables is invoked. How are your network devices
> configured (specifically any bridges)?
>

I have one bridge with all my virtual machines on it.

In this particular instance the packets leave VM A destined for the IP
address of ppp0 (the external interface). This is intercepted by the
DNAT PREROUTING rule above and shunted back to VM B.

The VM's are on br1 and the external address is ppp0. Without
CONFIG_BRIDGE_NETFILTER compiled in I can see the traffic entering and
leaving VM B with tcpdump, but the packets never seem to get back to VM A.

VM A is XP 32 bit, VM B is Linux.

I have some other Linux VM's, so I'll do some more testing tonight
between those to see where the packets are going without
CONFIG_BRIDGE_NETFILTER set.
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Bart De Schuymer @ 2011-06-07 18:04 UTC
To: Brad Campbell
Cc: Patrick McHardy, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 7/06/2011 16:40, Brad Campbell wrote:
> On 07/06/11 21:30, Patrick McHardy wrote:
>> On 07.06.2011 05:33, Brad Campbell wrote:
>>> On 07/06/11 04:10, Bart De Schuymer wrote:
>>>> Hi Brad,
>>>>
>>>> This has probably nothing to do with ebtables, so please rmmod in case
>>>> it's loaded.
>>>> A few questions I didn't directly see an answer to in the threads I
>>>> scanned...
>>>> I'm assuming you actually use the bridging firewall functionality. So,
>>>> what iptables modules do you use? Can you reduce your iptables
>>>> rules to
>>>> a core that triggers the bug?
>>>> Or does it get triggered even with an empty set of firewall rules?
>>>> Are you using a stock .35 kernel or is it patched?
>>>> Is this something I can trigger on a poor guy's laptop or does it
>>>> require specialized hardware (I'm catching up on qemu/kvm...)?
>>>
>>> Not specialised hardware as such, I've just not been able to reproduce
>>> it outside of this specific operating scenario.
>>
>> The last similar problem we've had was related to the 32/64 bit compat
>> code. Are you running 32 bit userspace on a 64 bit kernel?
>
> No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit
> kernel.
>
> Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is
> current git
>

If the bug is easily triggered with your guest os, then you could try to
capture the traffic with wireshark (or something else) in a
configuration that doesn't crash your system. Save the traffic in a pcap
file. Then you can see if resending that traffic in the vulnerable
configuration triggers the bug (I don't know if something in Windows
exists, but tcpreplay should work for Linux). Once you have such a
capture, chances are the bug is even easily reproducible by us (unless
it's hardware-specific). Success isn't guaranteed, but I think it's
worth a shot...

cheers,
Bart

--
Bart De Schuymer
www.artinalgorithms.be
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-08 0:15 UTC
To: Bart De Schuymer
Cc: Patrick McHardy, kvm, linux-mm, linux-kernel, netdev, netfilter-devel

On 08/06/11 02:04, Bart De Schuymer wrote:

> If the bug is easily triggered with your guest os, then you could try to
> capture the traffic with wireshark (or something else) in a
> configuration that doesn't crash your system. Save the traffic in a pcap
> file. Then you can see if resending that traffic in the vulnerable
> configuration triggers the bug (I don't know if something in Windows
> exists, but tcpreplay should work for Linux). Once you have such a
> capture , chances are the bug is even easily reproducible by us (unless
> it's hardware-specific). Success isn't guaranteed, but I think it's
> worth a shot...

The issue with this is I don't have a configuration that does not crash
the system. This only happens under the specific circumstance that
traffic from one VM is being DNAT'd to another VM. If I disable
CONFIG_BRIDGE_NETFILTER, or I leave out the DNAT then I can't replicate
the problem as I don't seem to be able to get the packets to go where I
want them to go.

Let me try and explain it a little more clearly with made up IP
addresses to illustrate the problem.

I have VM A (1.1.1.2) and VM B (1.1.1.3) on br1 (1.1.1.1)
I have public IP on ppp0 (2.2.2.2).

VM B can talk to VM A using its host address (1.1.1.2) and there is no
problem.

The DNAT says anything destined for PPP0 that is on port 443 and coming
from anywhere other than PPP0 (ie inside the network) is to be DNAT'd to
1.1.1.2.

So VM B (1.1.1.3) tries to connect to ppp0 (2.2.2.2) on port 443, and
this is redirected to VM A on 1.1.1.2.

Only under this specific circumstance does the problem occur. I can get
VM B (1.1.1.3) to talk directly to VM A (1.1.1.2) all day long and there
is no problem, it's only when VM B tries to talk to ppp0 that there is
an issue (and it happens within seconds of the initial connection).

All these tests have been performed with VM B being a Windows XP guest.
Tonight I'll try it with a Linux guest and see if I can make it happen.
If that works I might be able to come up with some reproducible test
case for you. I have a desktop machine that has Intel VT extensions, so
I'll work toward making a portable test case.
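The hairpin case described in this message can be modelled as a single rewrite decision. The sketch below is a toy, not netfilter code: `struct toy_packet`, `toy_dnat` and the macros are hypothetical, connection tracking and the reply path are ignored, and the addresses follow one reading of the made-up example (client 1.1.1.3 on br1, public address 2.2.2.2 on ppp0, DNAT target 1.1.1.2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a host-order IPv4 address from four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

#define PUBLIC_IP IP4(2, 2, 2, 2) /* address on ppp0 in the made-up example */
#define TARGET_IP IP4(1, 1, 1, 2) /* VM that serves port 443 (assumed) */

struct toy_packet {
	const char *in_iface; /* interface the packet arrived on */
	uint32_t dst_ip;      /* destination address, host byte order */
	uint16_t dst_port;    /* destination port */
	int is_tcp;           /* nonzero for TCP */
};

/*
 * Apply the toy DNAT rule: TCP to the public address on port 443 that
 * did not arrive on ppp0 gets its destination rewritten to the target
 * VM. Returns 1 if the packet was rewritten, 0 otherwise.
 */
static int toy_dnat(struct toy_packet *p)
{
	if (strcmp(p->in_iface, "ppp0") == 0)
		return 0;
	if (!p->is_tcp || p->dst_port != 443)
		return 0;
	if (p->dst_ip != PUBLIC_IP)
		return 0;

	p->dst_ip = TARGET_IP;
	return 1;
}
```

A packet arriving on br1 for 2.2.2.2:443 comes out addressed to 1.1.1.2 and is bridged straight back to a port on the same bridge. Traffic sent directly between the two VMs never matches the rule, which fits the report that only the redirected path triggers the crash.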
[parent not found: <4DEB3AE4.8040700@redhat.com>]
[parent not found: <4DEB8872.2060801@fnarfbargle.com>]
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Avi Kivity @ 2011-06-05 13:58 UTC
To: Brad Campbell
Cc: CaT, Hugh Dickins, Andrea Arcangeli, Borislav Petkov, linux-kernel, kvm, linux-mm, netdev, netfilter-devel

On 06/05/2011 04:45 PM, Brad Campbell wrote:
>> The mailing list might be set not to send your own mails back to you.
>> Check the list archive.
>
>
> Yep, I did that first..
>
> Given the response to previous issues along the same line, it looks a
> bit like I just remember not to actually use the system in the way
> that triggers the bug and be happy that 99% of the time the kernel
> does not panic, but have that lovely feeling in the back of the skull
> that says "any time now, and without obvious reason the whole machine
> might just come crashing down"..
>
> I guess it's still better than running Xen or Windows..

Not at all. Can some networking/netfilter expert look at this?

Please file a bug with all the relevant information in this thread. If
you can look for a previous version that worked, that might increase the
chances of the bug being resolved faster.

--
error compiling committee.c: too many arguments to function