From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alessandro Suardi
Subject: Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines
Date: Thu, 25 Aug 2005 19:26:41 +0200
Message-ID: <5a4c581d050825102678c27b4e@mail.gmail.com>
References: <5a4c581d05082506395fa984ae@mail.gmail.com> <20050825165550.GC4442@rama.de.gnumonks.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Return-path:
To: Harald Welte, Alessandro Suardi, netdev@oss.sgi.com, Linux Kernel Mailing List, netfilter-devel@lists.netfilter.org
In-Reply-To: <20050825165550.GC4442@rama.de.gnumonks.org>
Content-Disposition: inline
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On 8/25/05, Harald Welte wrote:
> On Thu, Aug 25, 2005 at 03:39:02PM +0200, Alessandro Suardi wrote:
> > Howdy, and excuse me for crossposting - feel free to zap CC to
> > unrelated, if any, mailing lists.
> >
> > just gave PeerGuardian a spin on my eDonkey home box and
> > said box didn't last half a day before oopsing in netlink/nf/tcp
> > related routines (or so it seems to my untrained eye).
>
> Yes, it indeed could be that there is some fishy interaction between the
> tcp stack and ip_queue causing the oops.
>
> > K7800, 256MB RAM, uptodate FC3 running 2.6.13-rc6-git12,
> > doing nothing but running MetaMachine's eDonkey 1.4.3 QT gui.
> > PeerGuardian is the 1.5 beta version available from methlabs.org.
>
> Is it true that PeerGuardian is a proprietary application? I'm not
> going to debug this problem using a proprietary ip_queue program, sorry.

I'm not sure I understand the issue; I built PG from these sources:

http://prdownloads.sourceforge.net/peerguardian/pglinux-1.5beta.tar.gz?download

and I had to install the iptables-devel FC3 rpm to build. The PG
sources seem to be licensed under GPLv2. But maybe you're referring
to the fact that whatever PG does, it doesn't show up as output
from 'iptables -L' ?

> If you can produce a testcase with open source userspace ip_queue code,
> I could look into reproducing the problem locally and debugging the
> problem more thoroughly.

So far the box has been running for over four hours, I'll configure
my laptop as a netdump server hoping it might capture something
if the ed2k box crashes again later. I'm afraid I won't be able to
set up a real testcase (and btw, edonkey v1.4.3 from MetaMachine
is actually a proprietary program, though entirely in userspace).

> While it definitely is a kernel bug (whatever userspace sends should not
> crash the kernel), it might be something that specifically [only]
> PeerGuardian does to the packet. Something that ip_queue doesn't check
> (but should check) on packet reinjection and therefore upsets the TCP stack.
>
> Also helpful would be the output of an "strace -f -x -s65535 -e
> trace=sendmsg" on the PeerGuardian (daemon?) process.
> >
> > [] die+0xe4/0x170
> > [] do_trap+0x7f/0xc0
> > [] do_invalid_op+0xa3/0xb0
> > [] error_code+0x4f/0x54
> > [] kfree_skbmem+0xb/0x20
> > [] __kfree_skb+0x5f/0xf0
>
> ok, so something down the chain from kfree_skb() results in an invalid
> operation? looks more like some compiler problem, bad memory or memory
> corruption to me. Try to reproduce the problem without PG.

compiler is fc3's latest - gcc-3.4.4-2.fc3. I might have a go at
memtest86 in the next weeks if more symptoms point at possible
bad RAM.

> > [] tcp_clean_rtx_queue+0x16a/0x470
> > [] tcp_ack+0xf6/0x360
> > [] tcp_rcv_established+0x277/0x7a0
> > [] tcp_v4_do_rcv+0xf0/0x110
> > [] tcp_v4_rcv+0x6e0/0x820
> > [] ip_local_deliver_finish+0x84/0x160
>
> so something in the tcp stack ends up doing tcp_clean_rtx_queue()
>
> > [] nf_reinject+0x13a/0x1c0
> > [] ipq_issue_verdict+0x28/0x40
> > [] ipq_set_verdict+0x48/0x70
>
> ip_queue reinjects a packet via nf_reinject()
>
> > [] ipq_receive_peer+0x39/0x50
> > [] ipq_receive_sk+0x172/0x190
>
> ip_queue receives an ipq verdict msg packet from netlink
>
> > [] netlink_data_ready+0x35/0x60
> > [] netlink_sendskb+0x24/0x60
> > [] netlink_unicast+0x127/0x160
> > [] netlink_sendmsg+0x204/0x2b0
> > [] sock_sendmsg+0xb0/0xe0
> > [] sys_sendmsg+0x134/0x240
> > [] sys_socketcall+0x224/0x230
> > [] sysenter_past_esp+0x54/0x75
>
> process sendmsg()s on the netlink socket.

Thanks,

--alessandro

"Not every smile means I'm laughing inside"

(Wallflowers - "From The Bottom Of My Heart")