From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [RFC/BUG] ipv6: bug in "ipv6: Copy cork options in ip6_append_data"
Date: Sun, 16 Jun 2013 02:12:33 -0700
Message-ID: <1371373953.3252.162.camel@edumazet-glaptop>
References: <1368742990.3301.67.camel@edumazet-glaptop>
	<20130615185131.GA2148@breakpoint.cc>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: David Miller , Herbert Xu , netdev , Hideaki YOSHIFUJI , Neal Cardwell
To: Sebastian Andrzej Siewior
Return-path:
Received: from mail-wg0-f42.google.com ([74.125.82.42]:36509 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754923Ab3FPJMi (ORCPT ); Sun, 16 Jun 2013 05:12:38 -0400
Received: by mail-wg0-f42.google.com with SMTP id z11so2742811wgg.1 for ; Sun, 16 Jun 2013 02:12:36 -0700 (PDT)
In-Reply-To: <20130615185131.GA2148@breakpoint.cc>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Sat, 2013-06-15 at 20:51 +0200, Sebastian Andrzej Siewior wrote:
> On Thu, May 16, 2013 at 03:23:10PM -0700, Eric Dumazet wrote:
> > Hi Herbert
> Hi Eric,
>
> > Looking at the code added in commit 0178b695fd6b40a62a215cb
> > ("ipv6: Copy cork options in ip6_append_data") it looks like we can have
> > either a memleak or corruption (later in ip6_cork_release()) in case one
> > of the sub-allocations (ip6_opt_dup()/ip6_rthdr_dup()) fails.
>
> Would this explain the following on 3.9.5?

No, that's a different issue.
>
> | BUG: unable to handle kernel paging request at 00000000ffffc52c
> | IP: [] ip6_append_data+0xb93/0xbea
> | RIP: 0010:[] [] ip6_append_data+0xb93/0xbea
> | RSP: 0018:ffff880072cf7a28 EFLAGS: 00010202
> | RAX: 00000000ffffc334 RBX: ffff88007c14cd80 RCX: 0000000000000008
> | RDX: 00000000ffffffe0 RSI: 0000000000000048 RDI: ffff88007c14cd80
> | RBP: 0000000000000000 R08: ffff880072cf7a98 R09: 0000000000000040
> | R10: 0000000000000000 R11: ffff88007c14cd80 R12: ffff88007c6208c0
> | R13: 0000000000000008 R14: 0000000000000000 R15: 000000000000fff0
> | FS: 00007f2342014700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> | CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> | CR2: 00000000ffffc52c CR3: 0000000020799000 CR4: 00000000000006f0
> | DR0: 00000000327ff15b DR1: 0000000000000000 DR2: 0000000000000000
> | DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> | Process trinity-child0 (pid: 31667, threadinfo ffff880072cf6000, task ffff880037509830)
> | Stack:
> | 0000000000000001 0000000000000400 0000000800000028 0000ffe800000000
> | 0000000000000000 0000000000000008 0000000000000008 ffff88007c14ce90
> | ffffffff812f9545 ffff880072cf7db8 0000000000000000 0000002000000010
> | Call Trace:
> | [] ? ip_skb_dst_mtu+0x32/0x32
> | [] ? _raw_spin_lock_bh+0xe/0x1c
> | [] ? should_resched+0x5/0x23
> | [] ? udpv6_sendmsg+0x668/0x84d
> | [] ? sock_sendmsg+0x4f/0x6c
> | [] ? __sys_sendmsg+0x1f2/0x284
> | [] ? _raw_spin_lock_irqsave+0x14/0x35
> | [] ? remove_wait_queue+0xe/0x48
> | [] ? _raw_spin_unlock_irqrestore+0xc/0xd
> | [] ? n_tty_write+0x309/0x348
> | [] ? kvm_clock_read+0x1c/0x1e
> | [] ? timerqueue_add+0x79/0x98
> | [] ? enqueue_hrtimer+0x36/0x6d
> | [] ? _raw_spin_unlock_irqrestore+0xc/0xd
> | [] ? fget_light+0x2e/0x7c
> | [] ? sys_sendmsg+0x39/0x57
> | [] ? system_call_fastpath+0x16/0x1b
> | Code: 00 0f 8f 12 fa ff ff e9 d9 f4 ff ff c7 44 24 70 f2 ff ff ff 8b 4c 24 14 29 8b e4 02 00 00 49 8b 84 24 48 01 00 00 48 85 c0 74 0c <48> 8b 80 f8 01 00 00 65 48 ff 40 70 48 8b 43 30 48 8b 80 70 01
> | RIP [] ip6_append_data+0xb93/0xbea
>
> Unfortunately I have no idea how this happened. Trinity had been running for a
> while and I managed not to get any logs due to a PEBKAC. The RIP is at
>
> |IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
>
> |81342d1e: 49 8b 84 24 48 01 00 mov 0x148(%r12),%rax
> |81342d25: 00
> |81342d26: 48 85 c0             test %rax,%rax
> |81342d29: 74 0c                je ffffffff81342d37
> |81342d2b: 48 8b 80 f8 01 00 00 mov 0x1f8(%rax),%rax
> ^^^
> |81342d32: 65 48 ff 40 70       incq %gs:0x70(%rax)
>
> This looks like rt6i_idev is not NULL, but it is not a valid pointer either,
> since its upper 32 bits are zero.

Yep, this was discussed two months ago. The initial report came from Dave Jones:
http://comments.gmane.org/gmane.linux.network/264030

So far, I am not sure we have solved the problem.

Could you try the latest net-next tree?