From: Ben Greear
To: Eric Dumazet
Cc: netdev, "linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
Subject: Re: 3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic.
Date: Wed, 23 Jan 2013 16:13:28 -0800
Message-ID: <51007CA8.2050105@candelatech.com>
In-Reply-To: <1358985688.12374.1247.camel@edumazet-glaptop>
List-Id: netdev.vger.kernel.org

On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
>> On 01/22/2013 06:32 PM, Ben Greear wrote:
>>
>> So, I'm slowly making some progress. I've verified that the skb
>> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
>> method. I'm trying to track it backwards and figure out which
>> device it belongs to, etc....takes a while to reproduce though.
>>
>> One thing about this stack trace below...the dev_seq_stop() does
>> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv()
>> can cause dev_seq_stop() to run, but if this stack is legit,
>> then maybe by the time we enter the ip_rcv_finish() code we are
>> running without rcu_readlock() held?
>>
>> If so, that would probably explain the bug.
>>
>
> The whole thing is run under rcu_read_lock() done in
> __netif_receive_skb()

I was worried that dev_seq_stop() might be called incorrectly, causing an asymmetric unlock.
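To make the asymmetric-unlock concern concrete, here is a minimal user-space sketch, assuming RCU read-side sections can be modeled as a simple nesting counter. The names model_rcu_read_lock(), model_rcu_read_unlock(), model_rcu_read_lock_held() and model_receive() are hypothetical stand-ins, not kernel APIs. model_receive() takes the lock once, as __netif_receive_skb() does, and can optionally perform one extra, unpaired unlock before the point where ip_rcv_finish() would run.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space model of RCU read-side nesting (not kernel code). */
static int rcu_depth;       /* current read-side nesting depth */
static int rcu_underflows;  /* number of unbalanced unlocks observed */

static void model_rcu_read_lock(void)
{
    rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
    if (rcu_depth == 0) {
        /* Asymmetric unlock: there is no open section to close. */
        rcu_underflows++;
        fprintf(stderr, "unbalanced read unlock\n");
        return;
    }
    rcu_depth--;
}

/* Nonzero if we are currently inside a read-side section. */
static int model_rcu_read_lock_held(void)
{
    return rcu_depth > 0;
}

/*
 * Models the receive path: one lock taken up front (as in
 * __netif_receive_skb()), an optional stray unlock underneath it,
 * then a check at the point where ip_rcv_finish() would run.
 * Returns whether that step still ran under the read lock.
 */
static int model_receive(int stray_unlock)
{
    int held;

    model_rcu_read_lock();
    if (stray_unlock)
        model_rcu_read_unlock();       /* the suspected unpaired unlock */
    held = model_rcu_read_lock_held(); /* "ip_rcv_finish()" checkpoint */
    model_rcu_read_unlock();           /* the legitimate, paired unlock */
    return held;
}
```

With stray_unlock set, the checkpoint runs outside any read-side section and the final, legitimate unlock is reported as unbalanced. In a real kernel, lockdep with CONFIG_PROVE_RCU is the tool that can help catch this kind of RCU misuse; the counter here only illustrates why one stray unlock would leave ip_rcv_finish() unprotected.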
I have no idea how that might have happened, but several of the crashes have dev_seq_stop() in the trace, so it made me suspicious.

>
> My suspicion was that we called netif_rx() from macvlan leaving a
> not refcounted skb dst.
>
> But the patch I sent to you didn't solve the bug, so it's something else.
>
> You could trace at which point the dst was released. (where you set
> dst->input/output to deadbeef)

The code that sets dst->input/output to deadbeef currently lives in the garbage-collector timer code, but I can work on saving the call site that first puts the dst onto the garbage-collection list...

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
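The idea of tracing where the dst was released, by saving the call site that first queues it for garbage collection, can be sketched in user space as "poison on release, remember who released it". This is a hypothetical model assuming a dst-like object with input/output handlers: struct model_dst, model_dst_release(), model_dst_check() and model_ip_input() are illustrative names, not kernel APIs, and __FILE__/__LINE__ stand in for the call site.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical handler type standing in for dst->input/dst->output. */
typedef int (*model_handler)(void);

/* Poison value, mirroring the 0xdeadbeef used in the debugging patch. */
#define MODEL_POISON ((model_handler)(uintptr_t)0xdeadbeef)

struct model_dst {
    model_handler input;
    model_handler output;
    const char *released_file; /* where the object was released */
    int released_line;
};

static int model_ip_input(void) { return 0; }

/* Poison the handlers and record the caller's location. */
static void model_dst_release_at(struct model_dst *dst,
                                 const char *file, int line)
{
    dst->input = MODEL_POISON;
    dst->output = MODEL_POISON;
    dst->released_file = file;
    dst->released_line = line;
}

/* Capture the call site automatically at every release. */
#define model_dst_release(dst) model_dst_release_at((dst), __FILE__, __LINE__)

/*
 * Check before use, as ip_rcv_finish() would use dst->input.
 * Returns 0 if usable, or -1 after reporting where it was released.
 */
static int model_dst_check(const struct model_dst *dst)
{
    if (dst->input == MODEL_POISON) {
        fprintf(stderr, "use of released dst (released at %s:%d)\n",
                dst->released_file, dst->released_line);
        return -1;
    }
    return 0;
}
```

Inside the kernel, the same idea could store __builtin_return_address(0) (which the kernel wraps as _RET_IP_) at the point where the dst is first queued for garbage collection, so a later 0xdeadbeef dereference can be tied back to its release site rather than to the GC timer.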