From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail.candelatech.com ([208.74.158.172]:50597 "EHLO
	ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751465Ab3AXANc (ORCPT );
	Wed, 23 Jan 2013 19:13:32 -0500
Message-ID: <51007CA8.2050105@candelatech.com>
Date: Wed, 23 Jan 2013 16:13:28 -0800
From: Ben Greear
MIME-Version: 1.0
To: Eric Dumazet
CC: netdev , "linux-nfs@vger.kernel.org"
Subject: Re: 3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic.
References: <50FDADF4.3060601@candelatech.com> <50FDDE35.7070806@candelatech.com> <1358829606.3464.3151.camel@edumazet-glaptop> <50FE2A57.3040804@candelatech.com> <50FEC796.5090404@candelatech.com> <1358875020.3464.4006.camel@edumazet-glaptop> <1358875607.3464.4020.camel@edumazet-glaptop> <50FF102F.2050008@candelatech.com> <50FF4BC9.1060206@candelatech.com> <5100785D.8040101@candelatech.com> <1358985688.12374.1247.camel@edumazet-glaptop>
In-Reply-To: <1358985688.12374.1247.camel@edumazet-glaptop>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
>> On 01/22/2013 06:32 PM, Ben Greear wrote:
>>
>> So, I'm slowly making some progress. I've verified that the skb
>> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
>> method. I'm trying to track it backwards and figure out which
>> device it belongs to, etc....takes a while to reproduce though.
>>
>> One thing about this stack trace below...the dev_seq_stop() does
>> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv()
>> can cause dev_seq_stop() to run, but if this stack is legit,
>> then maybe by the time we enter the ip_rcv_finish() code we are
>> running without rcu_readlock() held?
>>
>> If so, that would probably explain the bug.
>>
>
> The whole thing is run under rcu_read_lock() done in
> __netif_receive_skb()

I was worried that dev_seq_stop() might be called incorrectly, causing
an asymmetric unlock. I have no idea how that might happen, but several
crashes have that dev_seq_stop method listed, so it made me suspicious.

>
> My suspicion was that we called netif_rx() from macvlan leaving a
> not refcounted skb dst.
>
> But the patch I sent to you didnt solve the bug, so its something else.
>
> You could trace at which point the dst was released. (where you set
> dst->input/output to deadbeef)

My current poisoning code is in some garbage collector timer code, but
I can work on saving the call-site that first pokes the dst into the
garbage collection list...

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com
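
The call-site tracing idea discussed above (poison the entry on release and remember who released it first) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code or the debug patch from this thread: struct fake_dst, fake_dst_release() and fake_dst_is_poisoned() are hypothetical names, the fields merely stand in for dst->input and dst->output, and __builtin_return_address() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison value mentioned in the thread. */
#define DST_POISON ((uintptr_t)0xdeadbeef)

/* Hypothetical userspace stand-in; NOT the kernel's struct dst_entry. */
struct fake_dst {
    uintptr_t input;          /* stands in for dst->input */
    uintptr_t output;         /* stands in for dst->output */
    void *first_release_site; /* return address of the first releaser */
};

/*
 * Release path: record the caller the first time the entry is released,
 * then poison the handlers so any later use is easy to recognize.
 * noinline keeps __builtin_return_address(0) pointing at the real caller.
 */
__attribute__((noinline))
void fake_dst_release(struct fake_dst *dst)
{
    assert(dst != NULL);
    if (dst->first_release_site == NULL)
        dst->first_release_site = __builtin_return_address(0);
    dst->input = DST_POISON;
    dst->output = DST_POISON;
}

/* Check a receive path could run before using the entry. */
int fake_dst_is_poisoned(const struct fake_dst *dst)
{
    return dst->input == DST_POISON || dst->output == DST_POISON;
}
```

When a poisoned entry is later found, first_release_site can be printed and resolved to a symbol. Only the first releaser is kept, so later releases (for example from a garbage collection pass) do not overwrite the original call site.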