From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ethan Tuttle
Subject: Re: mvneta: oops in __rcu_read_lock on mirabox
Date: Mon, 16 Sep 2013 01:56:46 -0700
Message-ID:
References: <20130915205701.5c61a444@skate> <20130916065047.GH27487@1wt.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, netdev@vger.kernel.org,
 Ezequiel Garcia, Gregory Clément, linux-arm-kernel@lists.infradead.org
To: Willy Tarreau
Return-path:
Received: from mail-we0-f173.google.com ([74.125.82.173]:39945
 "EHLO mail-we0-f173.google.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1750781Ab3IPI4r (ORCPT );
 Mon, 16 Sep 2013 04:56:47 -0400
Received: by mail-we0-f173.google.com with SMTP id w62so3368342wes.32
 for ; Mon, 16 Sep 2013 01:56:46 -0700 (PDT)
In-Reply-To: <20130916065047.GH27487@1wt.eu>
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi guys. Here's the config I was building with:

https://gist.github.com/anonymous/6578139

It's based on the one I found in archlinuxarm's git repo. I didn't
change any of the options - at least, not manually.

Thanks for the follow up!

Ethan

On Sun, Sep 15, 2013 at 11:50 PM, Willy Tarreau wrote:
> Hi Thomas,
>
> On Sun, Sep 15, 2013 at 08:57:01PM +0200, Thomas Petazzoni wrote:
>> Hello Ethan,
>>
>> On Sat, 14 Sep 2013 18:05:32 -0700, Ethan Tuttle wrote:
>> > When I upgraded my mirabox from 3.11-rc4 to 3.11, I started seeing
>> > oopses while receiving network traffic (see below). Sending a flood
>> > ping will trigger the oops within a few minutes.
>> >
>> > The stack looks similar, but not identical to, the one reported
>> > earlier by Jochen De Smet[1]. In my case the PC is always
>> > __rcu_read_lock.
>> >
>> > A git bisect found a878764 "Merge
>> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net" to be the
>> > first bad commit... interesting, because neither of the merge parents
>> > produce the oops. I rebased the net changes onto the other merge
>> > parent and bisected that series, which identified 702821f "net: revert
>> > 8728c544a9c ("net: dev_pick_tx() fix")" as the first bad commit.
>> > Indeed, reverting 702821f from 3.11 produces a kernel which stands up
>> > to a ping flood for hours.
>> >
>> > Each of the times I reproduced this, it was identified as "Unhandled
>> > prefetch abort: unknown 25 (0x409) at 0xc0036ea0", except once when I
>> > got "unknown 16 (0x400)".
>> >
>> > I'm assuming this is an mvneta bug that was exposed by 702821f.
>> > That's just a guess, and I don't have the skills to debug this any
>> > further. In any case, I figured the maintainers would want to know
>> > about it.
>>
>> Thanks a lot for the report and the detailed investigation.
>> Unfortunately, I don't have Armada 370 hardware with me this week, so
>> I'm unable to test and reproduce the issue.
>>
>> However, I've added a bunch of Armada 370 people/maintainers in Cc,
>> hopefully they can at least try to reproduce and confirm that reverting
>> this patch makes the problem go away, which would confirm that we
>> should look for a bug in the mvneta driver around this problem.
>
> I'm currently testing on 3.11.1 (which I had here) and am not getting
> any issue after 50M packets. My kernel is running in thumb mode and
> without SMP.
>
> Ethan, we'll need your config I guess.
>
> Thanks,
> Willy
>
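
For anyone wanting to repeat the bisection across the merge described in the quoted report, here is a rough sketch of the steps as a small Python helper that only builds the git command lines (it does not run them). It is not the exact procedure used in the report. It assumes that the first parent of a878764 is the mainline side and the second parent is the net tree, which is the usual layout for a pull but is not confirmed in this thread. The branch name `net-linear` and the function name `bisect_plan` are hypothetical.

```python
import shlex


def bisect_plan(merge, first_parent_is_mainline=True):
    """Build git commands for bisecting a merge whose parents are both good.

    Assumption: one parent is the mainline side and the other carries the
    net changes. The net commits are replayed linearly on top of the
    mainline parent, and the resulting series is bisected.
    """
    if first_parent_is_mainline:
        mainline, topic = f"{merge}^1", f"{merge}^2"
    else:
        mainline, topic = f"{merge}^2", f"{merge}^1"
    return [
        # Hypothetical scratch branch starting at the tip of the net side.
        ["git", "checkout", "-b", "net-linear", topic],
        # Replay the commits unique to the net side onto the mainline parent.
        ["git", "rebase", mainline],
        # Bad endpoint: rebased tip. Good endpoint: the mainline parent.
        ["git", "bisect", "start", "net-linear", mainline],
    ]


if __name__ == "__main__":
    for argv in bisect_plan("a878764"):
        print(shlex.join(argv))
```

Before starting the bisect it is worth checking that the rebased tip still reproduces the oops under a flood ping; otherwise the bad endpoint is wrong and the result is meaningless. Each step is then marked by hand with `git bisect good` or `git bisect bad` after building and testing that kernel.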