From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Fastabend
Subject: Re: RCU callback crashes
Date: Wed, 20 Dec 2017 12:23:52 -0800
Message-ID: <92220030-e9f1-1963-00b6-05f37abb82ee@gmail.com>
References: <20171219175921.7db9b0e1@cakuba.netronome.com> <20171220061118.GB1916@nanopsycho> <20171219222227.402e684a@cakuba.netronome.com> <20171219223404.03786d66@cakuba.netronome.com> <20171220121705.18401098@cakuba.netronome.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Jiri Pirko , "netdev@vger.kernel.org" , Cong Wang
To: Jakub Kicinski
Return-path:
Received: from mail-pg0-f52.google.com ([74.125.83.52]:37005 "EHLO
	mail-pg0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754388AbdLTUYF (ORCPT );
	Wed, 20 Dec 2017 15:24:05 -0500
Received: by mail-pg0-f52.google.com with SMTP id o13so1610944pgp.4
	for ; Wed, 20 Dec 2017 12:24:05 -0800 (PST)
In-Reply-To: <20171220121705.18401098@cakuba.netronome.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 12/20/2017 12:17 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
>> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
>>> On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
>>>>>> I get this:
>>>>>
>>>>> Could you try to run it with kasan on?
>>>>
>>>> I didn't manage to reproduce it with KASAN on so far :( Even enabling
>>>> object debugging to get the second splat in my email (which is more
>>>> useful) actually makes the crash go away, I only see the warning...
>>>
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>> @Jakub, This is with mq and pfifo_fast I guess?
>
> Sorry for falling silent, I was convinced I saw this before your code
> went in, it just takes a lot longer to trigger... I've been running
> net-next from Dec 1st now for an hour now and it didn't crash :/
>
> Trying KASAN now..
>

It's possible my patches just made it worse, because the kfree on the
skb lists was exposed as well. I'm trying to see how removing that RCU
grace period was safe in the first place. The datapath uses an RCU
read-side critical section to protect the qdisc, but the control path
(a) doesn't use an RCU grace period and (b) doesn't use the qdisc lock.
Going to go get a coffee and I'll think about it a bit more. Any ideas,
Cong?

Perhaps we need a patch for net (mine was against net-next), and
probably for stable as well.

Thanks,
John