From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Fastabend <john.fastabend@gmail.com>
Subject: Re: RCU callback crashes
Date: Thu, 21 Dec 2017 08:26:56 -0800
Message-ID: <97c5063d-fa28-c02f-2ad7-95a08e8d3cee@gmail.com>
References: <20171219175921.7db9b0e1@cakuba.netronome.com>
 <20171220061118.GB1916@nanopsycho>
 <20171219222227.402e684a@cakuba.netronome.com>
 <20171219223404.03786d66@cakuba.netronome.com>
 <CAM_iQpWUjfv2-Sirmdb5WfV4pZ4uF0m7=HR5YGWaKxb4KHp8gQ@mail.gmail.com>
 <CAM_iQpVPUifm3rcXu8SP9ShSmm7z9z+8UjppdY_AxMYQwHE9YQ@mail.gmail.com>
 <CAM_iQpUngX+oSDiforfZceqMZrg=jDJnNf3QFF9WFQdHrU9o-g@mail.gmail.com>
 <20171220163710.7a5f06e5@cakuba.netronome.com>
 <20171220164058.2a862e27@cakuba.netronome.com>
 <20171220164419.42c63ebf@cakuba.netronome.com>
 <CAM_iQpVLAxgbdL8HG=Aheq0=yMS5_10=ndD-F1TON3J7GpkBxQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Jiri Pirko <jiri@resnulli.us>,
        "netdev@vger.kernel.org" <netdev@vger.kernel.org>
To: Cong Wang <xiyou.wangcong@gmail.com>,
        Jakub Kicinski <kubakici@wp.pl>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-pf0-f170.google.com ([209.85.192.170]:40280 "EHLO
        mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753218AbdLUQ1N (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 21 Dec 2017 11:27:13 -0500
Received: by mail-pf0-f170.google.com with SMTP id v26so14134618pfl.7
        for <netdev@vger.kernel.org>; Thu, 21 Dec 2017 08:27:13 -0800 (PST)
In-Reply-To: <CAM_iQpVLAxgbdL8HG=Aheq0=yMS5_10=ndD-F1TON3J7GpkBxQ@mail.gmail.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On 12/20/2017 11:27 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>>> so probably from the management interface.
>>>
>>> [  154.604041] ==================================================================
>>> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
>>> [  154.627403]
>>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [  154.647665] Call Trace:
>>> [  154.650494]  dump_stack+0xa6/0x118
>>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.681251]  print_address_description+0x6a/0x270
>>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.691565]  kasan_report+0x23f/0x350
>>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>>
>> If we trust stack decode it's:
>>
>>    615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>>    616  {
>>    617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>>    618          struct sk_buff *skb = NULL;
>>    619          int band;
>>    620
>>    621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>    622                  struct skb_array *q = band2list(priv, band);
>>    623
>>>> 624                  if (__skb_array_empty(q))
>>    625                          continue;
>>    626
>>    627                  skb = skb_array_consume_bh(q);
>>    628          }
>>    629          if (likely(skb)) {
>>    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>>    631                  qdisc_bstats_cpu_update(qdisc, skb);
>>    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
>>    633          }
>>    634
>>    635          return skb;
>>    636  }
> 
> Yeah, this one is clearly a different one and it is introduced by John's
> "lockless" patchset.
> 
> I will take a look tomorrow if John doesn't.
> 

I guess it is this path:

  dev_deactivate_many
    dev_deactivate_queue
      qdisc_reset

Here we hold the qdisc lock, but there is no RCU call or sync before
the reset does a kfree_skb and walks the lists to clean up. So it is
possible for the xmit path to still be pushing skbs onto the
array/lists. I don't think this is the issue triggered above, but it
needs to be fixed.

Also, synchronize_net() uses synchronize_rcu(), and we also have the
_bh variants involved here...

Finally, it looks like net_tx_action is calling into qdisc_run without
holding an RCU read lock. We either need to check the is_running bit
(I wanted to avoid this) or put the call in an RCU critical section.
Maybe this is what you hit.
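
As a rough illustration of the ordering described above, here is a
single-threaded userspace model. It is not kernel code and not a
proposed patch: every model_* name is made up for illustration, the
"read-side section" is just a counter, and the "grace period" is
modelled as a reset that refuses to run while a reader is still inside
its section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 4

/* Hypothetical stand-in for a qdisc with a small packet ring. */
struct model_qdisc {
	int slots[MODEL_RING_SIZE];
	size_t len;
	int readers;	/* read-side sections currently open */
};

/* Model of entering/leaving a read-side critical section. */
static void model_read_lock(struct model_qdisc *q)
{
	q->readers++;
}

static void model_read_unlock(struct model_qdisc *q)
{
	assert(q->readers > 0);
	q->readers--;
}

/* Model of the xmit path: returns false when the ring is full. */
static bool model_enqueue(struct model_qdisc *q, int pkt)
{
	if (q->len == MODEL_RING_SIZE)
		return false;
	q->slots[q->len++] = pkt;
	return true;
}

/* Remove the oldest packet; returns false when the ring is empty. */
static bool model_dequeue(struct model_qdisc *q, int *pkt)
{
	size_t i;

	if (q->len == 0)
		return false;
	*pkt = q->slots[0];
	for (i = 1; i < q->len; i++)
		q->slots[i - 1] = q->slots[i];
	q->len--;
	return true;
}

/*
 * Model of "sync, then reset": the reset only proceeds once no reader
 * is inside a read-side section, so it cannot free entries that a
 * concurrent dequeue might still be looking at.
 */
static bool model_reset(struct model_qdisc *q)
{
	if (q->readers > 0)
		return false;
	q->len = 0;
	return true;
}

/* Model of the dequeue call site wrapped in a read-side section. */
static bool model_tx_action(struct model_qdisc *q, int *pkt)
{
	bool got;

	model_read_lock(q);
	got = model_dequeue(q, pkt);
	model_read_unlock(q);
	return got;
}
```

In this model, a reset attempted while a reader is inside its section
returns false and leaves the ring untouched; it only succeeds after the
reader has left. Unlike this model, a real grace period blocks the
caller rather than failing.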

@Jakub, does your test have a traffic generator running, or just the
control path? My theory would be a bit odd if you didn't have traffic,
but something is kicking the dequeue, so there must be some traffic.

I'll come up with some fixes today.

Thanks,
John