netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net
Cc: xiyou.wangcong@gmail.com, roid@mellanox.com, ast@fb.com,
	hannes@stressinduktion.org, jiri@mellanox.com,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
Date: Sun, 27 Nov 2016 18:55:08 -0800	[thread overview]
Message-ID: <583B9C8C.9040703@gmail.com> (raw)
In-Reply-To: <0d6d89f885033f1739e97f7f3372ae6e1db72892.1480204343.git.daniel@iogearbox.net>

On 16-11-26 04:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
> 
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
> 
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
> 
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
> 
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
> 
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independant
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Thanks looks good to me and appreciate the detailed commit message.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

> 
> Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
> Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
> Reported-by: Roi Dayan <roid@mellanox.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Roi Dayan <roid@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> ---

  reply	other threads:[~2016-11-28  2:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-27  0:18 [PATCH net] net, sched: respect rcu grace period on cls destruction Daniel Borkmann
2016-11-28  2:55 ` John Fastabend [this message]
2016-11-28  6:57 ` Cong Wang
2016-11-28  9:09   ` Daniel Borkmann
2016-11-28 10:47     ` Paul E. McKenney
2016-11-28 10:51       ` Daniel Borkmann
2016-11-29  6:55       ` Cong Wang
2016-11-29  5:16 ` Roi Dayan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=583B9C8C.9040703@gmail.com \
    --to=john.fastabend@gmail.com \
    --cc=ast@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hannes@stressinduktion.org \
    --cc=jiri@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=roid@mellanox.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).