From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ido Schimmel
Subject: Re: [Patch net-next v3] net_sched: change tcf_del_walker() to take idrinfo->lock
Date: Fri, 28 Sep 2018 21:11:14 +0300
Message-ID: <20180928181114.GA28797@splinter>
References: <20180919233729.10951-1-xiyou.wangcong@gmail.com> <20180928145900.GA17640@splinter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Linux Kernel Network Developers, Jiri Pirko, Jamal Hadi Salim, Vlad Buslov
To: Cong Wang
Return-path:
Received: from out3-smtp.messagingengine.com ([66.111.4.27]:37081 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbeI2AgQ (ORCPT ); Fri, 28 Sep 2018 20:36:16 -0400
Content-Disposition: inline
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Sep 28, 2018 at 10:56:47AM -0700, Cong Wang wrote:
> On Fri, Sep 28, 2018 at 7:59 AM Ido Schimmel wrote:
> >
> > On Wed, Sep 19, 2018 at 04:37:29PM -0700, Cong Wang wrote:
> > > From: Vlad Buslov
> > >
> > > From: Vlad Buslov
> > >
> > > Action API was changed to work with actions and action_idr in concurrency
> > > safe manner, however tcf_del_walker() still uses actions without taking a
> > > reference or idrinfo->lock first, and deletes them directly, disregarding
> > > possible concurrent delete.
> > >
> > > Change tcf_del_walker() to take idrinfo->lock while iterating over actions
> > > and use new tcf_idr_release_unsafe() to release them while holding the
> > > lock.
> > >
> > > And the blocking function fl_hw_destroy_tmplt() could be called when we
> > > put a filter chain, so defer it to a work queue.
> >
> > I'm getting a use-after-free when running tc_chains.sh selftest and I
> > believe it's caused by this patch.
> >
> > To reproduce:
> > # cd tools/testing/selftests/net/forwarding
> > # export TESTS="template_filter_fits"; ./tc_chains.sh veth0 veth1
> >
> > __tcf_chain_put()
> >   tc_chain_tmplt_del()
> >     fl_tmplt_destroy()
> >       tcf_queue_work(&tmplt->rwork, fl_tmplt_destroy_work)
> >   tcf_chain_destroy()
> >     kfree(chain)
> >
> > Some time later fl_tmplt_destroy_work() starts executing and
> > dereferencing 'chain'.
>
> Oops, forgot to hold the chain... I will test this:
>
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index 92dd5071a708..cbb68d5515d6 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -1444,6 +1444,7 @@ static void fl_tmplt_destroy_work(struct work_struct *work)
>                         struct fl_flow_tmplt, rwork);
>
>         fl_hw_destroy_tmplt(tmplt->chain, tmplt);
> +       tcf_chain_put(tmplt->chain);
>         kfree(tmplt);
>  }
>
> @@ -1451,6 +1452,7 @@ static void fl_tmplt_destroy(void *tmplt_priv)
>  {
>         struct fl_flow_tmplt *tmplt = tmplt_priv;
>
> +       tcf_chain_hold(tmplt->chain);
>         tcf_queue_work(&tmplt->rwork, fl_tmplt_destroy_work);
>  }

I don't think this will work given the reference count already dropped
to 0, which is why the template deletion function was invoked. I didn't
test the patch, but I don't see what would prevent the chain from being
freed.

Thanks for looking into this.
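
To illustrate the concern, here is a minimal userspace sketch of the
ordering in the call chain above. It is not the kernel code: struct chain,
chain_hold(), chain_put() and tmplt_destroy() are hypothetical stand-ins
for tcf_chain, tcf_chain_hold(), __tcf_chain_put() and fl_tmplt_destroy(),
and a 'freed' flag replaces kfree(chain) so the model stays well defined.
It assumes, as in the trace, that the chain is destroyed unconditionally
once the template callback returns.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct tcf_chain. */
struct chain {
    unsigned int refcnt;
    bool work_pending;  /* models the queued fl_tmplt_destroy_work() */
    bool freed;         /* models kfree(chain) without real freeing */
};

static void chain_hold(struct chain *c)
{
    c->refcnt++;
}

/* Models fl_tmplt_destroy() with the proposed hold added. */
static void tmplt_destroy(struct chain *c)
{
    chain_hold(c);          /* proposed fix: reference for the work item */
    c->work_pending = true; /* work runs some time later */
}

/* Models __tcf_chain_put(): the zero check happens before the callback. */
static void chain_put(struct chain *c)
{
    if (--c->refcnt == 0) {
        tmplt_destroy(c);   /* bumps refcnt back to 1 ... */
        c->freed = true;    /* ... but destroy still runs regardless */
    }
}

/* Returns true if the pending work would run against a freed chain
 * even though the hold made the reference count non-zero again. */
static bool hold_after_zero_is_too_late(void)
{
    struct chain c = { .refcnt = 1 };

    chain_put(&c);          /* drop the last reference */
    return c.work_pending && c.freed && c.refcnt == 1;
}
```

In this model hold_after_zero_is_too_late() returns true: the hold is
taken after the decision to destroy has already been made, so nothing
re-checks the count before the chain goes away. Under the same
assumptions, a fix would have to either avoid touching the chain from
the work item or make the destroy path wait for it.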