Netdev List

* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-07 12:13 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <20180907023529.GV19965@ZenIV.linux.org.uk>

On 2018-09-06 10:35 p.m., Al Viro wrote:
> On Thu, Sep 06, 2018 at 06:21:09AM -0400, Jamal Hadi Salim wrote:
[..]
> 
> Argh...  Unfortunately, there's this: in u32_delete() we have
>          if (root_ht) {
>                  if (root_ht->refcnt > 1) {
>                          *last = false;
>                          goto ret;
>                  }
>                  if (root_ht->refcnt == 1) {
>                          if (!ht_empty(root_ht)) {
>                                  *last = false;
>                                  goto ret;
>                          }
>                  }
>          }
> and that would need to be updated.  

It is not detrimental as you have it right now but
you are right an adjustment is needed...

Deleting of a root directly should not be allowed. But
you can flush a whole tp. Consider this:
--
sudo tc qdisc add dev $P ingress
sudo tc filter add dev $P parent ffff: protocol ip prio 10 \
u32 match ip protocol 1 0xff

Which creates root ht 800

You shouldnt be allowed to do this:
--
tc filter delete dev $P parent ffff: protocol ip prio 10 handle 800: u32
---

But you can delete the tp entirely as such:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
--

The later will go via the destroy() path and flush all filters.

You should also be able to delete individual filters. ex:
$tc filter del dev $P parent ffff: prio 10 handle 800:0:800 u32

Where that code you are referring to is important is when
the last filter deleted - we need the caller to know
and it destroys root.

i.e you should return last=true when the last filter is
deleted so root gets auto deleted (just like it was autocreated)

> However, that logics is bloody odd
> to start with.  First of all, root_ht has come from
>         struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
> and the only place where it's ever modified is
>          rcu_assign_pointer(tp->root, root_ht);
> in u32_init(), where we'd bloody well checked that root_ht is non-NULL
> (see
>          if (root_ht == NULL)
>                  return -ENOBUFS;
> upstream of that place) and where that assignment is inevitable on the
> way to returning 0.  No matter what, if tp has passed u32_init() it
> will have non-NULL ->root, forever.  And there is no way for tcf_proto
> to be seen outside of tcf_proto_create() without ->init() having returned
> 0 - it gets freed before anyone sees it.
> 

Yes, the check for root_ht is not necessary - but the check for the
last filter (and testing for last) is needed.

> So this 'if (root_ht)' can't be false.  What's more, what the hell is the
> whole thing checking?  We are in u32_delete().  It's called (as ->delete())
> from tfilter_del_notify(), which is called from tc_del_tfilter().  If we
> return 0 with *last true, we follow up calling tcf_proto_destroy().
> OK, let's look at the logics in there:
> 	* if there are links to root hnode => false
> 	* if there's no links to root hnode and it has knodes => false
> (BTW, if we ever get there with root_ht->refcnt < 1, we are obviously screwed)
> 	* if there is a tcf_proto sharing tp->data => false (i.e. any filters
> with different prio - don't bother)
> 	* if tp is the only one with reference to tp->data and there are *any*
> knodes => false.
> 
> Any extra links can come only from knodes in a non-empty hnode.  And it's not
> a common case.  Shouldn't thIe whole thing be
> 	* shared tp->data => false
> 	* any non-empty hnode => false
> instead?  Perhaps even with the knode counter in tp->data, avoiding any loops
> in there, as well as the entire ht_empty()...
> 
> Now, in the very beginning of u32_delete() we have this:
>          struct tc_u_hnode *ht = arg;
> 	
>          if (ht == NULL)
>                  goto out;
> OK, but the call of ->delete() is
>          err = tp->ops->delete(tp, fh, last, extack);
> and arg == NULL seen in u32_delete() means fh == NULL in tfilter_del_notify().
> Which is called in
>          if (!fh) {
> 		...
> 	} else {
>                  bool last;
> 
>                  err = tfilter_del_notify(net, skb, n, tp, block,
>                                           q, parent, fh, false, &last,
>                                           extack);
> How can we ever get there with NULL fh?
>

Try:
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
tcm handle is 0, so will hit that code path.

> The whole thing makes very little sense; looks like it used to live in
> u32_destroy() prior to commit 763dbf6328e41 ("net_sched: move the empty tp
> check from ->destroy() to ->delete()"), but looking at the rationale in
> that commit...  I don't see how it fixes anything - sure, now we remove
> tcf_proto from the list before calling ->destroy().  Without any RCU delays
> in between.  How could it possibly solve any issues with ->classify()
> called in parallel with ->destroy()?  cls_u32 (at least these days)
> does try to survive u32_destroy() in parallel with u32_classify();
> if any other classifiers do not, they are still broken and that commit
> has not done anything for them.
> 
> Anyway, adjusting 1/7 for that is trivial, but I would really like to
> understand what that code is doing...  Comments?
> 

Refer to above.

cheers,
jamal

^ permalink raw reply