From: Al Viro <viro@ZenIV.linux.org.uk>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
Kees Cook <keescook@chromium.org>,
LKML <linux-kernel@vger.kernel.org>,
Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL
Date: Tue, 28 Aug 2018 01:03:10 +0100
Message-ID: <20180828000310.GE6515@ZenIV.linux.org.uk>
In-Reply-To: <CAM_iQpVEyq9hR3bbOtLFKoLo6nHCtiL6A__uEz3JdDO79GF_8A@mail.gmail.com>
On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > I can't think of any challenges. Cong/Jiri? Would it require development
> > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > don't want them in include/net).
> > BTW, the idea of improving grep-ability of the code by prefixing the
> > ops appropriately makes sense, i.e. we should have ops->cls_init,
> > ops->act_init etc.
>
> Hmm? Isn't struct tcf_proto_ops used and must be provided
> by each tc filter module? How does it work if you move it into
> net/sched/* for out-of-tree modules? Are they supposed to
> include "..../net/sched/tcf_proto.h"?? Or something else?

If you care about out-of-tree modules, that could easily live in
include/net/tcf_proto.h, provided that it's not pulled by indirect
includes into hell knows how many places. Try
	make allmodconfig
	make >/dev/null 2>&1
	find . -name '.*.cmd' | xargs grep -l sch_generic.h | wc -l
That finds 2977 files here, most of them having nothing to do with
net/sched.
> BTW, we need some grep tool that really understands C syntax,
> not making each variable friendly to plain grep.

This isn't a matter of C syntax; it needs to handle C typing,
and you really can't do that anywhere near reliably without looking
at preprocessor output. Which very much depends upon .config...
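A contrived sketch of that point; every name in it (the model_* structs and functions, CONFIG_MODEL_USE_ACT) is hypothetical. The call ops->init() is spelled identically in both configurations, so no grep, however syntax-aware, can say which init it reaches. Only the preprocessed type of ops can, and that depends on the configuration.

```c
#include <assert.h>
#include <string.h>

/* Two hypothetical ops tables that both have an ->init member,
 * mimicking the classifier/action ops situation discussed above. */
struct model_cls_ops { const char *(*init)(void); };
struct model_act_ops { const char *(*init)(void); };

static const char *model_cls_init(void) { return "classifier"; }
static const char *model_act_init(void) { return "action"; }

static const struct model_cls_ops model_cls = { .init = model_cls_init };
static const struct model_act_ops model_act = { .init = model_act_init };

/* A hypothetical config symbol decides which type "ops" has below. */
#ifdef CONFIG_MODEL_USE_ACT
typedef struct model_act_ops model_ops;
#define MODEL_OPS (&model_act)
#else
typedef struct model_cls_ops model_ops;
#define MODEL_OPS (&model_cls)
#endif

/* The tokens "ops->init()" are the same in both configurations; only the
 * preprocessed type of ops tells which implementation gets called. */
static const char *model_call_init(const model_ops *ops)
{
	return ops->init();
}
```

Built without CONFIG_MODEL_USE_ACT, model_call_init(MODEL_OPS) reaches the classifier's init; defining the symbol silently changes the answer while the call site text stays the same.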

BTW, something odd in cls_u32.c: what happens if we have the following
graph:

tcf_proto <tp>, its ->data being <c0> and ->root being <ht0>
tc_u_common <c0>, in its ->hlist
	<ht1>, in its ->ht[0]
		<knode>
	<ht0>

and set ->ht_down in <knode> to <ht0>? AFAICS,
there's nothing to prevent that - TCA_U32_LINK being
0x80000000 will do just that. What happens upon u32_destroy()
in that case? Unless I'm misreading that code, refcounts will be
	<c0>: 1
	<ht0>: 2
	<ht1>: 1
and in u32_destroy() we'll get this:
root_ht = <ht0>
tp_c = <c0>
	if (root_ht && --root_ht->refcnt == 0)
		u32_destroy_hnode(tp, root_ht, extack);
decrements refcnt to 1 and does nothing else.
	if (--tp_c->refcnt == 0) {
is satisfied
		hlist_del(&tp_c->hnode);
<c0> unhashed
		while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
we take ht = <ht1>
			u32_clear_hnode(tp, ht, extack);
which does
	for (h = 0; h <= ht->divisor; h++) {
		while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
n = <knode>
			RCU_INIT_POINTER(ht->ht[h],
					 rtnl_dereference(n->next));
remove <knode> from <ht1>->ht[0]
			tcf_unbind_filter(tp, &n->res);
			u32_remove_hw_knode(tp, n, extack);
			idr_remove(&ht->handle_idr, n->handle);
			if (tcf_exts_get_net(&n->exts))
				tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
			else
				u32_destroy_key(n->tp, n, true);
... and we hit u32_destroy_key(<tp>, <knode>, true), which does
	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
ht = <ht0>
	tcf_exts_destroy(&n->exts);
	tcf_exts_put_net(&n->exts);
	if (ht && --ht->refcnt == 0)
		kfree(ht);
*NOW* <ht0>->refcnt is 0, and we free the damn thing.
	....
	kfree(n);
<knode> is freed and we return to u32_clear_hnode(), where we
see that there's nothing else left in <ht1>->ht[...], and return
to u32_destroy(). Where
			RCU_INIT_POINTER(tp_c->hlist, ht->next);
sets <c0>->hlist to <ht1>->next, aka <ht0>. Which is already freed.
			/* u32_destroy_key() will later free ht for us, if it's
			 * still referenced by some knode
			 */
			if (--ht->refcnt == 0)
				kfree_rcu(ht, rcu);
<ht1>->refcnt reaches 0 and we free it (RCU-delayed)
		}
... and we go for the next iteration, this time with ht = <ht0>.
Doing all kinds of unsanitary things to the memory it used to occupy...
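A userspace model of the synchronous path above, as a sketch only. The m_* names are hypothetical stand-ins for the cls_u32 functions, each hash table is reduced to a single slot, RCU is ignored, and "freeing" just sets a flag, so the second loop iteration over <ht0> is counted rather than actually touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model objects; nothing here is real kernel code. */
struct m_knode;

struct m_hnode {
	int refcnt;
	bool freed;		/* set instead of kfree()/kfree_rcu() */
	struct m_hnode *next;	/* link on tp_c->hlist */
	struct m_knode *slot;	/* one-entry stand-in for ->ht[0] */
};

struct m_knode {
	struct m_hnode *ht_down;
	bool freed;
};

struct m_common {
	int refcnt;
	struct m_hnode *hlist;
};

static int uaf_count;	/* loop passes over an already "freed" hnode */

/* Stand-in for u32_destroy_key(): drop the reference held via ->ht_down. */
static void m_destroy_key(struct m_knode *n)
{
	struct m_hnode *ht = n->ht_down;

	if (ht && --ht->refcnt == 0)
		ht->freed = true;	/* kfree(ht) */
	n->freed = true;		/* kfree(n) */
}

/* Stand-in for u32_clear_hnode() taking the synchronous branch. */
static void m_clear_hnode(struct m_hnode *ht)
{
	struct m_knode *n;

	while ((n = ht->slot) != NULL) {
		ht->slot = NULL;	/* unlink n from the bucket */
		m_destroy_key(n);
	}
}

/* Stand-in for u32_destroy(); root_ht plays the part of tp->root. */
static void m_destroy(struct m_common *tp_c, struct m_hnode *root_ht)
{
	struct m_hnode *ht;

	assert(tp_c != NULL);
	if (root_ht && --root_ht->refcnt == 0)
		root_ht->freed = true;	/* u32_destroy_hnode(); not reached here */

	if (--tp_c->refcnt == 0) {
		while ((ht = tp_c->hlist) != NULL) {
			if (ht->freed)
				uaf_count++;	/* iterating over freed memory */
			m_clear_hnode(ht);
			tp_c->hlist = ht->next;
			if (--ht->refcnt == 0)
				ht->freed = true;	/* kfree_rcu(ht, rcu) */
		}
	}
}

/* Builds the graph from the message: <c0>->hlist is <ht1> -> <ht0>, and
 * <knode> sits in <ht1> with ->ht_down linked to the root <ht0>. */
static void m_build(struct m_common *c0, struct m_hnode *ht0,
		    struct m_hnode *ht1, struct m_knode *kn)
{
	*ht0 = (struct m_hnode){ .refcnt = 1 };		/* held by tp->root */
	*ht1 = (struct m_hnode){ .refcnt = 1, .next = ht0, .slot = kn };
	*kn = (struct m_knode){ .ht_down = ht0 };
	ht0->refcnt++;					/* the TCA_U32_LINK reference */
	*c0 = (struct m_common){ .refcnt = 1, .hlist = ht1 };
}
```

With the graph built by m_build(), m_destroy() ends with uaf_count at 1 and <ht0>'s refcount at -1, i.e. the second iteration runs over <ht0> after the knode path has already dropped its last reference.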

Incidentally, if we hit
	tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
instead of u32_destroy_key(), things don't seem to be any better: we
won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
break on the second pass through the loop; it'll free <ht0> there and
return. That sets us up for trouble, since when u32_delete_key_freepf_work()
finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
to freed memory and decrementing its contents...
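The deferred variant can be modelled the same way. This is again a hypothetical sketch: pending[] stands in for tcf_queue_work(), d_run_pending() for the workqueue running once rtnl is dropped, the tc_u_common refcount is assumed to reach zero, and "freeing" only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct d_knode;

struct d_hnode {
	int refcnt;
	bool freed;		/* set instead of freeing */
	struct d_hnode *next;	/* link on tp_c->hlist */
	struct d_knode *slot;	/* one-entry stand-in for ->ht[0] */
};

struct d_knode {
	struct d_hnode *ht_down;
};

static struct d_knode *pending[8];	/* models queued work items */
static int npending;
static int late_uaf;	/* deferred decrements that hit a "freed" hnode */

/* Models u32_destroy_key() when it finally runs from the work item. */
static void d_destroy_key(struct d_knode *n)
{
	struct d_hnode *ht = n->ht_down;

	if (ht) {
		if (ht->freed)
			late_uaf++;
		if (--ht->refcnt == 0)
			ht->freed = true;
	}
}

/* Models u32_destroy() when every knode goes through the deferred branch. */
static void d_destroy(struct d_hnode **hlist, struct d_hnode *root_ht)
{
	struct d_hnode *ht;

	if (--root_ht->refcnt == 0)
		root_ht->freed = true;
	while ((ht = *hlist) != NULL) {
		if (ht->slot) {		/* u32_clear_hnode(): queue, don't destroy */
			assert(npending < 8);
			pending[npending++] = ht->slot;
			ht->slot = NULL;
		}
		*hlist = ht->next;
		if (--ht->refcnt == 0)
			ht->freed = true;
	}
}

/* Models the workqueue running after rtnl is dropped. */
static void d_run_pending(void)
{
	for (int i = 0; i < npending; i++)
		d_destroy_key(pending[i]);
	npending = 0;
}
```

Here d_destroy() itself touches nothing freed and frees <ht0> on the second pass; the damage only shows up when d_run_pending() decrements the already freed <ht0>.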
What am I missing in there? Is it just "we should never have ->ht_down
pointing to anyone's ->root"? If so, I'm not sure how to detect that;
if not... what should happen to the orphaned root_ht? Should it
remain on the list? We might have two tcf_proto sharing tp->data,
so tp_c and its list might very well survive the u32_destroy()...
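One conceivable way to detect it, sketched under the assumption that every hnode records whether it was created as some tcf_proto's ->root. The is_root flag and l_set_link() are hypothetical, not existing cls_u32 code; the relinking order mirrors the "take new reference, swap pointer, drop old reference" pattern.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hnode that remembers whether it is some tp->root. */
struct l_hnode {
	bool is_root;
	int refcnt;
};

struct l_knode {
	struct l_hnode *ht_down;
};

/* Sketch of validating a TCA_U32_LINK target: refuse to let ->ht_down
 * point at anyone's ->root, so a root is only ever referenced by its
 * tcf_proto. Returns 0 on success or -EINVAL. Freeing an old target
 * whose refcount reaches zero is left out of this model. */
static int l_set_link(struct l_knode *n, struct l_hnode *target)
{
	struct l_hnode *old = n->ht_down;

	if (target && target->is_root)
		return -EINVAL;
	if (target)
		target->refcnt++;
	n->ht_down = target;
	if (old)
		old->refcnt--;
	return 0;
}
```

Rejecting the link up front keeps the orphaned-root case from arising at all; it does not settle what should happen to an orphan that already exists.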
Note, BTW, that if we do leave the orphan on the list and later
change the tc_u_knode so that ->ht_down doesn't point to that
thing anymore, we'll get its refcount incremented to 2 in
u32_init_knode(), then decremented to 1 by u32_set_parms() and
then arrange for u32_delete_key_work() to be run. Which will
drive the refcount to 0 and free the damn thing. While it's
still in the middle of ->hlist...
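The refcount sequence in that last paragraph as a hypothetical model. The o_* functions are illustrative stand-ins for u32_init_knode(), the relinking part of u32_set_parms() and u32_delete_key_work(), and the orphan starts at 1 because only the knode still references it after u32_destroy().

```c
#include <stdbool.h>
#include <stddef.h>

struct o_hnode {
	int refcnt;
	bool on_hlist;	/* still linked on tp_c->hlist */
	bool freed;	/* set instead of freeing */
};

struct o_knode {
	struct o_hnode *ht_down;
};

/* Stand-in for u32_init_knode(): the replacement copies ->ht_down
 * and takes its own reference. */
static void o_init_knode(struct o_knode *repl, const struct o_knode *old)
{
	repl->ht_down = old->ht_down;
	if (repl->ht_down)
		repl->ht_down->refcnt++;
}

/* Stand-in for the relinking in u32_set_parms(): take the new
 * reference, swap the pointer, drop the inherited one. */
static void o_relink(struct o_knode *n, struct o_hnode *target)
{
	struct o_hnode *prev = n->ht_down;

	if (target)
		target->refcnt++;
	n->ht_down = target;
	if (prev)
		prev->refcnt--;
}

/* Stand-in for u32_delete_key_work() on the replaced knode. */
static void o_delete_key(struct o_knode *n)
{
	struct o_hnode *ht = n->ht_down;

	if (ht && --ht->refcnt == 0)
		ht->freed = true;
}
```

The orphan goes 1 -> 2 -> 1 -> 0 and ends up flagged as freed while on_hlist is still true.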