From: Stefano Brivio <sbrivio@redhat.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Phil Sutter <phil@nwl.cc>,
netfilter-devel@vger.kernel.org, Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
Date: Tue, 25 Feb 2020 15:34:35 +0100 [thread overview]
Message-ID: <20200225153435.17319874@redhat.com> (raw)
In-Reply-To: <20200225134236.sdz5ujufvxm2in3h@salvia>
On Tue, 25 Feb 2020 14:42:36 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Feb 25, 2020 at 02:13:46PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2020 13:39:34 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > > Hi,
> > >
> > > On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:
> > > > Hi Phil,
> > > >
> > > > On Sat, 22 Feb 2020 02:19:33 +0100
> > > > Phil Sutter <phil@nwl.cc> wrote:
> > > >
> > > > > Hi Stefano,
> > > > >
> > > > > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:
> > > > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > > >
> > > > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > > > crash this time ;):
> > > > > > >
> > > > > > > | # nft add table t
> > > > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > > > | # nft list ruleset
> > > > > > > | table ip t {
> > > > > > > | set s {
> > > > > > > | type inet_service . inet_service
> > > > > > > | flags interval
> > > > > > > | elements = { 20-30 . 40 }
> > > > > > > | }
> > > > > > > | }
> > > > > > >
> > > > > > > As you see, the second element disappears. It happens only if ranges
> > > > > > > overlap and non-range parts are identical.
> > > > > > >
> > > > > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > > > > for concatenated ranges, although I *think* range merging happens only
> > > > > > > there. So user space should cover for that already?!
> > > > > >
> > > > > > Yes. I didn't consider the need for this kind of specification, given
> > > > > > that you can obtain the same result by simply adding two elements:
> > > > > > separate, partially overlapping elements can be inserted (which is, if I
> > > > > > recall correctly, not the case for rbtree).
> > > > > >
> > > > > > If I recall correctly, we had a short discussion with Florian about
> > > > > > this, but I don't remember the conclusion.
> > > > > >
> > > > > > However, I see the ugliness, and how this breaks probably legitimate
> > > > > > expectations. I guess we could call set_to_intervals() in this case,
> > > > > > that function might need some minor adjustments.
> > > > > >
> > > > > > An alternative, and I'm not sure which one is the most desirable, would
> > > > > > be to refuse that kind of insertion.
> > > > >
> > > > > I don't think having concatenated ranges not merge even if possible is a
> > > > > problem. It's just a "nice feature" with some controversial aspects.
> > > > >
> > > > > The bug I'm reporting is that Above 'add element' command passes two
> > > > > elements to nftables but only the first one makes it into the set. If
> > > > > overlapping elements are fine in pipapo, they should both be there. If
> > > > > not (or otherwise unwanted), we better error out instead of silently
> > > > > dropping the second one.
> > > >
> > > > Indeed, I agree there's a blatant bug, I was just wondering how to
> > > > solve it.
> > >
> > > With hashtable and bitmap, adding an element that already exists is
> > > fine:
> > >
> > > nft add element x y { 1.1.1.1 }
> > > nft add element x y { 22 }
> > > nft add element x z { 1.1.1.1-2.2.2.2 }
> > > nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > >
> > > then, through 'create':
> > >
> > > nft create element x y { 1.1.1.1 }
> > > nft create element x y { 22 }
> > > nft create element x z { 1.1.1.1-2.2.2.2 }
> > > nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > >
> > > these hit EEXIST, all good.
> > >
> > > In pipapo, the following is silently ignored:
> > >
> > > nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> > > ^
> > > (just added a slightly large range). I tried _without_ these patches.
> >
> > I suppose you're doing something like:
> >
> > nft add table x
> > nft add set x k '{ type ipv4_addr . ipv4_addr ; flags interval ; }'
> > nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> >
> > in which case, yes, this is exactly the problem reported by Phil and my
> > point below: nft_pipapo_insert() reports -EEXIST on the second element,
> > but it's cleared by nft_add_set_elem().
> >
> > > In rbtree, if you try to add an interval that already exists:
> > >
> > > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > > Error: interval overlaps with an existing one
> > > add element x z { 1.1.0.0-1.1.2.254 }
> > > ^^^^^^^^^^^^^^^^^
> > > Error: Could not process rule: File exists
> > > add element x z { 1.1.0.0-1.1.2.254 }
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > This complains as an overlap.
> >
> > It doesn't for me:
>
> I'm assuming there's already:
>
> nft create element x z { 1.1.1.1-2.2.2.2 }
>
> to trigger the EEXIST, see below.
>
> > # nft add table x
> > # nft add set x z '{ type ipv4_addr ; flags interval ; }'
> > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > # echo $?
> > 0
> >
> > that is, -EEXIST from nft_rbtree_insert() is also cleared by
> > nft_add_set_elem(). Am I trying it in a different way?
>
> No, this is the same behaviour I see here:
>
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # echo $?
> 0
>
> This is fine. If you use 'create' instead, the second command gets
> EEXIST.
>
> But:
>
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # nft add element x z { 1.1.0.0-1.1.2.1 }
>
> Then, the second command hits EEXIST because of the overlap.
Okay, thanks, I see now. But this is not returned by the kernel: set
back-ends have no way to return -EEXIST on 'add' to userspace. This is
returned by set_overlap() in src/segtree.c, the element is not even
sent to the kernel.
> > > I think it's better to not extend userspace to dump the element cache
> > > for add/create, this slows down things for incremental updates
> > > (there's a ticket on bugzilla regarding this problem on the rbtree
> > > IIRC). Better if pipapo can handle this from the kernel.
> >
> > It does, the overlapping (entire or partial) is detected and
> > nft_pipapo_insert() reports -EEXIST.
>
> OK.
>
> > > There is a automerge feature in the rbtree userspace that has been
> > > controversial. This was initially turned on by default, users were
> > > reporting that this was confusing. We can probably introduce a kernel
> > > flag to turn on this automerge feature so pipapo knows user is asking
> > > to not bail out with EEXIST, instead just merge? Not sure how hard is
> > > to implement merging.
> >
> > It's doable, the problem is that with multiple ranges as different
> > fields of single elements it might be ambiguous for which fields
> > merging should be attempted.
> >
> > That is, in the '{ 20-30 . 40-50, 25-35 . 45-50 }' case below, should I
> > attempt merging 0, 1 or 2 fields? I think the 'automerge' feature would
> > be even more confusing here, assuming it can be defined at all.
>
> I'm not very fond of the automerge feature either. Some people have
> been asking for this though. I still don't understand the usecase.
I would then avoid touching that unless somebody comes up with a use
case, which in this case it will also need to suggest a specification
of how to handle this for concatenated fields.
> > > > I found out from notes with an old discussion with Florian what the
> > > > problem really was: for concatenated ranges, we might have stuff like:
> > > >
> > > > '{ 20-30 . 40-50, 25-35 . 45-50 }'
> > >
> > > I think the second element should hit EEXIST.
> >
> > Yes, I also think so, and nft_pipapo_insert() reports that.
> >
> > > > And the only sane way to handle this is as two separate elements. Also
> > > > note that I gave a rather confusing description of the behaviour with
> > > > "partially overlapping elements": what can partially overlap are ranges
> > > > within one field, but there can't be an overlap (even partial) over the
> > > > whole concatenation, because that creates ambiguity. That is,
> > > >
> > > > '{ 20-30 . 40, 25-35 . 40 }'
> > > >
> > > > the "second element" here is not allowed, while:
> > > >
> > > > '{ 20-30 . 40, 25-35 . 41 }'
> > > >
> > > > these elements both are.
> > > >
> > > > Now, this turns into a question for Pablo. I started digging a bit
> > > > further, and I think that both userspace and nft_pipapo_insert()
> > > > observe a reasonable behaviour here: nft passes those as two elements,
> > > > nft_pipapo_insert() rejects the second one with -EEXIST.
> > > >
> > > > However, in nft_add_set_elem(), we hit this:
> > > >
> > > > err = set->ops->insert(ctx->net, set, &elem, &ext2);
> > > > if (err) {
> > > > if (err == -EEXIST) {
> > > > if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> > > > nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> > > > nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> > > > nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> > > > err = -EBUSY;
> > > > goto err_element_clash;
> > > > }
> > > > if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> > > > nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> > > > memcmp(nft_set_ext_data(ext),
> > > > nft_set_ext_data(ext2), set->dlen) != 0) ||
> > > > (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> > > > nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > > > *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > > > err = -EBUSY;
> > > > else if (!(nlmsg_flags & NLM_F_EXCL))
> > > > err = 0;
> > > > }
> > > > goto err_element_clash;
> > > > }
> > > >
> > > > and, in particular, as there's no "data" or "objref" extension
> > > > associated with the elements, we hit the:
> > > >
> > > > if (!(nlmsg_flags & NLM_F_EXCL))
> > > >
> > > > clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> > > > NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> > > > return success, but the set back-end indicated a problem.
> > > >
> > > > Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> > > > already present, even though I'd give RFC 3549 a different
> > > > interpretation (we won't replace the entry, but we should still report
> > > > the error IMHO), I see why we might return success in this case.
> > > >
> > > > The additional problem with concatenated ranges here is that the entry
> > > > might be conflicting (overlapping over the whole concatenation), but
> > > > not be the same as the user wants to insert. I think -EEXIST is the
> > > > code that still fits best in this case, so... do you see a better
> > > > alternative than changing nft_pipapo_insert() to return, say, -EINVAL?
> > >
> > > Please, no -EINVAL, it's very overloaded and I think we should only
> > > use this for missing netlink attributes / malformed netlink message.
> > >
> > > If I understood your reasoning, I agree -EEXIST for an overlapping
> > > element is fine, even if NLM_F_EXCL is not set.
> >
> > So you're suggesting that I change nft_add_set_elem() like this:
> >
> > @@ -5075,8 +5075,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
> > nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > err = -EBUSY;
> > - else if (!(nlmsg_flags & NLM_F_EXCL))
> > - err = 0;
>
> IIRC, this is disabling the 'nft create element' behaviour. NLM_F_EXCL
> is not set for 'nft add element commands'.
Right, and if it's not set, set back-ends can't return -EEXIST to
userspace.
> Are you observing any inconsistency? I still don't see where the
> problem is.
This is the problem Phil reported:
> > > > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > > >
> > > > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > > > crash this time ;):
> > > > > > >
> > > > > > > | # nft add table t
> > > > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > > > | # nft list ruleset
> > > > > > > | table ip t {
> > > > > > > | set s {
> > > > > > > | type inet_service . inet_service
> > > > > > > | flags interval
> > > > > > > | elements = { 20-30 . 40 }
> > > > > > > | }
> > > > > > > | }
> > > > > > >
> > > > > > > As you see, the second element disappears. It happens only if ranges
> > > > > > > overlap and non-range parts are identical.
Or also simply with:
# nft add element t s '{ 20-30 . 40 }'
# nft add element t s '{ 25-35 . 40 }'
the second element is silently ignored. I'm returning -EEXIST from
nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
is not set.
Are you suggesting that this is consistent and therefore not a problem?
Or are you proposing that I should handle this in userspace as it's done
for non-concatenated ranges?
--
Stefano
next prev parent reply other threads:[~2020-02-25 14:34 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-21 2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
2020-02-21 2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
2020-02-21 2:04 ` [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue Stefano Brivio
2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
2020-02-21 22:22 ` Stefano Brivio
2020-02-22 1:19 ` Phil Sutter
2020-02-23 21:22 ` Stefano Brivio
2020-02-25 12:39 ` Pablo Neira Ayuso
2020-02-25 12:45 ` Stefano Brivio
2020-02-25 13:13 ` Stefano Brivio
2020-02-25 13:42 ` Pablo Neira Ayuso
2020-02-25 14:34 ` Stefano Brivio [this message]
2020-02-25 18:48 ` Phil Sutter
2020-02-25 19:33 ` Stefano Brivio
2020-02-25 20:21 ` Pablo Neira Ayuso
2020-02-25 20:38 ` Stefano Brivio
2020-02-25 20:58 ` Pablo Neira Ayuso
2020-02-26 10:58 ` Pablo Neira Ayuso
2020-02-26 11:01 ` Pablo Neira Ayuso
2020-02-26 11:02 ` Stefano Brivio
2020-02-26 11:29 ` Pablo Neira Ayuso
2020-02-26 11:36 ` Stefano Brivio
2020-02-26 11:53 ` Pablo Neira Ayuso
2020-02-26 10:59 ` Stefano Brivio
2020-02-26 11:10 ` Pablo Neira Ayuso
2020-02-26 11:19 ` Stefano Brivio
2020-02-26 11:34 ` Pablo Neira Ayuso
2020-02-26 11:39 ` Stefano Brivio
2020-02-26 11:54 ` Stefano Brivio
2020-02-26 12:10 ` Pablo Neira Ayuso
2020-02-26 13:33 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200225153435.17319874@redhat.com \
--to=sbrivio@redhat.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=phil@nwl.cc \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.