From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org, arturo.borrero.glez@gmail.com
Subject: Re: [PATCH] netfilter: nf_tables: fix racy rule deletion
Date: Sun, 26 Jan 2014 09:54:46 +0100 [thread overview]
Message-ID: <20140126085446.GA4130@localhost> (raw)
In-Reply-To: <20140125171451.GA11373@macbook.localnet>
[-- Attachment #1: Type: text/plain, Size: 4260 bytes --]
On Sat, Jan 25, 2014 at 05:14:51PM +0000, Patrick McHardy wrote:
> On Sat, Jan 25, 2014 at 04:35:33PM +0000, Patrick McHardy wrote:
> > On Sat, Jan 25, 2014 at 01:55:52PM +0000, Patrick McHardy wrote:
> > > On Sat, Jan 25, 2014 at 02:03:51PM +0100, Pablo Neira Ayuso wrote:
> > > > We still have a bug somewhere else. When creating 10000 rules like:
> > > > tcp dport { 22, 23 }, I can see more than 10000 sets.
> > > >
> > > > # ./nft-set-get ip | wc -l
> > > > 10016
> > > >
> > > > It seems set 511 is not being used. See below:
> > > >
> > > > # ./nft-rule-get
> > > > ip filter output 513 512
> > > > [ payload load 1b @ network header + 9 => reg 1 ]
> > > > [ cmp eq reg 1 0x00000006 ]
> > > > [ payload load 2b @ transport header + 2 => reg 1 ]
> > > > [ lookup reg 1 set set510 ]
> > > > [ counter pkts 0 bytes 0 ]
> > > >
> > > > ip filter output 514 513
> > > > [ payload load 1b @ network header + 9 => reg 1 ]
> > > > [ cmp eq reg 1 0x00000006 ]
> > > > [ payload load 2b @ transport header + 2 => reg 1 ]
> > > > [ lookup reg 1 set set512 ]
> > > > [ counter pkts 0 bytes 0 ]
> > > >
> > > > It seems to happen every 512 sets are added. Still investigating, so
> > > > this needs a second follow up patch to resolve what Arturo is reporting.
> > >
> > > Yeah, we seem to have a couple of bugs in nf_tables_set_alloc_name().
> > > I'll fix them up and will then have a look at this patch.
> >
> > I can't reproduce the gaps in the name space, but we have an obvious
> > overflow since we're using BITS_PER_LONG * PAGE_SIZE instead of BITS_PER_BYTE.
> >
> > This shouldn't have affected your test case though since the overflow only
> > happens for more than 32768 sets.
> >
>
> As a start, please try this patch. It fixes the overflow, might also
> fix your problem.
>
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 9ce3053..e8c7437 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -1989,13 +1992,13 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
>
> if (!sscanf(i->name, name, &tmp))
> continue;
> - if (tmp < 0 || tmp > BITS_PER_LONG * PAGE_SIZE)
> + if (tmp < 0 || tmp >= BITS_PER_BYTE * PAGE_SIZE)
> continue;
>
> set_bit(tmp, inuse);
> }
>
> - n = find_first_zero_bit(inuse, BITS_PER_LONG * PAGE_SIZE);
> + n = find_first_zero_bit(inuse, BITS_PER_BYTE * PAGE_SIZE);
> free_page((unsigned long)inuse);
> }
>
Tested this patch, it works fine here, I hit -EMFILE with 32768 sets
with no crashes.
The problem I was reporting was different though, I found a bug in the
batching code of libmnl. The mnl_nlmsg_batch_next function was not
accounting the last message not fitting in the batch.
With my patch + libmnl patch I can perform:
nft -f pablo-lots-test; nft flush table filter; nft delete chain filter output; nft delete table filter
without seeing unused anonymous sets left attached to the table and
-EBUSY problems in that table.
> Another thing is that our name allocation algorithm really sucks. It
> was copied from dev_alloc_name(), but network device allocation doesn't
> happen on the same scale as we might have. I'm considering switching to
> something taking O(1). Basically, the name allocation is only useful for
> anonymous sets anyway since in all other cases you need to manually populate
> them. So if we switch to a prefix string that can't clash with user defined
> names, we can simply use an incrementing 64 bit counter. So my
> proposal would be to just use names starting with \0. Alternatively use a
> handle instead of a name for anonymous sets.
>
> The second upside is that its not possible anymore for the user to run
> into unexpected EEXIST when using setN or mapN as name.
>
> Thoughts?
I like the u64 handle for anonymous sets, it's similar to what we do
with other objects, we get O(1) handle allocation.
I think we can allow both u64 and set%d, map%d. The kernel can check
if the handle is available first, if not check if the name looks like
set%d, map%d (so the the maximum number of sets limitation only
applies to that case). Userspace only needs to send both set%d and the
u64 handle.
Would you be OK with that?
[-- Attachment #2: libmnl.patch --]
[-- Type: text/x-diff, Size: 533 bytes --]
diff --git a/src/nlmsg.c b/src/nlmsg.c
index fdb7af8..0a414a7 100644
--- a/src/nlmsg.c
+++ b/src/nlmsg.c
@@ -484,14 +484,15 @@ EXPORT_SYMBOL(mnl_nlmsg_batch_stop);
bool mnl_nlmsg_batch_next(struct mnl_nlmsg_batch *b)
{
struct nlmsghdr *nlh = b->cur;
+ bool ret = true;
if (b->buflen + nlh->nlmsg_len > b->limit) {
b->overflow = true;
- return false;
+ ret = false;
}
b->cur = b->buf + b->buflen + nlh->nlmsg_len;
b->buflen += nlh->nlmsg_len;
- return true;
+ return ret;
}
EXPORT_SYMBOL(mnl_nlmsg_batch_next);
next prev parent reply other threads:[~2014-01-26 8:55 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-25 13:03 [PATCH] netfilter: nf_tables: fix racy rule deletion Pablo Neira Ayuso
2014-01-25 13:55 ` Patrick McHardy
2014-01-25 16:35 ` Patrick McHardy
2014-01-25 17:14 ` Patrick McHardy
2014-01-26 8:54 ` Pablo Neira Ayuso [this message]
2014-01-26 12:23 ` Patrick McHardy
2014-02-05 23:02 ` Pablo Neira Ayuso
2014-02-05 15:48 ` Patrick McHardy
2014-02-05 16:38 ` Pablo Neira Ayuso
2014-02-05 16:48 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140126085446.GA4130@localhost \
--to=pablo@netfilter.org \
--cc=arturo.borrero.glez@gmail.com \
--cc=kaber@trash.net \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).