From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org, arturo.borrero.glez@gmail.com
Subject: Re: [PATCH] netfilter: nf_tables: fix racy rule deletion
Date: Sun, 26 Jan 2014 09:54:46 +0100 [thread overview]
Message-ID: <20140126085446.GA4130@localhost> (raw)
In-Reply-To: <20140125171451.GA11373@macbook.localnet>
[-- Attachment #1: Type: text/plain, Size: 4260 bytes --]
On Sat, Jan 25, 2014 at 05:14:51PM +0000, Patrick McHardy wrote:
> On Sat, Jan 25, 2014 at 04:35:33PM +0000, Patrick McHardy wrote:
> > On Sat, Jan 25, 2014 at 01:55:52PM +0000, Patrick McHardy wrote:
> > > On Sat, Jan 25, 2014 at 02:03:51PM +0100, Pablo Neira Ayuso wrote:
> > > > We still have a bug somewhere else. When creating 10000 rules like:
> > > > tcp dport { 22, 23 }, I can see more than 10000 sets.
> > > >
> > > > # ./nft-set-get ip | wc -l
> > > > 10016
> > > >
> > > > It seems set 511 is not being used. See below:
> > > >
> > > > # ./nft-rule-get
> > > > ip filter output 513 512
> > > > [ payload load 1b @ network header + 9 => reg 1 ]
> > > > [ cmp eq reg 1 0x00000006 ]
> > > > [ payload load 2b @ transport header + 2 => reg 1 ]
> > > > [ lookup reg 1 set set510 ]
> > > > [ counter pkts 0 bytes 0 ]
> > > >
> > > > ip filter output 514 513
> > > > [ payload load 1b @ network header + 9 => reg 1 ]
> > > > [ cmp eq reg 1 0x00000006 ]
> > > > [ payload load 2b @ transport header + 2 => reg 1 ]
> > > > [ lookup reg 1 set set512 ]
> > > > [ counter pkts 0 bytes 0 ]
> > > >
> > > > It seems to happen every 512 sets are added. Still investigating, so
> > > > this needs a second follow up patch to resolve what Arturo is reporting.
> > >
> > > Yeah, we seem to have a couple of bugs in nf_tables_set_alloc_name().
> > > I'll fix them up and will then have a look at this patch.
> >
> > I can't reproduce the gaps in the name space, but we have an obvious
> > overflow since we're using BITS_PER_LONG * PAGE_SIZE instead of BITS_PER_BYTE.
> >
> > This shouldn't have affected your test case though since the overflow only
> > happens for more than 32768 sets.
> >
>
> As a start, please try this patch. It fixes the overflow, might also
> fix your problem.
>
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 9ce3053..e8c7437 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -1989,13 +1992,13 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
>
> if (!sscanf(i->name, name, &tmp))
> continue;
> - if (tmp < 0 || tmp > BITS_PER_LONG * PAGE_SIZE)
> + if (tmp < 0 || tmp >= BITS_PER_BYTE * PAGE_SIZE)
> continue;
>
> set_bit(tmp, inuse);
> }
>
> - n = find_first_zero_bit(inuse, BITS_PER_LONG * PAGE_SIZE);
> + n = find_first_zero_bit(inuse, BITS_PER_BYTE * PAGE_SIZE);
> free_page((unsigned long)inuse);
> }
>
Tested this patch, it works fine here, I hit -EMFILE with 32768 sets
with no crashes.
The problem I was reporting was different though, I found a bug in the
batching code of libmnl. The mnl_nlmsg_batch_next function was not
accounting the last message not fitting in the batch.
With my patch + libmnl patch I can perform:
nft -f pablo-lots-test; nft flush table filter; nft delete chain filter output; nft delete table filter
without seeing unused anonymous sets left attached to the table and
-EBUSY problems in that table.
> Another thing is that our name allocation algorithm really sucks. It
> was copied from dev_alloc_name(), but network device allocation doesn't
> happen on the same scale as we might have. I'm considering switching to
> something taking O(1). Basically, the name allocation is only useful for
> anonymous sets anyway since in all other cases you need to manually populate
> them. So if we switch to a prefix string that can't clash with user defined
> names, we can simply use an incrementing 64 bit counter. So my
> proposal would be to just use names starting with \0. Alternatively use a
> handle instead of a name for anonymous sets.
>
> The second upside is that its not possible anymore for the user to run
> into unexpected EEXIST when using setN or mapN as name.
>
> Thoughts?
I like the u64 handle for anonymous sets, it's similar to what we do
with other objects, we get O(1) handle allocation.
I think we can allow both u64 and set%d, map%d. The kernel can check
if the handle is available first, if not check if the name looks like
set%d, map%d (so the the maximum number of sets limitation only
applies to that case). Userspace only needs to send both set%d and the
u64 handle.
Would you be OK with that?
[-- Attachment #2: libmnl.patch --]
[-- Type: text/x-diff, Size: 533 bytes --]
diff --git a/src/nlmsg.c b/src/nlmsg.c
index fdb7af8..0a414a7 100644
--- a/src/nlmsg.c
+++ b/src/nlmsg.c
@@ -484,14 +484,15 @@ EXPORT_SYMBOL(mnl_nlmsg_batch_stop);
bool mnl_nlmsg_batch_next(struct mnl_nlmsg_batch *b)
{
struct nlmsghdr *nlh = b->cur;
+ bool ret = true;
if (b->buflen + nlh->nlmsg_len > b->limit) {
b->overflow = true;
- return false;
+ ret = false;
}
b->cur = b->buf + b->buflen + nlh->nlmsg_len;
b->buflen += nlh->nlmsg_len;
- return true;
+ return ret;
}
EXPORT_SYMBOL(mnl_nlmsg_batch_next);
next prev parent reply other threads:[~2014-01-26 8:55 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-25 13:03 [PATCH] netfilter: nf_tables: fix racy rule deletion Pablo Neira Ayuso
2014-01-25 13:55 ` Patrick McHardy
2014-01-25 16:35 ` Patrick McHardy
2014-01-25 17:14 ` Patrick McHardy
2014-01-26 8:54 ` Pablo Neira Ayuso [this message]
2014-01-26 12:23 ` Patrick McHardy
2014-02-05 23:02 ` Pablo Neira Ayuso
2014-02-05 15:48 ` Patrick McHardy
2014-02-05 16:38 ` Pablo Neira Ayuso
2014-02-05 16:48 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140126085446.GA4130@localhost \
--to=pablo@netfilter.org \
--cc=arturo.borrero.glez@gmail.com \
--cc=kaber@trash.net \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.