netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: "Pablo Neira Ayuso" <pablo@netfilter.org>,
	netfilter-devel@vger.kernel.org,
	"Kadlecsik József" <kadlec@blackhole.kfki.hu>,
	"Eric Garver" <eric@garver.life>, "Phil Sutter" <phil@nwl.cc>,
	"Sabrina Dubroca" <sd@queasysnail.net>,
	"Jay Ligatti" <ligatti@usf.edu>,
	"Ori Rottenstreich" <or@cs.technion.ac.il>,
	"Kirill Kogan" <kirill.kogan@gmail.com>
Subject: Re: [PATCH nf-next 3/8] nf_tables: Add set type for arbitrary concatenation of ranges
Date: Thu, 21 Nov 2019 20:54:42 +0100	[thread overview]
Message-ID: <20191121205442.5eb3d113@redhat.com> (raw)
In-Reply-To: <20191120150609.GB20235@breakpoint.cc>

On Wed, 20 Nov 2019 16:06:09 +0100
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> > +static bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
> > +			      const u32 *key, const struct nft_set_ext **ext)
> > +{
> > +	struct nft_pipapo *priv = nft_set_priv(set);
> > +	unsigned long *res_map, *fill_map;
> > +	u8 genmask = nft_genmask_cur(net);
> > +	const u8 *rp = (const u8 *)key;
> > +	struct nft_pipapo_match *m;
> > +	struct nft_pipapo_field *f;
> > +	bool map_index;
> > +	int i;
> > +
> > +	map_index = raw_cpu_read(nft_pipapo_scratch_index);  
> 
> I'm afraid this will need local_bh_disable to prevent reentry from
> softirq processing.

I'm afraid you're right and yes, not just this: from here to the point
where we're done using scratch maps or their index. Adding in v2.

Well, at least vectorised versions for (at least) x86, ARM and s390x
won't have any overhead from it as they will already do that with
kernel_fpu_begin()/kernel_neon_begin().

> > +	rcu_read_lock();  
> 
> All netfilter hooks run inside rcu read section, so this isn't needed.

Dropping in v2.

> > +static int pipapo_realloc_scratch(struct nft_pipapo_match *m,
> > +				  unsigned long bsize_max)
> > +{
> > +	int i;
> > +
> > +	for_each_possible_cpu(i) {
> > +		unsigned long *scratch;
> > +
> > +		scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2,
> > +				       GFP_KERNEL, cpu_to_node(i));
> > +		if (!scratch)
> > +			return -ENOMEM;  
> 
> No need to handle partial failures on the other cpu / no rollback?
> AFAICS ->destroy will handle it correctly, i.e. next insertion may
> enter this again and allocate a same-sized chunk, so AFAICS its fine.

There's no need because this is just called on insertion, so the new
scratch maps will be bigger than the previous ones, and if only some
allocations here succeed, that means some CPUs have a bigger allocated
map, but the element is not inserted and the extra room is not used,
because the caller won't update m->bsize_max.

> But still, it looks odd -- perhaps add a comment that there is no need
> to rollback earlier allocs.

Sure, added.

> > +
> > +		kfree(*per_cpu_ptr(m->scratch, i));  
> 
> I was about to ask what would prevent nft_pipapo_lookup() from accessing
> m->scratch.  Its because "m" is the private clone.  Perhaps add a
> comment here to that effect.

I renamed 'm' to 'clone' and updated kerneldoc header, I think it's
even clearer than a comment that way.

> > + * @net:	Network namespace
> > + * @set:	nftables API set representation
> > + * @elem:	nftables API element representation containing key data
> > + * @flags:	If NFT_SET_ELEM_INTERVAL_END is passed, this is the end element
> > + * @ext2:	Filled with pointer to &struct nft_set_ext in inserted element
> > + *
> > + * In this set implementation, this functions needs to be called twice, with
> > + * start and end element, to obtain a valid entry insertion. Calls to this
> > + * function are serialised, so we can store element and key data on the first
> > + * call with start element, and use it on the second call once we get the end
> > + * element too.  
> 
> What guaranttess this?

Well, the only guarantee that I'm expecting here is that the insert
function is not called concurrently in the same namespace, and as far
as I understand that comes from nf_tables_valid_genid(). However:

> AFAICS userspace could send a single element, with either
> NFT_SET_ELEM_INTERVAL_END, or only the start element.

this is all possible, and:

- for a single element with NFT_SET_ELEM_INTERVAL_END, we'll reuse the
  last 'start' element ever seen, or an all-zero key if no 'start'
  elements were seen at all

- for a single 'start' element, no element is added

If the user chooses to configure firewalling with syzbot, my assumption
is that all we have to do is to avoid crashing or leaking anything.

We could opt to be stricter indeed, by checking that a single netlink
batch contains a corresponding number of start and end elements. This
can't be done by the insert function though, we don't have enough
context there.

A possible solution might be to implement a ->validate() callback
similar to what's done for chains -- or maybe export the context to
insert functions so that we can relate stuff to portid/seq.

Do you think it's worth it? In some sense, this should already be all
consistent and safe.

-- 
Stefano


  reply	other threads:[~2019-11-21 19:55 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-19  1:06 [PATCH nf-next 0/8] nftables: Set implementation for arbitrary concatenation of ranges Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 1/8] nf_tables: Support for subkeys, set with multiple ranged fields Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 2/8] bitmap: Introduce bitmap_cut(): cut bits and shift remaining Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 3/8] nf_tables: Add set type for arbitrary concatenation of ranges Stefano Brivio
2019-11-20 15:06   ` Florian Westphal
2019-11-21 19:54     ` Stefano Brivio [this message]
2019-11-21 20:41       ` Florian Westphal
2019-11-21 21:00         ` Stefano Brivio
2019-11-22 13:39           ` Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 4/8] selftests: netfilter: Introduce tests for sets with range concatenation Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 5/8] nft_set_pipapo: Provide unrolled lookup loops for common field sizes Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 6/8] nft_set_pipapo: Prepare for vectorised implementation: alignment Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 7/8] nft_set_pipapo: Prepare for vectorised implementation: helpers Stefano Brivio
2019-11-19  1:06 ` [PATCH nf-next 8/8] nft_set_pipapo: Introduce AVX2-based lookup implementation Stefano Brivio
2019-11-20 15:16   ` Florian Westphal
2019-11-20 16:08     ` Phil Sutter
2019-11-21 19:55       ` Stefano Brivio
2019-11-21 20:22         ` Pablo Neira Ayuso
2019-11-21 20:46           ` Florian Westphal
2019-11-21 20:54             ` Pablo Neira Ayuso
2019-11-21 20:56               ` Pablo Neira Ayuso
2019-11-21 20:51     ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191121205442.5eb3d113@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=eric@garver.life \
    --cc=fw@strlen.de \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=kirill.kogan@gmail.com \
    --cc=ligatti@usf.edu \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=or@cs.technion.ac.il \
    --cc=pablo@netfilter.org \
    --cc=phil@nwl.cc \
    --cc=sd@queasysnail.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).