netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* nftables conntrack set ops for zone, helper assignment, etc.
@ 2017-01-11  9:17 Florian Westphal
  2017-01-11 20:07 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2017-01-11  9:17 UTC (permalink / raw)
  To: netfilter-devel; +Cc: christophe.leroy

[ CCing Christophe wrt. nft helper assigment ]

Hi.

to make bridge conntrack a reality I need a way to assign packets to conntrack zones.

Whatever solution is chosen, it should allow to eventually implement all of the CT
target features, namely:
       --helper name
       --ctevents event[,...]
       --expevents event[,...]
       --zone-orig {id|mark}
       --zone-reply {id|mark}
       --zone {id|mark}
       --timeout name

Unfortunately, its not that simple to do this with nft.
With CT target, all options are passed to single CT instance so all
information is available at checkentry (config) time.

Once the target function was run, skb->nfct is set to the template,
next -j CT (if any) is a no-op (XT_CONTINUE if skb->nfct != NULL).

For nft, it would be nice to use the 'set' syntax, i.e.

nft ... ct helper set ftp
nft ... ct zone set 4
nft ... ct timeout set policyname
nft ... ct ctevents set new|destroy|mark   [1]

Seems simple enough.  BUT:
now consider case where we want to set both helper and zone:

nft ... ct helper set ftp ct zone set 4

nft_ct only works with a single key, and we have no way to "propagate"
or "chain" the helper and the zone set operation.

We also don't know that the 'ct zone set 4' is the 'last' action,
where we're "done" with said skb (i.e. skb->nfct template is fully
set up the way we want).

Now, lets consider using nft features like maps:

nft ... ct zone set vlan id map { 1 : 1, 2 : 2, }
nft ... ct original zone set meta mark
nft ... ct helper set tcp dport map { 21 : ftp, 2121 : ftp, 12345 : sip }

... which would require that template instantiation happens from packet
path.

I'd argue that #3 above is not all that important but ability to set
zone from mark, vlan id etc at runtime seems too nice to ignore.

I've thought about ways to implement this.

AFAICS, zone id is the only attribute that needs to be propagated and
used in nf_conntrack_in() [because it influences conntrack lookup].

All other attibutes could be done at any later point in time provided it
occurs before ct confirmation, i.e. we could operate on the actual conntrack
and not a template.

So, I propose following solution:

nft_ct gets two new ops:

1) a template set op, used when we have sreg + ZONE key (zone key is to be
added).

2) a 'unconfirmed set' op, used when we have sreg and
one of timeout, helper, ctevents, expevents key (the latter two don't exist
either yet could be added easily).

This assumes that allmost all ct keys are readonly for confirmed/real
conntracks, otherwise we'd need syntax to indicate that we want to operate
on template/unconfirmed entry (plus netlink flag to indicate this to
kernel).

For 1), eval would work similar to this:
  fetch skb->nfct.
  If its a valid conntrack, bail (nothing to do).

  If its NULL, attach a percpu template scratch space iff
  that template scratch space has refcount of one.
  Otherwise, need to allocate fresh copy (we would not
  have this problem if we disallow nfqueue of nfct templates but I don't
  see how this could be done without compat breakage).
  Alternatively we could force this duplication in the nfqueue backend
  or disallow queueing skbs where nfct is a template.

In pseudocode:
template_unused(nfct):
  return refcount(nfct) == 1;

eval(skb):
 nfct = skb->nfct;
 if (!nfct) {
    nfct = percpu(this_template);
    if (!template_unused(tmpl)))
       nfct = ALLOC();

 /* nfct now either percpu cached object or newly alloc'd */
 nfct->zone = value;
 inc_refcount(nfct);
 skb->nfct = nfct;

For 2), do following:
 fetch skb->nfct.  If its a confirmed conntrack and key set operation
 works with confirmed conntracks, do the set op, else bail (can't
 expand/change extension area for confirmed ones, but ctevent mask
 manipulation would work for instance).
 If its NULL or a template, call nf_conntrack_in to obtain skb->nfct

 If still NULL, bail (invalid skb).  Otherwise, do the set operation.

 In pseudocode:
 doit(skb, key, value):
   if key == helper:
	if skb->nfct is unconfirmed:
	  add helper extension and return

   else set key/value of skb->nfct

 eval(skb):
   if (skb->nfct == NULL or template)
	skb->nfct = nf_conntrack_in(skb, ...)
	if (skb->nfct != NULL)
	  doit(skb)
   else
     doit(skb)

Problems with above approach:

1).  nft ct $key set $expr

... would behave differently depending on the key:

Zone key would not work for real conntracks (can't change zone at later
point).
Other keys would do conntrack lookups and assign skb->nfct. I think we
could change existing label/mark set support to perform a conntrack
lookup too if conntrack is not yet assigned to alleviate this, otherwise
it might be a bit confusion to users...

2). The templating scheme used now is rather unfriendly when
we need to reset the percpu scratch space. Not a big deal if we
only have to cope with zones since that no longer uses the extension area,
i.e. our template object would really just be a very bloated way to
pass the zone information.

3). nfqueue.  We can infer its presence with refcount test
on the percpu template, if its > 1, template is still assigned to another
skb.  If it happens we eat cost of additional kmalloc/free.
I think this isn't a big deal though.

4). Doing helper and template assignment requires lookups in packet path,
however I think we can make such lookups faster if needed and it would
only happen for unconfirmed/new conntracks.

5) helper autoload won't work from packet path.
   I'd propose to work around this by allowing periodic 'modprobes' from
   a work queue.
   Alternatively we could offload this job to userspace (nft should be
   able to figure out the needed modules too).

6).

A consequence of such a design would be that this works:
nf .. ct zone set 42 ct timeout set bla

The first part, ct zone set 42, would set template to zone 42, and set
skb->nfct to a template

The second part would do a conntrack lookup with the zone provided by
the template call nf_conntrack_in(skb), with the zone provided, and assign
skb->nfct to the *real* conntrack, and set timeout policy to the one
provided.  Problem is that things won't work when order is switched, i.e.:

nf .. ct timeout set bla ct zone set 42

(first we do a conntrack lookup in default zone, then fail
 to set the zone because skb->nfct is already set to a non-templated
 conntrack).

In case noone reports obvious showstoppers I'd explore an implementation
of the above scheme since I believe the advantages outweight the problems.

Footnotes:

[1], yuck, this creates again a syntax problem with meta keyword, because
     "ctevents set 'mark|new' refers to the IPCT event named 'mark', not the meta mark
     keyword, but we can't represent this ... :-(
     We will probably have to use a different word, e.g. 'ctmark' instead here, or force
     users to use "" quotes...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-11  9:17 nftables conntrack set ops for zone, helper assignment, etc Florian Westphal
@ 2017-01-11 20:07 ` Pablo Neira Ayuso
  2017-01-11 23:01   ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-11 20:07 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy

Hi Florian,

On Wed, Jan 11, 2017 at 10:17:21AM +0100, Florian Westphal wrote:
> [ CCing Christophe wrt. nft helper assigment ]
> 
> Hi.
> 
> to make bridge conntrack a reality I need a way to assign packets to conntrack zones.
> 
> Whatever solution is chosen, it should allow to eventually implement all of the CT
> target features, namely:
>        --helper name
>        --ctevents event[,...]
>        --expevents event[,...]
>        --zone-orig {id|mark}
>        --zone-reply {id|mark}
>        --zone {id|mark}
>        --timeout name
> 
> Unfortunately, its not that simple to do this with nft.
> With CT target, all options are passed to single CT instance so all
> information is available at checkentry (config) time.
> 
> Once the target function was run, skb->nfct is set to the template,
> next -j CT (if any) is a no-op (XT_CONTINUE if skb->nfct != NULL).
> 
> For nft, it would be nice to use the 'set' syntax, i.e.
> 
> nft ... ct helper set ftp
> nft ... ct zone set 4
> nft ... ct timeout set policyname
> nft ... ct ctevents set new|destroy|mark   [1]
> 
> Seems simple enough.  BUT:
> now consider case where we want to set both helper and zone:
> 
> nft ... ct helper set ftp ct zone set 4
> 
> nft_ct only works with a single key, and we have no way to "propagate"
> or "chain" the helper and the zone set operation.
> 
> We also don't know that the 'ct zone set 4' is the 'last' action,
> where we're "done" with said skb (i.e. skb->nfct template is fully
> set up the way we want).
> 
> Now, lets consider using nft features like maps:
> 
> nft ... ct zone set vlan id map { 1 : 1, 2 : 2, }
> nft ... ct original zone set meta mark
> nft ... ct helper set tcp dport map { 21 : ftp, 2121 : ftp, 12345 : sip }
> 
> ... which would require that template instantiation happens from packet
> path.
> 
> I'd argue that #3 above is not all that important but ability to set
> zone from mark, vlan id etc at runtime seems too nice to ignore.
> 
> I've thought about ways to implement this.
> 
> AFAICS, zone id is the only attribute that needs to be propagated and
> used in nf_conntrack_in() [because it influences conntrack lookup].

Right, for custom timeouts, we can modify the original default timeout
originally set via proto->new() later on.

> All other attibutes could be done at any later point in time provided it
> occurs before ct confirmation, i.e. we could operate on the actual conntrack
> and not a template.
> 
> So, I propose following solution:
> 
> nft_ct gets two new ops:
> 
> 1) a template set op, used when we have sreg + ZONE key (zone key is to be
> added).
> 
> 2) a 'unconfirmed set' op, used when we have sreg and
> one of timeout, helper, ctevents, expevents key (the latter two don't exist
> either yet could be added easily).
> 
> This assumes that allmost all ct keys are readonly for confirmed/real
> conntracks, otherwise we'd need syntax to indicate that we want to operate
> on template/unconfirmed entry (plus netlink flag to indicate this to
> kernel).
> 
> For 1), eval would work similar to this:
>   fetch skb->nfct.
>   If its a valid conntrack, bail (nothing to do).
> 
>   If its NULL, attach a percpu template scratch space iff
>   that template scratch space has refcount of one.
>   Otherwise, need to allocate fresh copy (we would not
>   have this problem if we disallow nfqueue of nfct templates but I don't
>   see how this could be done without compat breakage).
>   Alternatively we could force this duplication in the nfqueue backend
>   or disallow queueing skbs where nfct is a template.
> 
> In pseudocode:
> template_unused(nfct):
>   return refcount(nfct) == 1;
> 
> eval(skb):
>  nfct = skb->nfct;
>  if (!nfct) {
>     nfct = percpu(this_template);
>     if (!template_unused(tmpl)))
>        nfct = ALLOC();
> 
>  /* nfct now either percpu cached object or newly alloc'd */
>  nfct->zone = value;
>  inc_refcount(nfct);
>  skb->nfct = nfct;
> 
> For 2), do following:
>  fetch skb->nfct.  If its a confirmed conntrack and key set operation
>  works with confirmed conntracks, do the set op, else bail (can't
>  expand/change extension area for confirmed ones, but ctevent mask
>  manipulation would work for instance).
>  If its NULL or a template, call nf_conntrack_in to obtain skb->nfct
> 
>  If still NULL, bail (invalid skb).  Otherwise, do the set operation.
> 
>  In pseudocode:
>  doit(skb, key, value):
>    if key == helper:
> 	if skb->nfct is unconfirmed:
> 	  add helper extension and return
> 
>    else set key/value of skb->nfct
> 
>  eval(skb):
>    if (skb->nfct == NULL or template)
> 	skb->nfct = nf_conntrack_in(skb, ...)
> 	if (skb->nfct != NULL)
> 	  doit(skb)
>    else
>      doit(skb)
> 
> Problems with above approach:
> 
> 1).  nft ct $key set $expr
> 
> ... would behave differently depending on the key:
> 
> Zone key would not work for real conntracks (can't change zone at later
> point).
> Other keys would do conntrack lookups and assign skb->nfct. I think we
> could change existing label/mark set support to perform a conntrack
> lookup too if conntrack is not yet assigned to alleviate this, otherwise
> it might be a bit confusion to users...

We still need the existing template behaviour around for iptables.

> 2). The templating scheme used now is rather unfriendly when
> we need to reset the percpu scratch space. Not a big deal if we
> only have to cope with zones since that no longer uses the extension area,
> i.e. our template object would really just be a very bloated way to
> pass the zone information.

I like the percpu scratch approach. Can we probably have a smaller
object, so the skb->nfct interpretation depends on skb->nfctinfo so it
tells us this is a template? I know you're sending a patch to remove
this field, also asking if we could fit in such info in the new
approach.

> 3). nfqueue.  We can infer its presence with refcount test
> on the percpu template, if its > 1, template is still assigned to another
> skb.  If it happens we eat cost of additional kmalloc/free.
> I think this isn't a big deal though.

Right, we have to deal with the escape path.

> 4). Doing helper and template assignment requires lookups in packet path,
> however I think we can make such lookups faster if needed and it would
> only happen for unconfirmed/new conntracks.
>
> 5) helper autoload won't work from packet path.

Another problem we mentioned before is that we would need to extend
nft_ct to include a layer 4 protocol attribute specific to look up for
helpers, which we agreed is ugly.

>    I'd propose to work around this by allowing periodic 'modprobes' from
>    a work queue.
>    Alternatively we could offload this job to userspace (nft should be
>    able to figure out the needed modules too).
> 6).
> 
> A consequence of such a design would be that this works:
> nf .. ct zone set 42 ct timeout set bla
> 
> The first part, ct zone set 42, would set template to zone 42, and set
> skb->nfct to a template
> 
> The second part would do a conntrack lookup with the zone provided by
> the template call nf_conntrack_in(skb), with the zone provided, and assign
> skb->nfct to the *real* conntrack, and set timeout policy to the one
> provided.  Problem is that things won't work when order is switched, i.e.:
> 
> nf .. ct timeout set bla ct zone set 42
> 
> (first we do a conntrack lookup in default zone, then fail
>  to set the zone because skb->nfct is already set to a non-templated
>  conntrack).
> 
> In case noone reports obvious showstoppers I'd explore an implementation
> of the above scheme since I believe the advantages outweight the problems.

I think it should be possible to reuse the new generic object
infrastructure to add new stateless objects, eg.

table ip x {
        ct helper "ftp-standard" { # [1]
                type "ftp"
                protocol tcp
                #
                # Other configuration options:
                #
                # maximum-expectations 3
                # expectation-timeout 300
                #
                # Other specific ftp helper options:
                #
                # loose         # instead of enabling this via modparam?
                #
        }

        chain y {
                type filter hook prerouting priority -250;

                tcp dport 21 ct helper set "ftp-standard"
        }
}

So [1] creates a ftp-standard object of helper type. This object needs
some specific configuration, such as what layer 4 protocol you want to
use and the helper type. This would also solve the module autoload
issue since this is loaded at object creation time, from the control
plane. From rules, we use the "objref" expression from rules to refer
to this object, the obj->type->eval() will just fetch the ct from the
skbuff and attach the helper that we can retrieve via the obj->data
area.

This also provides maps, since "objref" already supports this, eg.

        chain y {
                type filter hook prerouting priority 0;

                ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" }
        }

Note that this doesn't involve any string copy to the register area.
The objref expression is similar to the lookup expression, but it just
takes a _SREG to fetch the key, and then looks up for the object and
run obj->type->eval() in one go. So we don't place pointers in the
data register area since this is illegal in nf_tables.

Similarly, we can add new ct timeout objects. Actually, we could
extend struct nft_object_type so we get select_ops() there too. So we
have ct object types and different operations, eg.

table ip x {
        ct timeout "tcp-short-established" {
                established     100
                #
                # ... actually, any tcp state timeout goes here, if not
                # specified this uses default values.
                #
        }

        chain y {
                type filter hook prerouting priority -250;

                ct timeout set "tcp-short-established"
        }

In this case, we won't reuse nfnetlink_cttimeout, instead we have a
native interface for nftables. So this fits into the transaction
infrastructure that we have.

For ct zones, we can simply rely on the templates to pass it on to
nf_conntrack_in(), a ct zone object to store a single ID is probably
too much.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-11 20:07 ` Pablo Neira Ayuso
@ 2017-01-11 23:01   ` Florian Westphal
  2017-01-12 13:00     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2017-01-11 23:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, christophe.leroy

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 1).  nft ct $key set $expr
> > 
> > ... would behave differently depending on the key:
> > 
> > Zone key would not work for real conntracks (can't change zone at later
> > point).
> > Other keys would do conntrack lookups and assign skb->nfct. I think we
> > could change existing label/mark set support to perform a conntrack
> > lookup too if conntrack is not yet assigned to alleviate this, otherwise
> > it might be a bit confusion to users...
> 
> We still need the existing template behaviour around for iptables.

Can you elaborate?
I was only referring to the existing nft_ct.c eval set function
(which deals with setting ctmark and labels).

So instead of being a nop in case skb->nfct is not set (or a template)
I'd propose to do a conntrack lookup and assign skb->nfct.

I would not touch/change the CT target.

> > 2). The templating scheme used now is rather unfriendly when
> > we need to reset the percpu scratch space. Not a big deal if we
> > only have to cope with zones since that no longer uses the extension area,
> > i.e. our template object would really just be a very bloated way to
> > pass the zone information.
> 
> I like the percpu scratch approach. Can we probably have a smaller
> object, so the skb->nfct interpretation depends on skb->nfctinfo so it
> tells us this is a template? I know you're sending a patch to remove
> this field, also asking if we could fit in such info in the new
> approach.

Probably, I also thought about adding
struct nf_conn_template {
	atomic_t ref;
	u16 zone;
	void *helper;
	unsgined long *timeout;
	char pad[PAD];
	unsigned long status;
};

(templates are always allocated via kmalloc anyway so they
 could be smaller than nf_conn provided they provide the
 status flags at same offset (TEMPLATE bit is set).

Alternatively we might squeeze another marker into the 3 bits we have
in the "nfctinfo" area.

However, I don't see a real advantage in doing so (would be a bit ugly
and we'd have to make sure that all skb->nfct places check
nfct_is_template() before e.g. dereferencing extension area...

> > 3). nfqueue.  We can infer its presence with refcount test
> > on the percpu template, if its > 1, template is still assigned to another
> > skb.  If it happens we eat cost of additional kmalloc/free.
> > I think this isn't a big deal though.
> 
> Right, we have to deal with the escape path.

Jup.  Should not be too much work though.

> > 4). Doing helper and template assignment requires lookups in packet path,
> > however I think we can make such lookups faster if needed and it would
> > only happen for unconfirmed/new conntracks.
> >
> > 5) helper autoload won't work from packet path.
> 
> Another problem we mentioned before is that we would need to extend
> nft_ct to include a layer 4 protocol attribute specific to look up for
> helpers, which we agreed is ugly.

We don't need that anymore if we do helper lookups from packet path since
we can just look at the packet payload.

> > A consequence of such a design would be that this works:
> > nf .. ct zone set 42 ct timeout set bla
> > 
> > The first part, ct zone set 42, would set template to zone 42, and set
> > skb->nfct to a template
> > 
> > The second part would do a conntrack lookup with the zone provided by
> > the template call nf_conntrack_in(skb), with the zone provided, and assign
> > skb->nfct to the *real* conntrack, and set timeout policy to the one
> > provided.  Problem is that things won't work when order is switched, i.e.:
> > 
> > nf .. ct timeout set bla ct zone set 42
> > 
> > (first we do a conntrack lookup in default zone, then fail
> >  to set the zone because skb->nfct is already set to a non-templated
> >  conntrack).
> > 
> > In case noone reports obvious showstoppers I'd explore an implementation
> > of the above scheme since I believe the advantages outweight the problems.
> 
> I think it should be possible to reuse the new generic object
> infrastructure to add new stateless objects, eg.
> 
> table ip x {
>         ct helper "ftp-standard" { # [1]
>                 type "ftp"
>                 protocol tcp
>                 #
>                 # Other configuration options:
>                 #
>                 # maximum-expectations 3
>                 # expectation-timeout 300
>                 #
>                 # Other specific ftp helper options:
>                 #
>                 # loose         # instead of enabling this via modparam?

Interesting, did not think about avoiding module params completely,
I like this idea.

>         chain y {
>                 type filter hook prerouting priority -250;
> 
>                 tcp dport 21 ct helper set "ftp-standard"
>         }
> }
> 
> So [1] creates a ftp-standard object of helper type. This object needs
> some specific configuration, such as what layer 4 protocol you want to
> use and the helper type. This would also solve the module autoload
> issue since this is loaded at object creation time, from the control
> plane. From rules, we use the "objref" expression from rules to refer
> to this object, the obj->type->eval() will just fetch the ct from the
> skbuff and attach the helper that we can retrieve via the obj->data
> area.
> 
> This also provides maps, since "objref" already supports this, eg.
> 
>         chain y {
>                 type filter hook prerouting priority 0;
> 
>                 ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" }

Do you mean { 21 : "ftp-standard, 2121 : "ftp-standard" } (ie. it uses
an arbitrary "object name"?

> Note that this doesn't involve any string copy to the register area.
> The objref expression is similar to the lookup expression, but it just
> takes a _SREG to fetch the key, and then looks up for the object and
> run obj->type->eval() in one go. So we don't place pointers in the
> data register area since this is illegal in nf_tables.
> 
> Similarly, we can add new ct timeout objects. Actually, we could
> extend struct nft_object_type so we get select_ops() there too. So we
> have ct object types and different operations, eg.
> 
> table ip x {
>         ct timeout "tcp-short-established" {
>                 established     100
>                 #
>                 # ... actually, any tcp state timeout goes here, if not
>                 # specified this uses default values.
>                 #
>         }
> 
>         chain y {
>                 type filter hook prerouting priority -250;
> 
>                 ct timeout set "tcp-short-established"
>         }
> 
> In this case, we won't reuse nfnetlink_cttimeout, instead we have a
> native interface for nftables. So this fits into the transaction
> infrastructure that we have.
> 
> For ct zones, we can simply rely on the templates to pass it on to
> nf_conntrack_in(), a ct zone object to store a single ID is probably
> too much.

Hmm, but we can't use config path for that (or ability to set zone e.g.
from skb->nfmark is off the table).

Or do you suggest this:

ct zone set meta mark   # as per my proposal -- set template
ct eventmask set new	# as per my proposal -- lookup&change real conntrack
ct helper set "foo"	# as per my proposal, lookup and change real
conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb
payload and name "foo" but instead taken from objref infrastructure.
Same for timeout.

If so, that looks good to me, I will probably start by adding zone get
support, then zone set, and then look at the existing objref facilities.

(eventmask is probably simple but its not high on my "wanted" list)

Thanks,
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-11 23:01   ` Florian Westphal
@ 2017-01-12 13:00     ` Pablo Neira Ayuso
  2017-01-12 13:29       ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-12 13:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy

On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > 1).  nft ct $key set $expr
> > > 
> > > ... would behave differently depending on the key:
> > > 
> > > Zone key would not work for real conntracks (can't change zone at later
> > > point).
> > > Other keys would do conntrack lookups and assign skb->nfct. I think we
> > > could change existing label/mark set support to perform a conntrack
> > > lookup too if conntrack is not yet assigned to alleviate this, otherwise
> > > it might be a bit confusion to users...
> > 
> > We still need the existing template behaviour around for iptables.
> 
> Can you elaborate?
> I was only referring to the existing nft_ct.c eval set function
> (which deals with setting ctmark and labels).
> 
> So instead of being a nop in case skb->nfct is not set (or a template)
> I'd propose to do a conntrack lookup and assign skb->nfct.
> 
> I would not touch/change the CT target.

OK, now I see. I was wondering how this would interfer with the
existing CT target.

> > > 2). The templating scheme used now is rather unfriendly when
> > > we need to reset the percpu scratch space. Not a big deal if we
> > > only have to cope with zones since that no longer uses the extension area,
> > > i.e. our template object would really just be a very bloated way to
> > > pass the zone information.
> > 
> > I like the percpu scratch approach. Can we probably have a smaller
> > object, so the skb->nfct interpretation depends on skb->nfctinfo so it
> > tells us this is a template? I know you're sending a patch to remove
> > this field, also asking if we could fit in such info in the new
> > approach.
> 
> Probably, I also thought about adding
> struct nf_conn_template {
> 	atomic_t ref;
> 	u16 zone;
> 	void *helper;
> 	unsgined long *timeout;
> 	char pad[PAD];
> 	unsigned long status;
> };
> 
> (templates are always allocated via kmalloc anyway so they
>  could be smaller than nf_conn provided they provide the
>  status flags at same offset (TEMPLATE bit is set).
> 
> Alternatively we might squeeze another marker into the 3 bits we have
> in the "nfctinfo" area.
> 
> However, I don't see a real advantage in doing so (would be a bit ugly
> and we'd have to make sure that all skb->nfct places check
> nfct_is_template() before e.g. dereferencing extension area...

Right.

> > So [1] creates a ftp-standard object of helper type. This object needs
> > some specific configuration, such as what layer 4 protocol you want to
> > use and the helper type. This would also solve the module autoload
> > issue since this is loaded at object creation time, from the control
> > plane. From rules, we use the "objref" expression from rules to refer
> > to this object, the obj->type->eval() will just fetch the ct from the
> > skbuff and attach the helper that we can retrieve via the obj->data
> > area.
> > 
> > This also provides maps, since "objref" already supports this, eg.
> > 
> >         chain y {
> >                 type filter hook prerouting priority 0;
> > 
> >                 ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" }
> 
> Do you mean { 21 : "ftp-standard, 2121 : "ftp-standard" } (ie. it uses
> an arbitrary "object name"?

Yes. That should be "ftp-standard", any arbitrary name actually.

[...]
> > Note that this doesn't involve any string copy to the register area.
> > The objref expression is similar to the lookup expression, but it just
> > takes a _SREG to fetch the key, and then looks up for the object and
> > run obj->type->eval() in one go. So we don't place pointers in the
> > data register area since this is illegal in nf_tables.
> > 
> > Similarly, we can add new ct timeout objects. Actually, we could
> > extend struct nft_object_type so we get select_ops() there too. So we
> > have ct object types and different operations, eg.
> > 
> > table ip x {
> >         ct timeout "tcp-short-established" {
> >                 established     100
> >                 #
> >                 # ... actually, any tcp state timeout goes here, if not
> >                 # specified this uses default values.
> >                 #
> >         }
> > 
> >         chain y {
> >                 type filter hook prerouting priority -250;
> > 
> >                 ct timeout set "tcp-short-established"
> >         }
> > 
> > In this case, we won't reuse nfnetlink_cttimeout, instead we have a
> > native interface for nftables. So this fits into the transaction
> > infrastructure that we have.
> > 
> > For ct zones, we can simply rely on the templates to pass it on to
> > nf_conntrack_in(), a ct zone object to store a single ID is probably
> > too much.
> 
> Hmm, but we can't use config path for that (or ability to set zone e.g.
> from skb->nfmark is off the table).
> 
> Or do you suggest this:
> 
> ct zone set meta mark   # as per my proposal -- set template
> ct eventmask set new	# as per my proposal -- lookup&change real conntrack
> ct helper set "foo"	# as per my proposal, lookup and change real
> conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb
> payload and name "foo" but instead taken from objref infrastructure.
> Same for timeout.

Yes, something like the one above. By look&change, you refer to this?

ct = nf_ct_get(...);
if (!ct || !nf_ct_unconfirmed(ct))
        return;

/*
 * ... update ct with helper here.
 */

This is fine for helpers, since the helper invocations comes late in
the pipeline.

For timeouts, after a second look I think this is more complicated: we
set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add
the conntrack extension to store timeouts later on, the first
conntrack state will not get refreshed with the right timeout.
Assuming this is the first packet of a flow, that should be easy to
amend later. But if we pickup a flow from the middle, timeout
amendment gets a bit more tricky.  Calling l4proto->packet() is not
possible either since this sets packets as invalid, so we can drop it
from the ruleset.

Back to helpers, users are familiar with the current way to attach
helpers, ie. from the raw chain.

Am I missing anything? I'm starting to think we can't escape the
conntrack template. If we make it percpu, this would actually fix
another limitation in iptables since we need to set all ct extensions
(helper, timeout and zones) in one go, ie. with one single rule.

> If so, that looks good to me, I will probably start by adding zone get
> support, then zone set, and then look at the existing objref facilities.

OK.

> (eventmask is probably simple but its not high on my "wanted" list)

This is basically there for state synchronization and anyone
performing intensive ct logging via ulogd2, to filter out events. So
this can wait, yes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-12 13:00     ` Pablo Neira Ayuso
@ 2017-01-12 13:29       ` Florian Westphal
  2017-01-12 14:28         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2017-01-12 13:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, christophe.leroy

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote:
> > ct zone set meta mark   # as per my proposal -- set template
> > ct eventmask set new	# as per my proposal -- lookup&change real conntrack
> > ct helper set "foo"	# as per my proposal, lookup and change real
> > conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb
> > payload and name "foo" but instead taken from objref infrastructure.
> > Same for timeout.
> 
> Yes, something like the one above. By look&change, you refer to this?
> 
> ct = nf_ct_get(...);
> if (!ct || !nf_ct_unconfirmed(ct))
>         return;
> 
> /*
>  * ... update ct with helper here.
>  */

No, I meant more intrusive version:

ct = nf_ct_get(...);
if (!ct || nf_ct_is_template(ct))))
	nf_conntrack_in(net, info->pf, hook, skb);

ct = nf_ct_get(...);
if (!ct)
	return;

switch (priv->key) {
case NFT_CT_MARK:
	ct->mark = sreg32;
	break;
case NFT_CT_HELPER:
	if (nf_ct_is_confirmed(ct))
		return; /* too late */

	/* update helper here. sreg contains name of objectref,
	 * so we'd use that to obtain a helper template, then set/add
	 * helper extension to ct based on that.
	 * But we don't set skb->nfct to the template, we only use
	 * it as source to get the needed helper information.
	 */
	break;
case NFT_CT_EVENTMASK:
	struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct);

	if (!e) {
		if (!nf_ct_is_confirmed(ct))
			nf_ct_ecache_ext_add(ct, sreg16, ~0, GFP_ATOMIC);
		return;
	}

	e->ctmask = sreg16;
	break;
case ...
}

NFT_CT_ZONE would have its own eval() version that doesn't
call nf_conntrack_in but instead sets up skb->nfct to a percpu template
that will then be used to transport the zoneinfo until nf_conntrack_in
is called (either by later hook or by later call via 'ct foo set bar').

[ unless skb->nfct is != NULL, then its a no-op ]

> For timeouts, after a second look I think this is more complicated: we
> set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add
> the conntrack extension to store timeouts later on, the first
> conntrack state will not get refreshed with the right timeout.
>
> Assuming this is the first packet of a flow, that should be easy to
> amend later. But if we pickup a flow from the middle, timeout
> amendment gets a bit more tricky.  Calling l4proto->packet() is not
> possible either since this sets packets as invalid, so we can drop it
> from the ruleset.

Ok.  However I don't see why this isn't solveable in some way.

We could e.g. add a new l4proto callback, something like
l4proto->update_timeouts(ct, timeouts);

> Back to helpers, users are familiar with the current way to attach
> helpers, ie. from the raw chain.
> 
> Am I missing anything? I'm starting to think we can't escape the
> conntrack template.

For Helpers?  Why not?  As long as ct isn't in the main table it should
be fine afaics?  (Unless you mean "can't escape conntrack template to
read to helper info that we need to assign to ct from".

For zones, yes, I don't see a way to avoid a template for them.
But thats the only ct key where I think that a template has to be used.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-12 13:29       ` Florian Westphal
@ 2017-01-12 14:28         ` Pablo Neira Ayuso
  2017-01-12 14:53           ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-12 14:28 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy

On Thu, Jan 12, 2017 at 02:29:30PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote:
> > > ct zone set meta mark   # as per my proposal -- set template
> > > ct eventmask set new	# as per my proposal -- lookup&change real conntrack
> > > ct helper set "foo"	# as per my proposal, lookup and change real
> > > conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb
> > > payload and name "foo" but instead taken from objref infrastructure.
> > > Same for timeout.
> > 
> > Yes, something like the one above. By look&change, you refer to this?
> > 
> > ct = nf_ct_get(...);
> > if (!ct || !nf_ct_unconfirmed(ct))
> >         return;
> > 
> > /*
> >  * ... update ct with helper here.
> >  */
> 
> No, I meant more intrusive version:
> 
> ct = nf_ct_get(...);
> if (!ct || nf_ct_is_template(ct))))
> 	nf_conntrack_in(net, info->pf, hook, skb);

OK, then we have to defrag first. Let me give another twist to this
discussion, still brainstorming.

Did you consider to move this logic into a explicit 'track' statement.
Instead of this implicit lookup hidden in the helper/timeout
assignment, syntax would be something like:

        tcp dport 21 track with helper "ftp-standalone" timeout "tcp-short-timeout"

Note, that:

        track with helper "ftp-standalone" timeout "tcp-short-timeout"

performs this lookup&change as you need, but in one go, only for
unconfirmed conntrack. And this would be achieved with one single
kernel expression, in nft --debug=netlink representation:

 [ track helper "ftp-standalone" timeout "tcp-short-timeout" ]

For zone ID, we can use the same thing:

        track with zone 1       [ This can be combined with helper/timeouts too. ]

In this case, we pass the zone ID via nf_conntrack_in() [ or a new
function that is called from nf_conntrack_in() that takes the zone ID
and that doesn't depend on templates anymore, we can strip off
nf_conntrack_in() from the template logic ].

The new track kernel expression would need to perform similar tricks
as 'objref', as we also want map support there. So we can add
_SREG_ZONE, _SREG_TIMEOUT and _SREG_HELPER attribute to indicate the
set key that we use to perform the lookups to fetch the data/objects.

Syntax would be:

        track with helper map tcp dport { 21 : "ftp-standalone",
                                          2121 : "ftp-standaline" } \
                   timeout "tcp-short-timeout"

In nft --debug=netlink representation, this looks like:

  [ track helper sreg X timeout "tcp-short-timeout" ]

_SREG_TIMEOUT and _SREG_HELPER would require a NFT_SET_OBJECT set
type. _SREG_ZONE will be just fine with a NFT_SET_MAP that contains
simple data maps, not pointers.

With this proposal, I'm moving to the opposite side than in my
previous email 8): I think we could completely get rid of templates.

Look, and we could also allow a simple:

        track

alone itself with no options, that means, track this. This results in
creation of conntrack if it doesn't exists, otherwise lookup and
attach to skb->nfct. I know this is different front as this would be
only useful for bridge and ingress at this stage. For existing ip/ip6
families, this would nf_conntrack_in() into a no-op if skb->nfct is
set.

> ct = nf_ct_get(...);
> if (!ct)
> 	return;
> 
> switch (priv->key) {
> case NFT_CT_MARK:
> 	ct->mark = sreg32;
> 	break;
> case NFT_CT_HELPER:
> 	if (nf_ct_is_confirmed(ct))
> 		return; /* too late */
> 
> 	/* update helper here. sreg contains name of objectref,
> 	 * so we'd use that to obtain a helper template, then set/add
> 	 * helper extension to ct based on that.
> 	 * But we don't set skb->nfct to the template, we only use
> 	 * it as source to get the needed helper information.
> 	 */
> 	break;
> case NFT_CT_EVENTMASK:
> 	struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct);
> 
> 	if (!e) {
> 		if (!nf_ct_is_confirmed(ct))
> 			nf_ct_ecache_ext_add(ct, sreg16, ~0, GFP_ATOMIC);
> 		return;
> 	}
> 
> 	e->ctmask = sreg16;
> 	break;
> case ...
> }
> 
> NFT_CT_ZONE would have its own eval() version that doesn't
> call nf_conntrack_in but instead sets up skb->nfct to a percpu template
> that will then be used to transport the zoneinfo until nf_conntrack_in
> is called (either by later hook or by later call via 'ct foo set bar').
> 
> [ unless skb->nfct is != NULL, then its a no-op ]

I see.

> > For timeouts, after a second look I think this is more complicated: we
> > set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add
> > the conntrack extension to store timeouts later on, the first
> > conntrack state will not get refreshed with the right timeout.
> >
> > Assuming this is the first packet of a flow, that should be easy to
> > amend later. But if we pickup a flow from the middle, timeout
> > amendment gets a bit more tricky.  Calling l4proto->packet() is not
> > possible either since this sets packets as invalid, so we can drop it
> > from the ruleset.
> 
> Ok.  However I don't see why this isn't solveable in some way.
> 
> We could e.g. add a new l4proto callback, something like
> l4proto->update_timeouts(ct, timeouts);

Yes, timeout logic happens at the end of TCP tracker timeout which
seems to perform a few tricks there. And matching 'ct expiration' will
change semantics for unconfirmed ct, but that should be OK, I don't
know of anyone using it.

> > Back to helpers, users are familiar with the current way to attach
> > helpers, ie. from the raw chain.
> > 
> > Am I missing anything? I'm starting to think we can't escape the
> > conntrack template.
> 
> For Helpers?  Why not?  As long as ct isn't in the main table it should
> be fine afaics?  (Unless you mean "can't escape conntrack template to
> read to helper info that we need to assign to ct from".
>
> For zones, yes, I don't see a way to avoid a template for them.
> But thats the only ct key where I think that a template has to be used.

Yes, following the approach you propose zone would be the only one
that requires the template. So this needs to happen before the
conntrack hook.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nftables conntrack set ops for zone, helper assignment, etc.
  2017-01-12 14:28         ` Pablo Neira Ayuso
@ 2017-01-12 14:53           ` Florian Westphal
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2017-01-12 14:53 UTC (permalink / raw)
  To: pablo; +Cc: Florian Westphal, netfilter-devel, christophe.leroy

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > No, I meant more intrusive version:
> > 
> > ct = nf_ct_get(...);
> > if (!ct || nf_ct_is_template(ct))))
> > 	nf_conntrack_in(net, info->pf, hook, skb);
> 
> OK, then we have to defrag first. Let me give another twist to this
> discussion, still brainstorming.

No, nf_conntrack forces a dependency on the defrag module, so unless
someone inserts such rule before the defrag hook we're fine (raw
table prio is fine for instance).

> Did you consider to move this logic into a explicit 'track' statement.

Yes, but that doesn't help a lot imo since that forces a user
to structure ruleset so that they do

ct zone set meta mark track with helper "ftp-standalone"

which looks fine but what does the 'track' keyword provide?
Would a user care?  Or is that just more about exposing whats going on
behind the scenes?

> Instead of this implicit lookup hidden in the helper/timeout
> assignment, syntax would be something like:
> 
>         tcp dport 21 track with helper "ftp-standalone" timeout "tcp-short-timeout"
> 
> Note, that:
> 
>         track with helper "ftp-standalone" timeout "tcp-short-timeout"
> 
> performs this lookup&change as you need, but in one go, only for
> unconfirmed conntrack. And this would be achieved with one single
> kernel expression, in nft --debug=netlink representation:
> 
>  [ track helper "ftp-standalone" timeout "tcp-short-timeout" ]

I see, so we add a new expression instead of using ct set syntax.

> For zone ID, we can use the same thing:
> 
>         track with zone 1       [ This can be combined with helper/timeouts too. ]

Hmm....  So we really have comeptely new user syntax for all of
this.

The reason I scrapped this idea early on is that this breaks horribly
with icmp error messages and reply traffic (at very least, its
highly counter-intuitive/inconvenient).  Consider this fine looking
but non-working line:

tcp dport 21 track with helper "ftp-standalone" zone 1

instead, users would have to do this:

track with zone 1 # so icmp, reply traffic, ftp data traffic is in zone 1 too
tcp dport 21 track with helper "ftp-standalone"

and that won't work either since we would have to support 're-tracking'
already tracked conntrack...

I planned to force explicit 'track' for the bridge conntrack support,
but that works only because the nf_conntrack_in PREROUTING equivalent
added in bridge_conntrack would not ever create new conntracks (only
RELATED).

We can do this because there is no bridge conntrack so far so we don't
have backwards compat problems.

But then I thought 'why add special "track" if I can just let
ct foo set bla do it for me'?

> In this case, we pass the zone ID via nf_conntrack_in() [ or a new
> function that is called from nf_conntrack_in() that takes the zone ID
> and that doesn't depend on templates anymore, we can strip off
> nf_conntrack_in() from the template logic ].

I found no way to avoid templates for the zone because of the RELATED
packets.

> > > Back to helpers, users are familiar with the current way to attach
> > > helpers, ie. from the raw chain.
> > > 
> > > Am I missing anything? I'm starting to think we can't escape the
> > > conntrack template.
> > 
> > For Helpers?  Why not?  As long as ct isn't in the main table it should
> > be fine afaics?  (Unless you mean "can't escape conntrack template to
> > read to helper info that we need to assign to ct from".
> >
> > For zones, yes, I don't see a way to avoid a template for them.
> > But thats the only ct key where I think that a template has to be used.
> 
> Yes, following the approach you propose zone would be the only one
> that requires the template. So this needs to happen before the
> conntrack hook.

Yes, thats right.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-01-12 14:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-11  9:17 nftables conntrack set ops for zone, helper assignment, etc Florian Westphal
2017-01-11 20:07 ` Pablo Neira Ayuso
2017-01-11 23:01   ` Florian Westphal
2017-01-12 13:00     ` Pablo Neira Ayuso
2017-01-12 13:29       ` Florian Westphal
2017-01-12 14:28         ` Pablo Neira Ayuso
2017-01-12 14:53           ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).