* nftables conntrack set ops for zone, helper assignment, etc. @ 2017-01-11 9:17 Florian Westphal 2017-01-11 20:07 ` Pablo Neira Ayuso 0 siblings, 1 reply; 7+ messages in thread From: Florian Westphal @ 2017-01-11 9:17 UTC (permalink / raw) To: netfilter-devel; +Cc: christophe.leroy [ CCing Christophe wrt. nft helper assigment ] Hi. to make bridge conntrack a reality I need a way to assign packets to conntrack zones. Whatever solution is chosen, it should allow to eventually implement all of the CT target features, namely: --helper name --ctevents event[,...] --expevents event[,...] --zone-orig {id|mark} --zone-reply {id|mark} --zone {id|mark} --timeout name Unfortunately, its not that simple to do this with nft. With CT target, all options are passed to single CT instance so all information is available at checkentry (config) time. Once the target function was run, skb->nfct is set to the template, next -j CT (if any) is a no-op (XT_CONTINUE if skb->nfct != NULL). For nft, it would be nice to use the 'set' syntax, i.e. nft ... ct helper set ftp nft ... ct zone set 4 nft ... ct timeout set policyname nft ... ct ctevents set new|destroy|mark [1] Seems simple enough. BUT: now consider case where we want to set both helper and zone: nft ... ct helper set ftp ct zone set 4 nft_ct only works with a single key, and we have no way to "propagate" or "chain" the helper and the zone set operation. We also don't know that the 'ct zone set 4' is the 'last' action, where we're "done" with said skb (i.e. skb->nfct template is fully set up the way we want). Now, lets consider using nft features like maps: nft ... ct zone set vlan id map { 1 : 1, 2 : 2, } nft ... ct original zone set meta mark nft ... ct helper set tcp dport map { 21 : ftp, 2121 : ftp, 12345 : sip } ... which would require that template instantiation happens from packet path. I'd argue that #3 above is not all that important but ability to set zone from mark, vlan id etc at runtime seems too nice to ignore. I've thought about ways to implement this. AFAICS, zone id is the only attribute that needs to be propagated and used in nf_conntrack_in() [because it influences conntrack lookup]. All other attibutes could be done at any later point in time provided it occurs before ct confirmation, i.e. we could operate on the actual conntrack and not a template. So, I propose following solution: nft_ct gets two new ops: 1) a template set op, used when we have sreg + ZONE key (zone key is to be added). 2) a 'unconfirmed set' op, used when we have sreg and one of timeout, helper, ctevents, expevents key (the latter two don't exist either yet could be added easily). This assumes that allmost all ct keys are readonly for confirmed/real conntracks, otherwise we'd need syntax to indicate that we want to operate on template/unconfirmed entry (plus netlink flag to indicate this to kernel). For 1), eval would work similar to this: fetch skb->nfct. If its a valid conntrack, bail (nothing to do). If its NULL, attach a percpu template scratch space iff that template scratch space has refcount of one. Otherwise, need to allocate fresh copy (we would not have this problem if we disallow nfqueue of nfct templates but I don't see how this could be done without compat breakage). Alternatively we could force this duplication in the nfqueue backend or disallow queueing skbs where nfct is a template. In pseudocode: template_unused(nfct): return refcount(nfct) == 1; eval(skb): nfct = skb->nfct; if (!nfct) { nfct = percpu(this_template); if (!template_unused(tmpl))) nfct = ALLOC(); /* nfct now either percpu cached object or newly alloc'd */ nfct->zone = value; inc_refcount(nfct); skb->nfct = nfct; For 2), do following: fetch skb->nfct. If its a confirmed conntrack and key set operation works with confirmed conntracks, do the set op, else bail (can't expand/change extension area for confirmed ones, but ctevent mask manipulation would work for instance). If its NULL or a template, call nf_conntrack_in to obtain skb->nfct If still NULL, bail (invalid skb). Otherwise, do the set operation. In pseudocode: doit(skb, key, value): if key == helper: if skb->nfct is unconfirmed: add helper extension and return else set key/value of skb->nfct eval(skb): if (skb->nfct == NULL or template) skb->nfct = nf_conntrack_in(skb, ...) if (skb->nfct != NULL) doit(skb) else doit(skb) Problems with above approach: 1). nft ct $key set $expr ... would behave differently depending on the key: Zone key would not work for real conntracks (can't change zone at later point). Other keys would do conntrack lookups and assign skb->nfct. I think we could change existing label/mark set support to perform a conntrack lookup too if conntrack is not yet assigned to alleviate this, otherwise it might be a bit confusion to users... 2). The templating scheme used now is rather unfriendly when we need to reset the percpu scratch space. Not a big deal if we only have to cope with zones since that no longer uses the extension area, i.e. our template object would really just be a very bloated way to pass the zone information. 3). nfqueue. We can infer its presence with refcount test on the percpu template, if its > 1, template is still assigned to another skb. If it happens we eat cost of additional kmalloc/free. I think this isn't a big deal though. 4). Doing helper and template assignment requires lookups in packet path, however I think we can make such lookups faster if needed and it would only happen for unconfirmed/new conntracks. 5) helper autoload won't work from packet path. I'd propose to work around this by allowing periodic 'modprobes' from a work queue. Alternatively we could offload this job to userspace (nft should be able to figure out the needed modules too). 6). A consequence of such a design would be that this works: nf .. ct zone set 42 ct timeout set bla The first part, ct zone set 42, would set template to zone 42, and set skb->nfct to a template The second part would do a conntrack lookup with the zone provided by the template call nf_conntrack_in(skb), with the zone provided, and assign skb->nfct to the *real* conntrack, and set timeout policy to the one provided. Problem is that things won't work when order is switched, i.e.: nf .. ct timeout set bla ct zone set 42 (first we do a conntrack lookup in default zone, then fail to set the zone because skb->nfct is already set to a non-templated conntrack). In case noone reports obvious showstoppers I'd explore an implementation of the above scheme since I believe the advantages outweight the problems. Footnotes: [1], yuck, this creates again a syntax problem with meta keyword, because "ctevents set 'mark|new' refers to the IPCT event named 'mark', not the meta mark keyword, but we can't represent this ... :-( We will probably have to use a different word, e.g. 'ctmark' instead here, or force users to use "" quotes... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-11 9:17 nftables conntrack set ops for zone, helper assignment, etc Florian Westphal @ 2017-01-11 20:07 ` Pablo Neira Ayuso 2017-01-11 23:01 ` Florian Westphal 0 siblings, 1 reply; 7+ messages in thread From: Pablo Neira Ayuso @ 2017-01-11 20:07 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy Hi Florian, On Wed, Jan 11, 2017 at 10:17:21AM +0100, Florian Westphal wrote: > [ CCing Christophe wrt. nft helper assigment ] > > Hi. > > to make bridge conntrack a reality I need a way to assign packets to conntrack zones. > > Whatever solution is chosen, it should allow to eventually implement all of the CT > target features, namely: > --helper name > --ctevents event[,...] > --expevents event[,...] > --zone-orig {id|mark} > --zone-reply {id|mark} > --zone {id|mark} > --timeout name > > Unfortunately, its not that simple to do this with nft. > With CT target, all options are passed to single CT instance so all > information is available at checkentry (config) time. > > Once the target function was run, skb->nfct is set to the template, > next -j CT (if any) is a no-op (XT_CONTINUE if skb->nfct != NULL). > > For nft, it would be nice to use the 'set' syntax, i.e. > > nft ... ct helper set ftp > nft ... ct zone set 4 > nft ... ct timeout set policyname > nft ... ct ctevents set new|destroy|mark [1] > > Seems simple enough. BUT: > now consider case where we want to set both helper and zone: > > nft ... ct helper set ftp ct zone set 4 > > nft_ct only works with a single key, and we have no way to "propagate" > or "chain" the helper and the zone set operation. > > We also don't know that the 'ct zone set 4' is the 'last' action, > where we're "done" with said skb (i.e. skb->nfct template is fully > set up the way we want). > > Now, lets consider using nft features like maps: > > nft ... ct zone set vlan id map { 1 : 1, 2 : 2, } > nft ... ct original zone set meta mark > nft ... ct helper set tcp dport map { 21 : ftp, 2121 : ftp, 12345 : sip } > > ... which would require that template instantiation happens from packet > path. > > I'd argue that #3 above is not all that important but ability to set > zone from mark, vlan id etc at runtime seems too nice to ignore. > > I've thought about ways to implement this. > > AFAICS, zone id is the only attribute that needs to be propagated and > used in nf_conntrack_in() [because it influences conntrack lookup]. Right, for custom timeouts, we can modify the original default timeout originally set via proto->new() later on. > All other attibutes could be done at any later point in time provided it > occurs before ct confirmation, i.e. we could operate on the actual conntrack > and not a template. > > So, I propose following solution: > > nft_ct gets two new ops: > > 1) a template set op, used when we have sreg + ZONE key (zone key is to be > added). > > 2) a 'unconfirmed set' op, used when we have sreg and > one of timeout, helper, ctevents, expevents key (the latter two don't exist > either yet could be added easily). > > This assumes that allmost all ct keys are readonly for confirmed/real > conntracks, otherwise we'd need syntax to indicate that we want to operate > on template/unconfirmed entry (plus netlink flag to indicate this to > kernel). > > For 1), eval would work similar to this: > fetch skb->nfct. > If its a valid conntrack, bail (nothing to do). > > If its NULL, attach a percpu template scratch space iff > that template scratch space has refcount of one. > Otherwise, need to allocate fresh copy (we would not > have this problem if we disallow nfqueue of nfct templates but I don't > see how this could be done without compat breakage). > Alternatively we could force this duplication in the nfqueue backend > or disallow queueing skbs where nfct is a template. > > In pseudocode: > template_unused(nfct): > return refcount(nfct) == 1; > > eval(skb): > nfct = skb->nfct; > if (!nfct) { > nfct = percpu(this_template); > if (!template_unused(tmpl))) > nfct = ALLOC(); > > /* nfct now either percpu cached object or newly alloc'd */ > nfct->zone = value; > inc_refcount(nfct); > skb->nfct = nfct; > > For 2), do following: > fetch skb->nfct. If its a confirmed conntrack and key set operation > works with confirmed conntracks, do the set op, else bail (can't > expand/change extension area for confirmed ones, but ctevent mask > manipulation would work for instance). > If its NULL or a template, call nf_conntrack_in to obtain skb->nfct > > If still NULL, bail (invalid skb). Otherwise, do the set operation. > > In pseudocode: > doit(skb, key, value): > if key == helper: > if skb->nfct is unconfirmed: > add helper extension and return > > else set key/value of skb->nfct > > eval(skb): > if (skb->nfct == NULL or template) > skb->nfct = nf_conntrack_in(skb, ...) > if (skb->nfct != NULL) > doit(skb) > else > doit(skb) > > Problems with above approach: > > 1). nft ct $key set $expr > > ... would behave differently depending on the key: > > Zone key would not work for real conntracks (can't change zone at later > point). > Other keys would do conntrack lookups and assign skb->nfct. I think we > could change existing label/mark set support to perform a conntrack > lookup too if conntrack is not yet assigned to alleviate this, otherwise > it might be a bit confusion to users... We still need the existing template behaviour around for iptables. > 2). The templating scheme used now is rather unfriendly when > we need to reset the percpu scratch space. Not a big deal if we > only have to cope with zones since that no longer uses the extension area, > i.e. our template object would really just be a very bloated way to > pass the zone information. I like the percpu scratch approach. Can we probably have a smaller object, so the skb->nfct interpretation depends on skb->nfctinfo so it tells us this is a template? I know you're sending a patch to remove this field, also asking if we could fit in such info in the new approach. > 3). nfqueue. We can infer its presence with refcount test > on the percpu template, if its > 1, template is still assigned to another > skb. If it happens we eat cost of additional kmalloc/free. > I think this isn't a big deal though. Right, we have to deal with the escape path. > 4). Doing helper and template assignment requires lookups in packet path, > however I think we can make such lookups faster if needed and it would > only happen for unconfirmed/new conntracks. > > 5) helper autoload won't work from packet path. Another problem we mentioned before is that we would need to extend nft_ct to include a layer 4 protocol attribute specific to look up for helpers, which we agreed is ugly. > I'd propose to work around this by allowing periodic 'modprobes' from > a work queue. > Alternatively we could offload this job to userspace (nft should be > able to figure out the needed modules too). > 6). > > A consequence of such a design would be that this works: > nf .. ct zone set 42 ct timeout set bla > > The first part, ct zone set 42, would set template to zone 42, and set > skb->nfct to a template > > The second part would do a conntrack lookup with the zone provided by > the template call nf_conntrack_in(skb), with the zone provided, and assign > skb->nfct to the *real* conntrack, and set timeout policy to the one > provided. Problem is that things won't work when order is switched, i.e.: > > nf .. ct timeout set bla ct zone set 42 > > (first we do a conntrack lookup in default zone, then fail > to set the zone because skb->nfct is already set to a non-templated > conntrack). > > In case noone reports obvious showstoppers I'd explore an implementation > of the above scheme since I believe the advantages outweight the problems. I think it should be possible to reuse the new generic object infrastructure to add new stateless objects, eg. table ip x { ct helper "ftp-standard" { # [1] type "ftp" protocol tcp # # Other configuration options: # # maximum-expectations 3 # expectation-timeout 300 # # Other specific ftp helper options: # # loose # instead of enabling this via modparam? # } chain y { type filter hook prerouting priority -250; tcp dport 21 ct helper set "ftp-standard" } } So [1] creates a ftp-standard object of helper type. This object needs some specific configuration, such as what layer 4 protocol you want to use and the helper type. This would also solve the module autoload issue since this is loaded at object creation time, from the control plane. From rules, we use the "objref" expression from rules to refer to this object, the obj->type->eval() will just fetch the ct from the skbuff and attach the helper that we can retrieve via the obj->data area. This also provides maps, since "objref" already supports this, eg. chain y { type filter hook prerouting priority 0; ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" } } Note that this doesn't involve any string copy to the register area. The objref expression is similar to the lookup expression, but it just takes a _SREG to fetch the key, and then looks up for the object and run obj->type->eval() in one go. So we don't place pointers in the data register area since this is illegal in nf_tables. Similarly, we can add new ct timeout objects. Actually, we could extend struct nft_object_type so we get select_ops() there too. So we have ct object types and different operations, eg. table ip x { ct timeout "tcp-short-established" { established 100 # # ... actually, any tcp state timeout goes here, if not # specified this uses default values. # } chain y { type filter hook prerouting priority -250; ct timeout set "tcp-short-established" } In this case, we won't reuse nfnetlink_cttimeout, instead we have a native interface for nftables. So this fits into the transaction infrastructure that we have. For ct zones, we can simply rely on the templates to pass it on to nf_conntrack_in(), a ct zone object to store a single ID is probably too much. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-11 20:07 ` Pablo Neira Ayuso @ 2017-01-11 23:01 ` Florian Westphal 2017-01-12 13:00 ` Pablo Neira Ayuso 0 siblings, 1 reply; 7+ messages in thread From: Florian Westphal @ 2017-01-11 23:01 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, christophe.leroy Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > 1). nft ct $key set $expr > > > > ... would behave differently depending on the key: > > > > Zone key would not work for real conntracks (can't change zone at later > > point). > > Other keys would do conntrack lookups and assign skb->nfct. I think we > > could change existing label/mark set support to perform a conntrack > > lookup too if conntrack is not yet assigned to alleviate this, otherwise > > it might be a bit confusion to users... > > We still need the existing template behaviour around for iptables. Can you elaborate? I was only referring to the existing nft_ct.c eval set function (which deals with setting ctmark and labels). So instead of being a nop in case skb->nfct is not set (or a template) I'd propose to do a conntrack lookup and assign skb->nfct. I would not touch/change the CT target. > > 2). The templating scheme used now is rather unfriendly when > > we need to reset the percpu scratch space. Not a big deal if we > > only have to cope with zones since that no longer uses the extension area, > > i.e. our template object would really just be a very bloated way to > > pass the zone information. > > I like the percpu scratch approach. Can we probably have a smaller > object, so the skb->nfct interpretation depends on skb->nfctinfo so it > tells us this is a template? I know you're sending a patch to remove > this field, also asking if we could fit in such info in the new > approach. Probably, I also thought about adding struct nf_conn_template { atomic_t ref; u16 zone; void *helper; unsgined long *timeout; char pad[PAD]; unsigned long status; }; (templates are always allocated via kmalloc anyway so they could be smaller than nf_conn provided they provide the status flags at same offset (TEMPLATE bit is set). Alternatively we might squeeze another marker into the 3 bits we have in the "nfctinfo" area. However, I don't see a real advantage in doing so (would be a bit ugly and we'd have to make sure that all skb->nfct places check nfct_is_template() before e.g. dereferencing extension area... > > 3). nfqueue. We can infer its presence with refcount test > > on the percpu template, if its > 1, template is still assigned to another > > skb. If it happens we eat cost of additional kmalloc/free. > > I think this isn't a big deal though. > > Right, we have to deal with the escape path. Jup. Should not be too much work though. > > 4). Doing helper and template assignment requires lookups in packet path, > > however I think we can make such lookups faster if needed and it would > > only happen for unconfirmed/new conntracks. > > > > 5) helper autoload won't work from packet path. > > Another problem we mentioned before is that we would need to extend > nft_ct to include a layer 4 protocol attribute specific to look up for > helpers, which we agreed is ugly. We don't need that anymore if we do helper lookups from packet path since we can just look at the packet payload. > > A consequence of such a design would be that this works: > > nf .. ct zone set 42 ct timeout set bla > > > > The first part, ct zone set 42, would set template to zone 42, and set > > skb->nfct to a template > > > > The second part would do a conntrack lookup with the zone provided by > > the template call nf_conntrack_in(skb), with the zone provided, and assign > > skb->nfct to the *real* conntrack, and set timeout policy to the one > > provided. Problem is that things won't work when order is switched, i.e.: > > > > nf .. ct timeout set bla ct zone set 42 > > > > (first we do a conntrack lookup in default zone, then fail > > to set the zone because skb->nfct is already set to a non-templated > > conntrack). > > > > In case noone reports obvious showstoppers I'd explore an implementation > > of the above scheme since I believe the advantages outweight the problems. > > I think it should be possible to reuse the new generic object > infrastructure to add new stateless objects, eg. > > table ip x { > ct helper "ftp-standard" { # [1] > type "ftp" > protocol tcp > # > # Other configuration options: > # > # maximum-expectations 3 > # expectation-timeout 300 > # > # Other specific ftp helper options: > # > # loose # instead of enabling this via modparam? Interesting, did not think about avoiding module params completely, I like this idea. > chain y { > type filter hook prerouting priority -250; > > tcp dport 21 ct helper set "ftp-standard" > } > } > > So [1] creates a ftp-standard object of helper type. This object needs > some specific configuration, such as what layer 4 protocol you want to > use and the helper type. This would also solve the module autoload > issue since this is loaded at object creation time, from the control > plane. From rules, we use the "objref" expression from rules to refer > to this object, the obj->type->eval() will just fetch the ct from the > skbuff and attach the helper that we can retrieve via the obj->data > area. > > This also provides maps, since "objref" already supports this, eg. > > chain y { > type filter hook prerouting priority 0; > > ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" } Do you mean { 21 : "ftp-standard, 2121 : "ftp-standard" } (ie. it uses an arbitrary "object name"? > Note that this doesn't involve any string copy to the register area. > The objref expression is similar to the lookup expression, but it just > takes a _SREG to fetch the key, and then looks up for the object and > run obj->type->eval() in one go. So we don't place pointers in the > data register area since this is illegal in nf_tables. > > Similarly, we can add new ct timeout objects. Actually, we could > extend struct nft_object_type so we get select_ops() there too. So we > have ct object types and different operations, eg. > > table ip x { > ct timeout "tcp-short-established" { > established 100 > # > # ... actually, any tcp state timeout goes here, if not > # specified this uses default values. > # > } > > chain y { > type filter hook prerouting priority -250; > > ct timeout set "tcp-short-established" > } > > In this case, we won't reuse nfnetlink_cttimeout, instead we have a > native interface for nftables. So this fits into the transaction > infrastructure that we have. > > For ct zones, we can simply rely on the templates to pass it on to > nf_conntrack_in(), a ct zone object to store a single ID is probably > too much. Hmm, but we can't use config path for that (or ability to set zone e.g. from skb->nfmark is off the table). Or do you suggest this: ct zone set meta mark # as per my proposal -- set template ct eventmask set new # as per my proposal -- lookup&change real conntrack ct helper set "foo" # as per my proposal, lookup and change real conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb payload and name "foo" but instead taken from objref infrastructure. Same for timeout. If so, that looks good to me, I will probably start by adding zone get support, then zone set, and then look at the existing objref facilities. (eventmask is probably simple but its not high on my "wanted" list) Thanks, Florian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-11 23:01 ` Florian Westphal @ 2017-01-12 13:00 ` Pablo Neira Ayuso 2017-01-12 13:29 ` Florian Westphal 0 siblings, 1 reply; 7+ messages in thread From: Pablo Neira Ayuso @ 2017-01-12 13:00 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > 1). nft ct $key set $expr > > > > > > ... would behave differently depending on the key: > > > > > > Zone key would not work for real conntracks (can't change zone at later > > > point). > > > Other keys would do conntrack lookups and assign skb->nfct. I think we > > > could change existing label/mark set support to perform a conntrack > > > lookup too if conntrack is not yet assigned to alleviate this, otherwise > > > it might be a bit confusion to users... > > > > We still need the existing template behaviour around for iptables. > > Can you elaborate? > I was only referring to the existing nft_ct.c eval set function > (which deals with setting ctmark and labels). > > So instead of being a nop in case skb->nfct is not set (or a template) > I'd propose to do a conntrack lookup and assign skb->nfct. > > I would not touch/change the CT target. OK, now I see. I was wondering how this would interfer with the existing CT target. > > > 2). The templating scheme used now is rather unfriendly when > > > we need to reset the percpu scratch space. Not a big deal if we > > > only have to cope with zones since that no longer uses the extension area, > > > i.e. our template object would really just be a very bloated way to > > > pass the zone information. > > > > I like the percpu scratch approach. Can we probably have a smaller > > object, so the skb->nfct interpretation depends on skb->nfctinfo so it > > tells us this is a template? I know you're sending a patch to remove > > this field, also asking if we could fit in such info in the new > > approach. > > Probably, I also thought about adding > struct nf_conn_template { > atomic_t ref; > u16 zone; > void *helper; > unsgined long *timeout; > char pad[PAD]; > unsigned long status; > }; > > (templates are always allocated via kmalloc anyway so they > could be smaller than nf_conn provided they provide the > status flags at same offset (TEMPLATE bit is set). > > Alternatively we might squeeze another marker into the 3 bits we have > in the "nfctinfo" area. > > However, I don't see a real advantage in doing so (would be a bit ugly > and we'd have to make sure that all skb->nfct places check > nfct_is_template() before e.g. dereferencing extension area... Right. > > So [1] creates a ftp-standard object of helper type. This object needs > > some specific configuration, such as what layer 4 protocol you want to > > use and the helper type. This would also solve the module autoload > > issue since this is loaded at object creation time, from the control > > plane. From rules, we use the "objref" expression from rules to refer > > to this object, the obj->type->eval() will just fetch the ct from the > > skbuff and attach the helper that we can retrieve via the obj->data > > area. > > > > This also provides maps, since "objref" already supports this, eg. > > > > chain y { > > type filter hook prerouting priority 0; > > > > ct helper set tcp dport { 21 : "ftp", 2121 : "ftp" } > > Do you mean { 21 : "ftp-standard, 2121 : "ftp-standard" } (ie. it uses > an arbitrary "object name"? Yes. That should be "ftp-standard", any arbitrary name actually. [...] > > Note that this doesn't involve any string copy to the register area. > > The objref expression is similar to the lookup expression, but it just > > takes a _SREG to fetch the key, and then looks up for the object and > > run obj->type->eval() in one go. So we don't place pointers in the > > data register area since this is illegal in nf_tables. > > > > Similarly, we can add new ct timeout objects. Actually, we could > > extend struct nft_object_type so we get select_ops() there too. So we > > have ct object types and different operations, eg. > > > > table ip x { > > ct timeout "tcp-short-established" { > > established 100 > > # > > # ... actually, any tcp state timeout goes here, if not > > # specified this uses default values. > > # > > } > > > > chain y { > > type filter hook prerouting priority -250; > > > > ct timeout set "tcp-short-established" > > } > > > > In this case, we won't reuse nfnetlink_cttimeout, instead we have a > > native interface for nftables. So this fits into the transaction > > infrastructure that we have. > > > > For ct zones, we can simply rely on the templates to pass it on to > > nf_conntrack_in(), a ct zone object to store a single ID is probably > > too much. > > Hmm, but we can't use config path for that (or ability to set zone e.g. > from skb->nfmark is off the table). > > Or do you suggest this: > > ct zone set meta mark # as per my proposal -- set template > ct eventmask set new # as per my proposal -- lookup&change real conntrack > ct helper set "foo" # as per my proposal, lookup and change real > conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb > payload and name "foo" but instead taken from objref infrastructure. > Same for timeout. Yes, something like the one above. By look&change, you refer to this? ct = nf_ct_get(...); if (!ct || !nf_ct_unconfirmed(ct)) return; /* * ... update ct with helper here. */ This is fine for helpers, since the helper invocations comes late in the pipeline. For timeouts, after a second look I think this is more complicated: we set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add the conntrack extension to store timeouts later on, the first conntrack state will not get refreshed with the right timeout. Assuming this is the first packet of a flow, that should be easy to amend later. But if we pickup a flow from the middle, timeout amendment gets a bit more tricky. Calling l4proto->packet() is not possible either since this sets packets as invalid, so we can drop it from the ruleset. Back to helpers, users are familiar with the current way to attach helpers, ie. from the raw chain. Am I missing anything? I'm starting to think we can't escape the conntrack template. If we make it percpu, this would actually fix another limitation in iptables since we need to set all ct extensions (helper, timeout and zones) in one go, ie. with one single rule. > If so, that looks good to me, I will probably start by adding zone get > support, then zone set, and then look at the existing objref facilities. OK. > (eventmask is probably simple but its not high on my "wanted" list) This is basically there for state synchronization and anyone performing intensive ct logging via ulogd2, to filter out events. So this can wait, yes. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-12 13:00 ` Pablo Neira Ayuso @ 2017-01-12 13:29 ` Florian Westphal 2017-01-12 14:28 ` Pablo Neira Ayuso 0 siblings, 1 reply; 7+ messages in thread From: Florian Westphal @ 2017-01-12 13:29 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, christophe.leroy Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote: > > ct zone set meta mark # as per my proposal -- set template > > ct eventmask set new # as per my proposal -- lookup&change real conntrack > > ct helper set "foo" # as per my proposal, lookup and change real > > conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb > > payload and name "foo" but instead taken from objref infrastructure. > > Same for timeout. > > Yes, something like the one above. By look&change, you refer to this? > > ct = nf_ct_get(...); > if (!ct || !nf_ct_unconfirmed(ct)) > return; > > /* > * ... update ct with helper here. > */ No, I meant more intrusive version: ct = nf_ct_get(...); if (!ct || nf_ct_is_template(ct)))) nf_conntrack_in(net, info->pf, hook, skb); ct = nf_ct_get(...); if (!ct) return; switch (priv->key) { case NFT_CT_MARK: ct->mark = sreg32; break; case NFT_CT_HELPER: if (nf_ct_is_confirmed(ct)) return; /* too late */ /* update helper here. sreg contains name of objectref, * so we'd use that to obtain a helper template, then set/add * helper extension to ct based on that. * But we don't set skb->nfct to the template, we only use * it as source to get the needed helper information. */ break; case NFT_CT_EVENTMASK: struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct); if (!e) { if (!nf_ct_is_confirmed(ct)) nf_ct_ecache_ext_add(ct, sreg16, ~0, GFP_ATOMIC); return; } e->ctmask = sreg16; break; case ... } NFT_CT_ZONE would have its own eval() version that doesn't call nf_conntrack_in but instead sets up skb->nfct to a percpu template that will then be used to transport the zoneinfo until nf_conntrack_in is called (either by later hook or by later call via 'ct foo set bar'). [ unless skb->nfct is != NULL, then its a no-op ] > For timeouts, after a second look I think this is more complicated: we > set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add > the conntrack extension to store timeouts later on, the first > conntrack state will not get refreshed with the right timeout. > > Assuming this is the first packet of a flow, that should be easy to > amend later. But if we pickup a flow from the middle, timeout > amendment gets a bit more tricky. Calling l4proto->packet() is not > possible either since this sets packets as invalid, so we can drop it > from the ruleset. Ok. However I don't see why this isn't solveable in some way. We could e.g. add a new l4proto callback, something like l4proto->update_timeouts(ct, timeouts); > Back to helpers, users are familiar with the current way to attach > helpers, ie. from the raw chain. > > Am I missing anything? I'm starting to think we can't escape the > conntrack template. For Helpers? Why not? As long as ct isn't in the main table it should be fine afaics? (Unless you mean "can't escape conntrack template to read to helper info that we need to assign to ct from". For zones, yes, I don't see a way to avoid a template for them. But thats the only ct key where I think that a template has to be used. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-12 13:29 ` Florian Westphal @ 2017-01-12 14:28 ` Pablo Neira Ayuso 2017-01-12 14:53 ` Florian Westphal 0 siblings, 1 reply; 7+ messages in thread From: Pablo Neira Ayuso @ 2017-01-12 14:28 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, christophe.leroy On Thu, Jan 12, 2017 at 02:29:30PM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Thu, Jan 12, 2017 at 12:01:41AM +0100, Florian Westphal wrote: > > > ct zone set meta mark # as per my proposal -- set template > > > ct eventmask set new # as per my proposal -- lookup&change real conntrack > > > ct helper set "foo" # as per my proposal, lookup and change real > > > conntrack if its unconfirmed, EXCEPT helper is not looked up based on skb > > > payload and name "foo" but instead taken from objref infrastructure. > > > Same for timeout. > > > > Yes, something like the one above. By look&change, you refer to this? > > > > ct = nf_ct_get(...); > > if (!ct || !nf_ct_unconfirmed(ct)) > > return; > > > > /* > > * ... update ct with helper here. > > */ > > No, I meant more intrusive version: > > ct = nf_ct_get(...); > if (!ct || nf_ct_is_template(ct)))) > nf_conntrack_in(net, info->pf, hook, skb); OK, then we have to defrag first. Let me give another twist to this discussion, still brainstorming. Did you consider to move this logic into a explicit 'track' statement. Instead of this implicit lookup hidden in the helper/timeout assignment, syntax would be something like: tcp dport 21 track with helper "ftp-standalone" timeout "tcp-short-timeout" Note, that: track with helper "ftp-standalone" timeout "tcp-short-timeout" performs this lookup&change as you need, but in one go, only for unconfirmed conntrack. And this would be achieved with one single kernel expression, in nft --debug=netlink representation: [ track helper "ftp-standalone" timeout "tcp-short-timeout" ] For zone ID, we can use the same thing: track with zone 1 [ This can be combined with helper/timeouts too. ] In this case, we pass the zone ID via nf_conntrack_in() [ or a new function that is called from nf_conntrack_in() that takes the zone ID and that doesn't depend on templates anymore, we can strip off nf_conntrack_in() from the template logic ]. The new track kernel expression would need to perform similar tricks as 'objref', as we also want map support there. So we can add _SREG_ZONE, _SREG_TIMEOUT and _SREG_HELPER attribute to indicate the set key that we use to perform the lookups to fetch the data/objects. Syntax would be: track with helper map tcp dport { 21 : "ftp-standalone", 2121 : "ftp-standaline" } \ timeout "tcp-short-timeout" In nft --debug=netlink representation, this looks like: [ track helper sreg X timeout "tcp-short-timeout" ] _SREG_TIMEOUT and _SREG_HELPER would require a NFT_SET_OBJECT set type. _SREG_ZONE will be just fine with a NFT_SET_MAP that contains simple data maps, not pointers. With this proposal, I'm moving to the opposite side than in my previous email 8): I think we could completely get rid of templates. Look, and we could also allow a simple: track alone itself with no options, that means, track this. This results in creation of conntrack if it doesn't exists, otherwise lookup and attach to skb->nfct. I know this is different front as this would be only useful for bridge and ingress at this stage. For existing ip/ip6 families, this would nf_conntrack_in() into a no-op if skb->nfct is set. > ct = nf_ct_get(...); > if (!ct) > return; > > switch (priv->key) { > case NFT_CT_MARK: > ct->mark = sreg32; > break; > case NFT_CT_HELPER: > if (nf_ct_is_confirmed(ct)) > return; /* too late */ > > /* update helper here. sreg contains name of objectref, > * so we'd use that to obtain a helper template, then set/add > * helper extension to ct based on that. > * But we don't set skb->nfct to the template, we only use > * it as source to get the needed helper information. > */ > break; > case NFT_CT_EVENTMASK: > struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct); > > if (!e) { > if (!nf_ct_is_confirmed(ct)) > nf_ct_ecache_ext_add(ct, sreg16, ~0, GFP_ATOMIC); > return; > } > > e->ctmask = sreg16; > break; > case ... > } > > NFT_CT_ZONE would have its own eval() version that doesn't > call nf_conntrack_in but instead sets up skb->nfct to a percpu template > that will then be used to transport the zoneinfo until nf_conntrack_in > is called (either by later hook or by later call via 'ct foo set bar'). > > [ unless skb->nfct is != NULL, then its a no-op ] I see. > > For timeouts, after a second look I think this is more complicated: we > > set them from nf_conntrack_in() via l4proto->packet(). Thus, if we add > > the conntrack extension to store timeouts later on, the first > > conntrack state will not get refreshed with the right timeout. > > > > Assuming this is the first packet of a flow, that should be easy to > > amend later. But if we pickup a flow from the middle, timeout > > amendment gets a bit more tricky. Calling l4proto->packet() is not > > possible either since this sets packets as invalid, so we can drop it > > from the ruleset. > > Ok. However I don't see why this isn't solveable in some way. > > We could e.g. add a new l4proto callback, something like > l4proto->update_timeouts(ct, timeouts); Yes, timeout logic happens at the end of TCP tracker timeout which seems to perform a few tricks there. And matching 'ct expiration' will change semantics for unconfirmed ct, but that should be OK, I don't know of anyone using it. > > Back to helpers, users are familiar with the current way to attach > > helpers, ie. from the raw chain. > > > > Am I missing anything? I'm starting to think we can't escape the > > conntrack template. > > For Helpers? Why not? As long as ct isn't in the main table it should > be fine afaics? (Unless you mean "can't escape conntrack template to > read to helper info that we need to assign to ct from". > > For zones, yes, I don't see a way to avoid a template for them. > But thats the only ct key where I think that a template has to be used. Yes, following the approach you propose zone would be the only one that requires the template. So this needs to happen before the conntrack hook. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nftables conntrack set ops for zone, helper assignment, etc. 2017-01-12 14:28 ` Pablo Neira Ayuso @ 2017-01-12 14:53 ` Florian Westphal 0 siblings, 0 replies; 7+ messages in thread From: Florian Westphal @ 2017-01-12 14:53 UTC (permalink / raw) To: pablo; +Cc: Florian Westphal, netfilter-devel, christophe.leroy Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > No, I meant more intrusive version: > > > > ct = nf_ct_get(...); > > if (!ct || nf_ct_is_template(ct)))) > > nf_conntrack_in(net, info->pf, hook, skb); > > OK, then we have to defrag first. Let me give another twist to this > discussion, still brainstorming. No, nf_conntrack forces a dependency on the defrag module, so unless someone inserts such rule before the defrag hook we're fine (raw table prio is fine for instance). > Did you consider to move this logic into a explicit 'track' statement. Yes, but that doesn't help a lot imo since that forces a user to structure ruleset so that they do ct zone set meta mark track with helper "ftp-standalone" which looks fine but what does the 'track' keyword provide? Would a user care? Or is that just more about exposing whats going on behind the scenes? > Instead of this implicit lookup hidden in the helper/timeout > assignment, syntax would be something like: > > tcp dport 21 track with helper "ftp-standalone" timeout "tcp-short-timeout" > > Note, that: > > track with helper "ftp-standalone" timeout "tcp-short-timeout" > > performs this lookup&change as you need, but in one go, only for > unconfirmed conntrack. And this would be achieved with one single > kernel expression, in nft --debug=netlink representation: > > [ track helper "ftp-standalone" timeout "tcp-short-timeout" ] I see, so we add a new expression instead of using ct set syntax. > For zone ID, we can use the same thing: > > track with zone 1 [ This can be combined with helper/timeouts too. ] Hmm.... So we really have comeptely new user syntax for all of this. The reason I scrapped this idea early on is that this breaks horribly with icmp error messages and reply traffic (at very least, its highly counter-intuitive/inconvenient). Consider this fine looking but non-working line: tcp dport 21 track with helper "ftp-standalone" zone 1 instead, users would have to do this: track with zone 1 # so icmp, reply traffic, ftp data traffic is in zone 1 too tcp dport 21 track with helper "ftp-standalone" and that won't work either since we would have to support 're-tracking' already tracked conntrack... I planned to force explicit 'track' for the bridge conntrack support, but that works only because the nf_conntrack_in PREROUTING equivalent added in bridge_conntrack would not ever create new conntracks (only RELATED). We can do this because there is no bridge conntrack so far so we don't have backwards compat problems. But then I thought 'why add special "track" if I can just let ct foo set bla do it for me'? > In this case, we pass the zone ID via nf_conntrack_in() [ or a new > function that is called from nf_conntrack_in() that takes the zone ID > and that doesn't depend on templates anymore, we can strip off > nf_conntrack_in() from the template logic ]. I found no way to avoid templates for the zone because of the RELATED packets. > > > Back to helpers, users are familiar with the current way to attach > > > helpers, ie. from the raw chain. > > > > > > Am I missing anything? I'm starting to think we can't escape the > > > conntrack template. > > > > For Helpers? Why not? As long as ct isn't in the main table it should > > be fine afaics? (Unless you mean "can't escape conntrack template to > > read to helper info that we need to assign to ct from". > > > > For zones, yes, I don't see a way to avoid a template for them. > > But thats the only ct key where I think that a template has to be used. > > Yes, following the approach you propose zone would be the only one > that requires the template. So this needs to happen before the > conntrack hook. Yes, thats right. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-01-12 14:53 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-11 9:17 nftables conntrack set ops for zone, helper assignment, etc Florian Westphal 2017-01-11 20:07 ` Pablo Neira Ayuso 2017-01-11 23:01 ` Florian Westphal 2017-01-12 13:00 ` Pablo Neira Ayuso 2017-01-12 13:29 ` Florian Westphal 2017-01-12 14:28 ` Pablo Neira Ayuso 2017-01-12 14:53 ` Florian Westphal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).