* [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly
@ 2023-07-16 21:09 Xin Long
2023-07-16 21:09 ` [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation Xin Long
` (5 more replies)
0 siblings, 6 replies; 29+ messages in thread
From: Xin Long @ 2023-07-16 21:09 UTC (permalink / raw)
To: network dev, dev
Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti,
Aaron Conole
With the OVS upcall, the original ct in the skb will be dropped, and when
the skb comes back from userspace it has to create a new ct again through
nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
However, the new ct will not be able to have the exp as the original ct
has taken it away from the hash table in nf_ct_find_expectation(). This
will cause some flow never to be matched, like:
'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
matched.
OVS conntrack works around this by adding its own exp lookup function to
not remove the exp from the hash table and saving the exp and its master
info to the flow keys instead of create a real ct. But this way doesn't
work for TC act_ct.
The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
allows both OVS conntrack and TC act_ct to have a simple and clear fix
for this problem in the patch 2/3 and 3/3.
Xin Long (3):
netfilter: allow exp not to be removed in nf_ct_find_expectation
net: sched: set IPS_CONFIRMED in tmpl status only when commit is set
in act_ct
openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set
in conntrack
include/net/netfilter/nf_conntrack_expect.h | 2 +-
net/netfilter/nf_conntrack_core.c | 2 +-
net/netfilter/nf_conntrack_expect.c | 4 +-
net/netfilter/nft_ct.c | 2 +
net/openvswitch/conntrack.c | 78 +++------------------
net/sched/act_ct.c | 3 +-
6 files changed, 18 insertions(+), 73 deletions(-)
--
2.39.1
^ permalink raw reply [flat|nested] 29+ messages in thread* [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long @ 2023-07-16 21:09 ` Xin Long 2023-07-19 16:07 ` Aaron Conole 2023-07-16 21:09 ` [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct Xin Long ` (4 subsequent siblings) 5 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2023-07-16 21:09 UTC (permalink / raw) To: network dev, dev Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole Currently nf_conntrack_in() calling nf_ct_find_expectation() will remove the exp from the hash table. However, in some scenario, we expect the exp not to be removed when the created ct will not be confirmed, like in OVS and TC conntrack in the following patches. This patch allows exp not to be removed by setting IPS_CONFIRMED in the status of the tmpl. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- include/net/netfilter/nf_conntrack_expect.h | 2 +- net/netfilter/nf_conntrack_core.c | 2 +- net/netfilter/nf_conntrack_expect.c | 4 ++-- net/netfilter/nft_ct.c | 2 ++ 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h index cf0d81be5a96..165e7a03b8e9 100644 --- a/include/net/netfilter/nf_conntrack_expect.h +++ b/include/net/netfilter/nf_conntrack_expect.h @@ -100,7 +100,7 @@ nf_ct_expect_find_get(struct net *net, struct nf_conntrack_expect * nf_ct_find_expectation(struct net *net, const struct nf_conntrack_zone *zone, - const struct nf_conntrack_tuple *tuple); + const struct nf_conntrack_tuple *tuple, bool unlink); void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, u32 portid, int report); diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 992393102d5f..9f6f2e643575 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1756,7 +1756,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl, cnet = nf_ct_pernet(net); if (cnet->expect_count) { spin_lock_bh(&nf_conntrack_expect_lock); - exp = nf_ct_find_expectation(net, zone, tuple); + exp = nf_ct_find_expectation(net, zone, tuple, !tmpl || nf_ct_is_confirmed(tmpl)); if (exp) { /* Welcome, Mr. Bond. We've been expecting you... */ __set_bit(IPS_EXPECTED_BIT, &ct->status); diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c index 96948e98ec53..81ca348915c9 100644 --- a/net/netfilter/nf_conntrack_expect.c +++ b/net/netfilter/nf_conntrack_expect.c @@ -171,7 +171,7 @@ EXPORT_SYMBOL_GPL(nf_ct_expect_find_get); struct nf_conntrack_expect * nf_ct_find_expectation(struct net *net, const struct nf_conntrack_zone *zone, - const struct nf_conntrack_tuple *tuple) + const struct nf_conntrack_tuple *tuple, bool unlink) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); struct nf_conntrack_expect *i, *exp = NULL; @@ -211,7 +211,7 @@ nf_ct_find_expectation(struct net *net, !refcount_inc_not_zero(&exp->master->ct_general.use))) return NULL; - if (exp->flags & NF_CT_EXPECT_PERMANENT) { + if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) { refcount_inc(&exp->use); return exp; } else if (del_timer(&exp->timeout)) { diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index 38958e067aa8..e87fd4314c68 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -262,6 +262,7 @@ static void nft_ct_set_zone_eval(const struct nft_expr *expr, regs->verdict.code = NF_DROP; return; } + __set_bit(IPS_CONFIRMED_BIT, &ct->status); } nf_ct_set(skb, ct, IP_CT_NEW); @@ -368,6 +369,7 @@ static bool nft_ct_tmpl_alloc_pcpu(void) return false; } + __set_bit(IPS_CONFIRMED_BIT, &tmp->status); per_cpu(nft_ct_pcpu_template, cpu) = tmp; } -- 2.39.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation 2023-07-16 21:09 ` [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation Xin Long @ 2023-07-19 16:07 ` Aaron Conole 0 siblings, 0 replies; 29+ messages in thread From: Aaron Conole @ 2023-07-19 16:07 UTC (permalink / raw) To: Xin Long Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti Xin Long <lucien.xin@gmail.com> writes: > Currently nf_conntrack_in() calling nf_ct_find_expectation() will > remove the exp from the hash table. However, in some scenario, we > expect the exp not to be removed when the created ct will not be > confirmed, like in OVS and TC conntrack in the following patches. > > This patch allows exp not to be removed by setting IPS_CONFIRMED > in the status of the tmpl. > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- LGTM Acked-by: Aaron Conole <aconole@redhat.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long 2023-07-16 21:09 ` [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation Xin Long @ 2023-07-16 21:09 ` Xin Long 2023-07-19 16:07 ` Aaron Conole 2023-07-19 16:44 ` Davide Caratti 2023-07-16 21:09 ` [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack Xin Long ` (3 subsequent siblings) 5 siblings, 2 replies; 29+ messages in thread From: Xin Long @ 2023-07-16 21:09 UTC (permalink / raw) To: network dev, dev Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole With the following flows, the packets will be dropped if OVS TC offload is enabled. 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' In the 1st flow, it finds the exp from the hashtable and removes it then creates the ct with this exp in act_ct. However, in the 2nd flow it goes to the OVS upcall at the 1st time. When the skb comes back from userspace, it has to create the ct again without exp(the exp was removed last time). With no 'rel' set in the ct, the 3rd flow can never get matched. In OVS conntrack, it works around it by adding its own exp lookup function ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating a real ct, it only updates its keys with the exp and its master info. So when the skb comes back, the exp is still in the hashtable. However, we can't do this trick in act_ct, as tc flower match is using a real ct, and passing the exp and its master info to flower parsing via tc_skb_cb is also not possible (tc_skb_cb size is not big enough). The simple and clear fix is to not remove the exp at the 1st flow, namely, not set IPS_CONFIRMED in tmpl when commit is not set in act_ct. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/sched/act_ct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index abc71a06d634..7c652d14528b 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1238,7 +1238,8 @@ static int tcf_ct_fill_params(struct net *net, } } - __set_bit(IPS_CONFIRMED_BIT, &tmpl->status); + if (p->ct_action & TCA_CT_ACT_COMMIT) + __set_bit(IPS_CONFIRMED_BIT, &tmpl->status); return 0; err: nf_ct_put(p->tmpl); -- 2.39.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct 2023-07-16 21:09 ` [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct Xin Long @ 2023-07-19 16:07 ` Aaron Conole 2023-07-19 16:44 ` Davide Caratti 1 sibling, 0 replies; 29+ messages in thread From: Aaron Conole @ 2023-07-19 16:07 UTC (permalink / raw) To: Xin Long Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti Xin Long <lucien.xin@gmail.com> writes: > With the following flows, the packets will be dropped if OVS TC offload is > enabled. > > 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' > > In the 1st flow, it finds the exp from the hashtable and removes it then > creates the ct with this exp in act_ct. However, in the 2nd flow it goes > to the OVS upcall at the 1st time. When the skb comes back from userspace, > it has to create the ct again without exp(the exp was removed last time). > With no 'rel' set in the ct, the 3rd flow can never get matched. > > In OVS conntrack, it works around it by adding its own exp lookup function > ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating > a real ct, it only updates its keys with the exp and its master info. So > when the skb comes back, the exp is still in the hashtable. > > However, we can't do this trick in act_ct, as tc flower match is using a > real ct, and passing the exp and its master info to flower parsing via > tc_skb_cb is also not possible (tc_skb_cb size is not big enough). > > The simple and clear fix is to not remove the exp at the 1st flow, namely, > not set IPS_CONFIRMED in tmpl when commit is not set in act_ct. > > Reported-by: Shuang Li <shuali@redhat.com> > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- Acked-by: Aaron Conole <aconole@redhat.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct 2023-07-16 21:09 ` [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct Xin Long 2023-07-19 16:07 ` Aaron Conole @ 2023-07-19 16:44 ` Davide Caratti 1 sibling, 0 replies; 29+ messages in thread From: Davide Caratti @ 2023-07-19 16:44 UTC (permalink / raw) To: Xin Long Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Aaron Conole On Sun, Jul 16, 2023 at 05:09:18PM -0400, Xin Long wrote: > With the following flows, the packets will be dropped if OVS TC offload is > enabled. [...] > > The simple and clear fix is to not remove the exp at the 1st flow, namely, > not set IPS_CONFIRMED in tmpl when commit is not set in act_ct. > > Reported-by: Shuang Li <shuali@redhat.com> > Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long 2023-07-16 21:09 ` [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation Xin Long 2023-07-16 21:09 ` [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct Xin Long @ 2023-07-16 21:09 ` Xin Long 2023-07-19 16:08 ` Aaron Conole 2024-06-17 20:10 ` Ilya Maximets 2023-07-19 2:58 ` [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Jakub Kicinski ` (2 subsequent siblings) 5 siblings, 2 replies; 29+ messages in thread From: Xin Long @ 2023-07-16 21:09 UTC (permalink / raw) To: network dev, dev Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed from the hashtable when lookup, we can simplify the exp processing code a lot in openvswitch conntrack. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/openvswitch/conntrack.c | 78 +++++-------------------------------- 1 file changed, 10 insertions(+), 68 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 331730fd3580..fa955e892210 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, return 0; } -static struct nf_conntrack_expect * -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, - u16 proto, const struct sk_buff *skb) -{ - struct nf_conntrack_tuple tuple; - struct nf_conntrack_expect *exp; - - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) - return NULL; - - exp = __nf_ct_expect_find(net, zone, &tuple); - if (exp) { - struct nf_conntrack_tuple_hash *h; - - /* Delete existing conntrack entry, if it clashes with the - * expectation. This can happen since conntrack ALGs do not - * check for clashes between (new) expectations and existing - * conntrack entries. nf_conntrack_in() will check the - * expectations only if a conntrack entry can not be found, - * which can lead to OVS finding the expectation (here) in the - * init direction, but which will not be removed by the - * nf_conntrack_in() call, if a matching conntrack entry is - * found instead. In this case all init direction packets - * would be reported as new related packets, while reply - * direction packets would be reported as un-related - * established packets. - */ - h = nf_conntrack_find_get(net, zone, &tuple); - if (h) { - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); - - nf_ct_delete(ct, 0, 0); - nf_ct_put(ct); - } - } - - return exp; -} - /* This replicates logic from nf_conntrack_core.c that is not exported. */ static enum ip_conntrack_info ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { - struct nf_conntrack_expect *exp; - - /* If we pass an expected packet through nf_conntrack_in() the - * expectation is typically removed, but the packet could still be - * lost in upcall processing. To prevent this from happening we - * perform an explicit expectation lookup. Expected connections are - * always new, and will be passed through conntrack only when they are - * committed, as it is OK to remove the expectation at that time. - */ - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); - if (exp) { - u8 state; - - /* NOTE: New connections are NATted and Helped only when - * committed, so we are not calling into NAT here. - */ - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; - __ovs_ct_update_key(key, state, &info->zone, exp->master); - } else { - struct nf_conn *ct; - int err; + struct nf_conn *ct; + int err; - err = __ovs_ct_lookup(net, key, info, skb); - if (err) - return err; + err = __ovs_ct_lookup(net, key, info, skb); + if (err) + return err; - ct = (struct nf_conn *)skb_nfct(skb); - if (ct) - nf_ct_deliver_cached_events(ct); - } + ct = (struct nf_conn *)skb_nfct(skb); + if (ct) + nf_ct_deliver_cached_events(ct); return 0; } @@ -1460,7 +1401,8 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, if (err) goto err_free_ct; - __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status); + if (ct_info.commit) + __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status); return 0; err_free_ct: __ovs_ct_free_action(&ct_info); -- 2.39.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2023-07-16 21:09 ` [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack Xin Long @ 2023-07-19 16:08 ` Aaron Conole 2024-06-17 20:10 ` Ilya Maximets 1 sibling, 0 replies; 29+ messages in thread From: Aaron Conole @ 2023-07-19 16:08 UTC (permalink / raw) To: Xin Long Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti Xin Long <lucien.xin@gmail.com> writes: > By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed > from the hashtable when lookup, we can simplify the exp processing code a > lot in openvswitch conntrack. > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- Acked-by: Aaron Conole <aconole@redhat.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2023-07-16 21:09 ` [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack Xin Long 2023-07-19 16:08 ` Aaron Conole @ 2024-06-17 20:10 ` Ilya Maximets 2024-06-18 11:34 ` Ilya Maximets 1 sibling, 1 reply; 29+ messages in thread From: Ilya Maximets @ 2024-06-17 20:10 UTC (permalink / raw) To: Xin Long, network dev, dev Cc: Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole, i.maximets On 7/16/23 23:09, Xin Long wrote: > By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed > from the hashtable when lookup, we can simplify the exp processing code a > lot in openvswitch conntrack. > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- > net/openvswitch/conntrack.c | 78 +++++-------------------------------- > 1 file changed, 10 insertions(+), 68 deletions(-) > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > index 331730fd3580..fa955e892210 100644 > --- a/net/openvswitch/conntrack.c > +++ b/net/openvswitch/conntrack.c > @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, > return 0; > } > > -static struct nf_conntrack_expect * > -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, > - u16 proto, const struct sk_buff *skb) > -{ > - struct nf_conntrack_tuple tuple; > - struct nf_conntrack_expect *exp; > - > - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) > - return NULL; > - > - exp = __nf_ct_expect_find(net, zone, &tuple); > - if (exp) { > - struct nf_conntrack_tuple_hash *h; > - > - /* Delete existing conntrack entry, if it clashes with the > - * expectation. This can happen since conntrack ALGs do not > - * check for clashes between (new) expectations and existing > - * conntrack entries. nf_conntrack_in() will check the > - * expectations only if a conntrack entry can not be found, > - * which can lead to OVS finding the expectation (here) in the > - * init direction, but which will not be removed by the > - * nf_conntrack_in() call, if a matching conntrack entry is > - * found instead. In this case all init direction packets > - * would be reported as new related packets, while reply > - * direction packets would be reported as un-related > - * established packets. > - */ > - h = nf_conntrack_find_get(net, zone, &tuple); > - if (h) { > - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); > - > - nf_ct_delete(ct, 0, 0); > - nf_ct_put(ct); > - } > - } > - > - return exp; > -} > - > /* This replicates logic from nf_conntrack_core.c that is not exported. */ > static enum ip_conntrack_info > ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) > @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, > const struct ovs_conntrack_info *info, > struct sk_buff *skb) > { > - struct nf_conntrack_expect *exp; > - > - /* If we pass an expected packet through nf_conntrack_in() the > - * expectation is typically removed, but the packet could still be > - * lost in upcall processing. To prevent this from happening we > - * perform an explicit expectation lookup. Expected connections are > - * always new, and will be passed through conntrack only when they are > - * committed, as it is OK to remove the expectation at that time. > - */ > - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); > - if (exp) { > - u8 state; > - > - /* NOTE: New connections are NATted and Helped only when > - * committed, so we are not calling into NAT here. > - */ > - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; > - __ovs_ct_update_key(key, state, &info->zone, exp->master); Hi, Xin, others. Unfortunately, it seems like removal of this code broke the expected behavior. OVS in userspace expects that SYN packet of a new related FTP connection will get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not new. This is a problem because we need to commit this connection with the label and we do that for +new packets. If we can't get +new packet we'll have to commit every single +rel+trk packet, which doesn't make a lot of sense. And it's a significant behavior change regardless. Could you, please, take a look? The issue can be reproduced by running check-kernel tests in OVS repo. 'FTP SNAT orig tuple' tests fail 100% of the time. Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-17 20:10 ` Ilya Maximets @ 2024-06-18 11:34 ` Ilya Maximets 2024-06-18 14:58 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Ilya Maximets @ 2024-06-18 11:34 UTC (permalink / raw) To: Xin Long, network dev, dev Cc: i.maximets, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On 6/17/24 22:10, Ilya Maximets wrote: > On 7/16/23 23:09, Xin Long wrote: >> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed >> from the hashtable when lookup, we can simplify the exp processing code a >> lot in openvswitch conntrack. >> >> Signed-off-by: Xin Long <lucien.xin@gmail.com> >> --- >> net/openvswitch/conntrack.c | 78 +++++-------------------------------- >> 1 file changed, 10 insertions(+), 68 deletions(-) >> >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >> index 331730fd3580..fa955e892210 100644 >> --- a/net/openvswitch/conntrack.c >> +++ b/net/openvswitch/conntrack.c >> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, >> return 0; >> } >> >> -static struct nf_conntrack_expect * >> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, >> - u16 proto, const struct sk_buff *skb) >> -{ >> - struct nf_conntrack_tuple tuple; >> - struct nf_conntrack_expect *exp; >> - >> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) >> - return NULL; >> - >> - exp = __nf_ct_expect_find(net, zone, &tuple); >> - if (exp) { >> - struct nf_conntrack_tuple_hash *h; >> - >> - /* Delete existing conntrack entry, if it clashes with the >> - * expectation. This can happen since conntrack ALGs do not >> - * check for clashes between (new) expectations and existing >> - * conntrack entries. nf_conntrack_in() will check the >> - * expectations only if a conntrack entry can not be found, >> - * which can lead to OVS finding the expectation (here) in the >> - * init direction, but which will not be removed by the >> - * nf_conntrack_in() call, if a matching conntrack entry is >> - * found instead. In this case all init direction packets >> - * would be reported as new related packets, while reply >> - * direction packets would be reported as un-related >> - * established packets. >> - */ >> - h = nf_conntrack_find_get(net, zone, &tuple); >> - if (h) { >> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >> - >> - nf_ct_delete(ct, 0, 0); >> - nf_ct_put(ct); >> - } >> - } >> - >> - return exp; >> -} >> - >> /* This replicates logic from nf_conntrack_core.c that is not exported. */ >> static enum ip_conntrack_info >> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) >> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, >> const struct ovs_conntrack_info *info, >> struct sk_buff *skb) >> { >> - struct nf_conntrack_expect *exp; >> - >> - /* If we pass an expected packet through nf_conntrack_in() the >> - * expectation is typically removed, but the packet could still be >> - * lost in upcall processing. To prevent this from happening we >> - * perform an explicit expectation lookup. Expected connections are >> - * always new, and will be passed through conntrack only when they are >> - * committed, as it is OK to remove the expectation at that time. >> - */ >> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); >> - if (exp) { >> - u8 state; >> - >> - /* NOTE: New connections are NATted and Helped only when >> - * committed, so we are not calling into NAT here. >> - */ >> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; >> - __ovs_ct_update_key(key, state, &info->zone, exp->master); > > Hi, Xin, others. > > Unfortunately, it seems like removal of this code broke the expected behavior. > OVS in userspace expects that SYN packet of a new related FTP connection will > get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not > new. This is a problem because we need to commit this connection with the label > and we do that for +new packets. If we can't get +new packet we'll have to commit > every single +rel+trk packet, which doesn't make a lot of sense. And it's a > significant behavior change regardless. Interestingly enough I see +new+rel+trk packets in cases without SNAT, but we can only get +rel+trk in cases with SNAT. So, this may be just a generic conntrack bug somewhere. At least the behavior seems fairly inconsistent. > > Could you, please, take a look? > > The issue can be reproduced by running check-kernel tests in OVS repo. > 'FTP SNAT orig tuple' tests fail 100% of the time. > > Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-18 11:34 ` Ilya Maximets @ 2024-06-18 14:58 ` Xin Long 2024-06-18 15:50 ` Ilya Maximets 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-06-18 14:58 UTC (permalink / raw) To: Ilya Maximets Cc: network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 6/17/24 22:10, Ilya Maximets wrote: > > On 7/16/23 23:09, Xin Long wrote: > >> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed > >> from the hashtable when lookup, we can simplify the exp processing code a > >> lot in openvswitch conntrack. > >> > >> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >> --- > >> net/openvswitch/conntrack.c | 78 +++++-------------------------------- > >> 1 file changed, 10 insertions(+), 68 deletions(-) > >> > >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > >> index 331730fd3580..fa955e892210 100644 > >> --- a/net/openvswitch/conntrack.c > >> +++ b/net/openvswitch/conntrack.c > >> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, > >> return 0; > >> } > >> > >> -static struct nf_conntrack_expect * > >> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, > >> - u16 proto, const struct sk_buff *skb) > >> -{ > >> - struct nf_conntrack_tuple tuple; > >> - struct nf_conntrack_expect *exp; > >> - > >> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) > >> - return NULL; > >> - > >> - exp = __nf_ct_expect_find(net, zone, &tuple); > >> - if (exp) { > >> - struct nf_conntrack_tuple_hash *h; > >> - > >> - /* Delete existing conntrack entry, if it clashes with the > >> - * expectation. This can happen since conntrack ALGs do not > >> - * check for clashes between (new) expectations and existing > >> - * conntrack entries. nf_conntrack_in() will check the > >> - * expectations only if a conntrack entry can not be found, > >> - * which can lead to OVS finding the expectation (here) in the > >> - * init direction, but which will not be removed by the > >> - * nf_conntrack_in() call, if a matching conntrack entry is > >> - * found instead. In this case all init direction packets > >> - * would be reported as new related packets, while reply > >> - * direction packets would be reported as un-related > >> - * established packets. > >> - */ > >> - h = nf_conntrack_find_get(net, zone, &tuple); > >> - if (h) { > >> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); > >> - > >> - nf_ct_delete(ct, 0, 0); > >> - nf_ct_put(ct); > >> - } > >> - } > >> - > >> - return exp; > >> -} > >> - > >> /* This replicates logic from nf_conntrack_core.c that is not exported. */ > >> static enum ip_conntrack_info > >> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) > >> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, > >> const struct ovs_conntrack_info *info, > >> struct sk_buff *skb) > >> { > >> - struct nf_conntrack_expect *exp; > >> - > >> - /* If we pass an expected packet through nf_conntrack_in() the > >> - * expectation is typically removed, but the packet could still be > >> - * lost in upcall processing. To prevent this from happening we > >> - * perform an explicit expectation lookup. Expected connections are > >> - * always new, and will be passed through conntrack only when they are > >> - * committed, as it is OK to remove the expectation at that time. > >> - */ > >> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); > >> - if (exp) { > >> - u8 state; > >> - > >> - /* NOTE: New connections are NATted and Helped only when > >> - * committed, so we are not calling into NAT here. > >> - */ > >> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; > >> - __ovs_ct_update_key(key, state, &info->zone, exp->master); > > > > Hi, Xin, others. > > > > Unfortunately, it seems like removal of this code broke the expected behavior. > > OVS in userspace expects that SYN packet of a new related FTP connection will > > get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not > > new. This is a problem because we need to commit this connection with the label > > and we do that for +new packets. If we can't get +new packet we'll have to commit > > every single +rel+trk packet, which doesn't make a lot of sense. And it's a > > significant behavior change regardless. > > Interestingly enough I see +new+rel+trk packets in cases without SNAT, > but we can only get +rel+trk in cases with SNAT. So, this may be just > a generic conntrack bug somewhere. At least the behavior seems fairly > inconsistent. > In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW are set at the same time by ovs_ct_update_key() when this related ct is not confirmed. The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: # make check-kernel 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) Any idea why? or do you know any other testcase that expects +new+rel+trk but returns +rel+trk only? Thanks. > > > > Could you, please, take a look? > > > > The issue can be reproduced by running check-kernel tests in OVS repo. > > 'FTP SNAT orig tuple' tests fail 100% of the time. > > > > Best regards, Ilya Maximets. > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-18 14:58 ` Xin Long @ 2024-06-18 15:50 ` Ilya Maximets 2024-06-19 12:58 ` Ilya Maximets 0 siblings, 1 reply; 29+ messages in thread From: Ilya Maximets @ 2024-06-18 15:50 UTC (permalink / raw) To: Xin Long Cc: i.maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On 6/18/24 16:58, Xin Long wrote: > On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: >> >> On 6/17/24 22:10, Ilya Maximets wrote: >>> On 7/16/23 23:09, Xin Long wrote: >>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed >>>> from the hashtable when lookup, we can simplify the exp processing code a >>>> lot in openvswitch conntrack. >>>> >>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>> --- >>>> net/openvswitch/conntrack.c | 78 +++++-------------------------------- >>>> 1 file changed, 10 insertions(+), 68 deletions(-) >>>> >>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >>>> index 331730fd3580..fa955e892210 100644 >>>> --- a/net/openvswitch/conntrack.c >>>> +++ b/net/openvswitch/conntrack.c >>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, >>>> return 0; >>>> } >>>> >>>> -static struct nf_conntrack_expect * >>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, >>>> - u16 proto, const struct sk_buff *skb) >>>> -{ >>>> - struct nf_conntrack_tuple tuple; >>>> - struct nf_conntrack_expect *exp; >>>> - >>>> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) >>>> - return NULL; >>>> - >>>> - exp = __nf_ct_expect_find(net, zone, &tuple); >>>> - if (exp) { >>>> - struct nf_conntrack_tuple_hash *h; >>>> - >>>> - /* Delete existing conntrack entry, if it clashes with the >>>> - * expectation. This can happen since conntrack ALGs do not >>>> - * check for clashes between (new) expectations and existing >>>> - * conntrack entries. nf_conntrack_in() will check the >>>> - * expectations only if a conntrack entry can not be found, >>>> - * which can lead to OVS finding the expectation (here) in the >>>> - * init direction, but which will not be removed by the >>>> - * nf_conntrack_in() call, if a matching conntrack entry is >>>> - * found instead. In this case all init direction packets >>>> - * would be reported as new related packets, while reply >>>> - * direction packets would be reported as un-related >>>> - * established packets. >>>> - */ >>>> - h = nf_conntrack_find_get(net, zone, &tuple); >>>> - if (h) { >>>> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >>>> - >>>> - nf_ct_delete(ct, 0, 0); >>>> - nf_ct_put(ct); >>>> - } >>>> - } >>>> - >>>> - return exp; >>>> -} >>>> - >>>> /* This replicates logic from nf_conntrack_core.c that is not exported. */ >>>> static enum ip_conntrack_info >>>> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) >>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, >>>> const struct ovs_conntrack_info *info, >>>> struct sk_buff *skb) >>>> { >>>> - struct nf_conntrack_expect *exp; >>>> - >>>> - /* If we pass an expected packet through nf_conntrack_in() the >>>> - * expectation is typically removed, but the packet could still be >>>> - * lost in upcall processing. To prevent this from happening we >>>> - * perform an explicit expectation lookup. Expected connections are >>>> - * always new, and will be passed through conntrack only when they are >>>> - * committed, as it is OK to remove the expectation at that time. >>>> - */ >>>> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); >>>> - if (exp) { >>>> - u8 state; >>>> - >>>> - /* NOTE: New connections are NATted and Helped only when >>>> - * committed, so we are not calling into NAT here. >>>> - */ >>>> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; >>>> - __ovs_ct_update_key(key, state, &info->zone, exp->master); >>> >>> Hi, Xin, others. >>> >>> Unfortunately, it seems like removal of this code broke the expected behavior. >>> OVS in userspace expects that SYN packet of a new related FTP connection will >>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not >>> new. This is a problem because we need to commit this connection with the label >>> and we do that for +new packets. If we can't get +new packet we'll have to commit >>> every single +rel+trk packet, which doesn't make a lot of sense. And it's a >>> significant behavior change regardless. >> >> Interestingly enough I see +new+rel+trk packets in cases without SNAT, >> but we can only get +rel+trk in cases with SNAT. So, this may be just >> a generic conntrack bug somewhere. At least the behavior seems fairly >> inconsistent. >> > In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same > time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW > are set at the same time by ovs_ct_update_key() when this related ct > is not confirmed. > > The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: > > # make check-kernel > 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) > > Any idea why? or do you know any other testcase that expects +new+rel+trk > but returns +rel+trk only? You need to install lftp and pyftpdlib. The pyftpdlib may only be available via pip on some systems. > > Thanks. >>> >>> Could you, please, take a look? >>> >>> The issue can be reproduced by running check-kernel tests in OVS repo. >>> 'FTP SNAT orig tuple' tests fail 100% of the time. >>> >>> Best regards, Ilya Maximets. >> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-18 15:50 ` Ilya Maximets @ 2024-06-19 12:58 ` Ilya Maximets 2024-06-19 14:07 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Ilya Maximets @ 2024-06-19 12:58 UTC (permalink / raw) To: Xin Long Cc: i.maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On 6/18/24 17:50, Ilya Maximets wrote: > On 6/18/24 16:58, Xin Long wrote: >> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: >>> >>> On 6/17/24 22:10, Ilya Maximets wrote: >>>> On 7/16/23 23:09, Xin Long wrote: >>>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed >>>>> from the hashtable when lookup, we can simplify the exp processing code a >>>>> lot in openvswitch conntrack. >>>>> >>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>>> --- >>>>> net/openvswitch/conntrack.c | 78 +++++-------------------------------- >>>>> 1 file changed, 10 insertions(+), 68 deletions(-) >>>>> >>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >>>>> index 331730fd3580..fa955e892210 100644 >>>>> --- a/net/openvswitch/conntrack.c >>>>> +++ b/net/openvswitch/conntrack.c >>>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, >>>>> return 0; >>>>> } >>>>> >>>>> -static struct nf_conntrack_expect * >>>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, >>>>> - u16 proto, const struct sk_buff *skb) >>>>> -{ >>>>> - struct nf_conntrack_tuple tuple; >>>>> - struct nf_conntrack_expect *exp; >>>>> - >>>>> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) >>>>> - return NULL; >>>>> - >>>>> - exp = __nf_ct_expect_find(net, zone, &tuple); >>>>> - if (exp) { >>>>> - struct nf_conntrack_tuple_hash *h; >>>>> - >>>>> - /* Delete existing conntrack entry, if it clashes with the >>>>> - * expectation. This can happen since conntrack ALGs do not >>>>> - * check for clashes between (new) expectations and existing >>>>> - * conntrack entries. nf_conntrack_in() will check the >>>>> - * expectations only if a conntrack entry can not be found, >>>>> - * which can lead to OVS finding the expectation (here) in the >>>>> - * init direction, but which will not be removed by the >>>>> - * nf_conntrack_in() call, if a matching conntrack entry is >>>>> - * found instead. In this case all init direction packets >>>>> - * would be reported as new related packets, while reply >>>>> - * direction packets would be reported as un-related >>>>> - * established packets. >>>>> - */ >>>>> - h = nf_conntrack_find_get(net, zone, &tuple); >>>>> - if (h) { >>>>> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >>>>> - >>>>> - nf_ct_delete(ct, 0, 0); >>>>> - nf_ct_put(ct); >>>>> - } >>>>> - } >>>>> - >>>>> - return exp; >>>>> -} >>>>> - >>>>> /* This replicates logic from nf_conntrack_core.c that is not exported. */ >>>>> static enum ip_conntrack_info >>>>> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) >>>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, >>>>> const struct ovs_conntrack_info *info, >>>>> struct sk_buff *skb) >>>>> { >>>>> - struct nf_conntrack_expect *exp; >>>>> - >>>>> - /* If we pass an expected packet through nf_conntrack_in() the >>>>> - * expectation is typically removed, but the packet could still be >>>>> - * lost in upcall processing. To prevent this from happening we >>>>> - * perform an explicit expectation lookup. Expected connections are >>>>> - * always new, and will be passed through conntrack only when they are >>>>> - * committed, as it is OK to remove the expectation at that time. >>>>> - */ >>>>> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); >>>>> - if (exp) { >>>>> - u8 state; >>>>> - >>>>> - /* NOTE: New connections are NATted and Helped only when >>>>> - * committed, so we are not calling into NAT here. >>>>> - */ >>>>> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; >>>>> - __ovs_ct_update_key(key, state, &info->zone, exp->master); >>>> >>>> Hi, Xin, others. >>>> >>>> Unfortunately, it seems like removal of this code broke the expected behavior. >>>> OVS in userspace expects that SYN packet of a new related FTP connection will >>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not >>>> new. This is a problem because we need to commit this connection with the label >>>> and we do that for +new packets. If we can't get +new packet we'll have to commit >>>> every single +rel+trk packet, which doesn't make a lot of sense. And it's a >>>> significant behavior change regardless. >>> >>> Interestingly enough I see +new+rel+trk packets in cases without SNAT, >>> but we can only get +rel+trk in cases with SNAT. So, this may be just >>> a generic conntrack bug somewhere. At least the behavior seems fairly >>> inconsistent. >>> >> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same >> time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW >> are set at the same time by ovs_ct_update_key() when this related ct >> is not confirmed. >> >> The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: >> >> # make check-kernel >> 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) >> >> Any idea why? or do you know any other testcase that expects +new+rel+trk >> but returns +rel+trk only? > > You need to install lftp and pyftpdlib. The pyftpdlib may only be available > via pip on some systems. > >> >> Thanks. >>>> >>>> Could you, please, take a look? >>>> >>>> The issue can be reproduced by running check-kernel tests in OVS repo. >>>> 'FTP SNAT orig tuple' tests fail 100% of the time. >>>> >>>> Best regards, Ilya Maximets. >>> > Hmm. After further investigation, it seems that the issue is not about ct state, but the ct label. Before this commit we had information about both the original tuple of the parent connection and the mark/label of the parent connection: system@ovs-system: miss upcall: recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), ct_zone(0x1),ct_mark(0),ct_label(0x4d2000000000000000000000001), ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=50648,tp_dst=21), eth(src=de:d9:f3:c8:5a:3a,dst=80:88:88:88:88:88),eth_type(0x0800), ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), tcp(src=57027,dst=38153),tcp_flags(syn) But after this change, we still have the original tuple of the parent connection, but the label is no longer in the flow key: system@ovs-system: miss upcall: recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), ct_zone(0x1),ct_mark(0),ct_label(0), ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=34668,tp_dst=21), eth(src=66:eb:74:c6:79:24,dst=80:88:88:88:88:88),eth_type(0x0800), ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), tcp(src=49529,dst=35459),tcp_flags(syn) ct_state(0x25) == +new+rel+trk Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 12:58 ` Ilya Maximets @ 2024-06-19 14:07 ` Xin Long 2024-06-19 17:30 ` Ilya Maximets 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-06-19 14:07 UTC (permalink / raw) To: Ilya Maximets Cc: network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 6/18/24 17:50, Ilya Maximets wrote: > > On 6/18/24 16:58, Xin Long wrote: > >> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: > >>> > >>> On 6/17/24 22:10, Ilya Maximets wrote: > >>>> On 7/16/23 23:09, Xin Long wrote: > >>>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed > >>>>> from the hashtable when lookup, we can simplify the exp processing code a > >>>>> lot in openvswitch conntrack. > >>>>> > >>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>>>> --- > >>>>> net/openvswitch/conntrack.c | 78 +++++-------------------------------- > >>>>> 1 file changed, 10 insertions(+), 68 deletions(-) > >>>>> > >>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > >>>>> index 331730fd3580..fa955e892210 100644 > >>>>> --- a/net/openvswitch/conntrack.c > >>>>> +++ b/net/openvswitch/conntrack.c > >>>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, > >>>>> return 0; > >>>>> } > >>>>> > >>>>> -static struct nf_conntrack_expect * > >>>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, > >>>>> - u16 proto, const struct sk_buff *skb) > >>>>> -{ > >>>>> - struct nf_conntrack_tuple tuple; > >>>>> - struct nf_conntrack_expect *exp; > >>>>> - > >>>>> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) > >>>>> - return NULL; > >>>>> - > >>>>> - exp = __nf_ct_expect_find(net, zone, &tuple); > >>>>> - if (exp) { > >>>>> - struct nf_conntrack_tuple_hash *h; > >>>>> - > >>>>> - /* Delete existing conntrack entry, if it clashes with the > >>>>> - * expectation. This can happen since conntrack ALGs do not > >>>>> - * check for clashes between (new) expectations and existing > >>>>> - * conntrack entries. nf_conntrack_in() will check the > >>>>> - * expectations only if a conntrack entry can not be found, > >>>>> - * which can lead to OVS finding the expectation (here) in the > >>>>> - * init direction, but which will not be removed by the > >>>>> - * nf_conntrack_in() call, if a matching conntrack entry is > >>>>> - * found instead. In this case all init direction packets > >>>>> - * would be reported as new related packets, while reply > >>>>> - * direction packets would be reported as un-related > >>>>> - * established packets. > >>>>> - */ > >>>>> - h = nf_conntrack_find_get(net, zone, &tuple); > >>>>> - if (h) { > >>>>> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); > >>>>> - > >>>>> - nf_ct_delete(ct, 0, 0); > >>>>> - nf_ct_put(ct); > >>>>> - } > >>>>> - } > >>>>> - > >>>>> - return exp; > >>>>> -} > >>>>> - > >>>>> /* This replicates logic from nf_conntrack_core.c that is not exported. */ > >>>>> static enum ip_conntrack_info > >>>>> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) > >>>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, > >>>>> const struct ovs_conntrack_info *info, > >>>>> struct sk_buff *skb) > >>>>> { > >>>>> - struct nf_conntrack_expect *exp; > >>>>> - > >>>>> - /* If we pass an expected packet through nf_conntrack_in() the > >>>>> - * expectation is typically removed, but the packet could still be > >>>>> - * lost in upcall processing. To prevent this from happening we > >>>>> - * perform an explicit expectation lookup. Expected connections are > >>>>> - * always new, and will be passed through conntrack only when they are > >>>>> - * committed, as it is OK to remove the expectation at that time. > >>>>> - */ > >>>>> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); > >>>>> - if (exp) { > >>>>> - u8 state; > >>>>> - > >>>>> - /* NOTE: New connections are NATted and Helped only when > >>>>> - * committed, so we are not calling into NAT here. > >>>>> - */ > >>>>> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; > >>>>> - __ovs_ct_update_key(key, state, &info->zone, exp->master); > >>>> > >>>> Hi, Xin, others. > >>>> > >>>> Unfortunately, it seems like removal of this code broke the expected behavior. > >>>> OVS in userspace expects that SYN packet of a new related FTP connection will > >>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not > >>>> new. This is a problem because we need to commit this connection with the label > >>>> and we do that for +new packets. If we can't get +new packet we'll have to commit > >>>> every single +rel+trk packet, which doesn't make a lot of sense. And it's a > >>>> significant behavior change regardless. > >>> > >>> Interestingly enough I see +new+rel+trk packets in cases without SNAT, > >>> but we can only get +rel+trk in cases with SNAT. So, this may be just > >>> a generic conntrack bug somewhere. At least the behavior seems fairly > >>> inconsistent. > >>> > >> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same > >> time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW > >> are set at the same time by ovs_ct_update_key() when this related ct > >> is not confirmed. > >> > >> The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: > >> > >> # make check-kernel > >> 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) > >> > >> Any idea why? or do you know any other testcase that expects +new+rel+trk > >> but returns +rel+trk only? > > > > You need to install lftp and pyftpdlib. The pyftpdlib may only be available > > via pip on some systems. > > > >> > >> Thanks. > >>>> > >>>> Could you, please, take a look? > >>>> > >>>> The issue can be reproduced by running check-kernel tests in OVS repo. > >>>> 'FTP SNAT orig tuple' tests fail 100% of the time. > >>>> > >>>> Best regards, Ilya Maximets. > >>> > > > > Hmm. After further investigation, it seems that the issue is not about ct state, > but the ct label. Before this commit we had information about both the original > tuple of the parent connection and the mark/label of the parent connection: > Make senses. Now I can see the difference after this commit. We will need a fix in __ovs_ct_update_key() to copy mark & label from ct->master for exp ct. @@ -196,8 +196,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, { key->ct_state = state; key->ct_zone = zone->id; - key->ct.mark = ovs_ct_get_mark(ct); - ovs_ct_get_labels(ct, &key->ct.labels); + key->ct.mark = 0; + memset(&key->ct.labels, 0, OVS_CT_LABELS_LEN); if (ct) { const struct nf_conntrack_tuple *orig; @@ -205,6 +205,8 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, /* Use the master if we have one. */ if (ct->master) ct = ct->master; + key->ct.mark = ovs_ct_get_mark(ct); + ovs_ct_get_labels(ct, &key->ct.labels); orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; /* IP version must match with the master connection. */ We may need to run some regression tests for such a change. Thanks. > system@ovs-system: miss upcall: > recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), > ct_zone(0x1),ct_mark(0),ct_label(0x4d2000000000000000000000001), > ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=50648,tp_dst=21), > eth(src=de:d9:f3:c8:5a:3a,dst=80:88:88:88:88:88),eth_type(0x0800), > ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), > tcp(src=57027,dst=38153),tcp_flags(syn) > > But after this change, we still have the original tuple of the parent connection, > but the label is no longer in the flow key: > > system@ovs-system: miss upcall: > recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), > ct_zone(0x1),ct_mark(0),ct_label(0), > ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=34668,tp_dst=21), > eth(src=66:eb:74:c6:79:24,dst=80:88:88:88:88:88),eth_type(0x0800), > ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), > tcp(src=49529,dst=35459),tcp_flags(syn) > > ct_state(0x25) == +new+rel+trk > > Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 14:07 ` Xin Long @ 2024-06-19 17:30 ` Ilya Maximets 2024-06-19 20:11 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Ilya Maximets @ 2024-06-19 17:30 UTC (permalink / raw) To: Xin Long Cc: i.maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Florian Westphal, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On 6/19/24 16:07, Xin Long wrote: > On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets <i.maximets@ovn.org> wrote: >> >> On 6/18/24 17:50, Ilya Maximets wrote: >>> On 6/18/24 16:58, Xin Long wrote: >>>> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: >>>>> >>>>> On 6/17/24 22:10, Ilya Maximets wrote: >>>>>> On 7/16/23 23:09, Xin Long wrote: >>>>>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed >>>>>>> from the hashtable when lookup, we can simplify the exp processing code a >>>>>>> lot in openvswitch conntrack. >>>>>>> >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>>>>> --- >>>>>>> net/openvswitch/conntrack.c | 78 +++++-------------------------------- >>>>>>> 1 file changed, 10 insertions(+), 68 deletions(-) >>>>>>> >>>>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >>>>>>> index 331730fd3580..fa955e892210 100644 >>>>>>> --- a/net/openvswitch/conntrack.c >>>>>>> +++ b/net/openvswitch/conntrack.c >>>>>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, >>>>>>> return 0; >>>>>>> } >>>>>>> >>>>>>> -static struct nf_conntrack_expect * >>>>>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, >>>>>>> - u16 proto, const struct sk_buff *skb) >>>>>>> -{ >>>>>>> - struct nf_conntrack_tuple tuple; >>>>>>> - struct nf_conntrack_expect *exp; >>>>>>> - >>>>>>> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) >>>>>>> - return NULL; >>>>>>> - >>>>>>> - exp = __nf_ct_expect_find(net, zone, &tuple); >>>>>>> - if (exp) { >>>>>>> - struct nf_conntrack_tuple_hash *h; >>>>>>> - >>>>>>> - /* Delete existing conntrack entry, if it clashes with the >>>>>>> - * expectation. This can happen since conntrack ALGs do not >>>>>>> - * check for clashes between (new) expectations and existing >>>>>>> - * conntrack entries. nf_conntrack_in() will check the >>>>>>> - * expectations only if a conntrack entry can not be found, >>>>>>> - * which can lead to OVS finding the expectation (here) in the >>>>>>> - * init direction, but which will not be removed by the >>>>>>> - * nf_conntrack_in() call, if a matching conntrack entry is >>>>>>> - * found instead. In this case all init direction packets >>>>>>> - * would be reported as new related packets, while reply >>>>>>> - * direction packets would be reported as un-related >>>>>>> - * established packets. >>>>>>> - */ >>>>>>> - h = nf_conntrack_find_get(net, zone, &tuple); >>>>>>> - if (h) { >>>>>>> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >>>>>>> - >>>>>>> - nf_ct_delete(ct, 0, 0); >>>>>>> - nf_ct_put(ct); >>>>>>> - } >>>>>>> - } >>>>>>> - >>>>>>> - return exp; >>>>>>> -} >>>>>>> - >>>>>>> /* This replicates logic from nf_conntrack_core.c that is not exported. */ >>>>>>> static enum ip_conntrack_info >>>>>>> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) >>>>>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, >>>>>>> const struct ovs_conntrack_info *info, >>>>>>> struct sk_buff *skb) >>>>>>> { >>>>>>> - struct nf_conntrack_expect *exp; >>>>>>> - >>>>>>> - /* If we pass an expected packet through nf_conntrack_in() the >>>>>>> - * expectation is typically removed, but the packet could still be >>>>>>> - * lost in upcall processing. To prevent this from happening we >>>>>>> - * perform an explicit expectation lookup. Expected connections are >>>>>>> - * always new, and will be passed through conntrack only when they are >>>>>>> - * committed, as it is OK to remove the expectation at that time. >>>>>>> - */ >>>>>>> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); >>>>>>> - if (exp) { >>>>>>> - u8 state; >>>>>>> - >>>>>>> - /* NOTE: New connections are NATted and Helped only when >>>>>>> - * committed, so we are not calling into NAT here. >>>>>>> - */ >>>>>>> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; >>>>>>> - __ovs_ct_update_key(key, state, &info->zone, exp->master); >>>>>> >>>>>> Hi, Xin, others. >>>>>> >>>>>> Unfortunately, it seems like removal of this code broke the expected behavior. >>>>>> OVS in userspace expects that SYN packet of a new related FTP connection will >>>>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not >>>>>> new. This is a problem because we need to commit this connection with the label >>>>>> and we do that for +new packets. If we can't get +new packet we'll have to commit >>>>>> every single +rel+trk packet, which doesn't make a lot of sense. And it's a >>>>>> significant behavior change regardless. >>>>> >>>>> Interestingly enough I see +new+rel+trk packets in cases without SNAT, >>>>> but we can only get +rel+trk in cases with SNAT. So, this may be just >>>>> a generic conntrack bug somewhere. At least the behavior seems fairly >>>>> inconsistent. >>>>> >>>> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same >>>> time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW >>>> are set at the same time by ovs_ct_update_key() when this related ct >>>> is not confirmed. >>>> >>>> The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: >>>> >>>> # make check-kernel >>>> 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) >>>> >>>> Any idea why? or do you know any other testcase that expects +new+rel+trk >>>> but returns +rel+trk only? >>> >>> You need to install lftp and pyftpdlib. The pyftpdlib may only be available >>> via pip on some systems. >>> >>>> >>>> Thanks. >>>>>> >>>>>> Could you, please, take a look? >>>>>> >>>>>> The issue can be reproduced by running check-kernel tests in OVS repo. >>>>>> 'FTP SNAT orig tuple' tests fail 100% of the time. >>>>>> >>>>>> Best regards, Ilya Maximets. >>>>> >>> >> >> Hmm. After further investigation, it seems that the issue is not about ct state, >> but the ct label. Before this commit we had information about both the original >> tuple of the parent connection and the mark/label of the parent connection: >> > Make senses. Now I can see the difference after this commit. > We will need a fix in __ovs_ct_update_key() to copy mark & label from > ct->master for exp ct. > > @@ -196,8 +196,8 @@ static void __ovs_ct_update_key(struct sw_flow_key > *key, u8 state, > { > key->ct_state = state; > key->ct_zone = zone->id; > - key->ct.mark = ovs_ct_get_mark(ct); > - ovs_ct_get_labels(ct, &key->ct.labels); > + key->ct.mark = 0; > + memset(&key->ct.labels, 0, OVS_CT_LABELS_LEN); > > if (ct) { > const struct nf_conntrack_tuple *orig; > @@ -205,6 +205,8 @@ static void __ovs_ct_update_key(struct sw_flow_key > *key, u8 state, > /* Use the master if we have one. */ > if (ct->master) > ct = ct->master; > + key->ct.mark = ovs_ct_get_mark(ct); > + ovs_ct_get_labels(ct, &key->ct.labels); > orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; > > /* IP version must match with the master connection. */ > > We may need to run some regression tests for such a change. Thank, Xin! This seems like a change in the right direction and it fixes this particular test. But, I guess, we should get mark/labels from the master connection only if it is not yet confirmed. Users may commit different labels for the related connection. This should be more in line with the previous behavior. What do you think? Best regards, Ilya Maximets. > > Thanks. > >> system@ovs-system: miss upcall: >> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), >> ct_zone(0x1),ct_mark(0),ct_label(0x4d2000000000000000000000001), >> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=50648,tp_dst=21), >> eth(src=de:d9:f3:c8:5a:3a,dst=80:88:88:88:88:88),eth_type(0x0800), >> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), >> tcp(src=57027,dst=38153),tcp_flags(syn) >> >> But after this change, we still have the original tuple of the parent connection, >> but the label is no longer in the flow key: >> >> system@ovs-system: miss upcall: >> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), >> ct_zone(0x1),ct_mark(0),ct_label(0), >> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=34668,tp_dst=21), >> eth(src=66:eb:74:c6:79:24,dst=80:88:88:88:88:88),eth_type(0x0800), >> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), >> tcp(src=49529,dst=35459),tcp_flags(syn) >> >> ct_state(0x25) == +new+rel+trk >> >> Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 17:30 ` Ilya Maximets @ 2024-06-19 20:11 ` Xin Long 2024-06-19 20:19 ` Florian Westphal 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-06-19 20:11 UTC (permalink / raw) To: Ilya Maximets, Florian Westphal Cc: network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Wed, Jun 19, 2024 at 1:30 PM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 6/19/24 16:07, Xin Long wrote: > > On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets <i.maximets@ovn.org> wrote: > >> > >> On 6/18/24 17:50, Ilya Maximets wrote: > >>> On 6/18/24 16:58, Xin Long wrote: > >>>> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maximets@ovn.org> wrote: > >>>>> > >>>>> On 6/17/24 22:10, Ilya Maximets wrote: > >>>>>> On 7/16/23 23:09, Xin Long wrote: > >>>>>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed > >>>>>>> from the hashtable when lookup, we can simplify the exp processing code a > >>>>>>> lot in openvswitch conntrack. > >>>>>>> > >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>>>>>> --- > >>>>>>> net/openvswitch/conntrack.c | 78 +++++-------------------------------- > >>>>>>> 1 file changed, 10 insertions(+), 68 deletions(-) > >>>>>>> > >>>>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > >>>>>>> index 331730fd3580..fa955e892210 100644 > >>>>>>> --- a/net/openvswitch/conntrack.c > >>>>>>> +++ b/net/openvswitch/conntrack.c > >>>>>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, > >>>>>>> return 0; > >>>>>>> } > >>>>>>> > >>>>>>> -static struct nf_conntrack_expect * > >>>>>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, > >>>>>>> - u16 proto, const struct sk_buff *skb) > >>>>>>> -{ > >>>>>>> - struct nf_conntrack_tuple tuple; > >>>>>>> - struct nf_conntrack_expect *exp; > >>>>>>> - > >>>>>>> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple)) > >>>>>>> - return NULL; > >>>>>>> - > >>>>>>> - exp = __nf_ct_expect_find(net, zone, &tuple); > >>>>>>> - if (exp) { > >>>>>>> - struct nf_conntrack_tuple_hash *h; > >>>>>>> - > >>>>>>> - /* Delete existing conntrack entry, if it clashes with the > >>>>>>> - * expectation. This can happen since conntrack ALGs do not > >>>>>>> - * check for clashes between (new) expectations and existing > >>>>>>> - * conntrack entries. nf_conntrack_in() will check the > >>>>>>> - * expectations only if a conntrack entry can not be found, > >>>>>>> - * which can lead to OVS finding the expectation (here) in the > >>>>>>> - * init direction, but which will not be removed by the > >>>>>>> - * nf_conntrack_in() call, if a matching conntrack entry is > >>>>>>> - * found instead. In this case all init direction packets > >>>>>>> - * would be reported as new related packets, while reply > >>>>>>> - * direction packets would be reported as un-related > >>>>>>> - * established packets. > >>>>>>> - */ > >>>>>>> - h = nf_conntrack_find_get(net, zone, &tuple); > >>>>>>> - if (h) { > >>>>>>> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); > >>>>>>> - > >>>>>>> - nf_ct_delete(ct, 0, 0); > >>>>>>> - nf_ct_put(ct); > >>>>>>> - } > >>>>>>> - } > >>>>>>> - > >>>>>>> - return exp; > >>>>>>> -} > >>>>>>> - > >>>>>>> /* This replicates logic from nf_conntrack_core.c that is not exported. */ > >>>>>>> static enum ip_conntrack_info > >>>>>>> ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) > >>>>>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, > >>>>>>> const struct ovs_conntrack_info *info, > >>>>>>> struct sk_buff *skb) > >>>>>>> { > >>>>>>> - struct nf_conntrack_expect *exp; > >>>>>>> - > >>>>>>> - /* If we pass an expected packet through nf_conntrack_in() the > >>>>>>> - * expectation is typically removed, but the packet could still be > >>>>>>> - * lost in upcall processing. To prevent this from happening we > >>>>>>> - * perform an explicit expectation lookup. Expected connections are > >>>>>>> - * always new, and will be passed through conntrack only when they are > >>>>>>> - * committed, as it is OK to remove the expectation at that time. > >>>>>>> - */ > >>>>>>> - exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); > >>>>>>> - if (exp) { > >>>>>>> - u8 state; > >>>>>>> - > >>>>>>> - /* NOTE: New connections are NATted and Helped only when > >>>>>>> - * committed, so we are not calling into NAT here. > >>>>>>> - */ > >>>>>>> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED; > >>>>>>> - __ovs_ct_update_key(key, state, &info->zone, exp->master); > >>>>>> > >>>>>> Hi, Xin, others. > >>>>>> > >>>>>> Unfortunately, it seems like removal of this code broke the expected behavior. > >>>>>> OVS in userspace expects that SYN packet of a new related FTP connection will > >>>>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not > >>>>>> new. This is a problem because we need to commit this connection with the label > >>>>>> and we do that for +new packets. If we can't get +new packet we'll have to commit > >>>>>> every single +rel+trk packet, which doesn't make a lot of sense. And it's a > >>>>>> significant behavior change regardless. > >>>>> > >>>>> Interestingly enough I see +new+rel+trk packets in cases without SNAT, > >>>>> but we can only get +rel+trk in cases with SNAT. So, this may be just > >>>>> a generic conntrack bug somewhere. At least the behavior seems fairly > >>>>> inconsistent. > >>>>> > >>>> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same > >>>> time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW > >>>> are set at the same time by ovs_ct_update_key() when this related ct > >>>> is not confirmed. > >>>> > >>>> The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow: > >>>> > >>>> # make check-kernel > >>>> 144: conntrack - FTP SNAT orig tuple skipped (system-traffic.at:7295) > >>>> > >>>> Any idea why? or do you know any other testcase that expects +new+rel+trk > >>>> but returns +rel+trk only? > >>> > >>> You need to install lftp and pyftpdlib. The pyftpdlib may only be available > >>> via pip on some systems. > >>> > >>>> > >>>> Thanks. > >>>>>> > >>>>>> Could you, please, take a look? > >>>>>> > >>>>>> The issue can be reproduced by running check-kernel tests in OVS repo. > >>>>>> 'FTP SNAT orig tuple' tests fail 100% of the time. > >>>>>> > >>>>>> Best regards, Ilya Maximets. > >>>>> > >>> > >> > >> Hmm. After further investigation, it seems that the issue is not about ct state, > >> but the ct label. Before this commit we had information about both the original > >> tuple of the parent connection and the mark/label of the parent connection: > >> > > Make senses. Now I can see the difference after this commit. > > We will need a fix in __ovs_ct_update_key() to copy mark & label from > > ct->master for exp ct. > > > > @@ -196,8 +196,8 @@ static void __ovs_ct_update_key(struct sw_flow_key > > *key, u8 state, > > { > > key->ct_state = state; > > key->ct_zone = zone->id; > > - key->ct.mark = ovs_ct_get_mark(ct); > > - ovs_ct_get_labels(ct, &key->ct.labels); > > + key->ct.mark = 0; > > + memset(&key->ct.labels, 0, OVS_CT_LABELS_LEN); > > > > if (ct) { > > const struct nf_conntrack_tuple *orig; > > @@ -205,6 +205,8 @@ static void __ovs_ct_update_key(struct sw_flow_key > > *key, u8 state, > > /* Use the master if we have one. */ > > if (ct->master) > > ct = ct->master; > > + key->ct.mark = ovs_ct_get_mark(ct); > > + ovs_ct_get_labels(ct, &key->ct.labels); > > orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; > > > > /* IP version must match with the master connection. */ > > > > We may need to run some regression tests for such a change. > > Thank, Xin! This seems like a change in the right direction and it fixes > this particular test. But, I guess, we should get mark/labels from the > master connection only if it is not yet confirmed. Users may commit different > labels for the related connection. This should be more in line with the > previous behavior. > > What do you think? > You're right. Also, I noticed the related ct->mark is set to master ct->mark in init_conntrack() as well as secmark when creating the related ct. Hi, Florian, Any reason why the labels are not set to master ct's in there? Thanks. > > > > > Thanks. > > > >> system@ovs-system: miss upcall: > >> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), > >> ct_zone(0x1),ct_mark(0),ct_label(0x4d2000000000000000000000001), > >> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=50648,tp_dst=21), > >> eth(src=de:d9:f3:c8:5a:3a,dst=80:88:88:88:88:88),eth_type(0x0800), > >> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), > >> tcp(src=57027,dst=38153),tcp_flags(syn) > >> > >> But after this change, we still have the original tuple of the parent connection, > >> but the label is no longer in the flow key: > >> > >> system@ovs-system: miss upcall: > >> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25), > >> ct_zone(0x1),ct_mark(0),ct_label(0), > >> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=34668,tp_dst=21), > >> eth(src=66:eb:74:c6:79:24,dst=80:88:88:88:88:88),eth_type(0x0800), > >> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no), > >> tcp(src=49529,dst=35459),tcp_flags(syn) > >> > >> ct_state(0x25) == +new+rel+trk > >> > >> Best regards, Ilya Maximets. > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 20:11 ` Xin Long @ 2024-06-19 20:19 ` Florian Westphal 2024-06-19 20:50 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Florian Westphal @ 2024-06-19 20:19 UTC (permalink / raw) To: Xin Long Cc: Ilya Maximets, Florian Westphal, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole Xin Long <lucien.xin@gmail.com> wrote: > > master connection only if it is not yet confirmed. Users may commit different > > labels for the related connection. This should be more in line with the > > previous behavior. > > > > What do you think? > > > You're right. > Also, I noticed the related ct->mark is set to master ct->mark in > init_conntrack() as well as secmark when creating the related ct. > > Hi, Florian, > > Any reason why the labels are not set to master ct's in there? The intent was to have lables be set only via ctnetlink (userspace) or ruleset. The original use case was for tagging connections based on observed behaviour/properties at a later time, not at start of flow. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 20:19 ` Florian Westphal @ 2024-06-19 20:50 ` Xin Long 2024-06-19 21:20 ` Florian Westphal 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-06-19 20:50 UTC (permalink / raw) To: Florian Westphal Cc: Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Wed, Jun 19, 2024 at 4:20 PM Florian Westphal <fw@strlen.de> wrote: > > Xin Long <lucien.xin@gmail.com> wrote: > > > master connection only if it is not yet confirmed. Users may commit different > > > labels for the related connection. This should be more in line with the > > > previous behavior. > > > > > > What do you think? > > > > > You're right. > > Also, I noticed the related ct->mark is set to master ct->mark in > > init_conntrack() as well as secmark when creating the related ct. > > > > Hi, Florian, > > > > Any reason why the labels are not set to master ct's in there? > > The intent was to have lables be set only via ctnetlink (userspace) > or ruleset. > > The original use case was for tagging connections based on > observed behaviour/properties at a later time, not at start of flow. Got it, I will fix this in ovs. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 20:50 ` Xin Long @ 2024-06-19 21:20 ` Florian Westphal 2024-06-19 22:10 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Florian Westphal @ 2024-06-19 21:20 UTC (permalink / raw) To: Xin Long Cc: Florian Westphal, Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole Xin Long <lucien.xin@gmail.com> wrote: > Got it, I will fix this in ovs. Thanks! I don't want to throw more work at you, but since you are already working on ovs+conntrack: ovs_ct_init() is bad, as this runs unconditionally. modprobe openvswitch -> conntrack becomes active in all existing and future namespaces. Conntrack is slow, we should avoid this behaviour. ovs_ct_limit_init() should be called only when the feature is configured (the problematic call is nf_conncount_init, as that turns on connection tracking, the kmalloc etc is fine). Likewise, nf_connlabels_get() should only be called when labels are added/configured, not at the start. I resolved this for tc in 70f06c115bcc but it seems i never got around to fix it for ovs. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 21:20 ` Florian Westphal @ 2024-06-19 22:10 ` Xin Long 2024-07-08 22:03 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-06-19 22:10 UTC (permalink / raw) To: Florian Westphal Cc: Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Wed, Jun 19, 2024 at 5:20 PM Florian Westphal <fw@strlen.de> wrote: > > Xin Long <lucien.xin@gmail.com> wrote: > > Got it, I will fix this in ovs. > > Thanks! > > I don't want to throw more work at you, but since you are > already working on ovs+conntrack: > > ovs_ct_init() is bad, as this runs unconditionally. > > modprobe openvswitch -> conntrack becomes active in all > existing and future namespaces. > > Conntrack is slow, we should avoid this behaviour. > > ovs_ct_limit_init() should be called only when the feature is > configured (the problematic call is nf_conncount_init, as that > turns on connection tracking, the kmalloc etc is fine). understand. > > Likewise, nf_connlabels_get() should only be called when > labels are added/configured, not at the start. > > I resolved this for tc in 70f06c115bcc but it seems i never > got around to fix it for ovs. I will take a look. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-06-19 22:10 ` Xin Long @ 2024-07-08 22:03 ` Xin Long 2024-07-08 22:38 ` Florian Westphal 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-07-08 22:03 UTC (permalink / raw) To: Florian Westphal Cc: Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Wed, Jun 19, 2024 at 6:10 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Wed, Jun 19, 2024 at 5:20 PM Florian Westphal <fw@strlen.de> wrote: > > > > Xin Long <lucien.xin@gmail.com> wrote: > > > Got it, I will fix this in ovs. > > > > Thanks! > > > > I don't want to throw more work at you, but since you are > > already working on ovs+conntrack: > > > > ovs_ct_init() is bad, as this runs unconditionally. > > > > modprobe openvswitch -> conntrack becomes active in all > > existing and future namespaces. > > > > Conntrack is slow, we should avoid this behaviour. > > > > ovs_ct_limit_init() should be called only when the feature is > > configured (the problematic call is nf_conncount_init, as that > > turns on connection tracking, the kmalloc etc is fine). > understand. > > > > > Likewise, nf_connlabels_get() should only be called when > > labels are added/configured, not at the start. > > > > I resolved this for tc in 70f06c115bcc but it seems i never > > got around to fix it for ovs. Hi, Florian, I noticed the warning in nf_ct_ext_add() while I'm making this change: WARN_ON(nf_ct_is_confirmed(ct)); It can be triggered by these ovs flows from ovs selftests: table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),table=1) table=1,ip,actions=ct(commit,nat(src=10.1.1.240),exec(set_field:0xac->ct_mark,set_field:0xac->ct_label),table=2) The 1st flow creates the ct and commits/confirms it, then the 2nd flow is setting ct_label on a confirmed ct. With this change, the connlabels ext is not yet allocated at the time, and then the warning is triggered when allocating it in nf_ct_ext_add(). tc act_ct action doesn't have this issue, as it returns an error if the connlabels is not found in tcf_ct_act_set_labels(), instead of allocating it. I can avoid this warning by not allocating ext for commit ct in ovs: @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, struct nf_conn_labels *cl; int err; - cl = ovs_ct_get_conn_labels(ct); + cl = nf_ct_labels_find(ct); if (!cl) return -ENOSPC; However, the test case would fail, although the failure can be worked around by setting ct_label in the 1st rule: table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1) So I'm worrying our change may break some existing OVS user cases. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-07-08 22:03 ` Xin Long @ 2024-07-08 22:38 ` Florian Westphal 2024-07-09 1:49 ` Xin Long 0 siblings, 1 reply; 29+ messages in thread From: Florian Westphal @ 2024-07-08 22:38 UTC (permalink / raw) To: Xin Long Cc: Florian Westphal, Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole Xin Long <lucien.xin@gmail.com> wrote: > I can avoid this warning by not allocating ext for commit ct in ovs: > > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct, > struct sw_flow_key *key, > struct nf_conn_labels *cl; > int err; > > - cl = ovs_ct_get_conn_labels(ct); > + cl = nf_ct_labels_find(ct); > if (!cl) > return -ENOSPC; > > However, the test case would fail, although the failure can be worked around > by setting ct_label in the 1st rule: > > table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1) > > So I'm worrying our change may break some existing OVS user cases. Then ovs_ct_limit_init() and nf_connlabels_get() need to be called once on the first conntrack operatation, regardless if labels are asked for or not. Not nice but still better than current state. ovs_ct_execute() perhaps? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-07-08 22:38 ` Florian Westphal @ 2024-07-09 1:49 ` Xin Long 2024-07-09 5:49 ` Florian Westphal 0 siblings, 1 reply; 29+ messages in thread From: Xin Long @ 2024-07-09 1:49 UTC (permalink / raw) To: Florian Westphal Cc: Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole On Mon, Jul 8, 2024 at 6:38 PM Florian Westphal <fw@strlen.de> wrote: > > Xin Long <lucien.xin@gmail.com> wrote: > > I can avoid this warning by not allocating ext for commit ct in ovs: > > > > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct, > > struct sw_flow_key *key, > > struct nf_conn_labels *cl; > > int err; > > > > - cl = ovs_ct_get_conn_labels(ct); > > + cl = nf_ct_labels_find(ct); > > if (!cl) > > return -ENOSPC; ovs_ct_get_conn_labels() must be replaced with nf_ct_labels_find() in here anyway, thinking that the confirmed ct without labels was created in other places (not by OVS conntrack), the warning may still be triggered when trying to set labels in OVS after. > > > > However, the test case would fail, although the failure can be worked around > > by setting ct_label in the 1st rule: > > > > table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1) > > > > So I'm worrying our change may break some existing OVS user cases. > > Then ovs_ct_limit_init() and nf_connlabels_get() need to be called > once on the first conntrack operatation, regardless if labels are asked > for or not. > > Not nice but still better than current state. Right, not nice, it undermines the bits check against NF_CT_LABELS_MAX_SIZE. > > ovs_ct_execute() perhaps? ovs_ct_execute() is in the hot path, and a better place would be ovs_ct_copy_action() where the ct for an ovs flow is added. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2024-07-09 1:49 ` Xin Long @ 2024-07-09 5:49 ` Florian Westphal 0 siblings, 0 replies; 29+ messages in thread From: Florian Westphal @ 2024-07-09 5:49 UTC (permalink / raw) To: Xin Long Cc: Florian Westphal, Ilya Maximets, network dev, dev, Marcelo Ricardo Leitner, Jiri Pirko, Davide Caratti, Jamal Hadi Salim, Eric Dumazet, Cong Wang, kuba, Paolo Abeni, davem, Pablo Neira Ayuso, Aaron Conole Xin Long <lucien.xin@gmail.com> wrote: > > Xin Long <lucien.xin@gmail.com> wrote: > > > I can avoid this warning by not allocating ext for commit ct in ovs: > > > > > > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct, > > > struct sw_flow_key *key, > > > struct nf_conn_labels *cl; > > > int err; > > > > > > - cl = ovs_ct_get_conn_labels(ct); > > > + cl = nf_ct_labels_find(ct); > > > if (!cl) > > > return -ENOSPC; > ovs_ct_get_conn_labels() must be replaced with nf_ct_labels_find() in here > anyway, thinking that the confirmed ct without labels was created in other > places (not by OVS conntrack), the warning may still be triggered when > trying to set labels in OVS after. Right. > > Then ovs_ct_limit_init() and nf_connlabels_get() need to be called > > once on the first conntrack operatation, regardless if labels are asked > > for or not. > > > > Not nice but still better than current state. > Right, not nice, it undermines the bits check against NF_CT_LABELS_MAX_SIZE. Yes, but OTOH this bit check isn't used anymore, it comes from days when label area was dynamically sized, today is hardcoded to 128 anyway. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long ` (2 preceding siblings ...) 2023-07-16 21:09 ` [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack Xin Long @ 2023-07-19 2:58 ` Jakub Kicinski 2023-07-19 3:01 ` Florian Westphal 2023-07-19 13:31 ` Aaron Conole 2023-07-20 8:20 ` patchwork-bot+netdevbpf 5 siblings, 1 reply; 29+ messages in thread From: Jakub Kicinski @ 2023-07-19 2:58 UTC (permalink / raw) To: Pablo Neira Ayuso, Florian Westphal Cc: Xin Long, network dev, dev, davem, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote: > With the OVS upcall, the original ct in the skb will be dropped, and when > the skb comes back from userspace it has to create a new ct again through > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). > > However, the new ct will not be able to have the exp as the original ct > has taken it away from the hash table in nf_ct_find_expectation(). This > will cause some flow never to be matched, like: > > 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get > matched. > > OVS conntrack works around this by adding its own exp lookup function to > not remove the exp from the hash table and saving the exp and its master > info to the flow keys instead of create a real ct. But this way doesn't > work for TC act_ct. > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This > allows both OVS conntrack and TC act_ct to have a simple and clear fix > for this problem in the patch 2/3 and 3/3. Florian, Pablo, any opinion on these? Would you prefer to take them via netfilter? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly 2023-07-19 2:58 ` [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Jakub Kicinski @ 2023-07-19 3:01 ` Florian Westphal 2023-07-19 16:12 ` Florian Westphal 0 siblings, 1 reply; 29+ messages in thread From: Florian Westphal @ 2023-07-19 3:01 UTC (permalink / raw) To: Jakub Kicinski Cc: Pablo Neira Ayuso, Florian Westphal, Xin Long, network dev, dev, davem, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole Jakub Kicinski <kuba@kernel.org> wrote: > On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote: > > With the OVS upcall, the original ct in the skb will be dropped, and when > > the skb comes back from userspace it has to create a new ct again through > > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). > > > > However, the new ct will not be able to have the exp as the original ct > > has taken it away from the hash table in nf_ct_find_expectation(). This > > will cause some flow never to be matched, like: > > > > 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' > > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' > > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' > > > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get > > matched. > > > > OVS conntrack works around this by adding its own exp lookup function to > > not remove the exp from the hash table and saving the exp and its master > > info to the flow keys instead of create a real ct. But this way doesn't > > work for TC act_ct. > > > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from > > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This > > allows both OVS conntrack and TC act_ct to have a simple and clear fix > > for this problem in the patch 2/3 and 3/3. > > Florian, Pablo, any opinion on these? Sorry for the silence. I dislike moving tc/ovs artifacts into the conntrack core. But so far I could not come up with a counter-proposal, and it doesn't look too outrageous. Let me look at it again later today (its early morning here) and I will get back to this in my late evening. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly 2023-07-19 3:01 ` Florian Westphal @ 2023-07-19 16:12 ` Florian Westphal 0 siblings, 0 replies; 29+ messages in thread From: Florian Westphal @ 2023-07-19 16:12 UTC (permalink / raw) To: Florian Westphal Cc: Jakub Kicinski, Pablo Neira Ayuso, Xin Long, network dev, dev, davem, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner, Davide Caratti, Aaron Conole Florian Westphal <fw@strlen.de> wrote: > Jakub Kicinski <kuba@kernel.org> wrote: > > On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote: > > > With the OVS upcall, the original ct in the skb will be dropped, and when > > > the skb comes back from userspace it has to create a new ct again through > > > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). > > > > > > However, the new ct will not be able to have the exp as the original ct > > > has taken it away from the hash table in nf_ct_find_expectation(). This > > > will cause some flow never to be matched, like: > > > > > > 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' > > > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' > > > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' > > > > > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get > > > matched. > > > > > > OVS conntrack works around this by adding its own exp lookup function to > > > not remove the exp from the hash table and saving the exp and its master > > > info to the flow keys instead of create a real ct. But this way doesn't > > > work for TC act_ct. > > > > > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from > > > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This > > > allows both OVS conntrack and TC act_ct to have a simple and clear fix > > > for this problem in the patch 2/3 and 3/3. > > > > Florian, Pablo, any opinion on these? > > Sorry for the silence. I dislike moving tc/ovs artifacts into > the conntrack core. Can't find a better solution, feel free to take this though the net-next tree. Acked-by: Florian Westphal <fw@strlen.de> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long ` (3 preceding siblings ...) 2023-07-19 2:58 ` [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Jakub Kicinski @ 2023-07-19 13:31 ` Aaron Conole 2023-07-20 8:20 ` patchwork-bot+netdevbpf 5 siblings, 0 replies; 29+ messages in thread From: Aaron Conole @ 2023-07-19 13:31 UTC (permalink / raw) To: Xin Long Cc: network dev, dev, davem, kuba, Eric Dumazet, Paolo Abeni, Pravin B Shelar, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso, Florian Westphal, Marcelo Ricardo Leitner, Davide Caratti Xin Long <lucien.xin@gmail.com> writes: > With the OVS upcall, the original ct in the skb will be dropped, and when > the skb comes back from userspace it has to create a new ct again through > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). > > However, the new ct will not be able to have the exp as the original ct > has taken it away from the hash table in nf_ct_find_expectation(). This > will cause some flow never to be matched, like: > > 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' > 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' > > if the 2nd flow triggers the OVS upcall, the 3rd flow will never get > matched. > > OVS conntrack works around this by adding its own exp lookup function to > not remove the exp from the hash table and saving the exp and its master > info to the flow keys instead of create a real ct. But this way doesn't > work for TC act_ct. > > The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from > the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This > allows both OVS conntrack and TC act_ct to have a simple and clear fix > for this problem in the patch 2/3 and 3/3. > > Xin Long (3): > netfilter: allow exp not to be removed in nf_ct_find_expectation > net: sched: set IPS_CONFIRMED in tmpl status only when commit is set > in act_ct > openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set > in conntrack > > include/net/netfilter/nf_conntrack_expect.h | 2 +- > net/netfilter/nf_conntrack_core.c | 2 +- > net/netfilter/nf_conntrack_expect.c | 4 +- > net/netfilter/nft_ct.c | 2 + > net/openvswitch/conntrack.c | 78 +++------------------ > net/sched/act_ct.c | 3 +- > 6 files changed, 18 insertions(+), 73 deletions(-) Hi Xin, The changes look okay to me, and I don't see anything that immediately jumps out. I would like to test it today / tomorrow with the ovs check-kernel testsuite if you haven't done so already. -Aaron ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long ` (4 preceding siblings ...) 2023-07-19 13:31 ` Aaron Conole @ 2023-07-20 8:20 ` patchwork-bot+netdevbpf 5 siblings, 0 replies; 29+ messages in thread From: patchwork-bot+netdevbpf @ 2023-07-20 8:20 UTC (permalink / raw) To: Xin Long Cc: netdev, dev, davem, kuba, edumazet, pabeni, pshelar, jhs, xiyou.wangcong, jiri, pablo, fw, marcelo.leitner, dcaratti, aconole Hello: This series was applied to netdev/net-next.git (main) by Paolo Abeni <pabeni@redhat.com>: On Sun, 16 Jul 2023 17:09:16 -0400 you wrote: > With the OVS upcall, the original ct in the skb will be dropped, and when > the skb comes back from userspace it has to create a new ct again through > nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act(). > > However, the new ct will not be able to have the exp as the original ct > has taken it away from the hash table in nf_ct_find_expectation(). This > will cause some flow never to be matched, like: > > [...] Here is the summary with links: - [net-next,1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation https://git.kernel.org/netdev/net-next/c/4914109a8e1e - [net-next,2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct https://git.kernel.org/netdev/net-next/c/76622ced50a1 - [net-next,3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack https://git.kernel.org/netdev/net-next/c/8c8b73320805 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-07-09 5:49 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-16 21:09 [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Xin Long 2023-07-16 21:09 ` [PATCH net-next 1/3] netfilter: allow exp not to be removed in nf_ct_find_expectation Xin Long 2023-07-19 16:07 ` Aaron Conole 2023-07-16 21:09 ` [PATCH net-next 2/3] net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct Xin Long 2023-07-19 16:07 ` Aaron Conole 2023-07-19 16:44 ` Davide Caratti 2023-07-16 21:09 ` [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack Xin Long 2023-07-19 16:08 ` Aaron Conole 2024-06-17 20:10 ` Ilya Maximets 2024-06-18 11:34 ` Ilya Maximets 2024-06-18 14:58 ` Xin Long 2024-06-18 15:50 ` Ilya Maximets 2024-06-19 12:58 ` Ilya Maximets 2024-06-19 14:07 ` Xin Long 2024-06-19 17:30 ` Ilya Maximets 2024-06-19 20:11 ` Xin Long 2024-06-19 20:19 ` Florian Westphal 2024-06-19 20:50 ` Xin Long 2024-06-19 21:20 ` Florian Westphal 2024-06-19 22:10 ` Xin Long 2024-07-08 22:03 ` Xin Long 2024-07-08 22:38 ` Florian Westphal 2024-07-09 1:49 ` Xin Long 2024-07-09 5:49 ` Florian Westphal 2023-07-19 2:58 ` [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly Jakub Kicinski 2023-07-19 3:01 ` Florian Westphal 2023-07-19 16:12 ` Florian Westphal 2023-07-19 13:31 ` Aaron Conole 2023-07-20 8:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).