All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Sven Auhagen <sven.auhagen@voleatech.de>
Cc: netfilter-devel@vger.kernel.org, cratiu@nvidia.com,
	ozsh@nvidia.com, vladbu@nvidia.com, gal@nvidia.com, fw@strlen.de
Subject: Re: [PATCH nf] netfilter: flowtable: infer TCP state and timeout before flow teardown
Date: Thu, 11 Apr 2024 11:27:44 +0200	[thread overview]
Message-ID: <ZhetEIvz_vCB-A5D@calendula> (raw)
In-Reply-To: <uhn7bt3jdrvmczhlw3dsrinb2opr2qksnbip7asekilgczm35v@hyvzkxrgdhgn>

[-- Attachment #1: Type: text/plain, Size: 4800 bytes --]

On Tue, Apr 09, 2024 at 01:35:15PM +0200, Sven Auhagen wrote:
> On Tue, Apr 09, 2024 at 01:11:43PM +0200, Pablo Neira Ayuso wrote:
> > Hi Sven,
> > 
> > On Mon, Apr 08, 2024 at 07:24:43AM +0200, Sven Auhagen wrote:
> > > Hi Pablo,
> > > 
> > > after some testing the problem only happens very rarely now.
> > > I suspect it happens only on connections that are at some point
> > > one way only or in some other way not in a correct state anymore.
> > > Never the less your latest patches are very good and reduce the problem
> > > to an absolute minimum that FIN WAIT is offlodaded and the timeout
> > > is correct now.
> > 
> > Thanks for testing, I am going to submit this patch.
> > 
> > If you have a bit more cycles, I still would like to know what corner
> > case scenario is still triggering this so...
> > 
> > > Here is one example if a flow that still is in FIN WAIT:
> > > 
> > > [NEW] tcp      6 120 SYN_SENT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 [UNREPLIED] src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 mark=16777216
> > > [UPDATE] tcp      6 60 SYN_RECV src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 mark=16777216
> > > [UPDATE] tcp      6 86400 ESTABLISHED src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [OFFLOAD] mark=16777216
> > > [UPDATE] tcp      6 120 FIN_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [OFFLOAD] mark=16777216
> > > [UPDATE] tcp      6 30 LAST_ACK src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [ASSURED] mark=16777216
> > >  [UPDATE] tcp      6 120 TIME_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 [ASSURED] mark=16777216
> > >  [DESTROY] tcp      6 TIME_WAIT src=fd00::192:168:5:32 dst=2a05:d014:687:ed01::21 sport=58790 dport=443 packets=15 bytes=1750 src=2a05:d014:687:ed01::21 dst=2003:a:c7f:e5e8:a3ae:63f7:7f92:e286 sport=443 dport=60848 packets=13 bytes=6905 [ASSURED] mark=16777216 delta-time=120
> > 
> > ... could you run conntrack -E -o timestamp? I'd like to know if this is
> > a flow that is handed over back to classic path after 30 seconds, then
> > being placed in the flowtable again.
> 
> Sure here is a fresh output:
> 
> [1712662404.573225]	    [NEW] tcp      6 120 SYN_SENT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 [UNREPLIED] src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 mark=25165825
> [1712662404.588094]	 [UPDATE] tcp      6 60 SYN_RECV src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 mark=25165825
> [1712662404.591802]	 [UPDATE] tcp      6 86400 ESTABLISHED src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [OFFLOAD] mark=25165825
> [1712662405.682563]	 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [OFFLOAD] mark=25165825

It is happening right away, a bit more of 1 second after.

I can also see IP_CT_TCP_FLAG_CLOSE_INIT is not set on when ct->state
is adjusted to _FIN_WAIT state in the fixup routine.

There are also packets that might trigger transitions to sIG
(ignored) in TCP tracking. Probably conntrack -E is misleading because
this does not allow us to see what actual packets are coming after the
fin.

Can you give a try to this untested patch? It applies on top of the
two patches you already have for TCP and UDP.

> [1712662405.689501]	 [UPDATE] tcp      6 30 LAST_ACK src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [ASSURED] mark=25165825
> [1712662405.704370]	 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.7.101 dst=157.240.251.61 sport=52717 dport=5222 src=157.240.251.61 dst=87.138.198.79 sport=5222 dport=26886 [ASSURED] mark=25165825

Can you also enable -o id,timestamp? The id should tell us if we are
always talking on the same object.

> [1712662451.967906]	[DESTROY] tcp      6 ESTABLISHED src=192.168.6.122 dst=52.98.243.2 sport=52717 dport=443 packets=14 bytes=4134 src=52.98.243.2 dst=37.24.174.42 sport=443 dport=20116 packets=17 bytes=13712 [ASSURED] mark=16777216 delta-time=140

[-- Attachment #2: adjust-ct-tcp-flags.patch --]
[-- Type: text/x-diff, Size: 3283 bytes --]

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 8507deed6b1f..2388c4fe99a0 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -293,7 +293,7 @@ int nf_flow_table_init(struct nf_flowtable *flow_table);
 void nf_flow_table_free(struct nf_flowtable *flow_table);
 
 void flow_offload_teardown(struct flow_offload *flow);
-void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin);
+void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin, int dir);
 
 void nf_flow_snat_port(const struct flow_offload *flow,
 		       struct sk_buff *skb, unsigned int thoff,
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 16068ef04490..e3a7d8eff727 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -350,18 +350,21 @@ void flow_offload_teardown(struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);
 
-void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin)
+void flow_offload_teardown_tcp(struct flow_offload *flow, bool fin, int dir)
 {
+	struct nf_conn *ct = flow->ct;
 	enum tcp_conntrack tcp_state;
 
-	if (fin)
+	if (fin) {
 		tcp_state = TCP_CONNTRACK_FIN_WAIT;
-	else /* rst */
+		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
+	} else { /* rst */
 		tcp_state = TCP_CONNTRACK_CLOSE;
+	}
 
-	flow_offload_fixup_tcp(nf_ct_net(flow->ct), flow->ct, tcp_state);
+	flow_offload_fixup_tcp(nf_ct_net(ct), ct, tcp_state);
 	smp_mb__before_atomic();
-	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
+	clear_bit(IPS_OFFLOAD_BIT, &ct->status);
 	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown_tcp);
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9840ab5e3ae6..04fabfa9ff7e 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -20,7 +20,7 @@
 #include <linux/udp.h>
 
 static int nf_flow_state_check(struct flow_offload *flow, int proto,
-			       struct sk_buff *skb, unsigned int thoff)
+			       struct sk_buff *skb, unsigned int thoff, int dir)
 {
 	struct tcphdr *tcph;
 
@@ -29,7 +29,7 @@ static int nf_flow_state_check(struct flow_offload *flow, int proto,
 
 	tcph = (void *)(skb_network_header(skb) + thoff);
 	if (unlikely(tcph->fin || tcph->rst)) {
-		flow_offload_teardown_tcp(flow, tcph->fin);
+		flow_offload_teardown_tcp(flow, tcph->fin, dir);
 		return -1;
 	}
 
@@ -379,7 +379,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 
 	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
 	thoff = (iph->ihl * 4) + ctx->offset;
-	if (nf_flow_state_check(flow, iph->protocol, skb, thoff))
+	if (nf_flow_state_check(flow, iph->protocol, skb, thoff, dir))
 		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {
@@ -658,7 +658,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 
 	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
 	thoff = sizeof(*ip6h) + ctx->offset;
-	if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff))
+	if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff, dir))
 		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {

  reply	other threads:[~2024-04-11  9:27 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-18  9:39 [PATCH nf] netfilter: flowtable: infer TCP state and timeout before flow teardown Pablo Neira Ayuso
2024-03-18 10:05 ` Sven Auhagen
2024-03-20  8:39 ` Sven Auhagen
2024-03-20  8:45   ` Pablo Neira Ayuso
2024-03-20  8:49     ` Sven Auhagen
2024-03-20  9:07       ` Pablo Neira Ayuso
2024-03-20  9:20         ` Sven Auhagen
2024-03-20  9:27           ` Pablo Neira Ayuso
2024-03-20  9:31             ` Sven Auhagen
2024-03-20  9:51               ` Pablo Neira Ayuso
2024-03-20 10:13                 ` Sven Auhagen
2024-03-20 10:36                   ` Pablo Neira Ayuso
2024-03-20 10:38                     ` Sven Auhagen
2024-03-20 10:29                 ` Sven Auhagen
2024-03-20 10:47                   ` Pablo Neira Ayuso
2024-03-20 11:15                     ` Sven Auhagen
2024-03-20 12:37                       ` Pablo Neira Ayuso
2024-03-20 13:37                         ` Sven Auhagen
2024-04-08  5:24                         ` Sven Auhagen
2024-04-09 11:11                           ` Pablo Neira Ayuso
2024-04-09 11:35                             ` Sven Auhagen
2024-04-11  9:27                               ` Pablo Neira Ayuso [this message]
2024-04-11 11:05                                 ` Florian Westphal
2024-04-11 11:40                                   ` Pablo Neira Ayuso
2024-04-11 12:13                                     ` Florian Westphal
2024-04-11 15:50                                       ` Pablo Neira Ayuso
2024-04-19  7:47                                         ` Sven Auhagen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhetEIvz_vCB-A5D@calendula \
    --to=pablo@netfilter.org \
    --cc=cratiu@nvidia.com \
    --cc=fw@strlen.de \
    --cc=gal@nvidia.com \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=ozsh@nvidia.com \
    --cc=sven.auhagen@voleatech.de \
    --cc=vladbu@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.