netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets
@ 2022-08-26 13:32 Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 1/4] netfilter: conntrack: prepare tcp_in_window for ternary return value Florian Westphal
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Florian Westphal @ 2022-08-26 13:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

v2:
 - rebase on top of nf-next
 - add patch 4, this was not part of v1.

Consider following ruleset:
... ct state new accept
... ct state invalid drop

Normally a tcp receiver will reply with an ack once it receives
a delayed packet. Example:

+0.0001 < P. 1:1461(1460) ack 1 win 257
+.0 > . 1:1(0) ack 1461 win 65535
+0.0001 < P. 1461:2921(1460) ack 1 win 257
[..]
+0.0001 < P. 65701:67161(1460) ack 1 win 257
+.0 > . 1:1(0) ack 67161 win 65535 // all data received

// delayed packet, already acked
+0.0001 < P. 1:1461(1460) ack 1 win 257

// nf_ct_proto_6: SEQ is under the lower bound (already ACKed data retransmitted) IN=.. SEQ=1 ACK=4162510439 WINDOW=257 ACK PSHR
+.0 > . 1:1(0) ack 67161 win 65535

If the delayed packet is not dropped, the receiver can
immediately send another ack, but this doesn't happen if
INVALID packets are dropped by the ruleset (which is a common thing to do).

This changes conntrack to treat such packets as valid, with the caveat
that they will not extend the tcp timeout or cause state changes.

Ideally we could augment state matching so that this decision
is pushed to the ruleset but so far I don't see how this could be done
with the limited space we have in sk_buff (except for yet another skb
extension, but that appears to be too much for such a narrow use case).

Florian Westphal (4):
  netfilter: conntrack: prepare tcp_in_window for ternary return value
  netfilter: conntrack: ignore overly delayed tcp packets
  netfilter: conntrack: remove unneeded indent level
  netfilter: conntrack: reduce timeout when receiving out-of-window fin
    or rst

 net/netfilter/nf_conntrack_proto_tcp.c | 318 ++++++++++++++++---------
 1 file changed, 199 insertions(+), 119 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 nf-next 1/4] netfilter: conntrack: prepare tcp_in_window for ternary return value
  2022-08-26 13:32 [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
@ 2022-08-26 13:32 ` Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 2/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2022-08-26 13:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

tcp_in_window returns true if the packet is in window and false if it is
not.

If its outside of window, packet will be treated as INVALID.

There are corner cases where the packet should still be tracked, because
rulesets may drop or log such packets, even though they can occur during
normal operation, such as overly delayed acks.

In extreme cases, connection may hang forever because conntrack state
differs from real state.

There is no retransmission for ACKs.

In case of ACK loss after conntrack processing, its possible that a
connection can be stuck because the actual retransmits are considered
stale ("SEQ is under the lower bound (already ACKed data
retransmitted)".

The problem is made worse by carrier-grade-nat which can also result
in stale packets from old connections to get treated as 'recent' packets
in conntrack (it doesn't support tcp timestamps at this time).

Prepare tcp_in_window() to return an enum that tells the desired
action (in-window/accept, bogus/drop).

A third action (accept the packet as in-window, but do not change
state) is added in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: rebase.

 net/netfilter/nf_conntrack_proto_tcp.c | 134 ++++++++++++++++---------
 1 file changed, 86 insertions(+), 48 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a634c72b1ffc..a2cda7f76f4e 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -47,6 +47,11 @@ static const char *const tcp_conntrack_names[] = {
 	"SYN_SENT2",
 };
 
+enum nf_ct_tcp_action {
+	NFCT_TCP_INVALID,
+	NFCT_TCP_ACCEPT,
+};
+
 #define SECS * HZ
 #define MINS * 60 SECS
 #define HOURS * 60 MINS
@@ -472,13 +477,37 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender,
 	}
 }
 
-static bool tcp_in_window(struct nf_conn *ct,
-			  enum ip_conntrack_dir dir,
-			  unsigned int index,
-			  const struct sk_buff *skb,
-			  unsigned int dataoff,
-			  const struct tcphdr *tcph,
-			  const struct nf_hook_state *hook_state)
+__printf(6, 7)
+static enum nf_ct_tcp_action nf_tcp_log_invalid(const struct sk_buff *skb,
+						const struct nf_conn *ct,
+						const struct nf_hook_state *state,
+						const struct ip_ct_tcp_state *sender,
+						enum nf_ct_tcp_action ret,
+						const char *fmt, ...)
+{
+	const struct nf_tcp_net *tn = nf_tcp_pernet(nf_ct_net(ct));
+	struct va_format vaf;
+	va_list args;
+	bool be_liberal;
+
+	be_liberal = sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL || tn->tcp_be_liberal;
+	if (be_liberal)
+		return NFCT_TCP_ACCEPT;
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	nf_ct_l4proto_log_invalid(skb, ct, state, "%pV", &vaf);
+	va_end(args);
+
+	return ret;
+}
+
+static enum nf_ct_tcp_action
+tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
+	      unsigned int index, const struct sk_buff *skb,
+	      unsigned int dataoff, const struct tcphdr *tcph,
+	      const struct nf_hook_state *hook_state)
 {
 	struct ip_ct_tcp *state = &ct->proto.tcp;
 	struct net *net = nf_ct_net(ct);
@@ -486,9 +515,9 @@ static bool tcp_in_window(struct nf_conn *ct,
 	struct ip_ct_tcp_state *sender = &state->seen[dir];
 	struct ip_ct_tcp_state *receiver = &state->seen[!dir];
 	__u32 seq, ack, sack, end, win, swin;
-	u16 win_raw;
+	bool in_recv_win, seq_ok;
 	s32 receiver_offset;
-	bool res, in_recv_win;
+	u16 win_raw;
 
 	/*
 	 * Get the required data from the packet.
@@ -517,7 +546,7 @@ static bool tcp_in_window(struct nf_conn *ct,
 					end, win);
 			if (!tcph->ack)
 				/* Simultaneous open */
-				return true;
+				return NFCT_TCP_ACCEPT;
 		} else {
 			/*
 			 * We are in the middle of a connection,
@@ -584,13 +613,52 @@ static bool tcp_in_window(struct nf_conn *ct,
 		 */
 		seq = end = sender->td_end;
 
+	seq_ok = before(seq, sender->td_maxend + 1);
+	if (!seq_ok) {
+		u32 overshot = end - sender->td_maxend + 1;
+		bool ack_ok;
+
+		ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1);
+		in_recv_win = receiver->td_maxwin &&
+			      after(end, sender->td_end - receiver->td_maxwin - 1);
+
+		if (in_recv_win &&
+		    ack_ok &&
+		    overshot <= receiver->td_maxwin &&
+		    before(sack, receiver->td_end + 1)) {
+			/* Work around TCPs that send more bytes than allowed by
+			 * the receive window.
+			 *
+			 * If the (marked as invalid) packet is allowed to pass by
+			 * the ruleset and the peer acks this data, then its possible
+			 * all future packets will trigger 'ACK is over upper bound' check.
+			 *
+			 * Thus if only the sequence check fails then do update td_end so
+			 * possible ACK for this data can update internal state.
+			 */
+			sender->td_end = end;
+			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+
+			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
+						  "%u bytes more than expected", overshot);
+			return NFCT_TCP_ACCEPT;
+		}
+
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID,
+					  "SEQ is over upper bound %u (over the window of the receiver)",
+					  sender->td_maxend + 1);
+	}
+
+	if (!before(sack, receiver->td_end + 1))
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID,
+					  "ACK is over upper bound %u (ACKed data not seen yet)",
+					  receiver->td_end + 1);
+
 	/* Is the ending sequence in the receive window (if available)? */
 	in_recv_win = !receiver->td_maxwin ||
 		      after(end, sender->td_end - receiver->td_maxwin - 1);
 
-	if (before(seq, sender->td_maxend + 1) &&
-	    in_recv_win &&
-	    before(sack, receiver->td_end + 1) &&
+	if (in_recv_win &&
 	    after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) {
 		/*
 		 * Take into account window scaling (RFC 1323).
@@ -648,44 +716,12 @@ static bool tcp_in_window(struct nf_conn *ct,
 				state->retrans = 0;
 			}
 		}
-		res = true;
 	} else {
-		res = false;
 		if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL ||
 		    tn->tcp_be_liberal)
-			res = true;
-		if (!res) {
-			bool seq_ok = before(seq, sender->td_maxend + 1);
-
-			if (!seq_ok) {
-				u32 overshot = end - sender->td_maxend + 1;
-				bool ack_ok;
-
-				ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1);
-
-				if (in_recv_win &&
-				    ack_ok &&
-				    overshot <= receiver->td_maxwin &&
-				    before(sack, receiver->td_end + 1)) {
-					/* Work around TCPs that send more bytes than allowed by
-					 * the receive window.
-					 *
-					 * If the (marked as invalid) packet is allowed to pass by
-					 * the ruleset and the peer acks this data, then its possible
-					 * all future packets will trigger 'ACK is over upper bound' check.
-					 *
-					 * Thus if only the sequence check fails then do update td_end so
-					 * possible ACK for this data can update internal state.
-					 */
-					sender->td_end = end;
-					sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
-
-					nf_ct_l4proto_log_invalid(skb, ct, hook_state,
-								  "%u bytes more than expected", overshot);
-					return res;
-				}
-			}
+			return NFCT_TCP_ACCEPT;
 
+		{
 			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
 			"%s",
 			before(seq, sender->td_maxend + 1) ?
@@ -697,9 +733,11 @@ static bool tcp_in_window(struct nf_conn *ct,
 			: "SEQ is under the lower bound (already ACKed data retransmitted)"
 			: "SEQ is over the upper bound (over the window of the receiver)");
 		}
+
+		return NFCT_TCP_INVALID;
 	}
 
-	return res;
+	return NFCT_TCP_ACCEPT;
 }
 
 /* table of valid flag combinations - PUSH, ECE and CWR are always valid */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 nf-next 2/4] netfilter: conntrack: ignore overly delayed tcp packets
  2022-08-26 13:32 [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 1/4] netfilter: conntrack: prepare tcp_in_window for ternary return value Florian Westphal
@ 2022-08-26 13:32 ` Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 3/4] netfilter: conntrack: remove unneeded indent level Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 4/4] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst Florian Westphal
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2022-08-26 13:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

If 'nf_conntrack_tcp_loose' is off (the default), tcp packets that are
outside of the current window are marked as INVALID.

nf/iptables rulesets often drop such packets via 'ct state invalid' or
similar checks.

For overly delayed acks, this can be a nuisance if such 'invalid' packets
are also logged.

Since they are not invalid in a strict sense, just ignore them, i.e.
conntrack won't extend timeout or change state so that they do not match
invalid state rules anymore.

This also avoids unwantend connection stalls in case conntrack considers
retransmission (of data that did not reach the peer) as too old.

The else branch of the conditional becomes obsolete.
Next patch will reformant the now always-true if condition.

The existing workaround for data that exceeds the calculated receive
window is adjusted to use the 'ignore' state so that these packets do
not refresh the timeout or change state other than updating ->td_end.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: rebase

 net/netfilter/nf_conntrack_proto_tcp.c | 49 +++++++++++---------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a2cda7f76f4e..5404bf7203f3 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -48,6 +48,7 @@ static const char *const tcp_conntrack_names[] = {
 };
 
 enum nf_ct_tcp_action {
+	NFCT_TCP_IGNORE,
 	NFCT_TCP_INVALID,
 	NFCT_TCP_ACCEPT,
 };
@@ -510,8 +511,6 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 	      const struct nf_hook_state *hook_state)
 {
 	struct ip_ct_tcp *state = &ct->proto.tcp;
-	struct net *net = nf_ct_net(ct);
-	struct nf_tcp_net *tn = nf_tcp_pernet(net);
 	struct ip_ct_tcp_state *sender = &state->seen[dir];
 	struct ip_ct_tcp_state *receiver = &state->seen[!dir];
 	__u32 seq, ack, sack, end, win, swin;
@@ -639,9 +638,8 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 			sender->td_end = end;
 			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
 
-			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
+			return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE,
 						  "%u bytes more than expected", overshot);
-			return NFCT_TCP_ACCEPT;
 		}
 
 		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID,
@@ -657,9 +655,15 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 	/* Is the ending sequence in the receive window (if available)? */
 	in_recv_win = !receiver->td_maxwin ||
 		      after(end, sender->td_end - receiver->td_maxwin - 1);
-
-	if (in_recv_win &&
-	    after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) {
+	if (!in_recv_win)
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE,
+					  "SEQ is under lower bound %u (already ACKed data retransmitted)",
+					  sender->td_end - receiver->td_maxwin - 1);
+	if (!after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1))
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE,
+					  "ignored ACK under lower bound %u (possible overly delayed)",
+					  receiver->td_end - MAXACKWINDOW(sender) - 1);
+	if (1) {
 		/*
 		 * Take into account window scaling (RFC 1323).
 		 */
@@ -716,25 +720,6 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 				state->retrans = 0;
 			}
 		}
-	} else {
-		if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL ||
-		    tn->tcp_be_liberal)
-			return NFCT_TCP_ACCEPT;
-
-		{
-			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
-			"%s",
-			before(seq, sender->td_maxend + 1) ?
-			in_recv_win ?
-			before(sack, receiver->td_end + 1) ?
-			after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1) ? "BUG"
-			: "ACK is under the lower bound (possible overly delayed ACK)"
-			: "ACK is over the upper bound (ACKed data not seen yet)"
-			: "SEQ is under the lower bound (already ACKed data retransmitted)"
-			: "SEQ is over the upper bound (over the window of the receiver)");
-		}
-
-		return NFCT_TCP_INVALID;
 	}
 
 	return NFCT_TCP_ACCEPT;
@@ -899,6 +884,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	struct nf_conntrack_tuple *tuple;
 	enum tcp_conntrack new_state, old_state;
 	unsigned int index, *timeouts;
+	enum nf_ct_tcp_action res;
 	enum ip_conntrack_dir dir;
 	const struct tcphdr *th;
 	struct tcphdr _tcph;
@@ -1164,10 +1150,17 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 		break;
 	}
 
-	if (!tcp_in_window(ct, dir, index,
-			   skb, dataoff, th, state)) {
+	res = tcp_in_window(ct, dir, index,
+			    skb, dataoff, th, state);
+	switch (res) {
+	case NFCT_TCP_IGNORE:
+		spin_unlock_bh(&ct->lock);
+		return NF_ACCEPT;
+	case NFCT_TCP_INVALID:
 		spin_unlock_bh(&ct->lock);
 		return -NF_ACCEPT;
+	case NFCT_TCP_ACCEPT:
+		break;
 	}
      in_window:
 	/* From now on we have got in-window packets */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 nf-next 3/4] netfilter: conntrack: remove unneeded indent level
  2022-08-26 13:32 [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 1/4] netfilter: conntrack: prepare tcp_in_window for ternary return value Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 2/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
@ 2022-08-26 13:32 ` Florian Westphal
  2022-08-26 13:32 ` [PATCH v2 nf-next 4/4] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst Florian Westphal
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2022-08-26 13:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

After previous patch, the conditional branch is obsolete, reformat it.
gcc generates same code as before this change.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: rebase

 net/netfilter/nf_conntrack_proto_tcp.c | 99 ++++++++++++--------------
 1 file changed, 45 insertions(+), 54 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 5404bf7203f3..a2630647a4bb 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -663,62 +663,53 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE,
 					  "ignored ACK under lower bound %u (possible overly delayed)",
 					  receiver->td_end - MAXACKWINDOW(sender) - 1);
-	if (1) {
-		/*
-		 * Take into account window scaling (RFC 1323).
-		 */
-		if (!tcph->syn)
-			win <<= sender->td_scale;
-
-		/*
-		 * Update sender data.
-		 */
-		swin = win + (sack - ack);
-		if (sender->td_maxwin < swin)
-			sender->td_maxwin = swin;
-		if (after(end, sender->td_end)) {
-			sender->td_end = end;
-			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
-		}
-		if (tcph->ack) {
-			if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) {
-				sender->td_maxack = ack;
-				sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET;
-			} else if (after(ack, sender->td_maxack))
-				sender->td_maxack = ack;
-		}
 
-		/*
-		 * Update receiver data.
-		 */
-		if (receiver->td_maxwin != 0 && after(end, sender->td_maxend))
-			receiver->td_maxwin += end - sender->td_maxend;
-		if (after(sack + win, receiver->td_maxend - 1)) {
-			receiver->td_maxend = sack + win;
-			if (win == 0)
-				receiver->td_maxend++;
-		}
-		if (ack == receiver->td_end)
-			receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+	/* Take into account window scaling (RFC 1323). */
+	if (!tcph->syn)
+		win <<= sender->td_scale;
+
+	/* Update sender data. */
+	swin = win + (sack - ack);
+	if (sender->td_maxwin < swin)
+		sender->td_maxwin = swin;
+	if (after(end, sender->td_end)) {
+		sender->td_end = end;
+		sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+	}
+	if (tcph->ack) {
+		if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) {
+			sender->td_maxack = ack;
+			sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET;
+		} else if (after(ack, sender->td_maxack))
+			sender->td_maxack = ack;
+	}
 
-		/*
-		 * Check retransmissions.
-		 */
-		if (index == TCP_ACK_SET) {
-			if (state->last_dir == dir
-			    && state->last_seq == seq
-			    && state->last_ack == ack
-			    && state->last_end == end
-			    && state->last_win == win_raw)
-				state->retrans++;
-			else {
-				state->last_dir = dir;
-				state->last_seq = seq;
-				state->last_ack = ack;
-				state->last_end = end;
-				state->last_win = win_raw;
-				state->retrans = 0;
-			}
+	/* Update receiver data. */
+	if (receiver->td_maxwin != 0 && after(end, sender->td_maxend))
+		receiver->td_maxwin += end - sender->td_maxend;
+	if (after(sack + win, receiver->td_maxend - 1)) {
+		receiver->td_maxend = sack + win;
+		if (win == 0)
+			receiver->td_maxend++;
+	}
+	if (ack == receiver->td_end)
+		receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+
+	/* Check retransmissions. */
+	if (index == TCP_ACK_SET) {
+		if (state->last_dir == dir
+		    && state->last_seq == seq
+		    && state->last_ack == ack
+		    && state->last_end == end
+		    && state->last_win == win_raw)
+			state->retrans++;
+		else {
+			state->last_dir = dir;
+			state->last_seq = seq;
+			state->last_ack = ack;
+			state->last_end = end;
+			state->last_win = win_raw;
+			state->retrans = 0;
 		}
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 nf-next 4/4] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst
  2022-08-26 13:32 [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
                   ` (2 preceding siblings ...)
  2022-08-26 13:32 ` [PATCH v2 nf-next 3/4] netfilter: conntrack: remove unneeded indent level Florian Westphal
@ 2022-08-26 13:32 ` Florian Westphal
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2022-08-26 13:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

In case the endpoints and conntrack go out-of-sync, i.e. there is
disagreement wrt. validy of sequence/ack numbers between conntracks
internal state and those of the endpoints, connections can hang for a
long time (until ESTABLISHED timeout).

This adds a check to detect a fin/fin exchange even if those are
invalid.  The timeout is then lowered to UNACKED (default 300s).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: new patch.

 net/netfilter/nf_conntrack_proto_tcp.c | 58 ++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a2630647a4bb..fd345c40bed9 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -716,6 +716,63 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
 	return NFCT_TCP_ACCEPT;
 }
 
+static void __cold nf_tcp_handle_invalid(struct nf_conn *ct,
+					 enum ip_conntrack_dir dir,
+					 int index,
+					 const struct sk_buff *skb,
+					 const struct nf_hook_state *hook_state)
+{
+	const unsigned int *timeouts;
+	const struct nf_tcp_net *tn;
+	unsigned int timeout;
+	u32 expires;
+
+	if (!test_bit(IPS_ASSURED_BIT, &ct->status) ||
+	    test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status))
+		return;
+
+	/* We don't want to have connections hanging around in ESTABLISHED
+	 * state for long time 'just because' conntrack deemed a FIN/RST
+	 * out-of-window.
+	 *
+	 * Shrink the timeout just like when there is unacked data.
+	 * This speeds up eviction of 'dead' connections where the
+	 * connection and conntracks internal state are out of sync.
+	 */
+	switch (index) {
+	case TCP_RST_SET:
+	case TCP_FIN_SET:
+		break;
+	default:
+		return;
+	}
+
+	if (ct->proto.tcp.last_dir != dir &&
+	    (ct->proto.tcp.last_index == TCP_FIN_SET ||
+	     ct->proto.tcp.last_index == TCP_RST_SET)) {
+		expires = nf_ct_expires(ct);
+		if (expires < 120 * HZ)
+			return;
+
+		tn = nf_tcp_pernet(nf_ct_net(ct));
+		timeouts = nf_ct_timeout_lookup(ct);
+		if (!timeouts)
+			timeouts = tn->timeouts;
+
+		timeout = READ_ONCE(timeouts[TCP_CONNTRACK_UNACK]);
+		if (expires > timeout) {
+			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
+					  "packet (index %d, dir %d) response for index %d lower timeout to %u",
+					  index, dir, ct->proto.tcp.last_index, timeout);
+
+			WRITE_ONCE(ct->timeout, timeout + nfct_time_stamp);
+		}
+	} else {
+                ct->proto.tcp.last_index = index;
+                ct->proto.tcp.last_dir = dir;
+	}
+}
+
 /* table of valid flag combinations - PUSH, ECE and CWR are always valid */
 static const u8 tcp_valid_flags[(TCPHDR_FIN|TCPHDR_SYN|TCPHDR_RST|TCPHDR_ACK|
 				 TCPHDR_URG) + 1] =
@@ -1148,6 +1205,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 		spin_unlock_bh(&ct->lock);
 		return NF_ACCEPT;
 	case NFCT_TCP_INVALID:
+		nf_tcp_handle_invalid(ct, dir, index, skb, state);
 		spin_unlock_bh(&ct->lock);
 		return -NF_ACCEPT;
 	case NFCT_TCP_ACCEPT:
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-26 13:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-26 13:32 [PATCH v2 nf-next 0/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
2022-08-26 13:32 ` [PATCH v2 nf-next 1/4] netfilter: conntrack: prepare tcp_in_window for ternary return value Florian Westphal
2022-08-26 13:32 ` [PATCH v2 nf-next 2/4] netfilter: conntrack: ignore overly delayed tcp packets Florian Westphal
2022-08-26 13:32 ` [PATCH v2 nf-next 3/4] netfilter: conntrack: remove unneeded indent level Florian Westphal
2022-08-26 13:32 ` [PATCH v2 nf-next 4/4] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).