netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 net-next] tcp: Refine SYN handling for PAWS.
@ 2023-03-29 20:13 Kuniyuki Iwashima
  2023-03-29 20:26 ` Eric Dumazet
  2023-03-31  8:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Kuniyuki Iwashima @ 2023-03-29 20:13 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, Jason Xing

Our Network Load Balancer (NLB) [0] has multiple nodes with different
IP addresses, and each node forwards TCP flows from clients to backend
targets.  NLB has an option to preserve the client's source IP address
and port when routing packets to backend targets. [1]

When a client connects to two different NLB nodes, they may select the
same backend target.  Then, if the client has used the same source IP
and port, the two flows at the backend side will have the same 4-tuple.

While testing around such cases, I saw these sequences on the backend
target.

IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2819965599, win 62727, options [mss 8365,sackOK,TS val 1029816180 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3040695044, ack 2819965600, win 62643, options [mss 8961,sackOK,TS val 1224784076 ecr 1029816180,nop,wscale 7], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 1, win 491, options [nop,nop,TS val 1029816181 ecr 1224784076], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2681819307, win 62727, options [mss 8365,sackOK,TS val 572088282 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 1, win 490, options [nop,nop,TS val 1224794914 ecr 1029816181,nop,nop,sack 1 {4156821004:4156821005}], length 0

It seems to be working correctly, but the last ACK was generated by
tcp_send_dupack() and PAWSEstab was increased.  This is because the
second connection has a smaller timestamp than the first one.

In this case, we should send a dup ACK in tcp_send_challenge_ack()
to increase the correct counter and rate-limit it properly.

Let's check the SYN flag after the PAWS tests to avoid adding unnecessary
overhead for most packets.

Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html [0]
Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
---
v3:
  - Remove Fixes tag

v2: https://lore.kernel.org/netdev/20230328184257.62219-1-kuniyu@amazon.com/
  - Respin for net-next
  - Update changelog about challenge ACK vs dup ACK
  - Add Reviewed-by tag

v1: https://lore.kernel.org/netdev/20230327230628.45660-1-kuniyu@amazon.com/
---
---
 net/ipv4/tcp_input.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2b75cd9e2e92..a057330d6f59 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5714,6 +5714,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	    tp->rx_opt.saw_tstamp &&
 	    tcp_paws_discard(sk, skb)) {
 		if (!th->rst) {
+			if (unlikely(th->syn))
+				goto syn_challenge;
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_PAWSESTABREJECTED);
 			if (!tcp_oow_rate_limited(sock_net(sk), skb,
 						  LINUX_MIB_TCPACKSKIPPEDPAWS,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 net-next] tcp: Refine SYN handling for PAWS.
  2023-03-29 20:13 [PATCH v3 net-next] tcp: Refine SYN handling for PAWS Kuniyuki Iwashima
@ 2023-03-29 20:26 ` Eric Dumazet
  2023-03-31  8:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2023-03-29 20:26 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev, Jason Xing

On Wed, Mar 29, 2023 at 10:14 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> Our Network Load Balancer (NLB) [0] has multiple nodes with different
> IP addresses, and each node forwards TCP flows from clients to backend
> targets.  NLB has an option to preserve the client's source IP address
> and port when routing packets to backend targets. [1]
>
> When a client connects to two different NLB nodes, they may select the
> same backend target.  Then, if the client has used the same source IP
> and port, the two flows at the backend side will have the same 4-tuple.
>
> While testing around such cases, I saw these sequences on the backend
> target.
>
> IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2819965599, win 62727, options [mss 8365,sackOK,TS val 1029816180 ecr 0,nop,wscale 7], length 0
> IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3040695044, ack 2819965600, win 62643, options [mss 8961,sackOK,TS val 1224784076 ecr 1029816180,nop,wscale 7], length 0
> IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 1, win 491, options [nop,nop,TS val 1029816181 ecr 1224784076], length 0
> IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2681819307, win 62727, options [mss 8365,sackOK,TS val 572088282 ecr 0,nop,wscale 7], length 0
> IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 1, win 490, options [nop,nop,TS val 1224794914 ecr 1029816181,nop,nop,sack 1 {4156821004:4156821005}], length 0
>
> It seems to be working correctly, but the last ACK was generated by
> tcp_send_dupack() and PAWSEstab was increased.  This is because the
> second connection has a smaller timestamp than the first one.
>
> In this case, we should send a dup ACK in tcp_send_challenge_ack()
> to increase the correct counter and rate-limit it properly.
>
> Let's check the SYN flag after the PAWS tests to avoid adding unnecessary
> overhead for most packets.
>
> Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html [0]
> Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation [1]
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 net-next] tcp: Refine SYN handling for PAWS.
  2023-03-29 20:13 [PATCH v3 net-next] tcp: Refine SYN handling for PAWS Kuniyuki Iwashima
  2023-03-29 20:26 ` Eric Dumazet
@ 2023-03-31  8:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-31  8:20 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, edumazet, kuba, pabeni, kuni1840, netdev, kerneljasonxing

Hello:

This patch was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed, 29 Mar 2023 13:13:48 -0700 you wrote:
> Our Network Load Balancer (NLB) [0] has multiple nodes with different
> IP addresses, and each node forwards TCP flows from clients to backend
> targets.  NLB has an option to preserve the client's source IP address
> and port when routing packets to backend targets. [1]
> 
> When a client connects to two different NLB nodes, they may select the
> same backend target.  Then, if the client has used the same source IP
> and port, the two flows at the backend side will have the same 4-tuple.
> 
> [...]

Here is the summary with links:
  - [v3,net-next] tcp: Refine SYN handling for PAWS.
    https://git.kernel.org/netdev/net-next/c/ee05d90d0ac7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-03-31  8:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-29 20:13 [PATCH v3 net-next] tcp: Refine SYN handling for PAWS Kuniyuki Iwashima
2023-03-29 20:26 ` Eric Dumazet
2023-03-31  8:20 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).