* [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
@ 2025-05-19 10:32 Duan Jiong
  2025-05-19 20:11 ` Julian Anastasov
  2025-05-20  7:14 ` kernel test robot
  0 siblings, 2 replies; 8+ messages in thread
From: Duan Jiong @ 2025-05-19 10:32 UTC (permalink / raw)
  To: ja, pablo; +Cc: netdev, Duan Jiong

Now suppose there are two net namespaces, one is the server and
its ip is 192.168.99.4, the other is the client and its ip
is 192.168.99.5, and the other is configured with ipvs vip
192.168.99.6 in the host net namespace, configuring ipvs with
the backend 192.168.99.5.

Also configure
iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE
to avoid packet loss when accessing with the specified
source port.

First we use curl --local-port 15280 to specify the source port
to access the vip, after the request is completed again use
curl --local-port 15280 to specify the source port to access
192.168.99.5, this time the request will always be stuck in
the main.

The packet sent by the client arrives at the server without
any problem, but ipvs will process the packet back from the
server with the wrong snat for vip, and at this time, since
the client will directly rst after receiving the packet, the
client will be stuck until the vip ct rule on the host
times out.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
---
 net/netfilter/ipvs/ip_vs_core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index c7a8a08b7308..98abe4085a11 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1260,6 +1260,8 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 		unsigned int hooknum)
 {
 	struct ip_vs_protocol *pp = pd->pp;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
 	if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
 		goto after_nat;
@@ -1270,6 +1272,12 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 			goto drop;
 
 		/* mangle the packet */
+		if (ct != NULL &&
+		    hooknum == NF_INET_FORWARD &&
+		    !ip_vs_addr_equal(af,
+				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
+				      &cp->vaddr))
+			return NF_ACCEPT;
 		if (pp->snat_handler &&
 		    !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph))
 			goto drop;
-- 
2.32.1 (Apple Git-133)

^ permalink raw reply related	[flat|nested] 8+ messages in thread
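The condition this patch adds to handle_response() can be modelled in userspace as follows. This is only a minimal sketch: the types `model_ct`, `model_ipvs_conn` and `model_hook` and the helpers `ip4()` and `model_skip_snat()` are hypothetical stand-ins invented for illustration, not kernel structures or functions. It shows nothing more than when the proposed check would return early instead of calling the SNAT handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the conntrack entry attached to the skb. */
struct model_ct {
	uint32_t orig_dst;	/* dst address of the ORIGINAL-direction tuple */
};

/* Hypothetical stand-in for the IPVS connection (cp in the patch). */
struct model_ipvs_conn {
	uint32_t vaddr;		/* virtual address (VIP) */
};

enum model_hook { MODEL_HOOK_LOCAL_IN, MODEL_HOOK_FORWARD };

/* Build an IPv4 address (host byte order) from its four octets. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Mirrors the three-part condition from the diff: a conntrack entry
 * exists, the packet is seen in the FORWARD hook, and the entry's
 * original destination is not the VIP. When all three hold, the patch
 * returns NF_ACCEPT without running the SNAT handler.
 */
static bool model_skip_snat(const struct model_ct *ct, enum model_hook hook,
			    const struct model_ipvs_conn *cp)
{
	return ct != NULL &&
	       hook == MODEL_HOOK_FORWARD &&
	       ct->orig_dst != cp->vaddr;
}
```

With the addresses used in this thread, an entry whose original destination is the VIP 192.168.99.6 still gets SNAT, while an entry whose original destination is the real server 192.168.99.4 is passed through unchanged.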
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
  2025-05-19 10:32 [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip Duan Jiong
@ 2025-05-19 20:11 ` Julian Anastasov
  2025-05-20  1:52   ` Duan Jiong
  2025-05-20  7:14 ` kernel test robot
  1 sibling, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2025-05-19 20:11 UTC (permalink / raw)
  To: Duan Jiong; +Cc: pablo, netdev, lvs-devel

	Hello,

	Adding lvs-devel@ to CC...

On Mon, 19 May 2025, Duan Jiong wrote:

> Now suppose there are two net namespaces, one is the server and
> its ip is 192.168.99.4, the other is the client and its ip
> is 192.168.99.5, and the other is configured with ipvs vip
> 192.168.99.6 in the host net namespace, configuring ipvs with
> the backend 192.168.99.5.
>
> Also configure
> iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE
> to avoid packet loss when accessing with the specified
> source port.

	May be I don't quite understand why the MASQUERADE
rule is used...

>
> First we use curl --local-port 15280 to specify the source port
> to access the vip, after the request is completed again use
> curl --local-port 15280 to specify the source port to access
> 192.168.99.5, this time the request will always be stuck in
> the main.
>
> The packet sent by the client arrives at the server without
> any problem, but ipvs will process the packet back from the
> server with the wrong snat for vip, and at this time, since
> the client will directly rst after receiving the packet, the
> client will be stuck until the vip ct rule on the host
> times out.
>
> Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
> ---
>  net/netfilter/ipvs/ip_vs_core.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index c7a8a08b7308..98abe4085a11 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1260,6 +1260,8 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
>  		unsigned int hooknum)
>  {
>  	struct ip_vs_protocol *pp = pd->pp;
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
>
>  	if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
>  		goto after_nat;
> @@ -1270,6 +1272,12 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
>  			goto drop;
>
>  		/* mangle the packet */
> +		if (ct != NULL &&
> +		    hooknum == NF_INET_FORWARD &&
> +		    !ip_vs_addr_equal(af,
> +				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
> +				      &cp->vaddr))
> +			return NF_ACCEPT;

	Such check will prevent SNAT for active FTP connections
because their original direction is from real server to client.
In which case ip_vs_addr_equal will see difference? When
Netfilter creates new connection for packet from real server?
It does not look good IPVS connection to be DNAT-ed but not
SNAT-ed.

	May be you can explain better what IPs/ports are present in
the transferred packets.

>  		if (pp->snat_handler &&
>  		    !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph))
>  			goto drop;
> --
> 2.32.1 (Apple Git-133)

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
  2025-05-19 20:11 ` Julian Anastasov
@ 2025-05-20  1:52   ` Duan Jiong
  2025-05-20 13:27     ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: Duan Jiong @ 2025-05-20 1:52 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: pablo, netdev, lvs-devel

On Tue, May 20, 2025 at 4:11 AM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
>         Adding lvs-devel@ to CC...
>
> On Mon, 19 May 2025, Duan Jiong wrote:
>
> > Now suppose there are two net namespaces, one is the server and
> > its ip is 192.168.99.4, the other is the client and its ip
> > is 192.168.99.5, and the other is configured with ipvs vip
> > 192.168.99.6 in the host net namespace, configuring ipvs with
> > the backend 192.168.99.5.
> >
> > Also configure
> > iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE
> > to avoid packet loss when accessing with the specified
> > source port.
>
>         May be I don't quite understand why the MASQUERADE
> rule is used...

If nat is not configured, __nf_conntrack_confirm drops packets due to
tuple conflicts. I'll post my reproduction method later on.

> >
> > First we use curl --local-port 15280 to specify the source port
> > to access the vip, after the request is completed again use
> > curl --local-port 15280 to specify the source port to access
> > 192.168.99.5, this time the request will always be stuck in
> > the main.
> >
> > The packet sent by the client arrives at the server without
> > any problem, but ipvs will process the packet back from the
> > server with the wrong snat for vip, and at this time, since
> > the client will directly rst after receiving the packet, the
> > client will be stuck until the vip ct rule on the host
> > times out.
> >
> > Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
> > ---
> >  net/netfilter/ipvs/ip_vs_core.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> > index c7a8a08b7308..98abe4085a11 100644
> > --- a/net/netfilter/ipvs/ip_vs_core.c
> > +++ b/net/netfilter/ipvs/ip_vs_core.c
> > @@ -1260,6 +1260,8 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
> >  		unsigned int hooknum)
> >  {
> >  	struct ip_vs_protocol *pp = pd->pp;
> > +	enum ip_conntrack_info ctinfo;
> > +	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> >
> >  	if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
> >  		goto after_nat;
> > @@ -1270,6 +1272,12 @@ handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
> >  			goto drop;
> >
> >  		/* mangle the packet */
> > +		if (ct != NULL &&
> > +		    hooknum == NF_INET_FORWARD &&
> > +		    !ip_vs_addr_equal(af,
> > +				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
> > +				      &cp->vaddr))
> > +			return NF_ACCEPT;
>
>         Such check will prevent SNAT for active FTP connections
> because their original direction is from real server to client.
> In which case ip_vs_addr_equal will see difference? When
> Netfilter creates new connection for packet from real server?
> It does not look good IPVS connection to be DNAT-ed but not
> SNAT-ed.
>
>         May be you can explain better what IPs/ports are present in
> the transferred packets.
>
> >  		if (pp->snat_handler &&
> >  		    !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph))
> >  			goto drop;
> > --
> > 2.32.1 (Apple Git-133)
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>

1. setup environment

[root@centos9s vagrant]# cat setup.sh
#!/bin/bash

ip netns add server
ip link add svrh type veth peer name svr
ip link set svr netns server
ip link set svrh up
ip link set dev svrh address ee:ee:ee:ee:ee:ee
ip netns exec server ip link set svr up
ip netns exec server ip addr add 192.168.99.4/32 dev svr
ip netns exec server ip route add 169.254.1.1 dev svr scope link
ip netns exec server ip route add default via 169.254.1.1 dev svr
ip netns exec server ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev svr nud permanent
ip route add 192.168.99.4/32 dev svrh

ip netns add client
ip link add clih type veth peer name cli
ip link set cli netns client
ip link set clih up
ip link set dev clih address ee:ee:ee:ee:ee:ee
ip netns exec client ip link set cli up
ip netns exec client ip addr add 192.168.99.5/32 dev cli
ip netns exec client ip route add 169.254.1.1 dev cli scope link
ip netns exec client ip route add default via 169.254.1.1 dev cli
ip netns exec client ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev cli nud permanent
ip route add 192.168.99.5/32 dev clih

ip addr add 192.168.99.6/32 dev lo
ipvsadm -A -t 192.168.99.6:8080 -s rr
ipvsadm -a -t 192.168.99.6:8080 -r 192.168.99.4:8080 -m

echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/vs/conntrack
iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE

2. start server
ip netns exec server python -m http.server 8080

3. curl vip
ip netns exec client curl --local-port 15280 http://192.168.99.6:8080

4. curl rs
ip netns exec client curl --local-port 15280 http://192.168.99.4:8080

Here are the ct rules for executing curl and the tcpdump capture.

[root@centos9s vagrant]# tcpdump -s0 -nn -i clih
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]...
for full protocol decode
listening on clih, link-type EN10MB (Ethernet), snapshot length 262144 bytes
01:50:14.328558 IP6 fe80::fc0e:fff:fef8:7c05 > ff02::2: ICMP6, router solicitation, length 16
01:50:28.430769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [S], seq 614710449, win 64240, options [mss 1460,sackOK,TS val 2654895687 ecr 0,nop,wscale 7], length 0
01:50:28.431026 ARP, Request who-has 192.168.99.5 tell 192.168.99.6, length 28
01:50:28.431034 ARP, Reply 192.168.99.5 is-at fe:0e:0f:f8:7c:05, length 28
01:50:28.431035 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 3593264529, ack 614710450, win 65160, options [mss 1460,sackOK,TS val 4198589191 ecr 2654895687,nop,wscale 7], length 0
01:50:28.431048 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 1, win 502, options [nop,nop,TS val 2654895687 ecr 4198589191], length 0
01:50:28.431683 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [P.], seq 1:82, ack 1, win 502, options [nop,nop,TS val 2654895688 ecr 4198589191], length 81: HTTP: GET / HTTP/1.1
01:50:28.431709 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], ack 82, win 509, options [nop,nop,TS val 4198589192 ecr 2654895688], length 0
01:50:28.434072 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], seq 1:157, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895688], length 156: HTTP: HTTP/1.0 200 OK
01:50:28.434083 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 157, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194], length 0
01:50:28.434166 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], seq 157:1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895690], length 1038: HTTP
01:50:28.434171 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 1195, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194], length 0
01:50:28.434221 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [F.], seq 1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895690], length 0
01:50:28.434669 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [F.], seq 82, ack 1196, win 501, options [nop,nop,TS val 2654895691 ecr 4198589194], length 0
01:50:28.434712 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], ack 83, win 509, options [nop,nop,TS val 4198589195 ecr 2654895691], length 0
01:50:33.158284 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236082988 ecr 0,nop,wscale 7], length 0
01:50:33.158429 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198593919 ecr 2236082988,nop,wscale 7], length 0
01:50:33.158496 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:34.168530 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236083999 ecr 0,nop,wscale 7], length 0
01:50:34.168722 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198594929 ecr 2236082988,nop,wscale 7], length 0
01:50:34.168754 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198594929 ecr 2236082988,nop,wscale 7], length 0
01:50:34.168751 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:34.168769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:36.216624 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198596977 ecr 2236082988,nop,wscale 7], length 0
01:50:36.216626 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236086047 ecr 0,nop,wscale 7], length 0
01:50:36.216678 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:36.216690 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198596977 ecr 2236082988,nop,wscale 7], length 0
01:50:36.216693 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
^C
28 packets captured
28 packets received by filter
0 packets dropped by kernel
[root@centos9s vagrant]# cat^C
[root@centos9s vagrant]# cat /proc/net/nf_conntrack | grep 15280
ipv4     2 tcp      6 7 CLOSE src=192.168.99.5 dst=192.168.99.6 sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080 dport=15280 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4     2 tcp      6 53 SYN_RECV src=192.168.99.5 dst=192.168.99.4 sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080 dport=1279 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip 2025-05-20 1:52 ` Duan Jiong @ 2025-05-20 13:27 ` Julian Anastasov 2025-05-20 13:44 ` Florian Westphal 2025-05-21 2:01 ` Duan Jiong 0 siblings, 2 replies; 8+ messages in thread From: Julian Anastasov @ 2025-05-20 13:27 UTC (permalink / raw) To: Duan Jiong; +Cc: pablo, netdev, lvs-devel Hello, On Tue, 20 May 2025, Duan Jiong wrote: > 1. setup environment > > [root@centos9s vagrant]# cat setup.sh > #!/bin/bash > > ip netns add server > ip link add svrh type veth peer name svr > ip link set svr netns server > ip link set svrh up > ip link set dev svrh address ee:ee:ee:ee:ee:ee > ip netns exec server ip link set svr up > ip netns exec server ip addr add 192.168.99.4/32 dev svr > ip netns exec server ip route add 169.254.1.1 dev svr scope link > ip netns exec server ip route add default via 169.254.1.1 dev svr > ip netns exec server ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee > dev svr nud permanent > ip route add 192.168.99.4/32 dev svrh > > ip netns add client > ip link add clih type veth peer name cli > ip link set cli netns client > ip link set clih up > ip link set dev clih address ee:ee:ee:ee:ee:ee > ip netns exec client ip link set cli up > ip netns exec client ip addr add 192.168.99.5/32 dev cli > ip netns exec client ip route add 169.254.1.1 dev cli scope link > ip netns exec client ip route add default via 169.254.1.1 dev cli > ip netns exec client ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee > dev cli nud permanent > ip route add 192.168.99.5/32 dev clih > > ip addr add 192.168.99.6/32 dev lo > ipvsadm -A -t 192.168.99.6:8080 -s rr > ipvsadm -a -t 192.168.99.6:8080 -r 192.168.99.4:8080 -m > > echo 1 > /proc/sys/net/ipv4/ip_forward > echo 1 > /proc/sys/net/ipv4/vs/conntrack > iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE > > 2. start server > ip netns exec server python -m http.server 8080 > > 3. 
curl vip > ip netns exec client curl --local-port 15280 http://192.168.99.6:8080 > > 4. curl rs > ip netns exec client curl --local-port 15280 http://192.168.99.4:8080 > > Here are the ct rules for executing curl and the tcpdump capture. > > [root@centos9s vagrant]# tcpdump -s0 -nn -i clih > dropped privs to tcpdump > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on clih, link-type EN10MB (Ethernet), snapshot length 262144 bytes > 01:50:14.328558 IP6 fe80::fc0e:fff:fef8:7c05 > ff02::2: ICMP6, router > solicitation, length 16 Client correctly connects to VIP: > 01:50:28.430769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [S], > seq 614710449, win 64240, options [mss 1460,sackOK,TS val 2654895687 > ecr 0,nop,wscale 7], length 0 > 01:50:28.431026 ARP, Request who-has 192.168.99.5 tell 192.168.99.6, length 28 > 01:50:28.431034 ARP, Reply 192.168.99.5 is-at fe:0e:0f:f8:7c:05, length 28 > 01:50:28.431035 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 3593264529, ack 614710450, win 65160, options [mss 1460,sackOK,TS > val 4198589191 ecr 2654895687,nop,wscale 7], length 0 > 01:50:28.431048 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], > ack 1, win 502, options [nop,nop,TS val 2654895687 ecr 4198589191], > length 0 > 01:50:28.431683 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [P.], > seq 1:82, ack 1, win 502, options [nop,nop,TS val 2654895688 ecr > 4198589191], length 81: HTTP: GET / HTTP/1.1 > 01:50:28.431709 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], > ack 82, win 509, options [nop,nop,TS val 4198589192 ecr 2654895688], > length 0 > 01:50:28.434072 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], > seq 1:157, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr > 2654895688], length 156: HTTP: HTTP/1.0 200 OK > 01:50:28.434083 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], > ack 157, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194], > length 0 > 01:50:28.434166 IP 
192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.],
> seq 157:1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> 2654895690], length 1038: HTTP
> 01:50:28.434171 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> ack 1195, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194],
> length 0
> 01:50:28.434221 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [F.],
> seq 1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> 2654895690], length 0
> 01:50:28.434669 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [F.],
> seq 82, ack 1196, win 501, options [nop,nop,TS val 2654895691 ecr
> 4198589194], length 0
> 01:50:28.434712 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.],
> ack 83, win 509, options [nop,nop,TS val 4198589195 ecr 2654895691],
> length 0

	But the following packet is different from your
initial posting. Why client connects directly to the real server?
Is it allowed to have two conntracks with equal reply tuple
192.168.99.4:8080 -> 192.168.99.6:15280 and should we support
such kind of setups?

	May be you'll need a function in ip_vs_nfct.c that ensures
the packet is in reply direction and its original dest is the
vaddr as you already check. You will need an alternative
function in ip_vs.h when CONFIG_IP_VS_NFCT is not defined.
See ip_vs_conntrack_enabled() for reference.
You can not directly use nf_ functions in ip_vs_core.c > 01:50:33.158284 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236082988 > ecr 0,nop,wscale 7], length 0 > 01:50:33.158429 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS > val 4198593919 ecr 2236082988,nop,wscale 7], length 0 > 01:50:33.158496 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], > seq 886133764, win 0, length 0 > 01:50:34.168530 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236083999 > ecr 0,nop,wscale 7], length 0 > 01:50:34.168722 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS > val 4198594929 ecr 2236082988,nop,wscale 7], length 0 > 01:50:34.168754 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS > val 4198594929 ecr 2236082988,nop,wscale 7], length 0 > 01:50:34.168751 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], > seq 886133764, win 0, length 0 > 01:50:34.168769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], > seq 886133764, win 0, length 0 > 01:50:36.216624 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS > val 4198596977 ecr 2236082988,nop,wscale 7], length 0 > 01:50:36.216626 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236086047 > ecr 0,nop,wscale 7], length 0 > 01:50:36.216678 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], > seq 886133764, win 0, length 0 > 01:50:36.216690 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS > val 4198596977 ecr 2236082988,nop,wscale 7], 
length 0
> 01:50:36.216693 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> seq 886133764, win 0, length 0
> ^C
> 28 packets captured
> 28 packets received by filter
> 0 packets dropped by kernel
> [root@centos9s vagrant]# cat^C
> [root@centos9s vagrant]# cat /proc/net/nf_conntrack | grep 15280
> ipv4     2 tcp      6 7 CLOSE src=192.168.99.5 dst=192.168.99.6
> sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> dport=15280 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0
> zone=0 use=2
> ipv4     2 tcp      6 53 SYN_RECV src=192.168.99.5 dst=192.168.99.4
> sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> dport=1279 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

	dport=1279 ? Not 15280 ? Is it from your test?

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread
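The layering suggested in this reply, a real helper next to the conntrack code plus a fallback when CONFIG_IP_VS_NFCT is not set, might be sketched in userspace as follows. Everything here is an assumption made for illustration: `model_pkt`, `model_ip_vs_reply_to_vaddr()` and the `MODEL_CONFIG_IP_VS_NFCT` switch are hypothetical names, and the choices to return true when no conntrack entry is attached and in the fallback are guesses, not something the thread settles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build switch standing in for CONFIG_IP_VS_NFCT.
 * Remove this define to compile the fallback stub instead. */
#define MODEL_CONFIG_IP_VS_NFCT 1

/* Hypothetical, flattened view of what the helper needs from the skb. */
struct model_pkt {
	bool has_ct;		/* a conntrack entry is attached */
	bool is_reply;		/* packet travels in the REPLY direction */
	uint32_t ct_orig_dst;	/* dst address of the ORIGINAL tuple */
};

#ifdef MODEL_CONFIG_IP_VS_NFCT
/* Would live beside the other conntrack glue (ip_vs_nfct.c in the thread).
 * True when the packet is a reply whose original destination was vaddr. */
static bool model_ip_vs_reply_to_vaddr(const struct model_pkt *pkt,
				       uint32_t vaddr)
{
	if (!pkt->has_ct)
		return true;	/* assumption: nothing to compare against */
	return pkt->is_reply && pkt->ct_orig_dst == vaddr;
}
#else
/* Fallback for builds without conntrack support (the ip_vs.h side):
 * no conntrack data exists, so never veto SNAT. */
static inline bool model_ip_vs_reply_to_vaddr(const struct model_pkt *pkt,
					      uint32_t vaddr)
{
	(void)pkt;
	(void)vaddr;
	return true;
}
#endif
```

A caller in the core code would then use only `model_ip_vs_reply_to_vaddr()` and never touch conntrack types itself, which is the point of the remark about not using nf_ functions directly in ip_vs_core.c.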
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
  2025-05-20 13:27 ` Julian Anastasov
@ 2025-05-20 13:44   ` Florian Westphal
  2025-05-21  2:04     ` Duan Jiong
  2025-05-21  2:01   ` Duan Jiong
  1 sibling, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2025-05-20 13:44 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Duan Jiong, pablo, netdev, lvs-devel

Julian Anastasov <ja@ssi.bg> wrote:
> 	But the following packet is different from your
> initial posting. Why client connects directly to the real server?
> Is it allowed to have two conntracks with equal reply tuple
> 192.168.99.4:8080 -> 192.168.99.6:15280 and should we support
> such kind of setups?

I don't even see how it would work, if you allow

C1 -> S
C2 -> S

... in conntrack and you receive packet from S, does that need to
go to C1 or C2?

Such duplicate CT entries are free'd (refused) at nf_confirm (
conntrack table insertion) time.

^ permalink raw reply	[flat|nested] 8+ messages in thread
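The refusal described above can be pictured with a toy table that rejects a new entry when either of its tuples is already present. This is a minimal sketch under stated assumptions: `model_tuple`, `model_conn`, `model_table` and `model_confirm()` are hypothetical names, the table is a flat array rather than a hash, and the kernel's clash-resolution paths are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

/* One tracked connection: what was sent, and what the reply looks like. */
struct model_conn {
	struct model_tuple orig, reply;
};

#define MODEL_TABLE_MAX 16

struct model_table {
	struct model_conn entries[MODEL_TABLE_MAX];
	size_t count;
};

/* Build an IPv4 address (host byte order) from its four octets. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

static bool tuple_equal(const struct model_tuple *a,
			const struct model_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport;
}

/*
 * Insert c unless its original or reply tuple is already taken.
 * A false return models the packet being dropped at confirm time.
 */
static bool model_confirm(struct model_table *t, const struct model_conn *c)
{
	for (size_t i = 0; i < t->count; i++) {
		if (tuple_equal(&t->entries[i].orig, &c->orig) ||
		    tuple_equal(&t->entries[i].reply, &c->reply))
			return false;
	}
	if (t->count == MODEL_TABLE_MAX)
		return false;
	t->entries[t->count++] = *c;
	return true;
}
```

Fed with the two flows from the reproduction, the second entry is refused as long as its reply tuple is 192.168.99.4:8080 -> 192.168.99.6:15280, and accepted once the reply port differs, as in the dport=1279 entry shown in the conntrack listing.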
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
  2025-05-20 13:44   ` Florian Westphal
@ 2025-05-21  2:04     ` Duan Jiong
  0 siblings, 0 replies; 8+ messages in thread
From: Duan Jiong @ 2025-05-21 2:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Julian Anastasov, pablo, netdev, lvs-devel

On Tue, May 20, 2025 at 9:45 PM Florian Westphal <fw@strlen.de> wrote:
>
> Julian Anastasov <ja@ssi.bg> wrote:
> >       But the following packet is different from your
> > initial posting. Why client connects directly to the real server?
> > Is it allowed to have two conntracks with equal reply tuple
> > 192.168.99.4:8080 -> 192.168.99.6:15280 and should we support
> > such kind of setups?
>
> I don't even see how it would work, if you allow
>
> C1 -> S
> C2 -> S
>
> ... in conntrack and you receive packet from S, does that need to
> go to C1 or C2?
>
> Such duplicate CT entries are free'd (refused) at nf_confirm (
> conntrack table insertion) time.

iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE

Indeed, there is nothing wrong with this logic. But once I added the
MASQUERADE rule above, SNAT seems to be applied before the conntrack
entry is confirmed, which changes the source port.

^ permalink raw reply	[flat|nested] 8+ messages in thread
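The effect described in this reply, where source NAT moves the flow to another port before the entry is confirmed, can be illustrated with a small port picker. This is a sketch under assumptions: `port_in_use()` and `model_pick_sport()` are hypothetical helpers, and the linear scan is chosen only for readability; the kernel's actual selection works differently, which is why the capture shows port 1279 rather than the first free port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when port already appears in the list of taken source ports. */
static bool port_in_use(const uint16_t *used, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++) {
		if (used[i] == port)
			return true;
	}
	return false;
}

/*
 * Keep the wanted source port when it is free; otherwise return the
 * first free port in [min_port, max_port]. Returns 0 when none is free.
 */
static uint16_t model_pick_sport(const uint16_t *used, size_t n,
				 uint16_t wanted,
				 uint16_t min_port, uint16_t max_port)
{
	if (!port_in_use(used, n, wanted))
		return wanted;
	for (uint32_t p = min_port; p <= max_port; p++) {
		if (!port_in_use(used, n, (uint16_t)p))
			return (uint16_t)p;
	}
	return 0;
}
```

With port 15280 already held by the earlier connection through the VIP, a second flow asking for 15280 is moved to a different port, so its reply tuple no longer collides at confirm time.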
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip 2025-05-20 13:27 ` Julian Anastasov 2025-05-20 13:44 ` Florian Westphal @ 2025-05-21 2:01 ` Duan Jiong 1 sibling, 0 replies; 8+ messages in thread From: Duan Jiong @ 2025-05-21 2:01 UTC (permalink / raw) To: Julian Anastasov; +Cc: pablo, netdev, lvs-devel On Tue, May 20, 2025 at 9:28 PM Julian Anastasov <ja@ssi.bg> wrote: > > > Hello, > > On Tue, 20 May 2025, Duan Jiong wrote: > > > 1. setup environment > > > > [root@centos9s vagrant]# cat setup.sh > > #!/bin/bash > > > > ip netns add server > > ip link add svrh type veth peer name svr > > ip link set svr netns server > > ip link set svrh up > > ip link set dev svrh address ee:ee:ee:ee:ee:ee > > ip netns exec server ip link set svr up > > ip netns exec server ip addr add 192.168.99.4/32 dev svr > > ip netns exec server ip route add 169.254.1.1 dev svr scope link > > ip netns exec server ip route add default via 169.254.1.1 dev svr > > ip netns exec server ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee > > dev svr nud permanent > > ip route add 192.168.99.4/32 dev svrh > > > > ip netns add client > > ip link add clih type veth peer name cli > > ip link set cli netns client > > ip link set clih up > > ip link set dev clih address ee:ee:ee:ee:ee:ee > > ip netns exec client ip link set cli up > > ip netns exec client ip addr add 192.168.99.5/32 dev cli > > ip netns exec client ip route add 169.254.1.1 dev cli scope link > > ip netns exec client ip route add default via 169.254.1.1 dev cli > > ip netns exec client ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee > > dev cli nud permanent > > ip route add 192.168.99.5/32 dev clih > > > > ip addr add 192.168.99.6/32 dev lo > > ipvsadm -A -t 192.168.99.6:8080 -s rr > > ipvsadm -a -t 192.168.99.6:8080 -r 192.168.99.4:8080 -m > > > > echo 1 > /proc/sys/net/ipv4/ip_forward > > echo 1 > /proc/sys/net/ipv4/vs/conntrack > > iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE > > > > 2. 
start server > > ip netns exec server python -m http.server 8080 > > > > 3. curl vip > > ip netns exec client curl --local-port 15280 http://192.168.99.6:8080 > > > > 4. curl rs > > ip netns exec client curl --local-port 15280 http://192.168.99.4:8080 > > > > Here are the ct rules for executing curl and the tcpdump capture. > > > > [root@centos9s vagrant]# tcpdump -s0 -nn -i clih > > dropped privs to tcpdump > > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > > listening on clih, link-type EN10MB (Ethernet), snapshot length 262144 bytes > > 01:50:14.328558 IP6 fe80::fc0e:fff:fef8:7c05 > ff02::2: ICMP6, router > > solicitation, length 16 > > Client correctly connects to VIP: > > > 01:50:28.430769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [S], > > seq 614710449, win 64240, options [mss 1460,sackOK,TS val 2654895687 > > ecr 0,nop,wscale 7], length 0 > > 01:50:28.431026 ARP, Request who-has 192.168.99.5 tell 192.168.99.6, length 28 > > 01:50:28.431034 ARP, Reply 192.168.99.5 is-at fe:0e:0f:f8:7c:05, length 28 > > 01:50:28.431035 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], > > seq 3593264529, ack 614710450, win 65160, options [mss 1460,sackOK,TS > > val 4198589191 ecr 2654895687,nop,wscale 7], length 0 > > 01:50:28.431048 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], > > ack 1, win 502, options [nop,nop,TS val 2654895687 ecr 4198589191], > > length 0 > > 01:50:28.431683 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [P.], > > seq 1:82, ack 1, win 502, options [nop,nop,TS val 2654895688 ecr > > 4198589191], length 81: HTTP: GET / HTTP/1.1 > > 01:50:28.431709 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], > > ack 82, win 509, options [nop,nop,TS val 4198589192 ecr 2654895688], > > length 0 > > 01:50:28.434072 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], > > seq 1:157, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr > > 2654895688], length 156: HTTP: HTTP/1.0 200 OK > > 01:50:28.434083 IP 
192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> > ack 157, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194],
> > length 0
> > 01:50:28.434166 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.],
> > seq 157:1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> > 2654895690], length 1038: HTTP
> > 01:50:28.434171 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> > ack 1195, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194],
> > length 0
> > 01:50:28.434221 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [F.],
> > seq 1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> > 2654895690], length 0
> > 01:50:28.434669 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [F.],
> > seq 82, ack 1196, win 501, options [nop,nop,TS val 2654895691 ecr
> > 4198589194], length 0
> > 01:50:28.434712 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.],
> > ack 83, win 509, options [nop,nop,TS val 4198589195 ecr 2654895691],
> > length 0
>
>         But the following packet is different from your
> initial posting. Why client connects directly to the real server?

When there is a problem accessing the VIP, the first thing users are
likely to do is check whether the backend service itself is healthy.

> Is it allowed to have two conntracks with equal reply tuple
> 192.168.99.4:8080 -> 192.168.99.6:15280 and should we support
> such kind of setups?

No, I don't think this needs to be supported. The tuples in the reply
direction should be different; the problem here is only that IPVS
mistakenly applies SNAT.

>
>         May be you'll need a function in ip_vs_nfct.c that ensures
> the packet is in reply direction and its original dest is the
> vaddr as you already check. You will need an alternative
> function in ip_vs.h when CONFIG_IP_VS_NFCT is not defined.
> See ip_vs_conntrack_enabled() for reference.
You can not directly
> use nf_ functions in ip_vs_core.c
>
> > 01:50:33.158284 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236082988
> > ecr 0,nop,wscale 7], length 0
> > 01:50:33.158429 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198593919 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:33.158496 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:34.168530 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236083999
> > ecr 0,nop,wscale 7], length 0
> > 01:50:34.168722 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198594929 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:34.168754 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198594929 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:34.168751 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:34.168769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:36.216624 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198596977 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:36.216626 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236086047
> > ecr 0,nop,wscale 7], length 0
> > 01:50:36.216678 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:36.216690 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198596977 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:36.216693 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > ^C
> > 28 packets captured
> > 28 packets received by filter
> > 0 packets dropped by kernel
> > [root@centos9s vagrant]# cat^C
> > [root@centos9s vagrant]# cat /proc/net/nf_conntrack | grep 15280
> > ipv4 2 tcp 6 7 CLOSE src=192.168.99.5 dst=192.168.99.6
> > sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> > dport=15280 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0
> > zone=0 use=2
> > ipv4 2 tcp 6 53 SYN_RECV src=192.168.99.5 dst=192.168.99.4
> > sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> > dport=1279 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
>
> dport=1279 ? Not 15280 ? Is it from your test?

Yes. It's because I added the iptables rule earlier. Without that rule
the source port stays at 15280, and the SYN packet is then dropped in
__nf_conntrack_confirm().

>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>
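For readers following the thread, the condition that the posted patch adds to handle_response() can be modeled in user space. This is only a sketch: `struct demo_ct`, `demo_ip()`, `demo_should_skip_snat()` and the `DEMO_HOOK_*` values are hypothetical names invented here, not kernel API, and the addresses in the comments are the ones from the capture above. It says nothing about whether the check is the right fix, which is what the thread is debating.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
enum demo_hook { DEMO_HOOK_FORWARD, DEMO_HOOK_LOCAL_IN };

struct demo_ct {
	uint32_t orig_dst;	/* destination of the original-direction tuple */
};

/* Pack a dotted quad into one integer so addresses compare with ==. */
uint32_t demo_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Mirrors the condition in the posted patch: skip the IPVS SNAT step
 * when the packet is seen on the forward hook and its tracking entry
 * was not originally addressed to the virtual address (e.g. the client
 * went straight to 192.168.99.4 instead of the vip 192.168.99.6).
 */
bool demo_should_skip_snat(const struct demo_ct *ct, enum demo_hook hook,
			   uint32_t vaddr)
{
	return ct != NULL && hook == DEMO_HOOK_FORWARD &&
	       ct->orig_dst != vaddr;
}
```

With this model, a tracking entry whose original destination is 192.168.99.4 skips SNAT for vip 192.168.99.6, while an entry whose original destination is the vip itself still goes through SNAT.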
* Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
  2025-05-19 10:32 [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip Duan Jiong
  2025-05-19 20:11 ` Julian Anastasov
@ 2025-05-20  7:14 ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-05-20  7:14 UTC (permalink / raw)
  To: Duan Jiong, ja, pablo; +Cc: oe-kbuild-all, netdev, Duan Jiong

Hi Duan,

kernel test robot noticed the following build errors:

[auto build test ERROR on netfilter-nf/main]
[also build test ERROR on horms-ipvs/master linus/master v6.15-rc7 next-20250516]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Duan-Jiong/ipvs-skip-ipvs-snat-processing-when-packet-dst-is-not-vip/20250519-183312
base:   https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git main
patch link:    https://lore.kernel.org/r/20250519103203.17255-1-djduanjiong%40gmail.com
patch subject: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
config: i386-randconfig-014-20250520 (https://download.01.org/0day-ci/archive/20250520/202505201507.zvDoaADX-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250520/202505201507.zvDoaADX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505201507.zvDoaADX-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   net/netfilter/ipvs/ip_vs_core.c: In function 'handle_response':
>> net/netfilter/ipvs/ip_vs_core.c:1263:32: error: storage size of 'ctinfo' isn't known
    1263 |         enum ip_conntrack_info ctinfo;
         |                                ^~~~~~
>> net/netfilter/ipvs/ip_vs_core.c:1264:30: error: implicit declaration of function 'nf_ct_get' [-Werror=implicit-function-declaration]
    1264 |         struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
         |                              ^~~~~~~~~
>> net/netfilter/ipvs/ip_vs_core.c:1278:24: error: invalid use of undefined type 'struct nf_conn'
    1278 |                     &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
         |                        ^~
>> net/netfilter/ipvs/ip_vs_core.c:1278:36: error: 'IP_CT_DIR_ORIGINAL' undeclared (first use in this function)
    1278 |                     &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
         |                                    ^~~~~~~~~~~~~~~~~~
   net/netfilter/ipvs/ip_vs_core.c:1278:36: note: each undeclared identifier is reported only once for each function it appears in
>> net/netfilter/ipvs/ip_vs_core.c:1263:32: warning: unused variable 'ctinfo' [-Wunused-variable]
    1263 |         enum ip_conntrack_info ctinfo;
         |                                ^~~~~~
   net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
   net/netfilter/ipvs/ip_vs_core.c:1602:15: warning: variable 'outer_proto' set but not used [-Wunused-but-set-variable]
    1602 |         char *outer_proto = "IPIP";
         |               ^~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +1263 net/netfilter/ipvs/ip_vs_core.c

  1254
  1255  /* Handle response packets: rewrite addresses and send away...
  1256   */
  1257  static unsigned int
  1258  handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
  1259                  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph,
  1260                  unsigned int hooknum)
  1261  {
  1262          struct ip_vs_protocol *pp = pd->pp;
> 1263          enum ip_conntrack_info ctinfo;
> 1264          struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
  1265
  1266          if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
  1267                  goto after_nat;
  1268
  1269          IP_VS_DBG_PKT(11, af, pp, skb, iph->off, "Outgoing packet");
  1270
  1271          if (skb_ensure_writable(skb, iph->len))
  1272                  goto drop;
  1273
  1274          /* mangle the packet */
  1275          if (ct != NULL &&
  1276              hooknum == NF_INET_FORWARD &&
  1277              !ip_vs_addr_equal(af,
> 1278                      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
  1279                      &cp->vaddr))
  1280                  return NF_ACCEPT;
  1281          if (pp->snat_handler &&
  1282              !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph))
  1283                  goto drop;
  1284

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
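The errors in this report are consistent with the conntrack declarations not being visible to ip_vs_core.c in this randconfig: the enum, the struct and nf_ct_get() are all unknown to the compiler. One common way to keep optional-feature code building in every configuration is to hide the feature-dependent access behind a helper that has a stub for the disabled case. The sketch below shows that pattern in plain C. `DEMO_HAVE_CONNTRACK`, `struct demo_conn` and `demo_orig_dst_is_vip()` are hypothetical stand-ins rather than kernel symbols, and this is not presented as the fix the maintainers asked for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Kconfig switch; off by default here. */
#ifndef DEMO_HAVE_CONNTRACK
#define DEMO_HAVE_CONNTRACK 0
#endif

#if DEMO_HAVE_CONNTRACK
/* The full definition is only visible when the feature is compiled in. */
struct demo_conn {
	uint32_t orig_dst;	/* original-direction destination address */
};

bool demo_orig_dst_is_vip(const struct demo_conn *ct, uint32_t vip)
{
	/* No tracking entry means nothing to compare: keep old behaviour. */
	return !ct || ct->orig_dst == vip;
}
#else
/* Forward declaration only: callers can pass pointers but not look inside. */
struct demo_conn;

bool demo_orig_dst_is_vip(const struct demo_conn *ct, uint32_t vip)
{
	(void)ct;
	(void)vip;
	return true;	/* stub: behave as if the check always passes */
}
#endif
```

Callers use demo_orig_dst_is_vip() unconditionally and never dereference the struct themselves, so the same calling code compiles whether the switch is on or off.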
end of thread, other threads:[~2025-05-21  2:04 UTC | newest]

Thread overview: 8+ messages
2025-05-19 10:32 [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip Duan Jiong
2025-05-19 20:11 ` Julian Anastasov
2025-05-20  1:52 ` Duan Jiong
2025-05-20 13:27 ` Julian Anastasov
2025-05-20 13:44 ` Florian Westphal
2025-05-21  2:04 ` Duan Jiong
2025-05-21  2:01 ` Duan Jiong
2025-05-20  7:14 ` kernel test robot