* [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch
@ 2022-01-21  1:19 Jakub Kicinski
  2022-01-21  8:55 ` Eric Dumazet
  2022-01-24 19:23 ` kernel test robot
  0 siblings, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-01-21  1:19 UTC (permalink / raw)
  To: davem, edumazet; +Cc: dsahern, pabeni, herbert, netdev, Jakub Kicinski
IPv6 GRO considers packets to belong to different flows when their
hop_limit is different. This seems counter-intuitive, the flow is
the same. hop_limit may vary because of various bugs or hacks but
that doesn't mean it's okay for GRO to reorder packets.
Practical impact of this problem on overall TCP performance
is unclear, but TCP itself detects this reordering and bumps
TCPSACKReorder resulting in user complaints.
Note that the code plays an easy to miss trick by upcasting next_hdr
to a u16 pointer and compares next_hdr and hop_limit in one go.
Coalesce the flush setting to reduce the instruction count a touch.
Fixes: 787e92083601 ("ipv6: Add GRO support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv6/ip6_offload.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b29e9ba5e113..570071a87e71 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -249,7 +249,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 		 if ((first_word & htonl(0xF00FFFFF)) ||
 		     !ipv6_addr_equal(&iph->saddr, &iph2->saddr) ||
 		     !ipv6_addr_equal(&iph->daddr, &iph2->daddr) ||
-		     *(u16 *)&iph->nexthdr != *(u16 *)&iph2->nexthdr) {
+		     iph->nexthdr != iph2->nexthdr) {
 not_same_flow:
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
@@ -260,8 +260,9 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 				goto not_same_flow;
 		}
 		/* flush if Traffic Class fields are different */
-		NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
-		NAPI_GRO_CB(p)->flush |= flush;
+		NAPI_GRO_CB(p)->flush |= flush |
+					 !!((first_word & htonl(0x0FF00000)) |
+					    (iph->hop_limit ^ iph2->hop_limit));
 
 		/* If the previous IP ID value was based on an atomic
 		 * datagram we can overwrite the value and ignore it.
-- 
2.31.1
^ permalink raw reply related	[flat|nested] 6+ messages in thread* Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch 2022-01-21 1:19 [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch Jakub Kicinski @ 2022-01-21 8:55 ` Eric Dumazet 2022-01-21 15:15 ` Jakub Kicinski 2022-01-24 19:23 ` kernel test robot 1 sibling, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2022-01-21 8:55 UTC (permalink / raw) To: Jakub Kicinski; +Cc: David Miller, David Ahern, Paolo Abeni, Herbert Xu, netdev On Thu, Jan 20, 2022 at 5:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > IPv6 GRO considers packets to belong to different flows when their > hop_limit is different. This seems counter-intuitive, the flow is > the same. hop_limit may vary because of various bugs or hacks but > that doesn't mean it's okay for GRO to reorder packets. > > Practical impact of this problem on overall TCP performance > is unclear, but TCP itself detects this reordering and bumps > TCPSACKReorder resulting in user complaints. > > Note that the code plays an easy to miss trick by upcasting next_hdr > to a u16 pointer and compares next_hdr and hop_limit in one go. > Coalesce the flush setting to reduce the instruction count a touch. > There are downsides to this change. We had an internal discussion at Google years ago about this difference in behavior of IPv6/IPv4 We came to the conclusion the IPv6 behavior was better for our needs (and we do not care much about IPv4 GRO, since Google DC traffic is 99.99% IPv6) In our case, we wanted to keep this 'ipv6 feature' because we were experimenting with the idea of sending TSO packets with different flowlabels, to use different paths in the network, to increase nominal throughput for WAN flows (one flow would use multiple fiber links) The issue with 'ipv4 gro style about ttl mismatch' was that because of small differences in RTT for each path, a receiver could very well receive mixed packets. Even without playing with ECMP hashes, this scenario can happen if the sender uses a bonding device in balance-rr mode. After your change, GRO would be defeated and deliver one MSS at a time to TCP stack. We implemented SACK compress in TCP stack to avoid extra SACK being sent by the receiver We have an extension of this SACK compression for TCP flows terminated by Google servers, since modern TCP stacks do not need the old rule of TCP_FASTRETRANS_THRESH DUPACK to start retransmits. Something like this pseudo code: diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index dc49a3d551eb919baf5ad812ef21698c5c7b9679..d72554ab70fd2e16ed60dc78a905f4aa1414f8c9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5494,7 +5494,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) } if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) { tp->dup_ack_counter++; - goto send_now; + if (peer_is_using_old_rule_about_fastretrans(tp)) + goto send_now; } tp->compressed_ack++; if (hrtimer_is_queued(&tp->compressed_ack_timer)) > Fixes: 787e92083601 ("ipv6: Add GRO support") > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > net/ipv6/ip6_offload.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c > index b29e9ba5e113..570071a87e71 100644 > --- a/net/ipv6/ip6_offload.c > +++ b/net/ipv6/ip6_offload.c > @@ -249,7 +249,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, > if ((first_word & htonl(0xF00FFFFF)) || > !ipv6_addr_equal(&iph->saddr, &iph2->saddr) || > !ipv6_addr_equal(&iph->daddr, &iph2->daddr) || > - *(u16 *)&iph->nexthdr != *(u16 *)&iph2->nexthdr) { > + iph->nexthdr != iph2->nexthdr) { > not_same_flow: > NAPI_GRO_CB(p)->same_flow = 0; > continue; > @@ -260,8 +260,9 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, > goto not_same_flow; > } > /* flush if Traffic Class fields are different */ > - NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000)); > - NAPI_GRO_CB(p)->flush |= flush; > + NAPI_GRO_CB(p)->flush |= flush | > + !!((first_word & htonl(0x0FF00000)) | > + (iph->hop_limit ^ iph2->hop_limit)); > > /* If the previous IP ID value was based on an atomic > * datagram we can overwrite the value and ignore it. > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch 2022-01-21 8:55 ` Eric Dumazet @ 2022-01-21 15:15 ` Jakub Kicinski 2022-01-21 16:37 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Jakub Kicinski @ 2022-01-21 15:15 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, David Ahern, Paolo Abeni, Herbert Xu, netdev, ycheng, ncardwell On Fri, 21 Jan 2022 00:55:08 -0800 Eric Dumazet wrote: > On Thu, Jan 20, 2022 at 5:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > IPv6 GRO considers packets to belong to different flows when their > > hop_limit is different. This seems counter-intuitive, the flow is > > the same. hop_limit may vary because of various bugs or hacks but > > that doesn't mean it's okay for GRO to reorder packets. > > > > Practical impact of this problem on overall TCP performance > > is unclear, but TCP itself detects this reordering and bumps > > TCPSACKReorder resulting in user complaints. > > > > Note that the code plays an easy to miss trick by upcasting next_hdr > > to a u16 pointer and compares next_hdr and hop_limit in one go. > > Coalesce the flush setting to reduce the instruction count a touch. > > There are downsides to this change. > > We had an internal discussion at Google years ago about this > difference in behavior of IPv6/IPv4 > > We came to the conclusion the IPv6 behavior was better for our needs > (and we do not care > much about IPv4 GRO, since Google DC traffic is 99.99% IPv6) > > In our case, we wanted to keep this 'ipv6 feature' because we were > experimenting with the idea of sending > TSO packets with different flowlabels, to use different paths in the > network, to increase nominal > throughput for WAN flows (one flow would use multiple fiber links) > > The issue with 'ipv4 gro style about ttl mismatch' was that because of > small differences in RTT for each path, > a receiver could very well receive mixed packets. > > Even without playing with ECMP hashes, this scenario can happen if the sender > uses a bonding device in balance-rr mode. > > After your change, GRO would be defeated and deliver one MSS at a time > to TCP stack. Indeed. Sounds like we're trading correctness for an optimization of a questionable practical application, but our motivation isn't 100% pure either [1] so whatever way we can fix this is fine by me :) [1] We have some shenanigans that bump TTL to indicate re-transmitted packets so we can identify them in the network. > We implemented SACK compress in TCP stack to avoid extra SACK being > sent by the receiver > > We have an extension of this SACK compression for TCP flows terminated > by Google servers, > since modern TCP stacks do not need the old rule of TCP_FASTRETRANS_THRESH > DUPACK to start retransmits. > > Something like this pseudo code: > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index dc49a3d551eb919baf5ad812ef21698c5c7b9679..d72554ab70fd2e16ed60dc78a905f4aa1414f8c9 > 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -5494,7 +5494,8 @@ static void __tcp_ack_snd_check(struct sock *sk, > int ofo_possible) > } > if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) { > tp->dup_ack_counter++; > - goto send_now; > + if (peer_is_using_old_rule_about_fastretrans(tp)) > + goto send_now; > } > tp->compressed_ack++; > if (hrtimer_is_queued(&tp->compressed_ack_timer)) > Is this something we could upstream / test? peer_is_using.. does not exist upstream. Coincidentally, speaking of sending SACKs, my initial testing was on 5.12 kernels and there I saw what appeared to a lay person (me) like missing ACKs. Receiver would receive segments: _AB_C_D_E where _ indicates loss. It'd SACK A, then generate the next SACK after E (SACKing C D E), sender would rexmit A which makes receiver ACK all the way to the end of B. Now sender thinks B arrived after CDE because it was never sacked. Perhaps it was fixed by commit a29cb6914681 ("net: tcp better handling of reordering then loss cases").. or it's a result of some out-of-tree hack. I thought I'd mention it tho in case it immediately rings a bell for anyone. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch 2022-01-21 15:15 ` Jakub Kicinski @ 2022-01-21 16:37 ` Eric Dumazet 2022-01-25 0:02 ` Jakub Kicinski 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2022-01-21 16:37 UTC (permalink / raw) To: Jakub Kicinski Cc: David Miller, David Ahern, Paolo Abeni, Herbert Xu, netdev, Yuchung Cheng, Neal Cardwell On Fri, Jan 21, 2022 at 7:15 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 21 Jan 2022 00:55:08 -0800 Eric Dumazet wrote: > > On Thu, Jan 20, 2022 at 5:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > IPv6 GRO considers packets to belong to different flows when their > > > hop_limit is different. This seems counter-intuitive, the flow is > > > the same. hop_limit may vary because of various bugs or hacks but > > > that doesn't mean it's okay for GRO to reorder packets. > > > > > > Practical impact of this problem on overall TCP performance > > > is unclear, but TCP itself detects this reordering and bumps > > > TCPSACKReorder resulting in user complaints. > > > > > > Note that the code plays an easy to miss trick by upcasting next_hdr > > > to a u16 pointer and compares next_hdr and hop_limit in one go. > > > Coalesce the flush setting to reduce the instruction count a touch. > > > > There are downsides to this change. > > > > We had an internal discussion at Google years ago about this > > difference in behavior of IPv6/IPv4 > > > > We came to the conclusion the IPv6 behavior was better for our needs > > (and we do not care > > much about IPv4 GRO, since Google DC traffic is 99.99% IPv6) > > > > In our case, we wanted to keep this 'ipv6 feature' because we were > > experimenting with the idea of sending > > TSO packets with different flowlabels, to use different paths in the > > network, to increase nominal > > throughput for WAN flows (one flow would use multiple fiber links) > > > > The issue with 'ipv4 gro style about ttl mismatch' was that because of > > small differences in RTT for each path, > > a receiver could very well receive mixed packets. > > > > Even without playing with ECMP hashes, this scenario can happen if the sender > > uses a bonding device in balance-rr mode. > > > > After your change, GRO would be defeated and deliver one MSS at a time > > to TCP stack. > > Indeed. Sounds like we're trading correctness for an optimization of a > questionable practical application, but our motivation isn't 100% pure > either [1] so whatever way we can fix this is fine by me :) > > [1] We have some shenanigans that bump TTL to indicate re-transmitted > packets so we can identify them in the network. > > > We implemented SACK compress in TCP stack to avoid extra SACK being > > sent by the receiver > > > > We have an extension of this SACK compression for TCP flows terminated > > by Google servers, > > since modern TCP stacks do not need the old rule of TCP_FASTRETRANS_THRESH > > DUPACK to start retransmits. > > > > Something like this pseudo code: > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index dc49a3d551eb919baf5ad812ef21698c5c7b9679..d72554ab70fd2e16ed60dc78a905f4aa1414f8c9 > > 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -5494,7 +5494,8 @@ static void __tcp_ack_snd_check(struct sock *sk, > > int ofo_possible) > > } > > if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) { > > tp->dup_ack_counter++; > > - goto send_now; > > + if (peer_is_using_old_rule_about_fastretrans(tp)) > > + goto send_now; > > } > > tp->compressed_ack++; > > if (hrtimer_is_queued(&tp->compressed_ack_timer)) > > > > Is this something we could upstream / test? peer_is_using.. does not > exist upstream. Sure, because we do not have a standardized way (at SYN SYNACK time) to advertise that the stack is not 10 years old. This could be a per net-ns sysctl, or a per socket flag, or a per cgroup flag. In our case, we do negotiate special TCP options, and allow these options only from internal communications. (So we store this private bit in the socket itself) > > > Coincidentally, speaking of sending SACKs, my initial testing was on > 5.12 kernels and there I saw what appeared to a lay person (me) like > missing ACKs. Receiver would receive segments: > > _AB_C_D_E > > where _ indicates loss. It'd SACK A, then generate the next SACK after E > (SACKing C D E), sender would rexmit A which makes receiver ACK all > the way to the end of B. Now sender thinks B arrived after CDE because > it was never sacked. > > Perhaps it was fixed by commit a29cb6914681 ("net: tcp better handling > of reordering then loss cases").. or it's a result of some out-of-tree > hack. I thought I'd mention it tho in case it immediately rings a bell > for anyone. Could all the missing SACK have been lost ? Writing a packetdrill test for this scenario should not be too hard. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch 2022-01-21 16:37 ` Eric Dumazet @ 2022-01-25 0:02 ` Jakub Kicinski 0 siblings, 0 replies; 6+ messages in thread From: Jakub Kicinski @ 2022-01-25 0:02 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, David Ahern, Paolo Abeni, Herbert Xu, netdev, Yuchung Cheng, Neal Cardwell Sorry for the delay I had to do some homework and more tests. On Fri, 21 Jan 2022 08:37:12 -0800 Eric Dumazet wrote: > On Fri, Jan 21, 2022 at 7:15 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > We implemented SACK compress in TCP stack to avoid extra SACK being > > > sent by the receiver > > > > > > We have an extension of this SACK compression for TCP flows terminated > > > by Google servers, > > > since modern TCP stacks do not need the old rule of TCP_FASTRETRANS_THRESH > > > DUPACK to start retransmits. > > > > > > Something like this pseudo code: > > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > > index dc49a3d551eb919baf5ad812ef21698c5c7b9679..d72554ab70fd2e16ed60dc78a905f4aa1414f8c9 > > > 100644 > > > --- a/net/ipv4/tcp_input.c > > > +++ b/net/ipv4/tcp_input.c > > > @@ -5494,7 +5494,8 @@ static void __tcp_ack_snd_check(struct sock *sk, > > > int ofo_possible) > > > } > > > if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) { > > > tp->dup_ack_counter++; > > > - goto send_now; > > > + if (peer_is_using_old_rule_about_fastretrans(tp)) > > > + goto send_now; > > > } > > > tp->compressed_ack++; > > > if (hrtimer_is_queued(&tp->compressed_ack_timer)) > > > > > > > Is this something we could upstream / test? peer_is_using.. does not > > exist upstream. > > Sure, because we do not have a standardized way (at SYN SYNACK time) > to advertise > that the stack is not 10 years old. > > This could be a per net-ns sysctl, or a per socket flag, or a per cgroup flag. > > In our case, we do negotiate special TCP options, and allow these options > only from internal communications. > > (So we store this private bit in the socket itself) This does not fix the problem, unfortunately. I still see TCP detecting reordering based on SACK if re-transmits have higher TTL. > > Coincidentally, speaking of sending SACKs, my initial testing was on > > 5.12 kernels and there I saw what appeared to a lay person (me) like > > missing ACKs. Receiver would receive segments: > > > > _AB_C_D_E > > > > where _ indicates loss. It'd SACK A, then generate the next SACK after E > > (SACKing C D E), sender would rexmit A which makes receiver ACK all > > the way to the end of B. Now sender thinks B arrived after CDE because > > it was never sacked. > > > > Perhaps it was fixed by commit a29cb6914681 ("net: tcp better handling > > of reordering then loss cases").. or it's a result of some out-of-tree > > hack. I thought I'd mention it tho in case it immediately rings a bell > > for anyone. > > Could all the missing SACK have been lost ? I had tcpdump on both ends, but I can't repro any more with the GRO fix applied. Maybe it was also related to that. Somehow. > Writing a packetdrill test for this scenario should not be too hard. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch 2022-01-21 1:19 [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch Jakub Kicinski 2022-01-21 8:55 ` Eric Dumazet @ 2022-01-24 19:23 ` kernel test robot 1 sibling, 0 replies; 6+ messages in thread From: kernel test robot @ 2022-01-24 19:23 UTC (permalink / raw) To: Jakub Kicinski, davem, edumazet Cc: kbuild-all, dsahern, pabeni, herbert, netdev, Jakub Kicinski Hi Jakub, I love your patch! Perhaps something to improve: [auto build test WARNING on net/master] url: https://github.com/0day-ci/linux/commits/Jakub-Kicinski/ipv6-gro-flush-instead-of-assuming-different-flows-on-hop_limit-mismatch/20220121-092033 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 57afdc0aab094b4c811b3fe030b2567812a495f3 config: x86_64-randconfig-s022 (https://download.01.org/0day-ci/archive/20220125/202201250210.roaIok2H-lkp@intel.com/config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.4-dirty # https://github.com/0day-ci/linux/commit/6f8f3e541288381a67df8b670068d5add231d082 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Jakub-Kicinski/ipv6-gro-flush-instead-of-assuming-different-flows-on-hop_limit-mismatch/20220121-092033 git checkout 6f8f3e541288381a67df8b670068d5add231d082 # save the config file to linux build tree mkdir build_dir make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash net/ipv6/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) >> net/ipv6/ip6_offload.c:264:57: sparse: sparse: restricted __be32 degrades to integer >> net/ipv6/ip6_offload.c:263:48: sparse: sparse: dubious: x | !y vim +264 net/ipv6/ip6_offload.c 182 183 INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, 184 struct sk_buff *skb) 185 { 186 const struct net_offload *ops; 187 struct sk_buff *pp = NULL; 188 struct sk_buff *p; 189 struct ipv6hdr *iph; 190 unsigned int nlen; 191 unsigned int hlen; 192 unsigned int off; 193 u16 flush = 1; 194 int proto; 195 196 off = skb_gro_offset(skb); 197 hlen = off + sizeof(*iph); 198 iph = skb_gro_header_fast(skb, off); 199 if (skb_gro_header_hard(skb, hlen)) { 200 iph = skb_gro_header_slow(skb, hlen, off); 201 if (unlikely(!iph)) 202 goto out; 203 } 204 205 skb_set_network_header(skb, off); 206 skb_gro_pull(skb, sizeof(*iph)); 207 skb_set_transport_header(skb, skb_gro_offset(skb)); 208 209 flush += ntohs(iph->payload_len) != skb_gro_len(skb); 210 211 proto = iph->nexthdr; 212 ops = rcu_dereference(inet6_offloads[proto]); 213 if (!ops || !ops->callbacks.gro_receive) { 214 __pskb_pull(skb, skb_gro_offset(skb)); 215 skb_gro_frag0_invalidate(skb); 216 proto = ipv6_gso_pull_exthdrs(skb, proto); 217 skb_gro_pull(skb, -skb_transport_offset(skb)); 218 skb_reset_transport_header(skb); 219 __skb_push(skb, skb_gro_offset(skb)); 220 221 ops = rcu_dereference(inet6_offloads[proto]); 222 if (!ops || !ops->callbacks.gro_receive) 223 goto out; 224 225 iph = ipv6_hdr(skb); 226 } 227 228 NAPI_GRO_CB(skb)->proto = proto; 229 230 flush--; 231 nlen = skb_network_header_len(skb); 232 233 list_for_each_entry(p, head, list) { 234 const struct ipv6hdr *iph2; 235 __be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */ 236 237 if (!NAPI_GRO_CB(p)->same_flow) 238 continue; 239 240 iph2 = (struct ipv6hdr *)(p->data + off); 241 first_word = *(__be32 *)iph ^ *(__be32 *)iph2; 242 243 /* All fields must match except length and Traffic Class. 244 * XXX skbs on the gro_list have all been parsed and pulled 245 * already so we don't need to compare nlen 246 * (nlen != (sizeof(*iph2) + ipv6_exthdrs_len(iph2, &ops))) 247 * memcmp() alone below is sufficient, right? 248 */ 249 if ((first_word & htonl(0xF00FFFFF)) || 250 !ipv6_addr_equal(&iph->saddr, &iph2->saddr) || 251 !ipv6_addr_equal(&iph->daddr, &iph2->daddr) || 252 iph->nexthdr != iph2->nexthdr) { 253 not_same_flow: 254 NAPI_GRO_CB(p)->same_flow = 0; 255 continue; 256 } 257 if (unlikely(nlen > sizeof(struct ipv6hdr))) { 258 if (memcmp(iph + 1, iph2 + 1, 259 nlen - sizeof(struct ipv6hdr))) 260 goto not_same_flow; 261 } 262 /* flush if Traffic Class fields are different */ > 263 NAPI_GRO_CB(p)->flush |= flush | > 264 !!((first_word & htonl(0x0FF00000)) | 265 (iph->hop_limit ^ iph2->hop_limit)); 266 267 /* If the previous IP ID value was based on an atomic 268 * datagram we can overwrite the value and ignore it. 269 */ 270 if (NAPI_GRO_CB(skb)->is_atomic) 271 NAPI_GRO_CB(p)->flush_id = 0; 272 } 273 274 NAPI_GRO_CB(skb)->is_atomic = true; 275 NAPI_GRO_CB(skb)->flush |= flush; 276 277 skb_gro_postpull_rcsum(skb, iph, nlen); 278 279 pp = indirect_call_gro_receive_l4(tcp6_gro_receive, udp6_gro_receive, 280 ops->callbacks.gro_receive, head, skb); 281 282 out: 283 skb_gro_flush_final(skb, pp, flush); 284 285 return pp; 286 } 287 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-01-25 3:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-01-21 1:19 [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch Jakub Kicinski 2022-01-21 8:55 ` Eric Dumazet 2022-01-21 15:15 ` Jakub Kicinski 2022-01-21 16:37 ` Eric Dumazet 2022-01-25 0:02 ` Jakub Kicinski 2022-01-24 19:23 ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).