netdev.vger.kernel.org archive mirror
* Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
@ 2014-07-10 13:44 Stephen Hemminger
  2014-07-10 17:11 ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2014-07-10 13:44 UTC
  To: netdev



Begin forwarded message:

Date: Thu, 10 Jul 2014 06:20:14 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"


https://bugzilla.kernel.org/show_bug.cgi?id=79891

            Bug ID: 79891
           Summary: Router causes TCP retransmits for windows hosts after
                    "ip_forward: fix inverted local_df test"
           Product: Networking
           Version: 2.5
    Kernel Version: 3.2.60
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: shemminger@linux-foundation.org
          Reporter: tm@del.bg
        Regression: No

Created attachment 142651
  --> https://bugzilla.kernel.org/attachment.cgi?id=142651&action=edit
TCP retransmissions from a windows host

After upgrading a router to Linux 3.2.60 most windows machines behind it
started experiencing connection stalls. Downgrade to 3.2.59 resolved the
problem. Using git bisect I pinpointed it to
"59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
entered the kernel from here
http://marc.info/?l=linux-netdev&m=139949081418806&w=2

Linux hosts see no connection problems at all; only Windows hosts are
affected. Please find attached a tcpdump packet capture demonstrating the
bad behaviour.

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-10 13:44 Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test" Stephen Hemminger
@ 2014-07-10 17:11 ` Cong Wang
  2014-07-11 18:11   ` Ben Hutchings
  0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2014-07-10 17:11 UTC
  To: Stephen Hemminger; +Cc: netdev

On Thu, Jul 10, 2014 at 6:44 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> After upgrading a router to Linux 3.2.60 most windows machines behind it
> started experiencing connection stalls. Downgrade to 3.2.59 resolved the
> problem. Using git bisect I pinpointed it to
> "59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
> entered the kernel from here
> http://marc.info/?l=linux-netdev&m=139949081418806&w=2
>

This commit should have been reverted for older kernels like 3.2.y.


* Re: Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-10 17:11 ` Cong Wang
@ 2014-07-11 18:11   ` Ben Hutchings
  2014-07-11 18:38     ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Hutchings @ 2014-07-11 18:11 UTC
  To: Cong Wang; +Cc: Stephen Hemminger, netdev, David Miller, stable


On Thu, 2014-07-10 at 10:11 -0700, Cong Wang wrote:
> On Thu, Jul 10, 2014 at 6:44 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > After upgrading a router to Linux 3.2.60 most windows machines behind it
> > started experiencing connection stalls. Downgrade to 3.2.59 resolved the
> > problem. Using git bisect I pinpointed it to
> > "59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
> > entered the kernel from here
> > http://marc.info/?l=linux-netdev&m=139949081418806&w=2
> >
> 
> This commit should have been reverted for older kernels like 3.2.y.

Really?  We already had fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in
forwarding path") backported in 3.2.57.

Ben.

-- 
Ben Hutchings
To err is human; to really foul things up requires a computer.


* Re: Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-11 18:11   ` Ben Hutchings
@ 2014-07-11 18:38     ` Cong Wang
  2014-07-11 19:01       ` Ben Hutchings
  0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2014-07-11 18:38 UTC
  To: Ben Hutchings; +Cc: Stephen Hemminger, netdev, David Miller, stable

On Fri, Jul 11, 2014 at 11:11 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Thu, 2014-07-10 at 10:11 -0700, Cong Wang wrote:
>> On Thu, Jul 10, 2014 at 6:44 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>> >
>> > After upgrading a router to Linux 3.2.60 most windows machines behind it
>> > started experiencing connection stalls. Downgrade to 3.2.59 resolved the
>> > problem. Using git bisect I pinpointed it to
>> > "59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
>> > entered the kernel from here
>> > http://marc.info/?l=linux-netdev&m=139949081418806&w=2
>> >
>>
>> This commit should have been reverted for older kernels like 3.2.y.
>
> Really?  We already had fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in
> forwarding path") backported in 3.2.57.

I haven't read the code, but according to a previous discussion it sounds
like that should be reverted:

http://lists.openwall.net/netdev/2014/06/11/67


* Re: Fw: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-11 18:38     ` Cong Wang
@ 2014-07-11 19:01       ` Ben Hutchings
  2014-07-11 19:14         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Hutchings @ 2014-07-11 19:01 UTC
  To: Cong Wang; +Cc: Stephen Hemminger, netdev, David Miller, stable


On Fri, 2014-07-11 at 11:38 -0700, Cong Wang wrote:
> On Fri, Jul 11, 2014 at 11:11 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Thu, 2014-07-10 at 10:11 -0700, Cong Wang wrote:
> >> On Thu, Jul 10, 2014 at 6:44 AM, Stephen Hemminger
> >> <stephen@networkplumber.org> wrote:
> >> >
> >> > After upgrading a router to Linux 3.2.60 most windows machines behind it
> >> > started experiencing connection stalls. Downgrade to 3.2.59 resolved the
> >> > problem. Using git bisect I pinpointed it to
> >> > "59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
> >> > entered the kernel from here
> >> > http://marc.info/?l=linux-netdev&m=139949081418806&w=2
> >> >
> >>
> >> This commit should have been reverted for older kernels like 3.2.y.
> >
> > Really?  We already had fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in
> > forwarding path") backported in 3.2.57.
> 
> I haven't read the code, but according to a previous discussion it sounds
> like that should be reverted:
> 
> http://lists.openwall.net/netdev/2014/06/11/67

My reading of that is we need 895162b1101b ("netfilter: ipv4: defrag:
set local_df flag on defragmented skb") in 3.2.y and 3.4.y.  But there
seem to be many other places that local_df should be set, that have only
recently been fixed.  So maybe reverting is the safer option.

Ben.

-- 
Ben Hutchings
To err is human; to really foul things up requires a computer.


* Re: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-11 19:01       ` Ben Hutchings
@ 2014-07-11 19:14         ` David Miller
  2014-07-11 22:07           ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2014-07-11 19:14 UTC
  To: ben; +Cc: cwang, stephen, netdev, stable

From: Ben Hutchings <ben@decadent.org.uk>
Date: Fri, 11 Jul 2014 20:01:12 +0100

> On Fri, 2014-07-11 at 11:38 -0700, Cong Wang wrote:
>> On Fri, Jul 11, 2014 at 11:11 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
>> > On Thu, 2014-07-10 at 10:11 -0700, Cong Wang wrote:
>> >> On Thu, Jul 10, 2014 at 6:44 AM, Stephen Hemminger
>> >> <stephen@networkplumber.org> wrote:
>> >> >
>> >> > After upgrading a router to Linux 3.2.60 most windows machines behind it
>> >> > started experiencing connection stalls. Downgrade to 3.2.59 resolved the
>> >> > problem. Using git bisect I pinpointed it to
>> >> > "59d9f389df3cdf72833d5ee17c3fe959b6bdc792 is the first bad commit", which
>> >> > entered the kernel from here
>> >> > http://marc.info/?l=linux-netdev&m=139949081418806&w=2
>> >> >
>> >>
>> >> This commit should have been reverted for older kernels like 3.2.y.
>> >
>> > Really?  We already had fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in
>> > forwarding path") backported in 3.2.57.
>> 
>> I haven't read the code, but according to a previous discussion it sounds
>> like that should be reverted:
>> 
>> http://lists.openwall.net/netdev/2014/06/11/67
> 
> My reading of that is we need 895162b1101b ("netfilter: ipv4: defrag:
> set local_df flag on defragmented skb") in 3.2.y and 3.4.y.  But there
> seem to be many other places that local_df should be set, that have only
> recently been fixed.  So maybe reverting is the safer option.

Reverting is indeed probably safer.


* Re: [Bug 79891] New: Router causes TCP retransmits for windows hosts after "ip_forward: fix inverted local_df test"
  2014-07-11 19:14         ` David Miller
@ 2014-07-11 22:07           ` Florian Westphal
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2014-07-11 22:07 UTC
  To: David Miller; +Cc: ben, cwang, stephen, netdev, stable

David Miller <davem@davemloft.net> wrote:
> From: Ben Hutchings <ben@decadent.org.uk>
> Date: Fri, 11 Jul 2014 20:01:12 +0100
> 
> > On Fri, 2014-07-11 at 11:38 -0700, Cong Wang wrote:
[..]

> >> >> > http://marc.info/?l=linux-netdev&m=139949081418806&w=2
> >> >> >
> >> >>
> >> >> This commit should have been reverted for older kernels like 3.2.y.
> >> >
> >> > Really?  We already had fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in
> >> > forwarding path") backported in 3.2.57.
> >> 
> >> I haven't read the code, but according to a previous discussion it sounds
> >> like that should be reverted:
> >> 
> >> http://lists.openwall.net/netdev/2014/06/11/67
> > 
> > My reading of that is we need 895162b1101b ("netfilter: ipv4: defrag:
> > set local_df flag on defragmented skb") in 3.2.y and 3.4.y.  But there
> > seem to be many other places that local_df should be set, that have only
> > recently been fixed.  So maybe reverting is the safer option.
> 
> Reverting is indeed probably safer.

Right, I agree.  Reverting is safer.

IMO there are two possible options for 3.2 / 3.4:

1. Revert fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding
   path")
2. Backport 21d1196a3 ("ipv4: set transport header earlier") to 3.2/3.4 -stable

[ The problem is that the transport header is not yet set in the forward
  path in 3.2/3.4, so skb_gso_network_seglen() returns a bogus length ]

There is a 3rd alternative (I mention this for completeness only).
You could sort-of 'soft-revert' to the old behaviour to not care
about GRO packets in the forward path.  The minimum change is:

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -50,7 +50,7 @@ static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
        if (skb->len <= mtu)
                return false;
 
-       if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+       if (skb_is_gso(skb))
                return false;
 
        return true;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cb9df0e..f05d6ef 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -354,7 +354,7 @@ static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
        if (skb->ignore_df)
                return false;
 
-       if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+       if (skb_is_gso(skb))
                return false;
 
        return true;

Dave/Greg, if this is what you prefer just let me know and I can submit
such a patch for the 3.2 and 3.4 stable series.
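
[ Editorial illustration, not part of the original message. The failure
  mode described above can be sketched in user space. This is a simplified
  model only: struct toy_skb and every toy_* function are hypothetical
  stand-ins, not kernel APIs, and the stale offset value used below is an
  assumption chosen for the example. In the real kernel the TCP header
  length would also be read from the wrong location; the sketch keeps it
  fixed and models only the bad header offset. ]

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct sk_buff (not kernel code). */
struct toy_skb {
    unsigned int len;           /* total length of the aggregated packet */
    unsigned int gso_size;      /* payload bytes per segment; 0 = not GSO */
    unsigned int network_off;   /* offset of the IP header in the buffer */
    unsigned int transport_off; /* offset of the TCP header; stale if unset */
    unsigned int tcp_hdr_len;   /* TCP header length in bytes */
};

static bool toy_is_gso(const struct toy_skb *skb)
{
    return skb->gso_size != 0;
}

/*
 * Models the idea behind skb_gso_network_seglen(): the size of one
 * segment at the network layer is IP header + TCP header + payload.
 * The IP header length is derived from the two header offsets, so a
 * stale transport offset directly corrupts the result.
 */
static unsigned int toy_gso_network_seglen(const struct toy_skb *skb)
{
    unsigned int ip_hdr_len = skb->transport_off - skb->network_off;

    return ip_hdr_len + skb->tcp_hdr_len + skb->gso_size;
}

/* Check in the style of the code before the proposed change. */
static bool toy_exceeds_mtu_strict(const struct toy_skb *skb, unsigned int mtu)
{
    if (skb->len <= mtu)
        return false;

    if (toy_is_gso(skb) && toy_gso_network_seglen(skb) <= mtu)
        return false;

    return true;
}

/* Check in the style of the 'soft-revert': any GSO packet passes. */
static bool toy_exceeds_mtu_relaxed(const struct toy_skb *skb, unsigned int mtu)
{
    if (skb->len <= mtu)
        return false;

    if (toy_is_gso(skb))
        return false;

    return true;
}
```

[ With correct offsets (IP header at 14, TCP header at 34) and a 1460-byte
  gso_size, the modelled segment length is 1500 and the strict check lets
  the packet through at MTU 1500. If transport_off holds a stale value
  such as 400, the modelled length becomes 1866 and the strict check
  rejects the packet, while the relaxed check still forwards it. ]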

