netdev.vger.kernel.org archive mirror
* Fw: [Bug 195713] New: TCP recv queue grows huge
@ 2017-05-11 16:47 Stephen Hemminger
  2017-05-11 17:06 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2017-05-11 16:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev



Begin forwarded message:

Date: Thu, 11 May 2017 13:25:23 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195713] New: TCP recv queue grows huge


https://bugzilla.kernel.org/show_bug.cgi?id=195713

            Bug ID: 195713
           Summary: TCP recv queue grows huge
           Product: Networking
           Version: 2.5
    Kernel Version: 3.13.0 4.4.0 4.9.0
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: mkm@nabto.com
        Regression: No

I was testing how TCP handles advertising reductions of the window size,
especially Window Full events. To create this setup I made a slow TCP receiver
and a fast TCP sender. To add some realism to the scenario I simulated a 10ms
delay on the loopback device using the netem tc module.

Steps to reproduce:
Beware: these steps will use all the memory on your system.

1. create latency on loopback
>sudo tc qdisc change dev lo root netem delay 0ms  

2. slow tcp receiver:
>nc -l 4242 | pv -L 1k  

3. fast tcp sender:
>nc 127.0.0.1 4242 < /dev/zero  

What to expect:
It is expected that the TCP recv queue does not grow unbounded, e.g. the
following output from netstat:

>netstat -an | grep 4242
>tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113         ESTABLISHED
>tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242          ESTABLISHED

What is seen:

The TCP receive queue grows until there is no more memory available on the
system.

>netstat -an | grep 4242
>tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114       ESTABLISHED
>tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242        ESTABLISHED

Note: after the TCP recv queue reaches ~2^31 bytes, netstat reports 0, which
is not correct; netstat was probably not written with this bug in mind.
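
The truncation the note describes is what happens when a byte count crosses the
32-bit signed boundary; netstat's exact code path aside, any tool that squeezes
the queue size through a signed 32-bit field has to wrap, clamp, or misprint it.
A minimal sketch of the arithmetic (illustrative only, not netstat's actual
implementation):

```python
import struct

def as_i32(n: int) -> int:
    """Reinterpret an arbitrarily large byte count as a signed 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]

# Counts below 2^31 survive the round-trip intact...
assert as_i32(223_786_525) == 223_786_525
# ...but at 2^31 and beyond they wrap negative, so a tool holding the queue
# size in a signed 32-bit field must wrap, clamp, or misprint the value.
assert as_i32(2**31) == -(2**31)
assert as_i32(2**31 + 1737) == -(2**31) + 1737
```

In other words, the bogus 0 is a display-tool symptom; the kernel-side queue
keeps growing regardless.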

Systems on which the bug is reproducible:

  * debian testing, kernel 4.9.0
  * ubuntu 14.04, kernel 3.13.0
  * ubuntu 16.04, kernel 4.4.0

I have not tested on systems other than those mentioned above.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Fw: [Bug 195713] New: TCP recv queue grows huge
  2017-05-11 16:47 Fw: [Bug 195713] New: TCP recv queue grows huge Stephen Hemminger
@ 2017-05-11 17:06 ` Eric Dumazet
  2017-05-11 19:29   ` Michael Madsen
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2017-05-11 17:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, netdev, mkm

On Thu, 2017-05-11 at 09:47 -0700, Stephen Hemminger wrote:
> 
> Begin forwarded message:
> 
> Date: Thu, 11 May 2017 13:25:23 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 195713] New: TCP recv queue grows huge
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=195713
> 
>             Bug ID: 195713
>            Summary: TCP recv queue grows huge
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 3.13.0 4.4.0 4.9.0
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>           Assignee: stephen@networkplumber.org
>           Reporter: mkm@nabto.com
>         Regression: No
> 
> I was testing how TCP handles advertising reductions of the window size,
> especially Window Full events. To create this setup I made a slow TCP receiver
> and a fast TCP sender. To add some realism to the scenario I simulated a 10ms
> delay on the loopback device using the netem tc module.
> 
> Steps to reproduce:
> Beware: these steps will use all the memory on your system.
> 
> 1. create latency on loopback
> >sudo tc qdisc change dev lo root netem delay 0ms  
> 
> 2. slow tcp receiver:
> >nc -l 4242 | pv -L 1k  
> 
> 3. fast tcp sender:
> >nc 127.0.0.1 4242 < /dev/zero  
> 
> What to expect:
> It is expected that the TCP recv queue does not grow unbounded, e.g. the
> following output from netstat:
> 
> >netstat -an | grep 4242
> >tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113        
> >ESTABLISHED
> >tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242         
> >ESTABLISHED  
> 
> What is seen:
> 
> The TCP receive queue grows until there is no more memory available on the
> system.
> 
> >netstat -an | grep 4242
> >tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114      
> >ESTABLISHED
> >tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242       
> >ESTABLISHED  
> 
> Note: after the TCP recv queue reaches ~2^31 bytes, netstat reports 0, which
> is not correct; netstat was probably not written with this bug in mind.
> 
> Systems on which the bug is reproducible:
> 
>   * debian testing, kernel 4.9.0
>   * ubuntu 14.04, kernel 3.13.0
>   * ubuntu 16.04, kernel 4.4.0
> 
> I have not tested on systems other than those mentioned above.
> 


Not reproducible on my test machine.

Somehow some sysctl must have been set to an insane value by
mkm@nabto.com ?

Please use/report ss -temoi instead of the old netstat, which does not
provide this info.

lpaa23:~# tc -s -d qd sh dev lo
qdisc netem 8002: root refcnt 2 limit 1000
 Sent 1153017 bytes 388 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

lpaa23:~# ss -temoi dst :4242 or src :4242
State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
ESTAB      0      3255206 127.0.0.1:35672                127.0.0.1:4242
timer:(persist,15sec,0) ino:3740676 sk:1 <->
	 skmem:(r0,rb1060272,t0,tb4194304,f2650,w3319206,o0,bl0,d0) ts sack
cubic wscale:8,8 rto:230 backoff:7 rtt:20.879/26.142 mss:65483
rcvmss:536 advmss:65483 cwnd:19 ssthresh:19 bytes_acked:3258385
segs_out:86 segs_in:50 data_segs_out:68 send 476.7Mbps lastsnd:43940
lastrcv:163390 lastack:13500 pacing_rate 572.0Mbps delivery_rate
11146.0Mbps busy:163390ms rwnd_limited:163380ms(100.0%) retrans:0/1
rcv_space:43690 notsent:3255206 minrtt:0.002
ESTAB      3022864 0      127.0.0.1:4242                 127.0.0.1:35672
ino:3703653 sk:2 <->
	 skmem:(r3259664,rb3406910,t0,tb2626560,f752,w0,o0,bl0,d17) ts sack
cubic wscale:8,8 rto:210 rtt:0.019/0.009 ato:120 mss:21888 rcvmss:65483
advmss:65483 cwnd:10 bytes_received:3258384 segs_out:49 segs_in:86
data_segs_in:68 send 92160.0Mbps lastsnd:163390 lastrcv:43940
lastack:43940 rcv_rtt:0.239 rcv_space:61440 minrtt:0.019


lpaa23:~# uname -a
Linux lpaa23 4.11.0-smp-DEV #197 SMP @1494476384 x86_64 GNU/Linux


* Re: Fw: [Bug 195713] New: TCP recv queue grows huge
  2017-05-11 17:06 ` Eric Dumazet
@ 2017-05-11 19:29   ` Michael Madsen
  2017-05-11 19:42     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Madsen @ 2017-05-11 19:29 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger; +Cc: Eric Dumazet, netdev



On 05/11/2017 07:06 PM, Eric Dumazet wrote:
> On Thu, 2017-05-11 at 09:47 -0700, Stephen Hemminger wrote:
>> Begin forwarded message:
>>
>> Date: Thu, 11 May 2017 13:25:23 +0000
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: stephen@networkplumber.org
>> Subject: [Bug 195713] New: TCP recv queue grows huge
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=195713
>>
>>              Bug ID: 195713
>>             Summary: TCP recv queue grows huge
>>             Product: Networking
>>             Version: 2.5
>>      Kernel Version: 3.13.0 4.4.0 4.9.0
>>            Hardware: All
>>                  OS: Linux
>>                Tree: Mainline
>>              Status: NEW
>>            Severity: normal
>>            Priority: P1
>>           Component: IPV4
>>            Assignee: stephen@networkplumber.org
>>            Reporter: mkm@nabto.com
>>          Regression: No
>>
>> I was testing how TCP handles advertising reductions of the window size,
>> especially Window Full events. To create this setup I made a slow TCP receiver
>> and a fast TCP sender. To add some realism to the scenario I simulated a 10ms
>> delay on the loopback device using the netem tc module.
>>
>> Steps to reproduce:
>> Beware: these steps will use all the memory on your system.
>>
>> 1. create latency on loopback
>>> sudo tc qdisc change dev lo root netem delay 0ms
>> 2. slow tcp receiver:
>>> nc -l 4242 | pv -L 1k
>> 3. fast tcp sender:
>>> nc 127.0.0.1 4242 < /dev/zero
>> What to expect:
>> It is expected that the TCP recv queue does not grow unbounded, e.g. the
>> following output from netstat:
>>
>>> netstat -an | grep 4242
>>> tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113
>>> ESTABLISHED
>>> tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242
>>> ESTABLISHED
>> What is seen:
>>
>> The TCP receive queue grows until there is no more memory available on the
>> system.
>>
>>> netstat -an | grep 4242
>>> tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114
>>> ESTABLISHED
>>> tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242
>>> ESTABLISHED
>> Note: after the TCP recv queue reaches ~2^31 bytes, netstat reports 0, which
>> is not correct; netstat was probably not written with this bug in mind.
>>
>> Systems on which the bug is reproducible:
>>
>>    * debian testing, kernel 4.9.0
>>    * ubuntu 14.04, kernel 3.13.0
>>    * ubuntu 16.04, kernel 4.4.0
>>
>> I have not tested on systems other than those mentioned above.
>>
>
> Not reproducible on my test machine.
>
> Somehow some sysctl must have been set to an insane value by
> mkm@nabto.com ?
>
> Please use/report ss -temoi instead of the old netstat, which does not
> provide this info.
>
> lpaa23:~# tc -s -d qd sh dev lo
> qdisc netem 8002: root refcnt 2 limit 1000
>   Sent 1153017 bytes 388 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
>
> lpaa23:~# ss -temoi dst :4242 or src :4242
> State      Recv-Q Send-Q Local Address:Port                 Peer
> Address:Port
> ESTAB      0      3255206 127.0.0.1:35672                127.0.0.1:4242
> timer:(persist,15sec,0) ino:3740676 sk:1 <->
> 	 skmem:(r0,rb1060272,t0,tb4194304,f2650,w3319206,o0,bl0,d0) ts sack
> cubic wscale:8,8 rto:230 backoff:7 rtt:20.879/26.142 mss:65483
> rcvmss:536 advmss:65483 cwnd:19 ssthresh:19 bytes_acked:3258385
> segs_out:86 segs_in:50 data_segs_out:68 send 476.7Mbps lastsnd:43940
> lastrcv:163390 lastack:13500 pacing_rate 572.0Mbps delivery_rate
> 11146.0Mbps busy:163390ms rwnd_limited:163380ms(100.0%) retrans:0/1
> rcv_space:43690 notsent:3255206 minrtt:0.002
> ESTAB      3022864 0      127.0.0.1:4242                 127.0.0.1:35672
> ino:3703653 sk:2 <->
> 	 skmem:(r3259664,rb3406910,t0,tb2626560,f752,w0,o0,bl0,d17) ts sack
> cubic wscale:8,8 rto:210 rtt:0.019/0.009 ato:120 mss:21888 rcvmss:65483
> advmss:65483 cwnd:10 bytes_received:3258384 segs_out:49 segs_in:86
> data_segs_in:68 send 92160.0Mbps lastsnd:163390 lastrcv:43940
> lastack:43940 rcv_rtt:0.239 rcv_space:61440 minrtt:0.019
>
>
> lpaa23:~# uname -a
> Linux lpaa23 4.11.0-smp-DEV #197 SMP @1494476384 x86_64 GNU/Linux
>
>
>

I made an error in the bug report, sorry; the tc step should set a
nonzero delay, e.g.:
tc qdisc change dev lo root netem delay 100ms

tc -s -d qd sh dev lo
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms
  Sent 2310729789 bytes 56051 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

netstat -an | grep 4242
tcp   1737737598      0 127.0.0.1:4242 127.0.0.1:47724         ESTABLISHED
tcp        0 3734810 127.0.0.1:47724         127.0.0.1:4242 ESTABLISHED

ss -temoi dst :4242 or src :4242
State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
ESTAB      1771226600 0      127.0.0.1:4242 
127.0.0.1:47724                 uid:1000 ino:248318 sk:21 <->
skmem:(r4292138050,rb5633129,t40,tb2626560,f3006,w0,o0,bl0,d0) ts sack 
cubic wscale:7,7 rto:600 rtt:200.15/100.075 ato:40 mss:21888 cwnd:10 
bytes_received:1771576125 segs_out:13932 segs_in:27728 
data_segs_in:27726 send 8.7Mbps lastsnd:132144 lastrcv:4 lastack:4852 
pacing_rate 17.5Mbps rcv_rtt:202 rcv_space:188413 minrtt:200.15
ESTAB      0      3866200 127.0.0.1:47724 
127.0.0.1:4242                  timer:(on,372ms,0) uid:1000 ino:246613 
sk:22 <->
skmem:(r0,rb1061808,t4,tb4194304,f267688,w3943000,o0,bl0,d0) ts sack 
cubic wscale:7,7 rto:404 rtt:200.112/0.058 mss:65483 cwnd:89 
bytes_acked:1769019586 segs_out:27732 segs_in:13913 data_segs_out:27730 
send 233.0Mbps lastsnd:32 lastrcv:26247708 lastack:32 pacing_rate 
466.0Mbps unacked:44 rcv_space:43690 notsent:1047728 minrtt:200.011

uname -a
Linux mkm 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux
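
The skmem numbers above show the pathology directly: on the :4242 socket, r
(bytes charged to the receive queue, rmem_alloc per ss(8)) exceeds rb (the
receive buffer limit, rcv_buf) several hundred times over, which healthy
accounting never allows. A small helper to pull those fields out of `ss -m`
output (a sketch; parse_skmem is a hypothetical name, not part of any tool):

```python
import re

def parse_skmem(line: str) -> dict:
    """Extract the skmem:(...) fields from a line of `ss -m` output."""
    blob = re.search(r"skmem:\(([^)]*)\)", line).group(1)
    fields = {}
    for item in blob.split(","):
        key, val = re.match(r"([a-z]+)(\d+)$", item).groups()
        fields[key] = int(val)
    return fields

# skmem line of the receiving :4242 socket, copied from the output above
s = parse_skmem("skmem:(r4292138050,rb5633129,t40,tb2626560,f3006,w0,o0,bl0,d0)")
assert s["r"] == 4292138050    # rmem_alloc: bytes charged to the recv queue
assert s["rb"] == 5633129      # rcv_buf: the limit that r should respect
assert s["r"] > 700 * s["rb"]  # accounting has blown far past the limit
```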


* Re: Fw: [Bug 195713] New: TCP recv queue grows huge
  2017-05-11 19:29   ` Michael Madsen
@ 2017-05-11 19:42     ` Eric Dumazet
  2017-05-11 22:24       ` [PATCH net] netem: fix skb_orphan_partial() Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2017-05-11 19:42 UTC (permalink / raw)
  To: Michael Madsen; +Cc: Stephen Hemminger, Eric Dumazet, netdev

On Thu, 2017-05-11 at 21:29 +0200, Michael Madsen wrote:
> 
> On 05/11/2017 07:06 PM, Eric Dumazet wrote:
> > On Thu, 2017-05-11 at 09:47 -0700, Stephen Hemminger wrote:
> >> Begin forwarded message:
> >>
> >> Date: Thu, 11 May 2017 13:25:23 +0000
> >> From: bugzilla-daemon@bugzilla.kernel.org
> >> To: stephen@networkplumber.org
> >> Subject: [Bug 195713] New: TCP recv queue grows huge
> >>
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=195713
> >>
> >>              Bug ID: 195713
> >>             Summary: TCP recv queue grows huge
> >>             Product: Networking
> >>             Version: 2.5
> >>      Kernel Version: 3.13.0 4.4.0 4.9.0
> >>            Hardware: All
> >>                  OS: Linux
> >>                Tree: Mainline
> >>              Status: NEW
> >>            Severity: normal
> >>            Priority: P1
> >>           Component: IPV4
> >>            Assignee: stephen@networkplumber.org
> >>            Reporter: mkm@nabto.com
> >>          Regression: No
> >>
> >> I was testing how TCP handles advertising reductions of the window size,
> >> especially Window Full events. To create this setup I made a slow TCP receiver
> >> and a fast TCP sender. To add some realism to the scenario I simulated a 10ms
> >> delay on the loopback device using the netem tc module.
> >>
> >> Steps to reproduce:
> >> Beware: these steps will use all the memory on your system.
> >>
> >> 1. create latency on loopback
> >>> sudo tc qdisc change dev lo root netem delay 0ms
> >> 2. slow tcp receiver:
> >>> nc -l 4242 | pv -L 1k
> >> 3. fast tcp sender:
> >>> nc 127.0.0.1 4242 < /dev/zero
> >> What to expect:
> >> It is expected that the TCP recv queue does not grow unbounded, e.g. the
> >> following output from netstat:
> >>
> >>> netstat -an | grep 4242
> >>> tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113
> >>> ESTABLISHED
> >>> tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242
> >>> ESTABLISHED
> >> What is seen:
> >>
> >> The TCP receive queue grows until there is no more memory available on the
> >> system.
> >>
> >>> netstat -an | grep 4242
> >>> tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114
> >>> ESTABLISHED
> >>> tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242
> >>> ESTABLISHED
> >> Note: after the TCP recv queue reaches ~2^31 bytes, netstat reports 0, which
> >> is not correct; netstat was probably not written with this bug in mind.
> >>
> >> Systems on which the bug is reproducible:
> >>
> >>    * debian testing, kernel 4.9.0
> >>    * ubuntu 14.04, kernel 3.13.0
> >>    * ubuntu 16.04, kernel 4.4.0
> >>
> >> I have not tested on systems other than those mentioned above.
> >>
> >
> > Not reproducible on my test machine.
> >
> > Somehow some sysctl must have been set to an insane value by
> > mkm@nabto.com ?
> >
> > Please use/report ss -temoi instead of the old netstat, which does not
> > provide this info.
> >
> > lpaa23:~# tc -s -d qd sh dev lo
> > qdisc netem 8002: root refcnt 2 limit 1000
> >   Sent 1153017 bytes 388 pkt (dropped 0, overlimits 0 requeues 0)
> >   backlog 0b 0p requeues 0
> >
> > lpaa23:~# ss -temoi dst :4242 or src :4242
> > State      Recv-Q Send-Q Local Address:Port                 Peer
> > Address:Port
> > ESTAB      0      3255206 127.0.0.1:35672                127.0.0.1:4242
> > timer:(persist,15sec,0) ino:3740676 sk:1 <->
> > 	 skmem:(r0,rb1060272,t0,tb4194304,f2650,w3319206,o0,bl0,d0) ts sack
> > cubic wscale:8,8 rto:230 backoff:7 rtt:20.879/26.142 mss:65483
> > rcvmss:536 advmss:65483 cwnd:19 ssthresh:19 bytes_acked:3258385
> > segs_out:86 segs_in:50 data_segs_out:68 send 476.7Mbps lastsnd:43940
> > lastrcv:163390 lastack:13500 pacing_rate 572.0Mbps delivery_rate
> > 11146.0Mbps busy:163390ms rwnd_limited:163380ms(100.0%) retrans:0/1
> > rcv_space:43690 notsent:3255206 minrtt:0.002
> > ESTAB      3022864 0      127.0.0.1:4242                 127.0.0.1:35672
> > ino:3703653 sk:2 <->
> > 	 skmem:(r3259664,rb3406910,t0,tb2626560,f752,w0,o0,bl0,d17) ts sack
> > cubic wscale:8,8 rto:210 rtt:0.019/0.009 ato:120 mss:21888 rcvmss:65483
> > advmss:65483 cwnd:10 bytes_received:3258384 segs_out:49 segs_in:86
> > data_segs_in:68 send 92160.0Mbps lastsnd:163390 lastrcv:43940
> > lastack:43940 rcv_rtt:0.239 rcv_space:61440 minrtt:0.019
> >
> >
> > lpaa23:~# uname -a
> > Linux lpaa23 4.11.0-smp-DEV #197 SMP @1494476384 x86_64 GNU/Linux
> >
> >
> >
> 
> I made an error in the bug report, sorry; the tc step should set a
> nonzero delay, e.g.:
> tc qdisc change dev lo root netem delay 100ms
> 
> tc -s -d qd sh dev lo
> qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms
>   Sent 2310729789 bytes 56051 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
> 
> netstat -an | grep 4242
> tcp   1737737598      0 127.0.0.1:4242 127.0.0.1:47724         ESTABLISHED
> tcp        0 3734810 127.0.0.1:47724         127.0.0.1:4242 ESTABLISHED
> 
> ss -temoi dst :4242 or src :4242
> State      Recv-Q Send-Q Local Address:Port                 Peer 
> Address:Port
> ESTAB      1771226600 0      127.0.0.1:4242 
> 127.0.0.1:47724                 uid:1000 ino:248318 sk:21 <->
> skmem:(r4292138050,rb5633129,t40,tb2626560,f3006,w0,o0,bl0,d0) ts sack 
> cubic wscale:7,7 rto:600 rtt:200.15/100.075 ato:40 mss:21888 cwnd:10 
> bytes_received:1771576125 segs_out:13932 segs_in:27728 
> data_segs_in:27726 send 8.7Mbps lastsnd:132144 lastrcv:4 lastack:4852 
> pacing_rate 17.5Mbps rcv_rtt:202 rcv_space:188413 minrtt:200.15
> ESTAB      0      3866200 127.0.0.1:47724 
> 127.0.0.1:4242                  timer:(on,372ms,0) uid:1000 ino:246613 
> sk:22 <->
> skmem:(r0,rb1061808,t4,tb4194304,f267688,w3943000,o0,bl0,d0) ts sack 
> cubic wscale:7,7 rto:404 rtt:200.112/0.058 mss:65483 cwnd:89 
> bytes_acked:1769019586 segs_out:27732 segs_in:13913 data_segs_out:27730 
> send 233.0Mbps lastsnd:32 lastrcv:26247708 lastack:32 pacing_rate 
> 466.0Mbps unacked:44 rcv_space:43690 notsent:1047728 minrtt:200.011
> 
> uname -a
> Linux mkm 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux

Oh, this is a bug in netem, using skb_orphan_partial() even for packets
that might loop back to this host.

I will send a fix, thanks for the report.
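
The accounting failure can be modelled without any kernel code: at the time,
skb_orphan_partial() rewrote skb->truesize to 1, so when netem loops a packet
back into a local receiver, the receive queue is charged 1 byte per skb instead
of the skb's real memory footprint, and the sk_rcvbuf limit stops meaning
anything in bytes. A toy model of that accounting (illustrative only; RCVBUF
and deliver are invented names, and the real code paths are far more involved):

```python
RCVBUF = 65536  # toy sk_rcvbuf limit, in bytes

def deliver(n_skbs: int, truesize: int, charge: int):
    """Queue skbs until the *charged* bytes would exceed RCVBUF.
    Returns (charged_bytes, real_bytes_held_by_the_queue)."""
    charged = real = 0
    for _ in range(n_skbs):
        if charged + charge > RCVBUF:
            break  # limit reached: the receiver stops accepting data
        charged += charge
        real += truesize
    return charged, real

# Normal path: each skb is charged its real truesize, so the memory
# actually held by the queue respects the configured limit.
charged, real = deliver(1_000_000, truesize=512, charge=512)
assert real <= RCVBUF

# Buggy path: the charge clamped to 1, as the old skb_orphan_partial() did
# to truesize. The limit now admits RCVBUF *packets* rather than RCVBUF
# bytes, so the queue pins ~512x more memory than intended.
charged, real = deliver(1_000_000, truesize=512, charge=1)
assert charged <= RCVBUF
assert real == RCVBUF * 512  # 32 MiB held against a 64 KiB limit
```

The patch that follows in this thread sidesteps the problem by leaving
truesize alone and transferring ownership via the skb destructor instead.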


* [PATCH net] netem: fix skb_orphan_partial()
  2017-05-11 19:42     ` Eric Dumazet
@ 2017-05-11 22:24       ` Eric Dumazet
  2017-05-12  1:33         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2017-05-11 22:24 UTC (permalink / raw)
  To: Michael Madsen, David Miller; +Cc: Stephen Hemminger, Eric Dumazet, netdev

From: Eric Dumazet <edumazet@google.com>

I should have known that lowering skb->truesize was dangerous :/

In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )

So instead of tweaking skb->truesize, let's change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.

Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Madsen <mkm@nabto.com>
---
 net/core/sock.c |   20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 79c6aee6af9b817bd7086f04ae8f46342a3bf4b6..e43e71d7856b385111cd4c4b1bd835a78c670c60 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1803,28 +1803,24 @@ EXPORT_SYMBOL(skb_set_owner_w);
  * delay queue. We want to allow the owner socket to send more
  * packets, as if they were already TX completed by a typical driver.
  * But we also want to keep skb->sk set because some packet schedulers
- * rely on it (sch_fq for example). So we set skb->truesize to a small
- * amount (1) and decrease sk_wmem_alloc accordingly.
+ * rely on it (sch_fq for example).
  */
 void skb_orphan_partial(struct sk_buff *skb)
 {
-	/* If this skb is a TCP pure ACK or already went here,
-	 * we have nothing to do. 2 is already a very small truesize.
-	 */
-	if (skb->truesize <= 2)
+	if (skb_is_tcp_pure_ack(skb))
 		return;
 
-	/* TCP stack sets skb->ooo_okay based on sk_wmem_alloc,
-	 * so we do not completely orphan skb, but transfert all
-	 * accounted bytes but one, to avoid unexpected reorders.
-	 */
 	if (skb->destructor == sock_wfree
 #ifdef CONFIG_INET
 	    || skb->destructor == tcp_wfree
 #endif
 		) {
-		atomic_sub(skb->truesize - 1, &skb->sk->sk_wmem_alloc);
-		skb->truesize = 1;
+		struct sock *sk = skb->sk;
+
+		if (atomic_inc_not_zero(&sk->sk_refcnt)) {
+			atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
+			skb->destructor = sock_efree;
+		}
 	} else {
 		skb_orphan(skb);
 	}


* Re: [PATCH net] netem: fix skb_orphan_partial()
  2017-05-11 22:24       ` [PATCH net] netem: fix skb_orphan_partial() Eric Dumazet
@ 2017-05-12  1:33         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2017-05-12  1:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mkm, stephen, edumazet, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 11 May 2017 15:24:41 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> I should have known that lowering skb->truesize was dangerous :/
> 
> In case packets are not leaving the host via a standard Ethernet device,
> but looped back to local sockets, bad things can happen, as reported
> by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )
> 
> So instead of tweaking skb->truesize, let's change skb->destructor
> and keep a reference on the owner socket via its sk_refcnt.
> 
> Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Madsen <mkm@nabto.com>

Applied and queued up for -stable, thanks.


end of thread, other threads:[~2017-05-12  1:33 UTC | newest]

Thread overview: 6+ messages
2017-05-11 16:47 Fw: [Bug 195713] New: TCP recv queue grows huge Stephen Hemminger
2017-05-11 17:06 ` Eric Dumazet
2017-05-11 19:29   ` Michael Madsen
2017-05-11 19:42     ` Eric Dumazet
2017-05-11 22:24       ` [PATCH net] netem: fix skb_orphan_partial() Eric Dumazet
2017-05-12  1:33         ` David Miller
