netdev.vger.kernel.org archive mirror
* TCP out of memory - possible bug [3.18.0-rc3] / sched?
@ 2014-11-03 18:59 Tomasz Mloduchowski
  2014-11-03 19:41 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Tomasz Mloduchowski @ 2014-11-03 18:59 UTC (permalink / raw)
  To: netdev

Hi List,

I hope this is the right place to report a networking issue with
3.18.0-rc2 and 3.18.0-rc3 - under heavy P2P load (tested both
rtorrent/libtorrent and bitcoind, so not protocol-specific), the system
quickly exhausts tcp_mem limits in a very strange sequence of events.

It might be scheduler or networking subsystem related.

It's 100% reproducible on my system, first observed under 3.18.0-rc2.

Neither terminating the offending application nor removing the network
card seems to bring the 'mem' item in /proc/net/sockstat down from its
extreme values.

http://static.qdot.me/tcp_mem_issue.png contains the plot of the 'mem'
and 'sockets' fields from sockstat - violet is the 'mem', quickly
exhausting the default 512k pages limit after a short period of reliable
operation.
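
For anyone who wants to track the same counter, here is a minimal Python
sketch that parses sockstat-style text and compares the TCP 'mem' field
(counted in pages) against the third, hard-limit value of tcp_mem. The
function names, the 4 KiB page size, and the sample values are assumptions
for illustration only; they are not taken from the affected system.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86-64

# Illustrative sample text only, NOT data from the report.
SAMPLE_SOCKSTAT = (
    "sockets: used 290\n"
    "TCP: inuse 12 orphan 0 tw 0 alloc 15 mem 3\n"
    "UDP: inuse 8 mem 2\n"
)
SAMPLE_TCP_MEM = "131072 262144 524288"  # low, pressure, high (in pages)


def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {protocol: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, rest = line.split(":", 1)
        tokens = rest.split()
        # Fields are alternating "name value" pairs, e.g. "inuse 12 mem 3".
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(tokens[::2], tokens[1::2])
        }
    return stats


def tcp_mem_usage(sockstat_text, tcp_mem_text):
    """Relate the TCP 'mem' counter to the tcp_mem hard limit (both in pages)."""
    used = parse_sockstat(sockstat_text)["TCP"]["mem"]
    low, pressure, high = (int(x) for x in tcp_mem_text.split())
    return {
        "used_pages": used,
        "high_pages": high,
        "used_bytes": used * PAGE_SIZE,
        "high_bytes": high * PAGE_SIZE,
        "fraction_of_high": used / high,
    }


def read_live():
    """Read the live counters; only works on a Linux system with /proc."""
    with open("/proc/net/sockstat") as f:
        sockstat_text = f.read()
    with open("/proc/sys/net/ipv4/tcp_mem") as f:
        tcp_mem_text = f.read()
    return tcp_mem_usage(sockstat_text, tcp_mem_text)
```

With 4 KiB pages, a limit of 512k pages (524288) corresponds to 2 GiB of
socket buffer memory. Calling read_live() in a loop and logging the result
would give roughly the same data as the plot linked above.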


Best Regards,
Tomasz

-- snip --

sched: RT throttling activated
kworker/dying (792) used greatest stack depth: 11984 bytes left
TCP: out of memory -- consider tuning tcp_mem
TCP: out of memory -- consider tuning tcp_mem
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5802 at net/core/stream.c:201 sk_stream_kill_queues+0x12c/0x130()
Modules linked in:
CPU: 0 PID: 5802 Comm: main Not tainted 3.18.0-rc3 #2
Hardware name: LENOVO 37023L0/37023L0, BIOS GFET48WW (1.27 ) 07/01/2014
 0000000000000009 ffff8800cae2fd88 ffffffff81b6c0ff 0000000000000000
 0000000000000000 ffff8800cae2fdc8 ffffffff810c235c ffff8800cae2fda8
 ffff8800bd996580 ffff8800bd996700 0000000000000004 ffff8800bd996610
Call Trace:
 [<ffffffff81b6c0ff>] dump_stack+0x46/0x58
 [<ffffffff810c235c>] warn_slowpath_common+0x7c/0xa0
 [<ffffffff810c2425>] warn_slowpath_null+0x15/0x20
 [<ffffffff81954fbc>] sk_stream_kill_queues+0x12c/0x130
 [<ffffffff819b8b45>] inet_csk_destroy_sock+0x55/0x140
 [<ffffffff819bd9ee>] tcp_close+0x22e/0x430
 [<ffffffff819e3b82>] inet_release+0x72/0x80
 [<ffffffff819455fa>] sock_release+0x1a/0x90
 [<ffffffff8194567d>] sock_close+0xd/0x20
 [<ffffffff811dfdf6>] __fput+0xc6/0x1d0
 [<ffffffff811dff49>] ____fput+0x9/0x10
 [<ffffffff810db94f>] task_work_run+0x8f/0xd0
 [<ffffffff81046b12>] do_notify_resume+0x82/0xa0
 [<ffffffff81b7613f>] int_signal+0x12/0x17
---[ end trace 080b1124407d2571 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5802 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x1d9/0x1e0()
Modules linked in:
CPU: 0 PID: 5802 Comm: main Tainted: G        W      3.18.0-rc3 #2
Hardware name: LENOVO 37023L0/37023L0, BIOS GFET48WW (1.27 ) 07/01/2014
 0000000000000009 ffff8800cae2fd68 ffffffff81b6c0ff 0000000000000000
 0000000000000000 ffff8800cae2fda8 ffffffff810c235c ffff8800cae2fd88
 ffff8800bd996580 ffff8800bd996700 0000000000000004 ffff8800bd996610
Call Trace:
 [<ffffffff81b6c0ff>] dump_stack+0x46/0x58
 [<ffffffff810c235c>] warn_slowpath_common+0x7c/0xa0
 [<ffffffff810c2425>] warn_slowpath_null+0x15/0x20
 [<ffffffff819e5179>] inet_sock_destruct+0x1d9/0x1e0
 [<ffffffff81949e7e>] __sk_free+0x1e/0x100
 [<ffffffff81949f79>] sk_free+0x19/0x20
 [<ffffffff819bd918>] tcp_close+0x158/0x430
 [<ffffffff819e3b82>] inet_release+0x72/0x80
 [<ffffffff819455fa>] sock_release+0x1a/0x90
 [<ffffffff8194567d>] sock_close+0xd/0x20
 [<ffffffff811dfdf6>] __fput+0xc6/0x1d0
 [<ffffffff811dff49>] ____fput+0x9/0x10
 [<ffffffff810db94f>] task_work_run+0x8f/0xd0
 [<ffffffff81046b12>] do_notify_resume+0x82/0xa0
 [<ffffffff81b7613f>] int_signal+0x12/0x17
---[ end trace 080b1124407d2572 ]---
kworker/dying (679) used greatest stack depth: 11784 bytes left
TCP: out of memory -- consider tuning tcp_mem

-- snip --


* Re: TCP out of memory - possible bug [3.18.0-rc3] / sched?
  2014-11-03 18:59 TCP out of memory - possible bug [3.18.0-rc3] / sched? Tomasz Mloduchowski
@ 2014-11-03 19:41 ` Eric Dumazet
  2015-01-04  8:42   ` Markus Trippelsdorf
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2014-11-03 19:41 UTC (permalink / raw)
  To: Tomasz Mloduchowski; +Cc: netdev

On Mon, 2014-11-03 at 19:59 +0100, Tomasz Mloduchowski wrote:
> Hi List,
> 
> I hope this is the right place to report a networking issue with
> 3.18.0-rc2 and 3.18.0-rc3 - under heavy P2P load (tested both
> rtorrent/libtorrent and bitcoind, so not protocol-specific), the system
> quickly exhausts tcp_mem limits in a very strange sequence of events.
> 
> It might be scheduler or networking subsystem related.
> 
> It's 100% reproducible on my system, first observed under 3.18.0-rc2.
> 

Sounds like a perfect case for a bisection, maybe?
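
For readers unfamiliar with the idea: a bisection is a binary search over
the history between a known-good and a known-bad kernel, which is what
`git bisect` automates. The Python sketch below shows only the search
logic on a linear list of revisions. It is not how git implements it (git
also handles branching history), and the revision list and the `is_bad`
test are hypothetical placeholders.

```python
def first_bad(revisions, is_bad):
    """Return the index of the first bad revision.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


# Hypothetical example: 100 revisions, regression introduced at number 37.
calls = []


def example_is_bad(rev):
    calls.append(rev)  # record how many builds/tests were needed
    return rev >= 37


result = first_bad(list(range(100)), example_is_bad)
```

Each step halves the remaining range, so about log2(N) build-and-test
cycles are needed, which is why a reliably reproducible failure makes a
good bisection candidate.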


* Re: TCP out of memory - possible bug [3.18.0-rc3] / sched?
  2014-11-03 19:41 ` Eric Dumazet
@ 2015-01-04  8:42   ` Markus Trippelsdorf
  2015-01-04 21:01     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Trippelsdorf @ 2015-01-04  8:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tomasz Mloduchowski, netdev

On 2014.11.03 at 11:41 -0800, Eric Dumazet wrote:
> On Mon, 2014-11-03 at 19:59 +0100, Tomasz Mloduchowski wrote:
> > Hi List,
> > 
> > I hope this is the right place to report a networking issue with
> > 3.18.0-rc2 and 3.18.0-rc3 - under heavy P2P load (tested both
> > rtorrent/libtorrent and bitcoind, so not protocol-specific), the system
> > quickly exhausts tcp_mem limits in a very strange sequence of events.
> > 
> > It might be scheduler or networking subsystem related.
> > 
> > It's 100% reproducible on my system, first observed under 3.18.0-rc2.
> > 
> 
> Sounds like a perfect case for a bisection, maybe?

Any news on this issue?

I stumbled across the same problem today with 3.19.0-rc2:

Jan  4 09:23:22 x4 kernel: TCP: out of memory -- consider tuning tcp_mem
Jan  4 09:23:23 x4 kernel: ------------[ cut here ]------------
Jan  4 09:23:23 x4 kernel: WARNING: CPU: 3 PID: 32504 at net/core/stream.c:201 inet_csk_destroy_sock+0x4d/0x100()
Jan  4 09:23:23 x4 kernel: CPU: 3 PID: 32504 Comm: main Not tainted 3.19.0-rc2-00193-g5e0f872c7d7e-dirty #52
Jan  4 09:23:23 x4 kernel: Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503    04/13/2011
Jan  4 09:23:23 x4 kernel: 0000000000000000 ffffffff819b8544 ffffffff81686ffc 0000000000000000
Jan  4 09:23:23 x4 kernel: ffffffff8106b352 ffff88008a623a80 ffff88008a623be8 0000000000000004
Jan  4 09:23:23 x4 kernel: ffff88008a623ae8 ffff88012d3f9240 ffffffff8162fead ffff88008a623a80
Jan  4 09:23:23 x4 kernel: Call Trace:
Jan  4 09:23:23 x4 kernel: [<ffffffff81686ffc>] ? dump_stack+0x40/0x50
Jan  4 09:23:23 x4 kernel: [<ffffffff8106b352>] ? warn_slowpath_common+0x72/0xc0
Jan  4 09:23:23 x4 kernel: [<ffffffff8162fead>] ? inet_csk_destroy_sock+0x4d/0x100
Jan  4 09:23:23 x4 kernel: [<ffffffff81633746>] ? tcp_close+0x226/0x400
Jan  4 09:23:23 x4 kernel: [<ffffffff816582cd>] ? inet_release+0x6d/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff815cf090>] ? sock_release+0x10/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff815cf10d>] ? sock_close+0xd/0x20
Jan  4 09:23:23 x4 kernel: [<ffffffff8110eeca>] ? __fput+0xca/0x1e0
Jan  4 09:23:23 x4 kernel: [<ffffffff81081c0f>] ? task_work_run+0xaf/0x100
Jan  4 09:23:23 x4 kernel: [<ffffffff81689734>] ? __schedule+0x254/0x760
Jan  4 09:23:23 x4 kernel: [<ffffffff81039ae1>] ? do_notify_resume+0x61/0xa0
Jan  4 09:23:23 x4 kernel: [<ffffffff8110f33e>] ? fput+0x3e/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff8168dc18>] ? int_signal+0x12/0x17
Jan  4 09:23:23 x4 kernel: ---[ end trace ccb3dc614f5600b1 ]---
Jan  4 09:23:23 x4 kernel: ------------[ cut here ]------------
Jan  4 09:23:23 x4 kernel: WARNING: CPU: 3 PID: 32504 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x169/0x1e0()
Jan  4 09:23:23 x4 kernel: CPU: 3 PID: 32504 Comm: main Tainted: G        W      3.19.0-rc2-00193-g5e0f872c7d7e-dirty #52
Jan  4 09:23:23 x4 kernel: Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503    04/13/2011
Jan  4 09:23:23 x4 kernel: 0000000000000000 ffffffff819b9876 ffffffff81686ffc 0000000000000000
Jan  4 09:23:23 x4 kernel: ffffffff8106b352 ffff88008a623a80 ffff88008a623bc8 ffff880100104a30
Jan  4 09:23:23 x4 kernel: ffff880216864660 ffff88012d3f9240 ffffffff816591e9 ffff8800d5e2a810
Jan  4 09:23:23 x4 kernel: Call Trace:
Jan  4 09:23:23 x4 kernel: [<ffffffff81686ffc>] ? dump_stack+0x40/0x50
Jan  4 09:23:23 x4 kernel: [<ffffffff8106b352>] ? warn_slowpath_common+0x72/0xc0
Jan  4 09:23:23 x4 kernel: [<ffffffff816591e9>] ? inet_sock_destruct+0x169/0x1e0
Jan  4 09:23:23 x4 kernel: [<ffffffff815d1792>] ? __sk_free+0x12/0xc0
Jan  4 09:23:23 x4 kernel: [<ffffffff816582cd>] ? inet_release+0x6d/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff815cf090>] ? sock_release+0x10/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff815cf10d>] ? sock_close+0xd/0x20
Jan  4 09:23:23 x4 kernel: [<ffffffff8110eeca>] ? __fput+0xca/0x1e0
Jan  4 09:23:23 x4 kernel: [<ffffffff81081c0f>] ? task_work_run+0xaf/0x100
Jan  4 09:23:23 x4 kernel: [<ffffffff81689734>] ? __schedule+0x254/0x760
Jan  4 09:23:23 x4 kernel: [<ffffffff81039ae1>] ? do_notify_resume+0x61/0xa0
Jan  4 09:23:23 x4 kernel: [<ffffffff8110f33e>] ? fput+0x3e/0x80
Jan  4 09:23:23 x4 kernel: [<ffffffff8168dc18>] ? int_signal+0x12/0x17
Jan  4 09:23:23 x4 kernel: ---[ end trace ccb3dc614f5600b2 ]---

rtorrent was spinning and the network was unusable until reboot.

-- 
Markus


* Re: TCP out of memory - possible bug [3.18.0-rc3] / sched?
  2015-01-04  8:42   ` Markus Trippelsdorf
@ 2015-01-04 21:01     ` Eric Dumazet
  2015-01-04 21:39       ` Markus Trippelsdorf
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2015-01-04 21:01 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Tomasz Mloduchowski, netdev

On Sun, 2015-01-04 at 09:42 +0100, Markus Trippelsdorf wrote:

> Any news on this issue?
> 
> I stumbled across the same problem today with 3.19.0-rc2:

What makes you think it was fixed in 3.19.0-rc2?

Have you tried a bisection?


* Re: TCP out of memory - possible bug [3.18.0-rc3] / sched?
  2015-01-04 21:01     ` Eric Dumazet
@ 2015-01-04 21:39       ` Markus Trippelsdorf
  0 siblings, 0 replies; 5+ messages in thread
From: Markus Trippelsdorf @ 2015-01-04 21:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tomasz Mloduchowski, netdev

On 2015.01.04 at 13:01 -0800, Eric Dumazet wrote:
> On Sun, 2015-01-04 at 09:42 +0100, Markus Trippelsdorf wrote:
> 
> > Any news on this issue?
> > 
> > I stumbled across the same problem today with 3.19.0-rc2:
> 
> What makes you think it was fixed in 3.19.0-rc2?

Nothing.

> Have you tried a bisection?

It was a one-time event for me, so a bisection is out of the question.

But Tomasz wrote that he can reproduce the issue reliably.

Have you tried a bisection, Tomasz?

-- 
Markus
