netfilter-devel.vger.kernel.org archive mirror
* conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible
@ 2017-03-14  9:56 Peter Marczis
  2017-03-14 10:33 ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Marczis @ 2017-03-14  9:56 UTC
  To: netfilter-devel

Hello developers,
I'm seeking some help to debug and solve one of my issues.

At GreenWave I'm working on a "SOHO" router product, and of course we
use Linux / netfilter.
We observed that if we create 30k connections, everything works as
expected, but when we start to disconnect them,
conntrack (not fully confirmed yet) keeps the kernel side busy,
and it looks like no scheduling happens.

Functionally everything works as expected; the only problem is that our
processes, and pretty much everything else in user space, hang for
10-30 seconds,
which of course triggers our HW watchdog, and we end up in a reboot.
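
One way to put a number on these user-space stalls is a small probe that
sleeps in short intervals and records every wake-up that arrives much later
than requested. The following is only a sketch under assumptions: the
function name, the 10 ms interval and the 0.5 s threshold are illustrative
choices and are not taken from the setup described here.

```python
import time


def measure_stalls(duration_s, interval_s=0.01, threshold_s=0.5):
    """Return (seconds_since_start, overshoot) pairs for late wake-ups.

    All names and default values here are illustrative assumptions.
    """
    stalls = []
    start = time.monotonic()
    last = start
    while last - start < duration_s:
        time.sleep(interval_s)
        now = time.monotonic()
        # Time spent beyond the requested sleep. A large value means this
        # process was not scheduled for roughly that long.
        overshoot = (now - last) - interval_s
        if overshoot > threshold_s:
            stalls.append((now - start, overshoot))
        last = now
    return stalls
```

Running something like measure_stalls(60) while the connections are being
torn down, and printing the result, would show whether the stalls line up
with the teardown and how long each one lasts.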

This is of course a corner case; I would just like to understand
what's happening. I dug as deep into the code as I could, and it looks
like there is a hash table protected by RCU.
What I'm really looking for is help on how to debug the kernel side
better, to see whether it holds a lock on this table for a long time, or
why scheduling is not happening.

Some extra info:
Kernel: 3.4.108 - armv7l - conntrack compiled into it.

~ # cat /proc/cpuinfo
Processor       : ARMv7 Processor rev 1 (v7l)
processor       : 0
BogoMIPS        : 1594.16

processor       : 1
BogoMIPS        : 1594.16

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 1

Hardware        : CORTINA-G2 EB
Revision        : 7542a1
Serial          : 0000000000000000

Idle value:
~ # cat /proc/meminfo
MemTotal:         436936 kB
MemFree:          298336 kB
Buffers:              16 kB
Cached:            30504 kB
SwapCached:            0 kB


Thanks a lot in advance ! I really appreciate your work !

-- 
Br,
 Peter G. Marczis
  SW. Developer
 +45 28 12 92 10


* Re: conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible
  2017-03-14  9:56 conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible Peter Marczis
@ 2017-03-14 10:33 ` Florian Westphal
  2017-03-14 11:11   ` Peter Marczis
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2017-03-14 10:33 UTC
  To: Peter Marczis; +Cc: netfilter-devel

Peter Marczis <peter.marczis@greenwavesystems.com> wrote:
> Hello developers,
> I'm seeking some help to debug and solve one of my issues.
> 
> We observed that if we create 30k connections, everything works as
> expected, but when we start to disconnect them,
> conntrack (well not confirmed yet fully) makes the kernel side busy,
> and looks like no scheduling happens.

What do you mean by 'disconnect'?  conntrack -F ?

My wild guess is you need to backport

commit d93c6258ee4255749c10012c50a31c08f4e9fb16
netfilter: conntrack: resched in nf_ct_iterate_cleanup

> The whole thing works as expected, the only problem it makes our
> processes and well everything on user side hanging for a couple of
> seconds 10-30s,
> which of course triggers our HW Watchdog, and we end up in a reboot.

You could try

CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y



* Re: conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible
  2017-03-14 10:33 ` Florian Westphal
@ 2017-03-14 11:11   ` Peter Marczis
  2017-03-14 11:20     ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Marczis @ 2017-03-14 11:11 UTC
  To: Florian Westphal; +Cc: netfilter-devel

Hi,
I mean we destroy the sockets. We used two very basic Python scripts to
open and close TCP sockets between the WAN and LAN interfaces.
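
A minimal sketch of what such a test could look like is below. It is an
assumption for illustration, not the actual scripts used: the function
names, the timeout, and the idea of pointing it at a listener on the other
side of the router are all hypothetical.

```python
import socket
import time


def open_connections(host, port, count, timeout_s=5.0):
    """Open `count` TCP connections to host:port and return the sockets."""
    socks = []
    for _ in range(count):
        # create_connection completes the TCP handshake before returning,
        # so each socket corresponds to an established connection.
        socks.append(socket.create_connection((host, port), timeout=timeout_s))
    return socks


def close_connections(socks):
    """Close all sockets back to back; return the elapsed time in seconds."""
    start = time.monotonic()
    for s in socks:
        s.close()
    return time.monotonic() - start
```

To hold tens of thousands of sockets open in one process, the per-process
open file limit usually has to be raised first.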

Thanks for the hint, I will try those !

Br,
 Peter

-- 
Br,
 Peter G. Marczis
  SW. Developer
 +45 28 12 92 10


* Re: conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible
  2017-03-14 11:11   ` Peter Marczis
@ 2017-03-14 11:20     ` Florian Westphal
  2017-03-16 10:58       ` Peter Marczis
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2017-03-14 11:20 UTC
  To: Peter Marczis; +Cc: Florian Westphal, netfilter-devel

Peter Marczis <peter.marczis@greenwavesystems.com> wrote:
> I mean we destroy the sockets, we used two very basic python script to
> open and close TCP sockets between the WAN and LAN interface.

Then my guess wrt. nf_ct_iterate_cleanup is certainly wrong.



* Re: conntrack (possibly) hangs on our ARM CPU in case we delete 5k+ connections as fast as possible
  2017-03-14 11:20     ` Florian Westphal
@ 2017-03-16 10:58       ` Peter Marczis
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Marczis @ 2017-03-16 10:58 UTC
  To: Florian Westphal; +Cc: netfilter-devel

Hello Florian !
Thanks a lot again for the help. We decided not to backport the
commit, as we are too far behind. We will upgrade the kernel instead in
the future...
Again, thanks a lot for your help.
Br,
 Peter.
Peter G. Marczis
SW. Developer
Bregneroedvej 96, 3460 Birkeroed, Denmark
Cell: +45 28 12 92 10 Skype: 28199210
