netdev.vger.kernel.org archive mirror
* OOM triggered by SCTP
@ 2019-07-16 21:47 Marek Majkowski
  2019-07-16 23:20 ` Neil Horman
  2019-07-16 23:59 ` malc
  0 siblings, 2 replies; 4+ messages in thread
From: Marek Majkowski @ 2019-07-16 21:47 UTC (permalink / raw)
  To: vyasevich, nhorman, marcelo.leitner, linux-sctp; +Cc: netdev, kernel-team

Morning,

My poor man's fuzzer found something interesting in SCTP. It seems
like creating a large number of SCTP sockets + some magic dance upsets
a memory subsystem related to SCTP. The sequence:

 - create SCTP socket
 - call setsockopt (SCTP_EVENTS)
 - call bind(::1, port)
 - call sendmsg(long buffer, MSG_CONFIRM, ::1, port)
 - close SCTP socket
 - repeat a couple thousand times

Full code:
https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c

I'm running it on virtme the simplest way:
$ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory
512M --script-sh ./test_sctp

Originally I was running it inside a net namespace, and just having a
loopback interface is sufficient to trigger the problem.

The kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor).
In some tests I saw a message that might indicate something funny
hitting the neighbor table:

neighbour: ndisc_cache: neighbor table overflow!

I'm not addr-decoding the stack trace, since it seems unrelated to the
root cause.

Cheers,
    Marek


* Re: OOM triggered by SCTP
  2019-07-16 21:47 OOM triggered by SCTP Marek Majkowski
@ 2019-07-16 23:20 ` Neil Horman
  2019-07-16 23:59 ` malc
  1 sibling, 0 replies; 4+ messages in thread
From: Neil Horman @ 2019-07-16 23:20 UTC (permalink / raw)
  To: Marek Majkowski
  Cc: vyasevich, marcelo.leitner, linux-sctp, netdev, kernel-team

On Tue, Jul 16, 2019 at 11:47:40PM +0200, Marek Majkowski wrote:
> Morning,
> 
> My poor man's fuzzer found something interesting in SCTP. It seems
> like creating a large number of SCTP sockets + some magic dance upsets
> a memory subsystem related to SCTP. The sequence:
> 
>  - create SCTP socket
>  - call setsockopt (SCTP_EVENTS)
>  - call bind(::1, port)
>  - call sendmsg(long buffer, MSG_CONFIRM, ::1, port)
>  - close SCTP socket
>  - repeat a couple thousand times
> 
> Full code:
> https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c
> 
> I'm running it on virtme the simplest way:
> $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory
> 512M --script-sh ./test_sctp
> 
> Originally I was running it inside a net namespace, and just having a
> loopback interface is sufficient to trigger the problem.
> 
> The kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor).
> In some tests I saw a message that might indicate something funny
> hitting the neighbor table:
> 
> neighbour: ndisc_cache: neighbor table overflow!
> 
> I'm not addr-decoding the stack trace, since it seems unrelated to the
> root cause.
> 
Why would you have to decode anything? The decoded stack trace should be
available in your dmesg log. Can't you just attach that here?

Neil

> Cheers,
>     Marek
> 


* Re: OOM triggered by SCTP
  2019-07-16 21:47 OOM triggered by SCTP Marek Majkowski
  2019-07-16 23:20 ` Neil Horman
@ 2019-07-16 23:59 ` malc
  2019-07-17  9:08   ` Marek Majkowski
  1 sibling, 1 reply; 4+ messages in thread
From: malc @ 2019-07-16 23:59 UTC (permalink / raw)
  To: Marek Majkowski
  Cc: vyasevich, nhorman, marcelo.leitner, Linux SCTP, netdev,
	kernel-team

On Tue, Jul 16, 2019 at 10:49 PM Marek Majkowski <marek@cloudflare.com> wrote:
>
> Morning,
>
> My poor man's fuzzer found something interesting in SCTP. It seems
> like creating a large number of SCTP sockets + some magic dance upsets
> a memory subsystem related to SCTP. The sequence:
>
>  - create SCTP socket
>  - call setsockopt (SCTP_EVENTS)
>  - call bind(::1, port)
>  - call sendmsg(long buffer, MSG_CONFIRM, ::1, port)
>  - close SCTP socket
>  - repeat a couple thousand times
>
> Full code:
> https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c
>
> I'm running it on virtme the simplest way:
> $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory
> 512M --script-sh ./test_sctp
>
> Originally I was running it inside a net namespace, and just having a
> loopback interface is sufficient to trigger the problem.
>
> The kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor).
> In some tests I saw a message that might indicate something funny
> hitting the neighbor table:
>
> neighbour: ndisc_cache: neighbor table overflow!
>
> I'm not addr-decoding the stack trace, since it seems unrelated to the
> root cause.
>
> Cheers,
>     Marek

I _think_ this is an 'expected' peculiarity of SCTP on loopback - your
test_sctp.c ends up creating actual associations to itself on the same
socket (you can test safely by reducing the port range (say
30000-32000) and setting the for-loop clause to 'run < 1').
You'll see a bunch of associations established like the following
(note that the kernel was dropping packets for this capture - even with
/only/ 2000 sockets used...)

$ tshark -r sctp.pcap -Y 'sctp.assoc_index==4'
  21 0.000409127          ::1 → ::1           SCTP INIT
  22 0.000436281          ::1 → ::1           SCTP INIT_ACK
  23 0.000442106          ::1 → ::1           SCTP COOKIE_ECHO
  24 0.000463007          ::1 → ::1           SCTP COOKIE_ACK DATA
(Message Fragment)
                                              presumably your close()
happens here and we enter SHUTDOWN-PENDING, where we wait for pending
data to be acknowledged. I'm not convinced that we shouldn't be
SACK'ing the data from the 'peer' at this point - but for whatever
reason, we aren't.
                                              We then run through
path-max-retrans, and finally ABORT (the ABORT also carries the
PMR-exceeded indication in the 'Cause Information').
  25 0.000476083          ::1 → ::1           SCTP SACK
13619 3.017788109          ::1 → ::1           SCTP DATA (retransmission)
14022 3.222690889          ::1 → ::1           SCTP SACK
18922 21.938217449          ::1 → ::1           SCTP SACK
33476 69.831029904          ::1 → ::1           SCTP HEARTBEAT
33561 69.831310796          ::1 → ::1           SCTP HEARTBEAT_ACK
40816 94.102667600          ::1 → ::1           SCTP SACK
40910 95.942741287          ::1 → ::1           SCTP DATA (retransmission)
41039 96.152023010          ::1 → ::1           SCTP SACK
41100 100.182685237          ::1 → ::1           SCTP SACK
41212 108.230746764          ::1 → ::1           SCTP DATA (retransmission)
41345 108.439061392          ::1 → ::1           SCTP SACK
41407 116.422688507          ::1 → ::1           SCTP HEARTBEAT
41413 116.423183124          ::1 → ::1           SCTP HEARTBEAT_ACK
41494 124.823749255          ::1 → ::1           SCTP SACK
41576 126.663648718          ::1 → ::1           SCTP ABORT

With your entire 512M, you'd only have about 16KB for each of these
31K associations tops. I suspect that having a 64KB pending data chunk
(fragmented ULP msg) per association for >= 90s is what is
exhausting memory here - although I'm sure Neil or Michael will be
along to correct me ;-)

What's interesting: as you reduce the payload size, we end up
bundling DATA from the 'initiator' side (in COOKIE_ECHO), and
everything works as expected... (the SACK here is for the bundled DATA
chunk's TSN.)

mlashley@duality /tmp $ tshark -r /tmp/sctp_index4_10K.pcap
   1 0.000000000          ::1 → ::1          SCTP INIT
   2 0.000014491          ::1 → ::1          SCTP INIT_ACK
   3 0.000024190          ::1 → ::1          SCTP COOKIE_ECHO DATA
   4 0.000034833          ::1 → ::1          SCTP COOKIE_ACK
   5 0.000040646          ::1 → ::1          SCTP SACK
   6 0.000050287          ::1 → ::1          SCTP ABORT

In short - SCTP associations /can/ persist after user-space calls
close() whilst there is outstanding data, for up to path.max.retrans
retransmissions with the RTO doubling on each T3-rtx expiry.

(My tests on 5.2.0 as it is what I had to hand...)

Cheers,
malc.


* Re: OOM triggered by SCTP
  2019-07-16 23:59 ` malc
@ 2019-07-17  9:08   ` Marek Majkowski
  0 siblings, 0 replies; 4+ messages in thread
From: Marek Majkowski @ 2019-07-17  9:08 UTC (permalink / raw)
  To: malc; +Cc: vyasevich, nhorman, marcelo.leitner, Linux SCTP, netdev,
	kernel-team

Malc, thanks for taking a look.

I'm able to trigger the problem on non-SMP virtme with 4GiB ram, but
I'm not able to trigger it on my SMP host with 16GiB.

The slab info from dmesg (on 4GiB run):
Unreclaimable slab info:
SCTPv6                 31068KB      31068KB
sctp_chunk             24321KB      24990KB
sctp_bind_bucket         972KB        972KB
skbuff_head_cache      28484KB      29051KB
kmalloc-8k                82KB        148KB
kmalloc-4k             81897KB      82943KB
kmalloc-2k               314KB        382KB
kmalloc-1k             27446KB      29547KB
kmalloc-512            30312KB      30915KB

The biggest issue is that the OOM is often unrecoverable:

---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
Out of memory and no killable processes...
Kernel panic - not syncing: System is deadlocked on memory

I noticed the sctp_mem toggle. Would tweaking it change anything?
net.sctp.sctp_mem = 80976    107969    161952
net.sctp.sctp_rmem = 4096    865500    3455008
net.sctp.sctp_wmem = 4096    16384    3455008

For the record, stuffing "shutdown(sd, SHUT_RDWR)" before the "close"
doesn't solve the problem.

Marek

On Wed, Jul 17, 2019 at 1:59 AM malc <mlashley@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 10:49 PM Marek Majkowski <marek@cloudflare.com> wrote:
> >
> > Morning,
> >
> > My poor man's fuzzer found something interesting in SCTP. It seems
> > like creating a large number of SCTP sockets + some magic dance upsets
> > a memory subsystem related to SCTP. The sequence:
> >
> >  - create SCTP socket
> >  - call setsockopt (SCTP_EVENTS)
> >  - call bind(::1, port)
> >  - call sendmsg(long buffer, MSG_CONFIRM, ::1, port)
> >  - close SCTP socket
> >  - repeat a couple thousand times
> >
> > Full code:
> > https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c
> >
> > I'm running it on virtme the simplest way:
> > $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory
> > 512M --script-sh ./test_sctp
> >
> > Originally I was running it inside a net namespace, and just having a
> > loopback interface is sufficient to trigger the problem.
> >
> > The kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor).
> > In some tests I saw a message that might indicate something funny
> > hitting the neighbor table:
> >
> > neighbour: ndisc_cache: neighbor table overflow!
> >
> > I'm not addr-decoding the stack trace, since it seems unrelated to the
> > root cause.
> >
> > Cheers,
> >     Marek
>
> I _think_ this is an 'expected' peculiarity of SCTP on loopback - your
> test_sctp.c ends up creating actual associations to itself on the same
> socket (you can test safely by reducing the port range (say
> 30000-32000) and setting the for-loop clause to 'run < 1').
> You'll see a bunch of associations established like the following
> (note that the kernel was dropping packets for this capture - even with
> /only/ 2000 sockets used...)
>
> $ tshark -r sctp.pcap -Y 'sctp.assoc_index==4'
>   21 0.000409127          ::1 → ::1           SCTP INIT
>   22 0.000436281          ::1 → ::1           SCTP INIT_ACK
>   23 0.000442106          ::1 → ::1           SCTP COOKIE_ECHO
>   24 0.000463007          ::1 → ::1           SCTP COOKIE_ACK DATA
> (Message Fragment)
>                                               presumably your close()
> happens here and we enter SHUTDOWN-PENDING, where we wait for pending
> data to be acknowledged. I'm not convinced that we shouldn't be
> SACK'ing the data from the 'peer' at this point - but for whatever
> reason, we aren't.
>                                               We then run through
> path-max-retrans, and finally ABORT (the ABORT also carries the
> PMR-exceeded indication in the 'Cause Information').
>   25 0.000476083          ::1 → ::1           SCTP SACK
> 13619 3.017788109          ::1 → ::1           SCTP DATA (retransmission)
> 14022 3.222690889          ::1 → ::1           SCTP SACK
> 18922 21.938217449          ::1 → ::1           SCTP SACK
> 33476 69.831029904          ::1 → ::1           SCTP HEARTBEAT
> 33561 69.831310796          ::1 → ::1           SCTP HEARTBEAT_ACK
> 40816 94.102667600          ::1 → ::1           SCTP SACK
> 40910 95.942741287          ::1 → ::1           SCTP DATA (retransmission)
> 41039 96.152023010          ::1 → ::1           SCTP SACK
> 41100 100.182685237          ::1 → ::1           SCTP SACK
> 41212 108.230746764          ::1 → ::1           SCTP DATA (retransmission)
> 41345 108.439061392          ::1 → ::1           SCTP SACK
> 41407 116.422688507          ::1 → ::1           SCTP HEARTBEAT
> 41413 116.423183124          ::1 → ::1           SCTP HEARTBEAT_ACK
> 41494 124.823749255          ::1 → ::1           SCTP SACK
> 41576 126.663648718          ::1 → ::1           SCTP ABORT
>
> With your entire 512M, you'd only have about 16KB for each of these
> 31K associations tops. I suspect that having a 64KB pending data chunk
> (fragmented ULP msg) per association for >= 90s is what is
> exhausting memory here - although I'm sure Neil or Michael will be
> along to correct me ;-)
>
> What's interesting: as you reduce the payload size, we end up
> bundling DATA from the 'initiator' side (in COOKIE_ECHO), and
> everything works as expected... (the SACK here is for the bundled DATA
> chunk's TSN.)
>
> mlashley@duality /tmp $ tshark -r /tmp/sctp_index4_10K.pcap
>    1 0.000000000          ::1 → ::1          SCTP INIT
>    2 0.000014491          ::1 → ::1          SCTP INIT_ACK
>    3 0.000024190          ::1 → ::1          SCTP COOKIE_ECHO DATA
>    4 0.000034833          ::1 → ::1          SCTP COOKIE_ACK
>    5 0.000040646          ::1 → ::1          SCTP SACK
>    6 0.000050287          ::1 → ::1          SCTP ABORT
>
> In short - SCTP associations /can/ persist after user-space calls
> close() whilst there is outstanding data, for up to path.max.retrans
> retransmissions with the RTO doubling on each T3-rtx expiry.
>
> (My tests on 5.2.0 as it is what I had to hand...)
>
> Cheers,
> malc.

