* OOM triggered by SCTP
@ 2019-07-16 21:47 Marek Majkowski
2019-07-16 23:20 ` Neil Horman
2019-07-16 23:59 ` malc
0 siblings, 2 replies; 4+ messages in thread
From: Marek Majkowski @ 2019-07-16 21:47 UTC (permalink / raw)
To: vyasevich, nhorman, marcelo.leitner, linux-sctp; +Cc: netdev, kernel-team
Morning,
My poor man's fuzzer found something interesting in SCTP. It seems
like creating large number of SCTP sockets + some magic dance, upsets
a memory subsystem related to SCTP. The sequence:
- create SCTP socket
- call setsockopts (SCTP_EVENTS)
- call bind(::1, port)
- call sendmsg(long buffer, MSG_CONFIRM, ::1, port)
- close SCTP socket
- repeat couple thousand times
Full code:
https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c
I'm running it on virtme the simplest way:
$ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory
512M --script-sh ./test_sctp
Originally I was running it inside net namespace, and just having a
localhost interface is sufficient to trigger the problem.
Kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor).
In some tests I saw a message that might indicate something funny
hitting neighbor table:
neighbour: ndisc_cache: neighbor table overflow!
I'm not addr-decoding the stack trace, since it seems unrelated to the
root cause.
Cheers,
Marek
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: OOM triggered by SCTP 2019-07-16 21:47 OOM triggered by SCTP Marek Majkowski @ 2019-07-16 23:20 ` Neil Horman 2019-07-16 23:59 ` malc 1 sibling, 0 replies; 4+ messages in thread From: Neil Horman @ 2019-07-16 23:20 UTC (permalink / raw) To: Marek Majkowski Cc: vyasevich, marcelo.leitner, linux-sctp, netdev, kernel-team On Tue, Jul 16, 2019 at 11:47:40PM +0200, Marek Majkowski wrote: > Morning, > > My poor man's fuzzer found something interesting in SCTP. It seems > like creating large number of SCTP sockets + some magic dance, upsets > a memory subsystem related to SCTP. The sequence: > > - create SCTP socket > - call setsockopts (SCTP_EVENTS) > - call bind(::1, port) > - call sendmsg(long buffer, MSG_CONFIRM, ::1, port) > - close SCTP socket > - repeat couple thousand times > > Full code: > https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c > > I'm running it on virtme the simplest way: > $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory > 512M --script-sh ./test_sctp > > Originally I was running it inside net namespace, and just having a > localhost interface is sufficient to trigger the problem. > > Kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor). > In some tests I saw a message that might indicate something funny > hitting neighbor table: > > neighbour: ndisc_cache: neighbor table overflow! > > I'm not addr-decoding the stack trace, since it seems unrelated to the > root cause. > Why would you have to decode anything, the decoded stack trace should be available in your demsg log. Cant you just attach that here? Neil > Cheers, > Marek > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: OOM triggered by SCTP 2019-07-16 21:47 OOM triggered by SCTP Marek Majkowski 2019-07-16 23:20 ` Neil Horman @ 2019-07-16 23:59 ` malc 2019-07-17 9:08 ` Marek Majkowski 1 sibling, 1 reply; 4+ messages in thread From: malc @ 2019-07-16 23:59 UTC (permalink / raw) To: Marek Majkowski Cc: vyasevich, nhorman, marcelo.leitner, Linux SCTP, netdev, kernel-team On Tue, Jul 16, 2019 at 10:49 PM Marek Majkowski <marek@cloudflare.com> wrote: > > Morning, > > My poor man's fuzzer found something interesting in SCTP. It seems > like creating large number of SCTP sockets + some magic dance, upsets > a memory subsystem related to SCTP. The sequence: > > - create SCTP socket > - call setsockopts (SCTP_EVENTS) > - call bind(::1, port) > - call sendmsg(long buffer, MSG_CONFIRM, ::1, port) > - close SCTP socket > - repeat couple thousand times > > Full code: > https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c > > I'm running it on virtme the simplest way: > $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory > 512M --script-sh ./test_sctp > > Originally I was running it inside net namespace, and just having a > localhost interface is sufficient to trigger the problem. > > Kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor). > In some tests I saw a message that might indicate something funny > hitting neighbor table: > > neighbour: ndisc_cache: neighbor table overflow! > > I'm not addr-decoding the stack trace, since it seems unrelated to the > root cause. > > Cheers, > Marek I _think_ this is an 'expected' peculiarity of SCTP on loopback - you test_sctp.c ends up creating actual associations to itself on the same socket (you can test safely by reducing the port range (say 30000-32000) and setting the for-loop-clause to 'run < 1') You'll see a bunch of associations established like the following (note that I(kernel) was dropping packets for this capture - even with /only/ 2000 sockets used...) $ tshark -r sctp.pcap -Y 'sctp.assoc_index==4' 21 0.000409127 ::1 → ::1 SCTP INIT 22 0.000436281 ::1 → ::1 SCTP INIT_ACK 23 0.000442106 ::1 → ::1 SCTP COOKIE_ECHO 24 0.000463007 ::1 → ::1 SCTP COOKIE_ACK DATA (Message Fragment) presumably your close() happens here and we enter SHUTDOWN-PENDING, where we wait for pending data to be acknowledged, I'm not convinced that we shouldn't be SACK'ing the data from the 'peer' at this point - but for whatever reason, we aren't. We then run thru path-max-retrans, and finally ABORT (the abort indication also shows the PMR-exceeded indication in the 'Cause Information') 25 0.000476083 ::1 → ::1 SCTP SACK 13619 3.017788109 ::1 → ::1 SCTP DATA (retransmission) 14022 3.222690889 ::1 → ::1 SCTP SACK 18922 21.938217449 ::1 → ::1 SCTP SACK 33476 69.831029904 ::1 → ::1 SCTP HEARTBEAT 33561 69.831310796 ::1 → ::1 SCTP HEARTBEAT_ACK 40816 94.102667600 ::1 → ::1 SCTP SACK 40910 95.942741287 ::1 → ::1 SCTP DATA (retransmission) 41039 96.152023010 ::1 → ::1 SCTP SACK 41100 100.182685237 ::1 → ::1 SCTP SACK 41212 108.230746764 ::1 → ::1 SCTP DATA (retransmission) 41345 108.439061392 ::1 → ::1 SCTP SACK 41407 116.422688507 ::1 → ::1 SCTP HEARTBEAT 41413 116.423183124 ::1 → ::1 SCTP HEARTBEAT_ACK 41494 124.823749255 ::1 → ::1 SCTP SACK 41576 126.663648718 ::1 → ::1 SCTP ABORT With your entire 512M - you'd only have about 16KB for each of these 31K associations tops, I suspect that having a 64KB pending data chunk (fragmented ULP msg) for each association for >= 90s is what is exhausting memory here - although I'm sure Neil or Michael will be along to correct me ;-) What's interesting - as you reduce the payload size - we end up bundling DATA from the 'initiator' side (in COOKIE ECHO) - and everything works as expected... (the SACK here is for the bundled DATA chunks TSN. mlashley@duality /tmp $ tshark -r /tmp/sctp_index4_10K.pcap 1 0.000000000 ::1 → ::1 SCTP INIT 2 0.000014491 ::1 → ::1 SCTP INIT_ACK 3 0.000024190 ::1 → ::1 SCTP COOKIE_ECHO DATA 4 0.000034833 ::1 → ::1 SCTP COOKIE_ACK 5 0.000040646 ::1 → ::1 SCTP SACK 6 0.000050287 ::1 → ::1 SCTP ABORT In short - the SCTP associations /can/ persist after user-space calls close() whilst there is outstanding data (for path.max.retrans * rto-with-doubling[due to T3-rtx expiry]) (My tests on 5.2.0 as it is what I had to hand...) Cheers, malc. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: OOM triggered by SCTP 2019-07-16 23:59 ` malc @ 2019-07-17 9:08 ` Marek Majkowski 0 siblings, 0 replies; 4+ messages in thread From: Marek Majkowski @ 2019-07-17 9:08 UTC (permalink / raw) To: malc; +Cc: vyasevich, nhorman, marcelo.leitner, Linux SCTP, netdev, kernel-team Malc, thanks taking a look. I'm able to trigger the problem on non-SMP virtme with 4GiB ram, but I'm not able to trigger it on my SMP host with 16GiB. The slab info from dmesg (on 4GiB run): Unreclaimable slab info: SCTPv6 31068KB 31068KB sctp_chunk 24321KB 24990KB sctp_bind_bucket 972KB 972KB skbuff_head_cache 28484KB 29051KB kmalloc-8k 82KB 148KB kmalloc-4k 81897KB 82943KB kmalloc-2k 314KB 382KB kmalloc-1k 27446KB 29547KB kmalloc-512 30312KB 30915KB The biggest issue is that the OOM is often unrecoverable: ---[ end Kernel panic - not syncing: System is deadlocked on memory ]--- Out of memory and no killable processes... Kernel panic - not syncing: System is deadlocked on memory I noticed sctp_mem toggle. Would tweaking it change anything? net.sctp.sctp_mem = 80976 107969 161952 net.sctp.sctp_rmem = 4096 865500 3455008 net.sctp.sctp_wmem = 4096 16384 3455008 For the record, stuffing "shutdown(sd, SHUT_RDWR)" before the "close" doesn't solve the problem. Marek On Wed, Jul 17, 2019 at 1:59 AM malc <mlashley@gmail.com> wrote: > > On Tue, Jul 16, 2019 at 10:49 PM Marek Majkowski <marek@cloudflare.com> wrote: > > > > Morning, > > > > My poor man's fuzzer found something interesting in SCTP. It seems > > like creating large number of SCTP sockets + some magic dance, upsets > > a memory subsystem related to SCTP. The sequence: > > > > - create SCTP socket > > - call setsockopts (SCTP_EVENTS) > > - call bind(::1, port) > > - call sendmsg(long buffer, MSG_CONFIRM, ::1, port) > > - close SCTP socket > > - repeat couple thousand times > > > > Full code: > > https://gist.github.com/majek/bd083dae769804d39134ce01f4f802bb#file-test_sctp-c > > > > I'm running it on virtme the simplest way: > > $ virtme-run --show-boot-console --rw --pwd --kimg bzImage --memory > > 512M --script-sh ./test_sctp > > > > Originally I was running it inside net namespace, and just having a > > localhost interface is sufficient to trigger the problem. > > > > Kernel is 5.2.1 (with KASAN and such, but that shouldn't be a factor). > > In some tests I saw a message that might indicate something funny > > hitting neighbor table: > > > > neighbour: ndisc_cache: neighbor table overflow! > > > > I'm not addr-decoding the stack trace, since it seems unrelated to the > > root cause. > > > > Cheers, > > Marek > > I _think_ this is an 'expected' peculiarity of SCTP on loopback - you > test_sctp.c ends up creating actual associations to itself on the same > socket (you can test safely by reducing the port range (say > 30000-32000) and setting the for-loop-clause to 'run < 1') > You'll see a bunch of associations established like the following > (note that I(kernel) was dropping packets for this capture - even with > /only/ 2000 sockets used...) > > $ tshark -r sctp.pcap -Y 'sctp.assoc_index==4' > 21 0.000409127 ::1 → ::1 SCTP INIT > 22 0.000436281 ::1 → ::1 SCTP INIT_ACK > 23 0.000442106 ::1 → ::1 SCTP COOKIE_ECHO > 24 0.000463007 ::1 → ::1 SCTP COOKIE_ACK DATA > (Message Fragment) > presumably your close() > happens here and we enter SHUTDOWN-PENDING, where we wait for pending > data to be acknowledged, I'm not convinced that we shouldn't be > SACK'ing the data from the 'peer' at this point - but for whatever > reason, we aren't. > We then run thru > path-max-retrans, and finally ABORT (the abort indication also shows > the PMR-exceeded indication in the 'Cause Information') > 25 0.000476083 ::1 → ::1 SCTP SACK > 13619 3.017788109 ::1 → ::1 SCTP DATA (retransmission) > 14022 3.222690889 ::1 → ::1 SCTP SACK > 18922 21.938217449 ::1 → ::1 SCTP SACK > 33476 69.831029904 ::1 → ::1 SCTP HEARTBEAT > 33561 69.831310796 ::1 → ::1 SCTP HEARTBEAT_ACK > 40816 94.102667600 ::1 → ::1 SCTP SACK > 40910 95.942741287 ::1 → ::1 SCTP DATA (retransmission) > 41039 96.152023010 ::1 → ::1 SCTP SACK > 41100 100.182685237 ::1 → ::1 SCTP SACK > 41212 108.230746764 ::1 → ::1 SCTP DATA (retransmission) > 41345 108.439061392 ::1 → ::1 SCTP SACK > 41407 116.422688507 ::1 → ::1 SCTP HEARTBEAT > 41413 116.423183124 ::1 → ::1 SCTP HEARTBEAT_ACK > 41494 124.823749255 ::1 → ::1 SCTP SACK > 41576 126.663648718 ::1 → ::1 SCTP ABORT > > With your entire 512M - you'd only have about 16KB for each of these > 31K associations tops, I suspect that having a 64KB pending data chunk > (fragmented ULP msg) for each association for >= 90s is what is > exhausting memory here - although I'm sure Neil or Michael will be > along to correct me ;-) > > What's interesting - as you reduce the payload size - we end up > bundling DATA from the 'initiator' side (in COOKIE ECHO) - and > everything works as expected... (the SACK here is for the bundled DATA > chunks TSN. > > mlashley@duality /tmp $ tshark -r /tmp/sctp_index4_10K.pcap > 1 0.000000000 ::1 → ::1 SCTP INIT > 2 0.000014491 ::1 → ::1 SCTP INIT_ACK > 3 0.000024190 ::1 → ::1 SCTP COOKIE_ECHO DATA > 4 0.000034833 ::1 → ::1 SCTP COOKIE_ACK > 5 0.000040646 ::1 → ::1 SCTP SACK > 6 0.000050287 ::1 → ::1 SCTP ABORT > > In short - the SCTP associations /can/ persist after user-space calls > close() whilst there is outstanding data (for path.max.retrans * > rto-with-doubling[due to T3-rtx expiry]) > > (My tests on 5.2.0 as it is what I had to hand...) > > Cheers, > malc. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-07-17 9:09 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-07-16 21:47 OOM triggered by SCTP Marek Majkowski 2019-07-16 23:20 ` Neil Horman 2019-07-16 23:59 ` malc 2019-07-17 9:08 ` Marek Majkowski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).