netdev.vger.kernel.org archive mirror
From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: ast@kernel.org, Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org,
	David Beckett <david.beckett@netronome.com>,
	David Miller <davem@davemloft.net>
Subject: Re: [bpf PATCH v4 1/4] bpf: tls, implement unhash to avoid transition out of ESTABLISHED
Date: Wed, 22 May 2019 09:57:30 -0700
Message-ID: <20190522095730.047ad08f@cakuba.netronome.com>
In-Reply-To: <155746426913.20677.2783358822817593806.stgit@john-XPS-13-9360>

On Thu, 09 May 2019 21:57:49 -0700, John Fastabend wrote:
> It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
> state via tcp_disconnect() without calling into close callback. This
> would allow a kTLS enabled socket to exist outside of ESTABLISHED
> state which is not supported.
> 
> Solve this the same way we solved the sock{map|hash} case by adding
> an unhash hook to tear down the TLS state.
> 
> In the process we also make the close hook more robust. We add a put
> call into the close path, also in the unhash path, to remove the
> reference to ulp data after free. It's no longer valid and may confuse
> things later if the socket (re)enters kTLS code paths. Second we add
> an 'if(ctx)' check to ensure the ctx is still valid and not released
> from a previous unhash/close path.
> 
> Fixes: d91c3e17f75f2 ("net/tls: Only attach to sockets in ESTABLISHED state")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Looks like David Beckett managed to trigger another nasty on the
release path :/

    BUG: kernel NULL pointer dereference, address: 0000000000000012
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
    RIP: 0010:tcp_peek_len+0x10/0x60
    RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
    RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
    RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
    R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
    R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
    FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <IRQ>
     strp_data_ready+0x48/0x90
     tls_data_ready+0x22/0xd0 [tls]
     tcp_rcv_established+0x569/0x620
     tcp_v4_do_rcv+0x127/0x1e0
     tcp_v4_rcv+0xad7/0xbf0
     ip_protocol_deliver_rcu+0x2c/0x1c0
     ip_local_deliver_finish+0x41/0x50
     ip_local_deliver+0x6b/0xe0
     ? ip_protocol_deliver_rcu+0x1c0/0x1c0
     ip_rcv+0x52/0xd0
     ? ip_rcv_finish_core.isra.20+0x380/0x380
     __netif_receive_skb_one_core+0x7e/0x90
     netif_receive_skb_internal+0x42/0xf0
     napi_gro_receive+0xed/0x150
     nfp_net_poll+0x7a2/0xd30 [nfp]
     ? kmem_cache_free_bulk+0x286/0x310
     net_rx_action+0x149/0x3b0
     __do_softirq+0xe3/0x30a
     ? handle_irq_event_percpu+0x6a/0x80
     irq_exit+0xe8/0xf0
     do_IRQ+0x85/0xd0
     common_interrupt+0xf/0xf
     </IRQ>
    RIP: 0010:cpuidle_enter_state+0xbc/0x450

If I read this right, strparser calls sock->ops->peek_len(sock), but
sock->sk is already NULL.  I'm guessing this is because inet_release()
does:

		sock->sk = NULL;
		sk->sk_prot->close(sk, timeout);

And I don't really see a way for ktls to know that sock->sk is about to
be cleared, and therefore no way to stop strparser in time.  Having
strparser check sock->sk itself wouldn't be reliable either, given
tcp_peek_len() will do another dereference of sock->sk :S
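
To make the suspected ordering problem concrete, here is a minimal
userspace sketch, not kernel code.  All model_* names and the
release_order enum are invented for illustration, and the "packet
arrives while close is still running" step is an assumption about where
the softirq lands:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these structs mimic the shape of the kernel's
 * struct sock / struct socket, but every name here is hypothetical. */
struct model_sock {
	int queued_bytes;	/* stand-in for data in the receive queue */
	int strp_running;	/* 1 while the stream parser is attached */
};

struct model_socket {
	struct model_sock *sk;	/* plays the role of sock->sk */
};

enum release_order { CLEAR_BEFORE_CLOSE, CLEAR_AFTER_CLOSE };

/* Plays the role of tcp_peek_len(): it goes through sock->sk.
 * Returns -1 where the real code would fault on a NULL sock->sk. */
static int model_peek_len(struct model_socket *sock)
{
	if (sock->sk == NULL)
		return -1;
	return sock->sk->queued_bytes;
}

/* Plays the role of the data_ready -> strparser path run from softirq. */
static int model_data_ready(struct model_socket *sock, struct model_sock *sk)
{
	if (!sk->strp_running)
		return 0;	/* parser already stopped, nothing to do */
	return model_peek_len(sock);
}

/* Plays the role of sk->sk_prot->close(): assume a packet arrives
 * before close has had a chance to stop the parser. */
static int model_close(struct model_socket *sock, struct model_sock *sk)
{
	int seen = model_data_ready(sock, sk);

	sk->strp_running = 0;
	return seen;
}

/* Plays the role of inet_release() with either assignment order. */
int model_release(enum release_order order, int queued_bytes)
{
	struct model_sock sk = { .queued_bytes = queued_bytes, .strp_running = 1 };
	struct model_socket sock = { .sk = &sk };
	int seen;

	if (order == CLEAR_BEFORE_CLOSE) {
		sock.sk = NULL;			/* current order */
		seen = model_close(&sock, &sk);
	} else {
		seen = model_close(&sock, &sk);
		sock.sk = NULL;			/* proposed order */
	}
	return seen;
}
```

In this model, model_release(CLEAR_BEFORE_CLOSE, 32) returns -1 (the
modelled NULL dereference), while model_release(CLEAR_AFTER_CLOSE, 32)
returns 32.  It says nothing about whether other code relies on
sock->sk being cleared early.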

That's mostly a guess, it takes me half an hour of ktls connections
running to repro.

Any advice would be appreciated..  Can we move the sock->sk assignment
after close?..

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5183a2daba64..aff93e7cdb31 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -428,8 +428,8 @@ int inet_release(struct socket *sock)
                if (sock_flag(sk, SOCK_LINGER) &&
                    !(current->flags & PF_EXITING))
                        timeout = sk->sk_lingertime;
-               sock->sk = NULL;
                sk->sk_prot->close(sk, timeout);
+               sock->sk = NULL;
        }
        return 0;
 }

I don't see IPv6 clearing this pointer, perhaps we don't have to?
We tested it and it seems to work, but this is pre-git code, so
it's hard to tell what the reason to clear was :)
