BPF List
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Jussi Maki <joamaki@gmail.com>
Subject: Re: sockmap test is broken
Date: Wed, 17 Nov 2021 21:20:46 -0800	[thread overview]
Message-ID: <6195e2ae7a82f_2b4cc20884@john.notmuch> (raw)
In-Reply-To: <CAEf4Bza6HHeVTFxrmPJRUgsLYU7g06MctMoAGy3ayKq8ES9FTQ@mail.gmail.com>

Andrii Nakryiko wrote:
> On Wed, Nov 17, 2021 at 10:00 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Andrii Nakryiko wrote:
> > > On Sun, Nov 14, 2021 at 9:21 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Alexei Starovoitov wrote:
> > > > > test_maps is failing in bpf tree:
> > > > >
> > > > > $ ./test_maps
> > > > > Failed sockmap recv
> > > > >
> > > > > and causing BPF CI to stay red.
> > > > >
> > > > > Since bpf-next is fine, I suspect it is one of John's or Jussi's patches.
> > > > >
> > > > > Please take a look.
> > > >
> > > > I'll look into it thanks.
> > >
> > > Any updates, John? Should we just disable test_maps in CI to make it
> > > useful again?
> >
> > I'm debugging this now. Hopefully I'll have a fix shortly (today I hope).
> > Maybe, it makes sense to wait for EOD and if I still don't have the fix
> > disable it then. Anyways fixing it is top of list now.
> 
> Sounds good, let's hope you find it and fix it today.

OK got the fix, but its fairly subtle. Whats happening is when socks are
removed from a map their programs are not actually being removed. They
continue to live with the sock for the lifetime of the socket or until
the last reference held from BPF side is lost. At which point all progs
are dropped and socket returns to normal/preBPF state. We never noticed
it on our real use cases because once we move sockets into BPF we never
release them until the socket is free. The fix is to null the set progs
and then do the update_sk_prot call which will decide based on the
configured programs what proto ops need to be set to.

So why did the sockmap test work earlier. Before I 'fixed' the logic to
use a recv hook that didn't fall back into normal tcp recv handler the
strparser was being detatched so no packets made it to the BPF layer and
then eventually, with some delay and suboptimal code, we used to fall back
to normal tcp stack recv hook. Packet would then end up being recieved
via recv() and the test was happy. Per original patch its slightly racy
to do this fallback because you don't know, maybe a packet shows up
just as you do these checks and you consume it via old tcp handlers
and skip over your parser or whatever BPF logic is there.

I'll test the below tomorrow and create a commit msg so we have a chance
at recalling details in a month or so. Its a bit subtle to send out
this evening I think and would like to run it through some of our
own Cilium tests.

If your willing to wait another day we should be able to get the
CI going again. Either way fix on its way soon just not tonight.

---

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 1ae52ac943f6..8eb671c827f9 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1124,6 +1124,8 @@ void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
 
 void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock)
 {
+       psock_set_prog(&psock->progs.stream_parser, NULL);
+
        if (!psock->saved_data_ready)
                return;
 
@@ -1212,6 +1214,9 @@ void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
 
 void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
 {
+       psock_set_prog(&psock->progs.stream_verdict, NULL);
+       psock_set_prog(&psock->progs.skb_verdict, NULL);
+
        if (!psock->saved_data_ready)
                return;
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index f39ef79ced67..4ca4b11f4e5f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -167,8 +167,11 @@ static void sock_map_del_link(struct sock *sk,
                write_lock_bh(&sk->sk_callback_lock);
                if (strp_stop)
                        sk_psock_stop_strp(sk, psock);
-               else
+               if (verdict_stop)
                        sk_psock_stop_verdict(sk, psock);
+
+               if (psock->psock_update_sk_prot)
+                       psock->psock_update_sk_prot(sk, psock, false);
                write_unlock_bh(&sk->sk_callback_lock);

  reply	other threads:[~2021-11-18  5:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-14 18:43 sockmap test is broken Alexei Starovoitov
2021-11-15  5:21 ` John Fastabend
2021-11-15 19:54   ` sunyucong
2021-11-17 17:48   ` Andrii Nakryiko
2021-11-17 18:00     ` John Fastabend
2021-11-17 18:08       ` Andrii Nakryiko
2021-11-18  5:20         ` John Fastabend [this message]
2021-11-19 18:18           ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6195e2ae7a82f_2b4cc20884@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joamaki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox