From: John Fastabend <john.fastabend@gmail.com>
To: wangyufen <wangyufen@huawei.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Jakub Sitnicki <jakub@cloudflare.com>
Cc: ast@kernel.org, john.fastabend@gmail.com, lmb@cloudflare.com,
davem@davemloft.net, kafai@fb.com, dsahern@kernel.org,
kuba@kernel.org, songliubraving@fb.com, yhs@fb.com,
kpsingh@kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
Date: Tue, 15 Mar 2022 22:23:58 -0700
Message-ID: <6231746e8e561_ad0208bf@john.notmuch>
In-Reply-To: <f5a45e95-bac2-e1be-2d7b-5e6d55f9b408@huawei.com>
wangyufen wrote:
>
> On 2022/3/16 0:25, Daniel Borkmann wrote:
> > On 3/15/22 1:12 PM, Jakub Sitnicki wrote:
> >> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> >>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> >>>>> the sockmap element, the tcp socket will switch to use the TCP protocol
> >>>>> stack to send and receive packets. The switching process may cause some
> >>>>> issues, such as if some msgs exist in the ingress queue and are cleared
> >>>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
> >>>>>
> >>>>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> >>>>> ---
> >>>> Can you please tell us a bit more about the life-cycle of the socket in
> >>>> your workload? Questions that come to mind:
> >>>>
> >>>> 1) What triggers the removal of the socket from sockmap in your case?
> >>> We use sk_msg to redirect with sock hash, like this:
> >>>
> >>> skA redirect skB
> >>> Tx <-----------> skB,Rx
> >>>
> >>> And construct a scenario where the packet sending speed is high, the
> >>> packet receiving speed is slow, so the packets are stacked in the ingress
> >>> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> >>> delete the sockmap entry, will trigger the following procedure:
> >>>
> >>> sock_hash_delete_elem()
> >>>   sock_map_unref()
> >>>     sk_psock_put()
> >>>       sk_psock_drop()
> >>>         sk_psock_stop()
> >>>           __sk_psock_zap_ingress()
> >>>             __sk_psock_purge_ingress_msg()
> >>>
> >>>> 2) Would it still be a problem if removal from sockmap did not cause any
> >>>> packets to get dropped?
> >>> Yes, it still be a problem. If removal from sockmap did not cause any
> >>> packets to get dropped, packet receiving process switches to use TCP
> >>> protocol stack. The packets in the psock ingress queue cannot be received
> >>> by the user.
> >>
> >> Thanks for the context. So, if I understand correctly, you want to avoid
> >> breaking the network pipe by updating the sockmap from user-space.
> >>
> >> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?
> >
> > +1
> >
> > Aside from that, the patch as-is also fails BPF CI in a lot of places, please
> > make sure to check selftests:
> >
> > https://github.com/kernel-patches/bpf/runs/5537367301?check_suite_focus=true
> >
> >
> > [...]
> > #145/73 sockmap_listen/sockmap IPv6 test_udp_redir:OK
> > #145/74 sockmap_listen/sockmap IPv6 test_udp_unix_redir:OK
> > #145/75 sockmap_listen/sockmap Unix test_unix_redir:OK
> > #145/76 sockmap_listen/sockmap Unix test_unix_redir:OK
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/77 sockmap_listen/sockhash IPv4 TCP test_insert_invalid:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/78 sockmap_listen/sockhash IPv4 TCP test_insert_opened:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/79 sockmap_listen/sockhash IPv4 TCP test_insert_bound:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > [...]
> >
> > Thanks,
> > Daniel
> > .
>
> I'm not sure about this patch. The main purpose is to point out the
> possible problems when the socket is deleted from the map. I'm sorry for
> the trouble.
>
> Thanks.
If you want to delete a socket you should flush it first. To do this,
stop redirecting traffic to it and then read all the data out. At
the moment it's a bit tricky to know when the receiving socket is
empty though. Adding a flag on delete to only delete when the
ingress qlen == 0 might be a possibility if you need delete to
work and are trying to work out how to safely delete sockets.
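
To make the "read all the data out" step concrete, here is a rough
user-space sketch. It assumes redirects to the socket have already been
stopped by whatever means the application uses. drain_socket() is a
hypothetical helper name, not a kernel or libbpf API. Note that EAGAIN
only tells you nothing is queued at this instant, which is the tricky
part mentioned above, so treat this as an illustration and not a
complete solution.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read everything currently queued on fd without
 * blocking. Returns the number of bytes read, or -1 on a real error.
 * A return on EAGAIN means "empty right now", not "empty forever".
 */
static long drain_socket(int fd)
{
	char buf[4096];
	long total = 0;

	assert(fd >= 0);

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n > 0) {
			/* A real application would consume buf here. */
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* peer closed the connection */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;	/* nothing queued at the moment */
		return -1;
	}
}
```

Only after the sender has stopped and drain_socket() has returned would
the application go on to delete the sockmap entry.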