From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jordan Rife <jrife@google.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
Daniel Borkmann <daniel@iogearbox.net>,
Yonghong Song <yonghong.song@linux.dev>,
Aditi Ghag <aditi.ghag@isovalent.com>
Subject: Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators
Date: Tue, 18 Mar 2025 16:32:55 -0700
Message-ID: <08387a7e-55b0-4499-a225-07207453c8d5@linux.dev>
In-Reply-To: <CADKFtnQyiz_r_vfyYfTvzi3MvNpRt62mDrNyEvp9tm82UcSFjQ@mail.gmail.com>
On 3/18/25 4:09 PM, Jordan Rife wrote:
> To add to this, I actually encountered some strange behavior today
> where using bpf_sock_destroy actually /causes/ sockets to repeat
> during iteration. In my environment, I just have one socket in a
> network namespace with a socket iterator that destroys it. The
> iterator visits the same socket twice and calls bpf_sock_destroy twice
> as a result. In the UDP case (and maybe TCP, I haven't checked)
> bpf_sock_destroy() can call udp_abort (sk->sk_prot->diag_destroy()) ->
> __udp_disconnect() -> udp_v4_rehash() (sk->sk_prot->rehash(sk)) which
> rehashes the socket and moves it to a new bucket. Depending on where a
> socket lands, you may encounter it again as you progress through the
> buckets. Doing some inspection with bpftrace seems to confirm this. As
> opposed to the edge cases I described before, this is more likely. I
> noticed this when I tried to use bpf_seq_write to write something for
> every socket that got deleted for an accurate count at the end in
> userspace which seems like a fairly valid use case.
imo, this is not a problem for bpf. The bpf prog has access to many fields of a
udp_sock (ip addresses, ports, state, etc.) to make the right decision. The bpf
prog can decide whether the rehashed socket still needs bpf_sock_destroy(), e.g.
by checking the saddr in this case, since inet_reset_saddr(sk) runs before the
rehash. From the bpf prog's pov, the rehashed udp_sock is not much different from
a new udp_sock that userspace adds to a later bucket.