From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <kafai@fb.com>
Cc: <aditivghag@gmail.com>, <bpf@vger.kernel.org>,
<daniel@iogearbox.net>, <kuniyu@amazon.com>,
<netdev@vger.kernel.org>, <yhs@fb.com>
Subject: Re: [RFC] Socket termination for policy enforcement and load-balancing
Date: Wed, 31 Aug 2022 16:43:25 -0700
Message-ID: <20220831234326.49672-1-kuniyu@amazon.com>
In-Reply-To: <20220831230157.7lchomcdxmvq3qqw@kafai-mbp.dhcp.thefacebook.com>
Thanks for CCing, Martin.
Date: Wed, 31 Aug 2022 16:01:57 -0700
From: Martin KaFai Lau <kafai@fb.com>
> On Wed, Aug 31, 2022 at 09:37:41AM -0700, Aditi Ghag wrote:
> > - Use BPF (sockets) iterator to identify sockets connected to a
> > deleted backend. The BPF (sockets) iterator is network namespace aware
> > so we'll either need to enter every possible container network
> > namespace to identify the affected connections, or adapt the iterator
> > to be without netns checks [3]. This was discussed with my colleague
> > Daniel Borkmann based on the feedback he shared from the LSFMMBPF
> > conference discussions.
> Being able to iterate all sockets across different netns will
> be useful.
>
> It should be doable to ignore the netns check. For udp, a quick
> thought is to have another iter target. eg. "udp_all_netns".
> From the sk, the bpf prog should be able to learn the netns and
> the bpf prog can filter the netns by itself.
>
> The TCP side is going to have an 'optional' per netns ehash table [0] soon,
> not lhash2 (listening hash) though. Ideally, the same bpf
> all-netns iter interface should work similarly for both udp and
> tcp case. Thus, both should be considered and work at the same time.
I'm going to add optional per-netns hash tables for UDP as well. The first
series [1] covered both TCP and UDP but was split, and the UDP part is
pending for now. So, if both series are merged, the TCP and UDP all-netns
iters would have similar logic.
[1]: https://lore.kernel.org/netdev/20220826000445.46552-14-kuniyu@amazon.com/
>
> For udp, something more useful than plain udp_abort() could potentially
> be done. eg. directly connect to another backend (by bpf kfunc?).
> There may be some details in socket locking...etc but should
> be doable and the bpf-iter program could be sleepable also.
> fwiw, we are iterating the tcp socket to retire some older
> bpf-tcp-cc (congestion control) on the long-lived connections
> by bpf_setsockopt(TCP_CONGESTION).
>
> Also, potentially, instead of iterating all,
> a more selective case can be done by
> bpf_prog_test_run()+bpf_sk_lookup_*()+udp_abort().
>
> [0]: https://lore.kernel.org/netdev/20220830191518.77083-1-kuniyu@amazon.com/
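To make the "iterate all netns, filter in the prog" idea above concrete,
here is a minimal userspace sketch in plain C. It is not kernel or BPF
code. struct sock_view, struct backend and abort_backend_sockets() are
hypothetical names made up for illustration, the netns is assumed to be
identified by a cookie value, and setting the "aborted" flag only stands
in for what udp_abort() or tcp_abort() would do in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the socket fields a prog would read. */
struct sock_view {
	uint64_t netns_cookie;	/* identifies the socket's netns (assumed) */
	uint32_t daddr;		/* remote (backend) IPv4 address */
	uint16_t dport;		/* remote (backend) port */
	int aborted;		/* set once the socket has been terminated */
};

/* Hypothetical description of the backend that was deleted. */
struct backend {
	uint32_t addr;
	uint16_t port;
};

/* Return nonzero if the socket is connected to the given backend. */
static int connected_to(const struct sock_view *sk, const struct backend *be)
{
	return sk->daddr == be->addr && sk->dport == be->port;
}

/*
 * Walk every socket regardless of netns, as an all-netns iter would,
 * and do the netns filtering here instead of in the iterator itself.
 * netns_filter == 0 means "match every netns".
 * Returns the number of sockets newly marked as aborted.
 */
static size_t abort_backend_sockets(struct sock_view *socks, size_t n,
				    const struct backend *be,
				    uint64_t netns_filter)
{
	size_t hit = 0;

	assert(socks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct sock_view *sk = &socks[i];

		if (netns_filter && sk->netns_cookie != netns_filter)
			continue;	/* different netns, skip */
		if (!connected_to(sk, be))
			continue;	/* different backend, skip */
		if (sk->aborted)
			continue;	/* already terminated */

		sk->aborted = 1;	/* stand-in for udp_abort()/tcp_abort() */
		hit++;
	}

	return hit;
}
```

The point of the sketch is only that the iterator itself stays netns
agnostic and the policy (which netns, which backend) lives in the caller,
so the same logic could apply to both the TCP and UDP cases.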