public inbox for bpf@vger.kernel.org
From: Kui-Feng Lee <sinquersw@gmail.com>
To: Kui-Feng Lee <kuifeng@meta.com>,
	bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev,
	song@kernel.org, kernel-team@meta.com, andrii@kernel.org,
	sdf@google.com
Subject: Re: [PATCH bpf-next v6 0/8] Transit between BPF TCP congestion controls.
Date: Fri, 10 Mar 2023 08:28:59 -0800	[thread overview]
Message-ID: <7e0b5974-0518-fe8d-0485-a8b2b73059cb@gmail.com> (raw)
In-Reply-To: <20230310043812.3087672-1-kuifeng@meta.com>



On 3/9/23 20:38, Kui-Feng Lee wrote:
> Major changes:
> 
>   - Create bpf_links in the kernel for BPF struct_ops to register and
>     unregister it.
> 
>   - Enables switching between implementations of bpf-tcp-cc under a
>     name instantly by replacing the backing struct_ops map of a
>     bpf_link.
> 
> Previously, a BPF struct_ops did not go away even after the user
> program that created it terminated, since none of these maps were
> ever pinned. For instance, the TCP congestion control subsystem
> indirectly maintains a reference count on the struct_ops of any
> registered BPF-implemented algorithm, so the algorithm is not
> deactivated until someone deliberately unregisters it.  For
> consistency with other BPF programs, bpf_links are now created to
> work in coordination with struct_ops maps. This ensures that a map
> is registered when its bpf_link is created and unregistered when
> the bpf_link is destroyed.
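
To illustrate the lifetime semantics described in the paragraph above,
here is a rough userspace sketch. The skeleton name `tcp_ca_update` and
map name `ca_update_1` are hypothetical placeholders;
`bpf_map__attach_struct_ops()` and `bpf_link__destroy()` are existing
libbpf APIs. This is not code from the patches themselves, and it needs
a BPF-capable kernel and CAP_BPF to actually run:

```c
/* Sketch: tie a struct_ops registration to a bpf_link's lifetime.
 * Skeleton and map names below are hypothetical placeholders.
 */
#include <bpf/libbpf.h>
#include "tcp_ca_update.skel.h"   /* hypothetical generated skeleton */

int main(void)
{
	struct tcp_ca_update *skel;
	struct bpf_link *link;
	int err = 1;

	skel = tcp_ca_update__open_and_load();
	if (!skel)
		return 1;

	/* Registers the congestion control algorithm; with the
	 * link-based scheme, the registration is owned by the
	 * returned link rather than by the map itself.
	 */
	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
	if (!link)
		goto out;

	/* ... the algorithm stays registered while the link exists ... */

	/* Destroying the link unregisters the algorithm. */
	bpf_link__destroy(link);
	err = 0;
out:
	tcp_ca_update__destroy(skel);
	return err;
}
```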
> 
> We also faced complications when attempting to replace an existing
> TCP congestion control algorithm with a new implementation on the
> fly. A struct_ops map is used to register a TCP congestion control
> algorithm under a unique name, so we had to either register the
> alternative implementation under a new name and switch over, or
> unregister the current one before re-registering under the same
> name.  To fix this problem, we add an option to migrate the
> registration of the algorithm from struct_ops maps to bpf_links. By
> updating the backing map of a bpf_link, it becomes possible to
> replace an existing TCP congestion control algorithm in place.
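
With the APIs this series adds (`bpf_link__update_map()` from patch 6),
the swap described above might look like the following sketch. The map
name `ca_update_2` is a hypothetical placeholder for a second
struct_ops defined in the same BPF object; this is an illustration, not
code from the patches:

```c
/* Sketch: replace the algorithm backing an existing link in place.
 * "ca_update_2" is a hypothetical struct_ops map defined in the BPF
 * object under the new ".struct_ops.link" section.
 */
#include <bpf/libbpf.h>

static int swap_cc(struct bpf_object *obj, struct bpf_link *link)
{
	struct bpf_map *new_map;

	new_map = bpf_object__find_map_by_name(obj, "ca_update_2");
	if (!new_map)
		return -1;

	/* Replace the backing struct_ops map of the link: the old
	 * algorithm is unregistered and the new one takes over under
	 * the same name.
	 */
	return bpf_link__update_map(link, new_map);
}
```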

The major differences from v5:

  - Add a new step to bpf_object__load() to prepare vdata.

  - Accept BPF_F_REPLACE.

  - Check section IDs in find_struct_ops_map_by_offset().

  - Add a test case to check mixing w/ & w/o link struct_ops.

  - Add a test case of using struct_ops w/o link to update a link.

  - Improve bpf_link__detach_struct_ops() to handle the w/ link case.


> 
> The major differences from v4:
> 
>   - Rebase.
> 
>   - Reorder patches and merge part 4 to part 2 of the v4.
> 
> The major differences from v3:
> 
>   - Remove bpf_struct_ops_map_free_rcu(), and use synchronize_rcu().
> 
>   - Improve the commit log of the part 1.
> 
>   - Before transitioning to the READY state, we conduct a value check
>     to ensure that struct_ops can be successfully utilized and links
>     created later.
> 
> The major differences from v2:
> 
>   - Simplify states
> 
>     - Remove TOBEUNREG.
> 
>     - Rename UNREG to READY.
> 
>   - Stop using the refcnt of the kvalue of a struct_ops. Explicitly
>     increase and decrease the refcount of struct_ops.
> 
>   - Prepare kernel vdata during the load phase of libbpf.
> 
> The major differences from v1:
> 
>   - Added bpf_struct_ops_link to replace the previous union-based
>     approach.
> 
>   - Added UNREG and TOBEUNREG to the state of bpf_struct_ops_map.
> 
>     - bpf_struct_ops_transit_state() maintains state transitions.
> 
>   - Fixed synchronization issue.
> 
>   - Prepare kernel vdata of struct_ops during the loading phase of
>     bpf_object.
> 
>   - Merged previous patch 3 to patch 1.
> 
> v5: https://lore.kernel.org/all/20230308005050.255859-1-kuifeng@meta.com/
> v4: https://lore.kernel.org/all/20230307232913.576893-1-andrii@kernel.org/
> v3: https://lore.kernel.org/all/20230303012122.852654-1-kuifeng@meta.com/
> v2: https://lore.kernel.org/bpf/20230223011238.12313-1-kuifeng@meta.com/
> v1: https://lore.kernel.org/bpf/20230214221718.503964-1-kuifeng@meta.com/
> 
> Kui-Feng Lee (8):
>    bpf: Retire the struct_ops map kvalue->refcnt.
>    net: Update an existing TCP congestion control algorithm.
>    bpf: Create links for BPF struct_ops maps.
>    libbpf: Create a bpf_link in bpf_map__attach_struct_ops().
>    bpf: Update the struct_ops of a bpf_link.
>    libbpf: Update a bpf_link with another struct_ops.
>    libbpf: Use .struct_ops.link section to indicate a struct_ops with a
>      link.
>    selftests/bpf: Test switching TCP Congestion Control algorithms.
> 
>   include/linux/bpf.h                           |  10 +
>   include/net/tcp.h                             |   3 +
>   include/uapi/linux/bpf.h                      |  20 +-
>   kernel/bpf/bpf_struct_ops.c                   | 229 +++++++++++++++---
>   kernel/bpf/syscall.c                          |  49 +++-
>   net/bpf/bpf_dummy_struct_ops.c                |   6 +
>   net/ipv4/bpf_tcp_ca.c                         |  14 +-
>   net/ipv4/tcp_cong.c                           |  60 ++++-
>   tools/include/uapi/linux/bpf.h                |  20 +-
>   tools/lib/bpf/libbpf.c                        | 180 +++++++++++---
>   tools/lib/bpf/libbpf.h                        |   1 +
>   tools/lib/bpf/libbpf.map                      |   1 +
>   .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  91 +++++++
>   .../selftests/bpf/progs/tcp_ca_update.c       |  80 ++++++
>   14 files changed, 671 insertions(+), 93 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_update.c
> 

Thread overview: 19+ messages
2023-03-10  4:38 [PATCH bpf-next v6 0/8] Transit between BPF TCP congestion controls Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 1/8] bpf: Retire the struct_ops map kvalue->refcnt Kui-Feng Lee
2023-03-14  6:05   ` Martin KaFai Lau
2023-03-10  4:38 ` [PATCH bpf-next v6 2/8] net: Update an existing TCP congestion control algorithm Kui-Feng Lee
2023-03-10 16:47   ` Stephen Hemminger
2023-03-13 15:46     ` Kui-Feng Lee
2023-03-13 16:43       ` Kui-Feng Lee
2023-03-14  0:28   ` Martin KaFai Lau
2023-03-14  4:31     ` Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 3/8] bpf: Create links for BPF struct_ops maps Kui-Feng Lee
2023-03-14  1:42   ` Martin KaFai Lau
2023-03-16  0:21     ` Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 4/8] libbpf: Create a bpf_link in bpf_map__attach_struct_ops() Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 5/8] bpf: Update the struct_ops of a bpf_link Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 6/8] libbpf: Update a bpf_link with another struct_ops Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 7/8] libbpf: Use .struct_ops.link section to indicate a struct_ops with a link Kui-Feng Lee
2023-03-10  4:38 ` [PATCH bpf-next v6 8/8] selftests/bpf: Test switching TCP Congestion Control algorithms Kui-Feng Lee
2023-03-14  5:04   ` Martin KaFai Lau
2023-03-10 16:28 ` Kui-Feng Lee [this message]
