BPF List
 help / color / mirror / Atom feed
From: Dave Marchevsky <davemarchevsky@fb.com>
To: <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Dave Marchevsky <davemarchevsky@fb.com>
Subject: [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map
Date: Tue, 30 Aug 2022 10:27:41 -0700	[thread overview]
Message-ID: <20220830172759.4069786-1-davemarchevsky@fb.com> (raw)

[ RFCv2: This RFC focuses on locking improvements. Some feedback from RFCv1
feedback and other discussions is not yet applied e.g. conversion to kfuncs,
dropping open-coded iter. This isn't meant to imply that I disagree with this
feedback, but rather want to make it easier to compare locking changes to
RFCv1. Please see changelog at end of this cover letter for a list of
newly-added patches and major changes worth looking at ]

Introduce a new map type, bpf_rbtree_map, allowing BPF programs to
create and manipulate rbtree data structures. bpf_rbtree_map differs
from 'classic' BPF map patterns in a few important ways:

  * The map does not allocate its own elements. Instead,
    BPF programs must call bpf_rbtree_alloc_node helper to allocate and
    bpf_rbtree_add to add map elem - referred to as 'node' from now on -
    to the rbtree. This means that rbtree maps can grow dynamically, do
    not preallocate, and that 'max_entries' has no meaning for rbtree
    maps.
    * Separating allocation and insertion allows alloc_node call to
      occur in contexts where it's safe to allocate.

  * It's possible to remove a node from a rbtree map with
    bpf_rbtree_remove helper.
    * Goal here is to allow a node to be removed from one rbtree and
      added to another
      [ NOTE: This functionality is still in progress ]

Helpers are added to manipulate nodes and trees:
  * bpf_rbtree_{alloc,free}_node: Allocate / free node structs
  * bpf_rbtree_{add,remove}: Add / remove nodes from rbtree maps
    * A comparator function is passed to bpf_rbtree_add in order to
      find the correct place to add the node.
  * bpf_rbtree_find: Find a node matching some condition in the rbtree
    * A comparator function is passed in order to determine whether a
      node matches what's being searched for.

bpf_rbtree_add is very similar to the 'map_push_elem' builtin, but since
verifier needs special logic to setup the comparator callback a new
helper is added. Same for bpf_rbtree_find and 'map_lookup_elem' builtin.

In order to safely support separate allocation / insertion and passing
nodes between rbtrees, some invariants must hold:

  * A node that is not in a rbtree map must either be free'd or added to
    a rbtree map before the program terminates
    * Nodes are in this state when returned from bpf_rbtree_alloc_node
      or bpf_rbtree_remove.

If a node is in a rbtree map it is 'owned' by the map, otherwise it is
owned by the BPF program which holds a reference to it. Owner is
responsible for the lifetime of the node.

This matches existing acquire / release semantics well. node_alloc and
remove operations 'acquire' a node while add and node_free operations
'release' the node. The verifier enforces that acquired nodes are
released before program terminates.

Some other implementation details:

  * The value type of an rbtree map is expected to be a struct which
    contains 'struct rb_node' at offset 0.
  * BPF programs may not modify the node struct's rb_node field.
    * Otherwise the tree could be left in corrupted state
  * Rbtree map is value-only. Keys have no meaning
  * Since the value type is not known until the rbtree map is
    instantiated, helper protos have input and return type
    'struct rb_node *' which verifier replaces with actual
    map value type.
  * [ TODO: Existing logic prevents any writes to PTR_TO_BTF_ID. This
    broadly turned off in this patch and replaced with "no writes to
    struct rb_node is PTR_TO_BTF_ID struct has one". This is a hack and
    needs to be replaced. ]

Changelog:

RFCv1 -> RFCv2: lore.kernel.org/bpf/20220722183438.3319790-1-davemarchevsky@fb.com
Major changes:
  * Add new patch ("bpf: Add verifier check for BPF_PTR_POISON retval and arg"),
    use BPF_PTR_POISON as placeholder btf_id for helpers which return or
    manipulate structs which contain rbtree node. (Alexei)
    * Migrate all rbtree helpers to use BPF_PTR_POISON
    * Testing this patch uncovered that rbtree_find was not in
      function_returns_rbtree_node check, add it. Rbtree_add was not
      setting a return btf_id type, fix that as well.

  * Add new patch ("libbpf: Add support for private BSS map section")
    allowing struct bpf_spin_lock declaration in special internal map

  * Add new patches improving ergonomics of associating lock with rbtree(1),
    teaching verifier to track rbtree locks(2), and teaching verifier to reject
    programs calling rbtree helpers without holding the necessary lock
    * (1): "bpf: Support declarative association of lock with rbtree map" (3)
    * (2): "bpf: Verifier tracking of rbtree_spin_lock held"
    * (3): "bpf: Check rbtree lock held during verification"
    * These are the changes most worth a look. Each commit has associated test
      change patch also added at the end of the series. Test patches are left
      unsquashed to ease RFC review.

Minor changes:
  * patch "bpf: Add rbtree map"
    * Alloc nodes w/ GFP_NOWAIT instead of GFP_KERNEL (Yonghong, Alexei)
    * Rename access_may_touch_field -> access_can_write_field (Alexei)
  * patch "bpf: Add CONDITIONAL_RELEASE type flag"
    * No need to return int from mark_ptr_or_null_regs, go back to returning
      void (Alexei)

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Dave Marchevsky (18):
  bpf: Add verifier support for custom callback return range
  bpf: Add verifier check for BPF_PTR_POISON retval and arg
  bpf: Add rb_node_off to bpf_map
  bpf: Add rbtree map
  libbpf: Add support for private BSS map section
  bpf: Add bpf_spin_lock member to rbtree
  bpf: Add bpf_rbtree_{lock,unlock} helpers
  bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find}
  bpf: Support declarative association of lock with rbtree map
  bpf: Verifier tracking of rbtree_spin_lock held
  bpf: Check rbtree lock held during verification
  bpf: Add OBJ_NON_OWNING_REF type flag
  bpf: Add CONDITIONAL_RELEASE type flag
  bpf: Introduce PTR_ITER and PTR_ITER_END type flags
  selftests/bpf: Add rbtree map tests
  selftests/bpf: Declarative lock definition test changes
  selftests/bpf: Lock tracking test changes
  selftests/bpf: Rbtree static lock verification test changes

 include/linux/bpf.h                           |  17 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/bpf_verifier.h                  |   3 +
 include/linux/btf.h                           |   1 +
 include/linux/poison.h                        |   3 +
 include/uapi/linux/bpf.h                      | 120 ++++
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/btf.c                              |  21 +
 kernel/bpf/helpers.c                          |  48 +-
 kernel/bpf/rbtree.c                           | 442 ++++++++++++++
 kernel/bpf/syscall.c                          |  11 +-
 kernel/bpf/verifier.c                         | 546 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h                | 120 ++++
 tools/lib/bpf/libbpf.c                        | 164 +++++-
 .../selftests/bpf/prog_tests/rbtree_map.c     | 134 +++++
 .../testing/selftests/bpf/progs/rbtree_map.c  | 119 ++++
 .../selftests/bpf/progs/rbtree_map_fail.c     | 243 ++++++++
 .../bpf/progs/rbtree_map_load_fail.c          |  31 +
 18 files changed, 1951 insertions(+), 75 deletions(-)
 create mode 100644 kernel/bpf/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c

-- 
2.30.2


             reply	other threads:[~2022-08-30 17:49 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-30 17:27 Dave Marchevsky [this message]
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
2022-09-01 21:01   ` Joanne Koong
2022-09-06 23:42     ` Dave Marchevsky
2022-09-07  1:53       ` Alexei Starovoitov
2022-09-08 21:36         ` Dave Marchevsky
2022-09-08 21:40           ` Alexei Starovoitov
2022-09-08 23:10             ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking " Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification " Dave Marchevsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220830172759.4069786-1-davemarchevsky@fb.com \
    --to=davemarchevsky@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox