From: sdf@google.com
To: Aditi Ghag <aditi.ghag@isovalent.com>
Cc: bpf@vger.kernel.org, kafai@fb.com, edumazet@google.com
Subject: Re: [PATCH 1/2] bpf: Add socket destroy capability
Date: Mon, 19 Dec 2022 10:22:25 -0800 [thread overview]
Message-ID: <Y6Cr4X4h0buvET8U@google.com> (raw)
In-Reply-To: <c3b935a5a72b1371f9262348616a7fa84061b85f.1671242108.git.aditi.ghag@isovalent.com>
On 12/17, Aditi Ghag wrote:
> The socket destroy helper is used to
> forcefully terminate sockets from certain
> BPF contexts. We plan to use the capability
> in Cilium to force client sockets to reconnect
> when their remote load-balancing backends are
> deleted. The other use case is on-the-fly
> policy enforcement where existing socket
> connections prevented by policies need to
> be terminated.
> The helper is currently exposed to iterator
> type BPF programs where users can filter,
> and terminate a set of sockets.
> Sockets are destroyed asynchronously using
> the work queue infrastructure. This allows
> for current the locking semantics within
s/current the/the current/ ?
> socket destroy handlers, as BPF iterators
> invoking the helper acquire *sock* locks.
> This also allows the helper to be invoked
> from non-sleepable contexts.
> The other approach to skip acquiring locks
> by passing an argument to the `diag_destroy`
> handler didn't work out well for UDP, as
> the UDP abort function internally invokes
> another function that ends up acquiring
> *sock* lock.
> While there are sleepable BPF iterators,
> these are limited to only certain map types.
> Furthermore, it's limiting in the sense that
> it wouldn't allow us to extend the helper
> to other non-sleepable BPF programs.
> The work queue infrastructure processes work
> items from per-cpu structures. As the sock
> destroy work items are executed asynchronously,
> we need to ref count sockets before they are
> added to the work queue. The 'work_pending'
> check prevents duplicate ref counting of sockets
> in case users invoke the destroy helper for a
> socket multiple times. The `{READ,WRITE}_ONCE`
> macros ensure that the socket pointer stored
> in a work queue item isn't clobbered while
> the item is being processed. As BPF programs
> are non-preemptible, we can expect that once
> a socket is ref counted, no other socket can
> sneak in before the ref counted socket is
> added to the work queue for asynchronous destroy.
> Finally, users are expected to retry when the
> helper fails to queue a work item for a socket
> to be destroyed in case there is another destroy
> operation is in progress.
nit: maybe reformat to fit into 80 characters per line? It's a bit hard
to read with this narrow formatting...
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
> include/linux/bpf.h | 1 +
> include/uapi/linux/bpf.h | 17 +++++++++
> kernel/bpf/core.c | 1 +
> kernel/trace/bpf_trace.c | 2 +
> net/core/filter.c | 70 ++++++++++++++++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 17 +++++++++
> 6 files changed, 108 insertions(+)
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3de24cfb7a3d..60eaa05dfab3 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2676,6 +2676,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
> extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
> extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_sock_destroy_proto;
> const struct bpf_func_proto *tracing_prog_func_proto(
> enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 464ca3f01fe7..789ac7c59fdf 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5484,6 +5484,22 @@ union bpf_attr {
> * 0 on success.
> *
> * **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * int bpf_sock_destroy(struct sock *sk)
> + * Description
> + * Destroy the given socket with **ECONNABORTED** error code.
> + *
> + * *sk* must be a non-**NULL** pointer to a socket.
> + *
> + * Return
> + * The socket is destroyed asynchronosuly, so 0 return value may
> + * not suggest indicate that the socket was successfully destroyed.
s/suggest indicate/indicate/ ? (either "suggest" or "indicate", not both)
> + *
> + * On error, may return **EPROTONOSUPPORT**, **EBUSY**, **EINVAL**.
> + *
> + * **-EPROTONOSUPPORT** if protocol specific destroy handler is not implemented.
> + *
> + * **-EBUSY** if another socket destroy operation is in progress.
> */
> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
> FN(unspec, 0, ##ctx) \
> @@ -5698,6 +5714,7 @@ union bpf_attr {
> FN(user_ringbuf_drain, 209, ##ctx) \
> FN(cgrp_storage_get, 210, ##ctx) \
> FN(cgrp_storage_delete, 211, ##ctx) \
> + FN(sock_destroy, 212, ##ctx) \
> /* */
> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 7f98dec6e90f..c59bef9805e5 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2651,6 +2651,7 @@ const struct bpf_func_proto bpf_snprintf_btf_proto __weak;
> const struct bpf_func_proto bpf_seq_printf_btf_proto __weak;
> const struct bpf_func_proto bpf_set_retval_proto __weak;
> const struct bpf_func_proto bpf_get_retval_proto __weak;
> +const struct bpf_func_proto bpf_sock_destroy_proto __weak;
> const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3bbd3f0c810c..016dbee6b5e4 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1930,6 +1930,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_get_socket_ptr_cookie_proto;
> case BPF_FUNC_xdp_get_buff_len:
> return &bpf_xdp_get_buff_len_trace_proto;
> + case BPF_FUNC_sock_destroy:
> + return &bpf_sock_destroy_proto;
> #endif
> case BPF_FUNC_seq_printf:
> return prog->expected_attach_type == BPF_TRACE_ITER ?
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 929358677183..9753606ecc26 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11569,6 +11569,8 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
> break;
> case BPF_FUNC_ktime_get_coarse_ns:
> return &bpf_ktime_get_coarse_ns_proto;
> + case BPF_FUNC_sock_destroy:
> + return &bpf_sock_destroy_proto;
> default:
> return bpf_base_func_proto(func_id);
> }
> @@ -11578,3 +11580,71 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
> return func;
> }
> +
> +struct sock_destroy_work {
> + struct sock *sk;
> + struct work_struct destroy;
> +};
> +
> +static DEFINE_PER_CPU(struct sock_destroy_work, sock_destroy_workqueue);
> +
> +static void bpf_sock_destroy_fn(struct work_struct *work)
> +{
> + struct sock_destroy_work *sd_work = container_of(work,
> + struct sock_destroy_work, destroy);
> + struct sock *sk = READ_ONCE(sd_work->sk);
> +
> + sk->sk_prot->diag_destroy(sk, ECONNABORTED);
> + sock_put(sk);
> +}
> +
> +static int __init bpf_sock_destroy_workqueue_init(void)
> +{
> + int cpu;
> + struct sock_destroy_work *work;
> +
> + for_each_possible_cpu(cpu) {
> + work = per_cpu_ptr(&sock_destroy_workqueue, cpu);
> + INIT_WORK(&work->destroy, bpf_sock_destroy_fn);
> + }
> +
> + return 0;
> +}
> +subsys_initcall(bpf_sock_destroy_workqueue_init);
> +
> +BPF_CALL_1(bpf_sock_destroy, struct sock *, sk)
> +{
> + struct sock_destroy_work *sd_work;
> +
> + if (!sk->sk_prot->diag_destroy)
> + return -EOPNOTSUPP;
> +
> + sd_work = this_cpu_ptr(&sock_destroy_workqueue);
[..]
> + /* This check prevents duplicate ref counting
> + * of sockets, in case the handler is invoked
> + * multiple times for the same socket.
> + */
This means the helper can have only a single destroy pending per cpu at a
time; is that an ok compromise?
I'm also assuming it's still possible that this helper gets called for
the same socket on different cpus?
> + if (work_pending(&sd_work->destroy))
> + return -EBUSY;
> +
> + /* Ref counting ensures that the socket
> + * isn't deleted from underneath us before
> + * the work queue item is processed.
> + */
> + if (!refcount_inc_not_zero(&sk->sk_refcnt))
> + return -EINVAL;
> +
> + WRITE_ONCE(sd_work->sk, sk);
> + if (!queue_work(system_wq, &sd_work->destroy)) {
> + sock_put(sk);
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +const struct bpf_func_proto bpf_sock_destroy_proto = {
> + .func = bpf_sock_destroy,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> +};
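For my own understanding of the intended usage: something like the sketch
below (untested; assumes the iter/tcp context layout and that the helper
accepts the sock_common pointer the iterator hands out, per the
ARG_PTR_TO_BTF_ID_SOCK_COMMON arg type)?

```c
SEC("iter/tcp")
int iter_tcp_destroy(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;

	if (!sk_common)
		return 0;

	/* filter on ports/addresses/state here, then request destroy;
	 * -EBUSY from the helper means another destroy is pending on
	 * this cpu and the caller is expected to retry */
	bpf_sock_destroy((struct sock *)sk_common);
	return 0;
}
```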
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 464ca3f01fe7..07154a4d92f9 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5484,6 +5484,22 @@ union bpf_attr {
> * 0 on success.
> *
> * **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * int bpf_sock_destroy(void *sk)
> + * Description
> + * Destroy the given socket with **ECONNABORTED** error code.
> + *
> + * *sk* must be a non-**NULL** pointer to a socket.
> + *
> + * Return
> + * The socket is destroyed asynchronosuly, so 0 return value may
> + * not indicate that the socket was successfully destroyed.
> + *
> + * On error, may return **EPROTONOSUPPORT**, **EBUSY**, **EINVAL**.
> + *
> + * **-EPROTONOSUPPORT** if protocol specific destroy handler is not implemented.
> + *
> + * **-EBUSY** if another socket destroy operation is in progress.
> */
> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
> FN(unspec, 0, ##ctx) \
> @@ -5698,6 +5714,7 @@ union bpf_attr {
> FN(user_ringbuf_drain, 209, ##ctx) \
> FN(cgrp_storage_get, 210, ##ctx) \
> FN(cgrp_storage_delete, 211, ##ctx) \
> + FN(sock_destroy, 212, ##ctx) \
> /* */
> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> --
> 2.34.1