BPF List
 help / color / mirror / Atom feed
From: Chris Mason <clm@meta.com>
To: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com,
	eddyz87@gmail.com, memxor@gmail.com
Cc: Mykyta Yatsenko <yatsenko@meta.com>
Subject: Re: [PATCH bpf-next v3 6/7] bpf: task work scheduling kfuncs
Date: Tue, 9 Sep 2025 13:49:47 -0400	[thread overview]
Message-ID: <5d6226f6-c3ae-4e68-a420-76f553a462ec@meta.com> (raw)
In-Reply-To: <20250905164508.1489482-7-mykyta.yatsenko5@gmail.com>

On 9/5/25 12:45 PM, Mykyta Yatsenko wrote:
> From: Mykyta Yatsenko <yatsenko@meta.com>
> 
> Implementation of the new bpf_task_work_schedule kfuncs, that let a BPF
> program schedule task_work callbacks for a target task:
>  * bpf_task_work_schedule_signal() → schedules with TWA_SIGNAL
>  * bpf_task_work_schedule_resume() → schedules with TWA_RESUME
> 
> Each map value should embed a struct bpf_task_work, which the kernel
> side pairs with struct bpf_task_work_kern, containing a pointer to
> struct bpf_task_work_ctx, that maintains metadata relevant for the
> concrete callback scheduling.
> 
> A small state machine and refcounting scheme ensures safe reuse and
> teardown:
>  STANDBY -> PENDING -> SCHEDULING -> SCHEDULED -> RUNNING -> STANDBY
> 
> A FREED terminal state coordinates with map-value
> deletion (bpf_task_work_cancel_and_free()).
> 
> Scheduling itself is deferred via irq_work to keep the kfunc callable
> from NMI context.
> 
> Lifetime is guarded with refcount_t + RCU Tasks Trace.
> 
> Main components:
>  * struct bpf_task_work_context – Metadata and state management per task
> work.
>  * enum bpf_task_work_state – A state machine to serialize work
>  scheduling and execution.
>  * bpf_task_work_schedule() – The central helper that initiates
> scheduling.
>  * bpf_task_work_acquire_ctx() - Attempts to take ownership of the context,
>  pointed by passed struct bpf_task_work, allocates new context if none
>  exists yet.
>  * bpf_task_work_callback() – Invoked when the actual task_work runs.
>  * bpf_task_work_irq() – An intermediate step (runs in softirq context)
> to enqueue task work.
>  * bpf_task_work_cancel_and_free() – Cleanup for deleted BPF map entries.
> 
> Flow of successful task work scheduling
>  1) bpf_task_work_schedule_* is called from BPF code.
>  2) Transition state from STANDBY to PENDING, marks context is owned by
>  this task work scheduler
>  3) irq_work_queue() schedules bpf_task_work_irq().
>  4) Transition state from PENDING to SCHEDULING.
>  4) bpf_task_work_irq() attempts task_work_add(). If successful, state
>  transitions to SCHEDULED.
>  5) Task work calls bpf_task_work_callback(), which transition state to
>  RUNNING.
>  6) BPF callback is executed
>  7) Context is cleaned up, refcounts released, context state set back to
>  STANDBY.
> 
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
>  kernel/bpf/helpers.c | 319 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 317 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 109cb249e88c..418a0a211699 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c

[ ... ]

> +static void bpf_task_work_irq(struct irq_work *irq_work)
> +{
> +	struct bpf_task_work_ctx *ctx = container_of(irq_work, struct bpf_task_work_ctx, irq_work);
> +	enum bpf_task_work_state state;
> +	int err;
> +
> +	guard(rcu_tasks_trace)();
> +
> +	if (cmpxchg(&ctx->state, BPF_TW_PENDING, BPF_TW_SCHEDULING) != BPF_TW_PENDING) {
> +		bpf_task_work_ctx_put(ctx);
> +		return;
> +	}
> +
> +	err = task_work_add(ctx->task, &ctx->work, ctx->mode);
> +	if (err) {
> +		bpf_task_work_ctx_reset(ctx);
> +		/*
> +		 * try to switch back to STANDBY for another task_work reuse, but we might have
> +		 * gone to FREED already, which is fine as we already cleaned up after ourselves
> +		 */
> +		(void)cmpxchg(&ctx->state, BPF_TW_SCHEDULING, BPF_TW_STANDBY);
> +
> +		/* we don't have RCU protection, so put after switching state */
> +		bpf_task_work_ctx_put(ctx);
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do we want to return here?  I didn't follow all of the references, but
even if this isn't the last reference, it looks like the rest of the
function isn't meant to work on the ctx after this point.

> +	}
> +
> +	/*
> +	 * It's technically possible for just scheduled task_work callback to
> +	 * complete running by now, going SCHEDULING -> RUNNING and then
> +	 * dropping its ctx refcount. Instead of capturing extra ref just to
> +	 * protected below ctx->state access, we rely on RCU protection to
> +	 * perform below SCHEDULING -> SCHEDULED attempt.
> +	 */
> +	state = cmpxchg(&ctx->state, BPF_TW_SCHEDULING, BPF_TW_SCHEDULED);
> +	if (state == BPF_TW_FREED)
> +		bpf_task_work_cancel(ctx); /* clean up if we switched into FREED state */
> +}
-chris

  parent reply	other threads:[~2025-09-09 17:50 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-05 16:44 [PATCH bpf-next v3 0/7] bpf: Introduce deferred task context execution Mykyta Yatsenko
2025-09-05 16:44 ` [PATCH bpf-next v3 1/7] bpf: refactor special field-type detection Mykyta Yatsenko
2025-09-05 19:36   ` Eduard Zingerman
2025-09-05 21:29   ` Andrii Nakryiko
2025-09-05 16:45 ` [PATCH bpf-next v3 2/7] bpf: extract generic helper from process_timer_func() Mykyta Yatsenko
2025-09-05 21:15   ` Eduard Zingerman
2025-09-05 21:28   ` Eduard Zingerman
2025-09-05 21:31     ` Andrii Nakryiko
2025-09-05 21:32       ` Eduard Zingerman
2025-09-05 21:29   ` Andrii Nakryiko
2025-09-05 16:45 ` [PATCH bpf-next v3 3/7] bpf: htab: extract helper for freeing special structs Mykyta Yatsenko
2025-09-05 21:31   ` Andrii Nakryiko
2025-09-05 21:31   ` Eduard Zingerman
2025-09-05 16:45 ` [PATCH bpf-next v3 4/7] bpf: bpf task work plumbing Mykyta Yatsenko
2025-09-05 21:31   ` Andrii Nakryiko
2025-09-05 23:09   ` Eduard Zingerman
2025-09-15 15:59     ` Mykyta Yatsenko
2025-09-15 20:12       ` Andrii Nakryiko
2025-09-15 20:20         ` Mykyta Yatsenko
2025-09-15 20:28           ` Andrii Nakryiko
2025-09-05 16:45 ` [PATCH bpf-next v3 5/7] bpf: extract map key pointer calculation Mykyta Yatsenko
2025-09-05 21:31   ` Andrii Nakryiko
2025-09-05 23:19   ` Eduard Zingerman
2025-09-08 13:39     ` Mykyta Yatsenko
2025-09-08 17:18       ` Eduard Zingerman
2025-09-05 16:45 ` [PATCH bpf-next v3 6/7] bpf: task work scheduling kfuncs Mykyta Yatsenko
2025-09-05 21:31   ` Andrii Nakryiko
2025-09-06 20:22   ` Eduard Zingerman
2025-09-08 13:13     ` Mykyta Yatsenko
2025-09-08 17:38       ` Eduard Zingerman
2025-09-09  3:42         ` Andrii Nakryiko
2025-09-09  4:15           ` Eduard Zingerman
2025-09-09  3:33       ` Andrii Nakryiko
2025-09-09  4:05         ` Eduard Zingerman
2025-09-10 14:14           ` Andrii Nakryiko
2025-09-09 17:49   ` Chris Mason [this message]
2025-09-09 18:59     ` Mykyta Yatsenko
2025-09-05 16:45 ` [PATCH bpf-next v3 7/7] selftests/bpf: BPF task work scheduling tests Mykyta Yatsenko
2025-09-05 21:31   ` Andrii Nakryiko
2025-09-08  7:43   ` Eduard Zingerman
2025-09-08 13:21     ` Mykyta Yatsenko
2025-09-08 18:23       ` Eduard Zingerman
2025-09-09  3:44         ` Andrii Nakryiko
2025-09-08 18:23   ` Eduard Zingerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d6226f6-c3ae-4e68-a420-76f553a462ec@meta.com \
    --to=clm@meta.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=kafai@meta.com \
    --cc=kernel-team@meta.com \
    --cc=memxor@gmail.com \
    --cc=mykyta.yatsenko5@gmail.com \
    --cc=yatsenko@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox