All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] bpf: add bpf_msleep_interruptible()
@ 2025-05-05  6:38 Sergey Senozhatsky
  2025-05-05  9:49 ` Matt Bobrowski
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2025-05-05  6:38 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf, linux-kernel,
	Sergey Senozhatsky

bpf_msleep_interruptible() puts a calling context into an
interruptible sleep.  This function is expected to be used
for testing only (perhaps in conjunction with fault-injection)
to simulate various execution delays or timeouts.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       |  9 +++++++++
 kernel/bpf/helpers.c           | 13 +++++++++++++
 kernel/trace/bpf_trace.c       |  2 ++
 tools/include/uapi/linux/bpf.h |  9 +++++++++
 5 files changed, 34 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3f0cc89c0622..85bd1daaa7df 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3392,6 +3392,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
 extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
+extern const struct bpf_func_proto bpf_msleep_interruptible_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 71d5ac83cf5d..cbbb6d70a7a3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5814,6 +5814,14 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_msleep_interruptible(long timeout)
+ *	Description
+ *		Make the current task sleep until *timeout* milliseconds have
+ *		elapsed or until a signal is received.
+ *
+ *	Return
+ *		The remaining time of the sleep duration in milliseconds.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -6028,6 +6036,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(msleep_interruptible, 212, ##ctx)		\
 	/* This helper list is effectively frozen. If you are trying to	\
 	 * add a new helper, you should add a kfunc instead which has	\
 	 * less stability guarantees. See Documentation/bpf/kfuncs.rst	\
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e3a2662f4e33..0a3449c282f2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1905,6 +1905,19 @@ static const struct bpf_func_proto bpf_dynptr_data_proto = {
 	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
 };
 
+BPF_CALL_1(bpf_msleep_interruptible, long, timeout)
+{
+	return msleep_interruptible(timeout);
+}
+
+const struct bpf_func_proto bpf_msleep_interruptible_proto = {
+	.func		= bpf_msleep_interruptible,
+	.gpl_only	= false,
+	.might_sleep	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 52c432a44aeb..8a0b96aed0c0 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1475,6 +1475,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_branch_snapshot_proto;
 	case BPF_FUNC_find_vma:
 		return &bpf_find_vma_proto;
+	case BPF_FUNC_msleep_interruptible:
+		return &bpf_msleep_interruptible_proto;
 	default:
 		break;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 71d5ac83cf5d..cbbb6d70a7a3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5814,6 +5814,14 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_msleep_interruptible(long timeout)
+ *	Description
+ *		Make the current task sleep until *timeout* milliseconds have
+ *		elapsed or until a signal is received.
+ *
+ *	Return
+ *		The remaining time of the sleep duration in milliseconds.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -6028,6 +6036,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(msleep_interruptible, 212, ##ctx)		\
 	/* This helper list is effectively frozen. If you are trying to	\
 	 * add a new helper, you should add a kfunc instead which has	\
 	 * less stability guarantees. See Documentation/bpf/kfuncs.rst	\
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05  6:38 [PATCH] bpf: add bpf_msleep_interruptible() Sergey Senozhatsky
@ 2025-05-05  9:49 ` Matt Bobrowski
  2025-05-05 13:26   ` Sergey Senozhatsky
  2025-05-05 19:07 ` Song Liu
  2025-05-05 19:55 ` Andrii Nakryiko
  2 siblings, 1 reply; 12+ messages in thread
From: Matt Bobrowski @ 2025-05-05  9:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf, linux-kernel

On Mon, May 05, 2025 at 03:38:59PM +0900, Sergey Senozhatsky wrote:
> bpf_msleep_interruptible() puts a calling context into an
> interruptible sleep.  This function is expected to be used
> for testing only (perhaps in conjunction with fault-injection)
> to simulate various execution delays or timeouts.
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  include/linux/bpf.h            |  1 +
>  include/uapi/linux/bpf.h       |  9 +++++++++
>  kernel/bpf/helpers.c           | 13 +++++++++++++
>  kernel/trace/bpf_trace.c       |  2 ++
>  tools/include/uapi/linux/bpf.h |  9 +++++++++
>  5 files changed, 34 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3f0cc89c0622..85bd1daaa7df 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3392,6 +3392,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>  extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_msleep_interruptible_proto;
>  
>  const struct bpf_func_proto *tracing_prog_func_proto(
>    enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 71d5ac83cf5d..cbbb6d70a7a3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5814,6 +5814,14 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * long bpf_msleep_interruptible(long timeout)
> + *	Description
> + *		Make the current task sleep until *timeout* milliseconds have
> + *		elapsed or until a signal is received.
> + *
> + *	Return
> + *		The remaining time of the sleep duration in milliseconds.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>  	FN(unspec, 0, ##ctx)				\
> @@ -6028,6 +6036,7 @@ union bpf_attr {
>  	FN(user_ringbuf_drain, 209, ##ctx)		\
>  	FN(cgrp_storage_get, 210, ##ctx)		\
>  	FN(cgrp_storage_delete, 211, ##ctx)		\
> +	FN(msleep_interruptible, 212, ##ctx)		\
>  	/* This helper list is effectively frozen. If you are trying to	\
>  	 * add a new helper, you should add a kfunc instead which has	\
>  	 * less stability guarantees. See Documentation/bpf/kfuncs.rst	\

I noticed that you've written the newly proposed BPF helper in the
legacy BPF helper form, which I believe is now discouraged, as also
stated within the above comment. You probably want to respin this
patch series having written this newly proposed BPF helper in BPF
kfuncs [0] form instead.

Additionally, as part of your patch series I think you'll also want to
include some selftests.

[0] https://docs.kernel.org/bpf/kfuncs.html

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index e3a2662f4e33..0a3449c282f2 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1905,6 +1905,19 @@ static const struct bpf_func_proto bpf_dynptr_data_proto = {
>  	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
>  };
>  
> +BPF_CALL_1(bpf_msleep_interruptible, long, timeout)
> +{
> +	return msleep_interruptible(timeout);
> +}
> +
> +const struct bpf_func_proto bpf_msleep_interruptible_proto = {
> +	.func		= bpf_msleep_interruptible,
> +	.gpl_only	= false,
> +	.might_sleep	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_ANYTHING,
> +};
> +
>  const struct bpf_func_proto bpf_get_current_task_proto __weak;
>  const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
>  const struct bpf_func_proto bpf_probe_read_user_proto __weak;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 52c432a44aeb..8a0b96aed0c0 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1475,6 +1475,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_get_branch_snapshot_proto;
>  	case BPF_FUNC_find_vma:
>  		return &bpf_find_vma_proto;
> +	case BPF_FUNC_msleep_interruptible:
> +		return &bpf_msleep_interruptible_proto;
>  	default:
>  		break;
>  	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 71d5ac83cf5d..cbbb6d70a7a3 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5814,6 +5814,14 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * long bpf_msleep_interruptible(long timeout)
> + *	Description
> + *		Make the current task sleep until *timeout* milliseconds have
> + *		elapsed or until a signal is received.
> + *
> + *	Return
> + *		The remaining time of the sleep duration in milliseconds.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>  	FN(unspec, 0, ##ctx)				\
> @@ -6028,6 +6036,7 @@ union bpf_attr {
>  	FN(user_ringbuf_drain, 209, ##ctx)		\
>  	FN(cgrp_storage_get, 210, ##ctx)		\
>  	FN(cgrp_storage_delete, 211, ##ctx)		\
> +	FN(msleep_interruptible, 212, ##ctx)		\
>  	/* This helper list is effectively frozen. If you are trying to	\
>  	 * add a new helper, you should add a kfunc instead which has	\
>  	 * less stability guarantees. See Documentation/bpf/kfuncs.rst	\
> -- 
> 2.49.0.906.g1f30a19c02-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05  9:49 ` Matt Bobrowski
@ 2025-05-05 13:26   ` Sergey Senozhatsky
  2025-05-05 19:01     ` Song Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Sergey Senozhatsky @ 2025-05-05 13:26 UTC (permalink / raw)
  To: Matt Bobrowski
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf,
	linux-kernel

On (25/05/05 09:49), Matt Bobrowski wrote:
> I noticed that you've written the newly proposed BPF helper in the
> legacy BPF helper form, which I believe is now discouraged, as also
> stated within the above comment. You probably want to respin this
> patch series having written this newly proposed BPF helper in BPF
> kfuncs [0] form instead.

Oh, okay, I didn't know about kfunc.  So I guess it's something like
the patch below.

> Additionally, as part of your patch series I think you'll also want to
> include some selftests.

Let me take a look.  Any hints on how to test it?

---
 include/uapi/linux/bpf.h       | 8 ++++++++
 kernel/bpf/helpers.c           | 6 ++++++
 tools/include/uapi/linux/bpf.h | 8 ++++++++
 3 files changed, 22 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 71d5ac83cf5d..8624cb2ac7d9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5814,6 +5814,14 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * unsigned long bpf_msleep_interruptible(unsigned int msecs)
+ *	Description
+ *		Make the current task sleep until *msecs* milliseconds have
+ *		elapsed or until a signal is received.
+ *
+ *	Return
+ *		The remaining time of the sleep duration in milliseconds.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e3a2662f4e33..6ba75bf816f2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3194,6 +3194,11 @@ __bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
 	local_irq_restore(*flags__irq_flag);
 }
 
+__bpf_kfunc unsigned long bpf_msleep_interruptible(unsigned int msecs)
+{
+	return msleep_interruptible(msecs);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -3294,6 +3299,7 @@ BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLE
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_local_irq_save)
 BTF_ID_FLAGS(func, bpf_local_irq_restore)
+BTF_ID_FLAGS(func, bpf_msleep_interruptible)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 71d5ac83cf5d..8624cb2ac7d9 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5814,6 +5814,14 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * unsigned long bpf_msleep_interruptible(unsigned int msecs)
+ *	Description
+ *		Make the current task sleep until *msecs* milliseconds have
+ *		elapsed or until a signal is received.
+ *
+ *	Return
+ *		The remaining time of the sleep duration in milliseconds.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 13:26   ` Sergey Senozhatsky
@ 2025-05-05 19:01     ` Song Liu
  2025-05-06  2:22       ` Sergey Senozhatsky
  0 siblings, 1 reply; 12+ messages in thread
From: Song Liu @ 2025-05-05 19:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Matt Bobrowski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, bpf,
	linux-kernel

On Mon, May 5, 2025 at 6:27 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (25/05/05 09:49), Matt Bobrowski wrote:
> > I noticed that you've written the newly proposed BPF helper in the
> > legacy BPF helper form, which I believe is now discouraged, as also
> > stated within the above comment. You probably want to respin this
> > patch series having written this newly proposed BPF helper in BPF
> > kfuncs [0] form instead.
>
> Oh, okay, I didn't know about kfunc.  So I guess it's something like
> the patch below.
>
> > Additionally, as part of your patch series I think you'll also want to
> > include some selftests.
>
> Let me take a look.  Any hints on how to test it?

Please check out tools/testing/selftests/bpf/. The most common way
is to add files to tools/testing/selftests/bpf/progs and
tools/testing/selftests/bpf/prog_tests, and then use test_progs.

>
> ---
>  include/uapi/linux/bpf.h       | 8 ++++++++
>  kernel/bpf/helpers.c           | 6 ++++++
>  tools/include/uapi/linux/bpf.h | 8 ++++++++
>  3 files changed, 22 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 71d5ac83cf5d..8624cb2ac7d9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5814,6 +5814,14 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * unsigned long bpf_msleep_interruptible(unsigned int msecs)
> + *     Description
> + *             Make the current task sleep until *msecs* milliseconds have
> + *             elapsed or until a signal is received.
> + *
> + *     Return
> + *             The remaining time of the sleep duration in milliseconds.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>         FN(unspec, 0, ##ctx)                            \

kfunc shouldn't have any changes in include/uapi/linux/bpf.h.

Thanks,
Song

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05  6:38 [PATCH] bpf: add bpf_msleep_interruptible() Sergey Senozhatsky
  2025-05-05  9:49 ` Matt Bobrowski
@ 2025-05-05 19:07 ` Song Liu
  2025-05-05 19:30   ` Kumar Kartikeya Dwivedi
  2025-05-05 19:55 ` Andrii Nakryiko
  2 siblings, 1 reply; 12+ messages in thread
From: Song Liu @ 2025-05-05 19:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Martin KaFai Lau, Eduard Zingerman, bpf, linux-kernel

On Sun, May 4, 2025 at 11:40 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> bpf_msleep_interruptible() puts a calling context into an
> interruptible sleep.  This function is expected to be used
> for testing only (perhaps in conjunction with fault-injection)
> to simulate various execution delays or timeouts.

Please keep in mind that BPF programs run in not sleepable
contexts, including NMIs. Maybe udelay() is a better option?

Thanks,
Song

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 19:07 ` Song Liu
@ 2025-05-05 19:30   ` Kumar Kartikeya Dwivedi
  2025-05-06  2:20     ` Sergey Senozhatsky
  0 siblings, 1 reply; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-05 19:30 UTC (permalink / raw)
  To: Song Liu
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, bpf,
	linux-kernel

On Mon, 5 May 2025 at 21:07, Song Liu <song@kernel.org> wrote:
>
> On Sun, May 4, 2025 at 11:40 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > bpf_msleep_interruptible() puts a calling context into an
> > interruptible sleep.  This function is expected to be used
> > for testing only (perhaps in conjunction with fault-injection)
> > to simulate various execution delays or timeouts.
>
> Please keep in mind that BPF programs run in not sleepable
> contexts, including NMIs. Maybe udelay() is a better option?

Or mark the kfunc KF_SLEEPABLE, IIUC the intention is to use it from
hooks which are sleepable.

>
> Thanks,
> Song
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05  6:38 [PATCH] bpf: add bpf_msleep_interruptible() Sergey Senozhatsky
  2025-05-05  9:49 ` Matt Bobrowski
  2025-05-05 19:07 ` Song Liu
@ 2025-05-05 19:55 ` Andrii Nakryiko
  2025-05-05 20:45   ` Kumar Kartikeya Dwivedi
  2025-05-06  2:29   ` Sergey Senozhatsky
  2 siblings, 2 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2025-05-05 19:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf, linux-kernel

On Sun, May 4, 2025 at 11:40 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> bpf_msleep_interruptible() puts a calling context into an
> interruptible sleep.  This function is expected to be used
> for testing only (perhaps in conjunction with fault-injection)
> to simulate various execution delays or timeouts.

I'm a bit worried that we'll be opening a bit too much of an
opportunity to arbitrarily slow down kernel in a way that would be
actually pretty hard to detect (CPU profilers won't see this, you'd
need to rely on off-CPU profiling, which is not as developed as far as
profilers go).

I understand the appeal, don't get me wrong, but we have no way to
enforce "is expected to be used for testing only". It's also all too
easy to sleep for a really long time, and there isn't really any
reasonable limit that would mitigate this, IMO.

If I had to do this for my own testing/fuzzing needs, I'd probably try
to go with a custom kfunc provided by my small and trivial kernel
module (modules can extend BPF with custom kfuncs). And see if it's
useful.

One other alternative to enforce the "for testing only" aspect might
be a custom kernel config, that would be expected to not make it into
production. Though I'd start with the kernel module approach first,
probably.

P.S. BPF's "sleepable" is really "faultable", where a BPF program
might wait (potentially for a long time) for kernel to fault memory
in, but that's a bit more well-defined sleeping behavior. Here it's
just a random amount of time to put whatever task the BPF program
happened to run in the context of, which seems like a much bigger
leap. So while we do have sleepable BPF programs, they can't just
arbitrarily and voluntarily sleep (at least today).

P.P.S. And when you think about this, we do rely on sleepable/trace
RCU grace periods to be not controlled so directly and arbitrarily by
any one BPF program, while here with bpf_msleep_interruptible() we'll
be giving a lot of control to one BPF program to delay resource
freeing of all other BPF programs (and not just sleepable ones, mind
you: think sleepable hooks running non-sleepable BPF programs, like
with sleepable tracepoints of uprobes).

>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  include/linux/bpf.h            |  1 +
>  include/uapi/linux/bpf.h       |  9 +++++++++
>  kernel/bpf/helpers.c           | 13 +++++++++++++
>  kernel/trace/bpf_trace.c       |  2 ++
>  tools/include/uapi/linux/bpf.h |  9 +++++++++
>  5 files changed, 34 insertions(+)
>

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 19:55 ` Andrii Nakryiko
@ 2025-05-05 20:45   ` Kumar Kartikeya Dwivedi
  2025-05-05 21:19     ` Andrii Nakryiko
  2025-05-06  2:29   ` Sergey Senozhatsky
  1 sibling, 1 reply; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-05 20:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf,
	linux-kernel

On Mon, 5 May 2025 at 21:57, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Sun, May 4, 2025 at 11:40 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > bpf_msleep_interruptible() puts a calling context into an
> > interruptible sleep.  This function is expected to be used
> > for testing only (perhaps in conjunction with fault-injection)
> > to simulate various execution delays or timeouts.
>
> I'm a bit worried that we'll be opening a bit too much of an
> opportunity to arbitrarily slow down kernel in a way that would be
> actually pretty hard to detect (CPU profilers won't see this, you'd
> need to rely on off-CPU profiling, which is not as developed as far as
> profilers go).
>
> I understand the appeal, don't get me wrong, but we have no way to
> enforce "is expected to be used for testing only". It's also all too
> easy to sleep for a really long time, and there isn't really any
> reasonable limit that would mitigate this, IMO.
>
> If I had to do this for my own testing/fuzzing needs, I'd probably try
> to go with a custom kfunc provided by my small and trivial kernel
> module (modules can extend BPF with custom kfuncs). And see if it's
> useful.
>
> One other alternative to enforce the "for testing only" aspect might
> be a custom kernel config, that would be expected to not make it into
> production. Though I'd start with the kernel module approach first,
> probably.
>
> P.S. BPF's "sleepable" is really "faultable", where a BPF program
> might wait (potentially for a long time) for kernel to fault memory
> in, but that's a bit more well-defined sleeping behavior. Here it's
> just a random amount of time to put whatever task the BPF program
> happened to run in the context of, which seems like a much bigger
> leap. So while we do have sleepable BPF programs, they can't just
> arbitrarily and voluntarily sleep (at least today).
>
> P.P.S. And when you think about this, we do rely on sleepable/trace
> RCU grace periods to be not controlled so directly and arbitrarily by
> any one BPF program, while here with bpf_msleep_interruptible() we'll
> be giving a lot of control to one BPF program to delay resource
> freeing of all other BPF programs (and not just sleepable ones, mind
> you: think sleepable hooks running non-sleepable BPF programs, like
> with sleepable tracepoints of uprobes).
>

I agree with the sentiment, but I think it's already possible such that
adding this will neither worsen or improve the status quo.

You can have a userfaultfd in user space, and do a bpf_copy_from_user
on the address such that you can trap the fault for as long as you
wish (with rcu_tasks_trace read section open) [0]. I used it in the
past to reconstruct race conditions.

  [0]: https://lore.kernel.org/bpf/20220114163953.1455836-11-memxor@gmail.com.

So we probably need a solution to this problem even for 'faulting'
sleep, perhaps by scoping the read section to the program with SRCU,
or something similar.


> >
> > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > ---
> >  include/linux/bpf.h            |  1 +
> >  include/uapi/linux/bpf.h       |  9 +++++++++
> >  kernel/bpf/helpers.c           | 13 +++++++++++++
> >  kernel/trace/bpf_trace.c       |  2 ++
> >  tools/include/uapi/linux/bpf.h |  9 +++++++++
> >  5 files changed, 34 insertions(+)
> >
>
> [...]
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 20:45   ` Kumar Kartikeya Dwivedi
@ 2025-05-05 21:19     ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2025-05-05 21:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf,
	linux-kernel

On Mon, May 5, 2025 at 1:46 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Mon, 5 May 2025 at 21:57, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sun, May 4, 2025 at 11:40 PM Sergey Senozhatsky
> > <senozhatsky@chromium.org> wrote:
> > >
> > > bpf_msleep_interruptible() puts a calling context into an
> > > interruptible sleep.  This function is expected to be used
> > > for testing only (perhaps in conjunction with fault-injection)
> > > to simulate various execution delays or timeouts.
> >
> > I'm a bit worried that we'll be opening a bit too much of an
> > opportunity to arbitrarily slow down kernel in a way that would be
> > actually pretty hard to detect (CPU profilers won't see this, you'd
> > need to rely on off-CPU profiling, which is not as developed as far as
> > profilers go).
> >
> > I understand the appeal, don't get me wrong, but we have no way to
> > enforce "is expected to be used for testing only". It's also all too
> > easy to sleep for a really long time, and there isn't really any
> > reasonable limit that would mitigate this, IMO.
> >
> > If I had to do this for my own testing/fuzzing needs, I'd probably try
> > to go with a custom kfunc provided by my small and trivial kernel
> > module (modules can extend BPF with custom kfuncs). And see if it's
> > useful.
> >
> > One other alternative to enforce the "for testing only" aspect might
> > be a custom kernel config, that would be expected to not make it into
> > production. Though I'd start with the kernel module approach first,
> > probably.
> >
> > P.S. BPF's "sleepable" is really "faultable", where a BPF program
> > might wait (potentially for a long time) for kernel to fault memory
> > in, but that's a bit more well-defined sleeping behavior. Here it's
> > just a random amount of time to put whatever task the BPF program
> > happened to run in the context of, which seems like a much bigger
> > leap. So while we do have sleepable BPF programs, they can't just
> > arbitrarily and voluntarily sleep (at least today).
> >
> > P.P.S. And when you think about this, we do rely on sleepable/trace
> > RCU grace periods to be not controlled so directly and arbitrarily by
> > any one BPF program, while here with bpf_msleep_interruptible() we'll
> > be giving a lot of control to one BPF program to delay resource
> > freeing of all other BPF programs (and not just sleepable ones, mind
> > you: think sleepable hooks running non-sleepable BPF programs, like
> > with sleepable tracepoints of uprobes).
> >
>
> I agree with the sentiment, but I think it's already possible such that
> adding this will neither worsen or improve the status quo.
>
> You can have a userfaultfd in user space, and do a bpf_copy_from_user
> on the address such that you can trap the fault for as long as you
> wish (with rcu_tasks_trace read section open) [0]. I used it in the
> past to reconstruct race conditions.

That's true, but a) you can disable userfaultfd if you can't accept
user-space arbitrarily delaying kernel page fault code path, and b)
you'd have to jump through quite a lot of hoops to achieve this. So
yes, similar issue exists with userfaultfd, but here it's quite a lot
easier to accidentally misuse.

>
>   [0]: https://lore.kernel.org/bpf/20220114163953.1455836-11-memxor@gmail.com.
>
> So we probably need a solution to this problem even for 'faulting'
> sleep, perhaps by scoping the read section to the program with SRCU,
> or something similar.

The problem is that whatever flavor of RCU we use, it can't be just
bound to individual programs. RCU is used to protect maps from being
freed too soon, and maps are a shared resource across BPF programs (in
arbitrary configurations that are completely user dependent). So the
solution would have to be a bit more nuanced than just using a
separate RCU domain for a given program or a group of programs,
probably.

>
>
> > >
> > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > ---
> > >  include/linux/bpf.h            |  1 +
> > >  include/uapi/linux/bpf.h       |  9 +++++++++
> > >  kernel/bpf/helpers.c           | 13 +++++++++++++
> > >  kernel/trace/bpf_trace.c       |  2 ++
> > >  tools/include/uapi/linux/bpf.h |  9 +++++++++
> > >  5 files changed, 34 insertions(+)
> > >
> >
> > [...]
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 19:30   ` Kumar Kartikeya Dwivedi
@ 2025-05-06  2:20     ` Sergey Senozhatsky
  0 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2025-05-06  2:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Song Liu
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, bpf,
	linux-kernel

On (25/05/05 21:30), Kumar Kartikeya Dwivedi wrote:
> On Mon, 5 May 2025 at 21:07, Song Liu <song@kernel.org> wrote:
> > Please keep in mind that BPF programs run in not sleepable
> > contexts, including NMIs. Maybe udelay() is a better option?

If we want to timeout various SCSI operations, for instance, I think
we need to sleep for seconds.  udelay() won't probably work for that.

> Or mark the kfunc KF_SLEEPABLE, IIUC the intention is to use it from
> hooks which are sleepable.

I think it is already marked as KF_SLEEPABLE.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 19:01     ` Song Liu
@ 2025-05-06  2:22       ` Sergey Senozhatsky
  0 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2025-05-06  2:22 UTC (permalink / raw)
  To: Song Liu
  Cc: Sergey Senozhatsky, Matt Bobrowski, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Martin KaFai Lau,
	Eduard Zingerman, bpf, linux-kernel

On (25/05/05 12:01), Song Liu wrote:
[..]
> > + * unsigned long bpf_msleep_interruptible(unsigned int msecs)
> > + *     Description
> > + *             Make the current task sleep until *msecs* milliseconds have
> > + *             elapsed or until a signal is received.
> > + *
> > + *     Return
> > + *             The remaining time of the sleep duration in milliseconds.
> >   */
> >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
> >         FN(unspec, 0, ##ctx)                            \
> 
> kfunc shouldn't have any changes in include/uapi/linux/bpf.h.

Ack.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] bpf: add bpf_msleep_interruptible()
  2025-05-05 19:55 ` Andrii Nakryiko
  2025-05-05 20:45   ` Kumar Kartikeya Dwivedi
@ 2025-05-06  2:29   ` Sergey Senozhatsky
  1 sibling, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2025-05-06  2:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Sergey Senozhatsky, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, bpf,
	linux-kernel

On (25/05/05 12:55), Andrii Nakryiko wrote:
[..]
> I understand the appeal, don't get me wrong, but we have no way to
> enforce "is expected to be used for testing only". It's also all too
> easy to sleep for a really long time, and there isn't really any
> reasonable limit that would mitigate this, IMO.

Sure, I understand your concerns.

> If I had to do this for my own testing/fuzzing needs, I'd probably try
> to go with a custom kfunc provided by my small and trivial kernel
> module (modules can extend BPF with custom kfuncs). And see if it's
> useful.

A downstream kernel module?  I guess I can give it a try.

> One other alternative to enforce the "for testing only" aspect might
> be a custom kernel config, that would be expected to not make it into
> production. Though I'd start with the kernel module approach first,
> probably.

Something like `Depends on: DEBUG_KERNEL` maybe?

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-06  2:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-05  6:38 [PATCH] bpf: add bpf_msleep_interruptible() Sergey Senozhatsky
2025-05-05  9:49 ` Matt Bobrowski
2025-05-05 13:26   ` Sergey Senozhatsky
2025-05-05 19:01     ` Song Liu
2025-05-06  2:22       ` Sergey Senozhatsky
2025-05-05 19:07 ` Song Liu
2025-05-05 19:30   ` Kumar Kartikeya Dwivedi
2025-05-06  2:20     ` Sergey Senozhatsky
2025-05-05 19:55 ` Andrii Nakryiko
2025-05-05 20:45   ` Kumar Kartikeya Dwivedi
2025-05-05 21:19     ` Andrii Nakryiko
2025-05-06  2:29   ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.