* [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources
2026-03-31 15:34 [PATCH v3 0/2] Support BPF traversal of wakeup sources Samuel Wu
@ 2026-03-31 15:34 ` Samuel Wu
2026-04-01 9:11 ` Greg Kroah-Hartman
2026-03-31 15:34 ` [PATCH v3 2/2] selftests/bpf: Add tests for wakeup_sources kfuncs Samuel Wu
2026-04-01 9:15 ` [PATCH v3 0/2] Support BPF traversal of wakeup sources Greg Kroah-Hartman
2 siblings, 1 reply; 11+ messages in thread
From: Samuel Wu @ 2026-03-31 15:34 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Pavel Machek, Greg Kroah-Hartman,
Danilo Krummrich, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
Shuah Khan
Cc: Samuel Wu, kernel-team, linux-kernel, linux-pm, driver-core, bpf,
linux-kselftest
Iterating through wakeup sources via sysfs or debugfs can be inefficient
or restricted. Introduce BPF kfuncs to allow high-performance and safe
in-kernel traversal of the wakeup_sources list.
The new kfuncs include:
- bpf_wakeup_sources_get_head() to obtain the list head.
- bpf_wakeup_sources_read_lock/unlock() to manage the SRCU lock.
For verifier safety, the underlying SRCU index is wrapped in an opaque
'struct bpf_ws_lock' pointer. This enables the use of KF_ACQUIRE and
KF_RELEASE flags, allowing the BPF verifier to strictly enforce paired
lock/unlock cycles and prevent resource leaks.
Signed-off-by: Samuel Wu <wusamuel@google.com>
---
drivers/base/power/power.h | 7 ++++
drivers/base/power/wakeup.c | 72 +++++++++++++++++++++++++++++++++++--
2 files changed, 77 insertions(+), 2 deletions(-)
diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
index 922ed457db19..8823aceeac8b 100644
--- a/drivers/base/power/power.h
+++ b/drivers/base/power/power.h
@@ -168,3 +168,10 @@ static inline void device_pm_init(struct device *dev)
device_pm_sleep_init(dev);
pm_runtime_init(dev);
}
+
+#ifdef CONFIG_BPF_SYSCALL
+struct bpf_ws_lock { };
+struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void);
+void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock);
+void *bpf_wakeup_sources_get_head(void);
+#endif
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index b8e48a023bf0..8eda7d35d9cc 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -1168,11 +1168,79 @@ static const struct file_operations wakeup_sources_stats_fops = {
.release = seq_release_private,
};
-static int __init wakeup_sources_debugfs_init(void)
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/btf.h>
+
+__bpf_kfunc_start_defs();
+
+/**
+ * bpf_wakeup_sources_read_lock - Acquire the SRCU lock for wakeup sources
+ *
+ * The underlying SRCU lock returns an integer index. However, the BPF verifier
+ * requires a pointer (PTR_TO_BTF_ID) to strictly track the state of acquired
+ * resources using KF_ACQUIRE and KF_RELEASE semantics. We use an opaque
+ * structure pointer (struct bpf_ws_lock *) to satisfy the verifier while
+ * safely encoding the integer index within the pointer address itself.
+ *
+ * Return: An opaque pointer encoding the SRCU lock index + 1 (to avoid NULL).
+ */
+__bpf_kfunc struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void)
+{
+ return (struct bpf_ws_lock *)(long)(wakeup_sources_read_lock() + 1);
+}
+
+/**
+ * bpf_wakeup_sources_read_unlock - Release the SRCU lock for wakeup sources
+ * @lock: The opaque pointer returned by bpf_wakeup_sources_read_lock()
+ *
+ * The BPF verifier guarantees that @lock is a valid, unreleased pointer from
+ * the acquire function. We decode the pointer back into the integer SRCU index
+ * by subtracting 1 and release the lock.
+ */
+__bpf_kfunc void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock)
+{
+ wakeup_sources_read_unlock((int)(long)lock - 1);
+}
+
+/**
+ * bpf_wakeup_sources_get_head - Get the head of the wakeup sources list
+ *
+ * Return: The head of the wakeup sources list.
+ */
+__bpf_kfunc void *bpf_wakeup_sources_get_head(void)
+{
+ return &wakeup_sources;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(wakeup_source_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_wakeup_sources_read_lock, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_wakeup_sources_read_unlock, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_wakeup_sources_get_head)
+BTF_KFUNCS_END(wakeup_source_kfunc_ids)
+
+static const struct btf_kfunc_id_set wakeup_source_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &wakeup_source_kfunc_ids,
+};
+
+static void __init wakeup_sources_bpf_init(void)
+{
+ if (register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &wakeup_source_kfunc_set))
+ pm_pr_dbg("Wakeup: failed to register BTF kfuncs\n");
+}
+#else
+static inline void wakeup_sources_bpf_init(void) {}
+#endif /* CONFIG_BPF_SYSCALL */
+
+static int __init wakeup_sources_init(void)
{
debugfs_create_file("wakeup_sources", 0444, NULL, NULL,
&wakeup_sources_stats_fops);
+ wakeup_sources_bpf_init();
+
return 0;
}
-postcore_initcall(wakeup_sources_debugfs_init);
+postcore_initcall(wakeup_sources_init);
--
2.53.0.1018.g2bb0e51243-goog
* Re: [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources
2026-03-31 15:34 ` [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources Samuel Wu
@ 2026-04-01 9:11 ` Greg Kroah-Hartman
2026-04-01 14:22 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-01 9:11 UTC (permalink / raw)
To: Samuel Wu
Cc: Rafael J. Wysocki, Len Brown, Pavel Machek, Danilo Krummrich,
Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Tue, Mar 31, 2026 at 08:34:10AM -0700, Samuel Wu wrote:
> Iterating through wakeup sources via sysfs or debugfs can be inefficient
> or restricted. Introduce BPF kfuncs to allow high-performance and safe
> in-kernel traversal of the wakeup_sources list.
What exactly is "inefficient"? I think you might have some numbers in
your 0/2 patch, but putting it in here would be best.
And who is going to be calling these functions, just ebpf scripts?
> The new kfuncs include:
> - bpf_wakeup_sources_get_head() to obtain the list head.
> - bpf_wakeup_sources_read_lock/unlock() to manage the SRCU lock.
Does this mean we can stop exporting wakeup_sources_read_lock() now?
> For verifier safety, the underlying SRCU index is wrapped in an opaque
> 'struct bpf_ws_lock' pointer. This enables the use of KF_ACQUIRE and
> KF_RELEASE flags, allowing the BPF verifier to strictly enforce paired
> lock/unlock cycles and prevent resource leaks.
But it's an index, not a lock. Is this just a verifier thing?
>
> Signed-off-by: Samuel Wu <wusamuel@google.com>
> ---
> drivers/base/power/power.h | 7 ++++
> drivers/base/power/wakeup.c | 72 +++++++++++++++++++++++++++++++++++--
> 2 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
> index 922ed457db19..8823aceeac8b 100644
> --- a/drivers/base/power/power.h
> +++ b/drivers/base/power/power.h
> @@ -168,3 +168,10 @@ static inline void device_pm_init(struct device *dev)
> device_pm_sleep_init(dev);
> pm_runtime_init(dev);
> }
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +struct bpf_ws_lock { };
An empty structure? This is just an int, so you are casting an int to a
pointer? Can we make wakeup_sources_read_lock() actually use a
structure instead to make this simpler?
> +struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void);
> +void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock);
> +void *bpf_wakeup_sources_get_head(void);
> +#endif
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index b8e48a023bf0..8eda7d35d9cc 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -1168,11 +1168,79 @@ static const struct file_operations wakeup_sources_stats_fops = {
> .release = seq_release_private,
> };
>
> -static int __init wakeup_sources_debugfs_init(void)
> +#ifdef CONFIG_BPF_SYSCALL
> +#include <linux/btf.h>
> +
> +__bpf_kfunc_start_defs();
> +
> +/**
> + * bpf_wakeup_sources_read_lock - Acquire the SRCU lock for wakeup sources
> + *
> + * The underlying SRCU lock returns an integer index. However, the BPF verifier
> + * requires a pointer (PTR_TO_BTF_ID) to strictly track the state of acquired
> + * resources using KF_ACQUIRE and KF_RELEASE semantics. We use an opaque
> + * structure pointer (struct bpf_ws_lock *) to satisfy the verifier while
> + * safely encoding the integer index within the pointer address itself.
> + *
> + * Return: An opaque pointer encoding the SRCU lock index + 1 (to avoid NULL).
> + */
> +__bpf_kfunc struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void)
> +{
> + return (struct bpf_ws_lock *)(long)(wakeup_sources_read_lock() + 1);
Why are you incrementing this by 1?
> +}
> +
> +/**
> + * bpf_wakeup_sources_read_unlock - Release the SRCU lock for wakeup sources
> + * @lock: The opaque pointer returned by bpf_wakeup_sources_read_lock()
> + *
> + * The BPF verifier guarantees that @lock is a valid, unreleased pointer from
> + * the acquire function. We decode the pointer back into the integer SRCU index
> + * by subtracting 1 and release the lock.
> + */
> +__bpf_kfunc void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock)
> +{
> + wakeup_sources_read_unlock((int)(long)lock - 1);
Why decrementing by one?
So it's really an int, but you are casting it to a pointer, incrementing
it by one to make it a "fake" pointer value (i.e. unaligned mess), and
then when unlocking casting the pointer back to an int, and then
decrementing the value?
This feels "odd" :(
> +}
> +
> +/**
> + * bpf_wakeup_sources_get_head - Get the head of the wakeup sources list
> + *
> + * Return: The head of the wakeup sources list.
> + */
> +__bpf_kfunc void *bpf_wakeup_sources_get_head(void)
> +{
> + return &wakeup_sources;
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_KFUNCS_START(wakeup_source_kfunc_ids)
> +BTF_ID_FLAGS(func, bpf_wakeup_sources_read_lock, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, bpf_wakeup_sources_read_unlock, KF_RELEASE)
> +BTF_ID_FLAGS(func, bpf_wakeup_sources_get_head)
> +BTF_KFUNCS_END(wakeup_source_kfunc_ids)
> +
> +static const struct btf_kfunc_id_set wakeup_source_kfunc_set = {
> + .owner = THIS_MODULE,
This isn't a module.
thanks,
greg k-h
* Re: [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources
2026-04-01 9:11 ` Greg Kroah-Hartman
@ 2026-04-01 14:22 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 11+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-04-01 14:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Samuel Wu, Rafael J. Wysocki, Len Brown, Pavel Machek,
Danilo Krummrich, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team, linux-kernel,
linux-pm, driver-core, bpf, linux-kselftest
On Wed, 1 Apr 2026 at 11:11, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Mar 31, 2026 at 08:34:10AM -0700, Samuel Wu wrote:
> > Iterating through wakeup sources via sysfs or debugfs can be inefficient
> > or restricted. Introduce BPF kfuncs to allow high-performance and safe
> > in-kernel traversal of the wakeup_sources list.
>
> What exactly is "inefficient"? I think you might have some numbers in
> your 0/2 patch, but putting it in here would be best.
>
> And who is going to be calling these functions, just ebpf scripts?
>
> > The new kfuncs include:
> > - bpf_wakeup_sources_get_head() to obtain the list head.
> > - bpf_wakeup_sources_read_lock/unlock() to manage the SRCU lock.
>
> Does this mean we can stop exporting wakeup_sources_read_lock() now?
>
> > For verifier safety, the underlying SRCU index is wrapped in an opaque
> > 'struct bpf_ws_lock' pointer. This enables the use of KF_ACQUIRE and
> > KF_RELEASE flags, allowing the BPF verifier to strictly enforce paired
> > lock/unlock cycles and prevent resource leaks.
>
> But it's an index, not a lock. Is this just a verifier thing?
It's a verifier thing. The index must be passed to SRCU unlock wrapped
by the unlock kfunc for correctness. The verifier understands such
acquire/release tracking for pointers (e.g., taking refcount and
putting it), but not for scalar values, so we need to launder it
through a pointer to an empty struct, which isn't really usable except
for passing it eventually to unlock. If the program doesn't do the
unlock, the verifier will reject it.
>
> >
> > Signed-off-by: Samuel Wu <wusamuel@google.com>
> > ---
> > drivers/base/power/power.h | 7 ++++
> > drivers/base/power/wakeup.c | 72 +++++++++++++++++++++++++++++++++++--
> > 2 files changed, 77 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
> > index 922ed457db19..8823aceeac8b 100644
> > --- a/drivers/base/power/power.h
> > +++ b/drivers/base/power/power.h
> > @@ -168,3 +168,10 @@ static inline void device_pm_init(struct device *dev)
> > device_pm_sleep_init(dev);
> > pm_runtime_init(dev);
> > }
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +struct bpf_ws_lock { };
>
> An empty structure? This is just an int, so you are casting an int to a
> pointer? Can we make wakeup_sources_read_lock() actually use a
> structure instead to make this simpler?
See above.
>
> > +struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void);
> > +void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock);
> > +void *bpf_wakeup_sources_get_head(void);
> > +#endif
> > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > index b8e48a023bf0..8eda7d35d9cc 100644
> > --- a/drivers/base/power/wakeup.c
> > +++ b/drivers/base/power/wakeup.c
> > @@ -1168,11 +1168,79 @@ static const struct file_operations wakeup_sources_stats_fops = {
> > .release = seq_release_private,
> > };
> >
> > -static int __init wakeup_sources_debugfs_init(void)
> > +#ifdef CONFIG_BPF_SYSCALL
> > +#include <linux/btf.h>
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +/**
> > + * bpf_wakeup_sources_read_lock - Acquire the SRCU lock for wakeup sources
> > + *
> > + * The underlying SRCU lock returns an integer index. However, the BPF verifier
> > + * requires a pointer (PTR_TO_BTF_ID) to strictly track the state of acquired
> > + * resources using KF_ACQUIRE and KF_RELEASE semantics. We use an opaque
> > + * structure pointer (struct bpf_ws_lock *) to satisfy the verifier while
> > + * safely encoding the integer index within the pointer address itself.
> > + *
> > + * Return: An opaque pointer encoding the SRCU lock index + 1 (to avoid NULL).
> > + */
> > +__bpf_kfunc struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void)
> > +{
> > + return (struct bpf_ws_lock *)(long)(wakeup_sources_read_lock() + 1);
>
> Why are you incrementing this by 1?
I think SRCU indices can be 0, so it would appear as a NULL pointer to
the program.
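The +1/-1 round trip discussed here can be illustrated with a minimal
userspace sketch. This is not the kernel code: struct ws_token,
ws_token_encode() and ws_token_decode() are hypothetical names made up
for illustration, and the sketch assumes the index is non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque token type: never dereferenced, only handed back. */
struct ws_token;

/*
 * Encode an index as an opaque pointer. Adding 1 keeps index 0 from
 * turning into NULL, which a caller would read as "acquire failed".
 */
static struct ws_token *ws_token_encode(int idx)
{
	assert(idx >= 0);	/* assumption: indices are never negative */
	return (struct ws_token *)(intptr_t)(idx + 1);
}

/* Undo the encoding: convert the pointer back to an int and subtract 1. */
static int ws_token_decode(struct ws_token *tok)
{
	return (int)(intptr_t)tok - 1;
}
```

With this scheme ws_token_encode(0) is non-NULL and decodes back to 0,
so a NULL check on the acquire side never fires for a valid index.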
>
> > +}
> > +
> > +/**
> > + * bpf_wakeup_sources_read_unlock - Release the SRCU lock for wakeup sources
> > + * @lock: The opaque pointer returned by bpf_wakeup_sources_read_lock()
> > + *
> > + * The BPF verifier guarantees that @lock is a valid, unreleased pointer from
> > + * the acquire function. We decode the pointer back into the integer SRCU index
> > + * by subtracting 1 and release the lock.
> > + */
> > +__bpf_kfunc void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock)
> > +{
> > + wakeup_sources_read_unlock((int)(long)lock - 1);
>
> Why decrementing by one?
>
> So it's really an int, but you are casting it to a pointer, incrementing
> it by one to make it a "fake" pointer value (i.e. unaligned mess), and
> then when unlocking casting the pointer back to an int, and then
> decrementing the value?
>
> This feels "odd" :(
It isn't readable, though, because it's an empty struct, so I don't
think it would cause any issues in practice.
>
> [...]
* [PATCH v3 2/2] selftests/bpf: Add tests for wakeup_sources kfuncs
2026-03-31 15:34 [PATCH v3 0/2] Support BPF traversal of wakeup sources Samuel Wu
2026-03-31 15:34 ` [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources Samuel Wu
@ 2026-03-31 15:34 ` Samuel Wu
2026-04-01 9:15 ` [PATCH v3 0/2] Support BPF traversal of wakeup sources Greg Kroah-Hartman
2 siblings, 0 replies; 11+ messages in thread
From: Samuel Wu @ 2026-03-31 15:34 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
Danilo Krummrich, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
Shuah Khan
Cc: Samuel Wu, kernel-team, linux-kernel, linux-pm, driver-core, bpf,
linux-kselftest
Introduce a set of BPF selftests to verify the safety and functionality
of wakeup_source kfuncs.
The suite includes:
1. A functional test (test_wakeup_source.c) that iterates over the
global wakeup_sources list. It uses CO-RE to read timing statistics
and validates them in user-space via the BPF ring buffer.
2. A negative test suite (wakeup_source_fail.c) ensuring the BPF
verifier correctly enforces reference tracking and type safety.
3. Enable CONFIG_PM_WAKELOCKS in the test config, allowing creation of
wakeup sources via /sys/power/wake_lock.
A shared header (wakeup_source.h) is introduced to ensure consistent
memory layout for the Ring Buffer data between BPF and user-space.
Signed-off-by: Samuel Wu <wusamuel@google.com>
---
tools/testing/selftests/bpf/config | 3 +-
.../selftests/bpf/prog_tests/wakeup_source.c | 101 ++++++++++++++++++
.../selftests/bpf/progs/test_wakeup_source.c | 92 ++++++++++++++++
.../selftests/bpf/progs/wakeup_source.h | 22 ++++
.../selftests/bpf/progs/wakeup_source_fail.c | 76 +++++++++++++
5 files changed, 293 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source.c
create mode 100644 tools/testing/selftests/bpf/progs/test_wakeup_source.c
create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source.h
create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_fail.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 24855381290d..bac60b444551 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -130,4 +130,5 @@ CONFIG_INFINIBAND=y
CONFIG_SMC=y
CONFIG_SMC_HS_CTRL_BPF=y
CONFIG_DIBS=y
-CONFIG_DIBS_LO=y
\ No newline at end of file
+CONFIG_DIBS_LO=y
+CONFIG_PM_WAKELOCKS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/wakeup_source.c b/tools/testing/selftests/bpf/prog_tests/wakeup_source.c
new file mode 100644
index 000000000000..ff2899cbf3a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/wakeup_source.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2026 Google LLC */
+
+#include <test_progs.h>
+#include <fcntl.h>
+#include "test_wakeup_source.skel.h"
+#include "wakeup_source_fail.skel.h"
+#include "progs/wakeup_source.h"
+
+static int lock_ws(const char *name)
+{
+ int fd;
+ ssize_t bytes;
+
+ fd = open("/sys/power/wake_lock", O_WRONLY);
+ if (!ASSERT_OK_FD(fd, "open /sys/power/wake_lock"))
+ return -1;
+
+ bytes = write(fd, name, strlen(name));
+ close(fd);
+ if (!ASSERT_EQ(bytes, strlen(name), "write to wake_lock"))
+ return -1;
+
+ return 0;
+}
+
+static void unlock_ws(const char *name)
+{
+ int fd;
+
+ fd = open("/sys/power/wake_unlock", O_WRONLY);
+ if (fd < 0)
+ return;
+
+ write(fd, name, strlen(name));
+ close(fd);
+}
+
+struct rb_ctx {
+ const char *name;
+ bool found;
+ long long active_time_ns;
+ long long total_time_ns;
+};
+
+static int process_sample(void *ctx, void *data, size_t len)
+{
+ struct rb_ctx *rb_ctx = ctx;
+ struct wakeup_event_t *e = data;
+
+ if (strcmp(e->name, rb_ctx->name) == 0) {
+ rb_ctx->found = true;
+ rb_ctx->active_time_ns = e->active_time_ns;
+ rb_ctx->total_time_ns = e->total_time_ns;
+ }
+ return 0;
+}
+
+void test_wakeup_source(void)
+{
+ if (test__start_subtest("iterate_and_verify_times")) {
+ struct test_wakeup_source *skel;
+ struct ring_buffer *rb = NULL;
+ struct rb_ctx rb_ctx = {
+ .name = "bpf_selftest_ws_times",
+ .found = false,
+ };
+ int err;
+
+ skel = test_wakeup_source__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ rb = ring_buffer__new(bpf_map__fd(skel->maps.rb), process_sample, &rb_ctx, NULL);
+ if (!ASSERT_OK_PTR(rb, "ring_buffer__new"))
+ goto destroy;
+
+ /* Create a temporary wakeup source */
+ if (!ASSERT_OK(lock_ws(rb_ctx.name), "lock_ws"))
+ goto unlock;
+
+ err = bpf_prog_test_run_opts(bpf_program__fd(
+ skel->progs.iterate_wakeupsources), NULL);
+ ASSERT_OK(err, "bpf_prog_test_run");
+
+ ring_buffer__consume(rb);
+
+ ASSERT_TRUE(rb_ctx.found, "found_test_ws_in_rb");
+ ASSERT_GT(rb_ctx.active_time_ns, 0, "active_time_gt_0");
+ ASSERT_GT(rb_ctx.total_time_ns, 0, "total_time_gt_0");
+
+unlock:
+ unlock_ws(rb_ctx.name);
+destroy:
+ if (rb)
+ ring_buffer__free(rb);
+ test_wakeup_source__destroy(skel);
+ }
+
+ RUN_TESTS(wakeup_source_fail);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_wakeup_source.c b/tools/testing/selftests/bpf/progs/test_wakeup_source.c
new file mode 100644
index 000000000000..fd2fb6aebd82
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_wakeup_source.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2026 Google LLC */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+#include "wakeup_source.h"
+
+#define MAX_LOOP_ITER 1000
+#define RB_SIZE (16384 * 4)
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RINGBUF);
+ __uint(max_entries, RB_SIZE);
+} rb SEC(".maps");
+
+struct bpf_ws_lock;
+struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void) __ksym;
+void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock) __ksym;
+void *bpf_wakeup_sources_get_head(void) __ksym;
+
+SEC("syscall")
+__success __retval(0)
+int iterate_wakeupsources(void *ctx)
+{
+ struct list_head *head = bpf_wakeup_sources_get_head();
+ struct list_head *pos = head;
+ struct bpf_ws_lock *lock;
+ int i;
+
+ lock = bpf_wakeup_sources_read_lock();
+ if (!lock)
+ return 0;
+
+ bpf_for(i, 0, MAX_LOOP_ITER) {
+ if (bpf_core_read(&pos, sizeof(pos), &pos->next) || !pos || pos == head)
+ break;
+
+ struct wakeup_event_t *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
+
+ if (!e)
+ break;
+
+ struct wakeup_source *ws = bpf_core_cast(
+ (void *)pos - bpf_core_field_offset(struct wakeup_source, entry),
+ struct wakeup_source);
+ s64 active_time = 0;
+ bool active = BPF_CORE_READ_BITFIELD(ws, active);
+ bool autosleep_enable = BPF_CORE_READ_BITFIELD(ws, autosleep_enabled);
+ s64 last_time = ws->last_time;
+ s64 max_time = ws->max_time;
+ s64 prevent_sleep_time = ws->prevent_sleep_time;
+ s64 total_time = ws->total_time;
+
+ if (active) {
+ s64 curr_time = bpf_ktime_get_ns();
+ s64 prevent_time = ws->start_prevent_time;
+
+ if (curr_time > last_time)
+ active_time = curr_time - last_time;
+
+ total_time += active_time;
+ if (active_time > max_time)
+ max_time = active_time;
+ if (autosleep_enable && curr_time > prevent_time)
+ prevent_sleep_time += curr_time - prevent_time;
+ }
+
+ e->active_count = ws->active_count;
+ e->active_time_ns = active_time;
+ e->event_count = ws->event_count;
+ e->expire_count = ws->expire_count;
+ e->last_time_ns = last_time;
+ e->max_time_ns = max_time;
+ e->prevent_sleep_time_ns = prevent_sleep_time;
+ e->total_time_ns = total_time;
+ e->wakeup_count = ws->wakeup_count;
+
+ if (bpf_probe_read_kernel_str(
+ e->name, WAKEUP_NAME_LEN, ws->name) < 0)
+ e->name[0] = '\0';
+
+ bpf_ringbuf_submit(e, 0);
+ }
+
+ bpf_wakeup_sources_read_unlock(lock);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/wakeup_source.h b/tools/testing/selftests/bpf/progs/wakeup_source.h
new file mode 100644
index 000000000000..cd74de92c82f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/wakeup_source.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2026 Google LLC */
+
+#ifndef __WAKEUP_SOURCE_H__
+#define __WAKEUP_SOURCE_H__
+
+#define WAKEUP_NAME_LEN 128
+
+struct wakeup_event_t {
+ unsigned long active_count;
+ long long active_time_ns;
+ unsigned long event_count;
+ unsigned long expire_count;
+ long long last_time_ns;
+ long long max_time_ns;
+ long long prevent_sleep_time_ns;
+ long long total_time_ns;
+ unsigned long wakeup_count;
+ char name[WAKEUP_NAME_LEN];
+};
+
+#endif /* __WAKEUP_SOURCE_H__ */
diff --git a/tools/testing/selftests/bpf/progs/wakeup_source_fail.c b/tools/testing/selftests/bpf/progs/wakeup_source_fail.c
new file mode 100644
index 000000000000..0f8d29865a01
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/wakeup_source_fail.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2026 Google LLC */
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+struct bpf_ws_lock;
+
+struct bpf_ws_lock *bpf_wakeup_sources_read_lock(void) __ksym;
+void bpf_wakeup_sources_read_unlock(struct bpf_ws_lock *lock) __ksym;
+void *bpf_wakeup_sources_get_head(void) __ksym;
+
+SEC("syscall")
+__failure __msg("BPF_EXIT instruction in main prog would lead to reference leak")
+int wakeup_source_lock_no_unlock(void *ctx)
+{
+ struct bpf_ws_lock *lock;
+
+ lock = bpf_wakeup_sources_read_lock();
+ if (!lock)
+ return 0;
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("access beyond struct")
+int wakeup_source_access_lock_fields(void *ctx)
+{
+ struct bpf_ws_lock *lock;
+ int val;
+
+ lock = bpf_wakeup_sources_read_lock();
+ if (!lock)
+ return 0;
+
+ val = *(int *)lock;
+
+ bpf_wakeup_sources_read_unlock(lock);
+ return val;
+}
+
+SEC("syscall")
+__failure __msg("type=scalar expected=fp")
+int wakeup_source_unlock_no_lock(void *ctx)
+{
+ struct bpf_ws_lock *lock = (void *)0x1;
+
+ bpf_wakeup_sources_read_unlock(lock);
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed to trusted arg0")
+int wakeup_source_unlock_null(void *ctx)
+{
+ bpf_wakeup_sources_read_unlock(NULL);
+
+ return 0;
+}
+
+SEC("syscall")
+__failure __msg("R0 invalid mem access 'scalar'")
+int wakeup_source_unsafe_dereference(void *ctx)
+{
+ struct list_head *head = bpf_wakeup_sources_get_head();
+
+ if (head->next)
+ return 1;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.53.0.1018.g2bb0e51243-goog
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-03-31 15:34 [PATCH v3 0/2] Support BPF traversal of wakeup sources Samuel Wu
2026-03-31 15:34 ` [PATCH v3 1/2] PM: wakeup: Add kfuncs to traverse over wakeup_sources Samuel Wu
2026-03-31 15:34 ` [PATCH v3 2/2] selftests/bpf: Add tests for wakeup_sources kfuncs Samuel Wu
@ 2026-04-01 9:15 ` Greg Kroah-Hartman
2026-04-01 19:07 ` Samuel Wu
2 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-01 9:15 UTC (permalink / raw)
To: Samuel Wu
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> This patchset adds requisite kfuncs for BPF programs to safely traverse
> wakeup_sources, and puts a config flag around the sysfs interface.
>
> Currently, a traversal of wakeup sources requires going through
> /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> wakeup source also having multiple attributes. debugfs is unstable and
> insecure.
Describe "inefficient" please?
And if you really think that doing an open/read/close on a virtual
filesystem is inefficient, then I have the syscall for you!
I've been trying to get readfile() accepted every few years, looks like
I last tried in 2020:
https://lore.kernel.org/r/20200704140250.423345-1-gregkh@linuxfoundation.org
but I keep the patchset up to date in my local tree all the time.
Would that help you out here instead?
> Adding kfuncs to lock/unlock wakeup sources allows BPF programs to safely
> traverse the wakeup sources list. The head address of wakeup_sources can
> safely be resolved through BPF helper functions or variable attributes.
Who is going to be calling this?
> On a quiescent Pixel 6 traversing 150 wakeup_sources, I am seeing ~34x
> speedup (sampled 75 times in table below). For a device under load, the
> speedup is greater.
> +-------+----+----------+----------+
> | | n | AVG (ms) | STD (ms) |
> +-------+----+----------+----------+
> | sysfs | 75 | 44.9 | 12.6 |
> +-------+----+----------+----------+
> | BPF | 75 | 1.3 | 0.7 |
> +-------+----+----------+----------+
150 sysfs calls in 44.9 ms feels very slow. But really, what are you
expecting here? sysfs should NEVER be on a "fast path" where you care
about performance. Why are you hammering on sysfs here? What HAS to
have this type of performance?
In other words, what problem are you trying to solve that having access
to 150+ sysfs files all at once in a faster way is going to fix?
thanks,
greg k-h
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-04-01 9:15 ` [PATCH v3 0/2] Support BPF traversal of wakeup sources Greg Kroah-Hartman
@ 2026-04-01 19:07 ` Samuel Wu
2026-04-02 4:06 ` Greg Kroah-Hartman
0 siblings, 1 reply; 11+ messages in thread
From: Samuel Wu @ 2026-04-01 19:07 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> > This patchset adds requisite kfuncs for BPF programs to safely traverse
> > wakeup_sources, and puts a config flag around the sysfs interface.
> >
> > Currently, a traversal of wakeup sources requires going through
> > /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> > sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> > wakeup source also having multiple attributes. debugfs is unstable and
> > insecure.
>
> Describe "inefficient" please?
Ack; I’ll provide a more detailed breakdown in the v4 cover letter. To
summarize: the "inefficiency" isn't just the number of sources (150),
but the fact that each source has 10 attributes. We are looking at
1,500+ sysfs nodes to get a full snapshot of the system.
>
> And if you really think that doing an open/read/close on a virtual
> filesystem is inefficient, then I have the syscall for you!
>
> I've been trying to get readfile() accepted every few years, looks like
> I last tried in 2020:
> https://lore.kernel.org/r/20200704140250.423345-1-gregkh@linuxfoundation.org
> but I keep the patchset up to date in my local tree all the time.
>
> Would that help you out here instead?
`readfile()` seems like it would be a great optimization for many
use cases, but it doesn't solve the context switch bottleneck.
Additionally, current userspace implementations attempt to speed up
this traversal by caching fds, so this new syscall wouldn't help as
much as one might initially expect.
> > Adding kfuncs to lock/unlock wakeup sources allows BPF programs to safely
> > traverse the wakeup sources list. The head address of wakeup_sources can
> > safely be resolved through BPF helper functions or variable attributes.
>
> Who is going to be calling this?
BPF programs call the new kfuncs. I realized I left some stale text in
the v2 cover letter regarding the interface; I'll clean that up for
the next version to make this point clearer.
> > On a quiescent Pixel 6 traversing 150 wakeup_sources, I am seeing ~34x
> > speedup (sampled 75 times in table below). For a device under load, the
> > speedup is greater.
> > +-------+----+----------+----------+
> > | | n | AVG (ms) | STD (ms) |
> > +-------+----+----------+----------+
> > | sysfs | 75 | 44.9 | 12.6 |
> > +-------+----+----------+----------+
> > | BPF | 75 | 1.3 | 0.7 |
> > +-------+----+----------+----------+
>
> 150 sysfs calls in 44.9 ms feels very slow. but really, what are you
> expecting here, sysfs should NEVER be on a "fast path" that you care
> about performance. Why are you hammering on sysfs here? What HAS to
> have this type of performance?
>
> In other words, what problem are you trying to solve that having access
> to 150+ sysfs files all at once in a faster way is going to fix?
The 44.9ms is the cost of reading ~1,500 sysfs nodes (150 sources * 10
attributes). This is even worse on wearables, where constrained compute
and power make the platform more sensitive to this overhead.
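As a back-of-the-envelope check of those numbers, here is a minimal
sketch of the arithmetic behind the table. The helper names are
hypothetical and exist only for illustration; the inputs are the
figures quoted above (150 sources, 10 attributes, 44.9 ms vs 1.3 ms).

```c
#include <assert.h>

/* Total number of sysfs files for a full pass: one file per attribute. */
static unsigned int ws_total_nodes(unsigned int sources, unsigned int attrs)
{
	return sources * attrs;
}

/* Average cost of reading one sysfs file, in microseconds. */
static double ws_per_node_us(double total_ms, unsigned int nodes)
{
	assert(nodes > 0);
	return total_ms * 1000.0 / nodes;
}

/* Ratio of sysfs traversal time to BPF traversal time. */
static double ws_speedup(double sysfs_ms, double bpf_ms)
{
	assert(bpf_ms > 0.0);
	return sysfs_ms / bpf_ms;
}
```

With the measured averages this works out to roughly 30 us per sysfs
file and a ratio of about 34.5, which is where the "~34x" comes from.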
On these platforms, the CPU is suspended as much as possible. A
byproduct of this is that the wakeup source traversal occurs on a
user-perceptible path, which impacts battery life and UI
responsiveness.
Beyond the performance improvement, moving to BPF offers other benefits:
1. Reduce Memory: Drop the fd caching logic
2. Simplify Security: Consolidate SELinux permissions rather than
managing labels for every single *possible* wakeup_source
Thanks Greg!
-- Sam
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-04-01 19:07 ` Samuel Wu
@ 2026-04-02 4:06 ` Greg Kroah-Hartman
2026-04-02 19:37 ` Samuel Wu
0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-02 4:06 UTC (permalink / raw)
To: Samuel Wu
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Wed, Apr 01, 2026 at 12:07:12PM -0700, Samuel Wu wrote:
> On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> > > This patchset adds requisite kfuncs for BPF programs to safely traverse
> > > wakeup_sources, and puts a config flag around the sysfs interface.
> > >
> > > Currently, a traversal of wakeup sources requires going through
> > > /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> > > sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> > > wakeup source also having multiple attributes. debugfs is unstable and
> > > insecure.
> >
> > Describe "inefficient" please?
>
> Ack; I’ll provide a more detailed breakdown in the v4 cover letter. To
> summarize: the "inefficiency" isn't just the number of sources (150),
> but the fact that each source has 10 attributes. We are looking at
> 1,500+ sysfs nodes to get a full snapshot of the system.
Wait, no, something is wrong here. You should NEVER be wanting to
combine multiple sysfs files at the same time into a "snapshot" of the
system because by virtue of how this works, it's going to change while
you are actually traversing all of those files!
Why are you trying to read 1500+ sysfs files at once, and what are you
doing with that information? And if you really need it "all at once",
why can't we provide it for you in a sane manner, instead of being
forced to either walk the whole sysfs tree, or rely on a bpf script?
So, what problem are you trying to solve that looking at all of these
files solves for you at the moment?
thanks,
greg k-h
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-04-02 4:06 ` Greg Kroah-Hartman
@ 2026-04-02 19:37 ` Samuel Wu
2026-04-03 10:04 ` Greg Kroah-Hartman
0 siblings, 1 reply; 11+ messages in thread
From: Samuel Wu @ 2026-04-02 19:37 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Wed, Apr 1, 2026 at 9:06 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Apr 01, 2026 at 12:07:12PM -0700, Samuel Wu wrote:
> > On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> > > > This patchset adds requisite kfuncs for BPF programs to safely traverse
> > > > wakeup_sources, and puts a config flag around the sysfs interface.
> > > >
> > > > Currently, a traversal of wakeup sources requires going through
> > > > /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> > > > sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> > > > wakeup source also having multiple attributes. debugfs is unstable and
> > > > insecure.
> > >
> > > Describe "inefficient" please?
> >
> > Ack; I’ll provide a more detailed breakdown in the v4 cover letter. To
> > summarize: the "inefficiency" isn't just the number of sources (150),
> > but the fact that each source has 10 attributes. We are looking at
> > 1,500+ sysfs nodes to get a full snapshot of the system.
>
> Wait, no, something is wrong here. You should NEVER be wanting to
> combine multiple sysfs files at the same time into a "snapshot" of the
> system because by virtue of how this works, it's going to change while
> you are actually traversing all of those files!
Agree, the current approach with sysfs might have stale values. The
BPF approach holds a lock while traversing the list. It's not a
perfect snapshot, but it's internally consistent and arguably better
than the current sysfs implementation.
> Why are you trying to read 1500+ sysfs files at once, and what are you
> doing with that information? And if you really need it "all at once",
> why can't we provide it for you in a sane manner, instead of being
> forced to either walk the whole sysfs tree, or rely on a bpf script?
The data is fundamental for debugging and improving power at scale.
The original discussion and patch [1] provide more context of the
intent. To summarize the history, debugfs was unstable and insecure,
leading to the current sysfs implementation. However, sysfs has the
constraint of one attribute per node, requiring 10 sysfs accesses per
wakeup source.
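To make the per-source cost concrete, here is a minimal userspace
sketch of what one wakeup source costs through sysfs: one
open/read/close round trip per attribute. The attribute list is my
understanding of the 10 files under /sys/class/wakeup/<wakeupN>/ and
should be treated as an assumption; the function and macro names are
hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define WS_NUM_ATTRS 10
#define WS_VAL_LEN 64

/* Assumed attribute file names under /sys/class/wakeup/<wakeupN>/. */
static const char *const ws_attrs[WS_NUM_ATTRS] = {
	"name", "active_count", "event_count", "wakeup_count",
	"expire_count", "active_time_ms", "total_time_ms",
	"max_time_ms", "last_change_ms", "prevent_suspend_time_ms",
};

/*
 * Read every attribute of one wakeup source directory.
 * Each attribute costs a separate open/read/close sequence.
 * Returns the number of attributes read successfully; entries that
 * could not be read are left as empty strings.
 */
static int ws_read_all(const char *dir, char vals[WS_NUM_ATTRS][WS_VAL_LEN])
{
	char path[512];
	int ok = 0;

	assert(dir != NULL);

	for (int i = 0; i < WS_NUM_ATTRS; i++) {
		ssize_t n;
		int fd;

		vals[i][0] = '\0';
		snprintf(path, sizeof(path), "%s/%s", dir, ws_attrs[i]);

		fd = open(path, O_RDONLY);
		if (fd < 0)
			continue;

		n = read(fd, vals[i], WS_VAL_LEN - 1);
		close(fd);
		if (n < 0) {
			vals[i][0] = '\0';
			continue;
		}

		vals[i][n] = '\0';
		ok++;
	}

	return ok;
}
```

A full pass calls this once per directory under /sys/class/wakeup/, so
150 sources means roughly 1,500 of these round trips.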
That said, I completely agree that reading 1500+ sysfs files at once
is unreasonable. Perhaps the sysfs approach was manageable at the time
of [1], but moving forward we need a more scalable solution. This is
the main motivator and makes BPF the sane approach, as it improves
traversal in nearly every aspect (e.g. cycles, memory, simplicity,
scalability).
[1]: https://lore.kernel.org/all/20190715214348.81865-1-trong@android.com/
Thanks!
Sam
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-04-02 19:37 ` Samuel Wu
@ 2026-04-03 10:04 ` Greg Kroah-Hartman
2026-04-03 16:28 ` Samuel Wu
0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-03 10:04 UTC (permalink / raw)
To: Samuel Wu
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Thu, Apr 02, 2026 at 12:37:12PM -0700, Samuel Wu wrote:
> On Wed, Apr 1, 2026 at 9:06 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Apr 01, 2026 at 12:07:12PM -0700, Samuel Wu wrote:
> > > On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> > > > > This patchset adds requisite kfuncs for BPF programs to safely traverse
> > > > > wakeup_sources, and puts a config flag around the sysfs interface.
> > > > >
> > > > > Currently, a traversal of wakeup sources requires going through
> > > > > /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> > > > > sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> > > > > wakeup source also having multiple attributes. debugfs is unstable and
> > > > > insecure.
> > > >
> > > > Describe "inefficient" please?
> > >
> > > Ack; I’ll provide a more detailed breakdown in the v4 cover letter. To
> > > summarize: the "inefficiency" isn't just the number of sources (150),
> > > but the fact that each source has 10 attributes. We are looking at
> > > 1,500+ sysfs nodes to get a full snapshot of the system.
> >
> > Wait, no, something is wrong here. You should NEVER be wanting to
> > combine multiple sysfs files at the same time into a "snapshot" of the
> > system because by virtue of how this works, it's going to change while
> > you are actually traversing all of those files!
>
> Agree, the current approach with sysfs might have stale values. The
> BPF approach holds a lock while traversing the list. It's not a
> perfect snapshot, but it's internally consistent and arguably better
> than the current sysfs implementation.
>
> > Why are you trying to read 1500+ sysfs files at once, and what are you
> > doing with that information? And if you really need it "all at once",
> > why can't we provide it for you in a sane manner, instead of being
> > forced to either walk the whole sysfs tree, or rely on a bpf script?
>
> The data is fundamental for debugging and improving power at scale.
> The original discussion and patch [1] provide more context of the
> intent. To summarize the history, debugfs was unstable and insecure,
> leading to the current sysfs implementation. However, sysfs has the
> constraint of one attribute per node, requiring 10 sysfs accesses per
> wakeup source.
Ok, as the sysfs api doesn't work for your use case anymore, why do we
need to keep it around at all?
> That said, I completely agree that reading 1500+ sysfs files at once
> is unreasonable. Perhaps the sysfs approach was manageable at the time
> of [1], but moving forward we need a more scalable solution. This is
> the main motivator and makes BPF the sane approach, as it improves
> traversal in nearly every aspect (e.g. cycles, memory, simplicity,
> scalability).
I'm all for making this more scalable and work for your systems now, but
consider if you could drop the sysfs api entirely, would you want this
to be a different type of api entirely instead of having to plug through
these using ebpf?
thanks,
greg k-h
* Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources
2026-04-03 10:04 ` Greg Kroah-Hartman
@ 2026-04-03 16:28 ` Samuel Wu
0 siblings, 0 replies; 11+ messages in thread
From: Samuel Wu @ 2026-04-03 16:28 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Danilo Krummrich,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, kernel-team,
linux-kernel, linux-pm, driver-core, bpf, linux-kselftest
On Fri, Apr 3, 2026 at 3:04 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Apr 02, 2026 at 12:37:12PM -0700, Samuel Wu wrote:
> > On Wed, Apr 1, 2026 at 9:06 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Apr 01, 2026 at 12:07:12PM -0700, Samuel Wu wrote:
> > > > On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
[ ... ]
> > The data is fundamental for debugging and improving power at scale.
> > The original discussion and patch [1] provide more context of the
> > intent. To summarize the history, debugfs was unstable and insecure,
> > leading to the current sysfs implementation. However, sysfs has the
> > constraint of one attribute per node, requiring 10 sysfs accesses per
> > wakeup source.
>
> Ok, as the sysfs api doesn't work for your use case anymore, why do we
> need to keep it around at all?
>
> > That said, I completely agree that reading 1500+ sysfs files at once
> > is unreasonable. Perhaps the sysfs approach was manageable at the time
> > of [1], but moving forward we need a more scalable solution. This is
> > the main motivator and makes BPF the sane approach, as it improves
> > traversal in nearly every aspect (e.g. cycles, memory, simplicity,
> > scalability).
>
> I'm all for making this more scalable and work for your systems now, but
> consider if you could drop the sysfs api entirely, would you want this
> to be a different type of api entirely instead of having to plug through
> these using ebpf?
Almost all use cases want all this data at once, so AFAICT BPF offers
the best performance for that. But of course, open to discussion if
there is an alternative API that matches BPF's performance for this
use case.
I'm not opposed to dropping the sysfs approach, and I attempted to do
so in the v1 patch [1]. I'm not sure who else currently uses those
sysfs nodes, but a config flag should remove friction and could be a
stepping stone toward deprecation/removal.
[1]: https://lore.kernel.org/all/20260320160055.4114055-3-wusamuel@google.com/
Thanks!
-- Sam