Linux Perf Users
* [PATCH v4 0/3] Enable perf tracing for unprivileged users
@ 2026-05-15 19:40 Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints Anubhav Shelat
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Anubhav Shelat @ 2026-05-15 19:40 UTC
  To: mpetlan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon,
	linux-kernel, linux-trace-kernel, linux-perf-users
  Cc: Anubhav Shelat

Enable users to use perf-trace to trace their own processes, like strace
but without the overhead of ptrace(). Ensure that users cannot access
other users' or systemwide tracing data.

Changes in v4:
- Preserve security_perf_event_open(PERF_SECURITY_KERNEL) LSM hook in
  the tp_bypass path.
- Lift the PERF_SAMPLE_IP check out of the tp_bypass path above the
  PERF_SAMPLE_RAW branch so it applies to counting and sampling. This
  also allows us to ensure PERF_SAMPLE_IP is set for uprobes.
- Restrict the counting path to TRACE_EVENT_FL_CAP_ANY tracepoints for
  unprivileged users with sysctl_perf_event_paranoid > 1.

Changes in v3:
- Don't set PERF_SAMPLE_IP for unprivileged tracepoints. This allows us
  to exclude PERF_SAMPLE_IP from kaddr_leak without weakening KASLR.
- Mount tracefs as world-traversable so users can access eventfs
  directories.

Anubhav Shelat (3):
  perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints
  perf: enable unprivileged syscall tracing with perf trace
  tracefs: make root directory world-traversable

 fs/tracefs/inode.c              |  2 +-
 kernel/events/core.c            | 28 +++++++++++++++++++++++++---
 kernel/trace/trace_event_perf.c | 21 ++++++++++++++++++++-
 kernel/trace/trace_events.c     | 16 ++++++++++++++--
 tools/perf/util/evsel.c         | 14 +++++++++++++-
 5 files changed, 73 insertions(+), 8 deletions(-)

-- 
2.54.0



* [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints
  2026-05-15 19:40 [PATCH v4 0/3] Enable perf tracing for unprivileged users Anubhav Shelat
@ 2026-05-15 19:40 ` Anubhav Shelat
  2026-05-15 20:10   ` sashiko-bot
  2026-05-15 19:40 ` [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 3/3] tracefs: make root directory world-traversable Anubhav Shelat
  2 siblings, 1 reply; 6+ messages in thread
From: Anubhav Shelat @ 2026-05-15 19:40 UTC
  To: mpetlan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon,
	linux-kernel, linux-trace-kernel, linux-perf-users
  Cc: Anubhav Shelat

For tracepoint events the IP is a static kernel address. It does not
vary between samples and provides no useful information to unprivileged
users. Not setting PERF_SAMPLE_IP for unprivileged tracepoints avoids
exposing a kernel address that reveals the KASLR base offset.

Make an exception for uprobes, which are registered as
PERF_TYPE_TRACEPOINT, because the IP is important for their
functionality and is a safe userspace address. Detect them with
__probe_ip (entry) and __probe_ret_ip (return) using evsel__field().

Assisted-by: Claude:claude-sonnet-4.5
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 tools/perf/util/evsel.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2ee87fd84d3e..bf66e0c78451 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1509,7 +1509,19 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
 	attr->write_backward = opts->overwrite ? 1 : 0;
 	attr->read_format   = PERF_FORMAT_LOST;
 
-	evsel__set_sample_bit(evsel, IP);
+	/*
+	 * Don't set PERF_SAMPLE_IP for unprivileged kernel tracepoints to
+	 * avoid exposing kernel addresses. Uprobes expose only userspace
+	 * addresses so they're safe. Detect entry and return uprobes.
+	 */
+	if (attr->type != PERF_TYPE_TRACEPOINT || perf_event_paranoid_check(1)
+#ifdef HAVE_LIBTRACEEVENT
+	    || evsel__field(evsel, "__probe_ip")
+	    || evsel__field(evsel, "__probe_ret_ip")
+#endif
+	    )
+		evsel__set_sample_bit(evsel, IP);
+
 	evsel__set_sample_bit(evsel, TID);
 
 	if (evsel->sample_read) {
-- 
2.54.0
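
As an illustration only, the condition this patch adds to evsel__config()
can be modelled as a small standalone function. This is a sketch, not the
perf code: sketch_sample_type(), paranoid_ok (standing in for the result of
perf_event_paranoid_check(1)) and is_uprobe (standing in for the
__probe_ip / __probe_ret_ip field lookup) are hypothetical names. Only the
PERF_* constants are real, taken from <linux/perf_event.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Hypothetical model of the patched branch in evsel__config().
 *   paranoid_ok: stand-in for perf_event_paranoid_check(1)
 *   is_uprobe:   stand-in for finding a __probe_ip or __probe_ret_ip field
 */
uint64_t sketch_sample_type(uint32_t type, bool paranoid_ok, bool is_uprobe)
{
	uint64_t sample_type = 0;

	/* IP is kept for non-tracepoints, permitted users, and uprobes. */
	if (type != PERF_TYPE_TRACEPOINT || paranoid_ok || is_uprobe)
		sample_type |= PERF_SAMPLE_IP;

	/* TID is requested unconditionally, as in the diff context. */
	sample_type |= PERF_SAMPLE_TID;
	return sample_type;
}

/* Walks the interesting cases; aborts if the model disagrees. */
void sketch_sample_type_examples(void)
{
	/* Restricted user, kernel tracepoint: IP is dropped. */
	assert(sketch_sample_type(PERF_TYPE_TRACEPOINT, false, false) ==
	       PERF_SAMPLE_TID);
	/* Restricted user, uprobe: IP is kept. */
	assert(sketch_sample_type(PERF_TYPE_TRACEPOINT, false, true) ==
	       (PERF_SAMPLE_IP | PERF_SAMPLE_TID));
	/* Non-tracepoint events are unaffected. */
	assert(sketch_sample_type(PERF_TYPE_HARDWARE, false, false) &
	       PERF_SAMPLE_IP);
}
```

The only case where the model drops PERF_SAMPLE_IP is a tracepoint that is
not a uprobe, opened by a user for whom the paranoid check fails.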



* [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace
  2026-05-15 19:40 [PATCH v4 0/3] Enable perf tracing for unprivileged users Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints Anubhav Shelat
@ 2026-05-15 19:40 ` Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 3/3] tracefs: make root directory world-traversable Anubhav Shelat
  2 siblings, 0 replies; 6+ messages in thread
From: Anubhav Shelat @ 2026-05-15 19:40 UTC
  To: mpetlan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon,
	linux-kernel, linux-trace-kernel, linux-perf-users
  Cc: Anubhav Shelat

Allow unprivileged users to trace their own processes' syscalls using
perf trace, similar to strace without the intrusive overhead of ptrace().

Currently, perf trace requires CAP_PERFMON or paranoid level ≤ 1 even
though the kernel has existing infrastructure (TRACE_EVENT_FL_CAP_ANY)
specifically designed to mark syscall tracepoints as safe for
unprivileged access. To fix this:

1. Loosen the condition in perf_event_open() which requires privileges
   for all events with exclude_kernel=0. This allows perf_event_open() to
   bypass the paranoid check for task-attached tracepoint events. Ensure
   that sample types which can expose kernel addresses to unprivileged
   users are blocked. Ensure the PERF_SECURITY_KERNEL LSM hook is
   preserved.

2. Make the format and id tracefs files world-readable, but only for
   tracepoints with TRACE_EVENT_FL_CAP_ANY. This lets unprivileged users
   see syscall tracepoint ids without exposing sensitive information.

3. Add a check to perf_trace_event_perm() to block PERF_SAMPLE_IP on
   kernel tracepoints for unprivileged users to prevent KASLR bypass. We do
   this here rather than in kaddr_leak because perf_trace_event_perm() can
   distinguish between kernel tracepoints and uprobe tracepoints, where the
   IP is a safe user space address and is necessary for uprobe
   functionality.

4. Restrict pure counting events (no PERF_SAMPLE_RAW) to
   TRACE_EVENT_FL_CAP_ANY tracepoints. This prevents unprivileged users
   from counting internal kernel tracepoints while preserving current
   behavior for exclude_kernel=1 events.

Example usage after this change:
  $ perf trace ls          # works as unprivileged user
  $ perf trace             # system-wide, still requires privileges
  $ perf trace -p 1234     # requires ptrace permission on pid 1234

Assisted-by: Claude:claude-sonnet-4.5
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 kernel/events/core.c            | 28 +++++++++++++++++++++++++---
 kernel/trace/trace_event_perf.c | 21 ++++++++++++++++++++-
 kernel/trace/trace_events.c     | 16 ++++++++++++++--
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7935d5663944..ff2d1e9a0b79 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13873,9 +13873,31 @@ SYSCALL_DEFINE5(perf_event_open,
 		return err;
 
 	if (!attr.exclude_kernel) {
-		err = perf_allow_kernel();
-		if (err)
-			return err;
+		bool tp_bypass = false;
+
+		/* Check unprivileged tracepoints */
+		if (attr.type == PERF_TYPE_TRACEPOINT && pid != -1) {
+			/*
+			 * Block sample types that expose kernel addresses to
+			 * prevent KASLR bypass
+			 */
+			u64 kaddr_leak = PERF_SAMPLE_CALLCHAIN |
+					 PERF_SAMPLE_BRANCH_STACK |
+					 PERF_SAMPLE_ADDR |
+					 PERF_SAMPLE_REGS_INTR;
+
+			tp_bypass = !(attr.sample_type & kaddr_leak);
+		}
+
+		if (!tp_bypass) {
+			err = perf_allow_kernel();
+			if (err)
+				return err;
+		} else {
+			err = security_perf_event_open(PERF_SECURITY_KERNEL);
+			if (err)
+				return err;
+		}
 	}
 
 	if (attr.namespaces) {
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a6bb7577e8c5..466007ed2869 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -72,9 +72,28 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 			return -EINVAL;
 	}
 
+	/*
+	 * PERF_SAMPLE_IP on kernel tracepoints exposes a kernel text
+	 * address, weakening KASLR. Block for unprivileged users unless
+	 * the tracepoint is a uprobe (userspace IP, safe to expose).
+	 */
+	if ((p_event->attr.sample_type & PERF_SAMPLE_IP) &&
+	    !p_event->attr.exclude_kernel &&
+	    !(tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
+	    sysctl_perf_event_paranoid > 1 && !perfmon_capable())
+		return -EACCES;
+
 	/* No tracing, just counting, so no obvious leak */
-	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
+	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW)) {
+		/* Prevent unprivileged users from counting kernel tracepoints */
+		if (!p_event->attr.exclude_kernel &&
+		    sysctl_perf_event_paranoid > 1 && !perfmon_capable()) {
+			if (!(p_event->attach_state == PERF_ATTACH_TASK &&
+			      (tp_event->flags & TRACE_EVENT_FL_CAP_ANY)))
+				return -EACCES;
+		}
 		return 0;
+	}
 
 	/* Some events are ok to be traced by non-root users... */
 	if (p_event->attach_state == PERF_ATTACH_TASK) {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c46e623e7e0d..cbd07e2ec528 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3050,7 +3050,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
 	struct trace_event_call *call = file->event_call;
 
 	if (strcmp(name, "format") == 0) {
-		*mode = TRACE_MODE_READ;
+		/*
+		 * Make format tracefs file world readable for tracepoints with
+		 * TRACE_EVENT_FL_CAP_ANY
+		 */
+		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
+			(TRACE_MODE_READ | 0004) :
+			TRACE_MODE_READ;
 		*fops = &ftrace_event_format_fops;
 		return 1;
 	}
@@ -3086,7 +3092,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg &&
 	    strcmp(name, "id") == 0) {
-		*mode = TRACE_MODE_READ;
+		/*
+		 * Make id tracefs file world readable for tracepoints with
+		 * TRACE_EVENT_FL_CAP_ANY
+		 */
+		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
+			(TRACE_MODE_READ | 0004) :
+			TRACE_MODE_READ;
 		*data = (void *)(long)call->event.type;
 		*fops = &ftrace_event_id_fops;
 		return 1;
-- 
2.54.0
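
The gating described in this patch can be sketched as two pure functions,
one per kernel site. This is a hedged model for reading the diff, not kernel
code: every sketch_* name, the enum, and the input struct are hypothetical,
and the unprivileged flag stands in for
"sysctl_perf_event_paranoid > 1 && !perfmon_capable()". The pre-existing
checks later in perf_trace_event_perm() are deliberately not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/* Sample types the patch lists as able to expose kernel addresses. */
#define SKETCH_KADDR_LEAK (PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_BRANCH_STACK | \
			   PERF_SAMPLE_ADDR | PERF_SAMPLE_REGS_INTR)

enum sketch_check {
	SKETCH_NO_KERNEL_CHECK,   /* exclude_kernel=1: branch not entered */
	SKETCH_PERF_ALLOW_KERNEL, /* perf_allow_kernel() runs, as before */
	SKETCH_LSM_HOOK_ONLY,     /* tp_bypass: only the LSM hook runs */
};

/* Models which check the patched perf_event_open() branch would perform. */
enum sketch_check sketch_open_check(const struct perf_event_attr *attr,
				    pid_t pid)
{
	if (attr->exclude_kernel)
		return SKETCH_NO_KERNEL_CHECK;
	if (attr->type == PERF_TYPE_TRACEPOINT && pid != -1 &&
	    !(attr->sample_type & SKETCH_KADDR_LEAK))
		return SKETCH_LSM_HOOK_ONLY;
	return SKETCH_PERF_ALLOW_KERNEL;
}

struct sketch_perm_in {
	uint64_t sample_type;
	bool exclude_kernel;
	bool is_uprobe;     /* TRACE_EVENT_FL_UPROBE is set */
	bool task_attached; /* attach_state == PERF_ATTACH_TASK */
	bool cap_any;       /* TRACE_EVENT_FL_CAP_ANY is set */
	bool unprivileged;  /* paranoid > 1 && !perfmon_capable() */
};

/*
 * Returns -EACCES or 0 where the added checks decide the outcome, and 1
 * when control would continue into the pre-existing checks (not modelled).
 */
int sketch_added_perm_checks(const struct sketch_perm_in *in)
{
	bool restricted = !in->exclude_kernel && in->unprivileged;

	/* Kernel-tracepoint IP is refused; uprobe IP is allowed. */
	if ((in->sample_type & PERF_SAMPLE_IP) && restricted && !in->is_uprobe)
		return -EACCES;

	/* Counting only: allowed for task-attached CAP_ANY tracepoints. */
	if (!(in->sample_type & PERF_SAMPLE_RAW)) {
		if (restricted && !(in->task_attached && in->cap_any))
			return -EACCES;
		return 0;
	}
	return 1;
}

/* A task-attached raw tracepoint with no leaking sample types bypasses. */
void sketch_gating_examples(void)
{
	struct perf_event_attr attr = { 0 };

	attr.type = PERF_TYPE_TRACEPOINT;
	attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TID;
	assert(sketch_open_check(&attr, 1234) == SKETCH_LSM_HOOK_ONLY);
	assert(sketch_open_check(&attr, -1) == SKETCH_PERF_ALLOW_KERNEL);
}
```

Splitting the model this way mirrors the patch: the syscall entry decides
only whether the paranoid check can be skipped, while the tracepoint-specific
permission function is the place that can tell uprobes from kernel
tracepoints.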



* [PATCH v4 3/3] tracefs: make root directory world-traversable
  2026-05-15 19:40 [PATCH v4 0/3] Enable perf tracing for unprivileged users Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints Anubhav Shelat
  2026-05-15 19:40 ` [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace Anubhav Shelat
@ 2026-05-15 19:40 ` Anubhav Shelat
  2026-05-15 23:16   ` Steven Rostedt
  2 siblings, 1 reply; 6+ messages in thread
From: Anubhav Shelat @ 2026-05-15 19:40 UTC
  To: mpetlan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon,
	linux-kernel, linux-trace-kernel, linux-perf-users
  Cc: Anubhav Shelat

Change the default tracefs mount mode from 0700 to 0755. This allows
unprivileged users to access the eventfs directories underneath, which
already use 0755.

Tracing data files use mode 0440 and 0640 so they are not exposed by
this change. Only the format and id files, which have been marked as
world-readable, become accessible.

Directory listings of kprobes and uprobes, which contain functions or
binaries, become visible to unprivileged users but do not contain kernel
addresses. Admins using probes can restore the previous behavior with
chmod or mount -o mode=700.

Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 fs/tracefs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index f3d6188a3b7b..3a6a0c800a8b 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -23,7 +23,7 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-#define TRACEFS_DEFAULT_MODE	0700
+#define TRACEFS_DEFAULT_MODE	0755
 static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
 
 static struct vfsmount *tracefs_mount;
-- 
2.54.0
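
The permission arithmetic behind this series can be checked with a few lines
of userspace C. This is only a sketch of the mode bits involved, not tracefs
code: the sketch_* helpers are hypothetical, and SKETCH_TRACE_MODE_READ
assumes TRACE_MODE_READ is 0440, which is the read-only mode the message
above gives for tracing data files.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed value of TRACE_MODE_READ (see the lead-in). */
#define SKETCH_TRACE_MODE_READ 0440

/* "Other" users can traverse a directory only if its o+x bit is set. */
bool sketch_other_can_traverse(mode_t dir_mode)
{
	return (dir_mode & S_IXOTH) != 0;
}

/* "Other" users can read a file only if its o+r bit is set. */
bool sketch_other_can_read(mode_t file_mode)
{
	return (file_mode & S_IROTH) != 0;
}

/* Mode patch 2/3 assigns to the format and id files. */
mode_t sketch_event_file_mode(bool cap_any)
{
	return cap_any ? (SKETCH_TRACE_MODE_READ | 0004)
		       : SKETCH_TRACE_MODE_READ;
}

/* Old root mode blocks traversal; new root mode allows it. */
void sketch_mode_examples(void)
{
	assert(!sketch_other_can_traverse(0700));
	assert(sketch_other_can_traverse(0755));
	assert(sketch_event_file_mode(true) == 0444);
	assert(!sketch_other_can_read(sketch_event_file_mode(false)));
}
```

Under these assumptions, both changes are needed together: 0755 on the root
only lets other users reach the event directories, and the extra 0004 bit is
what makes an individual format or id file readable once they get there.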



* Re: [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints
  2026-05-15 19:40 ` [PATCH v4 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints Anubhav Shelat
@ 2026-05-15 20:10   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-15 20:10 UTC
  To: Anubhav Shelat; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Uprobes silently lose the `PERF_SAMPLE_IP` bit when `perf` is built without `libtraceevent` support, breaking uprobe profiling for unprivileged users.
--

commit 07ebb257f5c9c09215f15379588d99009f389e9e
Author: Anubhav Shelat <ashelat@redhat.com>

perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints

This commit avoids exposing kernel addresses by dropping the PERF_SAMPLE_IP
bit for unprivileged tracepoints. It makes an exception for uprobes since
they expose safe userspace addresses.

> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c

[ ... ]

> @@ -1509,7 +1509,19 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
>  	attr->write_backward = opts->overwrite ? 1 : 0;
>  	attr->read_format   = PERF_FORMAT_LOST;
>  
> -	evsel__set_sample_bit(evsel, IP);
> +	/*
> +	 * Don't set PERF_SAMPLE_IP for unprivileged kernel tracepoints to
> +	 * avoid exposing kernel addresses. Uprobes expose only userspace
> +	 * addresses so they're safe. Detect entry and return uprobes.
> +	 */
> +	if (attr->type != PERF_TYPE_TRACEPOINT || perf_event_paranoid_check(1)
> +#ifdef HAVE_LIBTRACEEVENT
> +	    || evsel__field(evsel, "__probe_ip")
> +	    || evsel__field(evsel, "__probe_ret_ip")
> +#endif
> +	    )
> +		evsel__set_sample_bit(evsel, IP);

Does this silently drop the PERF_SAMPLE_IP bit for unprivileged uprobes when
perf is built without libtraceevent support?

If HAVE_LIBTRACEEVENT is not defined, the uprobe detection logic is compiled
out, meaning the condition evaluates to false for unprivileged uprobes.

Is there a fallback detection mechanism that could be used here (such as
checking the event or PMU name) to ensure uprobe profiling continues to work
for unprivileged users on these builds?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515194010.93725-2-ashelat@redhat.com?part=1
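
The compile-out effect raised in this review can be reproduced in isolation.
The sketch below is hypothetical throughout: SKETCH_HAVE_LIBTRACEEVENT stands
in for HAVE_LIBTRACEEVENT, and sketch_keeps_ip() with its three boolean
inputs is a simplified stand-in for the patched condition, not the perf
function itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for HAVE_LIBTRACEEVENT. Leave it undefined to model
 * a perf build without libtraceevent; define it to model a build with it.
 */
/* #define SKETCH_HAVE_LIBTRACEEVENT 1 */

bool sketch_keeps_ip(bool is_tracepoint, bool paranoid_ok,
		     bool has_probe_field)
{
	(void)has_probe_field; /* unused when the lookup is compiled out */

	return !is_tracepoint || paranoid_ok
#ifdef SKETCH_HAVE_LIBTRACEEVENT
	       || has_probe_field
#endif
	       ;
}

/* With the macro undefined, a restricted uprobe loses the IP bit. */
void sketch_compile_out_example(void)
{
#ifndef SKETCH_HAVE_LIBTRACEEVENT
	assert(!sketch_keeps_ip(true, false, true));
#else
	assert(sketch_keeps_ip(true, false, true));
#endif
}
```

Because the uprobe test exists only inside the #ifdef, nothing at run time
signals that it was skipped, which is why the review asks for a fallback
detection path.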


* Re: [PATCH v4 3/3] tracefs: make root directory world-traversable
  2026-05-15 19:40 ` [PATCH v4 3/3] tracefs: make root directory world-traversable Anubhav Shelat
@ 2026-05-15 23:16   ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2026-05-15 23:16 UTC
  To: Anubhav Shelat
  Cc: mpetlan, Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Thomas Falcon, linux-kernel, linux-trace-kernel,
	linux-perf-users

On Fri, 15 May 2026 15:40:07 -0400
Anubhav Shelat <ashelat@redhat.com> wrote:

> Change the default tracefs mount mode from 0700 to 0755. This allows
> unprivileged users to access the eventfs directories underneath which
> already use 0755.
> 
> Tracing data files use mode 0440 and 0640 so they are not exposed by
> this change. Only the format and id files, which have been marked as
> world-readable, become accessible.
> 
> Directory listings of kprobes and uprobes, which contain functions or
> binaries, become visible to unprivileged users but do not contain kernel
> addresses. Admins using probes can restore the previous behavior with
> chmod or mount -o mode=700.
> 

I've been thinking about this and I believe a better approach is to
make an eventfs that is mounted at:

 /sys/kernel/events

with the same directory structure as /sys/kernel/tracing/events, but
containing only read-only files like "id" and "format". This directory
would be mounted as 555 and readable by all.

-- Steve

