Re: [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace

From: Peter Zijlstra <peterz@infradead.org>
To: Anubhav Shelat <ashelat@redhat.com>
Cc: mpetlan@redhat.com, Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	James Clark <james.clark@linaro.org>,
	Thomas Falcon <thomas.falcon@intel.com>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace
Date: Mon, 18 May 2026 23:41:16 +0200
Message-ID: <20260518214116.GZ3102624@noisy.programming.kicks-ass.net>
In-Reply-To: <20260515194010.93725-4-ashelat@redhat.com>

On Fri, May 15, 2026 at 03:40:06PM -0400, Anubhav Shelat wrote:
> Allow unprivileged users to trace their own processes' syscalls using
> perf trace, similar to strace without the intrusive overhead of ptrace().
> 
> Currently, perf trace requires CAP_PERFMON or paranoid level ≤ 1 even
> though the kernel has existing infrastructure (TRACE_EVENT_FL_CAP_ANY)
> specifically designed to mark syscall tracepoints as safe for
> unprivileged access. To fix this:
> 
> 1. Loosen the condition in perf_event_open() which requires privileges
>    for all events with exclude_kernel=0. This allows perf_event_open() to
>    bypass the paranoid check for task-attached tracepoint events. Ensure
>    that sample types which can expose kernel addresses to unprivileged
>    users are blocked. Ensure the PERF_SECURITY_KERNEL LSM hook is
>    preserved.
> 
> 2. Make the format and id tracefs files world-readable only for tracepoints
>    with TRACE_EVENT_FL_CAP_ANY, allowing unprivileged users to see syscall
>    tracepoint ids without exposing sensitive information.
> 
> 3. Add a check to perf_trace_event_perm() to block PERF_SAMPLE_IP on
>    kernel tracepoints for unprivileged users to prevent KASLR bypass. We do
>    this here rather than in kaddr_leak because perf_trace_event_perm() can
>    distinguish between kernel tracepoints and uprobe tracepoints, where the
>    IP is a safe user space address and is necessary for uprobe
>    functionality.
> 
> 4. Restrict pure counting events (no PERF_SAMPLE_RAW) to
>    TRACE_EVENT_FL_CAP_ANY tracepoints preventing unprivileged users from
>    counting internal kernel tracepoints while preserving current
>    behavior for exclude_kernel=1 events.

Typically patches are supposed to do a single thing; you're listing 4
things. What gives?

> Example usage after this change:
>   $ perf trace ls          # works as unprivileged user
>   $ perf trace             # system-wide, still requires privileges
>   $ perf trace -p 1234     # requires ptrace permission on pid 1234
> 
> Assisted-by: Claude:claude-sonnet-4.5
> Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
> ---
>  kernel/events/core.c            | 28 +++++++++++++++++++++++++---
>  kernel/trace/trace_event_perf.c | 21 ++++++++++++++++++++-
>  kernel/trace/trace_events.c     | 16 ++++++++++++++--
>  3 files changed, 59 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7935d5663944..ff2d1e9a0b79 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -13873,9 +13873,31 @@ SYSCALL_DEFINE5(perf_event_open,
>  		return err;
>  
>  	if (!attr.exclude_kernel) {
> -		err = perf_allow_kernel();
> -		if (err)
> -			return err;
> +		bool tp_bypass = false;
> +
> +		/* Check unprivileged tracepoints */
> +		if (attr.type == PERF_TYPE_TRACEPOINT && pid != -1) {
> +			/*
> +			 * Block sample types that expose kernel addresses to
> +			 * prevent KASLR bypass
> +			 */
> +			u64 kaddr_leak = PERF_SAMPLE_CALLCHAIN |
> +					 PERF_SAMPLE_BRANCH_STACK |
> +					 PERF_SAMPLE_ADDR |
> +					 PERF_SAMPLE_REGS_INTR;

PERF_SAMPLE_IP should be here too, no?

And I'm not sure if tracepoints can trigger it, but PHYS_ADDR also seems
like something we shouldn't allow.

And we're sure RAW doesn't include pointers?
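
To make the mask question concrete, here is a minimal user-space sketch of
the bypass test with IP and PHYS_ADDR folded into the mask. It is a model,
not kernel code: every SK_* name and sk_tp_bypass() is hypothetical, and
the bit positions are arbitrary placeholders rather than the real
PERF_SAMPLE_* values.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bits for illustration only. The real PERF_SAMPLE_* values
 * are defined in include/uapi/linux/perf_event.h and differ from these.
 */
enum {
	SK_SAMPLE_IP           = 1u << 0,
	SK_SAMPLE_ADDR         = 1u << 1,
	SK_SAMPLE_CALLCHAIN    = 1u << 2,
	SK_SAMPLE_RAW          = 1u << 3,
	SK_SAMPLE_BRANCH_STACK = 1u << 4,
	SK_SAMPLE_REGS_INTR    = 1u << 5,
	SK_SAMPLE_PHYS_ADDR    = 1u << 6,
};

/* Sample types assumed able to carry a kernel or physical address. */
#define SK_KADDR_LEAK (SK_SAMPLE_IP | SK_SAMPLE_ADDR |		\
		       SK_SAMPLE_CALLCHAIN | SK_SAMPLE_BRANCH_STACK |	\
		       SK_SAMPLE_REGS_INTR | SK_SAMPLE_PHYS_ADDR)

/*
 * Returns true when the paranoid check may be skipped: only for a
 * task-attached (pid != -1) tracepoint event that requests none of the
 * address-bearing sample types.
 */
static bool sk_tp_bypass(uint64_t sample_type, bool is_tracepoint, int pid)
{
	if (!is_tracepoint || pid == -1)
		return false;

	return !(sample_type & SK_KADDR_LEAK);
}
```

Note the trade-off: the changelog says IP is handled in
perf_trace_event_perm() precisely so that uprobes can keep it. Putting IP
in this mask, as modelled here, would send uprobe events that sample IP
back through the paranoid check as well.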

> +
> +			tp_bypass = !(attr.sample_type & kaddr_leak);
> +		}
> +
> +		if (!tp_bypass) {
> +			err = perf_allow_kernel();
> +			if (err)
> +				return err;
> +		} else {
> +			err = security_perf_event_open(PERF_SECURITY_KERNEL);
> +			if (err)
> +				return err;
> +		}
>  	}
>  
>  	if (attr.namespaces) {
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..466007ed2869 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -72,9 +72,28 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
>  			return -EINVAL;
>  	}
>  
> +	/*
> +	 * PERF_SAMPLE_IP on kernel tracepoints exposes a kernel text
> +	 * address, weakening KASLR. Block for unprivileged users unless
> +	 * the tracepoint is a uprobe (userspace IP, safe to expose).
> +	 */
> +	if ((p_event->attr.sample_type & PERF_SAMPLE_IP) &&
> +	    !p_event->attr.exclude_kernel &&
> +	    !(tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
> +	    sysctl_perf_event_paranoid > 1 && !perfmon_capable())
> +		return -EACCES;
> +
>  	/* No tracing, just counting, so no obvious leak */
> -	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
> +	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW)) {
> +		/* Prevent unprivileged users from counting kernel tracepoints */
> +		if (!p_event->attr.exclude_kernel &&
> +		    sysctl_perf_event_paranoid > 1 && !perfmon_capable()) {
> +			if (!(p_event->attach_state == PERF_ATTACH_TASK &&
> +			      (tp_event->flags & TRACE_EVENT_FL_CAP_ANY)))
> +				return -EACCES;
> +		}
>  		return 0;
> +	}

Maybe use less AI and try and type this yourself. I think you'll find
that repeating the same clauses over and over gets tiresome. IIRC they
invented something for that in the 60s or so :/
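
For illustration, the repeated "!exclude_kernel && paranoid > 1 &&
!perfmon_capable()" clause could be evaluated once and reused. The sketch
below is an untested user-space model of the two checks quoted above; the
sk_* structs and functions are hypothetical stand-ins, not the kernel's
types, and the boolean fields replace the real flag tests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the quoted hunk looks at. */
struct sk_event {
	bool exclude_kernel;	/* attr.exclude_kernel */
	bool sample_ip;		/* sample_type & PERF_SAMPLE_IP */
	bool sample_raw;	/* sample_type & PERF_SAMPLE_RAW */
	bool attach_task;	/* attach_state == PERF_ATTACH_TASK */
};

struct sk_tp {
	bool is_uprobe;		/* flags & TRACE_EVENT_FL_UPROBE */
	bool cap_any;		/* flags & TRACE_EVENT_FL_CAP_ANY */
};

struct sk_cred {
	int paranoid;		/* models sysctl_perf_event_paranoid */
	bool perfmon;		/* models perfmon_capable() */
};

/* The clause the patch spells out twice, written once. */
static bool sk_unpriv_kernel(const struct sk_event *ev,
			     const struct sk_cred *cred)
{
	return !ev->exclude_kernel && cred->paranoid > 1 && !cred->perfmon;
}

/*
 * Returns -EACCES on denial, 0 for an allowed counting event, and 1 where
 * the real function would carry on with its remaining checks.
 */
static int sk_event_perm(const struct sk_event *ev, const struct sk_tp *tp,
			 const struct sk_cred *cred)
{
	bool unpriv = sk_unpriv_kernel(ev, cred);

	/* Kernel text address in the sample; uprobe IPs are user space. */
	if (unpriv && ev->sample_ip && !tp->is_uprobe)
		return -EACCES;

	/* No tracing, just counting. */
	if (!ev->sample_raw) {
		if (unpriv && !(ev->attach_task && tp->cap_any))
			return -EACCES;
		return 0;
	}

	return 1;
}
```

With the clause named, each check reads as a single condition and the two
sites cannot drift apart.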

>  	/* Some events are ok to be traced by non-root users... */
>  	if (p_event->attach_state == PERF_ATTACH_TASK) {
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index c46e623e7e0d..cbd07e2ec528 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -3050,7 +3050,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
>  	struct trace_event_call *call = file->event_call;
>  
>  	if (strcmp(name, "format") == 0) {
> -		*mode = TRACE_MODE_READ;
> +		/*
> +		 * Make format tracefs file world readable for tracepoints with
> +		 * TRACE_EVENT_FL_CAP_ANY
> +		 */
> +		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
> +			(TRACE_MODE_READ | 0004) :
> +			TRACE_MODE_READ;
>  		*fops = &ftrace_event_format_fops;
>  		return 1;
>  	}
> @@ -3086,7 +3092,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
>  #ifdef CONFIG_PERF_EVENTS
>  	if (call->event.type && call->class->reg &&
>  	    strcmp(name, "id") == 0) {
> -		*mode = TRACE_MODE_READ;
> +		/*
> +		 * Make id tracefs file world readable for tracepoints with
> +		 * TRACE_EVENT_FL_CAP_ANY
> +		 */
> +		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
> +			(TRACE_MODE_READ | 0004) :
> +			TRACE_MODE_READ;
>  		*data = (void *)(long)call->event.type;
>  		*fops = &ftrace_event_id_fops;
>  		return 1;

Again, you're doing the same thing in multiple places. If only there was
something to re-use a previous expression.
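
As a rough sketch of what that re-use might look like, the mode selection
could live in one helper that both the "format" and "id" branches call.
This is a user-space model only: sk_event_file_mode() and the SK_* macros
are hypothetical, and the numeric values are assumed stand-ins for
TRACE_MODE_READ and TRACE_EVENT_FL_CAP_ANY.

```c
/* Assumed stand-ins; the real definitions live in the kernel tree. */
#define SK_TRACE_MODE_READ	0440u		/* models TRACE_MODE_READ */
#define SK_FL_CAP_ANY		(1u << 0)	/* models TRACE_EVENT_FL_CAP_ANY */

/* Minimal model of the one trace_event_call field used here. */
struct sk_call {
	unsigned int flags;
};

/*
 * Mode for the per-event "format" and "id" files: world-readable only
 * when the event is marked as safe for unprivileged use.
 */
static unsigned int sk_event_file_mode(const struct sk_call *call)
{
	unsigned int mode = SK_TRACE_MODE_READ;

	if (call->flags & SK_FL_CAP_ANY)
		mode |= 0004;

	return mode;
}
```

Each branch then reduces to a single assignment from the helper, and the
comment explaining the policy exists in exactly one place.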

None of this gives me warm and fuzzy feelings.
