* [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
@ 2024-06-09 6:33 Sam Sun
2024-06-09 13:04 ` Steven Rostedt
0 siblings, 1 reply; 21+ messages in thread
From: Sam Sun @ 2024-06-09 6:33 UTC (permalink / raw)
To: linux-kernel, x86
Cc: syzkaller-bugs, peterz, jpoimboe, jbaron, rostedt, ardb, tglx,
mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7
Dear developers and maintainers,
We encountered a kernel bug while using our modified
syzkaller. It was tested against the latest upstream kernel.
Kernel crash log is listed below.
```
[ 82.310798][ T8020] ------------[ cut here ]------------
[ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
[ 82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
tainted 6.10.0-rc2-00366-g771ed66105de 3
[ 82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.13.0-1ubuntu1.1 04/04
[ 82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[ 82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[ 82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[ 82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[ 82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[ 82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[ 82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[ 82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[ 82.319116][ T8020] FS: 00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[ 82.319808][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
CR4: 0000000000750ef0
[ 82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[ 82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[ 82.322004][ T8020] PKRU: 55555554
[ 82.322269][ T8020] Call Trace:
[ 82.322515][ T8020] <TASK>
[ 82.322729][ T8020] ? __die_body+0x88/0xe0
[ 82.323043][ T8020] ? die+0xcf/0x110
[ 82.323322][ T8020] ? do_trap+0x155/0x3a0
[ 82.323625][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.324009][ T8020] ? do_error_trap+0x1dc/0x2a0
[ 82.324371][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.324774][ T8020] ? do_int3+0x50/0x50
[ 82.325088][ T8020] ? handle_invalid_op+0x34/0x40
[ 82.325445][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.325815][ T8020] ? exc_invalid_op+0x34/0x50
[ 82.326156][ T8020] ? asm_exc_invalid_op+0x1a/0x20
[ 82.326533][ T8020] ? __wake_up_klogd+0xcc/0x100
[ 82.326879][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.327249][ T8020] ? cr4_update_pce+0xf/0x80
[ 82.327601][ T8020] ? arch_jump_label_transform_queue+0xf0/0xf0
[ 82.328047][ T8020] ? cr4_update_pce+0xf/0x80
[ 82.328380][ T8020] ? cr4_update_pce+0x1e/0x80
[ 82.328717][ T8020] ? cr4_update_pce+0x11/0x80
[ 82.329052][ T8020] ? read_lock_is_recursive+0x20/0x20
[ 82.329466][ T8020] ? __static_key_slow_dec_cpuslocked+0xb0/0x140
[ 82.329966][ T8020] ? text_poke_queue+0x46/0x180
[ 82.330329][ T8020] arch_jump_label_transform_queue+0x63/0xf0
[ 82.330742][ T8020] __jump_label_update+0x18b/0x3b0
[ 82.331101][ T8020] __static_key_slow_dec_cpuslocked+0xe9/0x140
[ 82.331519][ T8020] static_key_slow_dec+0x50/0xa0
[ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
[ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
[ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
[ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
[ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
[ 82.333531][ T8020] vfs_write+0xa8b/0xd00
[ 82.333812][ T8020] ? kernfs_fop_read_iter+0x640/0x640
[ 82.334234][ T8020] ? kernel_write+0x330/0x330
[ 82.334598][ T8020] ? do_sys_openat2+0x16b/0x1c0
[ 82.334949][ T8020] ? do_sys_open+0x230/0x230
[ 82.335310][ T8020] ? __up_read+0x2bb/0x6a0
[ 82.335633][ T8020] ksys_write+0x17b/0x2a0
[ 82.335966][ T8020] ? __ia32_sys_read+0x90/0x90
[ 82.336333][ T8020] ? do_syscall_64+0xa5/0x230
[ 82.336656][ T8020] do_syscall_64+0xe2/0x230
[ 82.336983][ T8020] ? clear_bhb_loop+0x25/0x80
[ 82.337289][ T8020] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 82.337692][ T8020] RIP: 0033:0x7f0460fd2473
[ 82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
ff ff ff ff eb b7 0f 1f 00 64 8b 08
[ 82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
ORIG_RAX: 0000000000000001
[ 82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
RCX: 00007f0460fd2473
[ 82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
RDI: 0000000000000004
[ 82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
R09: 0072736d2f232f75
[ 82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
R12: 00005615dc1dd1c0
[ 82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
R15: 0000000000000000
[ 82.342364][ T8020] </TASK>
[ 82.342554][ T8020] Modules linked in:
[ 82.342875][ T8020] ---[ end trace 0000000000000000 ]---
[ 82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[ 82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[ 82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[ 82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[ 82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[ 82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[ 82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[ 82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[ 82.349173][ T8020] FS: 00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[ 82.349784][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
CR4: 0000000000750ef0
[ 82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[ 82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[ 82.351769][ T8020] PKRU: 55555554
[ 82.352011][ T8020] Kernel panic - not syncing: Fatal exception
[ 82.352665][ T8020] Kernel Offset: disabled
[ 82.352988][ T8020] Rebooting in 86400 seconds..
```
We have a C repro which can trigger this bug in latest upstream kernel
commit, listed below:
```
#define _GNU_SOURCE
#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
static unsigned long long procid;
static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}
static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}
static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}
static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
volatile long a3, volatile long a4)
{
int fd, sysfd;
char buf[1024], sysbuf[1024], input[1024];
char* hash;
char dev_num;
dev_num = 0;
strncpy(buf, (char*)a0, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = 0;
while ((hash = strchr(buf, '#'))) {
*hash = '0' + (char)(a1 % 10);
a1 /= 10;
if (dev_num == 0)
dev_num = *hash;
}
fd = open(buf, a2, 0);
strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
sysbuf[sizeof(sysbuf) - 1] = 0;
if ((hash = strchr(sysbuf, '#'))) {
*hash = dev_num;
while ((hash = strchr(sysbuf, '#'))) {
*hash = '0' + (char)(a1 % 10);
a1 /= 10;
}
}
sysfd = open(sysbuf, O_RDWR, 0);
strncpy(input, (char*)a4, sizeof(input) - 1);
input[sizeof(input) - 1] = 0;
hash = strchr(input, '\0');
sysfd = write(sysfd, input, hash - input + 1);
return fd;
}
static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}
static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
}
static void execute_one(void);
#define WAIT_FLAGS __WALL
static void loop(void)
{
int iter = 0;
for (;; iter++) {
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
setup_test();
execute_one();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5000)
continue;
kill_and_wait(pid, &status);
break;
}
}
}
void execute_one(void)
{
memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
memcpy((void*)0x20000100, "0\000", 2);
syz_mod_dev(
/*dev=*/0x20000000, /*id=*/0x8001,
/*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
memcpy((void*)0x20000180, "1\000", 2);
syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
/*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
/*buf=*/0x20000180);
}
int main(void)
{
syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
/*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
for (procid = 0; procid < 4; procid++) {
if (fork() == 0) {
loop();
}
}
sleep(1000000);
return 0;
}
```
If you have any questions, please contact us.
Reported by Yue Sun <samsun1006219@gmail.com>
Reported by xingwei lee <xrivendell7@gmail.com>
Best Regards,
Yue
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
@ 2024-06-09 13:04 ` Steven Rostedt
2024-06-09 14:06 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 13:04 UTC (permalink / raw)
To: Sam Sun
Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
tglx, mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
Greg Kroah-Hartman, Tejun Heo
On Sun, 9 Jun 2024 14:33:01 +0800
Sam Sun <samsun1006219@gmail.com> wrote:
> Dear developers and maintainers,
>
> We encountered a kernel bug while using our modified
> syzkaller. It was tested against the latest upstream kernel.
> Kernel crash log is listed below.
>
> ```
> [ 82.310798][ T8020] ------------[ cut here ]------------
> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
This is not a bug with jump labels. It's a bug with whatever is using jump
labels. Looks like something tried to modify a jump label that no longer
exists.
> [ 82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [ 82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
> tainted 6.10.0-rc2-00366-g771ed66105de 3
> [ 82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS 1.13.0-1ubuntu1.1 04/04
> [ 82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [ 82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [ 82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [ 82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [ 82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [ 82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [ 82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [ 82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [ 82.319116][ T8020] FS: 00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [ 82.319808][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [ 82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [ 82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [ 82.322004][ T8020] PKRU: 55555554
> [ 82.322269][ T8020] Call Trace:
> [ 82.322515][ T8020] <TASK>
> [ 82.322729][ T8020] ? __die_body+0x88/0xe0
> [ 82.323043][ T8020] ? die+0xcf/0x110
> [ 82.323322][ T8020] ? do_trap+0x155/0x3a0
> [ 82.323625][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.324009][ T8020] ? do_error_trap+0x1dc/0x2a0
> [ 82.324371][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.324774][ T8020] ? do_int3+0x50/0x50
> [ 82.325088][ T8020] ? handle_invalid_op+0x34/0x40
> [ 82.325445][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.325815][ T8020] ? exc_invalid_op+0x34/0x50
> [ 82.326156][ T8020] ? asm_exc_invalid_op+0x1a/0x20
> [ 82.326533][ T8020] ? __wake_up_klogd+0xcc/0x100
> [ 82.326879][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.327249][ T8020] ? cr4_update_pce+0xf/0x80
> [ 82.327601][ T8020] ? arch_jump_label_transform_queue+0xf0/0xf0
> [ 82.328047][ T8020] ? cr4_update_pce+0xf/0x80
> [ 82.328380][ T8020] ? cr4_update_pce+0x1e/0x80
> [ 82.328717][ T8020] ? cr4_update_pce+0x11/0x80
> [ 82.329052][ T8020] ? read_lock_is_recursive+0x20/0x20
> [ 82.329466][ T8020] ? __static_key_slow_dec_cpuslocked+0xb0/0x140
> [ 82.329966][ T8020] ? text_poke_queue+0x46/0x180
> [ 82.330329][ T8020] arch_jump_label_transform_queue+0x63/0xf0
> [ 82.330742][ T8020] __jump_label_update+0x18b/0x3b0
> [ 82.331101][ T8020] __static_key_slow_dec_cpuslocked+0xe9/0x140
> [ 82.331519][ T8020] static_key_slow_dec+0x50/0xa0
> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
So, something in kernfs modified a jump label location that was freed?
-- Steve
> [ 82.333531][ T8020] vfs_write+0xa8b/0xd00
> [ 82.333812][ T8020] ? kernfs_fop_read_iter+0x640/0x640
> [ 82.334234][ T8020] ? kernel_write+0x330/0x330
> [ 82.334598][ T8020] ? do_sys_openat2+0x16b/0x1c0
> [ 82.334949][ T8020] ? do_sys_open+0x230/0x230
> [ 82.335310][ T8020] ? __up_read+0x2bb/0x6a0
> [ 82.335633][ T8020] ksys_write+0x17b/0x2a0
> [ 82.335966][ T8020] ? __ia32_sys_read+0x90/0x90
> [ 82.336333][ T8020] ? do_syscall_64+0xa5/0x230
> [ 82.336656][ T8020] do_syscall_64+0xe2/0x230
> [ 82.336983][ T8020] ? clear_bhb_loop+0x25/0x80
> [ 82.337289][ T8020] entry_SYSCALL_64_after_hwframe+0x67/0x6f
> [ 82.337692][ T8020] RIP: 0033:0x7f0460fd2473
> [ 82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
> ff ff ff ff eb b7 0f 1f 00 64 8b 08
> [ 82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
> ORIG_RAX: 0000000000000001
> [ 82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
> RCX: 00007f0460fd2473
> [ 82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
> RDI: 0000000000000004
> [ 82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
> R09: 0072736d2f232f75
> [ 82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
> R12: 00005615dc1dd1c0
> [ 82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
> R15: 0000000000000000
> [ 82.342364][ T8020] </TASK>
> [ 82.342554][ T8020] Modules linked in:
> [ 82.342875][ T8020] ---[ end trace 0000000000000000 ]---
> [ 82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [ 82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [ 82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [ 82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [ 82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [ 82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [ 82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [ 82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [ 82.349173][ T8020] FS: 00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [ 82.349784][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [ 82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [ 82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [ 82.351769][ T8020] PKRU: 55555554
> [ 82.352011][ T8020] Kernel panic - not syncing: Fatal exception
> [ 82.352665][ T8020] Kernel Offset: disabled
> [ 82.352988][ T8020] Rebooting in 86400 seconds..
> ```
>
> We have a C repro which can trigger this bug in latest upstream kernel
> commit, listed below:
>
> ```
> #define _GNU_SOURCE
>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> static unsigned long long procid;
>
> static void sleep_ms(uint64_t ms)
> {
> usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void)
> {
> struct timespec ts;
> if (clock_gettime(CLOCK_MONOTONIC, &ts))
> exit(1);
> return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static bool write_file(const char* file, const char* what, ...)
> {
> char buf[1024];
> va_list args;
> va_start(args, what);
> vsnprintf(buf, sizeof(buf), what, args);
> va_end(args);
> buf[sizeof(buf) - 1] = 0;
> int len = strlen(buf);
> int fd = open(file, O_WRONLY | O_CLOEXEC);
> if (fd == -1)
> return false;
> if (write(fd, buf, len) != len) {
> int err = errno;
> close(fd);
> errno = err;
> return false;
> }
> close(fd);
> return true;
> }
>
> static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
> volatile long a3, volatile long a4)
> {
> int fd, sysfd;
> char buf[1024], sysbuf[1024], input[1024];
> char* hash;
> char dev_num;
> dev_num = 0;
> strncpy(buf, (char*)a0, sizeof(buf) - 1);
> buf[sizeof(buf) - 1] = 0;
> while ((hash = strchr(buf, '#'))) {
> *hash = '0' + (char)(a1 % 10);
> a1 /= 10;
> if (dev_num == 0)
> dev_num = *hash;
> }
> fd = open(buf, a2, 0);
> strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
> sysbuf[sizeof(sysbuf) - 1] = 0;
> if ((hash = strchr(sysbuf, '#'))) {
> *hash = dev_num;
> while ((hash = strchr(sysbuf, '#'))) {
> *hash = '0' + (char)(a1 % 10);
> a1 /= 10;
> }
> }
> sysfd = open(sysbuf, O_RDWR, 0);
> strncpy(input, (char*)a4, sizeof(input) - 1);
> input[sizeof(input) - 1] = 0;
> hash = strchr(input, '\0');
> sysfd = write(sysfd, input, hash - input + 1);
> return fd;
> }
>
> static void kill_and_wait(int pid, int* status)
> {
> kill(-pid, SIGKILL);
> kill(pid, SIGKILL);
> for (int i = 0; i < 100; i++) {
> if (waitpid(-1, status, WNOHANG | __WALL) == pid)
> return;
> usleep(1000);
> }
> DIR* dir = opendir("/sys/fs/fuse/connections");
> if (dir) {
> for (;;) {
> struct dirent* ent = readdir(dir);
> if (!ent)
> break;
> if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
> continue;
> char abort[300];
> snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
> ent->d_name);
> int fd = open(abort, O_WRONLY);
> if (fd == -1) {
> continue;
> }
> if (write(fd, abort, 1) < 0) {
> }
> close(fd);
> }
> closedir(dir);
> } else {
> }
> while (waitpid(-1, status, __WALL) != pid) {
> }
> }
>
> static void setup_test()
> {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> write_file("/proc/self/oom_score_adj", "1000");
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void)
> {
> int iter = 0;
> for (;; iter++) {
> int pid = fork();
> if (pid < 0)
> exit(1);
> if (pid == 0) {
> setup_test();
> execute_one();
> exit(0);
> }
> int status = 0;
> uint64_t start = current_time_ms();
> for (;;) {
> if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
> break;
> sleep_ms(1);
> if (current_time_ms() - start < 5000)
> continue;
> kill_and_wait(pid, &status);
> break;
> }
> }
> }
>
> void execute_one(void)
> {
> memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
> memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
> memcpy((void*)0x20000100, "0\000", 2);
> syz_mod_dev(
> /*dev=*/0x20000000, /*id=*/0x8001,
> /*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
> 0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
> memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
> memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
> memcpy((void*)0x20000180, "1\000", 2);
> syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
> /*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
> /*buf=*/0x20000180);
> }
> int main(void)
> {
> syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> for (procid = 0; procid < 4; procid++) {
> if (fork() == 0) {
> loop();
> }
> }
> sleep(1000000);
> return 0;
> }
> ```
>
> If you have any questions, please contact us.
>
> Reported by Yue Sun <samsun1006219@gmail.com>
> Reported by xingwei lee <xrivendell7@gmail.com>
>
> Best Regards,
> Yue
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 13:04 ` Steven Rostedt
@ 2024-06-09 14:06 ` Thomas Gleixner
2024-06-09 14:25 ` Steven Rostedt
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 14:06 UTC (permalink / raw)
To: Steven Rostedt, Sam Sun
Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> On Sun, 9 Jun 2024 14:33:01 +0800
> Sam Sun <samsun1006219@gmail.com> wrote:
>> [ 82.310798][ T8020] ------------[ cut here ]------------
>> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
>
> This is not a bug with jump labels. It's a bug with whatever is using jump
> labels. Looks like something tried to modify a jump label that no longer
> exists.
The jump label exists.
>> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
>> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
>> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
>> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
>> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
>
> So, something in kernfs modified a jump label location that was freed?
No. What happens is:
CPU 0 CPU 1
kernfs_fop_write_iter() kernfs_fop_write_iter()
set_attr_rdpmc() set_attr_rdpmc()
arch_jump_label_transform_queue() arch_jump_label_transform_queue()
mutex_lock(text_mutex) mutex_lock(text_mutex)
__jump_label_patch()
text_poke_queue()
mutex_unlokc(text_mutex)
__jump_label_patch()
CPU 1 sees the original text and not the expected because CPU 0 did not
yet invoke arch_jump_label_transform_apply().
So clearly set_attr_rdpmc() lacks serialization, no?
Thanks,
tglx
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 14:06 ` Thomas Gleixner
@ 2024-06-09 14:25 ` Steven Rostedt
2024-06-09 16:02 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 14:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, 09 Jun 2024 16:06:05 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> > On Sun, 9 Jun 2024 14:33:01 +0800
> > Sam Sun <samsun1006219@gmail.com> wrote:
> >> [ 82.310798][ T8020] ------------[ cut here ]------------
> >> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
> >
> > This is not a bug with jump labels. It's a bug with whatever is using jump
> > labels. Looks like something tried to modify a jump label that no longer
> > exists.
>
> The jump label exists.
Ah, I missed the set_attr_rdpmc() as something not with a "?" in front :-p
>
> >> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
> >> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
> >> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
> >> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
> >> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
> >
> > So, something in kernfs modified a jump label location that was freed?
>
> No. What happens is:
>
> CPU 0 CPU 1
>
> kernfs_fop_write_iter() kernfs_fop_write_iter()
> set_attr_rdpmc() set_attr_rdpmc()
> arch_jump_label_transform_queue() arch_jump_label_transform_queue()
> mutex_lock(text_mutex) mutex_lock(text_mutex)
> __jump_label_patch()
> text_poke_queue()
> mutex_unlokc(text_mutex)
> __jump_label_patch()
>
> CPU 1 sees the original text and not the expected because CPU 0 did not
> yet invoke arch_jump_label_transform_apply().
>
> So clearly set_attr_rdpmc() lacks serialization, no?
>
Hmm, but should jump labels fail when that happens? Or should it catch
it, and not cause a BUG?
-- Steve
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 14:25 ` Steven Rostedt
@ 2024-06-09 16:02 ` Thomas Gleixner
2024-06-09 16:56 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:02 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> On Sun, 09 Jun 2024 16:06:05 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> CPU 0 CPU 1
>>
>> kernfs_fop_write_iter() kernfs_fop_write_iter()
>> set_attr_rdpmc() set_attr_rdpmc()
>> arch_jump_label_transform_queue() arch_jump_label_transform_queue()
>> mutex_lock(text_mutex) mutex_lock(text_mutex)
>> __jump_label_patch()
>> text_poke_queue()
>> mutex_unlokc(text_mutex)
>> __jump_label_patch()
>>
>> CPU 1 sees the original text and not the expected because CPU 0 did not
>> yet invoke arch_jump_label_transform_apply().
>>
>> So clearly set_attr_rdpmc() lacks serialization, no?
>>
> Hmm, but should jump labels fail when that happens? Or should it catch
> it, and not cause a BUG?
Well the bug is there to detect inconsistency and that clearly works :)
But I clearly can't read, because the jump label operations are
serialized via jump_label_mutex. Hrm...
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:02 ` Thomas Gleixner
@ 2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
0 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> Well the bug is there to detect inconsistency and that clearly works :)
>
> But I clearly can't read, because the jump label operations are
> serialized via jump_label_mutex. Hrm...
Ok. Now I found if for real. It's in the jump label core:
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
static_key_slow_try_dec() is buggy. It needs similar logic as
static_key_slow_try_inc() to work correctly.
It's not only the 0, key->enabled can be -1 when the other CPU is in the
slow path of enabling it.
I'll send a patch after testing it.
Thanks,
tglx
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:56 ` Thomas Gleixner
@ 2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 19:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 18:56, Thomas Gleixner wrote:
> On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
>> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
>> Well the bug is there to detect inconsistency and that clearly works :)
>>
>> But I clearly can't read, because the jump label operations are
>> serialized via jump_label_mutex. Hrm...
>
> Ok. Now I found if for real. It's in the jump label core:
>
> CPU0 CPU1
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 1
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
> if (val == 1)
> return false;
>
> jump_label_lock();
> if (atomic_dec_and_test(&key->enabled)) {
> --> key->enabled == 0
> __jump_label_update()
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 0
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>
> --> key->enabled == -1 <- FAIL
>
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
>
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.
>
> I'll send a patch after testing it.
That's fixable, but it does not cure all of it.
set_attr_rdpmc() really needs serialization otherwise it can end up with
unbalanced operations.
CPU0 CPU1
x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL ---> static_branch_dec(&rdpmc_never_available_key);
No?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
@ 2024-06-10 6:46 ` Peter Zijlstra
2024-06-10 10:34 ` Thomas Gleixner
1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10 6:46 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:
> Ok. Now I found if for real. It's in the jump label core:
>
> CPU0 CPU1
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 1
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
> if (val == 1)
> return false;
>
> jump_label_lock();
> if (atomic_dec_and_test(&key->enabled)) {
> --> key->enabled == 0
> __jump_label_update()
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 0
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>
> --> key->enabled == -1 <- FAIL
>
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
>
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.
Well, the -1 thing is in the 0->1 path, that is, the very first enabler.
That *should* not race with a disabler. If it does, there is external
confusion. (As I think the follow up email shows..)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-10 6:46 ` Peter Zijlstra
@ 2024-06-10 10:34 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 10:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Mon, Jun 10 2024 at 08:46, Peter Zijlstra wrote:
> On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:
>
>> Ok. Now I found if for real. It's in the jump label core:
>>
>> CPU0 CPU1
>>
>> static_key_slow_dec()
>> static_key_slow_try_dec()
>>
>> key->enabled == 1
>> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>> if (val == 1)
>> return false;
>>
>> jump_label_lock();
>> if (atomic_dec_and_test(&key->enabled)) {
>> --> key->enabled == 0
>> __jump_label_update()
>>
>> static_key_slow_dec()
>> static_key_slow_try_dec()
>>
>> key->enabled == 0
>> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>>
>> --> key->enabled == -1 <- FAIL
>>
>> static_key_slow_try_dec() is buggy. It needs similar logic as
>> static_key_slow_try_inc() to work correctly.
>>
>> It's not only the 0, key->enabled can be -1 when the other CPU is in the
>> slow path of enabling it.
>
> Well, the -1 thing is in the 0->1 path, that is, the very first enabler.
>
> That *should* not race with a disabler. If it does, there is external
> confusion. (As I think the follow up email shows..)
Right, but all of this is too fragile. Let me send out those patches.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 0/4] perf/x86, jump_label: Cure serialization issues
2024-06-10 10:34 ` Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Yue and Xingwei reported a syzkaller crash related to the rdpmc attribute
file and jump labels. It turns out to be lack of serialization in the
attribute write path and non-robustness on the jump label side.
The following series addresses these issues.
Thanks,
tglx
---
arch/x86/events/core.c | 3 ++
kernel/jump_label.c | 67 ++++++++++++++++++++++++++++---------------------
2 files changed, 42 insertions(+), 28 deletions(-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 1/4] perf/x86: Serialize set_attr_rdpmc()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():
CPU0 CPU1
Assume: x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key);
The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().
Prevent this by serializing set_attr_rdpmc() locally.
Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
---
arch/x86/events/core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct dev
struct device_attribute *attr,
const char *buf, size_t count)
{
+ static DEFINE_MUTEX(rdpmc_mutex);
unsigned long val;
ssize_t ret;
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct dev
if (x86_pmu.attr_rdpmc_broken)
return -ENOTSUPP;
+ guard(mutex)(&rdpmc_mutex);
+
if (val != x86_pmu.attr_rdpmc) {
/*
* Changing into or out of never available or always available,
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
will succeed and decrement to -2, which is invalid.
Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().
Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(st
STATIC_KEY_CHECK_USE(key);
/*
* Negative key->enabled has a special meaning: it sends
- * static_key_slow_inc() down the slow path, and it is non-zero
+ * static_key_slow_inc/dec() down the slow path, and it is non-zero
* so it counts as "enabled" in jump_label_update(). Note that
* atomic_inc_unless_negative() checks >= 0, so roll our own.
*/
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(stru
lockdep_assert_cpus_held();
/*
- * Careful if we get concurrent static_key_slow_inc() calls;
+ * Careful if we get concurrent static_key_slow_inc/dec() calls;
* later calls must wait for the first one to _finish_ the
* jump_label_update() process. At the same time, however,
* the jump_label_update() call below wants to see
@@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
static bool static_key_slow_try_dec(struct static_key *key)
{
- int val;
-
- val = atomic_fetch_add_unless(&key->enabled, -1, 1);
- if (val == 1)
- return false;
+ int v;
/*
- * The negative count check is valid even when a negative
- * key->enabled is in use by static_key_slow_inc(); a
- * __static_key_slow_dec() before the first static_key_slow_inc()
- * returns is unbalanced, because all other static_key_slow_inc()
- * instances block while the update is in progress.
+ * Go into the slow path if key::enabled is less than or equal than
+ * one. One is valid to shut down the key, anything less than one
+ * is an imbalance, which is handled at the call site.
+ *
+ * That includes the special case of '-1' which is set in
+ * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+ * fully serialized in the slow path below. By the time this task
+ * acquires the jump label lock the value is back to one and the
+ * retry under the lock must succeed.
*/
- WARN(val < 0, "jump label: negative count!\n");
+ v = atomic_read(&key->enabled);
+ do {
+ if (v <= 1)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
return true;
}
@@ -271,10 +276,11 @@ static void __static_key_slow_dec_cpuslo
if (static_key_slow_try_dec(key))
return;
- jump_label_lock();
- if (atomic_dec_and_test(&key->enabled))
+ guard(mutex)(&jump_label_mutex);
+ if (atomic_cmpxchg(&key->enabled, 1, 0))
jump_label_update(key);
- jump_label_unlock();
+ else
+ WARN_ON_ONCE(!static_key_slow_try_dec(key));
}
static void __static_key_slow_dec(struct static_key *key)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
The second part of
if (v <= 0 || (v + 1) < 0)
is not immediately obvious that it acts as overflow protection.
Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(st
/*
* Negative key->enabled has a special meaning: it sends
* static_key_slow_inc/dec() down the slow path, and it is non-zero
- * so it counts as "enabled" in jump_label_update(). Note that
- * atomic_inc_unless_negative() checks >= 0, so roll our own.
+ * so it counts as "enabled" in jump_label_update().
+ *
+ * The INT_MAX overflow condition is either used by the networking
+ * code to reset or detected in the slow path of
+ * static_key_slow_inc_cpuslocked().
*/
v = atomic_read(&key->enabled);
do {
- if (v <= 0 || (v + 1) < 0)
+ if (v <= 0 || v == INT_MAX)
return false;
} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
` (2 preceding siblings ...)
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Make the code more obvious and add proper comments to avoid future head
scratching.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
if (static_key_fast_inc_not_disabled(key))
return true;
- jump_label_lock();
- if (atomic_read(&key->enabled) == 0) {
- atomic_set(&key->enabled, -1);
+ guard(mutex)(&jump_label_mutex);
+ /* Try to mark it as 'enabling in progress. */
+ if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
jump_label_update(key);
/*
- * Ensure that if the above cmpxchg loop observes our positive
- * value, it must also observe all the text changes.
+ * Ensure that when static_key_fast_inc_not_disabled() or
+ * static_key_slow_try_dec() observe the positive value,
+ * they must also observe all the text changes.
*/
atomic_set_release(&key->enabled, 1);
} else {
- if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
- jump_label_unlock();
+ /*
+ * While holding the mutex this should never observe
+ * anything else than a value >= 1 and succeed
+ */
+ if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
return false;
- }
}
- jump_label_unlock();
return true;
}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 17:57 ` Peter Zijlstra
2024-06-10 18:00 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10 17:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:
> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>
> static bool static_key_slow_try_dec(struct static_key *key)
> {
> + int v;
>
> /*
> + * Go into the slow path if key::enabled is less than or equal than
> + * one. One is valid to shut down the key, anything less than one
> + * is an imbalance, which is handled at the call site.
> + *
> + * That includes the special case of '-1' which is set in
> + * static_key_slow_inc_cpuslocked(), but that's harmless as it is
> + * fully serialized in the slow path below. By the time this task
> + * acquires the jump label lock the value is back to one and the
> + * retry under the lock must succeed.
Harmless yes, but it really should not happen to begin with. If this
happens it means someone wants to disable a key that is in the middle of
getting enabled for the first time.
I'm tempted to want a WARN here instead. Hmm?
> */
> + v = atomic_read(&key->enabled);
> + do {
> + if (v <= 1)
> + return false;
> + } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
> +
> return true;
> }
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 17:57 ` Peter Zijlstra
@ 2024-06-10 18:00 ` Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 18:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
On Mon, Jun 10 2024 at 19:57, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:
>
>> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>>
>> static bool static_key_slow_try_dec(struct static_key *key)
>> {
>> + int v;
>>
>> /*
>> + * Go into the slow path if key::enabled is less than or equal than
>> + * one. One is valid to shut down the key, anything less than one
>> + * is an imbalance, which is handled at the call site.
>> + *
>> + * That includes the special case of '-1' which is set in
>> + * static_key_slow_inc_cpuslocked(), but that's harmless as it is
>> + * fully serialized in the slow path below. By the time this task
>> + * acquires the jump label lock the value is back to one and the
>> + * retry under the lock must succeed.
>
> Harmless yes, but it really should not happen to begin with. If this
> happens it means someone wants to disable a key that is in the middle of
> getting enabled for the first time.
>
> I'm tempted to want a WARN here instead. Hmm?
No strong opinion
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
@ 2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: Uros Bizjak @ 2024-06-12 13:57 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
> Make the code more obvious and add proper comments to avoid future head
> scratching.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> kernel/jump_label.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
> if (static_key_fast_inc_not_disabled(key))
> return true;
>
> - jump_label_lock();
> - if (atomic_read(&key->enabled) == 0) {
> - atomic_set(&key->enabled, -1);
> + guard(mutex)(&jump_label_mutex);
> + /* Try to mark it as 'enabling in progress. */
Missing closing quotation mark above.
> + if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
This can be:
int tmp = 0;
if (atomic_try_cmpxchg(&key->enabled, &tmp, -1)) {
...
and will result in more optimal code and will IMO also be more readable
because it is clear that the code is executed when cmpxchg succeeds.
(BTW: The atomic_read()/atomic_set() pair can also be converted to
cmpxchg in static_key_enable_cpuslocked().)
Uros.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Gitweb: https://git.kernel.org/tip/9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:39 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:24 +02:00
jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
Make the code more obvious and add proper comments to avoid future head
scratching.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.548322963@linutronix.de
---
kernel/jump_label.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4d06ec2..4ad5ed8 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
if (static_key_fast_inc_not_disabled(key))
return true;
- jump_label_lock();
- if (atomic_read(&key->enabled) == 0) {
- atomic_set(&key->enabled, -1);
+ guard(mutex)(&jump_label_mutex);
+ /* Try to mark it as 'enabling in progress. */
+ if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
jump_label_update(key);
/*
- * Ensure that if the above cmpxchg loop observes our positive
- * value, it must also observe all the text changes.
+ * Ensure that when static_key_fast_inc_not_disabled() or
+ * static_key_slow_try_dec() observe the positive value,
+ * they must also observe all the text changes.
*/
atomic_set_release(&key->enabled, 1);
} else {
- if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
- jump_label_unlock();
+ /*
+ * While holding the mutex this should never observe
+ * anything else than a value >= 1 and succeed
+ */
+ if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
return false;
- }
}
- jump_label_unlock();
return true;
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 695ef796467ed228b60f1915995e390aea3d85c6
Gitweb: https://git.kernel.org/tip/695ef796467ed228b60f1915995e390aea3d85c6
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:37 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00
jump_label: Clarify condition in static_key_fast_inc_not_disabled()
The second part of
if (v <= 0 || (v + 1) < 0)
is not immediately obvious that it acts as overflow protection.
Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.484973160@linutronix.de
---
kernel/jump_label.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 1f05a19..4d06ec2 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
/*
* Negative key->enabled has a special meaning: it sends
* static_key_slow_inc/dec() down the slow path, and it is non-zero
- * so it counts as "enabled" in jump_label_update(). Note that
- * atomic_inc_unless_negative() checks >= 0, so roll our own.
+ * so it counts as "enabled" in jump_label_update().
+ *
+ * The INT_MAX overflow condition is either used by the networking
+ * code to reset or detected in the slow path of
+ * static_key_slow_inc_cpuslocked().
*/
v = atomic_read(&key->enabled);
do {
- if (v <= 0 || (v + 1) < 0)
+ if (v <= 0 || v == INT_MAX)
return false;
} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] perf/x86: Serialize set_attr_rdpmc()
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Gitweb: https://git.kernel.org/tip/bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:35 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:22 +02:00
perf/x86: Serialize set_attr_rdpmc()
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():
CPU0 CPU1
Assume: x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key);
The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().
Prevent this by serializing set_attr_rdpmc() locally.
Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
---
arch/x86/events/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07..acd367c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
struct device_attribute *attr,
const char *buf, size_t count)
{
+ static DEFINE_MUTEX(rdpmc_mutex);
unsigned long val;
ssize_t ret;
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
if (x86_pmu.attr_rdpmc_broken)
return -ENOTSUPP;
+ guard(mutex)(&rdpmc_mutex);
+
if (val != x86_pmu.attr_rdpmc) {
/*
* Changing into or out of never available or always available,
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Gitweb: https://git.kernel.org/tip/83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:36 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00
jump_label: Fix concurrency issues in static_key_slow_dec()
The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
will succeed and decrement to -2, which is invalid.
Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().
[peterz: add WARN_ON_ONCE for the -1 race]
Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.422897838@linutronix.de
---
kernel/jump_label.c | 45 ++++++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3218fa5..1f05a19 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
STATIC_KEY_CHECK_USE(key);
/*
* Negative key->enabled has a special meaning: it sends
- * static_key_slow_inc() down the slow path, and it is non-zero
+ * static_key_slow_inc/dec() down the slow path, and it is non-zero
* so it counts as "enabled" in jump_label_update(). Note that
* atomic_inc_unless_negative() checks >= 0, so roll our own.
*/
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
lockdep_assert_cpus_held();
/*
- * Careful if we get concurrent static_key_slow_inc() calls;
+ * Careful if we get concurrent static_key_slow_inc/dec() calls;
* later calls must wait for the first one to _finish_ the
* jump_label_update() process. At the same time, however,
* the jump_label_update() call below wants to see
@@ -247,20 +247,32 @@ EXPORT_SYMBOL_GPL(static_key_disable);
static bool static_key_slow_try_dec(struct static_key *key)
{
- int val;
-
- val = atomic_fetch_add_unless(&key->enabled, -1, 1);
- if (val == 1)
- return false;
+ int v;
/*
- * The negative count check is valid even when a negative
- * key->enabled is in use by static_key_slow_inc(); a
- * __static_key_slow_dec() before the first static_key_slow_inc()
- * returns is unbalanced, because all other static_key_slow_inc()
- * instances block while the update is in progress.
+ * Go into the slow path if key::enabled is less than or equal than
+ * one. One is valid to shut down the key, anything less than one
+ * is an imbalance, which is handled at the call site.
+ *
+ * That includes the special case of '-1' which is set in
+ * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+ * fully serialized in the slow path below. By the time this task
+ * acquires the jump label lock the value is back to one and the
+ * retry under the lock must succeed.
*/
- WARN(val < 0, "jump label: negative count!\n");
+ v = atomic_read(&key->enabled);
+ do {
+ /*
+ * Warn about the '-1' case though; since that means a
+ * decrement is concurrent with a first (0->1) increment. IOW
+ * people are trying to disable something that wasn't yet fully
+ * enabled. This suggests an ordering problem on the user side.
+ */
+ WARN_ON_ONCE(v < 0);
+ if (v <= 1)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
return true;
}
@@ -271,10 +283,11 @@ static void __static_key_slow_dec_cpuslocked(struct static_key *key)
if (static_key_slow_try_dec(key))
return;
- jump_label_lock();
- if (atomic_dec_and_test(&key->enabled))
+ guard(mutex)(&jump_label_mutex);
+ if (atomic_cmpxchg(&key->enabled, 1, 0))
jump_label_update(key);
- jump_label_unlock();
+ else
+ WARN_ON_ONCE(!static_key_slow_try_dec(key));
}
static void __static_key_slow_dec(struct static_key *key)
^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-06-17 15:47 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-09 6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
2024-06-09 13:04 ` Steven Rostedt
2024-06-09 14:06 ` Thomas Gleixner
2024-06-09 14:25 ` Steven Rostedt
2024-06-09 16:02 ` Thomas Gleixner
2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
2024-06-10 10:34 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
2024-06-10 18:00 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.