* [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
@ 2024-06-09 6:33 Sam Sun
2024-06-09 13:04 ` Steven Rostedt
0 siblings, 1 reply; 21+ messages in thread
From: Sam Sun @ 2024-06-09 6:33 UTC (permalink / raw)
To: linux-kernel, x86
Cc: syzkaller-bugs, peterz, jpoimboe, jbaron, rostedt, ardb, tglx,
mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7
Dear developers and maintainers,
We encountered a kernel bug while using our modified
syzkaller. It was tested against the latest upstream kernel.
Kernel crash log is listed below.
```
[ 82.310798][ T8020] ------------[ cut here ]------------
[ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
[ 82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
tainted 6.10.0-rc2-00366-g771ed66105de 3
[ 82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.13.0-1ubuntu1.1 04/04
[ 82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[ 82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[ 82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[ 82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[ 82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[ 82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[ 82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[ 82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[ 82.319116][ T8020] FS: 00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[ 82.319808][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
CR4: 0000000000750ef0
[ 82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[ 82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[ 82.322004][ T8020] PKRU: 55555554
[ 82.322269][ T8020] Call Trace:
[ 82.322515][ T8020] <TASK>
[ 82.322729][ T8020] ? __die_body+0x88/0xe0
[ 82.323043][ T8020] ? die+0xcf/0x110
[ 82.323322][ T8020] ? do_trap+0x155/0x3a0
[ 82.323625][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.324009][ T8020] ? do_error_trap+0x1dc/0x2a0
[ 82.324371][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.324774][ T8020] ? do_int3+0x50/0x50
[ 82.325088][ T8020] ? handle_invalid_op+0x34/0x40
[ 82.325445][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.325815][ T8020] ? exc_invalid_op+0x34/0x50
[ 82.326156][ T8020] ? asm_exc_invalid_op+0x1a/0x20
[ 82.326533][ T8020] ? __wake_up_klogd+0xcc/0x100
[ 82.326879][ T8020] ? __jump_label_patch+0x456/0x480
[ 82.327249][ T8020] ? cr4_update_pce+0xf/0x80
[ 82.327601][ T8020] ? arch_jump_label_transform_queue+0xf0/0xf0
[ 82.328047][ T8020] ? cr4_update_pce+0xf/0x80
[ 82.328380][ T8020] ? cr4_update_pce+0x1e/0x80
[ 82.328717][ T8020] ? cr4_update_pce+0x11/0x80
[ 82.329052][ T8020] ? read_lock_is_recursive+0x20/0x20
[ 82.329466][ T8020] ? __static_key_slow_dec_cpuslocked+0xb0/0x140
[ 82.329966][ T8020] ? text_poke_queue+0x46/0x180
[ 82.330329][ T8020] arch_jump_label_transform_queue+0x63/0xf0
[ 82.330742][ T8020] __jump_label_update+0x18b/0x3b0
[ 82.331101][ T8020] __static_key_slow_dec_cpuslocked+0xe9/0x140
[ 82.331519][ T8020] static_key_slow_dec+0x50/0xa0
[ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
[ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
[ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
[ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
[ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
[ 82.333531][ T8020] vfs_write+0xa8b/0xd00
[ 82.333812][ T8020] ? kernfs_fop_read_iter+0x640/0x640
[ 82.334234][ T8020] ? kernel_write+0x330/0x330
[ 82.334598][ T8020] ? do_sys_openat2+0x16b/0x1c0
[ 82.334949][ T8020] ? do_sys_open+0x230/0x230
[ 82.335310][ T8020] ? __up_read+0x2bb/0x6a0
[ 82.335633][ T8020] ksys_write+0x17b/0x2a0
[ 82.335966][ T8020] ? __ia32_sys_read+0x90/0x90
[ 82.336333][ T8020] ? do_syscall_64+0xa5/0x230
[ 82.336656][ T8020] do_syscall_64+0xe2/0x230
[ 82.336983][ T8020] ? clear_bhb_loop+0x25/0x80
[ 82.337289][ T8020] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 82.337692][ T8020] RIP: 0033:0x7f0460fd2473
[ 82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
ff ff ff ff eb b7 0f 1f 00 64 8b 08
[ 82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
ORIG_RAX: 0000000000000001
[ 82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
RCX: 00007f0460fd2473
[ 82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
RDI: 0000000000000004
[ 82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
R09: 0072736d2f232f75
[ 82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
R12: 00005615dc1dd1c0
[ 82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
R15: 0000000000000000
[ 82.342364][ T8020] </TASK>
[ 82.342554][ T8020] Modules linked in:
[ 82.342875][ T8020] ---[ end trace 0000000000000000 ]---
[ 82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[ 82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[ 82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[ 82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[ 82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[ 82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[ 82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[ 82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[ 82.349173][ T8020] FS: 00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[ 82.349784][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
CR4: 0000000000750ef0
[ 82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[ 82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[ 82.351769][ T8020] PKRU: 55555554
[ 82.352011][ T8020] Kernel panic - not syncing: Fatal exception
[ 82.352665][ T8020] Kernel Offset: disabled
[ 82.352988][ T8020] Rebooting in 86400 seconds..
```
We have a C repro which can trigger this bug in latest upstream kernel
commit, listed below:
```
#define _GNU_SOURCE
#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
static unsigned long long procid;
static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}
static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}
static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}
static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
volatile long a3, volatile long a4)
{
int fd, sysfd;
char buf[1024], sysbuf[1024], input[1024];
char* hash;
char dev_num;
dev_num = 0;
strncpy(buf, (char*)a0, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = 0;
while ((hash = strchr(buf, '#'))) {
*hash = '0' + (char)(a1 % 10);
a1 /= 10;
if (dev_num == 0)
dev_num = *hash;
}
fd = open(buf, a2, 0);
strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
sysbuf[sizeof(sysbuf) - 1] = 0;
if ((hash = strchr(sysbuf, '#'))) {
*hash = dev_num;
while ((hash = strchr(sysbuf, '#'))) {
*hash = '0' + (char)(a1 % 10);
a1 /= 10;
}
}
sysfd = open(sysbuf, O_RDWR, 0);
strncpy(input, (char*)a4, sizeof(input) - 1);
input[sizeof(input) - 1] = 0;
hash = strchr(input, '\0');
sysfd = write(sysfd, input, hash - input + 1);
return fd;
}
static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}
static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
}
static void execute_one(void);
#define WAIT_FLAGS __WALL
static void loop(void)
{
int iter = 0;
for (;; iter++) {
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
setup_test();
execute_one();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5000)
continue;
kill_and_wait(pid, &status);
break;
}
}
}
void execute_one(void)
{
memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
memcpy((void*)0x20000100, "0\000", 2);
syz_mod_dev(
/*dev=*/0x20000000, /*id=*/0x8001,
/*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
memcpy((void*)0x20000180, "1\000", 2);
syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
/*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
/*buf=*/0x20000180);
}
int main(void)
{
syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
/*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
/*offset=*/0ul);
for (procid = 0; procid < 4; procid++) {
if (fork() == 0) {
loop();
}
}
sleep(1000000);
return 0;
}
```
If you have any questions, please contact us.
Reported by Yue Sun <samsun1006219@gmail.com>
Reported by xingwei lee <xrivendell7@gmail.com>
Best Regards,
Yue
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
@ 2024-06-09 13:04 ` Steven Rostedt
2024-06-09 14:06 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 13:04 UTC (permalink / raw)
To: Sam Sun
Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
tglx, mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
Greg Kroah-Hartman, Tejun Heo
On Sun, 9 Jun 2024 14:33:01 +0800
Sam Sun <samsun1006219@gmail.com> wrote:
> Dear developers and maintainers,
>
> We encountered a kernel bug while using our modified
> syzkaller. It was tested against the latest upstream kernel.
> Kernel crash log is listed below.
>
> ```
> [ 82.310798][ T8020] ------------[ cut here ]------------
> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
This is not a bug with jump labels. It's a bug with whatever is using jump
labels. Looks like something tried to modify a jump label that no longer
exists.
> [ 82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [ 82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
> tainted 6.10.0-rc2-00366-g771ed66105de 3
> [ 82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS 1.13.0-1ubuntu1.1 04/04
> [ 82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [ 82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [ 82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [ 82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [ 82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [ 82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [ 82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [ 82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [ 82.319116][ T8020] FS: 00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [ 82.319808][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [ 82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [ 82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [ 82.322004][ T8020] PKRU: 55555554
> [ 82.322269][ T8020] Call Trace:
> [ 82.322515][ T8020] <TASK>
> [ 82.322729][ T8020] ? __die_body+0x88/0xe0
> [ 82.323043][ T8020] ? die+0xcf/0x110
> [ 82.323322][ T8020] ? do_trap+0x155/0x3a0
> [ 82.323625][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.324009][ T8020] ? do_error_trap+0x1dc/0x2a0
> [ 82.324371][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.324774][ T8020] ? do_int3+0x50/0x50
> [ 82.325088][ T8020] ? handle_invalid_op+0x34/0x40
> [ 82.325445][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.325815][ T8020] ? exc_invalid_op+0x34/0x50
> [ 82.326156][ T8020] ? asm_exc_invalid_op+0x1a/0x20
> [ 82.326533][ T8020] ? __wake_up_klogd+0xcc/0x100
> [ 82.326879][ T8020] ? __jump_label_patch+0x456/0x480
> [ 82.327249][ T8020] ? cr4_update_pce+0xf/0x80
> [ 82.327601][ T8020] ? arch_jump_label_transform_queue+0xf0/0xf0
> [ 82.328047][ T8020] ? cr4_update_pce+0xf/0x80
> [ 82.328380][ T8020] ? cr4_update_pce+0x1e/0x80
> [ 82.328717][ T8020] ? cr4_update_pce+0x11/0x80
> [ 82.329052][ T8020] ? read_lock_is_recursive+0x20/0x20
> [ 82.329466][ T8020] ? __static_key_slow_dec_cpuslocked+0xb0/0x140
> [ 82.329966][ T8020] ? text_poke_queue+0x46/0x180
> [ 82.330329][ T8020] arch_jump_label_transform_queue+0x63/0xf0
> [ 82.330742][ T8020] __jump_label_update+0x18b/0x3b0
> [ 82.331101][ T8020] __static_key_slow_dec_cpuslocked+0xe9/0x140
> [ 82.331519][ T8020] static_key_slow_dec+0x50/0xa0
> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
So, something in kernfs modified a jump label location that was freed?
-- Steve
> [ 82.333531][ T8020] vfs_write+0xa8b/0xd00
> [ 82.333812][ T8020] ? kernfs_fop_read_iter+0x640/0x640
> [ 82.334234][ T8020] ? kernel_write+0x330/0x330
> [ 82.334598][ T8020] ? do_sys_openat2+0x16b/0x1c0
> [ 82.334949][ T8020] ? do_sys_open+0x230/0x230
> [ 82.335310][ T8020] ? __up_read+0x2bb/0x6a0
> [ 82.335633][ T8020] ksys_write+0x17b/0x2a0
> [ 82.335966][ T8020] ? __ia32_sys_read+0x90/0x90
> [ 82.336333][ T8020] ? do_syscall_64+0xa5/0x230
> [ 82.336656][ T8020] do_syscall_64+0xe2/0x230
> [ 82.336983][ T8020] ? clear_bhb_loop+0x25/0x80
> [ 82.337289][ T8020] entry_SYSCALL_64_after_hwframe+0x67/0x6f
> [ 82.337692][ T8020] RIP: 0033:0x7f0460fd2473
> [ 82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
> ff ff ff ff eb b7 0f 1f 00 64 8b 08
> [ 82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
> ORIG_RAX: 0000000000000001
> [ 82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
> RCX: 00007f0460fd2473
> [ 82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
> RDI: 0000000000000004
> [ 82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
> R09: 0072736d2f232f75
> [ 82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
> R12: 00005615dc1dd1c0
> [ 82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
> R15: 0000000000000000
> [ 82.342364][ T8020] </TASK>
> [ 82.342554][ T8020] Modules linked in:
> [ 82.342875][ T8020] ---[ end trace 0000000000000000 ]---
> [ 82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [ 82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [ 82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [ 82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [ 82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [ 82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [ 82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [ 82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [ 82.349173][ T8020] FS: 00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [ 82.349784][ T8020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [ 82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [ 82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [ 82.351769][ T8020] PKRU: 55555554
> [ 82.352011][ T8020] Kernel panic - not syncing: Fatal exception
> [ 82.352665][ T8020] Kernel Offset: disabled
> [ 82.352988][ T8020] Rebooting in 86400 seconds..
> ```
>
> We have a C repro which can trigger this bug in latest upstream kernel
> commit, listed below:
>
> ```
> #define _GNU_SOURCE
>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> static unsigned long long procid;
>
> static void sleep_ms(uint64_t ms)
> {
> usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void)
> {
> struct timespec ts;
> if (clock_gettime(CLOCK_MONOTONIC, &ts))
> exit(1);
> return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static bool write_file(const char* file, const char* what, ...)
> {
> char buf[1024];
> va_list args;
> va_start(args, what);
> vsnprintf(buf, sizeof(buf), what, args);
> va_end(args);
> buf[sizeof(buf) - 1] = 0;
> int len = strlen(buf);
> int fd = open(file, O_WRONLY | O_CLOEXEC);
> if (fd == -1)
> return false;
> if (write(fd, buf, len) != len) {
> int err = errno;
> close(fd);
> errno = err;
> return false;
> }
> close(fd);
> return true;
> }
>
> static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
> volatile long a3, volatile long a4)
> {
> int fd, sysfd;
> char buf[1024], sysbuf[1024], input[1024];
> char* hash;
> char dev_num;
> dev_num = 0;
> strncpy(buf, (char*)a0, sizeof(buf) - 1);
> buf[sizeof(buf) - 1] = 0;
> while ((hash = strchr(buf, '#'))) {
> *hash = '0' + (char)(a1 % 10);
> a1 /= 10;
> if (dev_num == 0)
> dev_num = *hash;
> }
> fd = open(buf, a2, 0);
> strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
> sysbuf[sizeof(sysbuf) - 1] = 0;
> if ((hash = strchr(sysbuf, '#'))) {
> *hash = dev_num;
> while ((hash = strchr(sysbuf, '#'))) {
> *hash = '0' + (char)(a1 % 10);
> a1 /= 10;
> }
> }
> sysfd = open(sysbuf, O_RDWR, 0);
> strncpy(input, (char*)a4, sizeof(input) - 1);
> input[sizeof(input) - 1] = 0;
> hash = strchr(input, '\0');
> sysfd = write(sysfd, input, hash - input + 1);
> return fd;
> }
>
> static void kill_and_wait(int pid, int* status)
> {
> kill(-pid, SIGKILL);
> kill(pid, SIGKILL);
> for (int i = 0; i < 100; i++) {
> if (waitpid(-1, status, WNOHANG | __WALL) == pid)
> return;
> usleep(1000);
> }
> DIR* dir = opendir("/sys/fs/fuse/connections");
> if (dir) {
> for (;;) {
> struct dirent* ent = readdir(dir);
> if (!ent)
> break;
> if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
> continue;
> char abort[300];
> snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
> ent->d_name);
> int fd = open(abort, O_WRONLY);
> if (fd == -1) {
> continue;
> }
> if (write(fd, abort, 1) < 0) {
> }
> close(fd);
> }
> closedir(dir);
> } else {
> }
> while (waitpid(-1, status, __WALL) != pid) {
> }
> }
>
> static void setup_test()
> {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> write_file("/proc/self/oom_score_adj", "1000");
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void)
> {
> int iter = 0;
> for (;; iter++) {
> int pid = fork();
> if (pid < 0)
> exit(1);
> if (pid == 0) {
> setup_test();
> execute_one();
> exit(0);
> }
> int status = 0;
> uint64_t start = current_time_ms();
> for (;;) {
> if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
> break;
> sleep_ms(1);
> if (current_time_ms() - start < 5000)
> continue;
> kill_and_wait(pid, &status);
> break;
> }
> }
> }
>
> void execute_one(void)
> {
> memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
> memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
> memcpy((void*)0x20000100, "0\000", 2);
> syz_mod_dev(
> /*dev=*/0x20000000, /*id=*/0x8001,
> /*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
> 0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
> memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
> memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
> memcpy((void*)0x20000180, "1\000", 2);
> syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
> /*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
> /*buf=*/0x20000180);
> }
> int main(void)
> {
> syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> for (procid = 0; procid < 4; procid++) {
> if (fork() == 0) {
> loop();
> }
> }
> sleep(1000000);
> return 0;
> }
> ```
>
> If you have any questions, please contact us.
>
> Reported by Yue Sun <samsun1006219@gmail.com>
> Reported by xingwei lee <xrivendell7@gmail.com>
>
> Best Regards,
> Yue
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 13:04 ` Steven Rostedt
@ 2024-06-09 14:06 ` Thomas Gleixner
2024-06-09 14:25 ` Steven Rostedt
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 14:06 UTC (permalink / raw)
To: Steven Rostedt, Sam Sun
Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> On Sun, 9 Jun 2024 14:33:01 +0800
> Sam Sun <samsun1006219@gmail.com> wrote:
>> [ 82.310798][ T8020] ------------[ cut here ]------------
>> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
>
> This is not a bug with jump labels. It's a bug with whatever is using jump
> labels. Looks like something tried to modify a jump label that no longer
> exists.
The jump label exists.
>> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
>> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
>> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
>> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
>> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
>
> So, something in kernfs modified a jump label location that was freed?
No. What happens is:
CPU 0 CPU 1
kernfs_fop_write_iter() kernfs_fop_write_iter()
set_attr_rdpmc() set_attr_rdpmc()
arch_jump_label_transform_queue() arch_jump_label_transform_queue()
mutex_lock(text_mutex) mutex_lock(text_mutex)
__jump_label_patch()
text_poke_queue()
mutex_unlokc(text_mutex)
__jump_label_patch()
CPU 1 sees the original text and not the expected because CPU 0 did not
yet invoke arch_jump_label_transform_apply().
So clearly set_attr_rdpmc() lacks serialization, no?
Thanks,
tglx
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 14:06 ` Thomas Gleixner
@ 2024-06-09 14:25 ` Steven Rostedt
2024-06-09 16:02 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 14:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, 09 Jun 2024 16:06:05 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> > On Sun, 9 Jun 2024 14:33:01 +0800
> > Sam Sun <samsun1006219@gmail.com> wrote:
> >> [ 82.310798][ T8020] ------------[ cut here ]------------
> >> [ 82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
> >
> > This is not a bug with jump labels. It's a bug with whatever is using jump
> > labels. Looks like something tried to modify a jump label that no longer
> > exists.
>
> The jump label exists.
Ah, I missed the set_attr_rdpmc() as something not with a "?" in front :-p
>
> >> [ 82.331873][ T8020] set_attr_rdpmc+0x193/0x270
> >> [ 82.332179][ T8020] ? get_attr_rdpmc+0x30/0x30
> >> [ 82.332511][ T8020] ? sysfs_kf_write+0x18d/0x2b0
> >> [ 82.332832][ T8020] ? sysfs_kf_read+0x370/0x370
> >> [ 82.333159][ T8020] kernfs_fop_write_iter+0x3ab/0x500
> >
> > So, something in kernfs modified a jump label location that was freed?
>
> No. What happens is:
>
> CPU 0 CPU 1
>
> kernfs_fop_write_iter() kernfs_fop_write_iter()
> set_attr_rdpmc() set_attr_rdpmc()
> arch_jump_label_transform_queue() arch_jump_label_transform_queue()
> mutex_lock(text_mutex) mutex_lock(text_mutex)
> __jump_label_patch()
> text_poke_queue()
> mutex_unlokc(text_mutex)
> __jump_label_patch()
>
> CPU 1 sees the original text and not the expected because CPU 0 did not
> yet invoke arch_jump_label_transform_apply().
>
> So clearly set_attr_rdpmc() lacks serialization, no?
>
Hmm, but should jump labels fail when that happens? Or should it catch
it, and not cause a BUG?
-- Steve
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 14:25 ` Steven Rostedt
@ 2024-06-09 16:02 ` Thomas Gleixner
2024-06-09 16:56 ` Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:02 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> On Sun, 09 Jun 2024 16:06:05 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> CPU 0 CPU 1
>>
>> kernfs_fop_write_iter() kernfs_fop_write_iter()
>> set_attr_rdpmc() set_attr_rdpmc()
>> arch_jump_label_transform_queue() arch_jump_label_transform_queue()
>> mutex_lock(text_mutex) mutex_lock(text_mutex)
>> __jump_label_patch()
>> text_poke_queue()
>> mutex_unlokc(text_mutex)
>> __jump_label_patch()
>>
>> CPU 1 sees the original text and not the expected because CPU 0 did not
>> yet invoke arch_jump_label_transform_apply().
>>
>> So clearly set_attr_rdpmc() lacks serialization, no?
>>
> Hmm, but should jump labels fail when that happens? Or should it catch
> it, and not cause a BUG?
Well the bug is there to detect inconsistency and that clearly works :)
But I clearly can't read, because the jump label operations are
serialized via jump_label_mutex. Hrm...
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:02 ` Thomas Gleixner
@ 2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
0 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> Well the bug is there to detect inconsistency and that clearly works :)
>
> But I clearly can't read, because the jump label operations are
> serialized via jump_label_mutex. Hrm...
Ok. Now I found if for real. It's in the jump label core:
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
static_key_slow_try_dec() is buggy. It needs similar logic as
static_key_slow_try_inc() to work correctly.
It's not only the 0, key->enabled can be -1 when the other CPU is in the
slow path of enabling it.
I'll send a patch after testing it.
Thanks,
tglx
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:56 ` Thomas Gleixner
@ 2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 19:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09 2024 at 18:56, Thomas Gleixner wrote:
> On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
>> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
>> Well the bug is there to detect inconsistency and that clearly works :)
>>
>> But I clearly can't read, because the jump label operations are
>> serialized via jump_label_mutex. Hrm...
>
> Ok. Now I found if for real. It's in the jump label core:
>
> CPU0 CPU1
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 1
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
> if (val == 1)
> return false;
>
> jump_label_lock();
> if (atomic_dec_and_test(&key->enabled)) {
> --> key->enabled == 0
> __jump_label_update()
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 0
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>
> --> key->enabled == -1 <- FAIL
>
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
>
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.
>
> I'll send a patch after testing it.
That's fixable, but it does not cure all of it.
set_attr_rdpmc() really needs serialization otherwise it can end up with
unbalanced operations.
CPU0 CPU1
x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL ---> static_branch_dec(&rdpmc_never_available_key);
No?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
@ 2024-06-10 6:46 ` Peter Zijlstra
2024-06-10 10:34 ` Thomas Gleixner
1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10 6:46 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:
> Ok. Now I found if for real. It's in the jump label core:
>
> CPU0 CPU1
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 1
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
> if (val == 1)
> return false;
>
> jump_label_lock();
> if (atomic_dec_and_test(&key->enabled)) {
> --> key->enabled == 0
> __jump_label_update()
>
> static_key_slow_dec()
> static_key_slow_try_dec()
>
> key->enabled == 0
> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>
> --> key->enabled == -1 <- FAIL
>
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
>
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.
Well, the -1 thing is in the 0->1 path, that is, the very first enabler.
That *should* not race with a disabler. If it does, there is external
confusion. (As I think the follow up email shows..)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
2024-06-10 6:46 ` Peter Zijlstra
@ 2024-06-10 10:34 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 10:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
xrivendell7, Greg Kroah-Hartman, Tejun Heo
On Mon, Jun 10 2024 at 08:46, Peter Zijlstra wrote:
> On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:
>
>> Ok. Now I found if for real. It's in the jump label core:
>>
>> CPU0 CPU1
>>
>> static_key_slow_dec()
>> static_key_slow_try_dec()
>>
>> key->enabled == 1
>> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>> if (val == 1)
>> return false;
>>
>> jump_label_lock();
>> if (atomic_dec_and_test(&key->enabled)) {
>> --> key->enabled == 0
>> __jump_label_update()
>>
>> static_key_slow_dec()
>> static_key_slow_try_dec()
>>
>> key->enabled == 0
>> val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>>
>> --> key->enabled == -1 <- FAIL
>>
>> static_key_slow_try_dec() is buggy. It needs similar logic as
>> static_key_slow_try_inc() to work correctly.
>>
>> It's not only the 0, key->enabled can be -1 when the other CPU is in the
>> slow path of enabling it.
>
> Well, the -1 thing is in the 0->1 path, that is, the very first enabler.
>
> That *should* not race with a disabler. If it does, there is external
> confusion. (As I think the follow up email shows..)
Right, but all of this is too fragile. Let me send out those patches.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 0/4] perf/x86, jump_label: Cure serialization issues
2024-06-10 10:34 ` Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Yue and Xingwei reported a syzkaller crash related to the rdpmc attribute
file and jump labels. It turns out to be lack of serialization in the
attribute write path and non-robustness on the jump label side.
The following series addresses these issues.
Thanks,
tglx
---
arch/x86/events/core.c | 3 ++
kernel/jump_label.c | 67 ++++++++++++++++++++++++++++---------------------
2 files changed, 42 insertions(+), 28 deletions(-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 1/4] perf/x86: Serialize set_attr_rdpmc()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():
CPU0 CPU1
Assume: x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key);
The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().
Prevent this by serializing set_attr_rdpmc() locally.
Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
---
arch/x86/events/core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct dev
struct device_attribute *attr,
const char *buf, size_t count)
{
+ static DEFINE_MUTEX(rdpmc_mutex);
unsigned long val;
ssize_t ret;
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct dev
if (x86_pmu.attr_rdpmc_broken)
return -ENOTSUPP;
+ guard(mutex)(&rdpmc_mutex);
+
if (val != x86_pmu.attr_rdpmc) {
/*
* Changing into or out of never available or always available,
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
will succeed and decrement to -2, which is invalid.
Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().
Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(st
STATIC_KEY_CHECK_USE(key);
/*
* Negative key->enabled has a special meaning: it sends
- * static_key_slow_inc() down the slow path, and it is non-zero
+ * static_key_slow_inc/dec() down the slow path, and it is non-zero
* so it counts as "enabled" in jump_label_update(). Note that
* atomic_inc_unless_negative() checks >= 0, so roll our own.
*/
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(stru
lockdep_assert_cpus_held();
/*
- * Careful if we get concurrent static_key_slow_inc() calls;
+ * Careful if we get concurrent static_key_slow_inc/dec() calls;
* later calls must wait for the first one to _finish_ the
* jump_label_update() process. At the same time, however,
* the jump_label_update() call below wants to see
@@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
static bool static_key_slow_try_dec(struct static_key *key)
{
- int val;
-
- val = atomic_fetch_add_unless(&key->enabled, -1, 1);
- if (val == 1)
- return false;
+ int v;
/*
- * The negative count check is valid even when a negative
- * key->enabled is in use by static_key_slow_inc(); a
- * __static_key_slow_dec() before the first static_key_slow_inc()
- * returns is unbalanced, because all other static_key_slow_inc()
- * instances block while the update is in progress.
+ * Go into the slow path if key::enabled is less than or equal than
+ * one. One is valid to shut down the key, anything less than one
+ * is an imbalance, which is handled at the call site.
+ *
+ * That includes the special case of '-1' which is set in
+ * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+ * fully serialized in the slow path below. By the time this task
+ * acquires the jump label lock the value is back to one and the
+ * retry under the lock must succeed.
*/
- WARN(val < 0, "jump label: negative count!\n");
+ v = atomic_read(&key->enabled);
+ do {
+ if (v <= 1)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
return true;
}
@@ -271,10 +276,11 @@ static void __static_key_slow_dec_cpuslo
if (static_key_slow_try_dec(key))
return;
- jump_label_lock();
- if (atomic_dec_and_test(&key->enabled))
+ guard(mutex)(&jump_label_mutex);
+ if (atomic_cmpxchg(&key->enabled, 1, 0))
jump_label_update(key);
- jump_label_unlock();
+ else
+ WARN_ON_ONCE(!static_key_slow_try_dec(key));
}
static void __static_key_slow_dec(struct static_key *key)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
The second part of
if (v <= 0 || (v + 1) < 0)
is not immediately obvious that it acts as overflow protection.
Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(st
/*
* Negative key->enabled has a special meaning: it sends
* static_key_slow_inc/dec() down the slow path, and it is non-zero
- * so it counts as "enabled" in jump_label_update(). Note that
- * atomic_inc_unless_negative() checks >= 0, so roll our own.
+ * so it counts as "enabled" in jump_label_update().
+ *
+ * The INT_MAX overflow condition is either used by the networking
+ * code to reset or detected in the slow path of
+ * static_key_slow_inc_cpuslocked().
*/
v = atomic_read(&key->enabled);
do {
- if (v <= 0 || (v + 1) < 0)
+ if (v <= 0 || v == INT_MAX)
return false;
} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
` (2 preceding siblings ...)
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-10 12:46 ` Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
Make the code more obvious and add proper comments to avoid future head
scratching.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/jump_label.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
if (static_key_fast_inc_not_disabled(key))
return true;
- jump_label_lock();
- if (atomic_read(&key->enabled) == 0) {
- atomic_set(&key->enabled, -1);
+ guard(mutex)(&jump_label_mutex);
+ /* Try to mark it as 'enabling in progress. */
+ if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
jump_label_update(key);
/*
- * Ensure that if the above cmpxchg loop observes our positive
- * value, it must also observe all the text changes.
+ * Ensure that when static_key_fast_inc_not_disabled() or
+ * static_key_slow_try_dec() observe the positive value,
+ * they must also observe all the text changes.
*/
atomic_set_release(&key->enabled, 1);
} else {
- if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
- jump_label_unlock();
+ /*
+ * While holding the mutex this should never observe
+ * anything else than a value >= 1 and succeed
+ */
+ if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
return false;
- }
}
- jump_label_unlock();
return true;
}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 17:57 ` Peter Zijlstra
2024-06-10 18:00 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10 17:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:
> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>
> static bool static_key_slow_try_dec(struct static_key *key)
> {
> + int v;
>
> /*
> + * Go into the slow path if key::enabled is less than or equal than
> + * one. One is valid to shut down the key, anything less than one
> + * is an imbalance, which is handled at the call site.
> + *
> + * That includes the special case of '-1' which is set in
> + * static_key_slow_inc_cpuslocked(), but that's harmless as it is
> + * fully serialized in the slow path below. By the time this task
> + * acquires the jump label lock the value is back to one and the
> + * retry under the lock must succeed.
Harmless yes, but it really should not happen to begin with. If this
happens it means someone wants to disable a key that is in the middle of
getting enabled for the first time.
I'm tempted to want a WARN here instead. Hmm?
> */
> + v = atomic_read(&key->enabled);
> + do {
> + if (v <= 1)
> + return false;
> + } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
> +
> return true;
> }
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 17:57 ` Peter Zijlstra
@ 2024-06-10 18:00 ` Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 18:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
On Mon, Jun 10 2024 at 19:57, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:
>
>> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>>
>> static bool static_key_slow_try_dec(struct static_key *key)
>> {
>> + int v;
>>
>> /*
>> + * Go into the slow path if key::enabled is less than or equal than
>> + * one. One is valid to shut down the key, anything less than one
>> + * is an imbalance, which is handled at the call site.
>> + *
>> + * That includes the special case of '-1' which is set in
>> + * static_key_slow_inc_cpuslocked(), but that's harmless as it is
>> + * fully serialized in the slow path below. By the time this task
>> + * acquires the jump label lock the value is back to one and the
>> + * retry under the lock must succeed.
>
> Harmless yes, but it really should not happen to begin with. If this
> happens it means someone wants to disable a key that is in the middle of
> getting enabled for the first time.
>
> I'm tempted to want a WARN here instead. Hmm?
No strong opinion
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
@ 2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: Uros Bizjak @ 2024-06-12 13:57 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7
> Make the code more obvious and add proper comments to avoid future head
> scratching.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> kernel/jump_label.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
> if (static_key_fast_inc_not_disabled(key))
> return true;
>
> - jump_label_lock();
> - if (atomic_read(&key->enabled) == 0) {
> - atomic_set(&key->enabled, -1);
> + guard(mutex)(&jump_label_mutex);
> + /* Try to mark it as 'enabling in progress. */
Missing closing quotation mark above.
> + if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
This can be:
int tmp = 0;
if (atomic_try_cmpxchg(&key->enabled, &tmp, -1)) {
...
and will result in more optimal code and will IMO also be more readable
because it is clear that the code is executed when cmpxchg succeeds.
(BTW: The atomic_read()/atomic_set() pair can also be converted to
cmpxchg in static_key_enable_cpuslocked().)
Uros.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Gitweb: https://git.kernel.org/tip/9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:39 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:24 +02:00
jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
Make the code more obvious and add proper comments to avoid future head
scratching.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.548322963@linutronix.de
---
kernel/jump_label.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4d06ec2..4ad5ed8 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
if (static_key_fast_inc_not_disabled(key))
return true;
- jump_label_lock();
- if (atomic_read(&key->enabled) == 0) {
- atomic_set(&key->enabled, -1);
+ guard(mutex)(&jump_label_mutex);
+ /* Try to mark it as 'enabling in progress. */
+ if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
jump_label_update(key);
/*
- * Ensure that if the above cmpxchg loop observes our positive
- * value, it must also observe all the text changes.
+ * Ensure that when static_key_fast_inc_not_disabled() or
+ * static_key_slow_try_dec() observe the positive value,
+ * they must also observe all the text changes.
*/
atomic_set_release(&key->enabled, 1);
} else {
- if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
- jump_label_unlock();
+ /*
+ * While holding the mutex this should never observe
+ * anything else than a value >= 1 and succeed
+ */
+ if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
return false;
- }
}
- jump_label_unlock();
return true;
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 695ef796467ed228b60f1915995e390aea3d85c6
Gitweb: https://git.kernel.org/tip/695ef796467ed228b60f1915995e390aea3d85c6
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:37 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00
jump_label: Clarify condition in static_key_fast_inc_not_disabled()
The second part of
if (v <= 0 || (v + 1) < 0)
is not immediately obvious that it acts as overflow protection.
Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.484973160@linutronix.de
---
kernel/jump_label.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 1f05a19..4d06ec2 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
/*
* Negative key->enabled has a special meaning: it sends
* static_key_slow_inc/dec() down the slow path, and it is non-zero
- * so it counts as "enabled" in jump_label_update(). Note that
- * atomic_inc_unless_negative() checks >= 0, so roll our own.
+ * so it counts as "enabled" in jump_label_update().
+ *
+ * The INT_MAX overflow condition is either used by the networking
+ * code to reset or detected in the slow path of
+ * static_key_slow_inc_cpuslocked().
*/
v = atomic_read(&key->enabled);
do {
- if (v <= 0 || (v + 1) < 0)
+ if (v <= 0 || v == INT_MAX)
return false;
} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] perf/x86: Serialize set_attr_rdpmc()
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Gitweb: https://git.kernel.org/tip/bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:35 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:22 +02:00
perf/x86: Serialize set_attr_rdpmc()
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():
CPU0 CPU1
Assume: x86_pmu.attr_rdpmc == 0
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
static_branch_dec(&rdpmc_never_available_key);
if (val != x86_pmu.attr_rdpmc) {
if (val == 0)
...
else if (x86_pmu.attr_rdpmc == 0)
FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key);
The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().
Prevent this by serializing set_attr_rdpmc() locally.
Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
---
arch/x86/events/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07..acd367c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
struct device_attribute *attr,
const char *buf, size_t count)
{
+ static DEFINE_MUTEX(rdpmc_mutex);
unsigned long val;
ssize_t ret;
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
if (x86_pmu.attr_rdpmc_broken)
return -ENOTSUPP;
+ guard(mutex)(&rdpmc_mutex);
+
if (val != x86_pmu.attr_rdpmc) {
/*
* Changing into or out of never available or always available,
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: locking/core] jump_label: Fix concurrency issues in static_key_slow_dec()
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
@ 2024-06-17 15:47 ` tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Gitweb: https://git.kernel.org/tip/83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 10 Jun 2024 14:46:36 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00
jump_label: Fix concurrency issues in static_key_slow_dec()
The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():
CPU0 CPU1
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 1
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
if (val == 1)
return false;
jump_label_lock();
if (atomic_dec_and_test(&key->enabled)) {
--> key->enabled == 0
__jump_label_update()
static_key_slow_dec()
static_key_slow_try_dec()
key->enabled == 0
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
--> key->enabled == -1 <- FAIL
There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU
val = atomic_fetch_add_unless(&key->enabled, -1, 1);
will succeed and decrement to -2, which is invalid.
Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().
[peterz: add WARN_ON_ONCE for the -1 race]
Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.422897838@linutronix.de
---
kernel/jump_label.c | 45 ++++++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 16 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3218fa5..1f05a19 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
STATIC_KEY_CHECK_USE(key);
/*
* Negative key->enabled has a special meaning: it sends
- * static_key_slow_inc() down the slow path, and it is non-zero
+ * static_key_slow_inc/dec() down the slow path, and it is non-zero
* so it counts as "enabled" in jump_label_update(). Note that
* atomic_inc_unless_negative() checks >= 0, so roll our own.
*/
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
lockdep_assert_cpus_held();
/*
- * Careful if we get concurrent static_key_slow_inc() calls;
+ * Careful if we get concurrent static_key_slow_inc/dec() calls;
* later calls must wait for the first one to _finish_ the
* jump_label_update() process. At the same time, however,
* the jump_label_update() call below wants to see
@@ -247,20 +247,32 @@ EXPORT_SYMBOL_GPL(static_key_disable);
static bool static_key_slow_try_dec(struct static_key *key)
{
- int val;
-
- val = atomic_fetch_add_unless(&key->enabled, -1, 1);
- if (val == 1)
- return false;
+ int v;
/*
- * The negative count check is valid even when a negative
- * key->enabled is in use by static_key_slow_inc(); a
- * __static_key_slow_dec() before the first static_key_slow_inc()
- * returns is unbalanced, because all other static_key_slow_inc()
- * instances block while the update is in progress.
+ * Go into the slow path if key::enabled is less than or equal than
+ * one. One is valid to shut down the key, anything less than one
+ * is an imbalance, which is handled at the call site.
+ *
+ * That includes the special case of '-1' which is set in
+ * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+ * fully serialized in the slow path below. By the time this task
+ * acquires the jump label lock the value is back to one and the
+ * retry under the lock must succeed.
*/
- WARN(val < 0, "jump label: negative count!\n");
+ v = atomic_read(&key->enabled);
+ do {
+ /*
+ * Warn about the '-1' case though; since that means a
+ * decrement is concurrent with a first (0->1) increment. IOW
+ * people are trying to disable something that wasn't yet fully
+ * enabled. This suggests an ordering problem on the user side.
+ */
+ WARN_ON_ONCE(v < 0);
+ if (v <= 1)
+ return false;
+ } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
return true;
}
@@ -271,10 +283,11 @@ static void __static_key_slow_dec_cpuslocked(struct static_key *key)
if (static_key_slow_try_dec(key))
return;
- jump_label_lock();
- if (atomic_dec_and_test(&key->enabled))
+ guard(mutex)(&jump_label_mutex);
+ if (atomic_cmpxchg(&key->enabled, 1, 0))
jump_label_update(key);
- jump_label_unlock();
+ else
+ WARN_ON_ONCE(!static_key_slow_try_dec(key));
}
static void __static_key_slow_dec(struct static_key *key)
^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-06-17 15:47 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-09 6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
2024-06-09 13:04 ` Steven Rostedt
2024-06-09 14:06 ` Thomas Gleixner
2024-06-09 14:25 ` Steven Rostedt
2024-06-09 16:02 ` Thomas Gleixner
2024-06-09 16:56 ` Thomas Gleixner
2024-06-09 19:39 ` Thomas Gleixner
2024-06-10 6:46 ` Peter Zijlstra
2024-06-10 10:34 ` Thomas Gleixner
2024-06-10 12:46 ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
2024-06-10 17:57 ` Peter Zijlstra
2024-06-10 18:00 ` Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
2024-06-12 13:57 ` Uros Bizjak
2024-06-17 15:47 ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox