public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
@ 2024-06-09  6:33 Sam Sun
  2024-06-09 13:04 ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Sam Sun @ 2024-06-09  6:33 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: syzkaller-bugs, peterz, jpoimboe, jbaron, rostedt, ardb, tglx,
	mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7

Dear developers and maintainers,

We encountered a kernel bug while using our modified
syzkaller. It was tested against the latest upstream kernel.
Kernel crash log is listed below.

```
[   82.310798][ T8020] ------------[ cut here ]------------
[   82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
[   82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[   82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
tainted 6.10.0-rc2-00366-g771ed66105de 3
[   82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.13.0-1ubuntu1.1 04/04
[   82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[   82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[   82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[   82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[   82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[   82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[   82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[   82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[   82.319116][ T8020] FS:  00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[   82.319808][ T8020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
CR4: 0000000000750ef0
[   82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[   82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[   82.322004][ T8020] PKRU: 55555554
[   82.322269][ T8020] Call Trace:
[   82.322515][ T8020]  <TASK>
[   82.322729][ T8020]  ? __die_body+0x88/0xe0
[   82.323043][ T8020]  ? die+0xcf/0x110
[   82.323322][ T8020]  ? do_trap+0x155/0x3a0
[   82.323625][ T8020]  ? __jump_label_patch+0x456/0x480
[   82.324009][ T8020]  ? do_error_trap+0x1dc/0x2a0
[   82.324371][ T8020]  ? __jump_label_patch+0x456/0x480
[   82.324774][ T8020]  ? do_int3+0x50/0x50
[   82.325088][ T8020]  ? handle_invalid_op+0x34/0x40
[   82.325445][ T8020]  ? __jump_label_patch+0x456/0x480
[   82.325815][ T8020]  ? exc_invalid_op+0x34/0x50
[   82.326156][ T8020]  ? asm_exc_invalid_op+0x1a/0x20
[   82.326533][ T8020]  ? __wake_up_klogd+0xcc/0x100
[   82.326879][ T8020]  ? __jump_label_patch+0x456/0x480
[   82.327249][ T8020]  ? cr4_update_pce+0xf/0x80
[   82.327601][ T8020]  ? arch_jump_label_transform_queue+0xf0/0xf0
[   82.328047][ T8020]  ? cr4_update_pce+0xf/0x80
[   82.328380][ T8020]  ? cr4_update_pce+0x1e/0x80
[   82.328717][ T8020]  ? cr4_update_pce+0x11/0x80
[   82.329052][ T8020]  ? read_lock_is_recursive+0x20/0x20
[   82.329466][ T8020]  ? __static_key_slow_dec_cpuslocked+0xb0/0x140
[   82.329966][ T8020]  ? text_poke_queue+0x46/0x180
[   82.330329][ T8020]  arch_jump_label_transform_queue+0x63/0xf0
[   82.330742][ T8020]  __jump_label_update+0x18b/0x3b0
[   82.331101][ T8020]  __static_key_slow_dec_cpuslocked+0xe9/0x140
[   82.331519][ T8020]  static_key_slow_dec+0x50/0xa0
[   82.331873][ T8020]  set_attr_rdpmc+0x193/0x270
[   82.332179][ T8020]  ? get_attr_rdpmc+0x30/0x30
[   82.332511][ T8020]  ? sysfs_kf_write+0x18d/0x2b0
[   82.332832][ T8020]  ? sysfs_kf_read+0x370/0x370
[   82.333159][ T8020]  kernfs_fop_write_iter+0x3ab/0x500
[   82.333531][ T8020]  vfs_write+0xa8b/0xd00
[   82.333812][ T8020]  ? kernfs_fop_read_iter+0x640/0x640
[   82.334234][ T8020]  ? kernel_write+0x330/0x330
[   82.334598][ T8020]  ? do_sys_openat2+0x16b/0x1c0
[   82.334949][ T8020]  ? do_sys_open+0x230/0x230
[   82.335310][ T8020]  ? __up_read+0x2bb/0x6a0
[   82.335633][ T8020]  ksys_write+0x17b/0x2a0
[   82.335966][ T8020]  ? __ia32_sys_read+0x90/0x90
[   82.336333][ T8020]  ? do_syscall_64+0xa5/0x230
[   82.336656][ T8020]  do_syscall_64+0xe2/0x230
[   82.336983][ T8020]  ? clear_bhb_loop+0x25/0x80
[   82.337289][ T8020]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   82.337692][ T8020] RIP: 0033:0x7f0460fd2473
[   82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
ff ff ff ff eb b7 0f 1f 00 64 8b 08
[   82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
ORIG_RAX: 0000000000000001
[   82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
RCX: 00007f0460fd2473
[   82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
RDI: 0000000000000004
[   82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
R09: 0072736d2f232f75
[   82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
R12: 00005615dc1dd1c0
[   82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
R15: 0000000000000000
[   82.342364][ T8020]  </TASK>
[   82.342554][ T8020] Modules linked in:
[   82.342875][ T8020] ---[ end trace 0000000000000000 ]---
[   82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
[   82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
[   82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
[   82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
RCX: 218564eba7c06800
[   82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[   82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
R09: 1ffff92001f39ed0
[   82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
R12: 0000000000000001
[   82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
R15: ffffffff8b862901
[   82.349173][ T8020] FS:  00007f04610bc540(0000)
GS:ffff88802c800000(0000) knlGS:0000000000000000
[   82.349784][ T8020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
CR4: 0000000000750ef0
[   82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[   82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[   82.351769][ T8020] PKRU: 55555554
[   82.352011][ T8020] Kernel panic - not syncing: Fatal exception
[   82.352665][ T8020] Kernel Offset: disabled
[   82.352988][ T8020] Rebooting in 86400 seconds..
```

We have a C repro which can trigger this bug in latest upstream kernel
commit, listed below:

```
#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static unsigned long long procid;

static void sleep_ms(uint64_t ms)
{
  usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts))
    exit(1);
  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
  char buf[1024];
  va_list args;
  va_start(args, what);
  vsnprintf(buf, sizeof(buf), what, args);
  va_end(args);
  buf[sizeof(buf) - 1] = 0;
  int len = strlen(buf);
  int fd = open(file, O_WRONLY | O_CLOEXEC);
  if (fd == -1)
    return false;
  if (write(fd, buf, len) != len) {
    int err = errno;
    close(fd);
    errno = err;
    return false;
  }
  close(fd);
  return true;
}

static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
                        volatile long a3, volatile long a4)
{
  int fd, sysfd;
  char buf[1024], sysbuf[1024], input[1024];
  char* hash;
  char dev_num;
  dev_num = 0;
  strncpy(buf, (char*)a0, sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = 0;
  while ((hash = strchr(buf, '#'))) {
    *hash = '0' + (char)(a1 % 10);
    a1 /= 10;
    if (dev_num == 0)
      dev_num = *hash;
  }
  fd = open(buf, a2, 0);
  strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
  sysbuf[sizeof(sysbuf) - 1] = 0;
  if ((hash = strchr(sysbuf, '#'))) {
    *hash = dev_num;
    while ((hash = strchr(sysbuf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
  }
  sysfd = open(sysbuf, O_RDWR, 0);
  strncpy(input, (char*)a4, sizeof(input) - 1);
  input[sizeof(input) - 1] = 0;
  hash = strchr(input, '\0');
  sysfd = write(sysfd, input, hash - input + 1);
  return fd;
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  for (int i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter = 0;
  for (;; iter++) {
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      setup_test();
      execute_one();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
  }
}

void execute_one(void)
{
  memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
  memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
  memcpy((void*)0x20000100, "0\000", 2);
  syz_mod_dev(
      /*dev=*/0x20000000, /*id=*/0x8001,
      /*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
      0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
  memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
  memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
  memcpy((void*)0x20000180, "1\000", 2);
  syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
              /*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
              /*buf=*/0x20000180);
}
int main(void)
{
  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
          /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);
  for (procid = 0; procid < 4; procid++) {
    if (fork() == 0) {
      loop();
    }
  }
  sleep(1000000);
  return 0;
}
```

If you have any questions, please contact us.

Reported by Yue Sun <samsun1006219@gmail.com>
Reported by xingwei lee <xrivendell7@gmail.com>

Best Regards,
Yue

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09  6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
@ 2024-06-09 13:04 ` Steven Rostedt
  2024-06-09 14:06   ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 13:04 UTC (permalink / raw)
  To: Sam Sun
  Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
	tglx, mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
	Greg Kroah-Hartman, Tejun Heo

On Sun, 9 Jun 2024 14:33:01 +0800
Sam Sun <samsun1006219@gmail.com> wrote:

> Dear developers and maintainers,
> 
> We encountered a kernel bug while using our modified
> syzkaller. It was tested against the latest upstream kernel.
> Kernel crash log is listed below.
> 
> ```
> [   82.310798][ T8020] ------------[ cut here ]------------
> [   82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!

This is not a bug with jump labels. It's a bug with whatever is using jump
labels. Looks like something tried to modify a jump label that no longer
exists.

> [   82.311761][ T8020] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [   82.312331][ T8020] CPU: 0 PID: 8020 Comm: static_key_slow Not
> tainted 6.10.0-rc2-00366-g771ed66105de 3
> [   82.313044][ T8020] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS 1.13.0-1ubuntu1.1 04/04
> [   82.313778][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [   82.314223][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [   82.315745][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [   82.316204][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [   82.316813][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [   82.317396][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [   82.317987][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [   82.318548][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [   82.319116][ T8020] FS:  00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [   82.319808][ T8020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   82.320308][ T8020] CR2: 00007f0460fd9650 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [   82.320883][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [   82.321460][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [   82.322004][ T8020] PKRU: 55555554
> [   82.322269][ T8020] Call Trace:
> [   82.322515][ T8020]  <TASK>
> [   82.322729][ T8020]  ? __die_body+0x88/0xe0
> [   82.323043][ T8020]  ? die+0xcf/0x110
> [   82.323322][ T8020]  ? do_trap+0x155/0x3a0
> [   82.323625][ T8020]  ? __jump_label_patch+0x456/0x480
> [   82.324009][ T8020]  ? do_error_trap+0x1dc/0x2a0
> [   82.324371][ T8020]  ? __jump_label_patch+0x456/0x480
> [   82.324774][ T8020]  ? do_int3+0x50/0x50
> [   82.325088][ T8020]  ? handle_invalid_op+0x34/0x40
> [   82.325445][ T8020]  ? __jump_label_patch+0x456/0x480
> [   82.325815][ T8020]  ? exc_invalid_op+0x34/0x50
> [   82.326156][ T8020]  ? asm_exc_invalid_op+0x1a/0x20
> [   82.326533][ T8020]  ? __wake_up_klogd+0xcc/0x100
> [   82.326879][ T8020]  ? __jump_label_patch+0x456/0x480
> [   82.327249][ T8020]  ? cr4_update_pce+0xf/0x80
> [   82.327601][ T8020]  ? arch_jump_label_transform_queue+0xf0/0xf0
> [   82.328047][ T8020]  ? cr4_update_pce+0xf/0x80
> [   82.328380][ T8020]  ? cr4_update_pce+0x1e/0x80
> [   82.328717][ T8020]  ? cr4_update_pce+0x11/0x80
> [   82.329052][ T8020]  ? read_lock_is_recursive+0x20/0x20
> [   82.329466][ T8020]  ? __static_key_slow_dec_cpuslocked+0xb0/0x140
> [   82.329966][ T8020]  ? text_poke_queue+0x46/0x180
> [   82.330329][ T8020]  arch_jump_label_transform_queue+0x63/0xf0
> [   82.330742][ T8020]  __jump_label_update+0x18b/0x3b0
> [   82.331101][ T8020]  __static_key_slow_dec_cpuslocked+0xe9/0x140
> [   82.331519][ T8020]  static_key_slow_dec+0x50/0xa0
> [   82.331873][ T8020]  set_attr_rdpmc+0x193/0x270
> [   82.332179][ T8020]  ? get_attr_rdpmc+0x30/0x30
> [   82.332511][ T8020]  ? sysfs_kf_write+0x18d/0x2b0
> [   82.332832][ T8020]  ? sysfs_kf_read+0x370/0x370
> [   82.333159][ T8020]  kernfs_fop_write_iter+0x3ab/0x500

So, something in kernfs modified a jump label location that was freed?

-- Steve


> [   82.333531][ T8020]  vfs_write+0xa8b/0xd00
> [   82.333812][ T8020]  ? kernfs_fop_read_iter+0x640/0x640
> [   82.334234][ T8020]  ? kernel_write+0x330/0x330
> [   82.334598][ T8020]  ? do_sys_openat2+0x16b/0x1c0
> [   82.334949][ T8020]  ? do_sys_open+0x230/0x230
> [   82.335310][ T8020]  ? __up_read+0x2bb/0x6a0
> [   82.335633][ T8020]  ksys_write+0x17b/0x2a0
> [   82.335966][ T8020]  ? __ia32_sys_read+0x90/0x90
> [   82.336333][ T8020]  ? do_syscall_64+0xa5/0x230
> [   82.336656][ T8020]  do_syscall_64+0xe2/0x230
> [   82.336983][ T8020]  ? clear_bhb_loop+0x25/0x80
> [   82.337289][ T8020]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
> [   82.337692][ T8020] RIP: 0033:0x7f0460fd2473
> [   82.337987][ T8020] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0
> ff ff ff ff eb b7 0f 1f 00 64 8b 08
> [   82.339230][ T8020] RSP: 002b:00007ffc46fb3058 EFLAGS: 00000246
> ORIG_RAX: 0000000000000001
> [   82.339804][ T8020] RAX: ffffffffffffffda RBX: 0000000000000000
> RCX: 00007f0460fd2473
> [   82.340331][ T8020] RDX: 0000000000000002 RSI: 00007ffc46fb3090
> RDI: 0000000000000004
> [   82.340852][ T8020] RBP: 00007ffc46fb3cb0 R08: 000000000000000f
> R09: 0072736d2f232f75
> [   82.341353][ T8020] R10: 0000000000000000 R11: 0000000000000246
> R12: 00005615dc1dd1c0
> [   82.341853][ T8020] R13: 0000000000000000 R14: 0000000000000000
> R15: 0000000000000000
> [   82.342364][ T8020]  </TASK>
> [   82.342554][ T8020] Modules linked in:
> [   82.342875][ T8020] ---[ end trace 0000000000000000 ]---
> [   82.343230][ T8020] RIP: 0010:__jump_label_patch+0x456/0x480
> [   82.343668][ T8020] Code: e8 af a1 5d 00 48 c7 c7 40 01 86 8b 48 8b
> 0c 24 48 89 ce 48 89 ca 4d 89 e8 4f
> [   82.345750][ T8020] RSP: 0018:ffffc9000f9cf980 EFLAGS: 00010292
> [   82.346161][ T8020] RAX: 000000000000008b RBX: 0000000000000085
> RCX: 218564eba7c06800
> [   82.346657][ T8020] RDX: 0000000000000000 RSI: 0000000080000000
> RDI: 0000000000000000
> [   82.347391][ T8020] RBP: ffffc9000f9cfac0 R08: ffffffff817176dc
> R09: 1ffff92001f39ed0
> [   82.347901][ T8020] R10: dffffc0000000000 R11: fffff52001f39ed1
> R12: 0000000000000001
> [   82.348652][ T8020] R13: ffffffff8b862901 R14: ffffffff90fbe8c0
> R15: ffffffff8b862901
> [   82.349173][ T8020] FS:  00007f04610bc540(0000)
> GS:ffff88802c800000(0000) knlGS:0000000000000000
> [   82.349784][ T8020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   82.350196][ T8020] CR2: 00007f96d80450b8 CR3: 000000004a16a000
> CR4: 0000000000750ef0
> [   82.350717][ T8020] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [   82.351243][ T8020] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [   82.351769][ T8020] PKRU: 55555554
> [   82.352011][ T8020] Kernel panic - not syncing: Fatal exception
> [   82.352665][ T8020] Kernel Offset: disabled
> [   82.352988][ T8020] Rebooting in 86400 seconds..
> ```
> 
> We have a C repro which can trigger this bug in latest upstream kernel
> commit, listed below:
> 
> ```
> #define _GNU_SOURCE
> 
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
> 
> static unsigned long long procid;
> 
> static void sleep_ms(uint64_t ms)
> {
>   usleep(ms * 1000);
> }
> 
> static uint64_t current_time_ms(void)
> {
>   struct timespec ts;
>   if (clock_gettime(CLOCK_MONOTONIC, &ts))
>     exit(1);
>   return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
> 
> static bool write_file(const char* file, const char* what, ...)
> {
>   char buf[1024];
>   va_list args;
>   va_start(args, what);
>   vsnprintf(buf, sizeof(buf), what, args);
>   va_end(args);
>   buf[sizeof(buf) - 1] = 0;
>   int len = strlen(buf);
>   int fd = open(file, O_WRONLY | O_CLOEXEC);
>   if (fd == -1)
>     return false;
>   if (write(fd, buf, len) != len) {
>     int err = errno;
>     close(fd);
>     errno = err;
>     return false;
>   }
>   close(fd);
>   return true;
> }
> 
> static long syz_mod_dev(volatile long a0, volatile long a1, volatile long a2,
>                         volatile long a3, volatile long a4)
> {
>   int fd, sysfd;
>   char buf[1024], sysbuf[1024], input[1024];
>   char* hash;
>   char dev_num;
>   dev_num = 0;
>   strncpy(buf, (char*)a0, sizeof(buf) - 1);
>   buf[sizeof(buf) - 1] = 0;
>   while ((hash = strchr(buf, '#'))) {
>     *hash = '0' + (char)(a1 % 10);
>     a1 /= 10;
>     if (dev_num == 0)
>       dev_num = *hash;
>   }
>   fd = open(buf, a2, 0);
>   strncpy(sysbuf, (char*)a3, sizeof(sysbuf) - 1);
>   sysbuf[sizeof(sysbuf) - 1] = 0;
>   if ((hash = strchr(sysbuf, '#'))) {
>     *hash = dev_num;
>     while ((hash = strchr(sysbuf, '#'))) {
>       *hash = '0' + (char)(a1 % 10);
>       a1 /= 10;
>     }
>   }
>   sysfd = open(sysbuf, O_RDWR, 0);
>   strncpy(input, (char*)a4, sizeof(input) - 1);
>   input[sizeof(input) - 1] = 0;
>   hash = strchr(input, '\0');
>   sysfd = write(sysfd, input, hash - input + 1);
>   return fd;
> }
> 
> static void kill_and_wait(int pid, int* status)
> {
>   kill(-pid, SIGKILL);
>   kill(pid, SIGKILL);
>   for (int i = 0; i < 100; i++) {
>     if (waitpid(-1, status, WNOHANG | __WALL) == pid)
>       return;
>     usleep(1000);
>   }
>   DIR* dir = opendir("/sys/fs/fuse/connections");
>   if (dir) {
>     for (;;) {
>       struct dirent* ent = readdir(dir);
>       if (!ent)
>         break;
>       if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
>         continue;
>       char abort[300];
>       snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
>                ent->d_name);
>       int fd = open(abort, O_WRONLY);
>       if (fd == -1) {
>         continue;
>       }
>       if (write(fd, abort, 1) < 0) {
>       }
>       close(fd);
>     }
>     closedir(dir);
>   } else {
>   }
>   while (waitpid(-1, status, __WALL) != pid) {
>   }
> }
> 
> static void setup_test()
> {
>   prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
>   setpgrp();
>   write_file("/proc/self/oom_score_adj", "1000");
> }
> 
> static void execute_one(void);
> 
> #define WAIT_FLAGS __WALL
> 
> static void loop(void)
> {
>   int iter = 0;
>   for (;; iter++) {
>     int pid = fork();
>     if (pid < 0)
>       exit(1);
>     if (pid == 0) {
>       setup_test();
>       execute_one();
>       exit(0);
>     }
>     int status = 0;
>     uint64_t start = current_time_ms();
>     for (;;) {
>       if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
>         break;
>       sleep_ms(1);
>       if (current_time_ms() - start < 5000)
>         continue;
>       kill_and_wait(pid, &status);
>       break;
>     }
>   }
> }
> 
> void execute_one(void)
> {
>   memcpy((void*)0x20000000, "/dev/cpu/#/msr\000", 15);
>   memcpy((void*)0x200000c0, "/sys/devices/cpu/rdpmc\000", 23);
>   memcpy((void*)0x20000100, "0\000", 2);
>   syz_mod_dev(
>       /*dev=*/0x20000000, /*id=*/0x8001,
>       /*flags=O_TRUNC|O_PATH|O_NONBLOCK|O_NOATIME|O_EXCL|O_DIRECTORY|0x442*/
>       0x250ec2, /*file=*/0x200000c0, /*buf=*/0x20000100);
>   memcpy((void*)0x20000080, "/dev/cpu/#/msr\000", 15);
>   memcpy((void*)0x20000140, "/sys/devices/cpu/rdpmc\000", 23);
>   memcpy((void*)0x20000180, "1\000", 2);
>   syz_mod_dev(/*dev=*/0x20000080, /*id=*/0,
>               /*flags=__O_TMPFILE|O_APPEND*/ 0x400400, /*file=*/0x20000140,
>               /*buf=*/0x20000180);
> }
> int main(void)
> {
>   syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>           /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>           /*offset=*/0ul);
>   for (procid = 0; procid < 4; procid++) {
>     if (fork() == 0) {
>       loop();
>     }
>   }
>   sleep(1000000);
>   return 0;
> }
> ```
> 
> If you have any questions, please contact us.
> 
> Reported by Yue Sun <samsun1006219@gmail.com>
> Reported by xingwei lee <xrivendell7@gmail.com>
> 
> Best Regards,
> Yue


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 13:04 ` Steven Rostedt
@ 2024-06-09 14:06   ` Thomas Gleixner
  2024-06-09 14:25     ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 14:06 UTC (permalink / raw)
  To: Steven Rostedt, Sam Sun
  Cc: linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe, jbaron, ardb,
	mingo, Borislav Petkov, dave.hansen, hpa, xrivendell7,
	Greg Kroah-Hartman, Tejun Heo

On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> On Sun, 9 Jun 2024 14:33:01 +0800
> Sam Sun <samsun1006219@gmail.com> wrote:
>> [   82.310798][ T8020] ------------[ cut here ]------------
>> [   82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!
>
> This is not a bug with jump labels. It's a bug with whatever is using jump
> labels. Looks like something tried to modify a jump label that no longer
> exists.

The jump label exists.

>> [   82.331873][ T8020]  set_attr_rdpmc+0x193/0x270
>> [   82.332179][ T8020]  ? get_attr_rdpmc+0x30/0x30
>> [   82.332511][ T8020]  ? sysfs_kf_write+0x18d/0x2b0
>> [   82.332832][ T8020]  ? sysfs_kf_read+0x370/0x370
>> [   82.333159][ T8020]  kernfs_fop_write_iter+0x3ab/0x500
>
> So, something in kernfs modified a jump label location that was freed?

No. What happens is:

CPU 0                           	CPU 1

kernfs_fop_write_iter()			kernfs_fop_write_iter()
  set_attr_rdpmc()		  	  set_attr_rdpmc()
    arch_jump_label_transform_queue()       arch_jump_label_transform_queue()
     mutex_lock(text_mutex)                   mutex_lock(text_mutex)
     __jump_label_patch()
     text_poke_queue()
     mutex_unlokc(text_mutex)
                                              __jump_label_patch()

CPU 1 sees the original text and not the expected because CPU 0 did not
yet invoke arch_jump_label_transform_apply().

So clearly set_attr_rdpmc() lacks serialization, no?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 14:06   ` Thomas Gleixner
@ 2024-06-09 14:25     ` Steven Rostedt
  2024-06-09 16:02       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2024-06-09 14:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
	jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Sun, 09 Jun 2024 16:06:05 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, Jun 09 2024 at 09:04, Steven Rostedt wrote:
> > On Sun, 9 Jun 2024 14:33:01 +0800
> > Sam Sun <samsun1006219@gmail.com> wrote:  
> >> [   82.310798][ T8020] ------------[ cut here ]------------
> >> [   82.311236][ T8020] kernel BUG at arch/x86/kernel/jump_label.c:73!  
> >
> > This is not a bug with jump labels. It's a bug with whatever is using jump
> > labels. Looks like something tried to modify a jump label that no longer
> > exists.  
> 
> The jump label exists.

Ah, I missed the set_attr_rdpmc() as something not with a "?" in front :-p

> 
> >> [   82.331873][ T8020]  set_attr_rdpmc+0x193/0x270
> >> [   82.332179][ T8020]  ? get_attr_rdpmc+0x30/0x30
> >> [   82.332511][ T8020]  ? sysfs_kf_write+0x18d/0x2b0
> >> [   82.332832][ T8020]  ? sysfs_kf_read+0x370/0x370
> >> [   82.333159][ T8020]  kernfs_fop_write_iter+0x3ab/0x500  
> >
> > So, something in kernfs modified a jump label location that was freed?  
> 
> No. What happens is:
> 
> CPU 0                           	CPU 1
> 
> kernfs_fop_write_iter()			kernfs_fop_write_iter()
>   set_attr_rdpmc()		  	  set_attr_rdpmc()
>     arch_jump_label_transform_queue()       arch_jump_label_transform_queue()
>      mutex_lock(text_mutex)                   mutex_lock(text_mutex)
>      __jump_label_patch()
>      text_poke_queue()
>      mutex_unlokc(text_mutex)
>                                               __jump_label_patch()
> 
> CPU 1 sees the original text and not the expected because CPU 0 did not
> yet invoke arch_jump_label_transform_apply().
> 
> So clearly set_attr_rdpmc() lacks serialization, no?
> 

Hmm, but should jump labels fail when that happens? Or should it catch
it, and not cause a BUG?

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 14:25     ` Steven Rostedt
@ 2024-06-09 16:02       ` Thomas Gleixner
  2024-06-09 16:56         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
	jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> On Sun, 09 Jun 2024 16:06:05 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> CPU 0                           	CPU 1
>> 
>> kernfs_fop_write_iter()			kernfs_fop_write_iter()
>>   set_attr_rdpmc()		  	  set_attr_rdpmc()
>>     arch_jump_label_transform_queue()       arch_jump_label_transform_queue()
>>      mutex_lock(text_mutex)                   mutex_lock(text_mutex)
>>      __jump_label_patch()
>>      text_poke_queue()
>>      mutex_unlokc(text_mutex)
>>                                               __jump_label_patch()
>> 
>> CPU 1 sees the original text and not the expected because CPU 0 did not
>> yet invoke arch_jump_label_transform_apply().
>> 
>> So clearly set_attr_rdpmc() lacks serialization, no?
>> 
> Hmm, but should jump labels fail when that happens? Or should it catch
> it, and not cause a BUG?

Well the bug is there to detect inconsistency and that clearly works :)

But I clearly can't read, because the jump label operations are
serialized via jump_label_mutex. Hrm...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 16:02       ` Thomas Gleixner
@ 2024-06-09 16:56         ` Thomas Gleixner
  2024-06-09 19:39           ` Thomas Gleixner
  2024-06-10  6:46           ` Peter Zijlstra
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 16:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
	jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
> Well the bug is there to detect inconsistency and that clearly works :)
>
> But I clearly can't read, because the jump label operations are
> serialized via jump_label_mutex. Hrm...

Ok. Now I found if for real. It's in the jump label core:

CPU0                            CPU1

static_key_slow_dec()
 static_key_slow_try_dec()

   key->enabled == 1
   val = atomic_fetch_add_unless(&key->enabled, -1, 1);
   if (val == 1)
   	return false;

   jump_label_lock();
   if (atomic_dec_and_test(&key->enabled)) {
      --> key->enabled == 0
      __jump_label_update()

                                static_key_slow_dec()
                                 static_key_slow_try_dec()

                                    key->enabled == 0
                                    val = atomic_fetch_add_unless(&key->enabled, -1, 1);

                                    --> key->enabled == -1 <- FAIL

static_key_slow_try_dec() is buggy. It needs similar logic as
static_key_slow_try_inc() to work correctly.

It's not only the 0, key->enabled can be -1 when the other CPU is in the
slow path of enabling it.

I'll send a patch after testing it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 16:56         ` Thomas Gleixner
@ 2024-06-09 19:39           ` Thomas Gleixner
  2024-06-10  6:46           ` Peter Zijlstra
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-09 19:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sam Sun, linux-kernel, x86, syzkaller-bugs, peterz, jpoimboe,
	jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Sun, Jun 09 2024 at 18:56, Thomas Gleixner wrote:

> On Sun, Jun 09 2024 at 18:02, Thomas Gleixner wrote:
>> On Sun, Jun 09 2024 at 10:25, Steven Rostedt wrote:
>> Well the bug is there to detect inconsistency and that clearly works :)
>>
>> But I clearly can't read, because the jump label operations are
>> serialized via jump_label_mutex. Hrm...
>
> Ok. Now I found if for real. It's in the jump label core:
>
> CPU0                            CPU1
>
> static_key_slow_dec()
>  static_key_slow_try_dec()
>
>    key->enabled == 1
>    val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>    if (val == 1)
>    	return false;
>
>    jump_label_lock();
>    if (atomic_dec_and_test(&key->enabled)) {
>       --> key->enabled == 0
>       __jump_label_update()
>
>                                 static_key_slow_dec()
>                                  static_key_slow_try_dec()
>
>                                     key->enabled == 0
>                                     val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>
>                                     --> key->enabled == -1 <- FAIL
>
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
>
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.
>
> I'll send a patch after testing it.

That's fixable, but it does not cure all of it.

set_attr_rdpmc() really needs serialization otherwise it can end up with
unbalanced operations.

CPU0                                    CPU1

x86_pmu.attr_rdpmc == 0

if (val != x86_pmu.attr_rdpmc) {
   if (val == 0)
   	...
   else if (x86_pmu.attr_rdpmc == 0)
     static_branch_dec(&rdpmc_never_available_key);

                                        if (val != x86_pmu.attr_rdpmc) {
                                           if (val == 0)
                                              	...
                                            else if (x86_pmu.attr_rdpmc == 0)
                               FAIL --->        static_branch_dec(&rdpmc_never_available_key);

No?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-09 16:56         ` Thomas Gleixner
  2024-06-09 19:39           ` Thomas Gleixner
@ 2024-06-10  6:46           ` Peter Zijlstra
  2024-06-10 10:34             ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10  6:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
	jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:

> Ok. Now I found if for real. It's in the jump label core:
> 
> CPU0                            CPU1
> 
> static_key_slow_dec()
>  static_key_slow_try_dec()
> 
>    key->enabled == 1
>    val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>    if (val == 1)
>    	return false;
> 
>    jump_label_lock();
>    if (atomic_dec_and_test(&key->enabled)) {
>       --> key->enabled == 0
>       __jump_label_update()
> 
>                                 static_key_slow_dec()
>                                  static_key_slow_try_dec()
> 
>                                     key->enabled == 0
>                                     val = atomic_fetch_add_unless(&key->enabled, -1, 1);
> 
>                                     --> key->enabled == -1 <- FAIL
> 
> static_key_slow_try_dec() is buggy. It needs similar logic as
> static_key_slow_try_inc() to work correctly.
> 
> It's not only the 0, key->enabled can be -1 when the other CPU is in the
> slow path of enabling it.

Well, the -1 thing is in the 0->1 path, that is, the very first enabler.

That *should* not race with a disabler. If it does, there is external
confusion. (As I think the follow up email shows..)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked
  2024-06-10  6:46           ` Peter Zijlstra
@ 2024-06-10 10:34             ` Thomas Gleixner
  2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 10:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Sam Sun, linux-kernel, x86, syzkaller-bugs,
	jpoimboe, jbaron, ardb, mingo, Borislav Petkov, dave.hansen, hpa,
	xrivendell7, Greg Kroah-Hartman, Tejun Heo

On Mon, Jun 10 2024 at 08:46, Peter Zijlstra wrote:
> On Sun, Jun 09, 2024 at 06:56:14PM +0200, Thomas Gleixner wrote:
>
>> Ok. Now I found if for real. It's in the jump label core:
>> 
>> CPU0                            CPU1
>> 
>> static_key_slow_dec()
>>  static_key_slow_try_dec()
>> 
>>    key->enabled == 1
>>    val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>>    if (val == 1)
>>    	return false;
>> 
>>    jump_label_lock();
>>    if (atomic_dec_and_test(&key->enabled)) {
>>       --> key->enabled == 0
>>       __jump_label_update()
>> 
>>                                 static_key_slow_dec()
>>                                  static_key_slow_try_dec()
>> 
>>                                     key->enabled == 0
>>                                     val = atomic_fetch_add_unless(&key->enabled, -1, 1);
>> 
>>                                     --> key->enabled == -1 <- FAIL
>> 
>> static_key_slow_try_dec() is buggy. It needs similar logic as
>> static_key_slow_try_inc() to work correctly.
>> 
>> It's not only the 0, key->enabled can be -1 when the other CPU is in the
>> slow path of enabling it.
>
> Well, the -1 thing is in the 0->1 path, that is, the very first enabler.
>
> That *should* not race with a disabler. If it does, there is external
> confusion. (As I think the follow up email shows..)

Right, but all of this is too fragile. Let me send out those patches.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 0/4] perf/x86, jump_label: Cure serialization issues
  2024-06-10 10:34             ` Thomas Gleixner
@ 2024-06-10 12:46               ` Thomas Gleixner
  2024-06-10 12:46                 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
                                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

Yue and Xingwei reported a syzkaller crash related to the rdpmc attribute
file and jump labels. It turns out to be lack of serialization in the
attribute write path and non-robustness on the jump label side.

The following series addresses these issues.

Thanks,

	tglx
---
 arch/x86/events/core.c |    3 ++
 kernel/jump_label.c    |   67 ++++++++++++++++++++++++++++---------------------
 2 files changed, 42 insertions(+), 28 deletions(-)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 1/4] perf/x86: Serialize set_attr_rdpmc()
  2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
@ 2024-06-10 12:46                 ` Thomas Gleixner
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2024-06-10 12:46                 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():

CPU0                           CPU1

Assume: x86_pmu.attr_rdpmc == 0

if (val != x86_pmu.attr_rdpmc) {
  if (val == 0)
    ...
  else if (x86_pmu.attr_rdpmc == 0)
    static_branch_dec(&rdpmc_never_available_key);

				if (val != x86_pmu.attr_rdpmc) {
				   if (val == 0)
				      ...
				   else if (x86_pmu.attr_rdpmc == 0)
     FAIL, due to imbalance --->      static_branch_dec(&rdpmc_never_available_key);

The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().

Prevent this by serializing set_attr_rdpmc() locally.

Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
---
 arch/x86/events/core.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct dev
 			      struct device_attribute *attr,
 			      const char *buf, size_t count)
 {
+	static DEFINE_MUTEX(rdpmc_mutex);
 	unsigned long val;
 	ssize_t ret;
 
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct dev
 	if (x86_pmu.attr_rdpmc_broken)
 		return -ENOTSUPP;
 
+	guard(mutex)(&rdpmc_mutex);
+
 	if (val != x86_pmu.attr_rdpmc) {
 		/*
 		 * Changing into or out of never available or always available,


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
  2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
  2024-06-10 12:46                 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-10 12:46                 ` Thomas Gleixner
  2024-06-10 17:57                   ` Peter Zijlstra
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2024-06-10 12:46                 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
  2024-06-10 12:46                 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
  3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():

CPU0                     CPU1

static_key_slow_dec()
  static_key_slow_try_dec()

	key->enabled == 1
	val = atomic_fetch_add_unless(&key->enabled, -1, 1);
	if (val == 1)
	     return false;

  jump_label_lock();
  if (atomic_dec_and_test(&key->enabled)) {
     --> key->enabled == 0
   __jump_label_update()

			 static_key_slow_dec()
			   static_key_slow_try_dec()

			     key->enabled == 0
			     val = atomic_fetch_add_unless(&key->enabled, -1, 1);

			      --> key->enabled == -1 <- FAIL

There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU

	val = atomic_fetch_add_unless(&key->enabled, -1, 1);

will succeed and decrement to -2, which is invalid.

Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().

Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/jump_label.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(st
 	STATIC_KEY_CHECK_USE(key);
 	/*
 	 * Negative key->enabled has a special meaning: it sends
-	 * static_key_slow_inc() down the slow path, and it is non-zero
+	 * static_key_slow_inc/dec() down the slow path, and it is non-zero
 	 * so it counts as "enabled" in jump_label_update().  Note that
 	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
 	 */
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(stru
 	lockdep_assert_cpus_held();
 
 	/*
-	 * Careful if we get concurrent static_key_slow_inc() calls;
+	 * Careful if we get concurrent static_key_slow_inc/dec() calls;
 	 * later calls must wait for the first one to _finish_ the
 	 * jump_label_update() process.  At the same time, however,
 	 * the jump_label_update() call below wants to see
@@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
 
 static bool static_key_slow_try_dec(struct static_key *key)
 {
-	int val;
-
-	val = atomic_fetch_add_unless(&key->enabled, -1, 1);
-	if (val == 1)
-		return false;
+	int v;
 
 	/*
-	 * The negative count check is valid even when a negative
-	 * key->enabled is in use by static_key_slow_inc(); a
-	 * __static_key_slow_dec() before the first static_key_slow_inc()
-	 * returns is unbalanced, because all other static_key_slow_inc()
-	 * instances block while the update is in progress.
+	 * Go into the slow path if key::enabled is less than or equal than
+	 * one. One is valid to shut down the key, anything less than one
+	 * is an imbalance, which is handled at the call site.
+	 *
+	 * That includes the special case of '-1' which is set in
+	 * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+	 * fully serialized in the slow path below. By the time this task
+	 * acquires the jump label lock the value is back to one and the
+	 * retry under the lock must succeed.
 	 */
-	WARN(val < 0, "jump label: negative count!\n");
+	v = atomic_read(&key->enabled);
+	do {
+		if (v <= 1)
+			return false;
+	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
 	return true;
 }
 
@@ -271,10 +276,11 @@ static void __static_key_slow_dec_cpuslo
 	if (static_key_slow_try_dec(key))
 		return;
 
-	jump_label_lock();
-	if (atomic_dec_and_test(&key->enabled))
+	guard(mutex)(&jump_label_mutex);
+	if (atomic_cmpxchg(&key->enabled, 1, 0))
 		jump_label_update(key);
-	jump_label_unlock();
+	else
+		WARN_ON_ONCE(!static_key_slow_try_dec(key));
 }
 
 static void __static_key_slow_dec(struct static_key *key)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
  2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
  2024-06-10 12:46                 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
  2024-06-10 12:46                 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 12:46                 ` Thomas Gleixner
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  2024-06-10 12:46                 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
  3 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

The second part of

      if (v <= 0 || (v + 1) < 0)

is not immediately obvious that it acts as overflow protection.

Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/jump_label.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(st
 	/*
 	 * Negative key->enabled has a special meaning: it sends
 	 * static_key_slow_inc/dec() down the slow path, and it is non-zero
-	 * so it counts as "enabled" in jump_label_update().  Note that
-	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
+	 * so it counts as "enabled" in jump_label_update().
+	 *
+	 * The INT_MAX overflow condition is either used by the networking
+	 * code to reset or detected in the slow path of
+	 * static_key_slow_inc_cpuslocked().
 	 */
 	v = atomic_read(&key->enabled);
 	do {
-		if (v <= 0 || (v + 1) < 0)
+		if (v <= 0 || v == INT_MAX)
 			return false;
 	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
  2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
                                   ` (2 preceding siblings ...)
  2024-06-10 12:46                 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-10 12:46                 ` Thomas Gleixner
  2024-06-12 13:57                   ` Uros Bizjak
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  3 siblings, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 12:46 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

Make the code more obvious and add proper comments to avoid future head
scratching.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/jump_label.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
 	if (static_key_fast_inc_not_disabled(key))
 		return true;
 
-	jump_label_lock();
-	if (atomic_read(&key->enabled) == 0) {
-		atomic_set(&key->enabled, -1);
+	guard(mutex)(&jump_label_mutex);
+	/* Try to mark it as 'enabling in progress. */
+	if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
 		jump_label_update(key);
 		/*
-		 * Ensure that if the above cmpxchg loop observes our positive
-		 * value, it must also observe all the text changes.
+		 * Ensure that when static_key_fast_inc_not_disabled() or
+		 * static_key_slow_try_dec() observe the positive value,
+		 * they must also observe all the text changes.
 		 */
 		atomic_set_release(&key->enabled, 1);
 	} else {
-		if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
-			jump_label_unlock();
+		/*
+		 * While holding the mutex this should never observe
+		 * anything else than a value >= 1 and succeed
+		 */
+		if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
 			return false;
-		}
 	}
-	jump_label_unlock();
 	return true;
 }
 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
  2024-06-10 12:46                 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
@ 2024-06-10 17:57                   ` Peter Zijlstra
  2024-06-10 18:00                     ` Thomas Gleixner
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2024-06-10 17:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:

> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>  
>  static bool static_key_slow_try_dec(struct static_key *key)
>  {
> +	int v;
>  
>  	/*
> +	 * Go into the slow path if key::enabled is less than or equal than
> +	 * one. One is valid to shut down the key, anything less than one
> +	 * is an imbalance, which is handled at the call site.
> +	 *
> +	 * That includes the special case of '-1' which is set in
> +	 * static_key_slow_inc_cpuslocked(), but that's harmless as it is
> +	 * fully serialized in the slow path below. By the time this task
> +	 * acquires the jump label lock the value is back to one and the
> +	 * retry under the lock must succeed.

Harmless yes, but it really should not happen to begin with. If this
happens it means someone wants to disable a key that is in the middle of
getting enabled for the first time.

I'm tempted to want a WARN here instead. Hmm?

>  	 */
> +	v = atomic_read(&key->enabled);
> +	do {
> +		if (v <= 1)
> +			return false;
> +	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
> +
>  	return true;
>  }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec()
  2024-06-10 17:57                   ` Peter Zijlstra
@ 2024-06-10 18:00                     ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2024-06-10 18:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7

On Mon, Jun 10 2024 at 19:57, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 02:46:36PM +0200, Thomas Gleixner wrote:
>
>> @@ -247,20 +247,25 @@ EXPORT_SYMBOL_GPL(static_key_disable);
>>  
>>  static bool static_key_slow_try_dec(struct static_key *key)
>>  {
>> +	int v;
>>  
>>  	/*
>> +	 * Go into the slow path if key::enabled is less than or equal than
>> +	 * one. One is valid to shut down the key, anything less than one
>> +	 * is an imbalance, which is handled at the call site.
>> +	 *
>> +	 * That includes the special case of '-1' which is set in
>> +	 * static_key_slow_inc_cpuslocked(), but that's harmless as it is
>> +	 * fully serialized in the slow path below. By the time this task
>> +	 * acquires the jump label lock the value is back to one and the
>> +	 * retry under the lock must succeed.
>
> Harmless yes, but it really should not happen to begin with. If this
> happens it means someone wants to disable a key that is in the middle of
> getting enabled for the first time.
>
> I'm tempted to want a WARN here instead. Hmm?

No strong opinion

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
  2024-06-10 12:46                 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
@ 2024-06-12 13:57                   ` Uros Bizjak
  2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Uros Bizjak @ 2024-06-12 13:57 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Steven Rostedt, Sam Sun, x86, syzkaller-bugs, xrivendell7


> Make the code more obvious and add proper comments to avoid future head
> scratching.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>   kernel/jump_label.c |   20 +++++++++++---------
>   1 file changed, 11 insertions(+), 9 deletions(-)
> 
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(stru
>   	if (static_key_fast_inc_not_disabled(key))
>   		return true;
>   
> -	jump_label_lock();
> -	if (atomic_read(&key->enabled) == 0) {
> -		atomic_set(&key->enabled, -1);
> +	guard(mutex)(&jump_label_mutex);
> +	/* Try to mark it as 'enabling in progress. */

Missing closing quotation mark above.

> +	if (!atomic_cmpxchg(&key->enabled, 0, -1)) {

This can be:

int tmp = 0;
if (atomic_try_cmpxchg(&key->enabled, &tmp, -1)) {
...

and will result in more optimal code and will IMO also be more readable 
because it is clear that the code is executed when cmpxchg succeeds.

(BTW: The atomic_read()/atomic_set() pair can also be converted to 
cmpxchg in static_key_enable_cpuslocked().)

Uros.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip: locking/core] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
  2024-06-10 12:46                 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
  2024-06-12 13:57                   ` Uros Bizjak
@ 2024-06-17 15:47                   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Gitweb:        https://git.kernel.org/tip/9bc2ff871f00437ad2f10c1eceff51aaa72b478f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Mon, 10 Jun 2024 14:46:39 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:24 +02:00

jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()

Make the code more obvious and add proper comments to avoid future head
scratching.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.548322963@linutronix.de
---
 kernel/jump_label.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4d06ec2..4ad5ed8 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -162,22 +162,24 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
 	if (static_key_fast_inc_not_disabled(key))
 		return true;
 
-	jump_label_lock();
-	if (atomic_read(&key->enabled) == 0) {
-		atomic_set(&key->enabled, -1);
+	guard(mutex)(&jump_label_mutex);
+	/* Try to mark it as 'enabling in progress. */
+	if (!atomic_cmpxchg(&key->enabled, 0, -1)) {
 		jump_label_update(key);
 		/*
-		 * Ensure that if the above cmpxchg loop observes our positive
-		 * value, it must also observe all the text changes.
+		 * Ensure that when static_key_fast_inc_not_disabled() or
+		 * static_key_slow_try_dec() observe the positive value,
+		 * they must also observe all the text changes.
 		 */
 		atomic_set_release(&key->enabled, 1);
 	} else {
-		if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
-			jump_label_unlock();
+		/*
+		 * While holding the mutex this should never observe
+		 * anything else than a value >= 1 and succeed
+		 */
+		if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key)))
 			return false;
-		}
 	}
-	jump_label_unlock();
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip: locking/core] jump_label: Clarify condition in static_key_fast_inc_not_disabled()
  2024-06-10 12:46                 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
@ 2024-06-17 15:47                   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     695ef796467ed228b60f1915995e390aea3d85c6
Gitweb:        https://git.kernel.org/tip/695ef796467ed228b60f1915995e390aea3d85c6
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Mon, 10 Jun 2024 14:46:37 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00

jump_label: Clarify condition in static_key_fast_inc_not_disabled()

The second part of

      if (v <= 0 || (v + 1) < 0)

is not immediately obvious that it acts as overflow protection.

Check explicitely for v == INT_MAX instead and add a proper comment how
this is used at the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.484973160@linutronix.de
---
 kernel/jump_label.c |  9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 1f05a19..4d06ec2 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -132,12 +132,15 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
 	/*
 	 * Negative key->enabled has a special meaning: it sends
 	 * static_key_slow_inc/dec() down the slow path, and it is non-zero
-	 * so it counts as "enabled" in jump_label_update().  Note that
-	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
+	 * so it counts as "enabled" in jump_label_update().
+	 *
+	 * The INT_MAX overflow condition is either used by the networking
+	 * code to reset or detected in the slow path of
+	 * static_key_slow_inc_cpuslocked().
 	 */
 	v = atomic_read(&key->enabled);
 	do {
-		if (v <= 0 || (v + 1) < 0)
+		if (v <= 0 || v == INT_MAX)
 			return false;
 	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip: locking/core] perf/x86: Serialize set_attr_rdpmc()
  2024-06-10 12:46                 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
@ 2024-06-17 15:47                   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Gitweb:        https://git.kernel.org/tip/bb9bb45f746b0f9457de9c3fc4da143a6351bdc9
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Mon, 10 Jun 2024 14:46:35 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:22 +02:00

perf/x86: Serialize set_attr_rdpmc()

Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():

CPU0                           CPU1

Assume: x86_pmu.attr_rdpmc == 0

if (val != x86_pmu.attr_rdpmc) {
  if (val == 0)
    ...
  else if (x86_pmu.attr_rdpmc == 0)
    static_branch_dec(&rdpmc_never_available_key);

				if (val != x86_pmu.attr_rdpmc) {
				   if (val == 0)
				      ...
				   else if (x86_pmu.attr_rdpmc == 0)
     FAIL, due to imbalance --->      static_branch_dec(&rdpmc_never_available_key);

The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().

Prevent this by serializing set_attr_rdpmc() locally.

Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
---
 arch/x86/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07..acd367c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2547,6 +2547,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 			      struct device_attribute *attr,
 			      const char *buf, size_t count)
 {
+	static DEFINE_MUTEX(rdpmc_mutex);
 	unsigned long val;
 	ssize_t ret;
 
@@ -2560,6 +2561,8 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 	if (x86_pmu.attr_rdpmc_broken)
 		return -ENOTSUPP;
 
+	guard(mutex)(&rdpmc_mutex);
+
 	if (val != x86_pmu.attr_rdpmc) {
 		/*
 		 * Changing into or out of never available or always available,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip: locking/core] jump_label: Fix concurrency issues in static_key_slow_dec()
  2024-06-10 12:46                 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
  2024-06-10 17:57                   ` Peter Zijlstra
@ 2024-06-17 15:47                   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-06-17 15:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yue Sun, Xingwei Lee, Thomas Gleixner, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Gitweb:        https://git.kernel.org/tip/83ab38ef0a0b2407d43af9575bb32333fdd74fb2
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Mon, 10 Jun 2024 14:46:36 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 11 Jun 2024 11:25:23 +02:00

jump_label: Fix concurrency issues in static_key_slow_dec()

The commit which tried to fix the concurrency issues of concurrent
static_key_slow_inc() failed to fix the equivalent issues
vs. static_key_slow_dec():

CPU0                     CPU1

static_key_slow_dec()
  static_key_slow_try_dec()

	key->enabled == 1
	val = atomic_fetch_add_unless(&key->enabled, -1, 1);
	if (val == 1)
	     return false;

  jump_label_lock();
  if (atomic_dec_and_test(&key->enabled)) {
     --> key->enabled == 0
   __jump_label_update()

			 static_key_slow_dec()
			   static_key_slow_try_dec()

			     key->enabled == 0
			     val = atomic_fetch_add_unless(&key->enabled, -1, 1);

			      --> key->enabled == -1 <- FAIL

There is another bug in that code, when there is a concurrent
static_key_slow_inc() which enables the key as that sets key->enabled to -1
so on the other CPU

	val = atomic_fetch_add_unless(&key->enabled, -1, 1);

will succeed and decrement to -2, which is invalid.

Cure all of this by replacing the atomic_fetch_add_unless() with a
atomic_try_cmpxchg() loop similar to static_key_fast_inc_not_disabled().

[peterz: add WARN_ON_ONCE for the -1 race]
Fixes: 4c5ea0a9cd02 ("locking/static_key: Fix concurrent static_key_slow_inc()")
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.422897838@linutronix.de
---
 kernel/jump_label.c | 45 ++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3218fa5..1f05a19 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -131,7 +131,7 @@ bool static_key_fast_inc_not_disabled(struct static_key *key)
 	STATIC_KEY_CHECK_USE(key);
 	/*
 	 * Negative key->enabled has a special meaning: it sends
-	 * static_key_slow_inc() down the slow path, and it is non-zero
+	 * static_key_slow_inc/dec() down the slow path, and it is non-zero
 	 * so it counts as "enabled" in jump_label_update().  Note that
 	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
 	 */
@@ -150,7 +150,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
 	lockdep_assert_cpus_held();
 
 	/*
-	 * Careful if we get concurrent static_key_slow_inc() calls;
+	 * Careful if we get concurrent static_key_slow_inc/dec() calls;
 	 * later calls must wait for the first one to _finish_ the
 	 * jump_label_update() process.  At the same time, however,
 	 * the jump_label_update() call below wants to see
@@ -247,20 +247,32 @@ EXPORT_SYMBOL_GPL(static_key_disable);
 
 static bool static_key_slow_try_dec(struct static_key *key)
 {
-	int val;
-
-	val = atomic_fetch_add_unless(&key->enabled, -1, 1);
-	if (val == 1)
-		return false;
+	int v;
 
 	/*
-	 * The negative count check is valid even when a negative
-	 * key->enabled is in use by static_key_slow_inc(); a
-	 * __static_key_slow_dec() before the first static_key_slow_inc()
-	 * returns is unbalanced, because all other static_key_slow_inc()
-	 * instances block while the update is in progress.
+	 * Go into the slow path if key::enabled is less than or equal than
+	 * one. One is valid to shut down the key, anything less than one
+	 * is an imbalance, which is handled at the call site.
+	 *
+	 * That includes the special case of '-1' which is set in
+	 * static_key_slow_inc_cpuslocked(), but that's harmless as it is
+	 * fully serialized in the slow path below. By the time this task
+	 * acquires the jump label lock the value is back to one and the
+	 * retry under the lock must succeed.
 	 */
-	WARN(val < 0, "jump label: negative count!\n");
+	v = atomic_read(&key->enabled);
+	do {
+		/*
+		 * Warn about the '-1' case though; since that means a
+		 * decrement is concurrent with a first (0->1) increment. IOW
+		 * people are trying to disable something that wasn't yet fully
+		 * enabled. This suggests an ordering problem on the user side.
+		 */
+		WARN_ON_ONCE(v < 0);
+		if (v <= 1)
+			return false;
+	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
+
 	return true;
 }
 
@@ -271,10 +283,11 @@ static void __static_key_slow_dec_cpuslocked(struct static_key *key)
 	if (static_key_slow_try_dec(key))
 		return;
 
-	jump_label_lock();
-	if (atomic_dec_and_test(&key->enabled))
+	guard(mutex)(&jump_label_mutex);
+	if (atomic_cmpxchg(&key->enabled, 1, 0))
 		jump_label_update(key);
-	jump_label_unlock();
+	else
+		WARN_ON_ONCE(!static_key_slow_try_dec(key));
 }
 
 static void __static_key_slow_dec(struct static_key *key)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-06-17 15:47 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-09  6:33 [Linux kernel bug] WARNING in static_key_slow_inc_cpuslocked Sam Sun
2024-06-09 13:04 ` Steven Rostedt
2024-06-09 14:06   ` Thomas Gleixner
2024-06-09 14:25     ` Steven Rostedt
2024-06-09 16:02       ` Thomas Gleixner
2024-06-09 16:56         ` Thomas Gleixner
2024-06-09 19:39           ` Thomas Gleixner
2024-06-10  6:46           ` Peter Zijlstra
2024-06-10 10:34             ` Thomas Gleixner
2024-06-10 12:46               ` [patch 0/4] perf/x86, jump_label: Cure serialization issues Thomas Gleixner
2024-06-10 12:46                 ` [patch 1/4] perf/x86: Serialize set_attr_rdpmc() Thomas Gleixner
2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46                 ` [patch 2/4] jump_label: Fix concurrency issues in static_key_slow_dec() Thomas Gleixner
2024-06-10 17:57                   ` Peter Zijlstra
2024-06-10 18:00                     ` Thomas Gleixner
2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46                 ` [patch 3/4] jump_label: Clarify condition in static_key_fast_inc_not_disabled() Thomas Gleixner
2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2024-06-10 12:46                 ` [patch 4/4] jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() Thomas Gleixner
2024-06-12 13:57                   ` Uros Bizjak
2024-06-17 15:47                   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox