public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/4] KCOV function entry/exit records
@ 2026-03-18 16:26 Jann Horn
  2026-03-18 16:26 ` [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch Jann Horn
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Jann Horn @ 2026-03-18 16:26 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko
  Cc: Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Jann Horn, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf

This series adds a KCOV feature that userspace can use to keep track of
the current call stack. When userspace enables the new mode
KCOV_TRACE_PC_EXT, collected instruction addresses are tagged with one
of three types:

 - function entry
 - non-entry basic block
 - function exit

This requires corresponding LLVM support, which was recently added in
LLVM commit:
https://github.com/llvm/llvm-project/commit/dc5c6d008f487eea8f5d646011f9b3dca6caebd7

A simple example of how to use KCOV_TRACE_PC_EXT:
```
user@vm:~/kcov/u$ cat kcov-u.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kcov.h>

#define COVER_SIZE (64 * 1024) /* arbitrary buffer size, in unsigned longs */

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

static void indent(int depth) {
  for (int i=0; i<depth; i++)
    printf("  ");
}

int main(void) {
  int fd = SYSCHK(open("/sys/kernel/debug/kcov", O_RDWR));
  SYSCHK(ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE));
  unsigned long *cover = (unsigned long*)SYSCHK(
      mmap(NULL, COVER_SIZE * sizeof(unsigned long), PROT_READ | PROT_WRITE,
           MAP_SHARED, fd, 0));
  SYSCHK(ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC_EXT));
  usleep(1000); // fault in stuff
  __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED); // start recording
  usleep(1000);
  unsigned long cover_num = __atomic_load_n(&cover[0], __ATOMIC_RELAXED); // end

  int depth = 0;
  for (unsigned long i = 0; i < cover_num; i++) {
    unsigned long record = cover[1+i];
    unsigned long pc = record | ~KCOV_RECORD_IP_MASK;
    switch (record & KCOV_RECORDFLAG_TYPEMASK) {
    case KCOV_RECORDFLAG_TYPE_NORMAL:
      indent(depth);
      printf("BB    0x%lx\n", pc);
      break;
    case KCOV_RECORDFLAG_TYPE_ENTRY:
      indent(depth);
      printf("ENTER 0x%lx\n", pc);
      depth++;
      break;
    case KCOV_RECORDFLAG_TYPE_EXIT:
      if (depth == 0)
        errx(1, "exit at depth 0");
      depth--;
      indent(depth);
      printf("EXIT  0x%lx\n", pc);
      break;
    default: errx(1, "unknown record type in 0x%016lx", record);
    }
  }
}
user@vm:~/kcov/u$ cat symbolize.py
#!/usr/bin/env python3

import sys
syms = []
with open('/proc/kallsyms') as f:
  for line in f:
    parts = line.strip().split(' ')
    if len(parts) < 3:
      continue
    syms.append((int(parts[0], 16), parts[2]))

for line in sys.stdin:
  parts = line.rstrip().split('0x')
  if len(parts) != 2:
    continue
  record_pc = int(parts[1], 16)
  for i in range(0, len(syms)-1):
    if syms[i+1][0] > record_pc:
      print(parts[0] + syms[i][1] + '+' + hex(record_pc - syms[i][0]))
      break
user@vm:~/kcov/u$ gcc -o kcov-u kcov-u.c -Wall
user@vm:~/kcov/u$ sudo ./kcov-u | sudo ./symbolize.py
ENTER __audit_syscall_entry+0x2c
  BB    __audit_syscall_entry+0xa4
  BB    __audit_syscall_entry+0xd2
  BB    __audit_syscall_entry+0x1ab
  ENTER ktime_get_coarse_real_ts64+0x1a
    BB    ktime_get_coarse_real_ts64+0x3f
    BB    ktime_get_coarse_real_ts64+0x96
  EXIT  ktime_get_coarse_real_ts64+0x9b
EXIT  __audit_syscall_entry+0x12b
ENTER __x64_sys_clock_nanosleep+0x18
  ENTER __se_sys_clock_nanosleep+0x33
    BB    __se_sys_clock_nanosleep+0x10e
    ENTER get_timespec64+0x29
      ENTER _copy_from_user+0x17
        BB    _copy_from_user+0x5d
      EXIT  _copy_from_user+0x62
      BB    get_timespec64+0xaf
    EXIT  get_timespec64+0xd5
    BB    __se_sys_clock_nanosleep+0x1c0
    ENTER common_nsleep+0x1f
      ENTER hrtimer_nanosleep+0x2f
        ENTER hrtimer_setup_sleeper_on_stack+0x20
          BB    hrtimer_setup_sleeper_on_stack+0x2a
          BB    hrtimer_setup_sleeper_on_stack+0x7e
        EXIT  hrtimer_setup_sleeper_on_stack+0x14c
        ENTER do_nanosleep+0x2d
          BB    do_nanosleep+0x3b
          ENTER hrtimer_start_range_ns+0x28
            BB    hrtimer_start_range_ns+0x67
            ENTER remove_hrtimer+0x22
              BB    remove_hrtimer+0x4b
            EXIT  remove_hrtimer+0x1ea
            BB    hrtimer_start_range_ns+0x173
            ENTER __hrtimer_cb_get_time+0x11
              BB    __hrtimer_cb_get_time+0x32
              ENTER ktime_get+0x17
                BB    ktime_get+0x33
                BB    ktime_get+0x58
                ENTER kvm_clock_get_cycles+0xc
                  BB    kvm_clock_get_cycles+0x48
                EXIT  kvm_clock_get_cycles+0x4d
                BB    ktime_get+0xb7
                BB    ktime_get+0x149
              EXIT  ktime_get+0x151
            EXIT  __hrtimer_cb_get_time+0x84
            BB    hrtimer_start_range_ns+0x3bc
            BB    hrtimer_start_range_ns+0x5a4
            ENTER enqueue_hrtimer+0x20
              BB    enqueue_hrtimer+0x2a
              BB    enqueue_hrtimer+0x5b
              ENTER timerqueue_add+0x1c
                BB    timerqueue_add+0x41
                BB    timerqueue_add+0xb2
                BB    timerqueue_add+0xb2
                BB    timerqueue_add+0xf9
              EXIT  timerqueue_add+0x150
            EXIT  enqueue_hrtimer+0xaf
            BB    hrtimer_start_range_ns+0x714
            ENTER hrtimer_reprogram+0x1b
              BB    hrtimer_reprogram+0x65
              BB    hrtimer_reprogram+0x13a
              BB    hrtimer_reprogram+0x1dc
              ENTER tick_program_event+0x25
                BB    tick_program_event+0x65
                ENTER clockevents_program_event+0x20
                  BB    clockevents_program_event+0x7e
                  ENTER ktime_get+0x17
                    BB    ktime_get+0x33
                    BB    ktime_get+0x58
                    ENTER kvm_clock_get_cycles+0xc
                      BB    kvm_clock_get_cycles+0x48
                    EXIT  kvm_clock_get_cycles+0x4d
                    BB    ktime_get+0xb7
                    BB    ktime_get+0x149
                  EXIT  ktime_get+0x151
                  BB    clockevents_program_event+0x219
                EXIT  clockevents_program_event+0x22b
              EXIT  tick_program_event+0x89
            EXIT  hrtimer_reprogram+0x211
          EXIT  hrtimer_start_range_ns+0x74d
          BB    do_nanosleep+0x9c
          ENTER sched_clock+0xc
            BB    sched_clock+0x40
          EXIT  sched_clock+0x45
          ENTER arch_scale_cpu_capacity+0x13
            BB    arch_scale_cpu_capacity+0x1a
          EXIT  arch_scale_cpu_capacity+0x24
          ENTER __cgroup_account_cputime+0x1b
            ENTER css_rstat_updated+0x2c
              BB    css_rstat_updated+0x77
              BB    css_rstat_updated+0xbe
            EXIT  css_rstat_updated+0x1bc
            BB    __cgroup_account_cputime+0x81
          EXIT  __cgroup_account_cputime+0x86
          ENTER sched_clock+0xc
            BB    sched_clock+0x40
          EXIT  sched_clock+0x45
          ENTER sched_clock+0xc
            BB    sched_clock+0x40
          EXIT  sched_clock+0x45
          ENTER __msecs_to_jiffies+0x13
            BB    __msecs_to_jiffies+0x25
          EXIT  __msecs_to_jiffies+0x4c
          ENTER prandom_u32_state+0x15
          EXIT  prandom_u32_state+0xbe
          ENTER hrtimer_try_to_cancel+0x1e
            BB    hrtimer_try_to_cancel+0x6a
            BB    hrtimer_try_to_cancel+0x1da
          EXIT  hrtimer_try_to_cancel+0x1be
          BB    do_nanosleep+0xbf
          BB    do_nanosleep+0x166
          BB    do_nanosleep+0x177
          BB    do_nanosleep+0x275
        EXIT  do_nanosleep+0x2d5
        BB    hrtimer_nanosleep+0x182
      EXIT  hrtimer_nanosleep+0x194
    EXIT  common_nsleep+0x77
  EXIT  __se_sys_clock_nanosleep+0x15d
EXIT  __x64_sys_clock_nanosleep+0x62
ENTER __audit_syscall_exit+0x1d
  BB    __audit_syscall_exit+0x5c
  ENTER audit_reset_context+0x1e
    BB    audit_reset_context+0x52
  EXIT  audit_reset_context+0x5f6
EXIT  __audit_syscall_exit+0x168
ENTER fpregs_assert_state_consistent+0x11
  BB    fpregs_assert_state_consistent+0x48
  BB    fpregs_assert_state_consistent+0xa6
EXIT  fpregs_assert_state_consistent+0xcc
ENTER switch_fpu_return+0xe
  ENTER fpregs_restore_userregs+0x12
    BB    fpregs_restore_userregs+0x4c
    BB    fpregs_restore_userregs+0xb8
  EXIT  fpregs_restore_userregs+0x107
EXIT  switch_fpu_return+0x18
```

Signed-off-by: Jann Horn <jannh@google.com>
---
Changes in v2:
- patch 2: change commit message (dvyukov)
- patch 2: add __always_inline (dvyukov)
- patch 2: add comment in __sanitizer_cov_trace_pc_entry
- replaced patch 3 with patches 3+4
  - store extended record format flag as part of kcov_mode (dvyukov)
  - clarify comment in __sanitizer_cov_trace_pc_exit (dvyukov)
- Link to v1: https://lore.kernel.org/r/20260311-kcov-extrecord-v1-0-68f03c4a05ad@google.com

---
Jann Horn (4):
      sched: Ensure matching stack state for kcov disable/enable on switch
      kcov: wire up compiler instrumentation for CONFIG_KCOV_EXT_RECORDS
      kcov: refactor mode check out of check_kcov_mode()
      kcov: introduce extended PC coverage collection mode

 include/linux/kcov.h      |  9 +++++
 include/uapi/linux/kcov.h | 12 ++++++
 kernel/kcov.c             | 94 ++++++++++++++++++++++++++++++++++++-----------
 kernel/sched/core.c       | 13 +++++--
 lib/Kconfig.debug         | 12 ++++++
 scripts/Makefile.kcov     |  2 +
 tools/objtool/check.c     |  2 +
 7 files changed, 120 insertions(+), 24 deletions(-)
---
base-commit: b29fb8829bff243512bb8c8908fd39406f9fd4c3
change-id: 20260311-kcov-extrecord-6e0d9a2b0a8c

--  
Jann Horn <jannh@google.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch
  2026-03-18 16:26 [PATCH v2 0/4] KCOV function entry/exit records Jann Horn
@ 2026-03-18 16:26 ` Jann Horn
  2026-03-20 22:10   ` Peter Zijlstra
  2026-03-18 16:26 ` [PATCH v2 2/4] kcov: wire up compiler instrumentation for CONFIG_KCOV_EXT_RECORDS Jann Horn
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Jann Horn @ 2026-03-18 16:26 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko
  Cc: Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Jann Horn, Ingo Molnar,
	Peter Zijlstra

Ensure that kcov is disabled and enabled with the same call stack.
This will be relied on by subsequent patches for recording function
entry/exit records via kcov.

This patch should not affect compilation of normal kernels without KCOV
(though it changes "inline" to "__always_inline").

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jann Horn <jannh@google.com>
---
 kernel/sched/core.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..c470f0a669ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5072,8 +5072,10 @@ static inline void kmap_local_sched_in(void)
  *
  * prepare_task_switch sets up locking and calls architecture specific
  * hooks.
+ *
+ * Must be inlined for kcov_prepare_switch().
  */
-static inline void
+static __always_inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 	__must_hold(__rq_lockp(rq))
@@ -5149,7 +5151,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	tick_nohz_task_switch();
 	finish_lock_switch(rq);
 	finish_arch_post_lock_switch();
-	kcov_finish_switch(current);
 	/*
 	 * kmap_local_sched_out() is invoked with rq::lock held and
 	 * interrupts disabled. There is no requirement for that, but the
@@ -5295,7 +5296,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	switch_to(prev, next, prev);
 	barrier();
 
-	return finish_task_switch(prev);
+	rq = finish_task_switch(prev);
+	/*
+	 * This has to happen outside finish_task_switch() to ensure that
+	 * entry/exit records are balanced.
+	 */
+	kcov_finish_switch(current);
+	return rq;
 }
 
 /*

-- 
2.53.0.851.ga537e3e6e9-goog



* [PATCH v2 2/4] kcov: wire up compiler instrumentation for CONFIG_KCOV_EXT_RECORDS
  2026-03-18 16:26 [PATCH v2 0/4] KCOV function entry/exit records Jann Horn
  2026-03-18 16:26 ` [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch Jann Horn
@ 2026-03-18 16:26 ` Jann Horn
  2026-03-18 16:27 ` [PATCH v2 3/4] kcov: refactor mode check out of check_kcov_mode() Jann Horn
  2026-03-18 16:27 ` [PATCH v2 4/4] kcov: introduce extended PC coverage collection mode Jann Horn
  3 siblings, 0 replies; 8+ messages in thread
From: Jann Horn @ 2026-03-18 16:26 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko
  Cc: Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Jann Horn, Josh Poimboeuf,
	Peter Zijlstra

This is the first half of CONFIG_KCOV_EXT_RECORDS.

Set the appropriate compiler flags to call separate hooks for function
entry/exit, and provide these hooks, but don't make it visible in the KCOV
UAPI yet.

With -fsanitize-coverage=trace-pc-entry-exit, the compiler behavior changes
as follows:

- The __sanitizer_cov_trace_pc() call on function entry is replaced with a
  call to __sanitizer_cov_trace_pc_entry(); so for now,
  __sanitizer_cov_trace_pc_entry() must be treated the same way as
  __sanitizer_cov_trace_pc().
- On function exit, an extra call to __sanitizer_cov_trace_pc_exit()
  happens; since function exit produced no coverage in the old UAPI,
  __sanitizer_cov_trace_pc_exit() should do nothing for now.

This feature was added to LLVM in commit:
https://github.com/llvm/llvm-project/commit/dc5c6d008f487eea8f5d646011f9b3dca6caebd7

Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
---
 include/linux/kcov.h  |  2 ++
 kernel/kcov.c         | 34 +++++++++++++++++++++++++++-------
 lib/Kconfig.debug     | 12 ++++++++++++
 scripts/Makefile.kcov |  2 ++
 tools/objtool/check.c |  2 ++
 5 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 0143358874b0..e5502d674029 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -81,6 +81,8 @@ typedef unsigned long long kcov_u64;
 #endif
 
 void __sanitizer_cov_trace_pc(void);
+void __sanitizer_cov_trace_pc_entry(void);
+void __sanitizer_cov_trace_pc_exit(void);
 void __sanitizer_cov_trace_cmp1(u8 arg1, u8 arg2);
 void __sanitizer_cov_trace_cmp2(u16 arg1, u16 arg2);
 void __sanitizer_cov_trace_cmp4(u32 arg1, u32 arg2);
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 0b369e88c7c9..86b681c7865c 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -202,15 +202,10 @@ static notrace unsigned long canonicalize_ip(unsigned long ip)
 	return ip;
 }
 
-/*
- * Entry point from instrumented code.
- * This is called once per basic-block/edge.
- */
-void notrace __sanitizer_cov_trace_pc(void)
+static __always_inline void notrace kcov_add_pc_record(unsigned long record)
 {
 	struct task_struct *t;
 	unsigned long *area;
-	unsigned long ip = canonicalize_ip(_RET_IP_);
 	unsigned long pos;
 
 	t = current;
@@ -230,11 +225,36 @@ void notrace __sanitizer_cov_trace_pc(void)
 		 */
 		WRITE_ONCE(area[0], pos);
 		barrier();
-		area[pos] = ip;
+		area[pos] = record;
 	}
 }
+
+/*
+ * Entry point from instrumented code.
+ * This is called once per basic-block/edge.
+ */
+void notrace __sanitizer_cov_trace_pc(void)
+{
+	kcov_add_pc_record(canonicalize_ip(_RET_IP_));
+}
 EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
 
+#ifdef CONFIG_KCOV_EXT_RECORDS
+void notrace __sanitizer_cov_trace_pc_entry(void)
+{
+	unsigned long record = canonicalize_ip(_RET_IP_);
+
+	/*
+	 * This hook replaces __sanitizer_cov_trace_pc() for the function entry
+	 * basic block; it should still emit a record even in classic kcov mode.
+	 */
+	kcov_add_pc_record(record);
+}
+void notrace __sanitizer_cov_trace_pc_exit(void)
+{
+}
+#endif
+
 #ifdef CONFIG_KCOV_ENABLE_COMPARISONS
 static void notrace write_comp_data(u64 type, u64 arg1, u64 arg2, u64 ip)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 93f356d2b3d9..58686a99c40a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2215,6 +2215,18 @@ config KCOV
 
 	  For more details, see Documentation/dev-tools/kcov.rst.
 
+config KCOV_EXT_RECORDS
+	bool "Support extended KCOV records with function entry/exit records"
+	depends on KCOV
+	depends on 64BIT
+	depends on $(cc-option,-fsanitize-coverage=trace-pc-entry-exit)
+	help
+	  Extended KCOV records allow distinguishing between multiple types of
+	  records: Normal edge coverage, function entry, and function exit.
+
+	  This will likely cause a small additional slowdown compared to normal
+	  KCOV.
+
 config KCOV_ENABLE_COMPARISONS
 	bool "Enable comparison operands collection by KCOV"
 	depends on KCOV
diff --git a/scripts/Makefile.kcov b/scripts/Makefile.kcov
index 78305a84ba9d..aa0be904268f 100644
--- a/scripts/Makefile.kcov
+++ b/scripts/Makefile.kcov
@@ -1,10 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
 kcov-flags-y					+= -fsanitize-coverage=trace-pc
+kcov-flags-$(CONFIG_KCOV_EXT_RECORDS)		+= -fsanitize-coverage=trace-pc-entry-exit
 kcov-flags-$(CONFIG_KCOV_ENABLE_COMPARISONS)	+= -fsanitize-coverage=trace-cmp
 
 kcov-rflags-y					+= -Cpasses=sancov-module
 kcov-rflags-y					+= -Cllvm-args=-sanitizer-coverage-level=3
 kcov-rflags-y					+= -Cllvm-args=-sanitizer-coverage-trace-pc
+kcov-rflags-$(CONFIG_KCOV_EXT_RECORDS)		+= -Cllvm-args=-sanitizer-coverage-trace-pc-entry-exit
 kcov-rflags-$(CONFIG_KCOV_ENABLE_COMPARISONS)	+= -Cllvm-args=-sanitizer-coverage-trace-compares
 
 export CFLAGS_KCOV := $(kcov-flags-y)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index a30379e4ff97..ae3127227621 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1251,6 +1251,8 @@ static const char *uaccess_safe_builtin[] = {
 	"write_comp_data",
 	"check_kcov_mode",
 	"__sanitizer_cov_trace_pc",
+	"__sanitizer_cov_trace_pc_entry",
+	"__sanitizer_cov_trace_pc_exit",
 	"__sanitizer_cov_trace_const_cmp1",
 	"__sanitizer_cov_trace_const_cmp2",
 	"__sanitizer_cov_trace_const_cmp4",

-- 
2.53.0.851.ga537e3e6e9-goog



* [PATCH v2 3/4] kcov: refactor mode check out of check_kcov_mode()
  2026-03-18 16:26 [PATCH v2 0/4] KCOV function entry/exit records Jann Horn
  2026-03-18 16:26 ` [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch Jann Horn
  2026-03-18 16:26 ` [PATCH v2 2/4] kcov: wire up compiler instrumentation for CONFIG_KCOV_EXT_RECORDS Jann Horn
@ 2026-03-18 16:27 ` Jann Horn
  2026-03-18 16:27 ` [PATCH v2 4/4] kcov: introduce extended PC coverage collection mode Jann Horn
  3 siblings, 0 replies; 8+ messages in thread
From: Jann Horn @ 2026-03-18 16:27 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko
  Cc: Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Jann Horn

The following patch will need to check t->kcov_mode in different ways at
different check_kcov_mode() call sites. In preparation for that, move the
mode check up the call hierarchy.

Signed-off-by: Jann Horn <jannh@google.com>
---
 kernel/kcov.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 86b681c7865c..7edb39e18bfe 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -171,10 +171,8 @@ static __always_inline bool in_softirq_really(void)
 	return in_serving_softirq() && !in_hardirq() && !in_nmi();
 }
 
-static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
+static notrace bool check_kcov_context(struct task_struct *t)
 {
-	unsigned int mode;
-
 	/*
 	 * We are interested in code coverage as a function of a syscall inputs,
 	 * so we ignore code executed in interrupts, unless we are in a remote
@@ -182,7 +180,6 @@ static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_stru
 	 */
 	if (!in_task() && !(in_softirq_really() && t->kcov_softirq))
 		return false;
-	mode = READ_ONCE(t->kcov_mode);
 	/*
 	 * There is some code that runs in interrupts but for which
 	 * in_interrupt() returns false (e.g. preempt_schedule_irq()).
@@ -191,7 +188,7 @@ static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_stru
 	 * kcov_start().
 	 */
 	barrier();
-	return mode == needed_mode;
+	return true;
 }
 
 static notrace unsigned long canonicalize_ip(unsigned long ip)
@@ -202,14 +199,12 @@ static notrace unsigned long canonicalize_ip(unsigned long ip)
 	return ip;
 }
 
-static __always_inline void notrace kcov_add_pc_record(unsigned long record)
+static __always_inline void notrace kcov_add_pc_record(struct task_struct *t, unsigned long record)
 {
-	struct task_struct *t;
 	unsigned long *area;
 	unsigned long pos;
 
-	t = current;
-	if (!check_kcov_mode(KCOV_MODE_TRACE_PC, t))
+	if (!check_kcov_context(t))
 		return;
 
 	area = t->kcov_area;
@@ -217,7 +212,7 @@ static __always_inline void notrace kcov_add_pc_record(unsigned long record)
 	pos = READ_ONCE(area[0]) + 1;
 	if (likely(pos < t->kcov_size)) {
 		/* Previously we write pc before updating pos. However, some
-		 * early interrupt code could bypass check_kcov_mode() check
+		 * early interrupt code could bypass check_kcov_context() check
 		 * and invoke __sanitizer_cov_trace_pc(). If such interrupt is
 		 * raised between writing pc and updating pos, the pc could be
 		 * overitten by the recursive __sanitizer_cov_trace_pc().
@@ -235,20 +230,28 @@ static __always_inline void notrace kcov_add_pc_record(unsigned long record)
  */
 void notrace __sanitizer_cov_trace_pc(void)
 {
-	kcov_add_pc_record(canonicalize_ip(_RET_IP_));
+	struct task_struct *cur = current;
+
+	if (READ_ONCE(cur->kcov_mode) != KCOV_MODE_TRACE_PC)
+		return;
+	kcov_add_pc_record(cur, canonicalize_ip(_RET_IP_));
 }
 EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
 
 #ifdef CONFIG_KCOV_EXT_RECORDS
 void notrace __sanitizer_cov_trace_pc_entry(void)
 {
+	struct task_struct *cur = current;
 	unsigned long record = canonicalize_ip(_RET_IP_);
+	unsigned int kcov_mode = READ_ONCE(cur->kcov_mode);
 
 	/*
 	 * This hook replaces __sanitizer_cov_trace_pc() for the function entry
 	 * basic block; it should still emit a record even in classic kcov mode.
 	 */
-	kcov_add_pc_record(record);
+	if (kcov_mode != KCOV_MODE_TRACE_PC)
+		return;
+	kcov_add_pc_record(cur, record);
 }
 void notrace __sanitizer_cov_trace_pc_exit(void)
 {
@@ -263,7 +266,7 @@ static void notrace write_comp_data(u64 type, u64 arg1, u64 arg2, u64 ip)
 	u64 count, start_index, end_pos, max_pos;
 
 	t = current;
-	if (!check_kcov_mode(KCOV_MODE_TRACE_CMP, t))
+	if (READ_ONCE(t->kcov_mode) != KCOV_MODE_TRACE_CMP || !check_kcov_context(t))
 		return;
 
 	ip = canonicalize_ip(ip);
@@ -383,7 +386,7 @@ static void kcov_start(struct task_struct *t, struct kcov *kcov,
 	t->kcov_size = size;
 	t->kcov_area = area;
 	t->kcov_sequence = sequence;
-	/* See comment in check_kcov_mode(). */
+	/* See comment in check_kcov_context(). */
 	barrier();
 	WRITE_ONCE(t->kcov_mode, mode);
 }

-- 
2.53.0.851.ga537e3e6e9-goog



* [PATCH v2 4/4] kcov: introduce extended PC coverage collection mode
  2026-03-18 16:26 [PATCH v2 0/4] KCOV function entry/exit records Jann Horn
                   ` (2 preceding siblings ...)
  2026-03-18 16:27 ` [PATCH v2 3/4] kcov: refactor mode check out of check_kcov_mode() Jann Horn
@ 2026-03-18 16:27 ` Jann Horn
  3 siblings, 0 replies; 8+ messages in thread
From: Jann Horn @ 2026-03-18 16:27 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko
  Cc: Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Jann Horn

This is the second half of CONFIG_KCOV_EXT_RECORDS.

Introduce a new KCOV mode KCOV_TRACE_PC_EXT which replaces the upper 8 bits
of recorded instruction pointers with metadata. For now, userspace can use
this metadata to distinguish three types of records:

 - function entry
 - function exit
 - normal basic block inside the function

Internally, this new mode is represented as a variant of
KCOV_MODE_TRACE_PC, distinguished with the flag KCOV_EXT_FORMAT.
Store this flag as part of the mode in task_struct::kcov_mode and in
kcov::mode to avoid having to pass it around separately everywhere.

Signed-off-by: Jann Horn <jannh@google.com>
---
 include/linux/kcov.h      |  7 +++++++
 include/uapi/linux/kcov.h | 12 ++++++++++++
 kernel/kcov.c             | 39 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index e5502d674029..455302b1cd1c 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -25,8 +25,15 @@ enum kcov_mode {
 	KCOV_MODE_REMOTE = 4,
 };
 
+/*
+ * Modifier for KCOV_MODE_TRACE_PC to record function entry/exit marked with
+ * metadata bits.
+ */
+#define KCOV_EXT_FORMAT	(1 << 29)
 #define KCOV_IN_CTXSW	(1 << 30)
 
+#define KCOV_MODE_TRACE_PC_EXT (KCOV_MODE_TRACE_PC | KCOV_EXT_FORMAT)
+
 void kcov_task_init(struct task_struct *t);
 void kcov_task_exit(struct task_struct *t);
 
diff --git a/include/uapi/linux/kcov.h b/include/uapi/linux/kcov.h
index ed95dba9fa37..8d8a233bd61f 100644
--- a/include/uapi/linux/kcov.h
+++ b/include/uapi/linux/kcov.h
@@ -35,8 +35,20 @@ enum {
 	KCOV_TRACE_PC = 0,
 	/* Collecting comparison operands mode. */
 	KCOV_TRACE_CMP = 1,
+	/*
+	 * Extended PC coverage collection mode.
+	 * In this mode, the top byte of the PC is replaced with flag bits
+	 * (KCOV_RECORDFLAG_*).
+	 */
+	KCOV_TRACE_PC_EXT = 2,
 };
 
+#define KCOV_RECORD_IP_MASK         0x00ffffffffffffff
+#define KCOV_RECORDFLAG_TYPEMASK    0xf000000000000000
+#define KCOV_RECORDFLAG_TYPE_NORMAL 0xf000000000000000
+#define KCOV_RECORDFLAG_TYPE_ENTRY  0x0000000000000000
+#define KCOV_RECORDFLAG_TYPE_EXIT   0x1000000000000000
+
 /*
  * The format for the types of collected comparisons.
  *
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 7edb39e18bfe..3cd4ee4cc310 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -55,7 +55,12 @@ struct kcov {
 	refcount_t		refcount;
 	/* The lock protects mode, size, area and t. */
 	spinlock_t		lock;
-	enum kcov_mode		mode __guarded_by(&lock);
+	/*
+	 * Mode, consists of:
+	 *  - enum kcov_mode
+	 *  - flag KCOV_EXT_FORMAT
+	 */
+	unsigned int		mode __guarded_by(&lock);
 	/* Size of arena (in long's). */
 	unsigned int		size __guarded_by(&lock);
 	/* Coverage buffer shared with user space. */
@@ -232,8 +237,14 @@ void notrace __sanitizer_cov_trace_pc(void)
 {
 	struct task_struct *cur = current;
 
-	if (READ_ONCE(cur->kcov_mode) != KCOV_MODE_TRACE_PC)
+	if ((READ_ONCE(cur->kcov_mode) & ~KCOV_EXT_FORMAT) != KCOV_MODE_TRACE_PC)
 		return;
+	/*
+	 * No bitops are needed here for setting the record type because
+	 * KCOV_RECORDFLAG_TYPE_NORMAL has the high bits set.
+	 * This relies on userspace not caring about the rest of the top byte
+	 * for KCOV_RECORDFLAG_TYPE_NORMAL records.
+	 */
 	kcov_add_pc_record(cur, canonicalize_ip(_RET_IP_));
 }
 EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
@@ -249,12 +260,28 @@ void notrace __sanitizer_cov_trace_pc_entry(void)
 	 * This hook replaces __sanitizer_cov_trace_pc() for the function entry
 	 * basic block; it should still emit a record even in classic kcov mode.
 	 */
-	if (kcov_mode != KCOV_MODE_TRACE_PC)
+	if ((kcov_mode & ~KCOV_EXT_FORMAT) != KCOV_MODE_TRACE_PC)
 		return;
+	if ((kcov_mode & KCOV_EXT_FORMAT) != 0)
+		record = (record & KCOV_RECORD_IP_MASK) | KCOV_RECORDFLAG_TYPE_ENTRY;
 	kcov_add_pc_record(cur, record);
 }
 void notrace __sanitizer_cov_trace_pc_exit(void)
 {
+	struct task_struct *cur = current;
+	unsigned long record;
+
+	/*
+	 * This hook is not called at the beginning of a basic block; the basic
+	 * block from which the hook was invoked is already covered by a
+	 * preceding hook call.
+	 * So unlike __sanitizer_cov_trace_pc_entry(), this PC should only be
+	 * reported in extended mode, where function exit events are recorded.
+	 */
+	if (READ_ONCE(cur->kcov_mode) != KCOV_MODE_TRACE_PC_EXT)
+		return;
+	record = (canonicalize_ip(_RET_IP_) & KCOV_RECORD_IP_MASK) | KCOV_RECORDFLAG_TYPE_EXIT;
+	kcov_add_pc_record(cur, record);
 }
 #endif
 
@@ -377,7 +404,7 @@ EXPORT_SYMBOL(__sanitizer_cov_trace_switch);
 #endif /* ifdef CONFIG_KCOV_ENABLE_COMPARISONS */
 
 static void kcov_start(struct task_struct *t, struct kcov *kcov,
-			unsigned int size, void *area, enum kcov_mode mode,
+			unsigned int size, void *area, unsigned int mode,
 			int sequence)
 {
 	kcov_debug("t = %px, size = %u, area = %px\n", t, size, area);
@@ -577,6 +604,8 @@ static int kcov_get_mode(unsigned long arg)
 #else
 		return -ENOTSUPP;
 #endif
+	else if (arg == KCOV_TRACE_PC_EXT)
+		return IS_ENABLED(CONFIG_KCOV_EXT_RECORDS) ? KCOV_MODE_TRACE_PC_EXT : -ENOTSUPP;
 	else
 		return -EINVAL;
 }
@@ -1089,7 +1118,7 @@ void kcov_remote_stop(void)
 	 * and kcov_remote_stop(), hence the sequence check.
 	 */
 	if (sequence == kcov->sequence && kcov->remote)
-		kcov_move_area(kcov->mode, kcov->area, kcov->size, area);
+		kcov_move_area(kcov->mode & ~KCOV_EXT_FORMAT, kcov->area, kcov->size, area);
 	spin_unlock(&kcov->lock);
 
 	if (in_task()) {

-- 
2.53.0.851.ga537e3e6e9-goog



* Re: [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch
  2026-03-18 16:26 ` [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch Jann Horn
@ 2026-03-20 22:10   ` Peter Zijlstra
  2026-03-24 15:47     ` Jann Horn
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-03-20 22:10 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Ingo Molnar

On Wed, Mar 18, 2026 at 05:26:58PM +0100, Jann Horn wrote:
> Ensure that kcov is disabled and enabled with the same call stack.
> This will be relied on by subsequent patches for recording function
> entry/exit records via kcov.
> 
> This patch should not affect compilation of normal kernels without KCOV
> (though it changes "inline" to "__always_inline").
> 
> To: Ingo Molnar <mingo@redhat.com>
> To: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  kernel/sched/core.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..c470f0a669ec 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5072,8 +5072,10 @@ static inline void kmap_local_sched_in(void)
>   *
>   * prepare_task_switch sets up locking and calls architecture specific
>   * hooks.
> + *
> + * Must be inlined for kcov_prepare_switch().
>   */
> -static inline void
> +static __always_inline void
>  prepare_task_switch(struct rq *rq, struct task_struct *prev,
>  		    struct task_struct *next)
>  	__must_hold(__rq_lockp(rq))
> @@ -5149,7 +5151,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	tick_nohz_task_switch();
>  	finish_lock_switch(rq);
>  	finish_arch_post_lock_switch();
> -	kcov_finish_switch(current);
>  	/*
>  	 * kmap_local_sched_out() is invoked with rq::lock held and
>  	 * interrupts disabled. There is no requirement for that, but the
> @@ -5295,7 +5296,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	switch_to(prev, next, prev);
>  	barrier();
>  
> -	return finish_task_switch(prev);
> +	rq = finish_task_switch(prev);
> +	/*
> +	 * This has to happen outside finish_task_switch() to ensure that
> +	 * entry/exit records are balanced.
> +	 */

That's not exactly right; the requirement is that kcov_prepare_switch()
and kcov_finish_switch() are called from the exact same frame.

The wording "outside finish_task_switch" above could describe anywhere,
and it doesn't cover the relation to kcov_prepare_switch().

> +	kcov_finish_switch(current);
> +	return rq;
>  }

That said; there was a patch that marked finish_task_switch() as
__always_inline too:

  https://lkml.kernel.org/r/20260301083520.110969-4-qq570070308@gmail.com

Except I think that does a little too much for one patch.

Anyway, I'm a little divided on this. Perhaps the simplest and most
obvious way is something like so.

But what about compiler funnies like the various IPA optimizations that
can do partial clones and whatnot? That could result in violating this
constraint, no?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6e509e292f99..d9925220d51b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5135,7 +5135,6 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 	__must_hold(__rq_lockp(rq))
 {
-	kcov_prepare_switch(prev);
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
@@ -5206,7 +5205,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	tick_nohz_task_switch();
 	finish_lock_switch(rq);
 	finish_arch_post_lock_switch();
-	kcov_finish_switch(current);
 	/*
 	 * kmap_local_sched_out() is invoked with rq::lock held and
 	 * interrupts disabled. There is no requirement for that, but the
@@ -5294,6 +5292,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	       struct task_struct *next, struct rq_flags *rf)
 	__releases(__rq_lockp(rq))
 {
+	kcov_prepare_switch(prev);
 	prepare_task_switch(rq, prev, next);
 
 	/*
@@ -5352,7 +5351,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	switch_to(prev, next, prev);
 	barrier();
 
-	return finish_task_switch(prev);
+	rq = finish_task_switch(prev);
+	/*
+	 * kcov_prepare_switch() above, and kcov_finish_switch() must be
+	 * called from the same stack frame.
+	 */
+	kcov_finish_switch(current);
+	return rq;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch
  2026-03-20 22:10   ` Peter Zijlstra
@ 2026-03-24 15:47     ` Jann Horn
  2026-03-24 17:18       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Jann Horn @ 2026-03-24 15:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Ingo Molnar

On Fri, Mar 20, 2026 at 11:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Mar 18, 2026 at 05:26:58PM +0100, Jann Horn wrote:
> > +     /*
> > +      * This has to happen outside finish_task_switch() to ensure that
> > +      * entry/exit records are balanced.
> > +      */
>
> That's not exactly right; the requirement is that kcov_prepare_switch()
> and kcov_finish_switch() are called from the exact same frame.
>
> The wording "outside finish_task_switch" above could describe anywhere,
> and it doesn't cover the relation to kcov_prepare_switch().
>
> > +     kcov_finish_switch(current);
> > +     return rq;
> >  }
>
> That said; there was a patch that marked finish_task_switch() as
> __always_inline too:
>
>   https://lkml.kernel.org/r/20260301083520.110969-4-qq570070308@gmail.com
>
> Except I think that does a little too much for one patch.
>
> Anyway, I'm a little divided on this. Perhaps the simplest and most
> obvious way is something like so.
>
> But what about compiler funnies like the various IPA optimizations that
> can do partial clones and whatnot? That could result in violating this
> constraint, no?

Ah, good point, bah. And commit 0ed557aa8139 ("sched/core / kcov:
avoid kcov_area during task switch") explains that we also can't just
keep emitting kcov records throughout a context switch.

So options I have are:

1. In the scheduler: Try to put enough magic annotations on __schedule
to prevent any such optimizations. (Probably "noinline __noclone"?)
But that's kind of invasive, I assume the scheduler wouldn't want to
have that unconditionally on a core function because of a niche KCOV
usecase.
2. In KCOV (without touching the scheduler code): Keep track of the
current and minimum stack depth while record generation is suppressed,
and emit those values when record generation is reenabled. That's a
bit more complexity but keeps it contained within KCOV, and I guess a
way to suppress events might be useful for other stuff too.

I guess I'll probably go with option 2...
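For the archive, a minimal userspace model of what option 2 means; all
names here (kcov_suppress_state, kcov_entry, kcov_exit and their fields)
are hypothetical illustrations, not from the posted series:

```c
#include <assert.h>

/*
 * Hypothetical sketch of option 2: while records are suppressed
 * (e.g. across a context switch), the entry/exit callbacks only
 * track a stack-depth delta instead of writing to kcov_area.
 * cur_depth is the net depth change since suppression started;
 * min_depth is the lowest point reached, i.e. how far the task
 * unwound below the suppression point. When record generation is
 * re-enabled, those two values would be emitted so userspace can
 * rebalance its view of the call stack.
 */
struct kcov_suppress_state {
	int suppressed;
	long cur_depth;
	long min_depth;
};

static void kcov_entry(struct kcov_suppress_state *s)
{
	if (s->suppressed) {
		s->cur_depth++;
		return;
	}
	/* ... emit a function-entry record into the coverage buffer ... */
}

static void kcov_exit(struct kcov_suppress_state *s)
{
	if (s->suppressed) {
		s->cur_depth--;
		if (s->cur_depth < s->min_depth)
			s->min_depth = s->cur_depth;
		return;
	}
	/* ... emit a function-exit record into the coverage buffer ... */
}
```

With that, e.g. leaving two frames and re-entering one while suppressed
ends with cur_depth == -1 and min_depth == -2, so userspace would pop
two frames and push one placeholder.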

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch
  2026-03-24 15:47     ` Jann Horn
@ 2026-03-24 17:18       ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2026-03-24 17:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, kasan-dev, llvm, Ingo Molnar

On Tue, Mar 24, 2026 at 04:47:51PM +0100, Jann Horn wrote:

> > Anyway, I'm a little divided on this. Perhaps the simplest and most
> > obvious way is something like so.
> >
> > But what about compiler funnies like the various IPA optimizations that
> > can do partial clones and whatnot? That could result in violating this
> > constraint, no?
> 
> Ah, good point, bah. And commit 0ed557aa8139 ("sched/core / kcov:
> avoid kcov_area during task switch") explains that we also can't just
> keep emitting kcov records throughout a context switch.

Oh, fun!

> So options I have are:
> 
> 1. In the scheduler: Try to put enough magic annotations on __schedule
> to prevent any such optimizations. (Probably "noinline __noclone"?)
> But that's kind of invasive, I assume the scheduler wouldn't want to
> have that unconditionally on a core function because of a niche KCOV
> usecase.

So I noticed that clang doesn't even support __noclone, which is a bit
of a stumbling block there. But while going through the various function
attributes I did note __flatten, and I think we can use that.

But yeah, without then inhibiting cloning and the like this isn't going
to help. I found that linkers can also do cloning (as part of LTO I
guess).
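To make the portability issue concrete, a hedged sketch (not from the
series) of what such annotations would have to look like to build on
both compilers; Clang accepts flatten but not noclone, hence the
feature test:

```c
/*
 * Illustrative only: probe for the "noclone" attribute, which GCC
 * implements and Clang does not, and fall back to nothing where it
 * is unavailable. Even with both attributes, cloning done at LTO
 * link time would remain a gap, as noted above.
 */
#if defined(__has_attribute)
# if __has_attribute(noclone)
#  define PIN_NOCLONE __attribute__((noclone))
# else
#  define PIN_NOCLONE
# endif
#else
# define PIN_NOCLONE
#endif

/* Keep this function out of line and, where supported, unclonable. */
static __attribute__((noinline)) PIN_NOCLONE int pinned_frame(int x)
{
	return x + 1;
}

static int add_one(int x)
{
	return x + 1;
}

/*
 * flatten goes the other way: calls made *from* the annotated
 * function are inlined into it where possible, collapsing callees
 * like add_one() into a single frame.
 */
static __attribute__((flatten)) int flattened_caller(int x)
{
	return add_one(x) * 2;
}
```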

> 2. In KCOV (without touching the scheduler code): Keep track of the
> current and minimum stack depth while record generation is suppressed,
> and emit those values when record generation is reenabled. That's a
> bit more complexity but keeps it contained within KCOV, and I guess a
> way to suppress events might be useful for other stuff too.
> 
> I guess I'll probably go with option 2...

Yes, I suppose not relying on the compiler doing things exactly so is
the more robust approach.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-24 17:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-18 16:26 [PATCH v2 0/4] KCOV function entry/exit records Jann Horn
2026-03-18 16:26 ` [PATCH v2 1/4] sched: Ensure matching stack state for kcov disable/enable on switch Jann Horn
2026-03-20 22:10   ` Peter Zijlstra
2026-03-24 15:47     ` Jann Horn
2026-03-24 17:18       ` Peter Zijlstra
2026-03-18 16:26 ` [PATCH v2 2/4] kcov: wire up compiler instrumentation for CONFIG_KCOV_EXT_RECORDS Jann Horn
2026-03-18 16:27 ` [PATCH v2 3/4] kcov: refactor mode check out of check_kcov_mode() Jann Horn
2026-03-18 16:27 ` [PATCH v2 4/4] kcov: introduce extended PC coverage collection mode Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox