linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86
@ 2025-04-04 13:14 Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 01/14] kasan: sw_tags: Use arithmetic shift for shadow computation Maciej Wieczor-Retman
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

Changes v3:
- Remove the runtime_const patch and setup a unified offset for both 5
  and 4 paging levels.
- Add a fix for inline mode on x86 tag-based KASAN. Add a handler for
  int3 that is generated on inline tag mismatches.
- Fix scripts/gdb/linux/kasan.py so the new signed mem_to_shadow() is
  reflected there.
- Fix Documentation/arch/arm64/kasan-offsets.sh to take new offsets into
  account.
- Made changes to the kasan_non_canonical_hook() according to upstream
  discussion.
- Remove patches 2 and 3 since they related to risc-v and this series
  adds only x86 related things.
- Reorder __tag_*() functions so they're before arch_kasan_*(). Remove
  CONFIG_KASAN condition from __tag_set().

Changes v2:
- Split the series into one adding KASAN tag-based mode (this one) and
  another one that adds the dense mode to KASAN (will post later).
- Removed exporting kasan_poison() and used a wrapper instead in
  kasan_init_64.c
- Prepended series with 4 patches from the risc-v series and applied
  review comments to the first patch as the rest already are reviewed.

======= Introduction
The patchset aims to add a KASAN tag-based mode for the x86 architecture
with the help of the new CPU feature called Linear Address Masking
(LAM). Main improvement introduced by the series is 2x lower memory
usage compared to KASAN's generic mode, the only currently available
mode on x86.

There are two relevant series in the process of adding KASAN tag-based
support to x86. This one focuses on implementing and enabling the
tag-based mode for the x86 architecture by using LAM. The second one
attempts to add a new memory saving mechanism called "dense mode" to the
non-arch part of the tag-based KASAN code. It can provide another 2x
memory savings by packing tags denser in the shadow memory.

======= How does KASAN' tag-based mode work?
When enabled, memory accesses and allocations are augmented by the
compiler during kernel compilation. Instrumentation functions are added
to each memory allocation and each pointer dereference.

The allocation related functions generate a random tag and save it in
two places: in shadow memory that maps to the allocated memory, and in
the top bits of the pointer that points to the allocated memory. Storing
the tag in the top of the pointer is possible because of Top-Byte Ignore
(TBI) on arm64 architecture and LAM on x86.

The access related functions are performing a comparison between the tag
stored in the pointer and the one stored in shadow memory. If the tags
don't match an out of bounds error must have occurred and so an error
report is generated.

The general idea for the tag-based mode is very well explained in the
series with the original implementation [1].

[1] https://lore.kernel.org/all/cover.1544099024.git.andreyknvl@google.com/

======= Differences summary compared to the arm64 tag-based mode
- Tag width:
	- Tag width influences the chance of a tag mismatch due to two
	  tags from different allocations having the same value. The
	  bigger the possible range of tag values the lower the chance
	  of that happening.
	- Shortening the tag width from 8 bits to 4, while it can help
	  with memory usage, it also increases the chance of not
	  reporting an error. 4 bit tags have a ~7% chance of a tag
	  mismatch.

- TBI and LAM
	- TBI in arm64 allows for storing metadata in the top 8 bits of
	  the virtual address.
	- LAM in x86 allows storing tags in bits [62:57] of the pointer.
	  To maximize memory savings the tag width is reduced to bits
	  [60:57].

======= Testing
Checked all the kunits for both software tags and generic KASAN after
making changes.

In generic mode the results were:

kasan: pass:59 fail:0 skip:13 total:72
Totals: pass:59 fail:0 skip:13 total:72
ok 1 kasan

and for software tags:

kasan: pass:63 fail:0 skip:9 total:72
Totals: pass:63 fail:0 skip:9 total:72
ok 1 kasan

======= Benchmarks
All tests were ran on a Sierra Forest server platform with 512GB of
memory. The only differences between the tests were kernel options:
	- CONFIG_KASAN
	- CONFIG_KASAN_GENERIC
	- CONFIG_KASAN_SW_TAGS
	- CONFIG_KASAN_INLINE [1]
	- CONFIG_KASAN_OUTLINE

More benchmarks are noted in the second series that adds the dense mode
to KASAN. That's because most values on x86' tag-based mode are tailored
to work well with that.

Boot time (until login prompt):
* 03:48 for clean kernel
* 08:02 / 09:45 for generic KASAN (inline/outline)
* 08:50 for tag-based KASAN
* 04:50 for tag-based KASAN with stacktrace disabled [1]

Compilation time comparison (10 cores):
* 7:27 for clean kernel
* 8:21/7:44 for generic KASAN (inline/outline)
* 7:41 for tag-based KASAN

[1] Currently (after getting it enabled in the Makefile) inline mode
doesn't work on x86. It's probably due to something missing in the
compiler and I aim to figure this out when working on the second series
that adds the dense mode (and will need compiler support anyway).

[2] Memory allocation and freeing performance suffers heavily from saving
stacktraces that can be later displayed in error reports.

======= Compilation
Clang was used to compile the series (make LLVM=1) since gcc doesn't
seem to have support for KASAN tag-based compiler instrumentation on
x86.

======= Dependencies
The base branch for the series is the tip x86/mm branch.

======= Enabling LAM for testing the series without LASS
Since LASS is needed for LAM and it can't be compiled without it I
enabled LAM during testing with the patch below:

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2275,7 +2275,7 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 config ADDRESS_MASKING
 	bool "Linear Address Masking support"
 	depends on X86_64
-	depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
+	#depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
 	help
 	  Linear Address Masking (LAM) modifies the checking that is applied
 	  to 64-bit linear addresses, allowing software to use of the

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2401,9 +2401,10 @@ void __init arch_cpu_finalize_init(void)

 		/*
 		 * Enable this when LAM is gated on LASS support
+		 */
 		if (cpu_feature_enabled(X86_FEATURE_LAM))
 			USER_PTR_MAX = (1ul << 63) - PAGE_SIZE;
-		 */
+
 		runtime_const_init(ptr, USER_PTR_MAX);

 		/*

Maciej Wieczor-Retman (12):
  x86: Add arch specific kasan functions
  kasan: arm64: x86: Make special tags arch specific
  x86: Reset tag for virtual to physical address conversions
  x86: Physical address comparisons in fill_p*d/pte
  x86: KASAN raw shadow memory PTE init
  x86: LAM initialization
  x86: Minimal SLAB alignment
  x86: Update the KASAN non-canonical hook
  x86: Handle int3 for inline KASAN reports
  kasan: Fix inline mode for x86 tag-based mode
  mm: Unpoison pcpu chunks with base address tag
  x86: Make software tag-based kasan available

Samuel Holland (2):
  kasan: sw_tags: Use arithmetic shift for shadow computation
  kasan: sw_tags: Support tag widths less than 8 bits

 Documentation/arch/arm64/kasan-offsets.sh |  8 ++-
 Documentation/arch/x86/x86_64/mm.rst      |  6 +-
 MAINTAINERS                               |  2 +-
 arch/arm64/Kconfig                        | 10 ++--
 arch/arm64/include/asm/kasan-tags.h       |  9 +++
 arch/arm64/include/asm/kasan.h            |  6 +-
 arch/arm64/include/asm/memory.h           | 14 ++++-
 arch/arm64/include/asm/uaccess.h          |  1 +
 arch/arm64/mm/kasan_init.c                |  7 ++-
 arch/x86/Kconfig                          |  5 +-
 arch/x86/boot/compressed/misc.h           |  1 +
 arch/x86/include/asm/cache.h              |  4 ++
 arch/x86/include/asm/kasan-tags.h         |  9 +++
 arch/x86/include/asm/kasan.h              | 41 ++++++++++++-
 arch/x86/include/asm/page.h               | 17 ++++--
 arch/x86/include/asm/page_64.h            |  2 +-
 arch/x86/kernel/alternative.c             |  3 +
 arch/x86/kernel/head_64.S                 |  3 +
 arch/x86/kernel/setup.c                   |  2 +
 arch/x86/kernel/traps.c                   | 52 +++++++++++++++++
 arch/x86/mm/fault.c                       |  2 +
 arch/x86/mm/init.c                        |  3 +
 arch/x86/mm/init_64.c                     | 11 ++--
 arch/x86/mm/kasan_init_64.c               | 21 +++++--
 arch/x86/mm/physaddr.c                    |  1 +
 include/linux/kasan-tags.h                | 19 ++++--
 include/linux/kasan.h                     | 24 +++++++-
 include/linux/mm.h                        |  6 +-
 include/linux/page-flags-layout.h         |  7 +--
 lib/Kconfig.kasan                         |  3 +-
 mm/kasan/report.c                         | 70 +++++++++++++++++++++--
 mm/kasan/shadow.c                         | 11 ++++
 mm/vmalloc.c                              |  3 +-
 scripts/Makefile.kasan                    |  3 +
 scripts/gdb/linux/kasan.py                |  3 +
 scripts/gdb/linux/mm.py                   |  5 +-
 36 files changed, 336 insertions(+), 58 deletions(-)
 mode change 100644 => 100755 Documentation/arch/arm64/kasan-offsets.sh
 create mode 100644 arch/arm64/include/asm/kasan-tags.h
 create mode 100644 arch/x86/include/asm/kasan-tags.h

-- 
2.49.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 01/14] kasan: sw_tags: Use arithmetic shift for shadow computation
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 02/14] kasan: sw_tags: Support tag widths less than 8 bits Maciej Wieczor-Retman
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

From: Samuel Holland <samuel.holland@sifive.com>

Currently, kasan_mem_to_shadow() uses a logical right shift, which turns
canonical kernel addresses into non-canonical addresses by clearing the
high KASAN_SHADOW_SCALE_SHIFT bits. The value of KASAN_SHADOW_OFFSET is
then chosen so that the addition results in a canonical address for the
shadow memory.

For KASAN_GENERIC, this shift/add combination is ABI with the compiler,
because KASAN_SHADOW_OFFSET is used in compiler-generated inline tag
checks[1], which must only attempt to dereference canonical addresses.

However, for KASAN_SW_TAGS we have some freedom to change the algorithm
without breaking the ABI. Because TBI is enabled for kernel addresses,
the top bits of shadow memory addresses computed during tag checks are
irrelevant, and so likewise are the top bits of KASAN_SHADOW_OFFSET.
This is demonstrated by the fact that LLVM uses a logical right shift
in the tag check fast path[2] but a sbfx (signed bitfield extract)
instruction in the slow path[3] without causing any issues.

Using an arithmetic shift in kasan_mem_to_shadow() provides a number of
benefits:

1) The memory layout doesn't change but is easier to understand.
KASAN_SHADOW_OFFSET becomes a canonical memory address, and the shifted
pointer becomes a negative offset, so KASAN_SHADOW_OFFSET ==
KASAN_SHADOW_END regardless of the shift amount or the size of the
virtual address space.

2) KASAN_SHADOW_OFFSET becomes a simpler constant, requiring only one
instruction to load instead of two. Since it must be loaded in each
function with a tag check, this decreases kernel text size by 0.5%.

3) This shift and the sign extension from kasan_reset_tag() can be
combined into a single sbfx instruction. When this same algorithm change
is applied to the compiler, it removes an instruction from each inline
tag check, further reducing kernel text size by an additional 4.6%.

These benefits extend to other architectures as well. On RISC-V, where
the baseline ISA does not shifted addition or have an equivalent to the
sbfx instruction, loading KASAN_SHADOW_OFFSET is reduced from 3 to 2
instructions, and kasan_mem_to_shadow(kasan_reset_tag(addr)) similarly
combines two consecutive right shifts.

Make changes to kasan_non_canonical_hook() - specifically the first part
that tries to deduce if a faulty address came from
kasan_mem_to_shadow().

Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1316 [1]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#L895 [2]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L669 [3]
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3: (Maciej)
- Fix scripts/gdb/linux/kasan.py so the new signed mem_to_shadow() is
  reflected there.
- Fix Documentation/arch/arm64/kasan-offsets.sh to take new offsets into
  account.
- Made changes to the kasan_non_canonical_hook() according to upstream
  discussion. Moved x86 part to separate patch later.

Changelog v2: (Maciej)
- Correct address range that's checked in kasan_non_canonical_hook().
  Adjust the comment inside.
- Remove part of comment from arch/arm64/include/asm/memory.h.
- Append patch message paragraph about the overflow in
  kasan_non_canonical_hook().

 Documentation/arch/arm64/kasan-offsets.sh |  8 +++--
 arch/arm64/Kconfig                        | 10 +++----
 arch/arm64/include/asm/memory.h           | 14 ++++++++-
 arch/arm64/mm/kasan_init.c                |  7 +++--
 include/linux/kasan.h                     | 10 +++++--
 mm/kasan/report.c                         | 36 ++++++++++++++++++++---
 scripts/gdb/linux/kasan.py                |  3 ++
 scripts/gdb/linux/mm.py                   |  5 ++--
 8 files changed, 75 insertions(+), 18 deletions(-)
 mode change 100644 => 100755 Documentation/arch/arm64/kasan-offsets.sh

diff --git a/Documentation/arch/arm64/kasan-offsets.sh b/Documentation/arch/arm64/kasan-offsets.sh
old mode 100644
new mode 100755
index 2dc5f9e18039..ce777c7c7804
--- a/Documentation/arch/arm64/kasan-offsets.sh
+++ b/Documentation/arch/arm64/kasan-offsets.sh
@@ -5,8 +5,12 @@
 
 print_kasan_offset () {
 	printf "%02d\t" $1
-	printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
-			- (1 << (64 - 32 - $2)) ))
+	if [[ $2 -ne 4 ]] then
+		printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
+				- (1 << (64 - 32 - $2)) ))
+	else
+		printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) ))
+	fi
 }
 
 echo KASAN_SHADOW_SCALE_SHIFT = 3
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 62dc903ecc7f..10e215d6e435 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -428,11 +428,11 @@ config KASAN_SHADOW_OFFSET
 	default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
 	default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
 	default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
-	default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
-	default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
-	default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
-	default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
-	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
+	default 0xffff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
+	default 0xffffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
+	default 0xfffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
+	default 0xffffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
+	default 0xfffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
 	default 0xffffffffffffffff
 
 config UNWIND_TABLES
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 717829df294e..e71cdf036287 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -89,7 +89,15 @@
  *
  * KASAN_SHADOW_END is defined first as the shadow address that corresponds to
  * the upper bound of possible virtual kernel memory addresses UL(1) << 64
- * according to the mapping formula.
+ * according to the mapping formula. For Generic KASAN, the address in the
+ * mapping formula is treated as unsigned (part of the compiler's ABI), so the
+ * end of the shadow memory region is at a large positive offset from
+ * KASAN_SHADOW_OFFSET. For Software Tag-Based KASAN, the address in the
+ * formula is treated as signed. Since all kernel addresses are negative, they
+ * map to shadow memory below KASAN_SHADOW_OFFSET, making KASAN_SHADOW_OFFSET
+ * itself the end of the shadow memory region. (User pointers are positive and
+ * would map to shadow memory above KASAN_SHADOW_OFFSET, but shadow memory is
+ * not allocated for them.)
  *
  * KASAN_SHADOW_START is defined second based on KASAN_SHADOW_END. The shadow
  * memory start must map to the lowest possible kernel virtual memory address
@@ -100,7 +108,11 @@
  */
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
+#ifdef CONFIG_KASAN_GENERIC
 #define KASAN_SHADOW_END	((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) + KASAN_SHADOW_OFFSET)
+#else
+#define KASAN_SHADOW_END	KASAN_SHADOW_OFFSET
+#endif
 #define _KASAN_SHADOW_START(va)	(KASAN_SHADOW_END - (UL(1) << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
 #define KASAN_SHADOW_START	_KASAN_SHADOW_START(vabits_actual)
 #define PAGE_END		KASAN_SHADOW_START
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index d541ce45daeb..dc2de12c4f26 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -198,8 +198,11 @@ static bool __init root_level_aligned(u64 addr)
 /* The early shadow maps everything to a single page of zeroes */
 asmlinkage void __init kasan_early_init(void)
 {
-	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
-		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+			KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+	else
+		BUILD_BUG_ON(KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
 	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
 	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
 	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 890011071f2b..b396feca714f 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -61,8 +61,14 @@ int kasan_populate_early_shadow(const void *shadow_start,
 #ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
-	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
-		+ KASAN_SHADOW_OFFSET;
+	void *scaled;
+
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		scaled = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+	else
+		scaled = (void *)((long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+
+	return KASAN_SHADOW_OFFSET + scaled;
 }
 #endif
 
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 8357e1a33699..f24f11cc644a 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -681,11 +681,39 @@ void kasan_non_canonical_hook(unsigned long addr)
 	const char *bug_type;
 
 	/*
-	 * All addresses that came as a result of the memory-to-shadow mapping
-	 * (even for bogus pointers) must be >= KASAN_SHADOW_OFFSET.
+	 * For Generic KASAN, kasan_mem_to_shadow() uses the logical right shift
+	 * and never overflows with the chosen KASAN_SHADOW_OFFSET values (on
+	 * both x86 and arm64). Thus, the possible shadow addresses (even for
+	 * bogus pointers) belong to a single contiguous region that is the
+	 * result of kasan_mem_to_shadow() applied to the whole address space.
 	 */
-	if (addr < KASAN_SHADOW_OFFSET)
-		return;
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
+		if (addr < (u64)kasan_mem_to_shadow((void *)(0UL)) ||
+		    addr > (u64)kasan_mem_to_shadow((void *)(~0UL)))
+			return;
+	}
+
+	/*
+	 * For Software Tag-Based KASAN, kasan_mem_to_shadow() uses the
+	 * arithmetic shift. Normally, this would make checking for a possible
+	 * shadow address complicated, as the shadow address computation
+	 * operation would overflow only for some memory addresses. However, due
+	 * to the chosen KASAN_SHADOW_OFFSET values and the fact the
+	 * kasan_mem_to_shadow() only operates on pointers with the tag reset,
+	 * the overflow always happens.
+	 *
+	 * For arm64, the top byte of the pointer gets reset to 0xFF. Thus, the
+	 * possible shadow addresses belong to a region that is the result of
+	 * kasan_mem_to_shadow() applied to the memory range
+	 * [0xFF000000000000, 0xFFFFFFFFFFFFFFFF]. Despite the overflow, the
+	 * resulting possible shadow region is contiguous, as the overflow
+	 * happens for both 0xFF000000000000 and 0xFFFFFFFFFFFFFFFF.
+	 */
+	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS) && IS_ENABLED(CONFIG_ARM64)) {
+		if (addr < (u64)kasan_mem_to_shadow((void *)(0xFFUL << 56)) ||
+		    addr > (u64)kasan_mem_to_shadow((void *)(~0UL)))
+			return;
+	}
 
 	orig_addr = (unsigned long)kasan_shadow_to_mem((void *)addr);
 
diff --git a/scripts/gdb/linux/kasan.py b/scripts/gdb/linux/kasan.py
index 56730b3fde0b..fca39968d308 100644
--- a/scripts/gdb/linux/kasan.py
+++ b/scripts/gdb/linux/kasan.py
@@ -8,6 +8,7 @@
 
 import gdb
 from linux import constants, mm
+from ctypes import c_int64 as s64
 
 def help():
     t = """Usage: lx-kasan_mem_to_shadow [Hex memory addr]
@@ -39,6 +40,8 @@ class KasanMemToShadow(gdb.Command):
         else:
             help()
     def kasan_mem_to_shadow(self, addr):
+        if constants.CONFIG_KASAN_SW_TAGS:
+            addr = s64(addr)
         return (addr >> self.p_ops.KASAN_SHADOW_SCALE_SHIFT) + self.p_ops.KASAN_SHADOW_OFFSET
 
 KasanMemToShadow()
diff --git a/scripts/gdb/linux/mm.py b/scripts/gdb/linux/mm.py
index 7571aebbe650..2e63f3dedd53 100644
--- a/scripts/gdb/linux/mm.py
+++ b/scripts/gdb/linux/mm.py
@@ -110,12 +110,13 @@ class aarch64_page_ops():
         self.KERNEL_END = gdb.parse_and_eval("_end")
 
         if constants.LX_CONFIG_KASAN_GENERIC or constants.LX_CONFIG_KASAN_SW_TAGS:
+            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
             if constants.LX_CONFIG_KASAN_GENERIC:
                 self.KASAN_SHADOW_SCALE_SHIFT = 3
+                self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
             else:
                 self.KASAN_SHADOW_SCALE_SHIFT = 4
-            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
-            self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
+                self.KASAN_SHADOW_END = self.KASAN_SHADOW_OFFSET
             self.PAGE_END = self.KASAN_SHADOW_END - (1 << (self.vabits_actual - self.KASAN_SHADOW_SCALE_SHIFT))
         else:
             self.PAGE_END = self._PAGE_END(self.VA_BITS_MIN)
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 02/14] kasan: sw_tags: Support tag widths less than 8 bits
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 01/14] kasan: sw_tags: Use arithmetic shift for shadow computation Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 03/14] x86: Add arch specific kasan functions Maciej Wieczor-Retman
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

From: Samuel Holland <samuel.holland@sifive.com>

Allow architectures to override KASAN_TAG_KERNEL in asm/kasan.h. This is
needed on x86 and RISC-V, which support different tag widths. For
consistency, move the arm64 MTE definition of KASAN_TAG_MIN to
asm/kasan.h, since it is also architecture-dependent.

Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
 arch/arm64/include/asm/kasan.h   |  6 ++++--
 arch/arm64/include/asm/uaccess.h |  1 +
 include/linux/kasan-tags.h       | 13 ++++++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index e1b57c13f8a4..4ab419df8b93 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -6,8 +6,10 @@
 
 #include <linux/linkage.h>
 #include <asm/memory.h>
-#include <asm/mte-kasan.h>
-#include <asm/pgtable-types.h>
+
+#ifdef CONFIG_KASAN_HW_TAGS
+#define KASAN_TAG_MIN			0xF0 /* minimum value for random tags */
+#endif
 
 #define arch_kasan_set_tag(addr, tag)	__tag_set(addr, tag)
 #define arch_kasan_reset_tag(addr)	__tag_reset(addr)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5b91803201ef..f890dadc7b4e 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -22,6 +22,7 @@
 #include <asm/cpufeature.h>
 #include <asm/mmu.h>
 #include <asm/mte.h>
+#include <asm/mte-kasan.h>
 #include <asm/ptrace.h>
 #include <asm/memory.h>
 #include <asm/extable.h>
diff --git a/include/linux/kasan-tags.h b/include/linux/kasan-tags.h
index 4f85f562512c..e07c896f95d3 100644
--- a/include/linux/kasan-tags.h
+++ b/include/linux/kasan-tags.h
@@ -2,13 +2,16 @@
 #ifndef _LINUX_KASAN_TAGS_H
 #define _LINUX_KASAN_TAGS_H
 
+#include <asm/kasan.h>
+
+#ifndef KASAN_TAG_KERNEL
 #define KASAN_TAG_KERNEL	0xFF /* native kernel pointers tag */
-#define KASAN_TAG_INVALID	0xFE /* inaccessible memory tag */
-#define KASAN_TAG_MAX		0xFD /* maximum value for random tags */
+#endif
+
+#define KASAN_TAG_INVALID	(KASAN_TAG_KERNEL - 1) /* inaccessible memory tag */
+#define KASAN_TAG_MAX		(KASAN_TAG_KERNEL - 2) /* maximum value for random tags */
 
-#ifdef CONFIG_KASAN_HW_TAGS
-#define KASAN_TAG_MIN		0xF0 /* minimum value for random tags */
-#else
+#ifndef KASAN_TAG_MIN
 #define KASAN_TAG_MIN		0x00 /* minimum value for random tags */
 #endif
 
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 03/14] x86: Add arch specific kasan functions
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 01/14] kasan: sw_tags: Use arithmetic shift for shadow computation Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 02/14] kasan: sw_tags: Support tag widths less than 8 bits Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 16:06   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 04/14] kasan: arm64: x86: Make special tags arch specific Maciej Wieczor-Retman
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

KASAN's software tag-based mode needs multiple macros/functions to
handle tag and pointer interactions - mainly to set and retrieve tags
from the top bits of a pointer.

Mimic functions currently used by arm64 but change the tag's position to
bits [60:57] in the pointer.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Reorder functions so that __tag_*() etc are above the
  arch_kasan_*() ones.
- Remove CONFIG_KASAN condition from __tag_set()

 arch/x86/include/asm/kasan.h | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
index d7e33c7f096b..212218622963 100644
--- a/arch/x86/include/asm/kasan.h
+++ b/arch/x86/include/asm/kasan.h
@@ -3,6 +3,8 @@
 #define _ASM_X86_KASAN_H
 
 #include <linux/const.h>
+#include <linux/kasan-tags.h>
+#include <linux/types.h>
 #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
 #define KASAN_SHADOW_SCALE_SHIFT 3
 
@@ -24,8 +26,33 @@
 						  KASAN_SHADOW_SCALE_SHIFT)))
 
 #ifndef __ASSEMBLER__
+#include <linux/bitops.h>
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+
+#ifdef CONFIG_KASAN_SW_TAGS
+
+#define __tag_shifted(tag)		FIELD_PREP(GENMASK_ULL(60, 57), tag)
+#define __tag_reset(addr)		(sign_extend64((u64)(addr), 56))
+#define __tag_get(addr)			((u8)FIELD_GET(GENMASK_ULL(60, 57), (u64)addr))
+#else
+#define __tag_shifted(tag)		0UL
+#define __tag_reset(addr)		(addr)
+#define __tag_get(addr)			0
+#endif /* CONFIG_KASAN_SW_TAGS */
+
+static inline const void *__tag_set(const void *addr, u8 tag)
+{
+	u64 __addr = (u64)addr & ~__tag_shifted(KASAN_TAG_KERNEL);
+	return (const void *)(__addr | __tag_shifted(tag));
+}
+
+#define arch_kasan_set_tag(addr, tag)	__tag_set(addr, tag)
+#define arch_kasan_reset_tag(addr)	__tag_reset(addr)
+#define arch_kasan_get_tag(addr)	__tag_get(addr)
 
 #ifdef CONFIG_KASAN
+
 void __init kasan_early_init(void);
 void __init kasan_init(void);
 void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid);
@@ -34,8 +61,9 @@ static inline void kasan_early_init(void) { }
 static inline void kasan_init(void) { }
 static inline void kasan_populate_shadow_for_vaddr(void *va, size_t size,
 						   int nid) { }
-#endif
 
-#endif
+#endif /* CONFIG_KASAN */
+
+#endif /* __ASSEMBLER__ */
 
 #endif
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 04/14] kasan: arm64: x86: Make special tags arch specific
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (2 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 03/14] x86: Add arch specific kasan functions Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions Maciej Wieczor-Retman
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

KASAN's tag-based mode defines multiple special tag values. They're
reserved for:
- Native kernel value. On arm64 it's 0xFF and it causes an early return
  in the tag checking function.
- Invalid value. 0xFE marks an area as freed / unallocated. It's also
  the value that is used to initialize regions of shadow memory.
- Max value. 0xFD is the highest value that can be randomly generated
  for a new tag.

Metadata macro is also defined:
- Tag width equal to 8.

Tag-based mode on x86 is going to use 4 bit wide tags so all the above
values need to be changed accordingly.

Make native kernel tag arch specific for x86 and arm64.

Replace hardcoded kernel tag value and tag width with macros in KASAN's
non-arch specific code.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v2:
- Remove risc-v from the patch.

 MAINTAINERS                         | 2 +-
 arch/arm64/include/asm/kasan-tags.h | 9 +++++++++
 arch/x86/include/asm/kasan-tags.h   | 9 +++++++++
 include/linux/kasan-tags.h          | 8 +++++++-
 include/linux/kasan.h               | 4 +++-
 include/linux/mm.h                  | 6 +++---
 include/linux/page-flags-layout.h   | 7 +------
 7 files changed, 33 insertions(+), 12 deletions(-)
 create mode 100644 arch/arm64/include/asm/kasan-tags.h
 create mode 100644 arch/x86/include/asm/kasan-tags.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d5dfb9186962..e6c0a6fff9f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12728,7 +12728,7 @@ L:	kasan-dev@googlegroups.com
 S:	Maintained
 B:	https://bugzilla.kernel.org/buglist.cgi?component=Sanitizers&product=Memory%20Management
 F:	Documentation/dev-tools/kasan.rst
-F:	arch/*/include/asm/*kasan.h
+F:	arch/*/include/asm/*kasan*.h
 F:	arch/*/mm/kasan_init*
 F:	include/linux/kasan*.h
 F:	lib/Kconfig.kasan
diff --git a/arch/arm64/include/asm/kasan-tags.h b/arch/arm64/include/asm/kasan-tags.h
new file mode 100644
index 000000000000..8cb12ebae57f
--- /dev/null
+++ b/arch/arm64/include/asm/kasan-tags.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KASAN_TAGS_H
+#define __ASM_KASAN_TAGS_H
+
+#define KASAN_TAG_KERNEL	0xFF /* native kernel pointers tag */
+
+#define KASAN_TAG_WIDTH		8
+
+#endif /* ASM_KASAN_TAGS_H */
diff --git a/arch/x86/include/asm/kasan-tags.h b/arch/x86/include/asm/kasan-tags.h
new file mode 100644
index 000000000000..68ba385bc75c
--- /dev/null
+++ b/arch/x86/include/asm/kasan-tags.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KASAN_TAGS_H
+#define __ASM_KASAN_TAGS_H
+
+#define KASAN_TAG_KERNEL	0xF /* native kernel pointers tag */
+
+#define KASAN_TAG_WIDTH		4
+
+#endif /* ASM_KASAN_TAGS_H */
diff --git a/include/linux/kasan-tags.h b/include/linux/kasan-tags.h
index e07c896f95d3..ad5c11950233 100644
--- a/include/linux/kasan-tags.h
+++ b/include/linux/kasan-tags.h
@@ -2,7 +2,13 @@
 #ifndef _LINUX_KASAN_TAGS_H
 #define _LINUX_KASAN_TAGS_H
 
-#include <asm/kasan.h>
+#if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
+#include <asm/kasan-tags.h>
+#endif
+
+#ifndef KASAN_TAG_WIDTH
+#define KASAN_TAG_WIDTH		0
+#endif
 
 #ifndef KASAN_TAG_KERNEL
 #define KASAN_TAG_KERNEL	0xFF /* native kernel pointers tag */
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b396feca714f..54481f8c30c5 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,7 +40,9 @@ typedef unsigned int __bitwise kasan_vmalloc_flags_t;
 
 #ifdef CONFIG_KASAN_SW_TAGS
 /* This matches KASAN_TAG_INVALID. */
-#define KASAN_SHADOW_INIT 0xFE
+#ifndef KASAN_SHADOW_INIT
+#define KASAN_SHADOW_INIT KASAN_TAG_INVALID
+#endif
 #else
 #define KASAN_SHADOW_INIT 0
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index beba5ba0fd97..610f6af6daf4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1815,7 +1815,7 @@ static inline u8 page_kasan_tag(const struct page *page)
 
 	if (kasan_enabled()) {
 		tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
-		tag ^= 0xff;
+		tag ^= KASAN_TAG_KERNEL;
 	}
 
 	return tag;
@@ -1828,7 +1828,7 @@ static inline void page_kasan_tag_set(struct page *page, u8 tag)
 	if (!kasan_enabled())
 		return;
 
-	tag ^= 0xff;
+	tag ^= KASAN_TAG_KERNEL;
 	old_flags = READ_ONCE(page->flags);
 	do {
 		flags = old_flags;
@@ -1847,7 +1847,7 @@ static inline void page_kasan_tag_reset(struct page *page)
 
 static inline u8 page_kasan_tag(const struct page *page)
 {
-	return 0xff;
+	return KASAN_TAG_KERNEL;
 }
 
 static inline void page_kasan_tag_set(struct page *page, u8 tag) { }
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 4f5c9e979bb9..b2cc4cb870e0 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -3,6 +3,7 @@
 #define PAGE_FLAGS_LAYOUT_H
 
 #include <linux/numa.h>
+#include <linux/kasan-tags.h>
 #include <generated/bounds.h>
 
 /*
@@ -72,12 +73,6 @@
 #define NODE_NOT_IN_PAGE_FLAGS	1
 #endif
 
-#if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
-#define KASAN_TAG_WIDTH 8
-#else
-#define KASAN_TAG_WIDTH 0
-#endif
-
 #ifdef CONFIG_NUMA_BALANCING
 #define LAST__PID_SHIFT 8
 #define LAST__PID_MASK  ((1 << LAST__PID_SHIFT)-1)
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (3 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 04/14] kasan: arm64: x86: Make special tags arch specific Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 16:42   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte Maciej Wieczor-Retman
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

Any place where pointer arithmetic is used to convert a virtual address
into a physical one can raise errors if the virtual address is tagged.

Reset the pointer's tag by sign extending the tag bits in macros that do
pointer arithmetic in address conversions. There will be no change in
compiled code with KASAN disabled since the compiler will optimize the
__tag_reset() out.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
 arch/x86/include/asm/page.h    | 17 +++++++++++++----
 arch/x86/include/asm/page_64.h |  2 +-
 arch/x86/mm/physaddr.c         |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 9265f2fca99a..e37f63b50728 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -7,6 +7,7 @@
 #ifdef __KERNEL__
 
 #include <asm/page_types.h>
+#include <asm/kasan.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/page_64.h>
@@ -41,7 +42,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa(x)		__phys_addr((unsigned long)(x))
 #endif
 
-#define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
+#define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(__tag_reset(x)))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
 /*
@@ -65,9 +66,17 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
  * virt_to_page(kaddr) returns a valid pointer if and only if
  * virt_addr_valid(kaddr) returns true.
  */
-#define virt_to_page(kaddr)	pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
+
+#ifdef CONFIG_KASAN_SW_TAGS
+#define page_to_virt(x)	({									\
+	__typeof__(x) __page = x;								\
+	void *__addr = __va(page_to_pfn((__typeof__(x))__tag_reset(__page)) << PAGE_SHIFT);	\
+	(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));			\
+})
+#endif
+#define virt_to_page(kaddr)	pfn_to_page(__pa((void *)__tag_reset(kaddr)) >> PAGE_SHIFT)
 extern bool __virt_addr_valid(unsigned long kaddr);
-#define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long) (kaddr))
+#define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long)(__tag_reset(kaddr)))
 
 static __always_inline void *pfn_to_kaddr(unsigned long pfn)
 {
@@ -81,7 +90,7 @@ static __always_inline u64 __canonical_address(u64 vaddr, u8 vaddr_bits)
 
 static __always_inline u64 __is_canonical_address(u64 vaddr, u8 vaddr_bits)
 {
-	return __canonical_address(vaddr, vaddr_bits) == vaddr;
+	return __canonical_address(vaddr, vaddr_bits) == __tag_reset(vaddr);
 }
 
 #endif	/* __ASSEMBLER__ */
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index d3aab6f4e59a..44975a8ae665 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -33,7 +33,7 @@ static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
 extern unsigned long __phys_addr(unsigned long);
 extern unsigned long __phys_addr_symbol(unsigned long);
 #else
-#define __phys_addr(x)		__phys_addr_nodebug(x)
+#define __phys_addr(x)		__phys_addr_nodebug(__tag_reset(x))
 #define __phys_addr_symbol(x) \
 	((unsigned long)(x) - __START_KERNEL_map + phys_base)
 #endif
diff --git a/arch/x86/mm/physaddr.c b/arch/x86/mm/physaddr.c
index fc3f3d3e2ef2..7f2b11308245 100644
--- a/arch/x86/mm/physaddr.c
+++ b/arch/x86/mm/physaddr.c
@@ -14,6 +14,7 @@
 #ifdef CONFIG_DEBUG_VIRTUAL
 unsigned long __phys_addr(unsigned long x)
 {
+	x = __tag_reset(x);
 	unsigned long y = x - __START_KERNEL_map;
 
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (4 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 16:56   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 07/14] x86: KASAN raw shadow memory PTE init Maciej Wieczor-Retman
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

Calculating page offset returns a pointer without a tag. When comparing
the calculated offset to a tagged page pointer an error is raised
because they are not equal.

Change pointer comparisons to physical address comparisons as to avoid
issues with tagged pointers that pointer arithmetic would create. Open
code pte_offset_kernel(), pmd_offset(), pud_offset() and p4d_offset().
Because one parameter is always zero and the rest of the function
insides are enclosed inside __va(), removing that layer lowers the
complexity of final assembly.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v2:
- Open code *_offset() to avoid it's internal __va().

 arch/x86/mm/init_64.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 519aa53114fa..d40699c16f14 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -251,7 +251,10 @@ static p4d_t *fill_p4d(pgd_t *pgd, unsigned long vaddr)
 	if (pgd_none(*pgd)) {
 		p4d_t *p4d = (p4d_t *)spp_getpage();
 		pgd_populate(&init_mm, pgd, p4d);
-		if (p4d != p4d_offset(pgd, 0))
+
+		if (__pa(p4d) != (pgtable_l5_enabled() ?
+				  (unsigned long)pgd_val(*pgd) & PTE_PFN_MASK :
+				  __pa(pgd)))
 			printk(KERN_ERR "PAGETABLE BUG #00! %p <-> %p\n",
 			       p4d, p4d_offset(pgd, 0));
 	}
@@ -263,7 +266,7 @@ static pud_t *fill_pud(p4d_t *p4d, unsigned long vaddr)
 	if (p4d_none(*p4d)) {
 		pud_t *pud = (pud_t *)spp_getpage();
 		p4d_populate(&init_mm, p4d, pud);
-		if (pud != pud_offset(p4d, 0))
+		if (__pa(pud) != (p4d_val(*p4d) & p4d_pfn_mask(*p4d)))
 			printk(KERN_ERR "PAGETABLE BUG #01! %p <-> %p\n",
 			       pud, pud_offset(p4d, 0));
 	}
@@ -275,7 +278,7 @@ static pmd_t *fill_pmd(pud_t *pud, unsigned long vaddr)
 	if (pud_none(*pud)) {
 		pmd_t *pmd = (pmd_t *) spp_getpage();
 		pud_populate(&init_mm, pud, pmd);
-		if (pmd != pmd_offset(pud, 0))
+		if (__pa(pmd) != (pud_val(*pud) & pud_pfn_mask(*pud)))
 			printk(KERN_ERR "PAGETABLE BUG #02! %p <-> %p\n",
 			       pmd, pmd_offset(pud, 0));
 	}
@@ -287,7 +290,7 @@ static pte_t *fill_pte(pmd_t *pmd, unsigned long vaddr)
 	if (pmd_none(*pmd)) {
 		pte_t *pte = (pte_t *) spp_getpage();
 		pmd_populate_kernel(&init_mm, pmd, pte);
-		if (pte != pte_offset_kernel(pmd, 0))
+		if (__pa(pte) != (pmd_val(*pmd) & pmd_pfn_mask(*pmd)))
 			printk(KERN_ERR "PAGETABLE BUG #03!\n");
 	}
 	return pte_offset_kernel(pmd, vaddr);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 07/14] x86: KASAN raw shadow memory PTE init
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (5 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 08/14] x86: LAM initialization Maciej Wieczor-Retman
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

In KASAN's generic mode the default value in shadow memory is zero.
During initialization of shadow memory pages they are allocated and
zeroed.

In KASAN's tag-based mode the default tag for the arm64 architecture is
0xFE which corresponds to any memory that should not be accessed. On x86
(where tags are 4-bit wide instead of 8-bit wide) that tag is 0xE so
during the initializations all the bytes in shadow memory pages should
be filled with 0xE.

Use memblock_alloc_try_nid_raw() instead of memblock_alloc_try_nid() to
avoid zeroing out the memory so it can be set with the KASAN invalid
tag.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v2:
- Remove dense mode references, use memset() instead of kasan_poison().

 arch/x86/mm/kasan_init_64.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0539efd0d216..e8a451cafc8c 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -34,6 +34,18 @@ static __init void *early_alloc(size_t size, int nid, bool should_panic)
 	return ptr;
 }
 
+static __init void *early_raw_alloc(size_t size, int nid, bool should_panic)
+{
+	void *ptr = memblock_alloc_try_nid_raw(size, size,
+			__pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+
+	if (!ptr && should_panic)
+		panic("%pS: Failed to allocate page, nid=%d from=%lx\n",
+		      (void *)_RET_IP_, nid, __pa(MAX_DMA_ADDRESS));
+
+	return ptr;
+}
+
 static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr,
 				      unsigned long end, int nid)
 {
@@ -63,8 +75,9 @@ static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr,
 		if (!pte_none(*pte))
 			continue;
 
-		p = early_alloc(PAGE_SIZE, nid, true);
-		entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL);
+		p = early_raw_alloc(PAGE_SIZE, nid, true);
+		memset(p, PAGE_SIZE, KASAN_SHADOW_INIT);
+		entry = pfn_pte(PFN_DOWN(__pa_nodebug(p)), PAGE_KERNEL);
 		set_pte_at(&init_mm, addr, pte, entry);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
@@ -436,7 +449,7 @@ void __init kasan_init(void)
 	 * it may contain some garbage. Now we can clear and write protect it,
 	 * since after the TLB flush no one should write to it.
 	 */
-	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+	memset(kasan_early_shadow_page, KASAN_SHADOW_INIT, PAGE_SIZE);
 	for (i = 0; i < PTRS_PER_PTE; i++) {
 		pte_t pte;
 		pgprot_t prot;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 08/14] x86: LAM initialization
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (6 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 07/14] x86: KASAN raw shadow memory PTE init Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 09/14] x86: Minimal SLAB alignment Maciej Wieczor-Retman
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

To make use of KASAN's tag based mode on x86 Linear Address Masking
(LAM) needs to be enabled. To do that the 28th bit in CR4 needs to be
set.

Set the bit in early memory initialization.

When launching secondary CPUs the LAM bit gets lost. To avoid this it
needs to get added in a mask in head_64.S. The bit mask permits some
bits of CR4 to pass from the primary CPU to the secondary CPUs without
being cleared.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
 arch/x86/kernel/head_64.S | 3 +++
 arch/x86/mm/init.c        | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index fefe2a25cf02..95b897b8bbd2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -209,6 +209,9 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 	 *  there will be no global TLB entries after the execution."
 	 */
 	movl	$(X86_CR4_PAE | X86_CR4_LA57), %edx
+#ifdef CONFIG_ADDRESS_MASKING
+	orl	$X86_CR4_LAM_SUP, %edx
+#endif
 #ifdef CONFIG_X86_MCE
 	/*
 	 * Preserve CR4.MCE if the kernel will enable #MC support.
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bfa444a7dbb0..84cefc5dd69b 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -766,6 +766,9 @@ void __init init_mem_mapping(void)
 	probe_page_size_mask();
 	setup_pcid();
 
+	if (boot_cpu_has(X86_FEATURE_LAM) && IS_ENABLED(CONFIG_KASAN_SW_TAGS))
+		cr4_set_bits_and_update_boot(X86_CR4_LAM_SUP);
+
 #ifdef CONFIG_X86_64
 	end = max_pfn << PAGE_SHIFT;
 #else
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 09/14] x86: Minimal SLAB alignment
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (7 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 08/14] x86: LAM initialization Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 16:59   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 10/14] x86: Update the KASAN non-canonical hook Maciej Wieczor-Retman
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

Adjust the x86 minimal SLAB alignment to match the KASAN granule size.
In tag-based mode the granule is 16 bytes, so the value needs to be 16.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Fix typo in patch message 4 -> 16.
- Change define location to arch/x86/include/asm/cache.h.

 arch/x86/include/asm/cache.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cache.h b/arch/x86/include/asm/cache.h
index 69404eae9983..3232583b5487 100644
--- a/arch/x86/include/asm/cache.h
+++ b/arch/x86/include/asm/cache.h
@@ -21,4 +21,8 @@
 #endif
 #endif
 
+#ifdef CONFIG_KASAN_SW_TAGS
+#define ARCH_SLAB_MINALIGN (1ULL << KASAN_SHADOW_SCALE_SHIFT)
+#endif
+
 #endif /* _ASM_X86_CACHE_H */
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 10/14] x86: Update the KASAN non-canonical hook
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (8 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 09/14] x86: Minimal SLAB alignment Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 17:37   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports Maciej Wieczor-Retman
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

kasan_non_canonical_hook() is useful for pointing out that an address
which caused some kind of error could be the result of a
kasan_mem_to_shadow() mapping. Currently it's called only in the general
protection fault handler code path, but it can provide helpful
information in page fault oops reports as well.

For example, consider a page fault for address 0xffdefc0000000000 on a
5-level paging system. It could be the result of KASAN's
kasan_mem_to_shadow() applied to the address 0xfef0000000000000. Without
kasan_non_canonical_hook() in the page fault path it might be hard to
figure out why the error occurred.
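
For reference, with the tag-based KASAN_SHADOW_OFFSET of
0xffeffc0000000000, a scale shift of 4 and the arithmetic (sign
extending) shift used by kasan_mem_to_shadow(), the example above works
out as:

	kasan_mem_to_shadow(0xfef0000000000000)
		= (0xfef0000000000000 >> 4) + 0xffeffc0000000000
		= 0xffef000000000000 + 0xffeffc0000000000
		= 0xffdefc0000000000  (the faulting address above)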

Add kasan_non_canonical_hook() to the beginning of show_fault_oops().

Update kasan_non_canonical_hook() to take into account the possible
memory to shadow mappings in the software tag-based mode of x86.

The patch was tested with positive results by accessing the following
addresses, which caused #GPs and #PFs.

Valid mappings (showing kasan_non_canonical_hook() message):
	0xFFFFFFFF8FFFFFFF
	0xFEF0000000000000
	0x7FFFFF4FFFFFFFFF
	0x7EF0000000000000
Invalid mappings (not showing kasan_non_canonical_hook() message):
	0xFFFFFFFFF8FFFFFF
	0xFFBFFC0000000000
	0x07EFFC0000000000
	0x000E000000000000

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Move the report.c part from first patch in the series to this new
  patch to have x86 changes in one place.
- Add the call in fault oops.
- Extend the comment in report.c with a graphical representation of what
  addresses are valid and invalid in memory to shadow mapping.

 arch/x86/mm/fault.c |  2 ++
 mm/kasan/report.c   | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 697432f63c59..16366af60ae5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -511,6 +511,8 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	if (!oops_may_print())
 		return;
 
+	kasan_non_canonical_hook(address);
+
 	if (error_code & X86_PF_INSTR) {
 		unsigned int level;
 		bool nx, rw;
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index f24f11cc644a..135307c93c2c 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -700,7 +700,7 @@ void kasan_non_canonical_hook(unsigned long addr)
 	 * operation would overflow only for some memory addresses. However, due
 	 * to the chosen KASAN_SHADOW_OFFSET values and the fact the
 	 * kasan_mem_to_shadow() only operates on pointers with the tag reset,
-	 * the overflow always happens.
+	 * the overflow always happens (for both x86 and arm64).
 	 *
 	 * For arm64, the top byte of the pointer gets reset to 0xFF. Thus, the
 	 * possible shadow addresses belong to a region that is the result of
@@ -715,6 +715,40 @@ void kasan_non_canonical_hook(unsigned long addr)
 			return;
 	}
 
+	 /*
+	  * For x86-64, only the pointer bits [62:57] get reset, and bits #63
+	  * and #56 can be 0 or 1. Thus, kasan_mem_to_shadow() can be possibly
+	  * applied to two regions of memory:
+	  * [0x7E00000000000000, 0x7FFFFFFFFFFFFFFF] and
+	  * [0xFE00000000000000, 0xFFFFFFFFFFFFFFFF]. As the overflow happens
+	  * for both ends of both memory ranges, both possible shadow regions
+	  * are contiguous.
+	  *
+	  * Given the KASAN_SHADOW_OFFSET equal to 0xffeffc0000000000, the
+	  * following ranges are valid mem-to-shadow mappings:
+	  *
+	  * 0xFFFFFFFFFFFFFFFF
+	  *         INVALID
+	  * 0xFFEFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL)
+	  *         VALID   - kasan shadow mem
+	  *         VALID   - non-canonical kernel virtual address
+	  * 0xFFCFFC0000000000 - kasan_mem_to_shadow(0xFEUL << 56)
+	  *         INVALID
+	  * 0x07EFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL >> 1)
+	  *         VALID   - non-canonical user virtual addresses
+	  *         VALID   - user addresses
+	  * 0x07CFFC0000000000 - kasan_mem_to_shadow(0x7EUL << 56)
+	  *         INVALID
+	  * 0x0000000000000000
+	  */
+	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS) && IS_ENABLED(CONFIG_X86_64)) {
+		if ((addr < (u64)kasan_mem_to_shadow((void *)(0x7EUL << 56)) ||
+		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL >> 1))) &&
+		    (addr < (u64)kasan_mem_to_shadow((void *)(0xFEUL << 56)) ||
+		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL))))
+			return;
+	}
+
 	orig_addr = (unsigned long)kasan_shadow_to_mem((void *)addr);
 
 	/*
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (9 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 10/14] x86: Update the KASAN non-canonical hook Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 17:55   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 12/14] kasan: Fix inline mode for x86 tag-based mode Maciej Wieczor-Retman
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

When a tag mismatch happens in inline software tag-based KASAN on x86 an
int3 instruction is executed and needs proper handling.

Call kasan_report() from the int3 handler and pass down the relevant
information from the registers - RDI contains the problematic address
and RAX the remaining metadata.
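
For reference, the RAX encoding decoded by the handler below is (see the
KASAN_RAX_* macros in the diff):

	size    = 1 << (rax & 0x0f);	/* bits [3:0]: log2 of the access size */
	write   = rax & 0x10;		/* bit  4: set for write accesses */
	recover = rax & 0x20;		/* bit  5: the report is recoverable */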

Also return early from the int3 selftest if inline KASAN is enabled,
since it would cause a kernel panic otherwise.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Add this patch to the series.

 arch/x86/kernel/alternative.c |  3 ++
 arch/x86/kernel/traps.c       | 52 +++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index bf82c6f7d690..ba277a25b57f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1979,6 +1979,9 @@ static noinline void __init int3_selftest(void)
 	};
 	unsigned int val = 0;
 
+	if (IS_ENABLED(CONFIG_KASAN_INLINE))
+		return;
+
 	BUG_ON(register_die_notifier(&int3_exception_nb));
 
 	/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9f88b8a78e50..32c81fc2d439 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/kallsyms.h>
 #include <linux/kmsan.h>
+#include <linux/kasan.h>
 #include <linux/spinlock.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
@@ -849,6 +850,51 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#ifdef CONFIG_KASAN_SW_TAGS
+
+#define KASAN_RAX_RECOVER	0x20
+#define KASAN_RAX_WRITE	0x10
+#define KASAN_RAX_SIZE_MASK	0x0f
+#define KASAN_RAX_SIZE(rax)	(1 << ((rax) & KASAN_RAX_SIZE_MASK))
+
+static bool kasan_handler(struct pt_regs *regs)
+{
+	int metadata = regs->ax;
+	u64 addr = regs->di;
+	u64 pc = regs->ip;
+	bool recover = metadata & KASAN_RAX_RECOVER;
+	bool write = metadata & KASAN_RAX_WRITE;
+	size_t size = KASAN_RAX_SIZE(metadata);
+
+	if (!IS_ENABLED(CONFIG_KASAN_INLINE))
+		return false;
+
+	if (user_mode(regs))
+		return false;
+
+	kasan_report((void *)addr, size, write, pc);
+
+	/*
+	 * The instrumentation allows to control whether we can proceed after
+	 * a crash was detected. This is done by passing the -recover flag to
+	 * the compiler. Disabling recovery allows to generate more compact
+	 * code.
+	 *
+	 * Unfortunately disabling recovery doesn't work for the kernel right
+	 * now. KASAN reporting is disabled in some contexts (for example when
+	 * the allocator accesses slab object metadata; this is controlled by
+	 * current->kasan_depth). All these accesses are detected by the tool,
+	 * even though the reports for them are not printed.
+	 *
+	 * This is something that might be fixed at some point in the future.
+	 */
+	if (!recover)
+		die("Oops - KASAN", regs, 0);
+	return true;
+}
+
+#endif
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
@@ -863,6 +909,12 @@ static bool do_int3(struct pt_regs *regs)
 	if (kprobe_int3_handler(regs))
 		return true;
 #endif
+
+#ifdef CONFIG_KASAN_SW_TAGS
+	if (kasan_handler(regs))
+		return true;
+#endif
+
 	res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP);
 
 	return res == NOTIFY_STOP;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 12/14] kasan: Fix inline mode for x86 tag-based mode
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (10 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag Maciej Wieczor-Retman
  2025-04-04 13:14 ` [PATCH v3 14/14] x86: Make software tag-based kasan available Maciej Wieczor-Retman
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

The LLVM compiler uses the hwasan-instrument-with-calls parameter to set
up inline or outline mode in tag-based KASAN. If set to zero, the
instrumentation is emitted inline at each relevant location, together
with the appropriate constants, during compilation. If set to one, all
instrumentation is done through function calls instead.

The compiler's default hwasan-instrument-with-calls value is "1" for the
x86 architecture, unlike for other architectures. Because of this,
enabling inline mode in software tag-based KASAN doesn't work on x86, as
the kernel's Makefile.kasan doesn't zero out the parameter.

Explicitly zero out hwasan-instrument-with-calls when enabling inline
mode in tag-based KASAN.
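
With Clang the kasan_params end up on the command line as -mllvm
options, so an inline build should roughly get (sketch, the exact
plumbing depends on the rest of Makefile.kasan):

	-fsanitize=kernel-hwaddress \
	-mllvm -hwasan-mapping-offset=$(KASAN_SHADOW_OFFSET) \
	-mllvm -hwasan-instrument-with-calls=0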

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Add this patch to the series.

 scripts/Makefile.kasan | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 693dbbebebba..2c7be96727ac 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -76,8 +76,11 @@ CFLAGS_KASAN := -fsanitize=kernel-hwaddress
 RUSTFLAGS_KASAN := -Zsanitizer=kernel-hwaddress \
 		   -Zsanitizer-recover=kernel-hwaddress
 
+# LLVM sets hwasan-instrument-with-calls to 1 on x86 by default. Set it to 0
+# when inline mode is enabled.
 ifdef CONFIG_KASAN_INLINE
 	kasan_params += hwasan-mapping-offset=$(KASAN_SHADOW_OFFSET)
+	kasan_params += hwasan-instrument-with-calls=0
 else
 	kasan_params += hwasan-instrument-with-calls=1
 endif
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (11 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 12/14] kasan: Fix inline mode for x86 tag-based mode Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  2025-04-04 18:08   ` Dave Hansen
  2025-04-04 13:14 ` [PATCH v3 14/14] x86: Make software tag-based kasan available Maciej Wieczor-Retman
  13 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

The problem presented here is related to NUMA systems and the tag-based
KASAN mode. How it comes about can be explained in the following points:

	1. A new chunk is created with pcpu_create_chunk() and
	   vm_structs are allocated. On systems with one NUMA node only
	   one is allocated, but with more NUMA nodes at least a second
	   one will be allocated too.

	2. chunk->base_addr is assigned the modified value of
	   vms[0]->addr and thus inherits the tag of this allocated
	   structure.

	3. In pcpu_alloc() for each possible cpu pcpu_chunk_addr() is
	   executed which calculates per cpu pointers that correspond to
	   the vms structure addresses. The calculations are based on
	   adding an offset from a table to chunk->base_addr.

Here the problem presents itself, since for addresses based on vms[1]
and up the tag will be different from the one based on vms[0]
(base_addr). A tag mismatch happens and an error is reported.
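
Put schematically (the tags A and B are purely illustrative):

	vms[0]->addr                       carries tag A
	vms[1]->addr                       carries tag B
	chunk->base_addr = vms[0]->addr    -> inherits tag A
	pcpu_chunk_addr(chunk, cpu) = chunk->base_addr + offset
	                                   -> lands in vms[1]'s area with
	                                      tag A, while the shadow there
	                                      holds tag B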

Unpoison all the vms[]->addr with the same tag to resolve the mismatch.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Remove last version of this patch that just resets the tag on
  base_addr and add this patch that unpoisons all areas with the same
  tag instead.

 include/linux/kasan.h | 10 ++++++++++
 mm/kasan/shadow.c     | 11 +++++++++++
 mm/vmalloc.c          |  3 +--
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 54481f8c30c5..bd033b2ba383 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -613,6 +613,13 @@ static __always_inline void kasan_poison_vmalloc(const void *start,
 		__kasan_poison_vmalloc(start, size);
 }
 
+void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms);
+static __always_inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
+{
+	if (kasan_enabled())
+		__kasan_unpoison_vmap_areas(vms, nr_vms);
+}
+
 #else /* CONFIG_KASAN_VMALLOC */
 
 static inline void kasan_populate_early_vm_area_shadow(void *start,
@@ -637,6 +644,9 @@ static inline void *kasan_unpoison_vmalloc(const void *start,
 static inline void kasan_poison_vmalloc(const void *start, unsigned long size)
 { }
 
+static inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
+{ }
+
 #endif /* CONFIG_KASAN_VMALLOC */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 88d1c9dcb507..9496f256bc0f 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -582,6 +582,17 @@ void __kasan_poison_vmalloc(const void *start, unsigned long size)
 	kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
 }
 
+void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
+{
+	int area;
+
+	for (area = 0 ; area < nr_vms ; area++) {
+		kasan_poison(vms[area]->addr, vms[area]->size,
+			     arch_kasan_get_tag(vms[0]->addr), false);
+		arch_kasan_set_tag(vms[area]->addr, arch_kasan_get_tag(vms[0]->addr));
+	}
+}
+
 #else /* CONFIG_KASAN_VMALLOC */
 
 int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61981ee1c9d2..fbd56bf8aeb2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4783,8 +4783,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 	 * non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc().
 	 */
 	for (area = 0; area < nr_vms; area++)
-		vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr,
-				vms[area]->size, KASAN_VMALLOC_PROT_NORMAL);
+		kasan_unpoison_vmap_areas(vms, nr_vms);
 
 	kfree(vas);
 	return vms;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 14/14] x86: Make software tag-based kasan available
  2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
                   ` (12 preceding siblings ...)
  2025-04-04 13:14 ` [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag Maciej Wieczor-Retman
@ 2025-04-04 13:14 ` Maciej Wieczor-Retman
  13 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-04 13:14 UTC (permalink / raw)
  To: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, maciej.wieczor-retman, kaleshsingh, jgross,
	andreyknvl, scott, tony.luck, dvyukov, pasha.tatashin, ziy,
	broonie, gatlin.newhouse, jackmanb, wangkefeng.wang,
	thiago.bauermann, tglx, kees, akpm, jason.andryuk, snovitoll, xin,
	jan.kiszka, bp, rppt, peterz, pankaj.gupta, thuth,
	andriy.shevchenko, joel.granados, kbingham, nicolas, mark.rutland,
	surenb, catalin.marinas, morbo, justinstitt, ubizjak, jhubbard,
	urezki, dave.hansen, bhe, luto, baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

Make CONFIG_KASAN_SW_TAGS available for x86 machines if they have
ADDRESS_MASKING (LAM) enabled, as it works similarly to Top-Byte Ignore
(TBI), which enables the software tag-based mode on the arm64 platform.

Set the scale macro based on the KASAN mode: in software tag-based mode
16 bytes of memory map to one shadow byte, while in generic mode it is 8
bytes.
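
For reference, the shadow sizes in the mm.rst hunks below follow
directly from the scale shift (assuming __VIRTUAL_MASK_SHIFT of 47 and
56 for 4- and 5-level paging):

	shadow size = (1 << __VIRTUAL_MASK_SHIFT) >> KASAN_SHADOW_SCALE_SHIFT
	4-level: 2^(47 - 4) = 8 TB   (vs 2^(47 - 3) = 16 TB in generic mode)
	5-level: 2^(56 - 4) = 4 PB   (vs 2^(56 - 3) = ~8 PB in generic mode)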

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Remove runtime_const from previous patch and merge the rest here.
- Move scale shift definition back to header file.
- Add new kasan offset for software tag based mode.
- Fix patch message typo 32 -> 16, and 16 -> 8.
- Update lib/Kconfig.kasan with x86 now having software tag-based
  support.

Changelog v2:
- Remove KASAN dense code.

 Documentation/arch/x86/x86_64/mm.rst | 6 ++++--
 arch/x86/Kconfig                     | 5 +++--
 arch/x86/boot/compressed/misc.h      | 1 +
 arch/x86/include/asm/kasan.h         | 9 +++++++++
 arch/x86/kernel/setup.c              | 2 ++
 arch/x86/mm/kasan_init_64.c          | 2 +-
 lib/Kconfig.kasan                    | 3 ++-
 7 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst
index f2db178b353f..1e2d6b3ae231 100644
--- a/Documentation/arch/x86/x86_64/mm.rst
+++ b/Documentation/arch/x86/x86_64/mm.rst
@@ -60,7 +60,8 @@ Complete virtual memory map with 4-level page tables
    ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
    ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
    ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
-   ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
+   ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory (generic mode)
+   fffff40000000000 |   -8    TB | fffffc0000000000 |    8 TB | KASAN shadow memory (software tag-based mode)
   __________________|____________|__________________|_________|____________________________________________________________
                                                               |
                                                               | Identical layout to the 56-bit one from here on:
@@ -130,7 +131,8 @@ Complete virtual memory map with 5-level page tables
    ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
    ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
    ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
-   ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
+   ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory (generic mode)
+   ffdffc0000000000 |   -6    PB | ffeffc0000000000 |    4 PB | KASAN shadow memory (software tag-based mode)
   __________________|____________|__________________|_________|____________________________________________________________
                                                               |
                                                               | Identical layout to the 47-bit one from here on:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 15f346f02af0..cfe1cb15950e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -197,6 +197,7 @@ config X86
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
 	select HAVE_ARCH_KASAN			if X86_64
 	select HAVE_ARCH_KASAN_VMALLOC		if X86_64
+	select HAVE_ARCH_KASAN_SW_TAGS		if ADDRESS_MASKING
 	select HAVE_ARCH_KFENCE
 	select HAVE_ARCH_KMSAN			if X86_64
 	select HAVE_ARCH_KGDB
@@ -402,8 +403,8 @@ config AUDIT_ARCH
 
 config KASAN_SHADOW_OFFSET
 	hex
-	depends on KASAN
-	default 0xdffffc0000000000
+	default 0xdffffc0000000000 if KASAN_GENERIC
+	default 0xffeffc0000000000 if KASAN_SW_TAGS
 
 config HAVE_INTEL_TXT
 	def_bool y
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index dd8d1a85f671..f6a87e9ad200 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -13,6 +13,7 @@
 #undef CONFIG_PARAVIRT_SPINLOCKS
 #undef CONFIG_KASAN
 #undef CONFIG_KASAN_GENERIC
+#undef CONFIG_KASAN_SW_TAGS
 
 #define __NO_FORTIFY
 
diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
index 212218622963..d2eedaa092d5 100644
--- a/arch/x86/include/asm/kasan.h
+++ b/arch/x86/include/asm/kasan.h
@@ -6,8 +6,16 @@
 #include <linux/kasan-tags.h>
 #include <linux/types.h>
 #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
+#ifdef CONFIG_KASAN_SW_TAGS
+#define KASAN_SHADOW_SCALE_SHIFT 4
+#else
 #define KASAN_SHADOW_SCALE_SHIFT 3
+#endif
 
+#ifdef CONFIG_KASAN_SW_TAGS
+#define KASAN_SHADOW_END	KASAN_SHADOW_OFFSET
+#define KASAN_SHADOW_START	(KASAN_SHADOW_END - ((UL(1)) << (__VIRTUAL_MASK_SHIFT - KASAN_SHADOW_SCALE_SHIFT)))
+#else
 /*
  * Compiler uses shadow offset assuming that addresses start
  * from 0. Kernel addresses don't start from 0, so shadow
@@ -24,6 +32,7 @@
 #define KASAN_SHADOW_END        (KASAN_SHADOW_START + \
 					(1ULL << (__VIRTUAL_MASK_SHIFT - \
 						  KASAN_SHADOW_SCALE_SHIFT)))
+#endif
 
 #ifndef __ASSEMBLER__
 #include <linux/bitops.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c7164a8de983..a40d66da69f4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1182,6 +1182,8 @@ void __init setup_arch(char **cmdline_p)
 
 	kasan_init();
 
+	kasan_init_sw_tags();
+
 	/*
 	 * Sync back kernel address range.
 	 *
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index e8a451cafc8c..b5cf3dca6954 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -371,7 +371,7 @@ void __init kasan_init(void)
 	 * bunch of things like kernel code, modules, EFI mapping, etc.
 	 * We need to take extra steps to not overwrite them.
 	 */
-	if (pgtable_l5_enabled()) {
+	if (pgtable_l5_enabled() && !IS_ENABLED(CONFIG_KASAN_SW_TAGS)) {
 		void *ptr;
 
 		ptr = (void *)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END));
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index f82889a830fa..9ddbc6aeb5d5 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -100,7 +100,8 @@ config KASAN_SW_TAGS
 
 	  Requires GCC 11+ or Clang.
 
-	  Supported only on arm64 CPUs and relies on Top Byte Ignore.
+	  Supported on arm64 CPUs that support Top Byte Ignore and on x86 CPUs
+	  that support Linear Address Masking.
 
 	  Consumes about 1/16th of available memory at kernel start and
 	  add an overhead of ~20% for dynamic allocations.
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] x86: Add arch specific kasan functions
  2025-04-04 13:14 ` [PATCH v3 03/14] x86: Add arch specific kasan functions Maciej Wieczor-Retman
@ 2025-04-04 16:06   ` Dave Hansen
  2025-04-09  7:16     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 16:06 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> +static inline const void *__tag_set(const void *addr, u8 tag)
> +{
> +	u64 __addr = (u64)addr & ~__tag_shifted(KASAN_TAG_KERNEL);
> +	return (const void *)(__addr | __tag_shifted(tag));
> +}

This becomes a lot clearer to read if you separate out the casting from
the logical bit manipulation. For instance:

static inline const void *__tag_set(const void *__addr, u8 tag)
{
	u64 addr = (u64)__addr;

	addr &= ~__tag_shifted(KASAN_TAG_KERNEL);
	addr |= __tag_shifted(tag);

	return (const void *)addr;
}

Also, unless there's a good reason for it, you might as well limit the
places you need to use "__".

Now that we can read this, I think it's potentially buggy. If someone
went and changed:

#define KASAN_TAG_KERNEL	0xFF

to, say:

#define KASAN_TAG_KERNEL	0xAB

the '&' would miss clearing bits. It works fine in the arm64 implementation:

	u64 __addr = (u64)addr & ~__tag_shifted(0xff);

because they've hard-coded 0xff. I _think_ that's what you actually want
here. You don't want to mask out KASAN_TAG_KERNEL, you actually want to
mask out *ANYTHING* in those bits.

So the best thing is probably to define a KASAN_TAG_MASK that makes it
clear which are the tag bits.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions
  2025-04-04 13:14 ` [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions Maciej Wieczor-Retman
@ 2025-04-04 16:42   ` Dave Hansen
  2025-04-09  7:36     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 16:42 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> +#ifdef CONFIG_KASAN_SW_TAGS
> +#define page_to_virt(x)	({									\
> +	__typeof__(x) __page = x;								\
> +	void *__addr = __va(page_to_pfn((__typeof__(x))__tag_reset(__page)) << PAGE_SHIFT);	\
> +	(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));			\
> +})
> +#endif

Is this #ifdef needed?

I thought there were stub versions of all of those tag functions. So it
should be harmless to use this page_to_virt() implementation with or
without KASAN. Right?

I'm also confused by the implementation. This is one reason why I rather
dislike macros. Why does this act like the type of 'x' is variable?
Isn't it always a 'struct page *'? If so, then why all of the
__typeof__()'s?

Are struct page pointers _ever_ tagged? If they are, then doesn't
page_to_pfn() need to handle untagging as well? If they aren't, then
there's no reason to __tag_reset() in here.

What was the thinking behind this cast:

	(const void *)__addr

?

Are any of these casts _doing_ anything? I'm struggling to find anything
wrong with:

#define page_to_virt(x)	({						\
	void *__addr = __va(page_to_pfn(x) << PAGE_SHIFT);		\
	__tag_set(__addr, page_kasan_tag(x));				\
})

... which made me look back at:

	static inline const void *__tag_set(const void *addr, u8 tag)

from patch 3. I don't think the 'const' makes any sense on the return
value here. Surely the memory pointed at by a tagged pointer doesn't
need to be const. Why should the tag setting function be returning a
const pointer?

I can see why it would *take* a const pointer since it's not modifying
the memory, but I don't see why it is returning one.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte
  2025-04-04 13:14 ` [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte Maciej Wieczor-Retman
@ 2025-04-04 16:56   ` Dave Hansen
  2025-04-09  7:49     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 16:56 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> +		if (__pa(p4d) != (pgtable_l5_enabled() ?
> +				  (unsigned long)pgd_val(*pgd) & PTE_PFN_MASK :
> +				  __pa(pgd)))
>  			printk(KERN_ERR "PAGETABLE BUG #00! %p <-> %p\n",

This one is pretty fugly. But I guess it's just one place and it
probably isn't worth refactoring this and the other helpers just for a
debug message.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/14] x86: Minimal SLAB alignment
  2025-04-04 13:14 ` [PATCH v3 09/14] x86: Minimal SLAB alignment Maciej Wieczor-Retman
@ 2025-04-04 16:59   ` Dave Hansen
  2025-04-09 12:49     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 16:59 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> Adjust x86 minimal SLAB alignment to match KASAN granularity size. In
> tag-based mode the size changes to 16 bytes so the value needs to be 16.

I feel like we need a _bit_ of a discussion of the impact here. We are,
after all, trying to get this feature into shape so that it can be used
more widely outside of just debugging environments.

What's the impact of this in a production environment?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] x86: Update the KASAN non-canonical hook
  2025-04-04 13:14 ` [PATCH v3 10/14] x86: Update the KASAN non-canonical hook Maciej Wieczor-Retman
@ 2025-04-04 17:37   ` Dave Hansen
  2025-04-09 14:34     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 17:37 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> The kasan_non_canonical_hook() is useful in pointing out that an address
> which caused some kind of error could be the result of
> kasan_mem_to_shadow() mapping. Currently it's called only in the general
> protection handler code path but can give helpful information also in
> page fault oops reports.
> 
> For example consider a page fault for address 0xffdefc0000000000 on a
> 5-level paging system. It could have been accessed from KASAN's
> kasan_mem_to_shadow() called on 0xfef0000000000000 address. Without the
> kasan_non_canonical_hook() in the page fault case it might be hard to
> figure out why an error occurred.
> 
> Add kasan_non_canonical_hook() to the beginning of show_fault_oops().
> 
> Update kasan_non_canonical_hook() to take into account the possible
> memory to shadow mappings in the software tag-based mode of x86.
> 
> Patch was tested with positive results by accessing the following
> addresses, causing #GPs and #PFs.
> 
> Valid mappings (showing kasan_non_canonical_hook() message):
> 	0xFFFFFFFF8FFFFFFF
> 	0xFEF0000000000000
> 	0x7FFFFF4FFFFFFFFF
> 	0x7EF0000000000000
> Invalid mappings (not showing kasan_non_canonical_hook() message):
> 	0xFFFFFFFFF8FFFFFF
> 	0xFFBFFC0000000000
> 	0x07EFFC0000000000
> 	0x000E000000000000
> 
> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
> ---
> Changelog v3:
> - Move the report.c part from first patch in the series to this new
>   patch to have x86 changes in one place.
> - Add the call in fault oops.
> - Extend the comment in report.c with a graphical representation of what
>   addresses are valid and invalid in memory to shadow mapping.
> 
>  arch/x86/mm/fault.c |  2 ++
>  mm/kasan/report.c   | 36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 697432f63c59..16366af60ae5 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -511,6 +511,8 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
>  	if (!oops_may_print())
>  		return;
>  
> +	kasan_non_canonical_hook(address);
> +
>  	if (error_code & X86_PF_INSTR) {
>  		unsigned int level;
>  		bool nx, rw;
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index f24f11cc644a..135307c93c2c 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -700,7 +700,7 @@ void kasan_non_canonical_hook(unsigned long addr)
>  	 * operation would overflow only for some memory addresses. However, due
>  	 * to the chosen KASAN_SHADOW_OFFSET values and the fact the
>  	 * kasan_mem_to_shadow() only operates on pointers with the tag reset,
> -	 * the overflow always happens.
> +	 * the overflow always happens (for both x86 and arm64).
>  	 *
>  	 * For arm64, the top byte of the pointer gets reset to 0xFF. Thus, the
>  	 * possible shadow addresses belong to a region that is the result of
> @@ -715,6 +715,40 @@ void kasan_non_canonical_hook(unsigned long addr)
>  			return;
>  	}
>  
> +	 /*
> +	  * For x86-64, only the pointer bits [62:57] get reset, and bits #63
> +	  * and #56 can be 0 or 1. Thus, kasan_mem_to_shadow() can be possibly
> +	  * applied to two regions of memory:
> +	  * [0x7E00000000000000, 0x7FFFFFFFFFFFFFFF] and
> +	  * [0xFE00000000000000, 0xFFFFFFFFFFFFFFFF]. As the overflow happens
> +	  * for both ends of both memory ranges, both possible shadow regions
> +	  * are contiguous.
> +	  *
> +	  * Given the KASAN_SHADOW_OFFSET equal to 0xffeffc0000000000, the
> +	  * following ranges are valid mem-to-shadow mappings:
> +	  *
> +	  * 0xFFFFFFFFFFFFFFFF
> +	  *         INVALID
> +	  * 0xFFEFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL)
> +	  *         VALID   - kasan shadow mem
> +	  *         VALID   - non-canonical kernel virtual address
> +	  * 0xFFCFFC0000000000 - kasan_mem_to_shadow(0xFEUL << 56)
> +	  *         INVALID
> +	  * 0x07EFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL >> 1)
> +	  *         VALID   - non-canonical user virtual addresses
> +	  *         VALID   - user addresses
> +	  * 0x07CFFC0000000000 - kasan_mem_to_shadow(0x7EUL << 56)
> +	  *         INVALID
> +	  * 0x0000000000000000
> +	  */
> +	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS) && IS_ENABLED(CONFIG_X86_64)) {

One overall comment on this series: there's a lot of unnecessary
complexity. Case in point:

	config ADDRESS_MASKING
	        depends on X86_64

and

	select HAVE_ARCH_KASAN_SW_TAGS		if ADDRESS_MASKING

and

	config KASAN_SW_TAGS
        	depends on HAVE_ARCH_KASAN_SW_TAGS ...


So you can't have CONFIG_KASAN_SW_TAGS set without CONFIG_X86_64.

> +		if ((addr < (u64)kasan_mem_to_shadow((void *)(0x7EUL << 56)) ||
> +		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL >> 1))) &&
> +		    (addr < (u64)kasan_mem_to_shadow((void *)(0xFEUL << 56)) ||
> +		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL))))
> +			return;
> +	}
This isn't looking great.

I'd much rather have those kasan_mem_to_shadow() arguments be built up
programmatically.

I'm also not following the description of where these ranges come from:

	[0x7E00000000000000, 0x7FFFFFFFFFFFFFFF]
	[0xFE00000000000000, 0xFFFFFFFFFFFFFFFF]

I obviously recognize the top kernel and top userspace addresses, but
there do 0x7E... and 0xFE... come from? Is that because both of them
only have 56 actual bits of address space?

Wouldn't we be better off writing that as, say:

#define HIGHEST_KERN_ADDR (void *)0xFFFFFFFFFFFFFFFF
// ^ we probably have some macro for that already
#define LOWEST_KERN_ADDR (void *)((u64)HIGHEST_KERN_ADDR - \
					(1UL << 56) + 1)
// ^ or can this be calculated by tag manipulation?

which yields:

   void *_addr = (void *)addr;
   ...

   in_kern_shadow = (_addr >= kasan_mem_to_shadow(LOWEST_KERN_ADDR)) &&
		    (_addr <= kasan_mem_to_shadow(HIGHEST_KERN_ADDR));
   in_user_shadow = (_addr >= kasan_mem_to_shadow(LOWEST_USER_ADDR)) &&
		    (_addr <= kasan_mem_to_shadow(HIGHEST_USER_ADDR));

   if (!in_kern_shadow &&
       !in_user_shadow)
	return;

I _think_ that's the same logic you have. Isn't it slightly more readable?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports
  2025-04-04 13:14 ` [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports Maciej Wieczor-Retman
@ 2025-04-04 17:55   ` Dave Hansen
  2025-04-09 14:48     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 17:55 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> When a tag mismatch happens in inline software tag-based KASAN on x86 an
> int3 instruction is executed and needs proper handling.

Does this mean "inline software"? Or "inline" functions? I'm not quite
parsing that. I think it needs some more background.

> Call kasan_report() from the int3 handler and pass down the proper
> information from registers - RDI should contain the problematic address
> and RAX other metadata.
> 
> Also early return from the int3 selftest if inline KASAN is enabled
> since it will cause a kernel panic otherwise.
...
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index bf82c6f7d690..ba277a25b57f 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1979,6 +1979,9 @@ static noinline void __init int3_selftest(void)
>  	};
>  	unsigned int val = 0;
>  
> +	if (IS_ENABLED(CONFIG_KASAN_INLINE))
> +		return;

Comments, please. This is a total non sequitur otherwise.

>  	BUG_ON(register_die_notifier(&int3_exception_nb));
>  
>  	/*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 9f88b8a78e50..32c81fc2d439 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
...
> @@ -849,6 +850,51 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  	cond_local_irq_disable(regs);
>  }
>  
> +#ifdef CONFIG_KASAN_SW_TAGS
> +
> +#define KASAN_RAX_RECOVER	0x20
> +#define KASAN_RAX_WRITE	0x10
> +#define KASAN_RAX_SIZE_MASK	0x0f
> +#define KASAN_RAX_SIZE(rax)	(1 << ((rax) & KASAN_RAX_SIZE_MASK))

This ABI _looks_ like it was conjured out of thin air. I assume it's
coming from the compiler. Any pointers to that ABI definition in or out
of the kernel would be appreciated.

> +static bool kasan_handler(struct pt_regs *regs)
> +{
> +	int metadata = regs->ax;
> +	u64 addr = regs->di;
> +	u64 pc = regs->ip;
> +	bool recover = metadata & KASAN_RAX_RECOVER;
> +	bool write = metadata & KASAN_RAX_WRITE;
> +	size_t size = KASAN_RAX_SIZE(metadata);

"metadata" is exactly the same length as "regs->ax", so it seems a
little silly. Also, please use vertical alignment as a tool to make code
more readable. Isn't this much more readable?

	bool recover = regs->ax & KASAN_RAX_RECOVER;
	bool write   = regs->ax & KASAN_RAX_WRITE;
	size_t size  = KASAN_RAX_SIZE(regs->ax);
	u64 addr     = regs->di;
	u64 pc       = regs->ip;

> +	if (!IS_ENABLED(CONFIG_KASAN_INLINE))
> +		return false;
> +
> +	if (user_mode(regs))
> +		return false;
> +
> +	kasan_report((void *)addr, size, write, pc);
> +
> +	/*
> +	 * The instrumentation allows to control whether we can proceed after
> +	 * a crash was detected. This is done by passing the -recover flag to
> +	 * the compiler. Disabling recovery allows to generate more compact
> +	 * code.
> +	 *
> +	 * Unfortunately disabling recovery doesn't work for the kernel right
> +	 * now. KASAN reporting is disabled in some contexts (for example when
> +	 * the allocator accesses slab object metadata; this is controlled by
> +	 * current->kasan_depth). All these accesses are detected by the tool,
> +	 * even though the reports for them are not printed.
> +	 *
> +	 * This is something that might be fixed at some point in the future.
> +	 */

Can we please find a way to do this that doesn't copy and paste a rather
verbose comment?

What if we passed 'recover' into kasan_report() and had it do the die()?

> +	if (!recover)
> +		die("Oops - KASAN", regs, 0);
> +	return true;
> +}
> +
> +#endif
> +
>  static bool do_int3(struct pt_regs *regs)
>  {
>  	int res;
> @@ -863,6 +909,12 @@ static bool do_int3(struct pt_regs *regs)
>  	if (kprobe_int3_handler(regs))
>  		return true;
>  #endif
> +
> +#ifdef CONFIG_KASAN_SW_TAGS
> +	if (kasan_handler(regs))
> +		return true;
> +#endif
I won't get _too_ grumbly about it since there's another culprit right
above, but the "no #ifdefs in .c files" rule still applies. The right
way to do this is with a stub kasan_handler() in a header with the
#ifdef in the header.

Actually, ditto on the kasan_handler() #ifdef. I suspect it can go away
too and be replaced with an IS_ENABLED(CONFIG_KASAN_SW_TAGS) check.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag
  2025-04-04 13:14 ` [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag Maciej Wieczor-Retman
@ 2025-04-04 18:08   ` Dave Hansen
  2025-04-09 16:32     ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2025-04-04 18:08 UTC (permalink / raw)
  To: Maciej Wieczor-Retman, hpa, hch, nick.desaulniers+lkml,
	kuan-ying.lee, masahiroy, samuel.holland, mingo, corbet,
	ryabinin.a.a, guoweikang.kernel, jpoimboe, ardb,
	vincenzo.frascino, glider, kirill.shutemov, apopple, samitolvanen,
	kaleshsingh, jgross, andreyknvl, scott, tony.luck, dvyukov,
	pasha.tatashin, ziy, broonie, gatlin.newhouse, jackmanb,
	wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst
  Cc: llvm, linux-mm, linux-doc, linux-arm-kernel, linux-kbuild,
	linux-kernel, kasan-dev, x86

On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
> The problem presented here is related to NUMA systems and tag-based
> KASAN mode. Getting to it can be explained in the following points:
> 
> 	1. A new chunk is created with pcpu_create_chunk() and
> 	   vm_structs are allocated. On systems with one NUMA node only
> 	   one is allocated, but with more NUMA nodes at least a second
> 	   one will be allocated too.
> 
> 	2. chunk->base_addr is assigned the modified value of
> 	   vms[0]->addr and thus inherits the tag of this allocated
> 	   structure.
> 
> 	3. In pcpu_alloc() for each possible cpu pcpu_chunk_addr() is
> 	   executed which calculates per cpu pointers that correspond to
> 	   the vms structure addresses. The calculations are based on
> 	   adding an offset from a table to chunk->base_addr.
> 
> Here the problem presents itself since for addresses based on vms[1] and
> up, the tag will be different than the ones based on vms[0] (base_addr).
> The tag mismatch happens and an error is reported.
> 
> Unpoison all the vms[]->addr with the same tag to resolve the mismatch.

I think there's a bit too much superfluous information in there. For
instance, it's not important to talk about how or why there can be more
than one chunk, just say there _can_ be more than one.

	1. There can be more than one chunk
	2. The chunks are virtually contiguous
	3. Since they are virtually contiguous, the chunks are all
	   addressed from a single base address
	4. The base address has a tag
	5. The base address points at the first chunk and thus inherits
	   the tag of the first chunk
	6. The subsequent chunks will be accessed with the tag from the
	   first chunk
	7. Thus, the subsequent chunks need to have their tag set to
	   match that of the first chunk.

Right?

> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 54481f8c30c5..bd033b2ba383 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -613,6 +613,13 @@ static __always_inline void kasan_poison_vmalloc(const void *start,
>  		__kasan_poison_vmalloc(start, size);
>  }
>  
> +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms);
> +static __always_inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
> +{
> +	if (kasan_enabled())
> +		__kasan_unpoison_vmap_areas(vms, nr_vms);
> +}
> +
>  #else /* CONFIG_KASAN_VMALLOC */
>  
>  static inline void kasan_populate_early_vm_area_shadow(void *start,
> @@ -637,6 +644,9 @@ static inline void *kasan_unpoison_vmalloc(const void *start,
>  static inline void kasan_poison_vmalloc(const void *start, unsigned long size)
>  { }
>  
> +static inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
> +{ }
> +
>  #endif /* CONFIG_KASAN_VMALLOC */
>  
>  #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 88d1c9dcb507..9496f256bc0f 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -582,6 +582,17 @@ void __kasan_poison_vmalloc(const void *start, unsigned long size)
>  	kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
>  }
>  
> +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
> +{
> +	int area;
> +
> +	for (area = 0 ; area < nr_vms ; area++) {
> +		kasan_poison(vms[area]->addr, vms[area]->size,
> +			     arch_kasan_get_tag(vms[0]->addr), false);
> +		arch_kasan_set_tag(vms[area]->addr, arch_kasan_get_tag(vms[0]->addr));
> +	}
> +}

-ENOCOMMENTS

>  #else /* CONFIG_KASAN_VMALLOC */
>  
>  int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask)
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 61981ee1c9d2..fbd56bf8aeb2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4783,8 +4783,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  	 * non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc().
>  	 */
>  	for (area = 0; area < nr_vms; area++)
> -		vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr,
> -				vms[area]->size, KASAN_VMALLOC_PROT_NORMAL);
> +		kasan_unpoison_vmap_areas(vms, nr_vms);
>  
>  	kfree(vas);
>  	return vms;

So, the right way to do this is to refactor first, then add your changes
after. This really wants to be two patches.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] x86: Add arch specific kasan functions
  2025-04-04 16:06   ` Dave Hansen
@ 2025-04-09  7:16     ` Maciej Wieczor-Retman
  0 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09  7:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 09:06:51 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> +static inline const void *__tag_set(const void *addr, u8 tag)
>> +{
>> +	u64 __addr = (u64)addr & ~__tag_shifted(KASAN_TAG_KERNEL);
>> +	return (const void *)(__addr | __tag_shifted(tag));
>> +}
>
>This becomes a lot clearer to read if you separate out the casting from
>the logical bit manipulation. For instance:
>
>static inline const void *__tag_set(const void *__addr, u8 tag)
>{
>	u64 addr = (u64)__addr;
>
>	addr &= ~__tag_shifted(KASAN_TAG_KERNEL);
>	addr |= __tag_shifted(tag);
>
>	return (const void *)addr;
>}
>
>Also, unless there's a good reason for it, you might as well limit the
>places you need to use "__".

Thanks, the above looks better :)

>
>Now that we can read this, I think it's potentially buggy. If someone
>went and changed:
>
>#define KASAN_TAG_KERNEL	0xFF
>
>to, say:
>
>#define KASAN_TAG_KERNEL	0xAB
>
>the '&' would miss clearing bits. It works fine in the arm64 implementation:
>
>	u64 __addr = (u64)addr & ~__tag_shifted(0xff);
>
>because they've hard-coded 0xff. I _think_ that's what you actually want
>here. You don't want to mask out KASAN_TAG_KERNEL, you actually want to
>mask out *ANYTHING* in those bits.
>
>So the best thing is probably to define a KASAN_TAG_MASK that makes it
>clear which are the tag bits.

Okay, that makes more sense. KASAN_TAG_MASK already exists ((1 << TAG_WIDTH) - 1)
in include/linux/mmzone.h. I'll move it to include/linux/kasan-tags.h so it can
be included.
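
Combined with your earlier restructuring that would make it roughly (just a
sketch, assuming KASAN_TAG_MASK is visible here after the move):

	static inline const void *__tag_set(const void *__addr, u8 tag)
	{
		u64 addr = (u64)__addr;

		/* Clear all tag bits, whatever KASAN_TAG_KERNEL happens to be. */
		addr &= ~__tag_shifted(KASAN_TAG_MASK);
		addr |= __tag_shifted(tag);

		return (const void *)addr;
	}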

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions
  2025-04-04 16:42   ` Dave Hansen
@ 2025-04-09  7:36     ` Maciej Wieczor-Retman
  0 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09  7:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 09:42:55 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> +#ifdef CONFIG_KASAN_SW_TAGS
>> +#define page_to_virt(x)	({									\
>> +	__typeof__(x) __page = x;								\
>> +	void *__addr = __va(page_to_pfn((__typeof__(x))__tag_reset(__page)) << PAGE_SHIFT);	\
>> +	(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));			\
>> +})
>> +#endif
>
>Is this #ifdef needed?
>
>I thought there were stub versions of all of those tag functions. So it
>should be harmless to use this page_to_virt() implementation with or
>without KASAN. Right?
>
>I'm also confused by the implementation. This is one reason why I rather
>dislike macros. Why does this act like the type of 'x' is variable?
>Isn't it always a 'struct page *'? If so, then why all of the
>__typeof__()'s?
>
>Are struct page pointers _ever_ tagged? If they are, then doesn't
>page_to_pfn() need to handle untagging as well? If they aren't, then
>there's no reason to __tag_reset() in here.
>
>What was the thinking behind this cast:
>
>	(const void *)__addr
>
>?
>
>Are any of these casts _doing_ anything? I'm struggling to find anything
>wrong with:
>
>#define page_to_virt(x)	({
>	void *__addr = __va(page_to_pfn(__page) << PAGE_SHIFT);
>	__tag_set(__addr, page_kasan_tag(x))
>})
>
>... which made me look back at:
>
>	static inline const void *__tag_set(const void *addr, u8 tag)
>
>from patch 3. I don't think the 'const' makes any sense on the return
>value here. Surely the memory pointed at by a tagged pointer doesn't
>need to be const. Why should the tag setting function be returning a
>const pointer?
>
>I can see why it would *take* a const pointer since it's not modifying
>the memory, but I don't see why it is returning one.
>

Right, yes, both your page_to_virt() and removing the const from the
__tag_set() return value seem to be working just fine. Thanks for pointing it
out.

With the macros I was trying to do what the arm64 implementation does, assuming
it had some significance that wasn't written down anywhere.

I recall that the lack of the const was giving me a compilation error a while
back, but now it's gone. And I didn't think much of it since arm64 has the same
thing.
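
For the next version I'll probably go with your simplified variant then,
something like (sketch, with the leftover __page dropped):

	#define page_to_virt(x)	({						\
		void *__addr = __va(page_to_pfn(x) << PAGE_SHIFT);		\
		(void *)__tag_set(__addr, page_kasan_tag(x));			\
	})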

If Andrey Konovalov is reading this: is there a reason the __tag_set() on arm64
returns a const pointer? And the page_to_virt() does the __typeof__()?

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte
  2025-04-04 16:56   ` Dave Hansen
@ 2025-04-09  7:49     ` Maciej Wieczor-Retman
  0 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09  7:49 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 09:56:31 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> +		if (__pa(p4d) != (pgtable_l5_enabled() ?
>> +				  (unsigned long)pgd_val(*pgd) & PTE_PFN_MASK :
>> +				  __pa(pgd)))
>>  			printk(KERN_ERR "PAGETABLE BUG #00! %p <-> %p\n",
>
>This one is pretty fugly. But I guess it's just one place and it
>probably isn't worth refactoring this and the other helpers just for a
>debug message.

I was trying to think of some prettier way to open code it but this seemed like
the simplest one.

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/14] x86: Minimal SLAB alignment
  2025-04-04 16:59   ` Dave Hansen
@ 2025-04-09 12:49     ` Maciej Wieczor-Retman
  2025-04-09 15:24       ` Dave Hansen
  0 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09 12:49 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 09:59:49 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> Adjust x86 minimal SLAB alignment to match KASAN granularity size. In
>> tag-based mode the size changes to 16 bytes so the value needs to be 16.
>
>I feel like we need a _bit_ of a discussion of the impact here. We are,
>after all, trying to get this feature into shape so that it can be used
>more widely outside of just debugging environments.
>
>What's the impact of this in a production environment?

I tried booting Fedora 41 on a Sierra Forest system with KASAN reports disabled
(only the reports, so that the 8-byte alignment doesn't cause a wall of debug
information). I did so for both the 8-byte alignment (default) and the 16-byte
alignment (added by the series).

The differences looked mostly like noise: sometimes the higher alignment would
use up a little bit less memory, sometimes a little bit more. I looked at all
the values in "cat /proc/meminfo".

Is there some slab/slub benchmark for the kernel that would make sense to
check out here?

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] x86: Update the KASAN non-canonical hook
  2025-04-04 17:37   ` Dave Hansen
@ 2025-04-09 14:34     ` Maciej Wieczor-Retman
  2025-04-09 18:29       ` Dave Hansen
  0 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09 14:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 10:37:53 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> The kasan_non_canonical_hook() is useful in pointing out that an address
>> which caused some kind of error could be the result of
>> kasan_mem_to_shadow() mapping. Currently it's called only in the general
>> protection handler code path but can give helpful information also in
>> page fault oops reports.
>> 
>> For example consider a page fault for address 0xffdefc0000000000 on a
>> 5-level paging system. It could have been accessed from KASAN's
>> kasan_mem_to_shadow() called on 0xfef0000000000000 address. Without the
>> kasan_non_canonical_hook() in the page fault case it might be hard to
>> figure out why an error occurred.
>> 
>> Add kasan_non_canonical_hook() to the beginning of show_fault_oops().
>> 
>> Update kasan_non_canonical_hook() to take into account the possible
>> memory to shadow mappings in the software tag-based mode of x86.
>> 
>> Patch was tested with positive results by accessing the following
>> addresses, causing #GPs and #PFs.
>> 
>> Valid mappings (showing kasan_non_canonical_hook() message):
>> 	0xFFFFFFFF8FFFFFFF
>> 	0xFEF0000000000000
>> 	0x7FFFFF4FFFFFFFFF
>> 	0x7EF0000000000000
>> Invalid mappings (not showing kasan_non_canonical_hook() message):
>> 	0xFFFFFFFFF8FFFFFF
>> 	0xFFBFFC0000000000
>> 	0x07EFFC0000000000
>> 	0x000E000000000000
>> 
>> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
>> ---
>> Changelog v3:
>> - Move the report.c part from first patch in the series to this new
>>   patch to have x86 changes in one place.
>> - Add the call in fault oops.
>> - Extend the comment in report.c with a graphical representation of what
>>   addresses are valid and invalid in memory to shadow mapping.
>> 
>>  arch/x86/mm/fault.c |  2 ++
>>  mm/kasan/report.c   | 36 +++++++++++++++++++++++++++++++++++-
>>  2 files changed, 37 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 697432f63c59..16366af60ae5 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -511,6 +511,8 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
>>  	if (!oops_may_print())
>>  		return;
>>  
>> +	kasan_non_canonical_hook(address);
>> +
>>  	if (error_code & X86_PF_INSTR) {
>>  		unsigned int level;
>>  		bool nx, rw;
>> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
>> index f24f11cc644a..135307c93c2c 100644
>> --- a/mm/kasan/report.c
>> +++ b/mm/kasan/report.c
>> @@ -700,7 +700,7 @@ void kasan_non_canonical_hook(unsigned long addr)
>>  	 * operation would overflow only for some memory addresses. However, due
>>  	 * to the chosen KASAN_SHADOW_OFFSET values and the fact the
>>  	 * kasan_mem_to_shadow() only operates on pointers with the tag reset,
>> -	 * the overflow always happens.
>> +	 * the overflow always happens (for both x86 and arm64).
>>  	 *
>>  	 * For arm64, the top byte of the pointer gets reset to 0xFF. Thus, the
>>  	 * possible shadow addresses belong to a region that is the result of
>> @@ -715,6 +715,40 @@ void kasan_non_canonical_hook(unsigned long addr)
>>  			return;
>>  	}
>>  
>> +	 /*
>> +	  * For x86-64, only the pointer bits [62:57] get reset, and bits #63
>> +	  * and #56 can be 0 or 1. Thus, kasan_mem_to_shadow() can be possibly
>> +	  * applied to two regions of memory:
>> +	  * [0x7E00000000000000, 0x7FFFFFFFFFFFFFFF] and
>> +	  * [0xFE00000000000000, 0xFFFFFFFFFFFFFFFF]. As the overflow happens
>> +	  * for both ends of both memory ranges, both possible shadow regions
>> +	  * are contiguous.
>> +	  *
>> +	  * Given the KASAN_SHADOW_OFFSET equal to 0xffeffc0000000000, the
>> +	  * following ranges are valid mem-to-shadow mappings:
>> +	  *
>> +	  * 0xFFFFFFFFFFFFFFFF
>> +	  *         INVALID
>> +	  * 0xFFEFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL)
>> +	  *         VALID   - kasan shadow mem
>> +	  *         VALID   - non-canonical kernel virtual address
>> +	  * 0xFFCFFC0000000000 - kasan_mem_to_shadow(0xFEUL << 56)
>> +	  *         INVALID
>> +	  * 0x07EFFBFFFFFFFFFF - kasan_mem_to_shadow(~0UL >> 1)
>> +	  *         VALID   - non-canonical user virtual addresses
>> +	  *         VALID   - user addresses
>> +	  * 0x07CFFC0000000000 - kasan_mem_to_shadow(0x7EUL << 56)
>> +	  *         INVALID
>> +	  * 0x0000000000000000
>> +	  */
>> +	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS) && IS_ENABLED(CONFIG_X86_64)) {
>
>One overall comment on this series: there's a lot of unnecessary
>complexity. Case in point:
>
>	config ADDRESS_MASKING
>	        depends on X86_64
>
>and
>
>	select HAVE_ARCH_KASAN_SW_TAGS		if ADDRESS_MASKING
>
>and
>
>	config KASAN_SW_TAGS
>        	depends on HAVE_ARCH_KASAN_SW_TAGS ...
>
>
>So you can't have CONFIG_KASAN_SW_TAGS set without CONFIG_X86_64.

This line was there so the check only runs on x86 and not on arm64. So yeah, it
looks redundant, but if we remove the first condition we might get some bogus
info in generic mode, and if we remove the second condition people on arm64
might get some weird output.

If you still think this needs to get separated somehow, then perhaps moving all
the CONFIG_KASAN_SW_TAGS code to a separate function might do it? Then there
would only be something like if (x86) ... else (arm64).
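
E.g. something like this (just a sketch, the helper name is made up; the arm64
range is the single one already handled in the existing code above):

	static bool addr_in_sw_tags_shadow(unsigned long addr)
	{
		if (IS_ENABLED(CONFIG_X86_64))
			return (addr >= (u64)kasan_mem_to_shadow((void *)(0x7EUL << 56)) &&
				addr <= (u64)kasan_mem_to_shadow((void *)(~0UL >> 1))) ||
			       (addr >= (u64)kasan_mem_to_shadow((void *)(0xFEUL << 56)) &&
				addr <= (u64)kasan_mem_to_shadow((void *)(~0UL)));

		return addr >= (u64)kasan_mem_to_shadow((void *)(0xFFUL << 56)) &&
		       addr <= (u64)kasan_mem_to_shadow((void *)(~0UL));
	}

and then the early return in kasan_non_canonical_hook() would boil down to:

	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS) && !addr_in_sw_tags_shadow(addr))
		return;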

>
>> +		if ((addr < (u64)kasan_mem_to_shadow((void *)(0x7EUL << 56)) ||
>> +		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL >> 1))) &&
>> +		    (addr < (u64)kasan_mem_to_shadow((void *)(0xFEUL << 56)) ||
>> +		     addr > (u64)kasan_mem_to_shadow((void *)(~0UL))))
>> +			return;
>> +	}
>This isn't looking great.
>
>I'd much rather have those kasan_mem_to_shadow() arguments be built up
>programmatically.
>
>I'm also not following the description of where these ranges come from:
>
>	[0x7E00000000000000, 0x7FFFFFFFFFFFFFFF]
>	[0xFE00000000000000, 0xFFFFFFFFFFFFFFFF]
>
>I obviously recognize the top kernel and top userspace addresses, but
>there do 0x7E... and 0xFE... come from? Is that because both of them
>only have 56 actual bits of address space?

We want to cover the whole linear address range that can map to shadow memory.
These ranges are due to the compiler masking out the tag bits (in this case the
6 LAM bits). So after masking out bits [62:57], we get these two contiguous
ranges. All the other addresses are invalid mappings when using
kasan_mem_to_shadow().

We had a long exchange with Andrey on how to cover all the edge cases. Perhaps
this message summarizes it best [1].

>
>Wouldn't we be better off writing that as, say:
>
>#define HIGHEST_KER_ADDR (void *)0xFFFFFFFFFFFFFFFF
>// ^ we probably have some macro for that already
>#define LOWEST_KERN_ADDR (void *)(HIGHEST_KERNEL_ADDRESS - \
>					(1<<56) + 1)
>// ^ or can this be calculated by tag manipulation?
>
>which yields:
>
>   void *_addr = (u64)addr;
>   ...
>
>   in_kern_shadow = (_addr >= kasan_mem_to_shadow(LOWEST_KERN_ADDR) ||
>		    (_addr <= kasan_mem_to_shadow(HIGHEST_KERN_ADDR);
>   in_user_shadow = (_addr >= kasan_mem_to_shadow(LOWEST_USER_ADDR) ||
>		    (_addr <= kasan_mem_to_shadow(HIGHEST_USER_ADDR);
>
>   if (!in_kern_shadow &&
>       !in_user_shadow)
>	return;
>
>I _think_ that's the same logic you have. Isn't it slightly more readable?

Yes, I like it more than just generating the addresses in the parentheses. What
do you think about this naming? KASAN prefix and [k/u]addr since it's not really
the lowest/highest address in the whole LA, just in this KASAN compiler scheme.
And I changed 1<<56 to 2<<56 so it generates 0xFE00000000000000 instead of
0xFF00000000000000.

	#define KASAN_HIGHEST_KADDR (void *)0xFFFFFFFFFFFFFFFF
	#define KASAN_LOWEST_KADDR (void *)(KASAN_HIGHEST_KADDR - \
						(2<<56) + 1)
	#define KASAN_HIGHEST_UADDR (void *)0x7FFFFFFFFFFFFFFF
	#define KASAN_LOWEST_UADDR (void *)(KASAN_HIGHEST_UADDR - \
						(2<<56) + 1)
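
With those defines the check itself would stay basically what's in the patch,
just with the names substituted:

	if ((addr < (u64)kasan_mem_to_shadow(KASAN_LOWEST_UADDR) ||
	     addr > (u64)kasan_mem_to_shadow(KASAN_HIGHEST_UADDR)) &&
	    (addr < (u64)kasan_mem_to_shadow(KASAN_LOWEST_KADDR) ||
	     addr > (u64)kasan_mem_to_shadow(KASAN_HIGHEST_KADDR)))
		return;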

[1] https://lore.kernel.org/all/CA+fCnZdUFO0+G9HHy4oaQfEx8sm3D_ZfxdkH3y2ZojjYqTN74Q@mail.gmail.com/

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports
  2025-04-04 17:55   ` Dave Hansen
@ 2025-04-09 14:48     ` Maciej Wieczor-Retman
  0 siblings, 0 replies; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09 14:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 10:55:09 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> When a tag mismatch happens in inline software tag-based KASAN on x86 an
>> int3 instruction is executed and needs proper handling.
>
>Does this mean "inline software"? Or "inline" functions? I'm not quite
>parsing that. I think it needs some more background.

Both software KASAN modes (generic and tag-based) have an inline and an outline
variant. So I was referring to the inline variant of the software tag-based
mode. I'm mentioning "software" since there is also the "hardware" mode.

>
>> Call kasan_report() from the int3 handler and pass down the proper
>> information from registers - RDI should contain the problematic address
>> and RAX other metadata.
>> 
>> Also early return from the int3 selftest if inline KASAN is enabled
>> since it will cause a kernel panic otherwise.
>...
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index bf82c6f7d690..ba277a25b57f 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -1979,6 +1979,9 @@ static noinline void __init int3_selftest(void)
>>  	};
>>  	unsigned int val = 0;
>>  
>> +	if (IS_ENABLED(CONFIG_KASAN_INLINE))
>> +		return;
>
>Comments, please. This is a total non sequitur otherwise.

Sure, will add.
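
Something along these lines, maybe:

	/*
	 * With inline software tag-based KASAN, do_int3() hands the trap to
	 * kasan_handler() before the selftest's die notifier runs, so the
	 * selftest's int3 would be misread as a tag mismatch and panic the
	 * kernel. Skip the selftest in that configuration.
	 */
	if (IS_ENABLED(CONFIG_KASAN_INLINE))
		return;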

>
>>  	BUG_ON(register_die_notifier(&int3_exception_nb));
>>  
>>  	/*
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index 9f88b8a78e50..32c81fc2d439 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>...
>> @@ -849,6 +850,51 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>>  	cond_local_irq_disable(regs);
>>  }
>>  
>> +#ifdef CONFIG_KASAN_SW_TAGS
>> +
>> +#define KASAN_RAX_RECOVER	0x20
>> +#define KASAN_RAX_WRITE	0x10
>> +#define KASAN_RAX_SIZE_MASK	0x0f
>> +#define KASAN_RAX_SIZE(rax)	(1 << ((rax) & KASAN_RAX_SIZE_MASK))
>
>This ABI _looks_ like it was conjured out out of thin air. I assume it's
>coming from the compiler. Any pointers to that ABI definition in or out
>of the kernel would be appreciated.

I'll put a comment that it's related to the compiler ABI and I'll add a link to
the relevant compiler file in the patch message.

>
>> +static bool kasan_handler(struct pt_regs *regs)
>> +{
>> +	int metadata = regs->ax;
>> +	u64 addr = regs->di;
>> +	u64 pc = regs->ip;
>> +	bool recover = metadata & KASAN_RAX_RECOVER;
>> +	bool write = metadata & KASAN_RAX_WRITE;
>> +	size_t size = KASAN_RAX_SIZE(metadata);
>
>"metadata" is exactly the same length as "regs->ax", so it seems a
>little silly. Also, please use vertical alignment as a tool to make code
>more readable. Isn't this much more readable?
>
>	bool recover = regs->ax & KASAN_RAX_RECOVER;
>	bool write   = regs->ax & KASAN_RAX_WRITE;
>	size_t size  = KASAN_RAX_SIZE(regs->ax);
>	u64 addr     = regs->di;
>	u64 pc       = regs->ip;
>

Thanks, I'll apply this.

>> +	if (!IS_ENABLED(CONFIG_KASAN_INLINE))
>> +		return false;
>> +
>> +	if (user_mode(regs))
>> +		return false;
>> +
>> +	kasan_report((void *)addr, size, write, pc);
>> +
>> +	/*
>> +	 * The instrumentation allows to control whether we can proceed after
>> +	 * a crash was detected. This is done by passing the -recover flag to
>> +	 * the compiler. Disabling recovery allows to generate more compact
>> +	 * code.
>> +	 *
>> +	 * Unfortunately disabling recovery doesn't work for the kernel right
>> +	 * now. KASAN reporting is disabled in some contexts (for example when
>> +	 * the allocator accesses slab object metadata; this is controlled by
>> +	 * current->kasan_depth). All these accesses are detected by the tool,
>> +	 * even though the reports for them are not printed.
>> +	 *
>> +	 * This is something that might be fixed at some point in the future.
>> +	 */
>
>Can we please find a way to do this that doesn't copy and paste a rather
>verbose comment?
>
>What if we passed 'recover' into kasan_report() and had it do the die()?

If that doesn't somehow conflict with how kasan_report() is envisioned to work,
I think it's a good idea. RISC-V will soon add this too, I imagine, so otherwise
the comment would end up copied in three places.

>
>> +	if (!recover)
>> +		die("Oops - KASAN", regs, 0);
>> +	return true;
>> +}
>> +
>> +#endif
>> +
>>  static bool do_int3(struct pt_regs *regs)
>>  {
>>  	int res;
>> @@ -863,6 +909,12 @@ static bool do_int3(struct pt_regs *regs)
>>  	if (kprobe_int3_handler(regs))
>>  		return true;
>>  #endif
>> +
>> +#ifdef CONFIG_KASAN_SW_TAGS
>> +	if (kasan_handler(regs))
>> +		return true;
>> +#endif
>I won't get _too_ grumbly about ti since there's another culprit right
>above, but the "no #fidefs in .c files" rule still applies. The right
>way to do this is with a stub kasan_handler() in a header with the
>#ifdef in the header.
>
>Actually, ditto on the kasan_handler() #ifdef. I suspect it can go away
>too and be replaced with a IS_ENABLED(CONFIG_KASAN_SW_TAGS) check.

Okay, thanks for pointing it out, I'll add the stub and IS_ENABLED().
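
So roughly (sketch), in a header:

	#ifdef CONFIG_KASAN_SW_TAGS
	bool kasan_handler(struct pt_regs *regs);
	#else
	static inline bool kasan_handler(struct pt_regs *regs)
	{
		return false;
	}
	#endif

and do_int3() can then call kasan_handler(regs) unconditionally.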

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/14] x86: Minimal SLAB alignment
  2025-04-09 12:49     ` Maciej Wieczor-Retman
@ 2025-04-09 15:24       ` Dave Hansen
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2025-04-09 15:24 UTC (permalink / raw)
  To: Maciej Wieczor-Retman
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 4/9/25 05:49, Maciej Wieczor-Retman wrote:
> The differences looked mostly like noise, sometimes the higher alignment would
> use up a little bit less memory, sometimes a little bit more. I looked at all
> values in "cat /proc/meminfo".
> 
> Is there some slab/slub benchmark for the kernel that would make sense to
> checkout here?

You don't need to benchmark anything. Just mention that it will waste
memory and also give *some* ballpark estimate on how much. Just looking
at your laptop's /proc/slabinfo would be a good start.
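
For instance (back-of-the-envelope, the real numbers will obviously vary): if
slabinfo shows ~100k live kmalloc-8 objects, each of those growing from 8 to 16
bytes is roughly 800kB of extra memory from that one cache alone, plus whatever
the other sub-16-byte objects contribute.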

Oh, and it wouldn't hurt to find out when and why the minimal slab
alignment got dropped down to 8 bytes. I _thought_ it was higher at some
point. Presumably there was a good reason for it and you're now undoing
part of it.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag
  2025-04-04 18:08   ` Dave Hansen
@ 2025-04-09 16:32     ` Maciej Wieczor-Retman
  2025-04-09 17:12       ` Dave Hansen
  0 siblings, 1 reply; 32+ messages in thread
From: Maciej Wieczor-Retman @ 2025-04-09 16:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 2025-04-04 at 11:08:12 -0700, Dave Hansen wrote:
>On 4/4/25 06:14, Maciej Wieczor-Retman wrote:
>> The problem presented here is related to NUMA systems and tag-based
>> KASAN mode. Getting to it can be explained in the following points:
>> 
>> 	1. A new chunk is created with pcpu_create_chunk() and
>> 	   vm_structs are allocated. On systems with one NUMA node only
>> 	   one is allocated, but with more NUMA nodes at least a second
>> 	   one will be allocated too.
>> 
>> 	2. chunk->base_addr is assigned the modified value of
>> 	   vms[0]->addr and thus inherits the tag of this allocated
>> 	   structure.
>> 
>> 	3. In pcpu_alloc() for each possible cpu pcpu_chunk_addr() is
>> 	   executed which calculates per cpu pointers that correspond to
>> 	   the vms structure addresses. The calculations are based on
>> 	   adding an offset from a table to chunk->base_addr.
>> 
>> Here the problem presents itself since for addresses based on vms[1] and
>> up, the tag will be different than the ones based on vms[0] (base_addr).
>> The tag mismatch happens and an error is reported.
>> 
>> Unpoison all the vms[]->addr with the same tag to resolve the mismatch.
>
>I think there's a bit too much superfluous information in there. For
>instance, it's not important to talk about how or why there can be more
>than one chunk, just say there _can_ be more than one.
>
>	1. There can be more than one chunk
>	2. The chunks are virtually contiguous
>	3. Since they are virtually contiguous, the chunks are all
>	   addressed from a single base address
>	4. The base address has a tag
>	5. The base address points at the first chunk and thus inherits
>	   the tag of the first chunk
>	6. The subsequent chunks will be accessed with the tag from the
>	   first chunk
>	7. Thus, the subsequent chunks need to have their tag set to
>	   match that of the first chunk.
>
>Right?

They don't seem to be virtually contiguous. At least from testing on a live
system, QEMU, and Simics, I never saw any of them be contiguous. And I
double-checked today too. But your version is nice, I'll just drop points 2 and
3 and I think it will still make sense, right?

>
>> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
>> index 54481f8c30c5..bd033b2ba383 100644
>> --- a/include/linux/kasan.h
>> +++ b/include/linux/kasan.h
>> @@ -613,6 +613,13 @@ static __always_inline void kasan_poison_vmalloc(const void *start,
>>  		__kasan_poison_vmalloc(start, size);
>>  }
>>  
>> +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms);
>> +static __always_inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
>> +{
>> +	if (kasan_enabled())
>> +		__kasan_unpoison_vmap_areas(vms, nr_vms);
>> +}
>> +
>>  #else /* CONFIG_KASAN_VMALLOC */
>>  
>>  static inline void kasan_populate_early_vm_area_shadow(void *start,
>> @@ -637,6 +644,9 @@ static inline void *kasan_unpoison_vmalloc(const void *start,
>>  static inline void kasan_poison_vmalloc(const void *start, unsigned long size)
>>  { }
>>  
>> +static inline void kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
>> +{ }
>> +
>>  #endif /* CONFIG_KASAN_VMALLOC */
>>  
>>  #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
>> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
>> index 88d1c9dcb507..9496f256bc0f 100644
>> --- a/mm/kasan/shadow.c
>> +++ b/mm/kasan/shadow.c
>> @@ -582,6 +582,17 @@ void __kasan_poison_vmalloc(const void *start, unsigned long size)
>>  	kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
>>  }
>>  
>> +void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
>> +{
>> +	int area;
>> +
>> +	for (area = 0 ; area < nr_vms ; area++) {
>> +		kasan_poison(vms[area]->addr, vms[area]->size,
>> +			     arch_kasan_get_tag(vms[0]->addr), false);
>> +		arch_kasan_set_tag(vms[area]->addr, arch_kasan_get_tag(vms[0]->addr));
>> +	}
>> +}
>
>-ENOCOMMENTS

Right, I'll add a description why that's needed.
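
Something along these lines, maybe:

	/*
	 * All per-cpu addresses in a chunk are derived from chunk->base_addr,
	 * which inherits its tag from vms[0]->addr. Give the shadow and the
	 * addresses of the remaining areas that same tag, so pointers derived
	 * from base_addr don't trip tag mismatches on the other areas.
	 */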

>
>>  #else /* CONFIG_KASAN_VMALLOC */
>>  
>>  int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask)
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 61981ee1c9d2..fbd56bf8aeb2 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -4783,8 +4783,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>>  	 * non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc().
>>  	 */
>>  	for (area = 0; area < nr_vms; area++)
>> -		vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr,
>> -				vms[area]->size, KASAN_VMALLOC_PROT_NORMAL);
>> +		kasan_unpoison_vmap_areas(vms, nr_vms);
>>  
>>  	kfree(vas);
>>  	return vms;
>
>So, the right way to do this is refactor, first, then add your changes
>after. This really wants to be two patches.

Sure, I'll try splitting it.
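
So the first patch would basically just move the existing loop into the new
helper (sketch, no functional change):

	void __kasan_unpoison_vmap_areas(struct vm_struct **vms, int nr_vms)
	{
		int area;

		for (area = 0; area < nr_vms; area++)
			vms[area]->addr = kasan_unpoison_vmalloc(vms[area]->addr,
					vms[area]->size, KASAN_VMALLOC_PROT_NORMAL);
	}

and pcpu_get_vm_areas() would switch to calling it, with the tag propagation
change as a separate patch on top.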

-- 
Kind regards
Maciej Wieczór-Retman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag
  2025-04-09 16:32     ` Maciej Wieczor-Retman
@ 2025-04-09 17:12       ` Dave Hansen
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2025-04-09 17:12 UTC (permalink / raw)
  To: Maciej Wieczor-Retman
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 4/9/25 09:32, Maciej Wieczor-Retman wrote:
> They don't seem to be virtuall contiguous. At least from testing on a live
> system, QEMU and Simics I never saw any be contiguous. And I double checked
> today too. But your version is nice, I'll just drop 2 and 3 and I think it still
> will make sense, right?

Yep, it still makes sense.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 10/14] x86: Update the KASAN non-canonical hook
  2025-04-09 14:34     ` Maciej Wieczor-Retman
@ 2025-04-09 18:29       ` Dave Hansen
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2025-04-09 18:29 UTC (permalink / raw)
  To: Maciej Wieczor-Retman
  Cc: hpa, hch, nick.desaulniers+lkml, kuan-ying.lee, masahiroy,
	samuel.holland, mingo, corbet, ryabinin.a.a, guoweikang.kernel,
	jpoimboe, ardb, vincenzo.frascino, glider, kirill.shutemov,
	apopple, samitolvanen, kaleshsingh, jgross, andreyknvl, scott,
	tony.luck, dvyukov, pasha.tatashin, ziy, broonie, gatlin.newhouse,
	jackmanb, wangkefeng.wang, thiago.bauermann, tglx, kees, akpm,
	jason.andryuk, snovitoll, xin, jan.kiszka, bp, rppt, peterz,
	pankaj.gupta, thuth, andriy.shevchenko, joel.granados, kbingham,
	nicolas, mark.rutland, surenb, catalin.marinas, morbo,
	justinstitt, ubizjak, jhubbard, urezki, dave.hansen, bhe, luto,
	baohua, nathan, will, brgerst, llvm, linux-mm, linux-doc,
	linux-arm-kernel, linux-kbuild, linux-kernel, kasan-dev, x86

On 4/9/25 07:34, Maciej Wieczor-Retman wrote:
> Yes, I like it more than just generating the addresses in the parentheses. What
> do you think about this naming? KASAN prefix and [k/u]addr since it's not really
> the lowest/highest address in the whole LA, just in this KASAN compiler scheme.
> And I changed 1<<56 to 2<<56 so it generates 0xFE00000000000000 instead of
> 0xFF00000000000000.
> 
> 	#define KASAN_HIGHEST_KADDR (void *)0xFFFFFFFFFFFFFFFF
> 	#define KASAN_LOWEST_KADDR (void *)(KASAN_HIGHEST_KADDR - \
> 						(2<<56) + 1)
> 	#define KASAN_HIGHEST_UADDR (void *)0x7FFFFFFFFFFFFFFF
> 	#define KASAN_LOWEST_UADDR (void *)(KASAN_HIGHEST_UADDR - \
> 						(2<<56) + 1)

Yes, that is much better.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-04-09 18:29 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-04 13:14 [PATCH v3 00/14] kasan: x86: arm64: KASAN tag-based mode for x86 Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 01/14] kasan: sw_tags: Use arithmetic shift for shadow computation Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 02/14] kasan: sw_tags: Support tag widths less than 8 bits Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 03/14] x86: Add arch specific kasan functions Maciej Wieczor-Retman
2025-04-04 16:06   ` Dave Hansen
2025-04-09  7:16     ` Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 04/14] kasan: arm64: x86: Make special tags arch specific Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 05/14] x86: Reset tag for virtual to physical address conversions Maciej Wieczor-Retman
2025-04-04 16:42   ` Dave Hansen
2025-04-09  7:36     ` Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 06/14] x86: Physical address comparisons in fill_p*d/pte Maciej Wieczor-Retman
2025-04-04 16:56   ` Dave Hansen
2025-04-09  7:49     ` Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 07/14] x86: KASAN raw shadow memory PTE init Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 08/14] x86: LAM initialization Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 09/14] x86: Minimal SLAB alignment Maciej Wieczor-Retman
2025-04-04 16:59   ` Dave Hansen
2025-04-09 12:49     ` Maciej Wieczor-Retman
2025-04-09 15:24       ` Dave Hansen
2025-04-04 13:14 ` [PATCH v3 10/14] x86: Update the KASAN non-canonical hook Maciej Wieczor-Retman
2025-04-04 17:37   ` Dave Hansen
2025-04-09 14:34     ` Maciej Wieczor-Retman
2025-04-09 18:29       ` Dave Hansen
2025-04-04 13:14 ` [PATCH v3 11/14] x86: Handle int3 for inline KASAN reports Maciej Wieczor-Retman
2025-04-04 17:55   ` Dave Hansen
2025-04-09 14:48     ` Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 12/14] kasan: Fix inline mode for x86 tag-based mode Maciej Wieczor-Retman
2025-04-04 13:14 ` [PATCH v3 13/14] mm: Unpoison pcpu chunks with base address tag Maciej Wieczor-Retman
2025-04-04 18:08   ` Dave Hansen
2025-04-09 16:32     ` Maciej Wieczor-Retman
2025-04-09 17:12       ` Dave Hansen
2025-04-04 13:14 ` [PATCH v3 14/14] x86: Make software tag-based kasan available Maciej Wieczor-Retman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).