linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes
@ 2025-07-13  7:17 Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 1/8] execmem: drop unused execmem_update_copy() Mike Rapoport
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

These patches enable use of EXECMEM_ROX_CACHE for ftrace and kprobes
allocations on x86.

They also include some groundwork in execmem.

Since the execmem model for caching large ROX pages changed from the
initial assumption that the memory that is allocated from ROX cache is
always ROX to the current state where memory can be temporarily made RW and
then restored to ROX, we can stop using text poking to update it. This also
saves the hassle of trying to lock text_mutex in execmem_cache_free() when
kprobes already hold that mutex.

Patches 1-6 update and clean up execmem ROX cache management,
patch 7 enables EXECMEM_ROX_CACHE for kprobes and
patch 8 enables EXECMEM_ROX_CACHE for ftrace.

The patches are also available at git:

https://git.kernel.org/rppt/h/execmem/x86-rox/ftrace%2bkprobes/v3

v3:
* Fix spelling (Petr)
* Add ack and review tags, thanks all!

v2: https://lore.kernel.org/all/20250709134933.3848895-1-rppt@kernel.org
* Fix setting and clearing pending_free for an area (Yann)
* Reorder execmem_cache_free() to avoid error goto (Peter)
* Add comment why mas_store_gfp() cannot fail in execmem_cache_free() (Peter)


Mike Rapoport (Microsoft) (8):
  execmem: drop unused execmem_update_copy()
  execmem: introduce execmem_alloc_rw()
  execmem: rework execmem_cache_free()
  execmem: move execmem_force_rw() and execmem_restore_rox() before use
  execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
  execmem: drop writable parameter from execmem_fill_trapping_insns()
  x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
  x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations

 arch/x86/kernel/alternative.c  |   3 +-
 arch/x86/kernel/ftrace.c       |   2 +-
 arch/x86/kernel/kprobes/core.c |  18 ---
 arch/x86/mm/init.c             |  24 ++--
 include/linux/execmem.h        |  54 ++++-----
 kernel/module/main.c           |  13 +--
 mm/execmem.c                   | 198 +++++++++++++++++++++++++--------
 7 files changed, 194 insertions(+), 118 deletions(-)


base-commit: 86731a2a651e58953fc949573895f2fa6d456841
--
2.47.2


* [PATCH v3 1/8] execmem: drop unused execmem_update_copy()
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 2/8] execmem: introduce execmem_alloc_rw() Mike Rapoport
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

The execmem_update_copy() that used text poking was required when memory
allocated from ROX cache was always read-only. Since now its permissions
can be switched to read-write, there is no need for a function that updates
memory with text poking.

Remove it.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 include/linux/execmem.h | 13 -------------
 mm/execmem.c            |  5 -----
 2 files changed, 18 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 3be35680a54f..734fbe83d98e 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -185,19 +185,6 @@ DEFINE_FREE(execmem, void *, if (_T) execmem_free(_T));
 struct vm_struct *execmem_vmap(size_t size);
 #endif
 
-/**
- * execmem_update_copy - copy an update to executable memory
- * @dst:  destination address to update
- * @src:  source address containing the data
- * @size: how many bytes of memory shold be copied
- *
- * Copy @size bytes from @src to @dst using text poking if the memory at
- * @dst is read-only.
- *
- * Return: a pointer to @dst or NULL on error
- */
-void *execmem_update_copy(void *dst, const void *src, size_t size);
-
 /**
  * execmem_is_rox - check if execmem is read-only
  * @type - the execmem type to check
diff --git a/mm/execmem.c b/mm/execmem.c
index 2b683e7d864d..0712ebb4eb77 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -399,11 +399,6 @@ void execmem_free(void *ptr)
 		vfree(ptr);
 }
 
-void *execmem_update_copy(void *dst, const void *src, size_t size)
-{
-	return text_poke_copy(dst, src, size);
-}
-
 bool execmem_is_rox(enum execmem_type type)
 {
 	return !!(execmem_info->ranges[type].flags & EXECMEM_ROX_CACHE);
-- 
2.47.2



* [PATCH v3 2/8] execmem: introduce execmem_alloc_rw()
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 1/8] execmem: drop unused execmem_update_copy() Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 3/8] execmem: rework execmem_cache_free() Mike Rapoport
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Some callers of execmem_alloc() require the memory to be temporarily
writable even when it is allocated from ROX cache. These callers use
execmem_make_temp_rw() right after the call to execmem_alloc().

Wrap this sequence in execmem_alloc_rw() API.
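
The wrapper in the diff below relies on scope-based cleanup (__free() and
no_free_ptr()). A minimal userspace sketch of that pattern, assuming the
GCC/Clang cleanup attribute; auto_free(), take_ptr(), force_rw() and
alloc_rw() are hypothetical stand-ins, not the kernel helpers:

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how many buffers the cleanup handler released (demo only). */
static int nr_auto_freed;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void auto_free(void *arg)
{
	void **pp = arg;

	if (*pp) {
		free(*pp);
		nr_auto_freed++;
	}
}

#define AUTO_FREE __attribute__((cleanup(auto_free)))

/* Hand ownership to the caller and disarm the cleanup handler. */
static void *take_ptr(void **pp)
{
	void *p = *pp;

	*pp = NULL;
	return p;
}

/* Hypothetical stand-in for a permission change; fails on request. */
static int force_rw(void *p, size_t size, int fail)
{
	(void)p;
	(void)size;
	return fail ? -1 : 0;
}

/* Allocate, then "make writable"; the buffer is freed automatically on error. */
static void *alloc_rw(size_t size, int fail)
{
	void *p AUTO_FREE = malloc(size);

	assert(size > 0);
	if (!p)
		return NULL;

	if (force_rw(p, size, fail))
		return NULL;	/* auto_free() releases p when it leaves scope */

	return take_ptr(&p);	/* success: caller now owns the buffer */
}
```

The point of the pattern is that the error path needs no explicit free:
callers either get a usable pointer or NULL, with nothing leaked.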

Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/kernel/alternative.c |  3 +--
 include/linux/execmem.h       | 38 ++++++++++++++++++++---------------
 kernel/module/main.c          | 13 ++----------
 mm/execmem.c                  | 27 ++++++++++++++++++++++++-
 4 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ea1d984166cd..526a5fef93ab 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -120,7 +120,7 @@ struct its_array its_pages;
 
 static void *__its_alloc(struct its_array *pages)
 {
-	void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE);
+	void *page __free(execmem) = execmem_alloc_rw(EXECMEM_MODULE_TEXT, PAGE_SIZE);
 	if (!page)
 		return NULL;
 
@@ -237,7 +237,6 @@ static void *its_alloc(void)
 	if (!page)
 		return NULL;
 
-	execmem_make_temp_rw(page, PAGE_SIZE);
 	if (pages == &its_pages)
 		set_memory_x((unsigned long)page, 1);
 
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 734fbe83d98e..8b61b05da7d5 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -67,21 +67,6 @@ enum execmem_range_flags {
  */
 void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable);
 
-/**
- * execmem_make_temp_rw - temporarily remap region with read-write
- *			  permissions
- * @ptr:	address of the region to remap
- * @size:	size of the region to remap
- *
- * Remaps a part of the cached large page in the ROX cache in the range
- * [@ptr, @ptr + @size) as writable and not executable. The caller must
- * have exclusive ownership of this range and ensure nothing will try to
- * execute code in this range.
- *
- * Return: 0 on success or negative error code on failure.
- */
-int execmem_make_temp_rw(void *ptr, size_t size);
-
 /**
  * execmem_restore_rox - restore read-only-execute permissions
  * @ptr:	address of the region to remap
@@ -95,7 +80,6 @@ int execmem_make_temp_rw(void *ptr, size_t size);
  */
 int execmem_restore_rox(void *ptr, size_t size);
 #else
-static inline int execmem_make_temp_rw(void *ptr, size_t size) { return 0; }
 static inline int execmem_restore_rox(void *ptr, size_t size) { return 0; }
 #endif
 
@@ -165,6 +149,28 @@ struct execmem_info *execmem_arch_setup(void);
  */
 void *execmem_alloc(enum execmem_type type, size_t size);
 
+/**
+ * execmem_alloc_rw - allocate writable executable memory
+ * @type: type of the allocation
+ * @size: how many bytes of memory are required
+ *
+ * Allocates memory that will contain executable code, either generated or
+ * loaded from kernel modules.
+ *
+ * Allocates memory that will contain data coupled with executable code,
+ * like data sections in kernel modules.
+ *
+ * Forces writable permissions on the allocated memory and the caller is
+ * responsible to manage the permissions afterwards.
+ *
+ * For architectures that use ROX cache the permissions will be set to R+W.
+ * For architectures that don't use ROX cache the default permissions for @type
+ * will be used as they must be writable.
+ *
+ * Return: a pointer to the allocated memory or %NULL
+ */
+void *execmem_alloc_rw(enum execmem_type type, size_t size);
+
 /**
  * execmem_free - free executable memory
  * @ptr: pointer to the memory that should be freed
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 413ac6ea3702..d009326ef7bb 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1292,20 +1292,11 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 	else
 		execmem_type = EXECMEM_MODULE_TEXT;
 
-	ptr = execmem_alloc(execmem_type, size);
+	ptr = execmem_alloc_rw(execmem_type, size);
 	if (!ptr)
 		return -ENOMEM;
 
-	if (execmem_is_rox(execmem_type)) {
-		int err = execmem_make_temp_rw(ptr, size);
-
-		if (err) {
-			execmem_free(ptr);
-			return -ENOMEM;
-		}
-
-		mod->mem[type].is_rox = true;
-	}
+	mod->mem[type].is_rox = execmem_is_rox(execmem_type);
 
 	/*
 	 * The pointer to these blocks of memory are stored on the module
diff --git a/mm/execmem.c b/mm/execmem.c
index 0712ebb4eb77..6b040fbc5f4f 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -336,7 +336,7 @@ static bool execmem_cache_free(void *ptr)
 	return true;
 }
 
-int execmem_make_temp_rw(void *ptr, size_t size)
+static int execmem_force_rw(void *ptr, size_t size)
 {
 	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	unsigned long addr = (unsigned long)ptr;
@@ -358,6 +358,16 @@ int execmem_restore_rox(void *ptr, size_t size)
 }
 
 #else /* CONFIG_ARCH_HAS_EXECMEM_ROX */
+/*
+ * when ROX cache is not used the permissions defined by architectures for
+ * execmem ranges that are updated before use (e.g. EXECMEM_MODULE_TEXT) must
+ * be writable anyway
+ */
+static inline int execmem_force_rw(void *ptr, size_t size)
+{
+	return 0;
+}
+
 static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 {
 	return NULL;
@@ -387,6 +397,21 @@ void *execmem_alloc(enum execmem_type type, size_t size)
 	return kasan_reset_tag(p);
 }
 
+void *execmem_alloc_rw(enum execmem_type type, size_t size)
+{
+	void *p __free(execmem) = execmem_alloc(type, size);
+	int err;
+
+	if (!p)
+		return NULL;
+
+	err = execmem_force_rw(p, size);
+	if (err)
+		return NULL;
+
+	return no_free_ptr(p);
+}
+
 void execmem_free(void *ptr)
 {
 	/*
-- 
2.47.2



* [PATCH v3 3/8] execmem: rework execmem_cache_free()
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 1/8] execmem: drop unused execmem_update_copy() Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 2/8] execmem: introduce execmem_alloc_rw() Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 4/8] execmem: move execmem_force_rw() and execmem_restore_rox() before use Mike Rapoport
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Currently execmem_cache_free() ignores potential allocation failures that
may happen in execmem_cache_add(). Besides, it uses text poking to fill the
memory with trapping instructions before returning it to cache although it
would be more efficient to make that memory writable, update it using
memcpy and then restore ROX protection.

Rework execmem_cache_free() so that in case of an error it will defer
freeing of the memory to a delayed work.

With this the happy fast path will now change permissions to RW, fill the
memory with trapping instructions using memcpy, restore ROX permissions,
add the memory back to the free cache and clear the relevant entry in
busy_areas.

If any step in the fast path fails, the entry in busy_areas will be marked
as pending_free. These entries will be handled by a delayed work and freed
asynchronously.

To make the fast path faster, use __GFP_NORETRY for memory allocations and
let the asynchronous handler try harder with GFP_KERNEL.
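
The pending_free marking can be modelled in userspace roughly as follows.
This is only a sketch: it assumes 4 KiB pages and page-aligned entries (so
the bit below the page boundary is free to use as a tag), and it replaces
the maple tree with a small array. struct demo_cache, release_entry(),
cache_free() and cache_free_slow() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12	/* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE		((uintptr_t)1 << DEMO_PAGE_SHIFT)
#define PENDING_FREE_MASK	((uintptr_t)1 << (DEMO_PAGE_SHIFT - 1))
#define NR_SLOTS		4

/* The tag bit must sit below the page alignment of the stored addresses. */
static_assert((PENDING_FREE_MASK & ~(DEMO_PAGE_SIZE - 1)) == 0,
	      "tag bit must not overlap address bits");

static bool is_pending_free(uintptr_t e) { return e & PENDING_FREE_MASK; }
static uintptr_t pending_free_set(uintptr_t e) { return e | PENDING_FREE_MASK; }
static uintptr_t pending_free_clear(uintptr_t e) { return e & ~PENDING_FREE_MASK; }

struct demo_cache {
	uintptr_t busy[NR_SLOTS];	/* 0 means the slot is empty */
	unsigned int pending_free_cnt;
};

/* Hypothetical release step; @ok models whether every sub-step succeeded. */
static int release_entry(struct demo_cache *c, int slot, bool ok)
{
	if (!ok)
		return -1;
	c->busy[slot] = 0;
	return 0;
}

/* Fast path: try once; on failure tag the entry for the deferred handler. */
static void cache_free(struct demo_cache *c, int slot, bool ok)
{
	if (release_entry(c, slot, ok)) {
		c->busy[slot] = pending_free_set(c->busy[slot]);
		c->pending_free_cnt++;
	}
}

/* Slow path: retry tagged entries; returns how many are still pending. */
static unsigned int cache_free_slow(struct demo_cache *c, bool ok)
{
	for (int i = 0; i < NR_SLOTS && c->pending_free_cnt; i++) {
		if (!is_pending_free(c->busy[i]))
			continue;
		if (release_entry(c, i, ok))
			continue;	/* still failing, keep the tag */
		c->pending_free_cnt--;
	}
	return c->pending_free_cnt;
}
```

A non-zero return from cache_free_slow() corresponds to the case where the
delayed work would be rescheduled.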

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/execmem.c | 125 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 102 insertions(+), 23 deletions(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index 6b040fbc5f4f..4670e97f8e4e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -93,8 +93,15 @@ struct execmem_cache {
 	struct mutex mutex;
 	struct maple_tree busy_areas;
 	struct maple_tree free_areas;
+	unsigned int pending_free_cnt;	/* protected by mutex */
 };
 
+/* delay to schedule asynchronous free if fast path free fails */
+#define FREE_DELAY	(msecs_to_jiffies(10))
+
+/* mark entries in busy_areas that should be freed asynchronously */
+#define PENDING_FREE_MASK	(1 << (PAGE_SHIFT - 1))
+
 static struct execmem_cache execmem_cache = {
 	.mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
 	.busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
@@ -155,20 +162,17 @@ static void execmem_cache_clean(struct work_struct *work)
 
 static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
 
-static int execmem_cache_add(void *ptr, size_t size)
+static int execmem_cache_add_locked(void *ptr, size_t size, gfp_t gfp_mask)
 {
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
-	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, free_areas, addr - 1, addr + 1);
 	unsigned long lower, upper;
 	void *area = NULL;
-	int err;
 
 	lower = addr;
 	upper = addr + size - 1;
 
-	mutex_lock(mutex);
 	area = mas_walk(&mas);
 	if (area && mas.last == addr - 1)
 		lower = mas.index;
@@ -178,12 +182,14 @@ static int execmem_cache_add(void *ptr, size_t size)
 		upper = mas.last;
 
 	mas_set_range(&mas, lower, upper);
-	err = mas_store_gfp(&mas, (void *)lower, GFP_KERNEL);
-	mutex_unlock(mutex);
-	if (err)
-		return err;
+	return mas_store_gfp(&mas, (void *)lower, gfp_mask);
+}
 
-	return 0;
+static int execmem_cache_add(void *ptr, size_t size, gfp_t gfp_mask)
+{
+	guard(mutex)(&execmem_cache.mutex);
+
+	return execmem_cache_add_locked(ptr, size, gfp_mask);
 }
 
 static bool within_range(struct execmem_range *range, struct ma_state *mas,
@@ -278,7 +284,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
-	err = execmem_cache_add(p, alloc_size);
+	err = execmem_cache_add(p, alloc_size, GFP_KERNEL);
 	if (err)
 		goto err_reset_direct_map;
 
@@ -307,29 +313,102 @@ static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 	return __execmem_cache_alloc(range, size);
 }
 
+static inline bool is_pending_free(void *ptr)
+{
+	return ((unsigned long)ptr & PENDING_FREE_MASK);
+}
+
+static inline void *pending_free_set(void *ptr)
+{
+	return (void *)((unsigned long)ptr | PENDING_FREE_MASK);
+}
+
+static inline void *pending_free_clear(void *ptr)
+{
+	return (void *)((unsigned long)ptr & ~PENDING_FREE_MASK);
+}
+
+static int execmem_force_rw(void *ptr, size_t size);
+
+static int __execmem_cache_free(struct ma_state *mas, void *ptr, gfp_t gfp_mask)
+{
+	size_t size = mas_range_len(mas);
+	int err;
+
+	err = execmem_force_rw(ptr, size);
+	if (err)
+		return err;
+
+	execmem_fill_trapping_insns(ptr, size, /* writable = */ true);
+	execmem_restore_rox(ptr, size);
+
+	err = execmem_cache_add_locked(ptr, size, gfp_mask);
+	if (err)
+		return err;
+
+	mas_store_gfp(mas, NULL, gfp_mask);
+	return 0;
+}
+
+static void execmem_cache_free_slow(struct work_struct *work);
+static DECLARE_DELAYED_WORK(execmem_cache_free_work, execmem_cache_free_slow);
+
+static void execmem_cache_free_slow(struct work_struct *work)
+{
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	MA_STATE(mas, busy_areas, 0, ULONG_MAX);
+	void *area;
+
+	guard(mutex)(&execmem_cache.mutex);
+
+	if (!execmem_cache.pending_free_cnt)
+		return;
+
+	mas_for_each(&mas, area, ULONG_MAX) {
+		if (!is_pending_free(area))
+			continue;
+
+		area = pending_free_clear(area);
+		if (__execmem_cache_free(&mas, area, GFP_KERNEL))
+			continue;
+
+		execmem_cache.pending_free_cnt--;
+	}
+
+	if (execmem_cache.pending_free_cnt)
+		schedule_delayed_work(&execmem_cache_free_work, FREE_DELAY);
+	else
+		schedule_work(&execmem_cache_clean_work);
+}
+
 static bool execmem_cache_free(void *ptr)
 {
 	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
-	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, busy_areas, addr, addr);
-	size_t size;
 	void *area;
+	int err;
+
+	guard(mutex)(&execmem_cache.mutex);
 
-	mutex_lock(mutex);
 	area = mas_walk(&mas);
-	if (!area) {
-		mutex_unlock(mutex);
+	if (!area)
 		return false;
-	}
-	size = mas_range_len(&mas);
 
-	mas_store_gfp(&mas, NULL, GFP_KERNEL);
-	mutex_unlock(mutex);
-
-	execmem_fill_trapping_insns(ptr, size, /* writable = */ false);
-
-	execmem_cache_add(ptr, size);
+	err = __execmem_cache_free(&mas, area, GFP_KERNEL | __GFP_NORETRY);
+	if (err) {
+		/*
+		 * mas points to exact slot we've got the area from, nothing
+		 * else can modify the tree because of the mutex, so there
+		 * won't be any allocations in mas_store_gfp() and it will just
+		 * change the pointer.
+		 */
+		area = pending_free_set(area);
+		mas_store_gfp(&mas, area, GFP_KERNEL);
+		execmem_cache.pending_free_cnt++;
+		schedule_delayed_work(&execmem_cache_free_work, FREE_DELAY);
+		return true;
+	}
 
 	schedule_work(&execmem_cache_clean_work);
 
-- 
2.47.2



* [PATCH v3 4/8] execmem: move execmem_force_rw() and execmem_restore_rox() before use
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
                   ` (2 preceding siblings ...)
  2025-07-13  7:17 ` [PATCH v3 3/8] execmem: rework execmem_cache_free() Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 5/8] execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) Mike Rapoport
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Move both functions before their first use to avoid forward declarations.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/execmem.c | 44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index 4670e97f8e4e..056d3caaf4a1 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -137,6 +137,27 @@ static int execmem_set_direct_map_valid(struct vm_struct *vm, bool valid)
 	return err;
 }
 
+static int execmem_force_rw(void *ptr, size_t size)
+{
+	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	unsigned long addr = (unsigned long)ptr;
+	int ret;
+
+	ret = set_memory_nx(addr, nr);
+	if (ret)
+		return ret;
+
+	return set_memory_rw(addr, nr);
+}
+
+int execmem_restore_rox(void *ptr, size_t size)
+{
+	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	unsigned long addr = (unsigned long)ptr;
+
+	return set_memory_rox(addr, nr);
+}
+
 static void execmem_cache_clean(struct work_struct *work)
 {
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
@@ -328,8 +349,6 @@ static inline void *pending_free_clear(void *ptr)
 	return (void *)((unsigned long)ptr & ~PENDING_FREE_MASK);
 }
 
-static int execmem_force_rw(void *ptr, size_t size);
-
 static int __execmem_cache_free(struct ma_state *mas, void *ptr, gfp_t gfp_mask)
 {
 	size_t size = mas_range_len(mas);
@@ -415,27 +434,6 @@ static bool execmem_cache_free(void *ptr)
 	return true;
 }
 
-static int execmem_force_rw(void *ptr, size_t size)
-{
-	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long addr = (unsigned long)ptr;
-	int ret;
-
-	ret = set_memory_nx(addr, nr);
-	if (ret)
-		return ret;
-
-	return set_memory_rw(addr, nr);
-}
-
-int execmem_restore_rox(void *ptr, size_t size)
-{
-	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long addr = (unsigned long)ptr;
-
-	return set_memory_rox(addr, nr);
-}
-
 #else /* CONFIG_ARCH_HAS_EXECMEM_ROX */
 /*
  * when ROX cache is not used the permissions defined by architectures for
-- 
2.47.2



* [PATCH v3 5/8] execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
                   ` (3 preceding siblings ...)
  2025-07-13  7:17 ` [PATCH v3 4/8] execmem: move execmem_force_rw() and execmem_restore_rox() before use Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 6/8] execmem: drop writable parameter from execmem_fill_trapping_insns() Mike Rapoport
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

When execmem populates ROX cache it uses vmalloc(VM_ALLOW_HUGE_VMAP).
Although vmalloc falls back to allocating base pages if high order
allocation fails, it may happen that it still cannot allocate enough
memory.

Right now ROX cache is only used by modules and in the majority of cases the
allocations happen at boot time when there's plenty of free memory, but the
upcoming enabling of ROX cache for ftrace and kprobes would mean that execmem
allocations can happen when the system is under memory pressure and a
failure to allocate a large page worth of memory becomes more likely.

Fall back to regular vmalloc() if vmalloc(VM_ALLOW_HUGE_VMAP) fails.
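
The retry logic amounts to "try the rounded-up size, then the requested
size". A userspace sketch of that shape, assuming a 2 MiB large-chunk size
and using malloc() in place of vmalloc(); populate(), round_up_to() and
small_only_alloc() are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

#define BIG_CHUNK	((size_t)2 << 20)	/* assumed 2 MiB, like PMD_SIZE on x86-64 */

typedef void *(*alloc_fn)(size_t size);

static size_t round_up_to(size_t size, size_t align)
{
	return (size + align - 1) / align * align;
}

/*
 * Try a large-chunk-sized allocation first; if that fails, retry with
 * just the requested size. *alloc_size reports which size was obtained.
 */
static void *populate(alloc_fn alloc, size_t size, size_t *alloc_size)
{
	void *p;

	assert(alloc && alloc_size);

	*alloc_size = round_up_to(size, BIG_CHUNK);
	p = alloc(*alloc_size);
	if (!p) {
		*alloc_size = size;
		p = alloc(*alloc_size);
	}

	return p;
}

/* Hypothetical allocator that refuses anything of BIG_CHUNK or larger. */
static void *small_only_alloc(size_t size)
{
	return size >= BIG_CHUNK ? NULL : malloc(size);
}
```

Tracking the size actually obtained matters because later steps (filling,
permission changes, adding to the cache) operate on that size, not on the
size that was first attempted.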

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/execmem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index 056d3caaf4a1..04c35c3a9361 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -291,6 +291,11 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 
 	alloc_size = round_up(size, PMD_SIZE);
 	p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
+	if (!p) {
+		alloc_size = size;
+		p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
+	}
+
 	if (!p)
 		return err;
 
@@ -462,7 +467,7 @@ void *execmem_alloc(enum execmem_type type, size_t size)
 	bool use_cache = range->flags & EXECMEM_ROX_CACHE;
 	unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
 	pgprot_t pgprot = range->pgprot;
-	void *p;
+	void *p = NULL;
 
 	size = PAGE_ALIGN(size);
 
-- 
2.47.2



* [PATCH v3 6/8] execmem: drop writable parameter from execmem_fill_trapping_insns()
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
                   ` (4 preceding siblings ...)
  2025-07-13  7:17 ` [PATCH v3 5/8] execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations Mike Rapoport
  2025-07-13  7:17 ` [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations Mike Rapoport
  7 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

After the update of execmem_cache_free() that made memory writable before
updating it, there is no need to update read-only memory, so the writable
parameter to execmem_fill_trapping_insns() is not needed. Drop it.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/mm/init.c      | 8 ++------
 include/linux/execmem.h | 3 +--
 mm/execmem.c            | 4 ++--
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 7456df985d96..dbc63f0d538f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1063,13 +1063,9 @@ unsigned long arch_max_swapfile_size(void)
 static struct execmem_info execmem_info __ro_after_init;
 
 #ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
-void execmem_fill_trapping_insns(void *ptr, size_t size, bool writeable)
+void execmem_fill_trapping_insns(void *ptr, size_t size)
 {
-	/* fill memory with INT3 instructions */
-	if (writeable)
-		memset(ptr, INT3_INSN_OPCODE, size);
-	else
-		text_poke_set(ptr, INT3_INSN_OPCODE, size);
+	memset(ptr, INT3_INSN_OPCODE, size);
 }
 #endif
 
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 8b61b05da7d5..7de229134e30 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -60,12 +60,11 @@ enum execmem_range_flags {
  *				 will trap
  * @ptr:	pointer to memory to fill
  * @size:	size of the range to fill
- * @writable:	is the memory poited by @ptr is writable or ROX
  *
  * A hook for architecures to fill execmem ranges with invalid instructions.
  * Architectures that use EXECMEM_ROX_CACHE must implement this.
  */
-void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable);
+void execmem_fill_trapping_insns(void *ptr, size_t size);
 
 /**
  * execmem_restore_rox - restore read-only-execute permissions
diff --git a/mm/execmem.c b/mm/execmem.c
index 04c35c3a9361..0822305413ec 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -304,7 +304,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 		goto err_free_mem;
 
 	/* fill memory with instructions that will trap */
-	execmem_fill_trapping_insns(p, alloc_size, /* writable = */ true);
+	execmem_fill_trapping_insns(p, alloc_size);
 
 	err = set_memory_rox((unsigned long)p, vm->nr_pages);
 	if (err)
@@ -363,7 +363,7 @@ static int __execmem_cache_free(struct ma_state *mas, void *ptr, gfp_t gfp_mask)
 	if (err)
 		return err;
 
-	execmem_fill_trapping_insns(ptr, size, /* writable = */ true);
+	execmem_fill_trapping_insns(ptr, size);
 	execmem_restore_rox(ptr, size);
 
 	err = execmem_cache_add_locked(ptr, size, gfp_mask);
-- 
2.47.2



* [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
                   ` (5 preceding siblings ...)
  2025-07-13  7:17 ` [PATCH v3 6/8] execmem: drop writable parameter from execmem_fill_trapping_insns() Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-15  0:21   ` Masami Hiramatsu
  2025-07-13  7:17 ` [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations Mike Rapoport
  7 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

x86::alloc_insn_page() always allocates ROX memory.

Instead of overriding this method, add EXECMEM_KPROBES entry in
execmem_info with pgprot set to PAGE_KERNEL_ROX and use ROX cache when
configuration and CPU features allow it.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/kernel/kprobes/core.c | 18 ------------------
 arch/x86/mm/init.c             |  9 ++++++++-
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 47cb8eb138ba..6079d15dab8c 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -481,24 +481,6 @@ static int prepare_singlestep(kprobe_opcode_t *buf, struct kprobe *p,
 	return len;
 }
 
-/* Make page to RO mode when allocate it */
-void *alloc_insn_page(void)
-{
-	void *page;
-
-	page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-	if (!page)
-		return NULL;
-
-	/*
-	 * TODO: Once additional kernel code protection mechanisms are set, ensure
-	 * that the page was not maliciously altered and it is still zeroed.
-	 */
-	set_memory_rox((unsigned long)page, 1);
-
-	return page;
-}
-
 /* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
 
 static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index dbc63f0d538f..442fafd8ff52 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1098,7 +1098,14 @@ struct execmem_info __init *execmem_arch_setup(void)
 				.pgprot	= pgprot,
 				.alignment = MODULE_ALIGN,
 			},
-			[EXECMEM_KPROBES ... EXECMEM_BPF] = {
+			[EXECMEM_KPROBES] = {
+				.flags	= flags,
+				.start	= start,
+				.end	= MODULES_END,
+				.pgprot	= PAGE_KERNEL_ROX,
+				.alignment = MODULE_ALIGN,
+			},
+			[EXECMEM_FTRACE ... EXECMEM_BPF] = {
 				.flags	= EXECMEM_KASAN_SHADOW,
 				.start	= start,
 				.end	= MODULES_END,
-- 
2.47.2
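
The table split in the hunk above relies on GNU C range designators
([A ... B] = { }), which GCC and Clang accept. A minimal userspace sketch
of the same idea follows. Every demo_* name is a hypothetical stand-in
for the kernel's execmem types, and the flag values are fixed at compile
time here, whereas the kernel computes `flags` from configuration and
CPU features:

```c
/* Hypothetical stand-ins for enum execmem_type; not the kernel's names. */
enum demo_type {
	DEMO_MODULE_TEXT,
	DEMO_KPROBES,
	DEMO_FTRACE,
	DEMO_BPF,
	DEMO_TYPE_MAX,
};

#define DEMO_KASAN_SHADOW	(1u << 0)
#define DEMO_ROX_CACHE		(1u << 1)

enum demo_prot { DEMO_PROT_RW, DEMO_PROT_ROX };

/* Hypothetical, much reduced analogue of struct execmem_range. */
struct demo_range {
	unsigned int flags;
	enum demo_prot prot;
};

static const struct demo_range demo_ranges[DEMO_TYPE_MAX] = {
	[DEMO_MODULE_TEXT] = {
		.flags	= DEMO_KASAN_SHADOW | DEMO_ROX_CACHE,
		.prot	= DEMO_PROT_ROX,
	},
	/* Carved out of the old range so it can get its own settings. */
	[DEMO_KPROBES] = {
		.flags	= DEMO_KASAN_SHADOW | DEMO_ROX_CACHE,
		.prot	= DEMO_PROT_ROX,
	},
	/* Range designator: one initializer for every index in the range. */
	[DEMO_FTRACE ... DEMO_BPF] = {
		.flags	= DEMO_KASAN_SHADOW,
		.prot	= DEMO_PROT_RW,
	},
};

/* Returns nonzero when the given type is served from the ROX cache. */
static int demo_uses_rox_cache(enum demo_type type)
{
	return (demo_ranges[type].flags & DEMO_ROX_CACHE) != 0;
}
```

With this table, demo_uses_rox_cache(DEMO_KPROBES) is nonzero while the
entries still covered by the range keep the shared defaults.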



* [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
                   ` (6 preceding siblings ...)
  2025-07-13  7:17 ` [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations Mike Rapoport
@ 2025-07-13  7:17 ` Mike Rapoport
  2025-07-14 16:22   ` Steven Rostedt
  2025-08-20 22:47   ` Steven Rostedt
  7 siblings, 2 replies; 14+ messages in thread
From: Mike Rapoport @ 2025-07-13  7:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, Liam R. Howlett, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

For the most part ftrace uses text poking and can handle ROX memory.
The only place that requires writable memory is create_trampoline(), which
updates the allocated memory and makes it ROX at the end.

Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
for EXECMEM_FTRACE when configuration and CPU features allow that.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/kernel/ftrace.c | 2 +-
 arch/x86/mm/init.c       | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 252e82bcfd2f..4450acec9390 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -263,7 +263,7 @@ void arch_ftrace_update_code(int command)
 
 static inline void *alloc_tramp(unsigned long size)
 {
-	return execmem_alloc(EXECMEM_FTRACE, size);
+	return execmem_alloc_rw(EXECMEM_FTRACE, size);
 }
 static inline void tramp_free(void *tramp)
 {
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 442fafd8ff52..bb57e93b4caf 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1105,7 +1105,14 @@ struct execmem_info __init *execmem_arch_setup(void)
 				.pgprot	= PAGE_KERNEL_ROX,
 				.alignment = MODULE_ALIGN,
 			},
-			[EXECMEM_FTRACE ... EXECMEM_BPF] = {
+			[EXECMEM_FTRACE] = {
+				.flags	= flags,
+				.start	= start,
+				.end	= MODULES_END,
+				.pgprot	= pgprot,
+				.alignment = MODULE_ALIGN,
+			},
+			[EXECMEM_BPF] = {
 				.flags	= EXECMEM_KASAN_SHADOW,
 				.start	= start,
 				.end	= MODULES_END,
-- 
2.47.2
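
The "allocate writable, fill in, then make it ROX" sequence that the
commit message attributes to create_trampoline() can be sketched in
userspace with mmap() and mprotect(). This is only an analogue under
stated assumptions: demo_build_trampoline() is a hypothetical helper, not
kernel API, and it stands in for execmem_alloc_rw() plus the final
permission change without reproducing either:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes of code into a fresh mapping that
 * is writable while it is being filled and read+execute afterwards.
 * Returns the mapping (its size is stored in *map_len) or NULL on failure.
 */
static void *demo_build_trampoline(const void *insns, size_t len,
				   size_t *map_len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t total;
	void *p;

	if (page <= 0 || len == 0)
		return NULL;

	/* Round the request up to whole pages. */
	total = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

	/* Step 1: the memory starts out readable and writable. */
	p = mmap(NULL, total, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Step 2: plain stores are enough while the mapping is writable. */
	memcpy(p, insns, len);

	/* Step 3: drop write permission and add execute permission. */
	if (mprotect(p, total, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, total);
		return NULL;
	}

	*map_len = total;
	return p;
}
```

The caller releases the result with munmap(ptr, map_len). Any update
after step 3 would need a different mechanism, which is the role text
poking plays for ftrace in the kernel.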



* Re: [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  2025-07-13  7:17 ` [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations Mike Rapoport
@ 2025-07-14 16:22   ` Steven Rostedt
  2025-08-20 22:47   ` Steven Rostedt
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2025-07-14 16:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andy Lutomirski, Borislav Petkov, Christophe Leroy,
	Daniel Gomez, Dave Hansen, Ingo Molnar, Liam R. Howlett,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Thomas Gleixner,
	Yann Ylavic, linux-kernel, linux-mm, linux-modules,
	linux-trace-kernel, x86

On Sun, 13 Jul 2025 10:17:30 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> For the most part ftrace uses text poking and can handle ROX memory.
> The only place that requires writable memory is create_trampoline(), which
> updates the allocated memory and makes it ROX at the end.
> 
> Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
> for EXECMEM_FTRACE when configuration and CPU features allow that.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve


* Re: [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
  2025-07-13  7:17 ` [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations Mike Rapoport
@ 2025-07-15  0:21   ` Masami Hiramatsu
  0 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2025-07-15  0:21 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andy Lutomirski, Borislav Petkov, Christophe Leroy,
	Daniel Gomez, Dave Hansen, Ingo Molnar, Liam R. Howlett,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel, linux-mm,
	linux-modules, linux-trace-kernel, x86

On Sun, 13 Jul 2025 10:17:29 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> x86::alloc_insn_page() always allocates ROX memory.
> 
> Instead of overriding this method, add EXECMEM_KPROBES entry in
> execmem_info with pgprot set to PAGE_KERNEL_ROX and use ROX cache when
> configuration and CPU features allow it.
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks!

> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
>  arch/x86/kernel/kprobes/core.c | 18 ------------------
>  arch/x86/mm/init.c             |  9 ++++++++-
>  2 files changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 47cb8eb138ba..6079d15dab8c 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -481,24 +481,6 @@ static int prepare_singlestep(kprobe_opcode_t *buf, struct kprobe *p,
>  	return len;
>  }
>  
> -/* Make page to RO mode when allocate it */
> -void *alloc_insn_page(void)
> -{
> -	void *page;
> -
> -	page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
> -	if (!page)
> -		return NULL;
> -
> -	/*
> -	 * TODO: Once additional kernel code protection mechanisms are set, ensure
> -	 * that the page was not maliciously altered and it is still zeroed.
> -	 */
> -	set_memory_rox((unsigned long)page, 1);
> -
> -	return page;
> -}
> -
>  /* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
>  
>  static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index dbc63f0d538f..442fafd8ff52 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -1098,7 +1098,14 @@ struct execmem_info __init *execmem_arch_setup(void)
>  				.pgprot	= pgprot,
>  				.alignment = MODULE_ALIGN,
>  			},
> -			[EXECMEM_KPROBES ... EXECMEM_BPF] = {
> +			[EXECMEM_KPROBES] = {
> +				.flags	= flags,
> +				.start	= start,
> +				.end	= MODULES_END,
> +				.pgprot	= PAGE_KERNEL_ROX,
> +				.alignment = MODULE_ALIGN,
> +			},
> +			[EXECMEM_FTRACE ... EXECMEM_BPF] = {
>  				.flags	= EXECMEM_KASAN_SHADOW,
>  				.start	= start,
>  				.end	= MODULES_END,
> -- 
> 2.47.2
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  2025-07-13  7:17 ` [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations Mike Rapoport
  2025-07-14 16:22   ` Steven Rostedt
@ 2025-08-20 22:47   ` Steven Rostedt
  2025-08-21  6:11     ` Mike Rapoport
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2025-08-20 22:47 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andy Lutomirski, Borislav Petkov, Christophe Leroy,
	Daniel Gomez, Dave Hansen, Ingo Molnar, Liam R. Howlett,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Thomas Gleixner,
	Yann Ylavic, linux-kernel, linux-mm, linux-modules,
	linux-trace-kernel, x86

On Sun, 13 Jul 2025 10:17:30 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> For the most part ftrace uses text poking and can handle ROX memory.
> The only place that requires writable memory is create_trampoline(), which
> updates the allocated memory and makes it ROX at the end.
> 
> Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
> for EXECMEM_FTRACE when configuration and CPU features allow that.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---

The "ftrace=function" kernel command line started crashing with v6.17-rc1,
and I bisected it down to this commit:

 5d79c2be5081 ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")

On boot I hit this:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237) 
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
All code
========
   0:	90                   	nop
   1:	90                   	nop
   2:	90                   	nop
   3:	f3 0f 1e fa          	endbr64
   7:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   c:	55                   	push   %rbp
   d:	48 89 e5             	mov    %rsp,%rbp
  10:	41 57                	push   %r15
  12:	41 56                	push   %r14
  14:	41 55                	push   %r13
  16:	41 54                	push   %r12
  18:	49 89 fc             	mov    %rdi,%r12
  1b:	53                   	push   %rbx
  1c:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
  20:	48 83 ec 20          	sub    $0x20,%rsp
  24:	8b 05 c9 b6 7e 01    	mov    0x17eb6c9(%rip),%eax        # 0x17eb6f3
  2a:*	44 8b 77 1c          	mov    0x1c(%rdi),%r14d		<-- trapping instruction
  2e:	65 4c 8b 2d b5 ea 20 	mov    %gs:0x220eab5(%rip),%r13        # 0x220eaeb
  35:	02 
  36:	4c 89 6c 24 18       	mov    %r13,0x18(%rsp)
  3b:	41 89 f5             	mov    %esi,%r13d
  3e:	21 f0                	and    %esi,%eax

Code starting with the faulting instruction
===========================================
   0:	44 8b 77 1c          	mov    0x1c(%rdi),%r14d
   4:	65 4c 8b 2d b5 ea 20 	mov    %gs:0x220eab5(%rip),%r13        # 0x220eac1
   b:	02 
   c:	4c 89 6c 24 18       	mov    %r13,0x18(%rsp)
  11:	41 89 f5             	mov    %esi,%r13d
  14:	21 f0                	and    %esi,%eax
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2)) 
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468) 
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207) 
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475) 
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169) 
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158) 
[    0.181517] execmem_alloc_rw (mm/execmem.c:487) 
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474) 
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182) 
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947) 
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368) 
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048) 
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210) 
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717) 
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745) 
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210) 
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170) 
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349) 
[    0.189088] register_tracer (kernel/trace/trace.c:2391) 
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149) 
[    0.190204] start_kernel (init/main.c:970) 
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307) 
[    0.191381] x86_64_start_kernel (??:?) 
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419) 
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---


-- Steve


* Re: [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  2025-08-20 22:47   ` Steven Rostedt
@ 2025-08-21  6:11     ` Mike Rapoport
  2025-08-21 19:25       ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-08-21  6:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Andy Lutomirski, Borislav Petkov, Christophe Leroy,
	Daniel Gomez, Dave Hansen, Ingo Molnar, Liam R. Howlett,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Thomas Gleixner,
	Yann Ylavic, linux-kernel, linux-mm, linux-modules,
	linux-trace-kernel, x86

On Wed, Aug 20, 2025 at 06:47:43PM -0400, Steven Rostedt wrote:
> On Sun, 13 Jul 2025 10:17:30 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
> 
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > 
> > For the most part ftrace uses text poking and can handle ROX memory.
> > The only place that requires writable memory is create_trampoline(), which
> > updates the allocated memory and makes it ROX at the end.
> > 
> > Use execmem_alloc_rw() in x86::ftrace::alloc_tramp() and enable ROX cache
> > for EXECMEM_FTRACE when configuration and CPU features allow that.
> > 
> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > ---
> 
> The "ftrace=function" kernel command line started crashing with v6.17-rc1,
> and I bisected it down to this commit:
> 
>  5d79c2be5081 ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
> 
> On boot I hit this:
> 
> [    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
> [    0.160254] #PF: supervisor read access in kernel mode
> [    0.160975] #PF: error_code(0x0000) - not-present page
> [    0.161697] PGD 0 P4D 0
> [    0.162055] Oops: Oops: 0000 [#1] SMP PTI
> [    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
> [    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237) 
> [    0.177483] Call Trace:
> [    0.177828]  <TASK>
> [    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2)) 
> [    0.178692] mas_store_gfp (lib/maple_tree.c:5468) 
> [    0.179223] execmem_cache_add_locked (mm/execmem.c:207) 
> [    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475) 
> [    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169) 
> [    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158) 
> [    0.181517] execmem_alloc_rw (mm/execmem.c:487) 
> [    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474) 
> [    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182) 
> [    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947) 
> [    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368) 
> [    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048) 
> [    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210) 
> [    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717) 
> [    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745) 
> [    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210) 
> [    0.187924] function_trace_init (kernel/trace/trace_functions.c:170) 
> [    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349) 
> [    0.189088] register_tracer (kernel/trace/trace.c:2391) 
> [    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149) 
> [    0.190204] start_kernel (init/main.c:970) 
> [    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307) 
> [    0.191381] x86_64_start_kernel (??:?) 
> [    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419) 
> [    0.192534]  </TASK>
> [    0.192839] Modules linked in:
> [    0.193267] CR2: 000000000000001c
> [    0.193730] ---[ end trace 0000000000000000 ]---

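The maple tree is initialized after ftrace: maple_tree_init() runs later in
start_kernel() than early_trace_init(), so mas_store_gfp() allocates from a
maple node cache that does not exist yet. The patch below should fix it: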

diff --git a/init/main.c b/init/main.c
index 0ee0ee7b7c2c..5753e9539ae6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -956,6 +956,7 @@ void start_kernel(void)
 	sort_main_extable();
 	trap_init();
 	mm_core_init();
+	maple_tree_init();
 	poking_init();
 	ftrace_init();
 
@@ -973,7 +974,6 @@ void start_kernel(void)
 		 "Interrupts were enabled *very* early, fixing it\n"))
 		local_irq_disable();
 	radix_tree_init();
-	maple_tree_init();
 
 	/*
 	 * Set up housekeeping before setting up workqueues to allow the unbound
 
> -- Steve

-- 
Sincerely yours,
Mike.
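
The ordering problem behind the oops can be modelled in a few lines of
userspace C. This is a sketch under assumptions, not kernel code: the
demo_* names are hypothetical, demo_tree_init() plays the role of
maple_tree_init(), and demo_node_alloc() plays the role of the node
allocation reached from mas_store_gfp(). Unlike the kernel path in the
trace, the sketch checks for the missing cache instead of dereferencing
it:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache descriptor. */
struct demo_cache {
	size_t object_size;
};

/* NULL until demo_tree_init() has run, like a cache created at init time. */
static struct demo_cache *demo_node_cache;

/* Analogue of the init call that must run before the first allocation. */
static void demo_tree_init(void)
{
	static struct demo_cache cache = { .object_size = 256 };

	demo_node_cache = &cache;
}

/*
 * Analogue of a node allocation. Reading object_size through a NULL
 * cache pointer is the kind of access that faulted in the trace; this
 * sketch returns NULL instead so the ordering bug is observable safely.
 */
static void *demo_node_alloc(void)
{
	if (!demo_node_cache)
		return NULL;

	return calloc(1, demo_node_cache->object_size);
}
```

Calling demo_node_alloc() before demo_tree_init() yields NULL, and the
same call succeeds once the init has run, which is the effect of moving
maple_tree_init() ahead of ftrace_init() in start_kernel().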


* Re: [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
  2025-08-21  6:11     ` Mike Rapoport
@ 2025-08-21 19:25       ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2025-08-21 19:25 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andy Lutomirski, Borislav Petkov, Christophe Leroy,
	Daniel Gomez, Dave Hansen, Ingo Molnar, Liam R. Howlett,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, H. Peter Anvin,
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Thomas Gleixner,
	Yann Ylavic, linux-kernel, linux-mm, linux-modules,
	linux-trace-kernel, x86

On Thu, 21 Aug 2025 09:11:46 +0300
Mike Rapoport <rppt@kernel.org> wrote:

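> The maple tree is initialized after ftrace: maple_tree_init() runs later in
> start_kernel() than early_trace_init(), so mas_store_gfp() allocates from a
> maple node cache that does not exist yet. The patch below should fix it: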
> 
> diff --git a/init/main.c b/init/main.c
> index 0ee0ee7b7c2c..5753e9539ae6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -956,6 +956,7 @@ void start_kernel(void)
>  	sort_main_extable();
>  	trap_init();
>  	mm_core_init();
> +	maple_tree_init();
>  	poking_init();
>  	ftrace_init();
>  
> @@ -973,7 +974,6 @@ void start_kernel(void)
>  		 "Interrupts were enabled *very* early, fixing it\n"))
>  		local_irq_disable();
>  	radix_tree_init();
> -	maple_tree_init();
>  
>  	/*
>  	 * Set up housekeeping before setting up workqueues to allow the unbound
>  

Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Thanks,

-- Steve


Thread overview: 14+ messages
2025-07-13  7:17 [PATCH v3 0/8] x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 1/8] execmem: drop unused execmem_update_copy() Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 2/8] execmem: introduce execmem_alloc_rw() Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 3/8] execmem: rework execmem_cache_free() Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 4/8] execmem: move execmem_force_rw() and execmem_restore_rox() before use Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 5/8] execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 6/8] execmem: drop writable parameter from execmem_fill_trapping_insns() Mike Rapoport
2025-07-13  7:17 ` [PATCH v3 7/8] x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations Mike Rapoport
2025-07-15  0:21   ` Masami Hiramatsu
2025-07-13  7:17 ` [PATCH v3 8/8] x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations Mike Rapoport
2025-07-14 16:22   ` Steven Rostedt
2025-08-20 22:47   ` Steven Rostedt
2025-08-21  6:11     ` Mike Rapoport
2025-08-21 19:25       ` Steven Rostedt
