* [PATCH AUTOSEL 6.12 1/5] uprobes: sanitiize xol_free_insn_slot()
From: Sasha Levin @ 2024-11-24 12:46 UTC
To: linux-kernel, stable
Cc: Oleg Nesterov, Peter Zijlstra, Sasha Levin, mhiramat, mingo, acme,
namhyung, linux-trace-kernel, linux-perf-users
From: Oleg Nesterov <oleg@redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr <
area->vaddr (see the sketch below).
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
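As an aside, the single-compare range check in point 3 leans on unsigned
wraparound. A minimal userspace sketch of the idiom, with made-up addresses
(illustrative code only, not part of the patch):

	#include <stdio.h>

	#define AREA_SIZE 4096UL			/* stands in for PAGE_SIZE */

	int main(void)
	{
		unsigned long base = 0x7f0000001000UL;	/* plays area->vaddr */
		unsigned long addr = base - 8;		/* below the area */
		unsigned long offset = addr - base;	/* wraps to a huge value */

		/* One compare covers both addr < base and addr >= base + size. */
		if (offset < AREA_SIZE)
			printf("in range, offset %lu\n", offset);
		else
			printf("out of range (offset wrapped to %#lx)\n", offset);
		return 0;
	}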
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4b52cb2ae6d62..cc605df73d72f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1683,8 +1683,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1693,24 +1693,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
* [PATCH AUTOSEL 6.12 2/5] perf/x86/amd: Warn only on new bits set
From: Sasha Levin @ 2024-11-24 12:46 UTC
To: linux-kernel, stable
Cc: Breno Leitao, Paul E . McKenney, Peter Zijlstra, Sandipan Das,
Sasha Levin, mingo, acme, namhyung, tglx, bp, dave.hansen, x86,
linux-perf-users
From: Breno Leitao <leitao@debian.org>
[ Upstream commit de20037e1b3c2f2ca97b8c12b8c7bca8abd509a7 ]
Warning on every leaked bit can cause a flood of messages, triggering
various stall-warning mechanisms to fire, including CSD locks, which
makes the machine unusable.
Track the bits that are being leaked, and only warn when a new bit is
set.
This patch helps with the following issues:
1) It tells us which bits are being set, so it is easy to communicate
that back to the vendor and to do a root-cause analysis.
2) It keeps the machine usable because, in the worst case, the user
gets fewer than 60 WARNs (one per unhandled bit).
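The dedup boils down to an atomic fetch-or against a cumulative mask. A rough
userspace rendering of the same pattern, with C11 atomics standing in for the
kernel's atomic64_t (the values in main() are made up):

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	static _Atomic uint64_t status_warned;

	/* Report only bits never seen before; repeats stay silent. */
	static void warn_new_bits(uint64_t status)
	{
		uint64_t prev = atomic_fetch_or(&status_warned, status);
		uint64_t new_bits = status & ~prev;

		if (new_bits)
			fprintf(stderr, "New overflows for inactive PMCs: %llx\n",
				(unsigned long long)new_bits);
	}

	int main(void)
	{
		warn_new_bits(0x5);	/* warns: bits 0 and 2 are new */
		warn_new_bits(0x4);	/* silent: bit 2 already recorded */
		warn_new_bits(0x6);	/* warns about bit 1 only */
		return 0;
	}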
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/events/amd/core.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 920e3a640cadd..b4a1a2576510e 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -943,11 +943,12 @@ static int amd_pmu_v2_snapshot_branch_stack(struct perf_branch_entry *entries, u
static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ static atomic64_t status_warned = ATOMIC64_INIT(0);
+ u64 reserved, status, mask, new_bits, prev_bits;
struct perf_sample_data data;
struct hw_perf_event *hwc;
struct perf_event *event;
int handled = 0, idx;
- u64 reserved, status, mask;
bool pmu_enabled;
/*
@@ -1012,7 +1013,12 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
* the corresponding PMCs are expected to be inactive according to the
* active_mask
*/
- WARN_ON(status > 0);
+ if (status > 0) {
+ prev_bits = atomic64_fetch_or(status, &status_warned);
+ // A new bit was set for the very first time.
+ new_bits = status & ~prev_bits;
+ WARN(new_bits, "New overflows for inactive PMCs: %llx\n", new_bits);
+ }
/* Clear overflow and freeze bits */
amd_pmu_ack_global_status(~status);
--
2.43.0
* [PATCH AUTOSEL 6.12 3/5] locking/ww_mutex: Adjust to lockdep nest_lock requirements
From: Sasha Levin @ 2024-11-24 12:46 UTC
To: linux-kernel, stable
Cc: Thomas Hellström, Peter Zijlstra, Sasha Levin, mingo, will
From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
[ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the
number of acquired lockdep_maps of mutexes of the same class, and also
keeps a pointer to the first acquired lockdep_map of a class. That pointer
is then used for various comparison, printing and checking purposes, but
there is no mechanism to actively ensure that the lockdep_map stays in
memory. Instead, a warning is printed if a lockdep_map is freed while there
are still held locks of the same lock class, even if that lockdep_map
itself has already been released (unlocked).
In the context of WW/WD transactions that means that if a user unlocks
and frees a ww_mutex from within an ongoing ww transaction, and that
mutex happens to be the first ww_mutex grabbed in the transaction,
such a warning is printed and there might be a risk of a UAF.
Note that this is only a problem when lockdep is enabled and affects only
dereferences of struct lockdep_map.
Adjust to this by adding a fake lockdep_map to the acquire context and
making sure it is the first acquired lockdep map of the associated
ww_mutex class. Then hold it for the duration of the WW/WD transaction.
This has the side effect that trying to lock a ww_mutex *without* a
ww_acquire_context while such a context has been acquired now produces a
lockdep splat. The test-ww_mutex.c selftest attempts to do exactly that,
so modify that particular test to not acquire a ww_acquire_context if it
is not going to be used.
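For context, a hedged sketch of the problematic pattern described above;
demo_ww_class, struct demo_obj and the kfree() are hypothetical, and
-EDEADLK/backoff handling is elided:

	#include <linux/slab.h>
	#include <linux/ww_mutex.h>

	static DEFINE_WW_CLASS(demo_ww_class);

	struct demo_obj {
		struct ww_mutex lock;
	};

	static void demo(struct demo_obj *obj_a, struct demo_obj *obj_b)
	{
		struct ww_acquire_ctx ctx;

		ww_acquire_init(&ctx, &demo_ww_class);
		ww_mutex_lock(&obj_a->lock, &ctx);	/* first lock of the transaction */
		ww_mutex_lock(&obj_b->lock, &ctx);

		/*
		 * Unlocking and freeing obj_a mid-transaction frees the
		 * lockdep_map that lockdep still points at for this class;
		 * the case that ctx->first_lock_dep_map now guards against.
		 */
		ww_mutex_unlock(&obj_a->lock);
		kfree(obj_a);

		ww_mutex_unlock(&obj_b->lock);
		ww_acquire_fini(&ctx);
	}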
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241009092031.6356-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/ww_mutex.h | 14 ++++++++++++++
kernel/locking/test-ww_mutex.c | 8 +++++---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index bb763085479af..a401a2f31a775 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -65,6 +65,16 @@ struct ww_acquire_ctx {
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
+ /**
+ * @first_lock_dep_map: fake lockdep_map for first locked ww_mutex.
+ *
+ * lockdep requires the lockdep_map for the first locked ww_mutex
+ * in a ww transaction to remain in memory until all ww_mutexes of
+ * the transaction have been unlocked. Ensure this by keeping a
+ * fake locked ww_mutex lockdep map between ww_acquire_init() and
+ * ww_acquire_fini().
+ */
+ struct lockdep_map first_lock_dep_map;
#endif
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
unsigned int deadlock_inject_interval;
@@ -146,7 +156,10 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
&ww_class->acquire_key, 0);
+ lockdep_init_map(&ctx->first_lock_dep_map, ww_class->mutex_name,
+ &ww_class->mutex_key, 0);
mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
+ mutex_acquire_nest(&ctx->first_lock_dep_map, 0, 0, &ctx->dep_map, _RET_IP_);
#endif
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
ctx->deadlock_inject_interval = 1;
@@ -185,6 +198,7 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
{
#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ mutex_release(&ctx->first_lock_dep_map, _THIS_IP_);
mutex_release(&ctx->dep_map, _THIS_IP_);
#endif
#ifdef DEBUG_WW_MUTEXES
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 10a5736a21c22..5d58b2c0ef98b 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -62,7 +62,8 @@ static int __test_mutex(unsigned int flags)
int ret;
ww_mutex_init(&mtx.mutex, &ww_class);
- ww_acquire_init(&ctx, &ww_class);
+ if (flags & TEST_MTX_CTX)
+ ww_acquire_init(&ctx, &ww_class);
INIT_WORK_ONSTACK(&mtx.work, test_mutex_work);
init_completion(&mtx.ready);
@@ -90,7 +91,8 @@ static int __test_mutex(unsigned int flags)
ret = wait_for_completion_timeout(&mtx.done, TIMEOUT);
}
ww_mutex_unlock(&mtx.mutex);
- ww_acquire_fini(&ctx);
+ if (flags & TEST_MTX_CTX)
+ ww_acquire_fini(&ctx);
if (ret) {
pr_err("%s(flags=%x): mutual exclusion failure\n",
@@ -679,7 +681,7 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;
- ret = stress(2047, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
+ ret = stress(2046, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
if (ret)
return ret;
--
2.43.0
* [PATCH AUTOSEL 6.12 4/5] cleanup: Adjust scoped_guard() macros to avoid potential warning
From: Sasha Levin @ 2024-11-24 12:46 UTC
To: linux-kernel, stable
Cc: Przemek Kitszel, Peter Zijlstra, Dmitry Torokhov, Sasha Levin,
brauner, viro, dan.j.williams, kevin.tian, dlechner, mingo,
ubizjak
From: Przemek Kitszel <przemyslaw.kitszel@intel.com>
[ Upstream commit fcc22ac5baf06dd17193de44b60dbceea6461983 ]
Change scoped_guard() and scoped_cond_guard() macros to make reasoning
about them easier for static analysis tools (smatch, compiler
diagnostics), especially to enable them to tell if the given usage of
scoped_guard() is with a conditional lock class (interruptible-locks,
try-locks) or not (like simple mutex_lock()).
Add a compile-time error if scoped_cond_guard() is used for a
non-conditional lock class.
Beyond easier tooling and a small size reduction reported by bloat-o-meter,
this patch enables a developer to write code like:
int foo(struct my_drv *adapter)
{
	scoped_guard(spinlock, &adapter->some_spinlock)
		return adapter->spinlock_protected_var;
}
The current scoped_guard() implementation does not support that, because
the compiler complains:
error: control reaches end of non-void function [-Werror=return-type]
Technical details of the change:
The scoped_guard() macro uses the common idiom of a "for" statement to
declare a scoped variable. Unfortunately, the current logic is too hard for
compiler diagnostics to be sure that there is exactly one loop step; fix
that.
To make a loop so trivial that there is no such warning, it must not
depend on any non-const variable to tell if there are more steps. There is
no obvious solution for that in C, but one can use a compound statement
expression with a "goto" jumping past the "loop", effectively leaving only
the subscope part of the loop semantics (see the sketch below).
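Stripped of the macro plumbing, the trick can be seen in a standalone toy
(it relies on the GCC/Clang statement-expression extension that the kernel
already depends on; names and values are illustrative):

	#include <stdio.h>

	int main(void)
	{
		/*
		 * A "loop" with a provably single step: the body is the else
		 * branch, and the step expression jumps to a label that
		 * breaks out. The real macro's condition is
		 * "guard ptr || !is_cond" rather than the constant 1.
		 */
		for (int scope = 1; 1; ({ goto done; }))
			if (0) {
	done:
				break;
			} else {
				printf("body runs once, scope=%d\n", scope);
			}
		return 0;
	}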
More implementation details:
one more level of macro indirection is now needed to avoid duplicating
label names;
I didn't spot any other place that uses the
"for (...; goto label) if (0) label: break;" idiom, so it's not packaged
for reuse beyond the scoped_guard() family, which keeps the actual macro
code cleaner.
There was also a need to introduce a const true/false variable per lock
class; it is used to aid compiler diagnostics in reasoning about "exactly
one step" loops (note that converting it to a function would undo the
whole benefit).
Big thanks to Andy Shevchenko for help on this patch, both internal and
public, ranging from whitespace/formatting, through commit message
clarifications and general improvements, to presenting alternative
approaches - all despite not even liking the idea.
Big thanks to Dmitry Torokhov for the idea of a compile-time check for
scoped_cond_guard() (to use it only with conditional locks), and for
general improvements to the patch.
Big thanks to David Lechner for the idea to also cover scoped_cond_guard().
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lkml.kernel.org/r/20241018113823.171256-1-przemyslaw.kitszel@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/cleanup.h | 52 +++++++++++++++++++++++++++++++++--------
1 file changed, 42 insertions(+), 10 deletions(-)
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 038b2d523bf88..9464724b99737 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -285,14 +285,20 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
* similar to scoped_guard(), except it does fail when the lock
* acquire fails.
*
+ * Only for conditional locks.
*/
+#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
+static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
+
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
+ __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
{ return *_T; }
#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
+ __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
class_##_name##_t _T) \
@@ -303,17 +309,40 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
CLASS(_name, __UNIQUE_ID(guard))
#define __guard_ptr(_name) class_##_name##_lock_ptr
+#define __is_cond_ptr(_name) class_##_name##_is_conditional
-#define scoped_guard(_name, args...) \
- for (CLASS(_name, scope)(args), \
- *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1)
-
-#define scoped_cond_guard(_name, _fail, args...) \
- for (CLASS(_name, scope)(args), \
- *done = NULL; !done; done = (void *)1) \
- if (!__guard_ptr(_name)(&scope)) _fail; \
- else
-
+/*
+ * Helper macro for scoped_guard().
+ *
+ * Note that the "!__is_cond_ptr(_name)" part of the condition ensures that
+ * compiler would be sure that for the unconditional locks the body of the
+ * loop (caller-provided code glued to the else clause) could not be skipped.
+ * It is needed because the other part - "__guard_ptr(_name)(&scope)" - is too
+ * hard to deduce (even if could be proven true for unconditional locks).
+ */
+#define __scoped_guard(_name, _label, args...) \
+ for (CLASS(_name, scope)(args); \
+ __guard_ptr(_name)(&scope) || !__is_cond_ptr(_name); \
+ ({ goto _label; })) \
+ if (0) { \
+_label: \
+ break; \
+ } else
+
+#define scoped_guard(_name, args...) \
+ __scoped_guard(_name, __UNIQUE_ID(label), args)
+
+#define __scoped_cond_guard(_name, _fail, _label, args...) \
+ for (CLASS(_name, scope)(args); true; ({ goto _label; })) \
+ if (!__guard_ptr(_name)(&scope)) { \
+ BUILD_BUG_ON(!__is_cond_ptr(_name)); \
+ _fail; \
+_label: \
+ break; \
+ } else
+
+#define scoped_cond_guard(_name, _fail, args...) \
+ __scoped_cond_guard(_name, _fail, __UNIQUE_ID(label), args)
/*
* Additional helper macros for generating lock guards with types, either for
* locks that don't have a native type (eg. RCU, preempt) or those that need a
@@ -369,14 +398,17 @@ static inline class_##_name##_t class_##_name##_constructor(void) \
}
#define DEFINE_LOCK_GUARD_1(_name, _type, _lock, _unlock, ...) \
+__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
__DEFINE_UNLOCK_GUARD(_name, _type, _unlock, __VA_ARGS__) \
__DEFINE_LOCK_GUARD_1(_name, _type, _lock)
#define DEFINE_LOCK_GUARD_0(_name, _lock, _unlock, ...) \
+__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
__DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \
__DEFINE_LOCK_GUARD_0(_name, _lock)
#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
+ __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
if (_T->lock && !(_condlock)) _T->lock = NULL; \
--
2.43.0
* [PATCH AUTOSEL 6.12 5/5] timekeeping: Always check for negative motion
From: Sasha Levin @ 2024-11-24 12:46 UTC
To: linux-kernel, stable
Cc: Thomas Gleixner, John Stultz, Sasha Levin, mingo, bp, dave.hansen,
x86, rdunlap, paulmck
From: Thomas Gleixner <tglx@linutronix.de>
[ Upstream commit c163e40af9b2331b2c629fd4ec8b703ed4d4ae39 ]
clocksource_delta() has two variants: one with a check for negative motion,
which is only selected by x86. This is a historic leftover, as this function
was previously used in the time getter hot paths.
Since commit 135225a363ae, timekeeping_cycles_to_ns() has unconditional
protection against this as a by-product of the protection against 64-bit
math overflow.
clocksource_delta() is only used in the clocksource watchdog and in
timekeeping_advance(), and the extra conditional there is not hurting
anyone.
Remove the config option and unconditionally prevent negative motion of the
readout.
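The retained check treats any masked delta with bits in the upper half of
the mask as negative motion and clamps it to zero. A small userspace
rendering, with an illustrative 32-bit counter mask:

	#include <stdint.h>
	#include <stdio.h>

	/* Mirrors the kept clocksource_delta() logic, for illustration only. */
	static uint64_t delta(uint64_t now, uint64_t last, uint64_t mask)
	{
		uint64_t ret = (now - last) & mask;

		/*
		 * A backwards readout wraps and sets bits in the upper half
		 * of the mask; return 0 instead of a huge bogus forward step.
		 */
		return ret & ~(mask >> 1) ? 0 : ret;
	}

	int main(void)
	{
		uint64_t mask = 0xffffffffULL;	/* 32-bit wide counter */

		printf("%llu\n", (unsigned long long)delta(1010, 1000, mask)); /* 10 */
		printf("%llu\n", (unsigned long long)delta(1000, 1010, mask)); /* 0 */
		return 0;
	}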
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20241031120328.599430157@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/Kconfig | 1 -
kernel/time/Kconfig | 5 -----
kernel/time/timekeeping_internal.h | 7 -------
3 files changed, 13 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b9a7e8f39acc..171be04eca1f5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -145,7 +145,6 @@ config X86
select ARCH_HAS_PARANOID_L1D_FLUSH
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
- select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
# Word-size accesses may read uninitialized data past the trailing \0
# in strings and cause false KMSAN reports.
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 8ebb6d5a106be..b0b97a60aaa6f 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -17,11 +17,6 @@ config ARCH_CLOCKSOURCE_DATA
config ARCH_CLOCKSOURCE_INIT
bool
-# Clocksources require validation of the clocksource against the last
-# cycle update - x86/TSC misfeature
-config CLOCKSOURCE_VALIDATE_LAST_CYCLE
- bool
-
# Timekeeping vsyscall support
config GENERIC_TIME_VSYSCALL
bool
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 4ca2787d1642e..1d4854d5c386e 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -15,7 +15,6 @@ extern void tk_debug_account_sleep_time(const struct timespec64 *t);
#define tk_debug_account_sleep_time(x)
#endif
-#ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE
static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
{
u64 ret = (now - last) & mask;
@@ -26,12 +25,6 @@ static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
*/
return ret & ~(mask >> 1) ? 0 : ret;
}
-#else
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
-{
- return (now - last) & mask;
-}
-#endif
/* Semi public for serialization of non timekeeper VDSO updates. */
extern raw_spinlock_t timekeeper_lock;
--
2.43.0
* Re: [PATCH AUTOSEL 6.12 1/5] uprobes: sanitiize xol_free_insn_slot()
From: Oleg Nesterov @ 2024-11-24 13:13 UTC
To: Sasha Levin
Cc: linux-kernel, stable, Peter Zijlstra, mhiramat, mingo, acme,
namhyung, linux-trace-kernel, linux-perf-users
Hi Sasha,
but why do you think it makes sense to backport these "uprobes" cleanups?
Oleg.
* Re: [PATCH AUTOSEL 6.12 1/5] uprobes: sanitiize xol_free_insn_slot()
From: Sasha Levin @ 2024-11-24 14:15 UTC
To: Oleg Nesterov
Cc: linux-kernel, stable, Peter Zijlstra, mhiramat, mingo, acme,
namhyung, linux-trace-kernel, linux-perf-users
On Sun, Nov 24, 2024 at 02:13:57PM +0100, Oleg Nesterov wrote:
>Hi Sasha,
>
>but why do you think it makes sense to backport these "uprobes" cleanups?
If it doesn't, I'm happy to drop them. This commit was selected mostly
because of:
>> 1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid,
>> xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
Which sounded to me like more than a cleanup.
--
Thanks,
Sasha
* Re: [PATCH AUTOSEL 6.12 1/5] uprobes: sanitiize xol_free_insn_slot()
From: Oleg Nesterov @ 2024-11-24 14:36 UTC
To: Sasha Levin
Cc: linux-kernel, stable, Peter Zijlstra, mhiramat, mingo, acme,
namhyung, linux-trace-kernel, linux-perf-users
On 11/24, Sasha Levin wrote:
>
> On Sun, Nov 24, 2024 at 02:13:57PM +0100, Oleg Nesterov wrote:
> >Hi Sasha,
> >
> >but why do you think it makes sense to backport these "uprobes" cleanups?
>
> If it doesn't, I'm happy to drop them. This commit was selected mostly
> because of:
I'd suggest to drop.
> >>1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid,
> >> xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
>
> Which sounded to me like more than a cleanup.
Sorry for confusion. This patch doesn't make much sense without the next
changes.
Oleg.
* Re: [PATCH AUTOSEL 6.12 3/5] locking/ww_mutex: Adjust to lockdep nest_lock requirements
From: Thomas Hellström @ 2024-11-25 10:06 UTC
To: Sasha Levin, linux-kernel, stable; +Cc: Peter Zijlstra, mingo, will
On Sun, 2024-11-24 at 07:46 -0500, Sasha Levin wrote:
> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
> [ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
> [...]
It looks like the last version was not picked for this patch.
https://lore.kernel.org/all/172918113162.1279674.6570518059490493206@2413ebb6fbb6/T/
The version in this autosel patch regresses the locking api selftests
and should not be backported. Same for the corresponding backports for
6.11 and 6.6. Let me know if I should reply separately to those.
Thanks,
Thomas
* Re: [PATCH AUTOSEL 6.12 3/5] locking/ww_mutex: Adjust to lockdep nest_lock requirements
From: Sasha Levin @ 2024-11-25 13:34 UTC
To: Thomas Hellström; +Cc: linux-kernel, stable, Peter Zijlstra, mingo, will
On Mon, Nov 25, 2024 at 11:06:38AM +0100, Thomas Hellström wrote:
>On Sun, 2024-11-24 at 07:46 -0500, Sasha Levin wrote:
>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>
>> [ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
>> [...]
>It looks like the last version was not picked for this patch.
>
>https://lore.kernel.org/all/172918113162.1279674.6570518059490493206@2413ebb6fbb6/T/
>
>The version in this autosel patch regresses the locking api selftests
>and should not be backported. Same for the corresponding backports for
>6.11 and 6.6. Let me know if I should reply separately to those.
This is what ended up landing upstream...
I can drop it from the autosel queue, but if this has issues then you
should also fix it up upstream.
--
Thanks,
Sasha