linux-rt-devel.lists.linux.dev archive mirror
* [PATCH AUTOSEL 6.18-5.10] ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
       [not found] <20251212061223.305139-1-sashal@kernel.org>
@ 2025-12-12  6:12 ` Sasha Levin
  2025-12-12  6:12 ` [PATCH AUTOSEL 6.18-6.6] dm-snapshot: fix 'scheduling while atomic' on real-time kernels Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-12-12  6:12 UTC (permalink / raw)
  To: patches, stable
  Cc: Sebastian Andrzej Siewior, Arnd Bergmann, Linus Walleij,
	Russell King (Oracle), Sasha Levin, linux, clrkwllms, rostedt,
	nathan, kees, peterz, lumag, richard, chrisi.schrefl,
	lukas.bulwahn, afd, linux-arm-kernel, linux-rt-devel

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

[ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ]

gup_pgd_range() is invoked with interrupts disabled and, via
gup_p4d_range() and pte_offset_map(), ends up in
__kmap_local_page_prot(). With HIGHPTE enabled,
__kmap_local_page_prot() invokes kmap_high_get(), which takes a
spinlock_t via lock_kmap_any(). This leads to a sleeping-while-atomic
error on PREEMPT_RT because spinlock_t becomes a sleeping lock and must
not be acquired in atomic context.

The loop in map_new_virtual() uses a wait_queue_head_t for wakeups,
which also relies on a spinlock_t.

Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
to allow the use of get_user_pages_fast().

[arnd: rework patch to turn off HIGHPTE instead of HAVE_FAST_GUP]

Co-developed-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM-generated explanations, may be completely bogus:

# Commit Analysis: ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels

## 1. COMMIT MESSAGE ANALYSIS

**Problem identified**: The commit addresses a "sleeping-while-atomic"
bug on PREEMPT_RT ARM kernels. The issue occurs because:
- `gup_pgd_range()` runs with interrupts disabled (atomic context)
- With HIGHPTE enabled, the code path calls `kmap_high_get()` which
  acquires a `spinlock_t` via `lock_kmap_any()`
- On PREEMPT_RT, `spinlock_t` becomes a sleeping lock (mutex)
- Attempting to acquire a sleeping lock in atomic context is a bug (the
  offending pattern is sketched below)
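
A minimal C sketch of that pattern, assuming a hypothetical caller; the
function and lock names below are illustrative, not the actual GUP or
kmap code:

```c
/*
 * Hypothetical sketch of the sleeping-while-atomic pattern.  On !RT,
 * spin_lock() on a spinlock_t spins with preemption disabled; on
 * PREEMPT_RT it is backed by an rtmutex and may sleep, so taking it
 * after interrupts have been disabled (as the GUP fast path does) is
 * invalid.
 */
#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(example_lock);	/* stand-in for the kmap lock */

static void example_atomic_path(void)
{
	unsigned long flags;

	local_irq_save(flags);		/* atomic context, like gup_pgd_range() */
	spin_lock(&example_lock);	/* sleeps on PREEMPT_RT -> splat */
	/* ... */
	spin_unlock(&example_lock);
	local_irq_restore(flags);
}
```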

**Tags present**:
- Acked-by: Linus Walleij (ARM/pinctrl maintainer)
- Reviewed-by: Arnd Bergmann (major ARM contributor)
- Signed-off-by: Sebastian Andrzej Siewior (PREEMPT_RT maintainer)
- Signed-off-by: Russell King (ARM maintainer)

**Missing tags**: No `Cc: stable@vger.kernel.org` or `Fixes:` tag.

## 2. CODE CHANGE ANALYSIS

The change is a single-line Kconfig modification:

```diff
-	depends on HIGHMEM
+	depends on HIGHMEM && !PREEMPT_RT
```

This simply prevents the `HIGHPTE` configuration option from being
selected when `PREEMPT_RT` is enabled. The technical mechanism of the
bug is clear:

1. `get_user_pages_fast()` → `gup_pgd_range()` (runs with interrupts
   disabled)
2. → `pte_offset_map()` → `__kmap_local_page_prot()` → `kmap_high_get()`
3. `kmap_high_get()` calls `lock_kmap_any()` which uses `spinlock_t`
4. On PREEMPT_RT: `spinlock_t` = sleeping lock → BUG in atomic context

The commit message notes that "HIGHPTE is rarely needed at all": it is
an optimization that allocates second-level page tables from high
memory to relieve lowmem pressure, which is typically unnecessary on
modern systems.
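
To make the mechanism concrete, here is a minimal sketch of a
hypothetical page-table-walker fragment; only `pte_offset_map()`,
`ptep_get()` and `pte_unmap()` are real kernel interfaces, the
surrounding function is illustrative:

```c
#include <linux/mm.h>
#include <linux/pgtable.h>

/* Hypothetical fragment: read one PTE while walking a page table. */
static void example_read_pte(pmd_t *pmd, unsigned long addr)
{
	/*
	 * With CONFIG_HIGHPTE the PTE page may live in high memory, so
	 * pte_offset_map() must create a temporary kernel mapping first;
	 * on ARM that mapping path is what reaches
	 * __kmap_local_page_prot() and kmap_high_get().
	 */
	pte_t *pte = pte_offset_map(pmd, addr);

	if (pte) {
		pte_t entry = ptep_get(pte);

		(void)entry;		/* ... inspect the entry ... */
		pte_unmap(pte);		/* drop the temporary mapping */
	}
}
```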

## 3. CLASSIFICATION

- **Bug type**: Runtime crash/BUG (sleeping-while-atomic violation)
- **Not a new feature**: Disables a problematic configuration
  combination
- **Not a security fix**: No CVE or security-sensitive code
- **Build fix category**: No, this is a runtime issue

## 4. SCOPE AND RISK ASSESSMENT

**Scope**:
- 1 file changed (`arch/arm/Kconfig`)
- 1 line modified
- Affects only ARM + PREEMPT_RT + HIGHMEM configurations

**Risk**: **Very low**
- This is a Kconfig dependency change only
- Users who previously had HIGHPTE enabled will now have it disabled on
  PREEMPT_RT
- The workaround is conservative (disable the problematic feature rather
  than complex code fixes)
- Cannot introduce regressions in other code paths

## 5. USER IMPACT

**Affected users**: ARM systems running PREEMPT_RT kernels with HIGHMEM
(systems with >~800MB RAM on 32-bit ARM)

**Severity**: High for affected users
- `get_user_pages_fast()` is a commonly used path for I/O and memory
  management (a typical caller is sketched below)
- Without this fix, users would hit kernel warnings/crashes when GUP
  fast path is used
- This completely breaks PREEMPT_RT usability on affected configurations
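
As a rough illustration of how ordinary driver code reaches this path,
a hedged sketch of a hypothetical caller (the helper name and error
handling are made up; `get_user_pages_fast()`, `put_page()`, `kcalloc()`
and `FOLL_WRITE` are the real interfaces):

```c
#include <linux/mm.h>
#include <linux/slab.h>

/* Hypothetical driver snippet: pin a user buffer via the GUP fast path. */
static int example_pin_user_buffer(unsigned long uaddr, int nr_pages)
{
	struct page **pages;
	int pinned, i;

	pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/* Walks the page tables with IRQs disabled, no mmap_lock needed. */
	pinned = get_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, pages);
	if (pinned > 0) {
		/* ... perform I/O on the pinned pages ... */
		for (i = 0; i < pinned; i++)
			put_page(pages[i]);
	}

	kfree(pages);
	return pinned < 0 ? pinned : 0;
}
```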

## 6. STABILITY INDICATORS

**Review chain is strong**:
- Sebastian Andrzej Siewior (PREEMPT_RT maintainer) developed this
- Arnd Bergmann reworked and reviewed it
- Linus Walleij acked it
- Russell King (ARM maintainer) accepted it

## 7. DEPENDENCY CHECK

This is a standalone Kconfig change. Dependencies:
- The `PREEMPT_RT` Kconfig symbol must exist; it has been in mainline
  since v5.3, though the full PREEMPT_RT functionality was only
  completed around v6.12
- `HIGHPTE` and `HIGHMEM` options exist on ARM in all relevant kernel
  versions

The fix should apply cleanly to any stable tree with PREEMPT_RT support.

## STABLE KERNEL CRITERIA EVALUATION

| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Yes - disables problematic config combination |
| Fixes real bug | ✅ Yes - sleeping-while-atomic crash |
| Important issue | ✅ Yes - crashes on PREEMPT_RT systems |
| Small and contained | ✅ Yes - 1 line Kconfig change |
| No new features | ✅ Yes - only disables an option |
| Clean application | ✅ Yes - simple dependency addition |

## CONCERNS

1. **No explicit stable request**: Maintainers didn't add `Cc: stable`.
   However, the fix is clearly appropriate for stable.
2. **PREEMPT_RT availability**: The fix only matters where `PREEMPT_RT`
   can be enabled. Full mainline support arrived around v6.12, but older
   stable trees already carry the Kconfig symbol (and are commonly used
   with the out-of-tree RT patch set), so the dependency change is
   harmless there.

## CONCLUSION

This commit fixes a real, reproducible crash on ARM PREEMPT_RT systems.
The fix is minimal (1 line), obviously correct (disables problematic
feature combination), well-reviewed by relevant maintainers (ARM, RT),
and carries essentially zero risk. While there's no explicit stable tag,
the technical merits strongly support backporting.

The sleeping-while-atomic bug would make `get_user_pages_fast()`
unusable on affected configurations, which is a serious correctness
issue for PREEMPT_RT users who require deterministic behavior.

**YES**

 arch/arm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4fb985b76e97f..70cd3b5b5a059 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1215,7 +1215,7 @@ config HIGHMEM
 
 config HIGHPTE
 	bool "Allocate 2nd-level pagetables from highmem" if EXPERT
-	depends on HIGHMEM
+	depends on HIGHMEM && !PREEMPT_RT
 	default y
 	help
 	  The VM uses one page of physical memory for each page table.
-- 
2.51.0



* [PATCH AUTOSEL 6.18-6.6] dm-snapshot: fix 'scheduling while atomic' on real-time kernels
       [not found] <20251212061223.305139-1-sashal@kernel.org>
  2025-12-12  6:12 ` [PATCH AUTOSEL 6.18-5.10] ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels Sasha Levin
@ 2025-12-12  6:12 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-12-12  6:12 UTC (permalink / raw)
  To: patches, stable
  Cc: Mikulas Patocka, Jiping Ma, Sasha Levin, agk, snitzer, bmarzins,
	bigeasy, clrkwllms, rostedt, dm-devel, linux-rt-devel

From: Mikulas Patocka <mpatocka@redhat.com>

[ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ]

There is a reported 'scheduling while atomic' bug when using dm-snapshot
on real-time kernels. The reason for the bug is that the hlist_bl code
does preempt_disable() when taking the lock, and the kernel then
attempts to take other spinlocks while holding the hlist_bl lock.

Fix this by converting the hlist_bl bit spinlock into a regular spinlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Jiping Ma <jiping.ma2@windriver.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM-generated explanations, may be completely bogus:

### Summary

**Bug Analysis:**
The commit fixes a genuine runtime bug on PREEMPT_RT kernels. The
`hlist_bl` implementation uses bit spinlocks with `preempt_disable()`,
creating an atomic context. When dm-snapshot code then tries to take
other spinlocks (which on RT kernels are sleeping locks/rt_mutexes), it
triggers "scheduling while atomic" warnings.
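
A minimal sketch of the conflicting pattern (not the dm-snapshot code
itself; the slot and inner lock names are illustrative):

```c
#include <linux/list_bl.h>
#include <linux/spinlock.h>

static struct hlist_bl_head example_slot;	/* stand-in hash bucket */
static DEFINE_SPINLOCK(example_inner_lock);	/* stand-in for another dm lock */

static void example_rt_unfriendly(void)
{
	hlist_bl_lock(&example_slot);		/* bit spinlock: preempt_disable() */
	spin_lock(&example_inner_lock);		/* sleeping lock on PREEMPT_RT -> bug */
	/* ... */
	spin_unlock(&example_inner_lock);
	hlist_bl_unlock(&example_slot);
}

/*
 * The fix sidesteps this by giving each hash slot its own spinlock_t
 * (struct dm_hlist_head in the patch), so no bit spinlock is held while
 * other spinlocks are taken.
 */
```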

**Historical Context:**
The buggy code (using `hlist_bl` in dm-snapshot) was introduced in
commit `f79ae415b64c35` ("dm snapshot: Make exception tables scalable")
which landed in **v5.2**. This means all current stable/LTS trees
(5.4.y, 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y) contain the affected
code.

**Risk vs Benefit:**

| Factor | Assessment |
|--------|------------|
| Bug severity | HIGH - causes kernel BUG on RT systems |
| Fix correctness | HIGH - straightforward lock type conversion |
| Code scope | CONTAINED - dm-snapshot only |
| Regression risk | LOW - semantically equivalent locking |
| User impact | RT kernel users with dm-snapshot |

**Stable Criteria Checklist:**
- ✅ Fixes a real bug (scheduling while atomic)
- ✅ Obviously correct (converts bit spinlock to regular spinlock)
- ✅ Small and contained (within dm-snapshot subsystem)
- ✅ No new features or APIs
- ✅ Authored by subsystem maintainer (Mikulas Patocka)
- ✅ Has Reported-by indicating real user impact
- ⚠️ No explicit `Cc: stable` tag (not required for AUTOSEL selection)

**Conclusion:**
This commit is an excellent stable backport candidate. It fixes a real,
reported bug that causes kernel warnings and potential instability on
PREEMPT_RT kernels when using dm-snapshot. The fix is well-understood,
uses standard kernel primitives, and is authored by the DM subsystem
maintainer. The change is self-contained and poses low regression risk
since it maintains the same locking semantics while fixing RT
compatibility.

**YES**

 drivers/md/dm-exception-store.h |  2 +-
 drivers/md/dm-snap.c            | 73 +++++++++++++++------------------
 2 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/drivers/md/dm-exception-store.h b/drivers/md/dm-exception-store.h
index b679766375381..061b4d3108132 100644
--- a/drivers/md/dm-exception-store.h
+++ b/drivers/md/dm-exception-store.h
@@ -29,7 +29,7 @@ typedef sector_t chunk_t;
  * chunk within the device.
  */
 struct dm_exception {
-	struct hlist_bl_node hash_list;
+	struct hlist_node hash_list;
 
 	chunk_t old_chunk;
 	chunk_t new_chunk;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index f40c18da40000..dbd148967de42 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -40,10 +40,15 @@ static const char dm_snapshot_merge_target_name[] = "snapshot-merge";
 #define DM_TRACKED_CHUNK_HASH(x)	((unsigned long)(x) & \
 					 (DM_TRACKED_CHUNK_HASH_SIZE - 1))
 
+struct dm_hlist_head {
+	struct hlist_head head;
+	spinlock_t lock;
+};
+
 struct dm_exception_table {
 	uint32_t hash_mask;
 	unsigned int hash_shift;
-	struct hlist_bl_head *table;
+	struct dm_hlist_head *table;
 };
 
 struct dm_snapshot {
@@ -628,8 +633,8 @@ static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk);
 
 /* Lock to protect access to the completed and pending exception hash tables. */
 struct dm_exception_table_lock {
-	struct hlist_bl_head *complete_slot;
-	struct hlist_bl_head *pending_slot;
+	spinlock_t *complete_slot;
+	spinlock_t *pending_slot;
 };
 
 static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk,
@@ -638,20 +643,20 @@ static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk,
 	struct dm_exception_table *complete = &s->complete;
 	struct dm_exception_table *pending = &s->pending;
 
-	lock->complete_slot = &complete->table[exception_hash(complete, chunk)];
-	lock->pending_slot = &pending->table[exception_hash(pending, chunk)];
+	lock->complete_slot = &complete->table[exception_hash(complete, chunk)].lock;
+	lock->pending_slot = &pending->table[exception_hash(pending, chunk)].lock;
 }
 
 static void dm_exception_table_lock(struct dm_exception_table_lock *lock)
 {
-	hlist_bl_lock(lock->complete_slot);
-	hlist_bl_lock(lock->pending_slot);
+	spin_lock_nested(lock->complete_slot, 1);
+	spin_lock_nested(lock->pending_slot, 2);
 }
 
 static void dm_exception_table_unlock(struct dm_exception_table_lock *lock)
 {
-	hlist_bl_unlock(lock->pending_slot);
-	hlist_bl_unlock(lock->complete_slot);
+	spin_unlock(lock->pending_slot);
+	spin_unlock(lock->complete_slot);
 }
 
 static int dm_exception_table_init(struct dm_exception_table *et,
@@ -661,13 +666,15 @@ static int dm_exception_table_init(struct dm_exception_table *et,
 
 	et->hash_shift = hash_shift;
 	et->hash_mask = size - 1;
-	et->table = kvmalloc_array(size, sizeof(struct hlist_bl_head),
+	et->table = kvmalloc_array(size, sizeof(struct dm_hlist_head),
 				   GFP_KERNEL);
 	if (!et->table)
 		return -ENOMEM;
 
-	for (i = 0; i < size; i++)
-		INIT_HLIST_BL_HEAD(et->table + i);
+	for (i = 0; i < size; i++) {
+		INIT_HLIST_HEAD(&et->table[i].head);
+		spin_lock_init(&et->table[i].lock);
+	}
 
 	return 0;
 }
@@ -675,16 +682,17 @@ static int dm_exception_table_init(struct dm_exception_table *et,
 static void dm_exception_table_exit(struct dm_exception_table *et,
 				    struct kmem_cache *mem)
 {
-	struct hlist_bl_head *slot;
+	struct dm_hlist_head *slot;
 	struct dm_exception *ex;
-	struct hlist_bl_node *pos, *n;
+	struct hlist_node *pos;
 	int i, size;
 
 	size = et->hash_mask + 1;
 	for (i = 0; i < size; i++) {
 		slot = et->table + i;
 
-		hlist_bl_for_each_entry_safe(ex, pos, n, slot, hash_list) {
+		hlist_for_each_entry_safe(ex, pos, &slot->head, hash_list) {
+			hlist_del(&ex->hash_list);
 			kmem_cache_free(mem, ex);
 			cond_resched();
 		}
@@ -700,7 +708,7 @@ static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk)
 
 static void dm_remove_exception(struct dm_exception *e)
 {
-	hlist_bl_del(&e->hash_list);
+	hlist_del(&e->hash_list);
 }
 
 /*
@@ -710,12 +718,11 @@ static void dm_remove_exception(struct dm_exception *e)
 static struct dm_exception *dm_lookup_exception(struct dm_exception_table *et,
 						chunk_t chunk)
 {
-	struct hlist_bl_head *slot;
-	struct hlist_bl_node *pos;
+	struct hlist_head *slot;
 	struct dm_exception *e;
 
-	slot = &et->table[exception_hash(et, chunk)];
-	hlist_bl_for_each_entry(e, pos, slot, hash_list)
+	slot = &et->table[exception_hash(et, chunk)].head;
+	hlist_for_each_entry(e, slot, hash_list)
 		if (chunk >= e->old_chunk &&
 		    chunk <= e->old_chunk + dm_consecutive_chunk_count(e))
 			return e;
@@ -762,18 +769,17 @@ static void free_pending_exception(struct dm_snap_pending_exception *pe)
 static void dm_insert_exception(struct dm_exception_table *eh,
 				struct dm_exception *new_e)
 {
-	struct hlist_bl_head *l;
-	struct hlist_bl_node *pos;
+	struct hlist_head *l;
 	struct dm_exception *e = NULL;
 
-	l = &eh->table[exception_hash(eh, new_e->old_chunk)];
+	l = &eh->table[exception_hash(eh, new_e->old_chunk)].head;
 
 	/* Add immediately if this table doesn't support consecutive chunks */
 	if (!eh->hash_shift)
 		goto out;
 
 	/* List is ordered by old_chunk */
-	hlist_bl_for_each_entry(e, pos, l, hash_list) {
+	hlist_for_each_entry(e, l, hash_list) {
 		/* Insert after an existing chunk? */
 		if (new_e->old_chunk == (e->old_chunk +
 					 dm_consecutive_chunk_count(e) + 1) &&
@@ -804,13 +810,13 @@ static void dm_insert_exception(struct dm_exception_table *eh,
 		 * Either the table doesn't support consecutive chunks or slot
 		 * l is empty.
 		 */
-		hlist_bl_add_head(&new_e->hash_list, l);
+		hlist_add_head(&new_e->hash_list, l);
 	} else if (new_e->old_chunk < e->old_chunk) {
 		/* Add before an existing exception */
-		hlist_bl_add_before(&new_e->hash_list, &e->hash_list);
+		hlist_add_before(&new_e->hash_list, &e->hash_list);
 	} else {
 		/* Add to l's tail: e is the last exception in this slot */
-		hlist_bl_add_behind(&new_e->hash_list, &e->hash_list);
+		hlist_add_behind(&new_e->hash_list, &e->hash_list);
 	}
 }
 
@@ -820,7 +826,6 @@ static void dm_insert_exception(struct dm_exception_table *eh,
  */
 static int dm_add_exception(void *context, chunk_t old, chunk_t new)
 {
-	struct dm_exception_table_lock lock;
 	struct dm_snapshot *s = context;
 	struct dm_exception *e;
 
@@ -833,17 +838,7 @@ static int dm_add_exception(void *context, chunk_t old, chunk_t new)
 	/* Consecutive_count is implicitly initialised to zero */
 	e->new_chunk = new;
 
-	/*
-	 * Although there is no need to lock access to the exception tables
-	 * here, if we don't then hlist_bl_add_head(), called by
-	 * dm_insert_exception(), will complain about accessing the
-	 * corresponding list without locking it first.
-	 */
-	dm_exception_table_lock_init(s, old, &lock);
-
-	dm_exception_table_lock(&lock);
 	dm_insert_exception(&s->complete, e);
-	dm_exception_table_unlock(&lock);
 
 	return 0;
 }
@@ -873,7 +868,7 @@ static int calc_max_buckets(void)
 	/* use a fixed size of 2MB */
 	unsigned long mem = 2 * 1024 * 1024;
 
-	mem /= sizeof(struct hlist_bl_head);
+	mem /= sizeof(struct dm_hlist_head);
 
 	return mem;
 }
-- 
2.51.0

