Linux Documentation
* [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache @ 2026-05-22 15:00 UTC
  To: linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, npache, peterx, pfalcato,
	rakie.kim, raquini, rdunlap, richard.weiyang, rientjes, rostedt,
	rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

Enable khugepaged to collapse to mTHP orders. This patch implements the
main scanning logic using a bitmap to track occupied pages and a stack
structure that allows us to find optimal collapse sizes.

Prior to this patch, PMD collapse had three main phases: a lightweight
scanning phase (mmap_read_lock) that identifies a potential PMD
collapse, an allocation phase (mmap unlocked), and finally a heavier
collapse phase (mmap_write_lock).

To enable mTHP collapse, we make the following changes:

During PMD scan phase, track occupied pages in a bitmap. When mTHP
orders are enabled, we remove the restriction of max_ptes_none during the
scan phase to avoid missing potential mTHP collapse candidates. Once we
have scanned the full PMD range and updated the bitmap to track occupied
pages, we use the bitmap to find the optimal mTHP size.

Implement mthp_collapse() to perform a binary search over the bitmap and
determine the best eligible order for the collapse. An explicit stack is
used instead of function recursion to manage the search, which avoids
deep call chains on the limited kernel stack. The algorithm repeatedly
splits the bitmap into smaller chunks to find the highest-order mTHPs that
satisfy the collapse criteria. We start by attempting the PMD order, then
move on to consecutively lower orders (mTHP collapse). Each stack entry
holds a pair of variables (offset, order), indicating the number of PTEs
from the start of the PMD and the order of the potential collapse
candidate.

The algorithm for consuming the bitmap works as follows:
    1) push (0, HPAGE_PMD_ORDER) onto the stack
    2) pop the stack
    3) check whether the number of set bits in that (offset, order) range
       satisfies the max_ptes_none threshold for that order
    4) if yes, attempt collapse
    5) if no (or collapse fails), push two new stack items representing
       the left and right halves of the current bitmap range, at the
       next lower order
    6) repeat from step (2) until the stack is empty.

Below is a diagram representing the algorithm and stack items:

                            offset   mid_offset
                            |        |
                            |        |
                            v        v
         --------------------------------------
         |          PTE Page Table            |
         --------------------------------------
                            <-------><------->
                             order-1  order-1
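
The steps above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the kernel code: PMD_ORDER is
fixed at 9, a bool array stands in for the bitmap, try_collapse() is a
hypothetical stand-in that always succeeds, and the per-order none limit
(zero, or all but one PTE) is an assumed simplification of what
collapse_max_ptes_none() returns. The max_depth field only exists to
check the "height + 1" stack bound.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMD_ORDER	9	/* assumption: 512 PTEs per PMD (4K pages) */
#define MIN_ORDER	2	/* lowest order considered, as in the patch */
#define NR_PTES		(1u << PMD_ORDER)
#define STACK_SIZE	(PMD_ORDER - MIN_ORDER + 1)

struct range {
	uint16_t offset;	/* first PTE index of the candidate */
	uint8_t order;		/* candidate covers 1 << order PTEs */
};

struct sketch {
	bool occupied[NR_PTES];		/* stands in for the bitmap */
	unsigned long enabled_orders;	/* bit N set: order N may be used */
	bool allow_none;		/* false: max_ptes_none=0 mode */
	int max_depth;			/* deepest stack usage observed */
};

static unsigned int count_occupied(const struct sketch *s,
				   unsigned int offset, unsigned int nr)
{
	unsigned int i, n = 0;

	for (i = offset; i < offset + nr; i++)
		n += s->occupied[i];
	return n;
}

/* Hypothetical stand-in for the real collapse: always succeeds. */
static bool try_collapse(struct sketch *s, unsigned int offset,
			 unsigned int nr)
{
	unsigned int i;

	for (i = offset; i < offset + nr; i++)
		s->occupied[i] = true;
	return true;
}

/* Returns the number of PTEs covered by successful collapses. */
static unsigned int mthp_sketch(struct sketch *s)
{
	struct range stack[STACK_SIZE];
	unsigned int collapsed = 0;
	int top = 0;

	stack[top++] = (struct range){ .offset = 0, .order = PMD_ORDER };
	s->max_depth = top;

	while (top) {
		struct range r = stack[--top];
		unsigned int nr = 1u << r.order;
		/* Assumed per-order limit: none allowed, or all but one. */
		unsigned int max_none = s->allow_none ? nr - 1 : 0;

		if ((s->enabled_orders & (1ul << r.order)) &&
		    count_occupied(s, r.offset, nr) >= nr - max_none &&
		    try_collapse(s, r.offset, nr)) {
			collapsed += nr;
			continue;
		}

		/* Split only while a lower enabled order exists. */
		if (r.order > MIN_ORDER &&
		    (((1ul << r.order) - 1) & s->enabled_orders)) {
			uint8_t next = r.order - 1;

			/* Push the right half first so the left pops first. */
			stack[top++] = (struct range){ r.offset + nr / 2, next };
			stack[top++] = (struct range){ r.offset, next };
			if (top > s->max_depth)
				s->max_depth = top;
		}
	}
	return collapsed;
}
```

With allow_none false, orders 2 through 9 enabled, and PTEs 0-15 and
64-67 occupied, the sketch collapses one order-4 range and one order-2
range, 20 PTEs in total. On an empty bitmap the stack peaks at
STACK_SIZE entries, which matches the height + 1 bound.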

mTHP collapse rejects regions containing swapped-out or shared pages.
This is because adding new entries can lead to new none pages, which
may in turn lead to constant promotion into a higher-order mTHP. A
similar issue can occur with "max_ptes_none > HPAGE_PMD_NR/2": a collapse
then introduces at least 2x the number of pages, so a future scan
satisfies the promotion condition once again. This issue is prevented by
the collapse_max_ptes_none() function, which imposes the max_ptes_none
restrictions described below.
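
The promotion "creep" can be made concrete with a small simulation. The
sketch below is not part of this patch: it assumes a hypothetical rule in
which max_ptes_none is scaled down by the order difference
(max_ptes_none >> (PMD_ORDER - order)), with PMD_ORDER assumed to be 9,
purely to show why intermediate values are problematic. It starts from a
freshly collapsed, fully occupied order-2 range and asks, scan after
scan, whether the next order up now meets its threshold.

```c
#define PMD_ORDER	9	/* assumption: 512 PTEs per PMD */
#define MIN_ORDER	2

/* Hypothetical scaling rule, NOT taken from this patch. */
static unsigned int scaled_max_none(unsigned int max_ptes_none,
				    unsigned int order)
{
	return max_ptes_none >> (PMD_ORDER - order);
}

/*
 * Simulate repeated scans after an order-2 collapse. Each successful
 * promotion fills the larger range completely, which is what feeds the
 * next promotion. Returns the order at which promotion stops.
 */
static unsigned int creep_final_order(unsigned int max_ptes_none)
{
	unsigned int order = MIN_ORDER;
	unsigned int occupied = 1u << order;	/* fully occupied range */

	while (order < PMD_ORDER) {
		unsigned int next = order + 1;
		unsigned int nr = 1u << next;

		/* Same shape of check as the patch: occupied >= nr - none */
		if (occupied < nr - scaled_max_none(max_ptes_none, next))
			break;
		occupied = nr;
		order = next;
	}
	return order;
}
```

Under that assumed rule, creep_final_order(300) walks all the way up to
order 9, while creep_final_order(255) and creep_final_order(0) stay at
order 2, which is the behavior the restriction is meant to guarantee.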

We currently only support mTHP collapse for max_ptes_none values of 0
and HPAGE_PMD_NR - 1, resulting in the following behavior:

    - max_ptes_none=0: Never introduce new empty pages during collapse
    - max_ptes_none=HPAGE_PMD_NR-1: Always try collapse to the highest
      available mTHP order

Any other max_ptes_none value will emit a warning and default mTHP
collapse to max_ptes_none=0. There should be no behavior change for PMD
collapse.

Once we determine which mTHP sizes fit best in that PMD range, a collapse
is attempted. A minimum collapse order of 2 is used, as this is the lowest
order supported by anon memory as defined by THP_ORDERS_ALL_ANON.

Currently, mTHP collapse is not supported for madvise_collapse, which
will only attempt PMD collapse.

We can also remove the check for is_khugepaged inside the PMD scan as
the collapse_max_ptes_none() function handles this logic now.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 181 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 172 insertions(+), 9 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 64ceebc9d8a7..d3d7db8be26c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -99,6 +99,30 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 
+#define KHUGEPAGED_MIN_MTHP_ORDER	2
+/*
+ * mthp_collapse() does an iterative DFS over a binary tree, from
+ * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
+ * size needed for a DFS on a binary tree is height + 1, where
+ * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
+ *
+ * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
+ * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
+ */
+#define MTHP_STACK_SIZE	(ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
+
+/*
+ * Defines a range of PTE entries in a PTE page table which are being
+ * considered for mTHP collapse.
+ *
+ * @offset: the offset of the first PTE entry in a PMD range.
+ * @order: the order of the PTE entries being considered for collapse.
+ */
+struct mthp_range {
+	u16 offset;
+	u8 order;
+};
+
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -110,6 +134,12 @@ struct collapse_control {
 
 	/* nodemask for allocation fallback */
 	nodemask_t alloc_nmask;
+
+	/* Each bit represents a single occupied (!none/zero) page. */
+	DECLARE_BITMAP(mthp_bitmap, MAX_PTRS_PER_PTE);
+	/* A mask of the current range being considered for mTHP collapse. */
+	DECLARE_BITMAP(mthp_bitmap_mask, MAX_PTRS_PER_PTE);
+	struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
 };
 
 /**
@@ -1411,20 +1441,137 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
 	return result;
 }
 
+static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
+				     u16 offset, u8 order)
+{
+	const int size = *stack_size;
+	struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
+
+	VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
+	stack->order = order;
+	stack->offset = offset;
+	(*stack_size)++;
+}
+
+static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
+						 int *stack_size)
+{
+	const int size = *stack_size;
+
+	VM_WARN_ON_ONCE(size <= 0);
+	(*stack_size)--;
+	return cc->mthp_bitmap_stack[size - 1];
+}
+
+static unsigned int collapse_mthp_count_present(struct collapse_control *cc,
+						u16 offset, unsigned int nr_ptes)
+{
+	bitmap_zero(cc->mthp_bitmap_mask, MAX_PTRS_PER_PTE);
+	bitmap_set(cc->mthp_bitmap_mask, offset, nr_ptes);
+	return bitmap_weight_and(cc->mthp_bitmap, cc->mthp_bitmap_mask, MAX_PTRS_PER_PTE);
+}
+
+/*
+ * mthp_collapse() consumes the bitmap that is generated during
+ * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
+ *
+ * Each bit in cc->mthp_bitmap represents a single occupied (!none/zero) page.
+ * A stack structure cc->mthp_bitmap_stack is used to check different regions
+ * of the bitmap for collapse eligibility. The stack maintains a pair of
+ * variables (offset, order), indicating the number of PTEs from the start of
+ * the PMD, and the order of the potential collapse candidate respectively. We
+ * start at the PMD order and check if it is eligible for collapse; if not, we
+ * add two entries to the stack at a lower order to represent the left and right
+ * halves of the PTE page table we are examining.
+ *
+ *                         offset   mid_offset
+ *                         |        |
+ *                         |        |
+ *                         v        v
+ *      --------------------------------------
+ *      |          cc->mthp_bitmap           |
+ *      --------------------------------------
+ *                         <-------><------->
+ *                          order-1  order-1
+ *
+ * For each of these, we determine how many PTE entries are occupied in the
+ * range of PTE entries we propose to collapse, then we compare this to a
+ * threshold number of PTE entries which would need to be occupied for a
+ * collapse to be permitted at that order (accounting for max_ptes_none).
+ *
+ * If a collapse is permitted, we attempt to collapse the PTE range into a
+ * mTHP.
+ */
+static int mthp_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address, int referenced, int unmapped,
+		struct collapse_control *cc, unsigned long enabled_orders)
+{
+	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
+	int collapsed = 0, stack_size = 0;
+	unsigned long collapse_address;
+	struct mthp_range range;
+	u16 offset;
+	u8 order;
+
+	collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
+
+	while (stack_size) {
+		range = collapse_mthp_stack_pop(cc, &stack_size);
+		order = range.order;
+		offset = range.offset;
+		nr_ptes = 1UL << order;
+
+		if (!test_bit(order, &enabled_orders))
+			goto next_order;
+
+		max_ptes_none = collapse_max_ptes_none(cc, vma, order);
+
+		nr_occupied_ptes = collapse_mthp_count_present(cc, offset,
+							       nr_ptes);
+
+		if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
+			int ret;
+
+			collapse_address = address + offset * PAGE_SIZE;
+			ret = collapse_huge_page(mm, collapse_address, referenced,
+						 unmapped, cc, order);
+			if (ret == SCAN_SUCCEED) {
+				collapsed += nr_ptes;
+				continue;
+			}
+		}
+
+next_order:
+		if ((BIT(order) - 1) & enabled_orders) {
+			const u8 next_order = order - 1;
+			const u16 mid_offset = offset + (nr_ptes / 2);
+
+			collapse_mthp_stack_push(cc, &stack_size, mid_offset,
+						 next_order);
+			collapse_mthp_stack_push(cc, &stack_size, offset,
+						 next_order);
+		}
+	}
+	return collapsed;
+}
+
 static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		bool *lock_dropped, struct collapse_control *cc)
 {
-	const unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma, HPAGE_PMD_ORDER);
 	const unsigned int max_ptes_shared = collapse_max_ptes_shared(cc, HPAGE_PMD_ORDER);
 	const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc, HPAGE_PMD_ORDER);
+	unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma, HPAGE_PMD_ORDER);
+	enum tva_type tva_flags = cc->is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_COLLAPSE;
 	pmd_t *pmd;
-	pte_t *pte, *_pte;
-	int none_or_zero = 0, shared = 0, referenced = 0;
+	pte_t *pte, *_pte, pteval;
+	int i;
+	int none_or_zero = 0, shared = 0, nr_collapsed = 0, referenced = 0;
 	enum scan_result result = SCAN_FAIL;
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr;
+	unsigned long enabled_orders;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE, unmapped = 0;
 
@@ -1436,8 +1583,19 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		goto out;
 	}
 
+	bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
+
+	/*
+	 * If PMD is the only enabled order, enforce max_ptes_none, otherwise
+	 * scan all pages to populate the bitmap for mTHP collapse.
+	 */
+	if (enabled_orders != BIT(HPAGE_PMD_ORDER))
+		max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;
+
 	pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
 	if (!pte) {
 		cc->progress++;
@@ -1445,11 +1603,13 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		goto out;
 	}
 
-	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, addr += PAGE_SIZE) {
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		_pte = pte + i;
+		addr = start_addr + i * PAGE_SIZE;
+		pteval = ptep_get(_pte);
+
 		cc->progress++;
 
-		pte_t pteval = ptep_get(_pte);
 		if (pte_none_or_zero(pteval)) {
 			if (++none_or_zero > max_ptes_none) {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1529,6 +1689,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 			}
 		}
 
+		/* Set bit for occupied pages */
+		__set_bit(i, cc->mthp_bitmap);
 		/*
 		 * Record which node the original page is from and save this
 		 * information to cc->node_load[].
@@ -1587,10 +1749,11 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 	if (result == SCAN_SUCCEED) {
 		/* collapse_huge_page expects the lock to be dropped before calling */
 		mmap_read_unlock(mm);
-		result = collapse_huge_page(mm, start_addr, referenced,
-					    unmapped, cc, HPAGE_PMD_ORDER);
-		/* collapse_huge_page will return with the mmap_lock released */
+		nr_collapsed = mthp_collapse(mm, vma, start_addr, referenced,
+					     unmapped, cc, enabled_orders);
+		/* mmap_lock was released above, set lock_dropped */
 		*lock_dropped = true;
+		result = nr_collapsed ? SCAN_SUCCEED : SCAN_FAIL;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
-- 
2.54.0



* [PATCH mm-unstable v18 12/14] mm/khugepaged: avoid unnecessary mTHP collapse attempts
From: Nico Pache @ 2026-05-22 15:00 UTC
  To: linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, npache, peterx, pfalcato,
	rakie.kim, raquini, rdunlap, richard.weiyang, rientjes, rostedt,
	rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

There are cases where, if an attempted collapse fails, all subsequent
orders are guaranteed to also fail. Avoid these collapse attempts by
bailing out early.

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Usama Arif <usama.arif@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d3d7db8be26c..15b7298bc225 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1535,9 +1535,31 @@ static int mthp_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
 			collapse_address = address + offset * PAGE_SIZE;
 			ret = collapse_huge_page(mm, collapse_address, referenced,
 						 unmapped, cc, order);
-			if (ret == SCAN_SUCCEED) {
+
+			switch (ret) {
+			/* Cases where we continue to next collapse candidate */
+			case SCAN_SUCCEED:
 				collapsed += nr_ptes;
+				fallthrough;
+			case SCAN_PTE_MAPPED_HUGEPAGE:
 				continue;
+			/* Cases where lower orders might still succeed */
+			case SCAN_LACK_REFERENCED_PAGE:
+			case SCAN_EXCEED_NONE_PTE:
+			case SCAN_EXCEED_SWAP_PTE:
+			case SCAN_EXCEED_SHARED_PTE:
+			case SCAN_PAGE_LOCK:
+			case SCAN_PAGE_COUNT:
+			case SCAN_PAGE_NULL:
+			case SCAN_DEL_PAGE_LRU:
+			case SCAN_PTE_NON_PRESENT:
+			case SCAN_PTE_UFFD_WP:
+			case SCAN_ALLOC_HUGE_PAGE_FAIL:
+			case SCAN_PAGE_LAZYFREE:
+				goto next_order;
+			/* Cases where no further collapse is possible */
+			default:
+				return collapsed;
 			}
 		}
 
-- 
2.54.0



* [PATCH mm-unstable v18 13/14] mm/khugepaged: run khugepaged for all orders
From: Nico Pache @ 2026-05-22 15:00 UTC
  To: linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, npache, peterx, pfalcato,
	rakie.kim, raquini, rdunlap, richard.weiyang, rientjes, rostedt,
	rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

From: Baolin Wang <baolin.wang@linux.alibaba.com>

If any (m)THP order is enabled, khugepaged should be allowed to run so
that it can scan for and collapse mTHPs. For khugepaged to operate when
only mTHP sizes are specified in sysfs, we must modify the predicate
function that determines whether it ought to run.

This function is currently called hugepage_pmd_enabled(). This patch
renames it to hugepage_enabled() and updates its logic to determine
whether any valid orders exist that would justify running khugepaged.
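
As a rough illustration of the difference, the sketch below models only
the anonymous-memory part of the predicate in userspace C. The struct and
function names are hypothetical, PMD_ORDER is assumed to be 9, and the
file-backed and shmem checks are deliberately left out.

```c
#include <stdbool.h>

#define PMD_ORDER	9	/* assumption: PMD order with 4K pages */

/* Hypothetical snapshot of the per-size anon controls. */
struct anon_orders {
	unsigned long always;	/* bit N set: order N is "always" */
	unsigned long madvise;	/* bit N set: order N is "madvise" */
	unsigned long inherit;	/* bit N set: order N is "inherit" */
	bool global_enabled;	/* top-level control is not "never" */
};

/* Old behavior: only the PMD-order bit counts. */
static bool pmd_only_enabled(const struct anon_orders *o)
{
	const unsigned long pmd = 1ul << PMD_ORDER;

	return (o->always & pmd) || (o->madvise & pmd) ||
	       ((o->inherit & pmd) && o->global_enabled);
}

/* New behavior: any enabled order justifies running khugepaged. */
static bool any_order_enabled(const struct anon_orders *o)
{
	return o->always || o->madvise ||
	       (o->inherit && o->global_enabled);
}
```

With only order 4 set in the "always" mask, pmd_only_enabled() returns
false while any_order_enabled() returns true, which is the case this
patch is meant to cover.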

We must also update collapse_allowable_orders() to check all orders if
the vma is anonymous and the collapse is performed by khugepaged.

After this patch khugepaged mTHP collapse is fully enabled.

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Usama Arif <usama.arif@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 15b7298bc225..6d3f4ff6956a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -529,23 +529,23 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }
 
-static bool hugepage_pmd_enabled(void)
+static bool hugepage_enabled(void)
 {
 	/*
 	 * We cover the anon, shmem and the file-backed case here; file-backed
 	 * hugepages, when configured in, are determined by the global control.
-	 * Anon pmd-sized hugepages are determined by the pmd-size control.
+	 * Anon hugepages are determined by its per-size mTHP control.
 	 * Shmem pmd-sized hugepages are also determined by its pmd-size control,
 	 * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
 	 */
 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
 	    hugepage_global_enabled())
 		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_always))
+	if (READ_ONCE(huge_anon_orders_always))
 		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
+	if (READ_ONCE(huge_anon_orders_madvise))
 		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
+	if (READ_ONCE(huge_anon_orders_inherit) &&
 	    hugepage_global_enabled())
 		return true;
 	if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
@@ -586,7 +586,13 @@ void __khugepaged_enter(struct mm_struct *mm)
 static unsigned long collapse_allowable_orders(struct vm_area_struct *vma,
 		vm_flags_t vm_flags, enum tva_type tva_flags)
 {
-	unsigned long orders = BIT(HPAGE_PMD_ORDER);
+	unsigned long orders;
+
+	/* If khugepaged is scanning an anonymous vma, allow mTHP collapse */
+	if ((tva_flags == TVA_KHUGEPAGED) && vma_is_anonymous(vma))
+		orders = THP_ORDERS_ALL_ANON;
+	else
+		orders = BIT(HPAGE_PMD_ORDER);
 
 	return thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
@@ -594,11 +600,9 @@ static unsigned long collapse_allowable_orders(struct vm_area_struct *vma,
 void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  vm_flags_t vm_flags)
 {
-	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
-	    hugepage_pmd_enabled()) {
-		if (collapse_allowable_orders(vma, vm_flags, TVA_KHUGEPAGED))
-			__khugepaged_enter(vma->vm_mm);
-	}
+	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) && hugepage_enabled()
+	    && collapse_allowable_orders(vma, vm_flags, TVA_KHUGEPAGED))
+		__khugepaged_enter(vma->vm_mm);
 }
 
 void __khugepaged_exit(struct mm_struct *mm)
@@ -2949,7 +2953,7 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
 
 static int khugepaged_has_work(void)
 {
-	return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled();
+	return !list_empty(&khugepaged_scan.mm_head) && hugepage_enabled();
 }
 
 static int khugepaged_wait_event(void)
@@ -3022,7 +3026,7 @@ static void khugepaged_wait_work(void)
 		return;
 	}
 
-	if (hugepage_pmd_enabled())
+	if (hugepage_enabled())
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
 }
 
@@ -3053,7 +3057,7 @@ void set_recommended_min_free_kbytes(void)
 	int nr_zones = 0;
 	unsigned long recommended_min;
 
-	if (!hugepage_pmd_enabled()) {
+	if (!hugepage_enabled()) {
 		calculate_min_free_kbytes();
 		goto update_wmarks;
 	}
@@ -3103,7 +3107,7 @@ int start_stop_khugepaged(void)
 	int err = 0;
 
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_pmd_enabled()) {
+	if (hugepage_enabled()) {
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
@@ -3129,7 +3133,7 @@ int start_stop_khugepaged(void)
 void khugepaged_min_free_kbytes_update(void)
 {
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_pmd_enabled() && khugepaged_thread)
+	if (hugepage_enabled() && khugepaged_thread)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }
-- 
2.54.0



* [PATCH mm-unstable v18 14/14] Documentation: mm: update the admin guide for mTHP collapse
From: Nico Pache @ 2026-05-22 15:00 UTC
  To: linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, npache, peterx, pfalcato,
	rakie.kim, raquini, rdunlap, richard.weiyang, rientjes, rostedt,
	rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Bagas Sanjaya
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

Now that we can collapse to mTHPs, let's update the admin guide to
reflect these changes and provide proper guidance on how to use it.

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 50 +++++++++++++---------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 80a4d0bed70b..644869d3adfd 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -63,7 +63,8 @@ often.
 THP can be enabled system wide or restricted to certain tasks or even
 memory ranges inside task's address space. Unless THP is completely
 disabled, there is ``khugepaged`` daemon that scans memory and
-collapses sequences of basic pages into PMD-sized huge pages.
+collapses sequences of basic pages into huge pages of either PMD size
+or mTHP sizes, if the system is configured to do so.
 
 The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
 interface and using madvise(2) and prctl(2) system calls.
@@ -219,10 +220,10 @@ this behaviour by writing 0 to shrink_underused, and enable it by writing
 	echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
 	echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
 
-khugepaged will be automatically started when PMD-sized THP is enabled
+khugepaged will be automatically started when any THP size is enabled
 (either of the per-size anon control or the top-level control are set
 to "always" or "madvise"), and it'll be automatically shutdown when
-PMD-sized THP is disabled (when both the per-size anon control and the
+all THP sizes are disabled (when both the per-size anon control and the
 top-level control are "never")
 
 process THP controls
@@ -264,11 +265,6 @@ support the following arguments::
 Khugepaged controls
 -------------------
 
-.. note::
-   khugepaged currently only searches for opportunities to collapse to
-   PMD-sized THP and no attempt is made to collapse to other THP
-   sizes.
-
 khugepaged runs usually at low frequency so while one may not want to
 invoke defrag algorithms synchronously during the page faults, it
 should be worth invoking defrag at least in khugepaged. However it's
@@ -296,11 +292,11 @@ allocation failure to throttle the next allocation attempt::
 The khugepaged progress can be seen in the number of pages collapsed (note
 that this counter may not be an exact count of the number of pages
 collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping
-being replaced by a PMD mapping, or (2) All 4K physical pages replaced by
-one 2M hugepage. Each may happen independently, or together, depending on
-the type of memory and the failures that occur. As such, this value should
-be interpreted roughly as a sign of progress, and counters in /proc/vmstat
-consulted for more accurate accounting)::
+being replaced by a PMD mapping, or (2) physical pages replaced by one
+hugepage of various sizes (PMD-sized or mTHP). Each may happen independently,
+or together, depending on the type of memory and the failures that occur.
+As such, this value should be interpreted roughly as a sign of progress,
+and counters in /proc/vmstat consulted for more accurate accounting)::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
 
@@ -308,16 +304,21 @@ for each pass::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
 
-``max_ptes_none`` specifies how many extra small pages (that are
-not already mapped) can be allocated when collapsing a group
-of small pages into one large page::
+``max_ptes_none`` specifies how many empty (none/zero) pages are allowed
+when collapsing a group of small pages into one large page::
 
 	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
 
-A higher value leads to use additional memory for programs.
-A lower value leads to gain less thp performance. Value of
-max_ptes_none can waste cpu time very little, you can
-ignore it.
+For PMD-sized THP collapse, this directly limits the number of empty pages
+allowed in the 2MB region.
+
+For mTHP collapse, only 0 or (HPAGE_PMD_NR - 1) are supported. At
+HPAGE_PMD_NR - 1, we collapse to the highest possible order. Any intermediate
+value will emit a warning and mTHP collapse will default to max_ptes_none=0.
+
+A higher value allows more empty pages, potentially leading to more memory
+usage but better THP performance. A lower value is more conservative and
+may result in fewer THP collapses.
 
 ``max_ptes_swap`` specifies how many pages can be brought in from
 swap when collapsing a group of pages into a transparent huge page::
@@ -337,6 +338,15 @@ that THP is shared. Exceeding the number would block the collapse::
 
 A higher value may increase memory footprint for some workloads.
 
+.. note::
+   For mTHP collapse, khugepaged does not support collapsing regions that
+   contain shared or swapped out pages, as this could lead to continuous
+   promotion to higher orders. The collapse will fail if any shared or
+   swapped PTEs are encountered during the scan.
+
+   Currently, madvise_collapse only supports collapsing to PMD-sized THPs
+   and does not attempt mTHP collapses.
+
 Boot parameters
 ===============
 
-- 
2.54.0



* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Nico Pache @ 2026-05-22 15:07 UTC
  To: linux-doc, akpm, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

On Fri, May 22, 2026 at 8:59 AM Nico Pache <npache@redhat.com> wrote:
>
> The following series provides khugepaged with the capability to collapse
> anonymous memory regions to mTHPs.
>
> To achieve this we generalize the khugepaged functions to no longer depend
> on PMD_ORDER. Then during the PMD scan, we use a bitmap to track individual
> pages that are occupied (!none/zero). After the PMD scan is done, we use
> the bitmap to find the optimal mTHP sizes for the PMD range. The
> restriction on max_ptes_none is removed during the scan, to make sure we
> account for the whole PMD range in the bitmap. When no mTHP size is
> enabled, the legacy behavior of khugepaged is maintained.
>
> We currently only support max_ptes_none values of 0 or HPAGE_PMD_NR - 1
> (i.e. 511). If any other value is specified, the kernel will emit a warning
> and mTHP collapse will default to max_ptes_none=0. If an mTHP collapse is
> attempted on a range that contains swapped-out or shared pages, we don't
> perform the collapse. These limitations prevent collapse "creep" behavior:
> constantly promoting mTHPs to the next available size, which would occur
> because a collapse introduces more non-zero pages that would satisfy the
> promotion condition on subsequent scans.
> It is now also possible to collapse to mTHPs without requiring the PMD THP
> size to be enabled.
>
> Patch 1-2:   Generalize hugepage_vma_revalidate and alloc_charge_folio
>              for arbitrary orders.
> Patch 3:     Rework max_ptes_* handling into helper functions
> Patch 4:     Generalize __collapse_huge_page_* for mTHP support
> Patch 5:     Require collapse_huge_page to enter/exit with the lock dropped
> Patch 6:     Generalize collapse_huge_page for mTHP collapse
> Patch 7:     Skip collapsing mTHP to smaller orders
> Patch 8-9:   Add per-order mTHP statistics and tracepoints
> Patch 10:    Introduce collapse_allowable_orders helper function
> Patch 11-13: Introduce bitmap and mTHP collapse support, fully enabled
> Patch 14:    Documentation
>
> Testing:
> - Built for x86_64, aarch64, ppc64le, and s390x
> - ran all arches on test suites provided by the kernel-tests project
> - internal testing suites: functional testing and performance testing
> - selftests mm
> - I created a test script that I used to push khugepaged to its limits
>    while monitoring a number of stats and tracepoints. The code is
>    available here[1] (Run in legacy mode for these changes and set mthp
>    sizes to inherit)
>    In summary, this test showed no significant regression. In some cases
>    my changes had better collapse latencies and were able to scan more
>    pages in the same amount of time/work, but for the most part the
>    results were consistent.
> - redis testing. I did some testing with these changes along with my defer
>   changes (see followup [2] post for more details). We've decided to get
>   the mTHP changes merged first before attempting the defer series.
> - some basic testing on 64k page size.
> - lots of general use.
>
> [1] - https://gitlab.com/npache/khugepaged_mthp_test
> [2] - https://lore.kernel.org/lkml/20250515033857.132535-1-npache@redhat.com/
>
> V18 Changes:
> - Added RBs/Acks
> - [patch 02] Guard count_memcg_folio_events with is_pmd_order() to keep
>   THP_COLLAPSE_ALLOC PMD-only (Usama, Lance)
> - [patch 03] Convert C++ comments to C-style; fix "none-page" terminology
>   to "empty PTEs or PTEs mapping the shared zeropage"; drop unnecessary
>   userfaultfd comment; add const to local max_ptes_* variables; fix
>   "repect" typo (Lance, David)
> - [patch 04] collapse_max_ptes_none() now returns 0 instead of -EINVAL for
>   unsupported values; remove SCAN_INVALID_PTES_NONE; change return type
>   from int to unsigned int and propagate to all callers; add comment above
>   __collapse_huge_page_swapin explaining mTHP swap bail-out (David,
>   Lorenzo, Lance, Wei Yang, Usama)
> - [patch 05] Rewrite collapse_huge_page lock comment to David's suggested
>   wording (David)
> - [patch 11] Propagate unsigned int return type for max_ptes_none; remove
>   the now-unnecessary negative return check (consequence of patch 04);
>   Add optimization to the next_order goto that will prevent unnecessary
>   iterations if there are no lower orders enabled (Vernon); update locking
>   comment; pass VMA to mthp_collapse to improve uffd-armed detection, and
>   prevent unnecessary work. (Wei)
> - [patch 14] Update documentation to reflect fallback-to-0 behavior
>
> V17: https://lore.kernel.org/all/20260511185817.686831-1-npache@redhat.com
> V16: https://lore.kernel.org/all/20260419185750.260784-1-npache@redhat.com
> V15: https://lore.kernel.org/all/20260226031741.230674-1-npache@redhat.com
> V14: https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com
> V13: https://lore.kernel.org/all/20251201174627.23295-1-npache@redhat.com
> V12: https://lore.kernel.org/all/20251022183717.70829-1-npache@redhat.com
> V11: https://lore.kernel.org/all/20250912032810.197475-1-npache@redhat.com
> V10: https://lore.kernel.org/all/20250819134205.622806-1-npache@redhat.com
> V9 : https://lore.kernel.org/all/20250714003207.113275-1-npache@redhat.com
> V8 : https://lore.kernel.org/all/20250702055742.102808-1-npache@redhat.com
> V7 : https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com
> V6 : https://lore.kernel.org/all/20250515030312.125567-1-npache@redhat.com
> V5 : https://lore.kernel.org/all/20250428181218.85925-1-npache@redhat.com
> V4 : https://lore.kernel.org/all/20250417000238.74567-1-npache@redhat.com
> V3 : https://lore.kernel.org/all/20250414220557.35388-1-npache@redhat.com
> V2 : https://lore.kernel.org/all/20250211003028.213461-1-npache@redhat.com
> V1 : https://lore.kernel.org/all/20250108233128.14484-1-npache@redhat.com
>
> Baolin Wang (1):
>   mm/khugepaged: run khugepaged for all orders
>
> Dev Jain (1):
>   mm/khugepaged: generalize alloc_charge_folio()
>
> Nico Pache (12):
>   mm/khugepaged: generalize hugepage_vma_revalidate for mTHP support
>   mm/khugepaged: rework max_ptes_* handling with helper functions
>   mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
>   mm/khugepaged: require collapse_huge_page to enter/exit with the lock
>     dropped
>   mm/khugepaged: generalize collapse_huge_page for mTHP collapse
>   mm/khugepaged: skip collapsing mTHP to smaller orders
>   mm/khugepaged: add per-order mTHP collapse failure statistics
>   mm/khugepaged: improve tracepoints for mTHP orders
>   mm/khugepaged: introduce collapse_allowable_orders helper function
>   mm/khugepaged: Introduce mTHP collapse support
>   mm/khugepaged: avoid unnecessary mTHP collapse attempts
>   Documentation: mm: update the admin guide for mTHP collapse
>
>  Documentation/admin-guide/mm/transhuge.rst |  72 ++-
>  include/linux/huge_mm.h                    |   5 +
>  include/trace/events/huge_memory.h         |  34 +-
>  mm/huge_memory.c                           |  11 +
>  mm/khugepaged.c                            | 634 ++++++++++++++++-----
>  5 files changed, 584 insertions(+), 172 deletions(-)
>
>
> base-commit: 6c8cb505a5634594b3ea159fd1c71bce2acf3346

Whoops, I manually changed the cover letter subject to reflect that this
is on mm-hotfixes-unstable but never updated the others...

Hopefully that is ok. Just a small mistake. Base commit is referenced here.

-- Nico


> --
> 2.54.0
>



* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Vlastimil Babka (SUSE) @ 2026-05-22 15:13 UTC (permalink / raw)
  To: Nico Pache, linux-doc, akpm, linux-kernel, linux-mm,
	linux-trace-kernel
  Cc: aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <CAA1CXcCoDU_pnp0SmMzRi8wPGB1OBjbbokevq2X_03X1vpWtOw@mail.gmail.com>

On 5/22/26 17:07, Nico Pache wrote:
> On Fri, May 22, 2026 at 8:59 AM Nico Pache <npache@redhat.com> wrote:
>>  include/trace/events/huge_memory.h         |  34 +-
>>  mm/huge_memory.c                           |  11 +
>>  mm/khugepaged.c                            | 634 ++++++++++++++++-----
>>  5 files changed, 584 insertions(+), 172 deletions(-)
>>
>>
>> base-commit: 6c8cb505a5634594b3ea159fd1c71bce2acf3346
> 
> Whoops, I manually changed the cover letter subject to reflect that this
> is on mm-hotfixes-unstable but never updated the others...

But why? That branch is for hotfixes that would go to the current 7.1-rcX
series. mm-unstable would be the correct one for this, AFAICT.

> Hopefully that is ok. Just a small mistake. Base commit is referenced here.
> 
> -- Nico
> 
> 
>> --
>> 2.54.0
>>
> 



* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Lorenzo Stoakes @ 2026-05-22 15:13 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <20260522150009.121603-1-npache@redhat.com>

NAK.

This is not a hotfixes candidate Nico. Don't send a massive series like this
with that tag please.

Thanks, Lorenzo


* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Lorenzo Stoakes @ 2026-05-22 15:16 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, akpm, linux-kernel, linux-mm, linux-trace-kernel,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <CAA1CXcCoDU_pnp0SmMzRi8wPGB1OBjbbokevq2X_03X1vpWtOw@mail.gmail.com>

On Fri, May 22, 2026 at 09:07:29AM -0600, Nico Pache wrote:
> Whoops, I manually changed the cover letter subject to reflect that this
> is on mm-hotfixes-unstable but never updated the others...
>
> Hopefully that is ok. Just a small mistake. Base commit is referenced here.

It's not ok; this isn't suitable for a hotfix in any way, shape, or form.

As you know, because we told you :) May has been difficult because of
conferences, holidays (and in my case burnout recovery).

And unfortunately the series seems to have needed quite a bit of review again
(my suggestion to you would be to ensure you don't make major changes, only
small incremental ones on the basis of review feedback).

So this isn't viable for 7.2, and we'll have to target 7.3. Therefore there
was no rush.

Also please don't spring a respin of this series on us without discussion
first; with people away and (frankly) the amount of work involved here,
you're going to have to accept the pace that workload/availability permits.

Adding spurious hotfixes tags doesn't help anything :) please don't do that
again.

Thanks, Lorenzo


* Re: [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg
From: Michal Koutný @ 2026-05-22 15:26 UTC (permalink / raw)
  To: Eric Chanudet
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
	Natalie Vock, Tejun Heo, Jonathan Corbet, Shuah Khan, cgroups,
	linux-mm, linux-kernel, dri-devel, T.J. Mercier,
	Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
	linux-doc
In-Reply-To: <20260519-cgroup-dmem-memcg-double-charge-v2-2-db4d1407062b@redhat.com>


Hello Eric.

On Tue, May 19, 2026 at 11:59:02AM -0400, Eric Chanudet <echanude@redhat.com> wrote:
> Add a root-only cgroupfs file "dmem.memcg" that lets an administrator
> configure whether allocations in a dmem region should also be charged to
> the memory controller.

This kinda makes sense as it is not unlike io.cost.* device
configurators.

Just for my better understanding -- will there be a space for userspace
to switch this? (No charged dmem allocations happen before responsible
userspace runs, so that the attribute remains unlocked.)

(I'm rather indifferent about the actual double charging/non-charging
matter.)


> 
> To handle inheritance, dmem adds a depends_on the memory controller,
> unless MEMCG isn't configured in.
> 
> Double-charging is disabled by default. Once a charge is attempted, the
> setting is locked to prevent inconsistent accounting by a small 4-state
> machine (off, on, locked off, locked on).
> 
> The memcg to charge is derived from the pool's cgroup, since the pool
> holds a reference to the dmem cgroup state that keeps the cgroup alive
> until it gets uncharged.
> 
> Signed-off-by: Eric Chanudet <echanude@redhat.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  23 +++++
>  kernel/cgroup/dmem.c                    | 158 +++++++++++++++++++++++++++++++-
>  2 files changed, 178 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 6efd0095ed995b1550317662bc1b56c7a7f3db23..1d2fa55ddf0faa17baa916a8914d3033e8e42359 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2828,6 +2828,29 @@ DMEM Interface Files
>  	  drm/0000:03:00.0/vram0 12550144
>  	  drm/0000:03:00.0/stolen 8650752
>  
> +  dmem.memcg
> +	A readwrite nested-keyed file that exists only on the root
> +	cgroup.

Strictly speaking, this is not nested-keyed but flat-keyed [1], which leads
me to the realization that this is the first instance of a boolean.
All in all, such a composition comes to mind (the latter is RO):

	drm/0000:03:00.0/vram0 enable=0|1 locked=0|1
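
For illustration only (not code from the patch): the two booleans in the
composition above map one-to-one onto the four states the commit message
describes (off, on, locked off, locked on). A minimal sketch, assuming
hypothetical names throughout and assuming -EBUSY as the error for a write
after locking (the patch may well return something else):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the patch's real identifiers are not shown here. */
enum memcg_state_sketch {
	MEMCG_OFF,		/* enable=0 locked=0 (default) */
	MEMCG_ON,		/* enable=1 locked=0 */
	MEMCG_LOCKED_OFF,	/* enable=0 locked=1 */
	MEMCG_LOCKED_ON,	/* enable=1 locked=1 */
};

static bool state_enabled(enum memcg_state_sketch s)
{
	return s == MEMCG_ON || s == MEMCG_LOCKED_ON;
}

static bool state_locked(enum memcg_state_sketch s)
{
	return s == MEMCG_LOCKED_OFF || s == MEMCG_LOCKED_ON;
}

/* Administrator write: only permitted while the setting is unlocked. */
static int state_write(enum memcg_state_sketch *s, bool enable)
{
	assert(s != NULL);
	if (state_locked(*s))
		return -EBUSY;	/* assumed error code */
	*s = enable ? MEMCG_ON : MEMCG_OFF;
	return 0;
}

/*
 * A charge attempt freezes whatever the setting currently is.
 * Returns true if the charge should also go to memcg.
 */
static bool state_charge(enum memcg_state_sketch *s)
{
	assert(s != NULL);
	*s = state_enabled(*s) ? MEMCG_LOCKED_ON : MEMCG_LOCKED_OFF;
	return state_enabled(*s);
}
```

With this shape, "enable" is the only writable key and "locked" is derived
state, which matches the read-only remark above.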




> +static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf,
> +				       size_t nbytes, loff_t off)
> +{
> +	while (buf) {
> +		struct dmem_cgroup_region *region;
> +		char *options, *name;
> +		bool flag;
> +
> +		options = buf;
> +		buf = strchr(buf, '\n');
> +		if (buf)
> +			*buf++ = '\0';

I recall there was a discussion about accepting only a single device per
write(2) (at the same time I see this idiom is still present in other
dmem.* files, so this is nothing to change in _this_ patch).

Thanks,
Michal

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#format


* Re: [PATCH v2] cpufreq: elanfreq: Drop support for AMD Elan SC4*
From: Rafael J. Wysocki @ 2026-05-22 15:28 UTC (permalink / raw)
  To: Viresh Kumar, Sean Young
  Cc: Jonathan Corbet, Shuah Khan, Zhongqiu Han, linux-doc,
	linux-kernel, linux-pm
In-Reply-To: <nms5cy6tgpcteav34jdtjf5mbmidf6rwc2uuxxef456xzegsyv@54rbblkdjybt>

On Fri, May 8, 2026 at 7:48 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 07-05-26, 10:01, Sean Young wrote:
> > Since commit 8b793a92d862 ("x86/cpu: Remove M486/M486SX/ELAN support"),
> > the AMD Elan SC4* is no longer supported, so the cpu frequency
> > driver is no longer needed.
> >
> > Signed-off-by: Sean Young <sean@mess.org>
> > ---
> > Changes since v1:
> >  - Also removes elanfreq= entry from kernel-parameters.txt
> >
> >  .../admin-guide/kernel-parameters.txt         |   4 -
> >  drivers/cpufreq/Kconfig.x86                   |  15 --
> >  drivers/cpufreq/Makefile                      |   1 -
> >  drivers/cpufreq/elanfreq.c                    | 226 ------------------
> >  4 files changed, 246 deletions(-)
> >  delete mode 100644 drivers/cpufreq/elanfreq.c
>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

Applied as 7.2 material, thanks!


* [PATCH v3 0/5] KVM: PPC: Handle CPU compatibility mode for nested guests
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp

On POWER systems, newer processor generations can operate in compatibility
modes corresponding to earlier generations (e.g., a Power11 system running
in Power10 compatibility mode). In such cases, the effective CPU level
exposed to guests differs from the physical processor generation.

This creates a problem for nested virtualization. When booting a nested KVM
guest (L2) inside a host KVM guest (L1) running in a compatibility mode,
userspace (e.g., QEMU) may derive the CPU model from the raw hardware PVR
and attempt to configure the nested guest accordingly. However, the L1
partition is constrained by the compatibility level negotiated with the
hypervisor (L0), and requests exceeding that level are rejected, leading to
guest boot failures such as:

  KVM-NESTEDv2: couldn't set guest wide elements

This series addresses the issue in two steps:

1. Detect and reject invalid compatibility requests early in KVM to avoid
   late failures.

2. Provide a mechanism for userspace to query the effective CPU
   compatibility modes supported by the host, so it can select an
   appropriate CPU model for nested guests.

To achieve this, the series introduces a new KVM capability and ioctl
(KVM_CAP_PPC_COMPAT_CAPS / KVM_PPC_GET_COMPAT_CAPS) that expose the
compatibility modes supported by the host.

Why a new UAPI?
---------------
While cpu-version is available in /proc/device-tree/cpus/<cpu#>/cpu-version
on both L1 booted on PowerNV and PowerVM LPARs, the UAPI approach is
preferable for several reasons:

1. pHYP (L0) capabilities: On PowerVM, we need to rely on capabilities
   negotiated with pHYP in KVM, not just device tree properties. The
   cpu-version property depicts the current compat mode but doesn't
   indicate which compat modes are supported for the nested guest.

2. procfs dependency: Not all systems run with procfs enabled (CONFIG_PROC_FS
   is optional). Minimal configurations like buildroot might disable it, but
   KVM ioctl works regardless since it accesses kernel data structures
   directly.

3. Kernel validation: The kernel validates and normalizes the compatibility
   information. Patch 1 adds validation logic that rejects invalid
   compatibility requests early, ensuring userspace gets validated,
   consistent data.

4. Abstraction & stability: /proc/device-tree is an implementation detail.
   The UAPI provides a stable interface that won't break if the underlying
   mechanism changes.

5. Semantic clarity: KVM_PPC_GET_COMPAT_CAPS clearly expresses what
   compatibility modes can be used for KVM guests, vs. parsing device tree
   which requires understanding the semantic meaning of cpu-version.

The implementation supports both:

  - PowerVM (nested API v2), where compatibility information is obtained
    via the H_GUEST_GET_CAPABILITIES hypercall.
  - PowerNV (nested API v1), where compatibility is derived from the device
    tree ("cpu-version") representing the effective processor compatibility
    level.

This allows userspace (e.g., QEMU) to select a CPU model consistent with
the host compatibility mode, avoiding mismatches and enabling successful
nested guest boot.
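
For illustration only (not part of the series): a minimal userspace sketch
of how the new ioctl might be queried, assuming the struct layout and ioctl
number shown in patch 2/5. The names prefixed with sketch/SKETCH are local
stand-ins, since released uapi headers do not carry these definitions. The
meaning of individual bits is not spelled out in this cover letter, so the
helper below takes a plain bit index rather than a named mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct kvm_ppc_compat_caps as proposed in patch 2/5. */
struct compat_caps_sketch {
	uint64_t flags;			/* reserved for future use */
	uint64_t compat_capabilities;	/* bitmap of supported compat modes */
};

/* KVMIO is 0xAE in <linux/kvm.h>; 0xe4 is the number used in patch 2/5. */
#define SKETCH_KVMIO 0xAE
#define SKETCH_KVM_PPC_GET_COMPAT_CAPS \
	_IOR(SKETCH_KVMIO, 0xe4, struct compat_caps_sketch)

/*
 * Query the compatibility bitmap on a KVM VM file descriptor.
 * Returns 0 on success or a negative errno value on failure.
 */
static int query_compat_caps(int vm_fd, struct compat_caps_sketch *caps)
{
	memset(caps, 0, sizeof(*caps));
	if (ioctl(vm_fd, SKETCH_KVM_PPC_GET_COMPAT_CAPS, caps) < 0)
		return -errno;
	return 0;
}

/* Test one bit of the returned bitmap (bit numbering is an assumption). */
static int compat_cap_is_set(const struct compat_caps_sketch *caps,
			     unsigned int bit)
{
	assert(bit < 64);
	return (int)((caps->compat_capabilities >> bit) & 1);
}
```

A caller would presumably treat any negative return as "no information
available" and keep its existing PVR-derived CPU model selection.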

Changes in v3:
  - Added "Why a new UAPI?" section to cover letter addressing questions
    about the need for a new UAPI vs. using existing mechanisms like
    /proc/device-tree
  - Fixed initialization of 'r' in KVM_PPC_GET_COMPAT_CAPS ioctl handler
    from 0 to -ENOTTY for proper error handling when the operation is not
    supported
  - Added Vaibhav's "Suggested-by" tags
  - Retained Anushree's "Tested-by" tags as there were no major code changes
  - Fixed documentation build warning reported by kernel test robot and
    added "Reported-by" and "Closes" tags to patch 5

Changes in v2:
  - Squashed patches 2 and 3 from v1 (capability introduction and ioctl
    wiring) into a single patch for better logical grouping
  - Changed kvm_ppc_compat_caps.flags from __u32 to __u64 for consistency
    and future extensibility
  - Addressed other review comments
  - Improved commit messages with clearer explanations of the changes

Patch summary:
  [1/5] Validate arch_compat against host compatibility mode
  [2/5] Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
  [3/5] Implement capability retrieval for PowerVM (API v2)
  [4/5] Add PowerNV support (API v1)
  [5/5] Document the new ioctl

Tested on:
  - Power11 pSeries LPAR in Power10 compatibility mode (nested API v2)
  - Power10 PowerNV system (and QEMU TCG PowerNV 11) with nested
    virtualization (API v1) with various combinations of KVM L1/L2 guests
    in various supported compatibility modes.

With this series, nested guests boot successfully in configurations where
they previously failed due to compatibility mismatches.

Related QEMU series:
  A corresponding QEMU series adds support for querying and using these
  compatibility capabilities when configuring nested KVM guests:
  https://lore.kernel.org/all/20260502140021.69712-1-amachhiw@linux.ibm.com/

v2: https://lore.kernel.org/linuxppc-dev/20260513100755.83215-1-amachhiw@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20260430054906.94431-1-amachhiw@linux.ibm.com/

Amit Machhiwal (5):
  KVM: PPC: Book3S HV: Validate arch_compat against host compatibility
    mode
  KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
  KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM
    on PowerVM
  KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM
    on PowerNV
  KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl

 Documentation/virt/kvm/api.rst      | 35 ++++++++++++++++
 arch/powerpc/include/asm/kvm_ppc.h  |  1 +
 arch/powerpc/include/uapi/asm/kvm.h |  6 +++
 arch/powerpc/kvm/book3s_hv.c        | 63 +++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c          | 21 ++++++++++
 include/uapi/linux/kvm.h            |  4 ++
 6 files changed, 130 insertions(+)


base-commit: 1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524
-- 
2.50.1 (Apple Git-155)


* [PATCH v3 3/5] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
In-Reply-To: <20260522152744.55251-1-amachhiw@linux.ibm.com>

On POWER systems, the host CPU may run in a compatibility mode (e.g., a
Power11 processor operating in Power10 compatibility mode). In such
cases, the effective CPU level exposed to guests differs from the
physical processor generation.

When running nested KVM guests, QEMU derives the host CPU type using
mfpvr(), which reflects the physical processor version. This can result
in a mismatch between the CPU model selected by QEMU and the
compatibility mode enforced by the host, leading to guest boot failures.

For example, booting a nested guest on a Power11 LPAR configured in
Power10 compatibility mode fails with:

  KVM-NESTEDv2: couldn't set guest wide elements
  [..KVM reg dump..]

This occurs because QEMU selects a CPU model corresponding to the
physical processor (via mfpvr()), while the host operates in a lower
compatibility mode. As a result, KVM rejects the requested compatibility
level during guest initialization.

Add support for retrieving host CPU compatibility capabilities for
nested guests on PowerVM (PAPR nested API v2). The hypervisor provides
the effective compatibility levels via the H_GUEST_GET_CAPABILITIES
hcall, which reflects the processor modes negotiated between the Power
hypervisor (L0) and the host partition (L1).

On pseries systems, obtain the capability bitmap using
plpar_guest_get_capabilities() and return it via struct
kvm_ppc_compat_caps. This information is then exposed to userspace
through the KVM_PPC_GET_COMPAT_CAPS ioctl.

Hook the implementation into the Book3S HV kvmppc_ops so that it can be
invoked by the generic KVM ioctl handling code.

Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 249d1f2e4e2c..38de7040e2b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6522,6 +6522,21 @@ static bool kvmppc_hash_v3_possible(void)
 	return true;
 }
 
+
+static int kvmppc_get_compat_cpu_caps(struct kvm_ppc_compat_caps *host_caps)
+{
+	unsigned long capabilities = 0;
+	long rc = -EINVAL;
+
+	if (kvmhv_on_pseries()) {
+		if (kvmhv_is_nestedv2())
+			rc = plpar_guest_get_capabilities(0, &capabilities);
+		host_caps->compat_capabilities = capabilities;
+	}
+
+	return rc;
+}
+
 static struct kvmppc_ops kvm_ops_hv = {
 	.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
 	.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -6564,6 +6579,7 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.hash_v3_possible = kvmppc_hash_v3_possible,
 	.create_vcpu_debugfs = kvmppc_arch_create_vcpu_debugfs_hv,
 	.create_vm_debugfs = kvmppc_arch_create_vm_debugfs_hv,
+	.get_compat_cpu_ver = kvmppc_get_compat_cpu_caps,
 };
 
 static int kvm_init_subcore_bitmap(void)
-- 
2.50.1 (Apple Git-155)



* [PATCH v3 1/5] KVM: PPC: Book3S HV: Validate arch_compat against host compatibility mode
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
In-Reply-To: <20260522152744.55251-1-amachhiw@linux.ibm.com>

On IBM POWER systems, newer processor generations can operate in
compatibility modes corresponding to earlier generations. This becomes
relevant for nested virtualization, where nested KVM guests may need to
run with a specific processor compatibility level.

Currently, when running a nested KVM guest (L2) inside a Power11 pSeries
logical partition (L1) booted in Power10 compatibility mode, the guest
fails to boot while setting 'arch_compat'. This happens because the CPU
class is derived from the hardware PVR (via mfspr()), which reflects the
physical processor generation (Power11), rather than the effective
compatibility mode (Power10).

As a result, userspace may request a Power11 arch_compat for the L2
guest. However, the L1 partition, running in Power10 compatibility, has
only negotiated support up to Power10 with the Power Hypervisor (L0).
When H_SET_STATE is invoked with a Power11 Logical PVR, the hypervisor
rejects the request, leading to a late guest boot failure:

  KVM-NESTEDv2: couldn't set guest wide elements
  [..KVM reg dump..]

This situation should be detected earlier. Rejecting unsupported
'arch_compat' values in 'kvmppc_set_arch_compat()' avoids issuing an
invalid H_SET_STATE hcall and provides a clearer failure mode.

Add a check to reject Power11 'arch_compat' requests when the host is
running in Power10 compatibility mode, returning -EINVAL early instead
of deferring the failure to the hypervisor.
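
For illustration only (not the kernel code itself): a standalone sketch of
the check described above. The two PVR constants are the logical PVR values
as I recall them from arch/powerpc/include/asm/reg.h and should be treated
as assumptions, as should the struct, which only mimics the pvr_mask and
pvr_value fields of the kernel's cpu_spec.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed logical PVR values for ISA 3.1: Power10 and Power11. */
#define SKETCH_PVR_ARCH_31	0x0f000006u
#define SKETCH_PVR_ARCH_31_P11	0x0f000007u

/* Stand-in for the two cpu_spec fields the check relies on. */
struct cpu_spec_sketch {
	uint32_t pvr_mask;
	uint32_t pvr_value;
};

/* True when the host's matched CPU spec is the Power10 logical PVR. */
static bool host_is_p10_compat(const struct cpu_spec_sketch *spec)
{
	return (SKETCH_PVR_ARCH_31 & spec->pvr_mask) == spec->pvr_value;
}

/*
 * Reject a Power11 arch_compat request on a host running in Power10
 * compatibility mode. Returns 0 if acceptable, -EINVAL otherwise.
 */
static int check_arch_compat(const struct cpu_spec_sketch *spec,
			     uint32_t arch_compat)
{
	assert(spec != NULL);
	if (arch_compat == SKETCH_PVR_ARCH_31_P11 && host_is_p10_compat(spec))
		return -EINVAL;
	return 0;
}
```

The explicit check is needed only because both generations map to the same
PCR bit, so the later guest-versus-host PCR comparison cannot tell them
apart.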

Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61dbeea317f3..249d1f2e4e2c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -446,7 +446,19 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 			guest_pcr_bit = PCR_ARCH_300;
 			break;
 		case PVR_ARCH_31:
+			guest_pcr_bit = PCR_ARCH_31;
+			break;
 		case PVR_ARCH_31_P11:
+			/*
+			 * Need to check this for ISA 3.1, as Power10 and
+			 * Power11 share the same PCR. For any subsequent ISA
+			 * versions, this will be taken care of by the guest vs
+			 * host PCR comparison below.
+			 */
+			if ((PVR_ARCH_31 & cur_cpu_spec->pvr_mask) ==
+				cur_cpu_spec->pvr_value) {
+				return -EINVAL;
+			}
 			guest_pcr_bit = PCR_ARCH_31;
 			break;
 		default:
-- 
2.50.1 (Apple Git-155)



* [PATCH v3 2/5] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
In-Reply-To: <20260522152744.55251-1-amachhiw@linux.ibm.com>

Introduce a new capability and ioctl to expose CPU compatibility modes
supported by the host processor for nested guests.

On IBM POWER systems, newer processor generations (N) can operate in
compatibility modes corresponding to earlier generations, like (N-1) and
(N-2). This is particularly relevant for nested virtualization, where
nested KVM guests may need to run with a specific processor compatibility
level.

Introduce KVM_CAP_PPC_COMPAT_CAPS capability and the corresponding
KVM_PPC_GET_COMPAT_CAPS vm ioctl. The ioctl returns a bitmap describing
the compatibility modes supported by the host in respective bit numbers,
allowing userspace (e.g., QEMU) to select an appropriate compatibility
level when configuring nested KVM guests.

The ioctl handling is added in kvm_arch_vm_ioctl() and retrieves host
CPU compatibility capabilities via a PowerPC-specific backend
implementation when available. If no backend implementation is
available, the ioctl fails with -ENOTTY, allowing userspace to fall
back gracefully.

Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  1 +
 arch/powerpc/include/uapi/asm/kvm.h |  6 ++++++
 arch/powerpc/kvm/powerpc.c          | 21 +++++++++++++++++++++
 include/uapi/linux/kvm.h            |  4 ++++
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0953f2daa466..cadfb839e836 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -319,6 +319,7 @@ struct kvmppc_ops {
 	bool (*hash_v3_possible)(void);
 	int (*create_vm_debugfs)(struct kvm *kvm);
 	int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
+	int (*get_compat_cpu_ver)(struct kvm_ppc_compat_caps *host_caps);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 077c5437f521..081d6c7f7f70 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -437,6 +437,12 @@ struct kvm_ppc_cpu_char {
 	__u64	behaviour_mask;		/* valid bits in behaviour */
 };
 
+/* For KVM_PPC_GET_COMPAT_CAPS */
+struct kvm_ppc_compat_caps {
+	__u64	flags;			/* Reserved for future use */
+	__u64	compat_capabilities;	/* Capabilities supported by the host */
+};
+
 /*
  * Values for character and character_mask.
  * These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 00302399fc37..02b834ebd8d3 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -697,6 +697,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 			}
 		}
 		break;
+#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+	case KVM_CAP_PPC_COMPAT_CAPS:
+		r = 0;
+		if (kvmhv_on_pseries())
+			r = 1;
+		break;
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 	default:
 		r = 0;
 		break;
@@ -2463,6 +2470,20 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		r = kvm->arch.kvm_ops->svm_off(kvm);
 		break;
 	}
+	case KVM_PPC_GET_COMPAT_CAPS: {
+		struct kvm_ppc_compat_caps host_caps;
+
+		r = -ENOTTY;
+		memset(&host_caps, 0, sizeof(host_caps));
+		if (!kvm->arch.kvm_ops->get_compat_cpu_ver)
+			goto out;
+
+		r = kvm->arch.kvm_ops->get_compat_cpu_ver(&host_caps);
+		if (!r && copy_to_user(argp, &host_caps,
+				     sizeof(host_caps)))
+			r = -EFAULT;
+		break;
+	}
 	default: {
 		struct kvm *kvm = filp->private_data;
 		r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6c8afa2047bf..1788a0068662 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_S390_USER_OPEREXEC 246
 #define KVM_CAP_S390_KEYOP 247
 #define KVM_CAP_S390_VSIE_ESAMODE 248
+#define KVM_CAP_PPC_COMPAT_CAPS 249
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1349,6 +1350,9 @@ struct kvm_s390_keyop {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+/* Available with KVM_CAP_PPC_COMPAT_CAPS */
+#define KVM_PPC_GET_COMPAT_CAPS	_IOR(KVMIO,  0xe4, struct kvm_ppc_compat_caps)
+
 /*
  * ioctls for vcpu fds
  */
-- 
2.50.1 (Apple Git-155)



* [PATCH v3 4/5] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
In-Reply-To: <20260522152744.55251-1-amachhiw@linux.ibm.com>

Currently, when booting a compatibility-mode KVM guest (L1) on a PowerNV
hypervisor (L0), the guest runs with the expected processor
compatibility level. However, when booting a nested KVM guest (L2)
inside the L1, QEMU derives the CPU model from the raw host PVR and
attempts to run the nested guest at that level, instead of honoring the
compatibility mode of the L1.

Extend host CPU compatibility capability reporting to support nested
virtualization on PowerNV systems (PAPR nested API v1).

For nested API v2 (PowerVM), compatibility capabilities are obtained
from the hypervisor via the H_GUEST_GET_CAPABILITIES hcall. This
information is not available on PowerNV systems.

For nested API v1, derive the compatibility capabilities from the L1
guest by reading the "cpu-version" property from the device tree, which
reflects the effective (logical) processor compatibility level. Map this
value to the corresponding compatibility capability bitmap.

Introduce a helper to translate CPU version values into compatibility
capability bits and integrate it into kvmppc_get_compat_cpu_caps().
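
For illustration only (not part of this patch), the mapping can be
sketched as a standalone C function. The numeric values of the PVR_ARCH_*
and H_GUEST_CAP_* constants below are assumptions made for this sketch
(logical PVR encodings and IBM MSB-0 bit numbering); the authoritative
definitions are the kernel's own headers. map_compat_capabilities() is a
hypothetical userspace stand-in for the in-kernel helper.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed logical PVR values; check the kernel's asm/reg.h. */
#define PVR_ARCH_300		0x0f000005u
#define PVR_ARCH_31		0x0f000006u
#define PVR_ARCH_31_P11		0x0f000007u

/* Assumed capability bits (IBM MSB-0 numbering); check asm/hvcall.h. */
#define H_GUEST_CAP_POWER9	(1ULL << (63 - 1))
#define H_GUEST_CAP_POWER10	(1ULL << (63 - 2))
#define H_GUEST_CAP_POWER11	(1ULL << (63 - 3))

/*
 * Translate a CPU-endian "cpu-version" value into a capability bit.
 * On success the bit is OR-ed into *capabilities and 0 is returned;
 * an unknown version leaves *capabilities untouched and returns -EINVAL.
 */
int map_compat_capabilities(uint32_t cpu_version, uint64_t *capabilities)
{
	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*capabilities |= H_GUEST_CAP_POWER11;
		break;
	case PVR_ARCH_31:
		*capabilities |= H_GUEST_CAP_POWER10;
		break;
	case PVR_ARCH_300:
		*capabilities |= H_GUEST_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

The value passed in is expected to have been converted from the
big-endian device tree encoding already, as the caller does with
be32_to_cpup().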

This allows userspace to query host CPU compatibility modes on both
PowerVM and PowerNV platforms via the KVM_PPC_GET_COMPAT_CAPS ioctl.

Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 37 +++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 38de7040e2b7..18774c49af85 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6522,15 +6522,50 @@ static bool kvmppc_hash_v3_possible(void)
 	return true;
 }
 
+static int kvmppc_map_compat_capabilities(const u32 cpu_version,
+				      unsigned long *capabilities)
+{
+	switch (cpu_version) {
+	case PVR_ARCH_31_P11:
+		*capabilities |= H_GUEST_CAP_POWER11;
+		break;
+	case PVR_ARCH_31:
+		*capabilities |= H_GUEST_CAP_POWER10;
+		break;
+	case PVR_ARCH_300:
+		*capabilities |= H_GUEST_CAP_POWER9;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
 
 static int kvmppc_get_compat_cpu_caps(struct kvm_ppc_compat_caps *host_caps)
 {
+	struct device_node *np;
 	unsigned long capabilities = 0;
+	const __be32 *prop = NULL;
 	long rc = -EINVAL;
+	u32 cpu_version;
 
 	if (kvmhv_on_pseries()) {
-		if (kvmhv_is_nestedv2())
+		if (kvmhv_is_nestedv2()) {
 			rc = plpar_guest_get_capabilities(0, &capabilities);
+		} else {
+			for_each_node_by_type(np, "cpu") {
+				prop = of_get_property(np, "cpu-version", NULL);
+				if (prop) {
+					cpu_version = be32_to_cpup(prop);
+					break;
+				}
+			}
+			if (!prop)
+				return -EINVAL;
+			rc = kvmppc_map_compat_capabilities(cpu_version,
+								&capabilities);
+		}
 		host_caps->compat_capabilities = capabilities;
 	}
 
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH v3 5/5] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
From: Amit Machhiwal @ 2026-05-22 15:27 UTC (permalink / raw)
  To: linuxppc-dev, Madhavan Srinivasan
  Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
In-Reply-To: <20260522152744.55251-1-amachhiw@linux.ibm.com>

Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
documentation.

The ioctl exposes host processor compatibility modes supported for
nested KVM guests on PowerPC systems.
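
As a rough usage sketch (not part of this patch), userspace could query
the capabilities along the following lines. The struct layout, ioctl
number and capability number are copied from this series and are defined
locally only so the sketch builds against headers that lack them;
query_compat_caps() is a hypothetical helper name.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Local copies of the uAPI additions proposed by this series. On a system
 * whose headers already carry them, include <linux/kvm.h> and drop these.
 */
struct kvm_ppc_compat_caps {
	uint64_t flags;			/* Reserved for future use */
	uint64_t compat_capabilities;	/* Capabilities supported by the host */
};

#define KVMIO			0xAE
#define KVM_CHECK_EXTENSION	_IO(KVMIO, 0x03)
#define KVM_CAP_PPC_COMPAT_CAPS	249
#define KVM_PPC_GET_COMPAT_CAPS	_IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps)

/*
 * Query the compat capability bitmap through a VM file descriptor.
 * Returns 0 and stores the bitmap in *caps on success. Returns -1 if the
 * capability is not advertised or either ioctl fails; *caps is then left
 * unchanged.
 */
int query_compat_caps(int vm_fd, uint64_t *caps)
{
	struct kvm_ppc_compat_caps host_caps = { 0 };

	/* The ioctl is only available with KVM_CAP_PPC_COMPAT_CAPS. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_COMPAT_CAPS) <= 0)
		return -1;

	if (ioctl(vm_fd, KVM_PPC_GET_COMPAT_CAPS, &host_caps) < 0)
		return -1;

	*caps = host_caps.compat_capabilities;
	return 0;
}
```

A caller would then test the returned bitmap against the
H_GUEST_CAP_POWER* bits before choosing a CPU model for the nested guest.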

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605140717.W1StD3Ke-lkp@intel.com/
Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 52bbbb553ce1..d11e054e6665 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6555,6 +6555,41 @@ KVM_S390_KEYOP_SSKE
 
 .. _kvm_run:
 
+4.145 KVM_PPC_GET_COMPAT_CAPS
+-----------------------------
+:Capability: KVM_CAP_PPC_COMPAT_CAPS
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: struct kvm_ppc_compat_caps (out)
+:Returns:
+	0 on successful completion,
+	-EFAULT if ``struct kvm_ppc_compat_caps`` cannot be written
+
+IBM POWER system server-based processors provide a compatibility mode feature
+where an Nth generation processor can operate in modes consistent with earlier
+generations such as (N-1) and (N-2).
+
+This ioctl provides userspace with information about the CPU compatibility modes
+supported by the current host processor for booting nested KVM guests on
+PowerNV (KVM nested APIv1) and PowerVM (KVM nested APIv2) platforms.
+
+::
+
+  struct kvm_ppc_compat_caps {
+	__u64	flags;			/* Reserved for future use */
+	__u64	compat_capabilities;	/* Capabilities supported by the host */
+  };
+
+The ``compat_capabilities`` bit field describes the processor compatibility
+modes supported by the host. For example, the following bits indicate support
+for specific processor modes.
+
+::
+
+  H_GUEST_CAP_POWER9  (bit 1): KVM guests can run in Power9 processor mode
+  H_GUEST_CAP_POWER10 (bit 2): KVM guests can run in Power10 processor mode
+  H_GUEST_CAP_POWER11 (bit 3): KVM guests can run in Power11 processor mode
+
 5. The kvm_run structure
 ========================
 
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* Re: [PATCH v2] cpufreq: Documentation: fix sampling_down_factor range
From: Rafael J. Wysocki @ 2026-05-22 15:33 UTC (permalink / raw)
  To: Pengjie Zhang
  Cc: rafael, viresh.kumar, corbet, skhan, zhongqiu.han, linux-pm,
	linux-doc, zhanjie9, zhenglifeng1, lihuisong, yubowen8, linhongye,
	linuxarm, wangzhi12
In-Reply-To: <20260518133457.2408463-1-zhangpengjie2@huawei.com>

On Mon, May 18, 2026 at 3:35 PM Pengjie Zhang <zhangpengjie2@huawei.com> wrote:
>
> The ondemand governor implementation accepts sampling_down_factor values
> from 1 to 100000 via MAX_SAMPLING_DOWN_FACTOR, but the documentation in
> admin-guide/pm/cpufreq.rst still says the valid range is 1 to 100.
>
> Update the documentation to match the actual code.
>
> Fixes: 2a0e49279850 ("cpufreq: User/admin documentation update and consolidation")
> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
> Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com>
> ---
> Changes in v2:
> - Modify the title.
> - Add Reviewed-by tag.
> Link to v1:https://lore.kernel.org/all/20260515094930.273599-1-zhangpengjie2@huawei.com/
> ---
>  Documentation/admin-guide/pm/cpufreq.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index dbe6d23a5d67..fdca59c955dc 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -516,7 +516,7 @@ This governor exposes the following tunables:
>         of those tasks above 0 and set this attribute to 1.
>
>  ``sampling_down_factor``
> -       Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
> +       Temporary multiplier, between 1 (default) and 100000 inclusive, to apply to
>         the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
>
>         This causes the next execution of the governor's worker routine (after
> --

Applied as 7.1-rc material, thanks!

^ permalink raw reply

* Re: [PATCH net-next v3 01/14] virtchnl: create 'include/linux/intel' and move necessary header files
From: Jakub Kicinski @ 2026-05-22 15:40 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Larysa Zaremba, Tony Nguyen, davem, pabeni, edumazet,
	andrew+netdev, netdev, przemyslaw.kitszel, sridhar.samudrala,
	anjali.singhai, michal.swiatkowski, maciej.fijalkowski,
	emil.s.tantilov, madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, tatyana.e.nikolova, krzysztof.czurylo, jgg, leon,
	linux-rdma, Samuel Salin, Aleksandr Loktionov
In-Reply-To: <5426379b-1201-4707-8d18-21dca3d1424e@intel.com>

On Fri, 22 May 2026 13:08:08 +0200 Alexander Lobakin wrote:
> >> There are at least
> >>
> >> include/linux/mlx4, include/linux/mlx5 and include/linux/bnxt.
> >>
> >> Those are per-driver and not per-vendor, but intel ethernet has too many drivers 
> >> to have separate folders for them.
> >>
> >> I just do not think this creates a precedent necessarily.
> > 
> > You just said the other ones are for specific drivers.  
> 
> Right, but according to your earlier suggestion they belong to
> include/net, not include/linux.
> 
> My understanding is that they're under include/linux, not include/net as
> mlx5 is not only about Ethernet, but also RDMA etc. The same applies to
> Intel's headers.
> 
> What's your position after all this? Still include/net/intel? This
> commit is about no longer scattering Intel headers all over include/linux
> and setting up one place for them.

I strongly dislike the idea there are "intel" headers. Header files
are not sorted by vendors. That gives off way too much "Intel's corner
of the kernel" vibe. "net+Intel" is fine, but Intel by itself is too
broad.

So IDK. include/net/intel is fine. So is the current layout. Or stick 
to driver / module by module like other vendors.

> >> Folder structure is for you to decide as a maintainer, but it would be nice to 
> >> have known about such doubts earlier.  
> > 
> > I'd love to know if you have any suggestions for improving the process.
> > Otherwise please keep your venting off list.  
> 
> I think Larysa just wanted to say that you disliked this commit after
> the series went through several iterations on IWL and 3 iterations here,
> nothing more. It's not about the overall process.

Intel has a strongly negative reviewer score right now.
IMHO it's not appropriate for y'all to complain about upstream
reviews, or how long it takes to get your patches merged...

^ permalink raw reply

* Re: [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions
From: Shakeel Butt @ 2026-05-22 15:53 UTC (permalink / raw)
  To: Eric Chanudet
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
	Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan,
	cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
	Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
	linux-doc
In-Reply-To: <20260519-cgroup-dmem-memcg-double-charge-v2-1-db4d1407062b@redhat.com>

On Tue, May 19, 2026 at 11:59:01AM -0400, Eric Chanudet wrote:
> Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow
> dmem pool allocations to optionally be double-charged against the memory
> controller. Take the struct cgroup from the dmem pool's css as there is
> no convenient object exported to represent these allocations. These will
> resolve the effective memory css from that cgroup and perform the
> charge.
> 
> Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's
> dmem charge visible.
> 
> Signed-off-by: Eric Chanudet <echanude@redhat.com>
> ---
>  include/linux/memcontrol.h | 16 ++++++++++++
>  mm/memcontrol.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum memcg_stat_item {
>  	MEMCG_ZSWAP_B,
>  	MEMCG_ZSWAPPED,
>  	MEMCG_ZSWAP_INCOMP,
> +	MEMCG_DMEM,
>  	MEMCG_NR_STAT,
>  };
>  
> @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
>  }
>  #endif
>  
> +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM)
> +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages,
> +			    gfp_t gfp_mask);
> +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages);
> +#else
> +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp,
> +					  unsigned int nr_pages, gfp_t gfp_mask)

Please follow Johannes's request to pass the actual memory object instead of
naked numbers.


^ permalink raw reply

* Re: [PATCH v5 04/28] mtd: spi-nor: swp: Improve locking user experience
From: Miquel Raynal @ 2026-05-22 15:55 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: Pratyush Yadav, Michael Walle, Takahiro Kuwano,
	Richard Weinberger, Vignesh Raghavendra, Jonathan Corbet,
	Shuah Khan, Sean Anderson, Thomas Petazzoni, Steam Lin, linux-mtd,
	linux-kernel, linux-doc, stable
In-Reply-To: <9432f07f-3724-4257-b6ab-84721e619f78@linaro.org>

On 22/05/2026 at 12:10:45 +03, Tudor Ambarus <tudor.ambarus@linaro.org> wrote:

> On 5/7/26 7:46 PM, Miquel Raynal wrote:
>> Fixes: 3dd8012a8eeb ("mtd: spi-nor: add TB (Top/Bottom) protect support")
>> Cc: stable@kernel.org
> Fixes shall be the first patches in the set.

Technically speaking, all of the first four patches are fixes, except I
don't ask for the first one to be backported. The reason why we ask for
fixes to be first in the series is that we want them to be as independent
as possible from previous cleanups/changes. Here each of the first four
patches targets a completely different place and they should not interact
with each other. Anyway, I will re-shuffle the patches.

As for Sashiko's feedback, the AI raises the same point as our previous
discussion: the QE bit handling is really bad, and I am working on
improving this, in another series which waits for this one to land.

However, the other warning it raises is IMO wrong: mixed-mode chips
(either read or write working in quad mode, and the other in single
mode) should enable their QE bit anyway. Please raise a warning if you
think this assumption is wrong.

Thanks,
Miquèl

^ permalink raw reply

* Re: [PATCH v2 1/2] mm/memcontrol: add dmem charge/uncharge functions
From: Shakeel Butt @ 2026-05-22 15:55 UTC (permalink / raw)
  To: Eric Chanudet
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
	Tejun Heo, Michal Koutný, Jonathan Corbet, Shuah Khan,
	cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
	Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
	linux-doc
In-Reply-To: <ahB7pCu_G4vuswc0@linux.dev>

On Fri, May 22, 2026 at 08:53:10AM -0700, Shakeel Butt wrote:
> On Tue, May 19, 2026 at 11:59:01AM -0400, Eric Chanudet wrote:
> > Add mem_cgroup_dmem_charge() and mem_cgroup_dmem_uncharge() to allow
> > dmem pool allocations to optionally be double-charged against the memory
> > controller. Take the struct cgroup from the dmem pool's css as there is
> > no convenient object exported to represent these allocations. These will
> > resolve the effective memory css from that cgroup and perform the
> > charge.
> > 
> > Introduce a MEMCG_DMEM stat counter to memory.stat to make the cgroup's
> > dmem charge visible.
> > 
> > Signed-off-by: Eric Chanudet <echanude@redhat.com>
> > ---
> >  include/linux/memcontrol.h | 16 ++++++++++++
> >  mm/memcontrol.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 81 insertions(+)
> > 
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index dc3fa687759b45748b2acee6d7f43da325eb50c1..8e1d49b87fb64e6114f3eb920293e14920290fe7 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -39,6 +39,7 @@ enum memcg_stat_item {
> >  	MEMCG_ZSWAP_B,
> >  	MEMCG_ZSWAPPED,
> >  	MEMCG_ZSWAP_INCOMP,
> > +	MEMCG_DMEM,
> >  	MEMCG_NR_STAT,
> >  };
> >  
> > @@ -1872,6 +1873,21 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> >  }
> >  #endif
> >  
> > +#if defined(CONFIG_MEMCG) && defined(CONFIG_CGROUP_DMEM)
> > +bool mem_cgroup_dmem_charge(struct cgroup *cgrp, unsigned int nr_pages,
> > +			    gfp_t gfp_mask);
> > +void mem_cgroup_dmem_uncharge(struct cgroup *cgrp, unsigned int nr_pages);
> > +#else
> > +static inline bool mem_cgroup_dmem_charge(struct cgroup *cgrp,
> > +					  unsigned int nr_pages, gfp_t gfp_mask)
> 
> Please follow Johannes's request to pass the actual memory object instead of
> naked numbers.
> 

Also what exactly is the backing memory here? Is it system memory? If yes, then
you need to pass struct page. For non-system memory, I am not sure memcg is the
right place to charge such memory.

^ permalink raw reply

* Re: [PATCH v5 13/28] mtd: spi-nor: swp: Create a TB intermediate variable
From: Miquel Raynal @ 2026-05-22 16:06 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: Pratyush Yadav, Michael Walle, Takahiro Kuwano,
	Richard Weinberger, Vignesh Raghavendra, Jonathan Corbet,
	Shuah Khan, Sean Anderson, Thomas Petazzoni, Steam Lin, linux-mtd,
	linux-kernel, linux-doc
In-Reply-To: <a54562a0-1a75-401c-9508-8e0322d81a3f@linaro.org>


On 22/05/2026 at 12:39:48 +03, Tudor Ambarus <tudor.ambarus@linaro.org> wrote:

> On 5/7/26 7:46 PM, Miquel Raynal wrote:
>> Ease the future reuse of the tb (Top/Bottom) boolean by creating an
>> intermediate variable.
>
> Please squash this in the patch that needs it.

The problem with the CMP addition is that it touches functions all over
the place. I want people to be able to focus on the CMP addition, not
all the side changes which have nothing to do with the CMP addition
itself. Most of the preparation patches are just steps in that
direction. They could also be squashed, but overall they make the final
diff much simpler. I believe every small change making that last step a
little bit easier to read goes in the right direction?

^ permalink raw reply

* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Nico Pache @ 2026-05-22 16:08 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-doc, akpm, linux-kernel, linux-mm, linux-trace-kernel,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <ahByO_HWn6MB8z-u@lucifer>

On Fri, May 22, 2026 at 9:17 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Fri, May 22, 2026 at 09:07:29AM -0600, Nico Pache wrote:
> > Whoops I manually changed the cover letter subject to reflect that this
> > is on mm-hotfixes-unstable but never updated the others...
> >
> > Hopefully that is ok. Just a small mistake. Base commit is referenced here.
>
> It's not ok, this isn't suitable for a hotfix in any way shape or form?
>
> As you know, because we told you :) May has been difficult because of
> conferences, holidays (and in my case burnout recovery).
>
> And unfortunately the series seems to have needed quite a bit of review again
> (my suggestion to you would be to ensure you don't make major changes, only
> small incremental ones on the basis of review feedback).
>
> So this isn't viable for 7.2, and we'll have to target 7.3. Therefore there
> was no rush.
>
> Also please don't spring a respin on this series on us without discussion
> first, with people away and (frankly) the amount of work involved here,
> you're going to have to accept the pace that workload/availability permits.
>
> Adding spurious hotfixes tags doesn't help anything :) please don't do that
> again.

Hi,

Sorry for the confusion but Andrew and I spoke about this before I
sent it, and he confirmed that I should send it against this tree to
prevent merge conflicts.

Because Zi's series depends on this, and this is already in the mm
tree, choosing a candidate before my commits was best to prevent merge
conflicts.

The intent wasn't that this is a hotfix, just that this was the
closest base before the v17 that is already in the tree.

Sorry for the confusion, hopefully Andrew can still apply it to the
correct tree.

-- Nico

>
> Thanks, Lorenzo
>


^ permalink raw reply

* Re: [PATCH v5 04/28] mtd: spi-nor: swp: Improve locking user experience
From: Tudor Ambarus @ 2026-05-22 16:07 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Pratyush Yadav, Michael Walle, Takahiro Kuwano,
	Richard Weinberger, Vignesh Raghavendra, Jonathan Corbet,
	Shuah Khan, Sean Anderson, Thomas Petazzoni, Steam Lin, linux-mtd,
	linux-kernel, linux-doc, stable
In-Reply-To: <875x4fphgr.fsf@bootlin.com>



On 5/22/26 6:55 PM, Miquel Raynal wrote:
> On 22/05/2026 at 12:10:45 +03, Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> 
>> On 5/7/26 7:46 PM, Miquel Raynal wrote:
>>> Fixes: 3dd8012a8eeb ("mtd: spi-nor: add TB (Top/Bottom) protect support")
>>> Cc: stable@kernel.org
>> Fixes shall be the first patches in the set.
> 
> Technically speaking, all of the first four patches are fixes, except I
> don't ask for the first one to be backported. The reason why we ask for
> fixes to be first in the series is that we want them to be as independent
> as possible from previous cleanups/changes. Here each of the first four
> patches targets a completely different place and they should not interact
> with each other. Anyway, I will re-shuffle the patches.

you don't need to resend just for that I think. Pratyush or Michael can
re-shuffle when applying.

> 
> As for Sashiko's feedback, the AI raises the same point as our previous
> discussion: the QE bit handling is really bad, and I am working on

I forgot what we talked about, sorry.

> improving this, in another series which waits for this one to land.
> 
> However, the other warning it raises is IMO wrong: mixed-mode chips
> (either read or write working in quad mode, and the other in single

that's good to know, thanks. It assures people that the AI feedback was
considered.

> mode) should enable their QE bit anyway. Please raise a warning if you
> think this assumption is wrong.

Not sure if I'll be able to allocate time to review it. No blockers from
my side.

Cheers,
ta

^ permalink raw reply

* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Nico Pache @ 2026-05-22 16:11 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: linux-doc, akpm, linux-kernel, linux-mm, linux-trace-kernel,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <bd622950-62cf-4b57-b3ac-89635f28fa4f@kernel.org>

On Fri, May 22, 2026 at 9:13 AM Vlastimil Babka (SUSE)
<vbabka@kernel.org> wrote:
>
> On 5/22/26 17:07, Nico Pache wrote:
> > On Fri, May 22, 2026 at 8:59 AM Nico Pache <npache@redhat.com> wrote:
> >>  include/trace/events/huge_memory.h         |  34 +-
> >>  mm/huge_memory.c                           |  11 +
> >>  mm/khugepaged.c                            | 634 ++++++++++++++++-----
> >>  5 files changed, 584 insertions(+), 172 deletions(-)
> >>
> >>
> >> base-commit: 6c8cb505a5634594b3ea159fd1c71bce2acf3346
> >
> > Whoops I manually changed the cover letter subject to reflect that this
> > is on mm-hotfixes-unstable but never updated the others...
>
> But why? That branch is for hotfixes that would go to the current 7.1-rcX
> series. mm-unstable would be the correct one for this, AFAICT.

Sorry this was a misunderstanding. The goal here was to base this off
the closest base commit behind where my v17 already lies in the tree.

That just happened to be the hotfixes tree (previously it was
mm-unstable, but that seems to have moved).

Sorry...
-- Nico

>
> > Hopefully that is ok. Just a small mistake. Base commit is referenced here.
> >
> > -- Nico
> >
> >
> >> --
> >> 2.54.0
> >>
> >
>


^ permalink raw reply

