From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Heiko Carstens <hca@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 6.17-6.12] s390/mm: Use __GFP_ACCOUNT for user page table allocations
Date: Thu,  2 Oct 2025 11:29:59 -0400
Message-ID: <20251002153025.2209281-12-sashal@kernel.org>
In-Reply-To: <20251002153025.2209281-1-sashal@kernel.org>

From: Heiko Carstens <hca@linux.ibm.com>

[ Upstream commit 5671ce2a1fc6b4a16cff962423bc416b92cac3c8 ]

Add missing kmemcg accounting of user page table allocations.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Backport Analysis Complete

**Backport Status: YES**

### Extensive Analysis

#### 1. Code Changes Analysis

The commit makes three specific changes to `arch/s390/mm/pgalloc.c`:

**Change 1 - `crst_table_alloc()` (lines 17-27):**
- Switches from `GFP_KERNEL` to `GFP_KERNEL_ACCOUNT`
- Adds check: if `mm == &init_mm`, removes `__GFP_ACCOUNT` flag
- This ensures kernel page tables for init_mm are NOT accounted (correct
  behavior)

**Change 2 - `page_table_alloc_pgste()` (line 120):**
- Under `CONFIG_PGSTE` (KVM guest support)
- Changes `GFP_KERNEL` to `GFP_KERNEL_ACCOUNT`
- No init_mm check here (pgste tables are always for user processes)

**Change 3 - `page_table_alloc()` (lines 137-148):**
- Similar to Change 1: uses `GFP_KERNEL_ACCOUNT` with init_mm exception
- This is the main user page table allocation path
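
The flag handling shared by Changes 1 and 3 can be illustrated outside
the kernel. The following is a minimal userspace sketch, not kernel
code: `demo_gfp_t`, the `DEMO_GFP_*` values, `struct demo_mm`,
`demo_init_mm` and `demo_pgtable_gfp()` are hypothetical stand-ins
chosen for illustration, and the real `gfp_t` bit values differ.

```c
/* Hypothetical stand-ins for the kernel's gfp_t and GFP flag bits. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_KERNEL         0x1u
#define DEMO_GFP_ACCOUNT        0x2u
#define DEMO_GFP_KERNEL_ACCOUNT (DEMO_GFP_KERNEL | DEMO_GFP_ACCOUNT)

/* Hypothetical stand-in for struct mm_struct. */
struct demo_mm {
	int id;
};

/* Plays the role of init_mm, the kernel's own address space. */
static struct demo_mm demo_init_mm;

/*
 * Start from the accounted flag set and clear the accounting bit
 * when the page table belongs to the kernel's own mm.
 */
static demo_gfp_t demo_pgtable_gfp(const struct demo_mm *mm)
{
	demo_gfp_t gfp = DEMO_GFP_KERNEL_ACCOUNT;

	if (mm == &demo_init_mm)
		gfp &= ~DEMO_GFP_ACCOUNT;
	return gfp;
}
```

In this sketch, `demo_pgtable_gfp()` returns `DEMO_GFP_KERNEL_ACCOUNT`
for any mm other than `demo_init_mm` and plain `DEMO_GFP_KERNEL` for
`demo_init_mm`, which mirrors the shape of the check in the patch.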

#### 2. Historical Context

Through extensive kernel repository investigation, I found:

- **x86 got this in v4.10 (July 2016)** via commit 3e79ec7ddc33e by
  Vladimir Davydov
- **powerpc got this in v4.13 (May 2017)** via commits abd667be1502f and
  de3b87611dd1f
- **s390 is getting it NOW (September 2025)** - **9 years after x86!**

The original x86 commit message explains the rationale clearly:
> "Page tables can bite a relatively big chunk off system memory and
their allocations are easy to trigger from userspace, so they should be
accounted to kmemcg."

The pattern established in commit 3e79ec7ddc33e is identical to what
s390 implements: use `GFP_KERNEL_ACCOUNT` but clear `__GFP_ACCOUNT` for
init_mm because kernel page tables can be shared across cgroups.

#### 3. Impact of Missing Accounting

**Without this patch:**
- s390 systems running with memory cgroups cannot properly account page
  table memory
- Users can bypass memory limits by creating many page tables (fork
  bombs, etc.)
- OOM killer may make incorrect decisions due to unaccounted memory
- Memory accounting is incomplete and incorrect for containerized
  workloads

**With this patch:**
- Page tables are properly charged to the cgroup that allocates them
- Memory limits are enforced correctly
- OOM decisions are based on complete memory usage information
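
The difference between the two cases can be pictured with a toy model.
This is only a sketch under simplifying assumptions: `struct
demo_cgroup` and `demo_alloc()` are hypothetical names, and the real
memory controller attempts reclaim and may invoke the OOM killer
rather than simply failing the allocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a memory cgroup. */
struct demo_cgroup {
	size_t limit;   /* bytes the group is allowed to use */
	size_t charged; /* bytes charged to the group so far */
};

/*
 * Model an allocation made on behalf of a cgroup. Unaccounted
 * allocations always succeed and never show up in cg->charged;
 * accounted ones are refused once the limit would be exceeded.
 */
static bool demo_alloc(struct demo_cgroup *cg, size_t bytes, bool accounted)
{
	if (!accounted)
		return true; /* invisible to the limit */
	if (bytes > cg->limit - cg->charged)
		return false; /* would exceed the limit */
	cg->charged += bytes;
	return true;
}
```

With a limit of 8192 bytes, any number of unaccounted 4096-byte
allocations succeeds and leaves `charged` at 0, while only two
accounted allocations of that size fit before the third is refused.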

#### 4. Risk Assessment

**Regression Risk: VERY LOW**

- Change is architecture-specific (s390 only)
- Only modifies GFP flags in 3 functions
- Pattern proven by 9 years of use on x86 (since v4.10)
- Pattern proven by 8 years of use on powerpc (since v4.13)
- Code is straightforward and follows established kernel patterns

**Potential Side Effect:**
Workloads with high page table usage that previously stayed under their
memory limits might now reach them. However, this is **correct
behavior**: the accounting was missing before, so page table memory
escaped the limits.

#### 5. Backport Evidence

**Critical finding:** This commit has ALREADY been selected for backport
to stable 6.17:
- Found as commit dc70c002dd2df in linux-autosel-6.17-2 tree
- Contains "[ Upstream commit 5671ce2a1fc6b... ]" tag
- Signed-off-by: Sasha Levin <sashal@kernel.org>
- This indicates the AUTOSEL process identified it as backport-worthy

#### 6. Why This Should Be Backported

**Meets all stable tree criteria:**

1. ✅ **Fixes a bug:** Missing kmemcg accounting is a correctness bug
2. ✅ **Important for users:** Affects anyone using memory cgroups on
   s390 (containers, systemd, cloud)
3. ✅ **Small and contained:** Only 3 functions changed, all in one file
4. ✅ **Low regression risk:** Pattern proven for 9 years on other
   architectures
5. ✅ **No architectural changes:** Just GFP flag modifications
6. ✅ **Brings s390 to parity:** Other architectures have had this for
   years

**Comparison with similar backports:**
Neither the x86 (3e79ec7ddc33e) nor the powerpc (abd667be1502f) commit
carried an explicit `Cc: stable` tag. The s390 change nevertheless
addresses a clear correctness issue that affects real workloads using
cgroups.

#### 7. Specific Code Path Analysis

Looking at the modified functions:

- `crst_table_alloc()`: Called during page table upgrades (expanding
  the address space)
- `page_table_alloc_pgste()`: KVM guest page tables (CONFIG_PGSTE)
- `page_table_alloc()`: Main PTE table allocation path (called whenever
  a new PTE table is needed, not for every page mapping)

All three allocate page tables on behalf of user address spaces and
should therefore be accounted to cgroups for proper resource control.

### Conclusion

**YES, this commit should be backported** because:

1. It fixes missing functionality that has been standard on other
   architectures since 2016-2017
2. It affects correctness of memory accounting in production systems
   using cgroups
3. The change is minimal, well-understood, and low-risk
4. It has already been identified by AUTOSEL as worthy of backporting to
   6.17
5. s390 users running containers or using systemd (which uses cgroups
   extensively) need accurate memory accounting

The 9-year delay in adding this to s390 makes it even more important to
backport, as it brings s390 to parity with other architectures for a
fundamental memory management feature.

 arch/s390/mm/pgalloc.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index d2f6f1f6d2fcb..ad3e0f7f7fc1f 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -16,9 +16,13 @@
 
 unsigned long *crst_table_alloc(struct mm_struct *mm)
 {
-	struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
+	gfp_t gfp = GFP_KERNEL_ACCOUNT;
+	struct ptdesc *ptdesc;
 	unsigned long *table;
 
+	if (mm == &init_mm)
+		gfp &= ~__GFP_ACCOUNT;
+	ptdesc = pagetable_alloc(gfp, CRST_ALLOC_ORDER);
 	if (!ptdesc)
 		return NULL;
 	table = ptdesc_to_virt(ptdesc);
@@ -117,7 +121,7 @@ struct ptdesc *page_table_alloc_pgste(struct mm_struct *mm)
 	struct ptdesc *ptdesc;
 	u64 *table;
 
-	ptdesc = pagetable_alloc(GFP_KERNEL, 0);
+	ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, 0);
 	if (ptdesc) {
 		table = (u64 *)ptdesc_to_virt(ptdesc);
 		__arch_set_page_dat(table, 1);
@@ -136,10 +140,13 @@ void page_table_free_pgste(struct ptdesc *ptdesc)
 
 unsigned long *page_table_alloc(struct mm_struct *mm)
 {
+	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct ptdesc *ptdesc;
 	unsigned long *table;
 
-	ptdesc = pagetable_alloc(GFP_KERNEL, 0);
+	if (mm == &init_mm)
+		gfp &= ~__GFP_ACCOUNT;
+	ptdesc = pagetable_alloc(gfp, 0);
 	if (!ptdesc)
 		return NULL;
 	if (!pagetable_pte_ctor(mm, ptdesc)) {
-- 
2.51.0

