From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from lgeamrelo03.lge.com (lgeamrelo03.lge.com [156.147.51.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3125933AD9A for ; Tue, 21 Apr 2026 05:53:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=156.147.51.102 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776750822; cv=none; b=RULpBGtttx9OLSL67O+StTEUxV2nfzwfZYM4prn54HAXdgV5VpKquJNZe9xDqxOQtoiyp9iHwPIDtlqpKk+1AKFSC+fbTgiIEGSiv/IKBv5vX0ZwVxsnIHB3zY9XmMXRuSRwo8TEFuXFaVQhl993L9u8jj9PORQB+DzifqU0ljc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776750822; c=relaxed/simple; bh=lZMUhyqBQDIsfqkQfpeOoymyJIsRxrznpSLRc744C7g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Nz+wluCm5yFWOGiLzpuqK+IVQQv5KHt2ME8qgknO4ua6tafZL1s2A0Che9JntMv/hshmmxbzj3mFfEKoOav73LecRpggfpYb6wyayoNmArCugnwJI9tTGSIAIguFFYtVJE+DfqTLsvFuMD9oT9ExEfLRkEUw0VIdRNJ4PjN1TOU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=lge.com; spf=pass smtp.mailfrom=lge.com; arc=none smtp.client-ip=156.147.51.102 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=lge.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=lge.com Received: from unknown (HELO yjaykim-PowerEdge-T330.lge.net) (10.177.112.156) by 156.147.51.102 with ESMTP; 21 Apr 2026 14:53:32 +0900 X-Original-SENDERIP: 10.177.112.156 X-Original-MAILFROM: youngjun.park@lge.com From: Youngjun Park To: akpm@linux-foundation.org Cc: chrisl@kernel.org, youngjun.park@lge.com, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org, gunho.lee@lge.com, taejoon.song@lge.com, hyungjun.cho@lge.com, mkoutny@suse.com, baver.bae@lge.com, matia.kim@lge.com Subject: [PATCH v6 3/4] mm: memcontrol: add interfaces for swap tier selection Date: Tue, 21 Apr 2026 14:53:22 +0900 Message-Id: <20260421055323.940344-4-youngjun.park@lge.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260421055323.940344-1-youngjun.park@lge.com> References: <20260421055323.940344-1-youngjun.park@lge.com> Precedence: bulk X-Mailing-List: cgroups@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Integrate swap tier infrastructure with cgroup to allow selecting specific swap devices per cgroup. Introduce memory.swap.tiers for configuring allowed tiers, and memory.swap.tiers.effective for exposing the effective tiers. The effective tiers are the intersection of the configured tiers and the parent's effective tiers. Note that cgroups do not pin swap tiers, similar to cpuset and CPU hotplug, allowing configuration changes regardless of usage. Signed-off-by: Youngjun Park --- Documentation/admin-guide/cgroup-v2.rst | 29 +++++++ include/linux/memcontrol.h | 5 ++ mm/memcontrol.c | 96 +++++++++++++++++++++ mm/swap_state.c | 5 +- mm/swap_tier.c | 108 +++++++++++++++++++++++- mm/swap_tier.h | 56 ++++++++++-- 6 files changed, 289 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 8ad0b2781317..823247f5c72c 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1850,6 +1850,35 @@ The following nested keys are defined. Swap usage hard limit. If a cgroup's swap usage reaches this limit, anonymous memory of the cgroup will not be swapped out. + memory.swap.tiers + A read-write file which exists on non-root cgroups. + Format is similar to cgroup.subtree_control. + + Controls which swap tiers this cgroup is allowed to swap + out to. All tiers are enabled by default. + + :: + + (-|+)TIER [(-|+)TIER ...] + + "-" disables a tier, "+" re-enables it. + Entries are whitespace-delimited. + + Changes here are combined with parent restrictions to + compute memory.swap.tiers.effective. + + If a tier is removed from /sys/kernel/mm/swap/tiers, + any prior disable for that tier is invalidated. + + memory.swap.tiers.effective + A read-only file which exists on non-root cgroups. + + Shows the tiers this cgroup can actually swap out to. + This is the intersection of the parent's effective tiers + and this cgroup's own memory.swap.tiers configuration. + A child cannot enable a tier that is disabled in its + parent. + memory.swap.events A read-only flat-keyed file which exists on non-root cgroups. The following entries are defined. Unless specified diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dc3fa687759b..4ce1de14a026 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -286,6 +286,11 @@ struct mem_cgroup { struct lru_gen_mm_list mm_list; #endif +#ifdef CONFIG_SWAP + int tier_mask; + int tier_effective_mask; +#endif + #ifdef CONFIG_MEMCG_V1 /* Legacy consumer-oriented counters */ struct page_counter kmem; /* v1 only */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c3d98ab41f1f..0f67572e5e3e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -68,6 +68,7 @@ #include #include "slab.h" #include "memcontrol-v1.h" +#include "swap_tier.h" #include @@ -4130,6 +4131,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) refcount_set(&memcg->id.ref, 1); css_get(css); + swap_tiers_memcg_inherit_mask(memcg); + /* * Ensure mem_cgroup_from_private_id() works once we're fully online. * @@ -5667,6 +5670,88 @@ static int swap_events_show(struct seq_file *m, void *v) return 0; } +static int swap_tier_show(struct seq_file *m, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + swap_tiers_mask_show(m, READ_ONCE(memcg->tier_mask)); + return 0; +} + +static ssize_t swap_tier_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + char *pos, *token; + int ret = 0; + int original_mask = 0; + + pos = strstrip(buf); + + spin_lock(&swap_tier_lock); + if (!*pos) { + WRITE_ONCE(memcg->tier_mask, TIER_ALL_MASK); + goto sync; + } + + original_mask = memcg->tier_mask; + + while ((token = strsep(&pos, " \t\n")) != NULL) { + int mask; + + if (!*token) + continue; + + if (token[0] != '-' && token[0] != '+') { + ret = -EINVAL; + goto err; + } + + mask = swap_tiers_mask_lookup(token+1); + if (!mask) { + ret = -EINVAL; + goto err; + } + + /* + * if child already set, cannot add that tiers for hierarch mismatching. + * parent compatible, child must respect parent selected swap device. + */ + switch (token[0]) { + case '-': + WRITE_ONCE(memcg->tier_mask, + memcg->tier_mask & ~mask); + break; + case '+': + WRITE_ONCE(memcg->tier_mask, + memcg->tier_mask | mask); + break; + default: + ret = -EINVAL; + break; + } + + if (ret) + goto err; + } + +sync: + swap_tiers_memcg_sync_mask(memcg); +err: + if (ret) + WRITE_ONCE(memcg->tier_mask, original_mask); + spin_unlock(&swap_tier_lock); + return ret ? ret : nbytes; +} + +static int swap_tier_effective_show(struct seq_file *m, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + swap_tiers_mask_show(m, READ_ONCE(memcg->tier_effective_mask)); + return 0; +} + static struct cftype swap_files[] = { { .name = "swap.current", @@ -5699,6 +5784,17 @@ static struct cftype swap_files[] = { .file_offset = offsetof(struct mem_cgroup, swap_events_file), .seq_show = swap_events_show, }, + { + .name = "swap.tiers", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = swap_tier_show, + .write = swap_tier_write, + }, + { + .name = "swap.tiers.effective", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = swap_tier_effective_show, + }, { } /* terminate */ }; diff --git a/mm/swap_state.c b/mm/swap_state.c index 182e6d6c9c89..cc31c59473f7 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -938,6 +938,7 @@ static ssize_t tiers_store(struct kobject *kobj, char *p, *token, *name, *tmp; int ret = 0; short prio; + int mask = 0; tmp = kstrdup(buf, GFP_KERNEL); if (!tmp) @@ -970,7 +971,7 @@ static ssize_t tiers_store(struct kobject *kobj, goto restore; break; case '-': - ret = swap_tiers_remove(token + 1); + ret = swap_tiers_remove(token + 1, &mask); if (ret) goto restore; break; @@ -980,7 +981,7 @@ static ssize_t tiers_store(struct kobject *kobj, } } - if (!swap_tiers_update()) { + if (!swap_tiers_update(mask)) { ret = -EINVAL; goto restore; } diff --git a/mm/swap_tier.c b/mm/swap_tier.c index 9019b8978770..b53d6cc67d1e 100644 --- a/mm/swap_tier.c +++ b/mm/swap_tier.c @@ -253,7 +253,7 @@ int swap_tiers_add(const char *name, int prio) return ret; } -int swap_tiers_remove(const char *name) +int swap_tiers_remove(const char *name, int *mask) { int ret = 0; struct swap_tier *tier; @@ -276,6 +276,7 @@ int swap_tiers_remove(const char *name) list_prev_entry(tier, list)->prio = DEF_SWAP_PRIO; swap_tier_inactivate(tier); + *mask |= TIER_MASK(tier); return ret; } @@ -344,7 +345,36 @@ void swap_tiers_assign_dev(struct swap_info_struct *swp) swp->tier_mask = TIER_DEFAULT_MASK; } -bool swap_tiers_update(void) +#ifdef CONFIG_MEMCG +/* + * When a tier is removed, set its bit in every memcg's tier_mask and + * tier_effective_mask. This prevents stale tier indices from being + * silently filtered out if the same index is reused later. + */ +static void swap_tier_memcg_propagate(int mask) +{ + struct mem_cgroup *child; + + /* + * Writes are serialised by swap_tier_lock. Pair WRITE_ONCE here + * with READ_ONCE readers in swap_tier_show(), + * swap_tier_effective_show() and folio_tier_effective_mask(). + */ + rcu_read_lock(); + for_each_mem_cgroup_tree(child, root_mem_cgroup) { + WRITE_ONCE(child->tier_mask, child->tier_mask | mask); + WRITE_ONCE(child->tier_effective_mask, + child->tier_effective_mask | mask); + } + rcu_read_unlock(); +} +#else +static void swap_tier_memcg_propagate(int mask) +{ +} +#endif + +bool swap_tiers_update(int mask) { struct swap_tier *tier; struct swap_info_struct *swp; @@ -374,6 +404,80 @@ bool swap_tiers_update(void) break; swap_tiers_assign_dev(swp); } + /* + * XXX: Unused tiers default to ON, disabled after next tier added. + * Use removed tier mask to clear settings for removed/re-added tiers. + * (Could hold tier refs, but better to keep cgroup config independent) + */ + if (mask) + swap_tier_memcg_propagate(mask); return true; } + +#ifdef CONFIG_MEMCG +void swap_tiers_mask_show(struct seq_file *m, int mask) +{ + struct swap_tier *tier; + + spin_lock(&swap_tier_lock); + for_each_active_tier(tier) { + if (mask & TIER_MASK(tier)) + seq_printf(m, "%s ", tier->name); + } + spin_unlock(&swap_tier_lock); + seq_puts(m, "\n"); +} + +int swap_tiers_mask_lookup(const char *name) +{ + struct swap_tier *tier; + + lockdep_assert_held(&swap_tier_lock); + + for_each_active_tier(tier) { + if (!strcmp(name, tier->name)) + return TIER_MASK(tier); + } + + return 0; +} + +static void __swap_tier_memcg_inherit_mask(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + int parent_mask = parent + ? READ_ONCE(parent->tier_effective_mask) + : TIER_ALL_MASK; + + WRITE_ONCE(memcg->tier_effective_mask, + parent_mask & READ_ONCE(memcg->tier_mask)); +} + +/* Computes the initial effective mask from the parent's effective mask. */ +void swap_tiers_memcg_inherit_mask(struct mem_cgroup *memcg) +{ + spin_lock(&swap_tier_lock); + rcu_read_lock(); + memcg->tier_mask = TIER_ALL_MASK; + __swap_tier_memcg_inherit_mask(memcg, parent_mem_cgroup(memcg)); + rcu_read_unlock(); + spin_unlock(&swap_tier_lock); +} + +/* + * Called when a memcg's tier_mask is modified. Walks the subtree + * and recomputes each descendant's effective mask against its parent. + */ +void swap_tiers_memcg_sync_mask(struct mem_cgroup *memcg) +{ + struct mem_cgroup *child; + + lockdep_assert_held(&swap_tier_lock); + + rcu_read_lock(); + for_each_mem_cgroup_tree(child, memcg) + __swap_tier_memcg_inherit_mask(child, parent_mem_cgroup(child)); + rcu_read_unlock(); +} +#endif diff --git a/mm/swap_tier.h b/mm/swap_tier.h index 3e355f857363..49433dcaa1ce 100644 --- a/mm/swap_tier.h +++ b/mm/swap_tier.h @@ -10,22 +10,66 @@ struct swap_info_struct; extern spinlock_t swap_tier_lock; -#define TIER_ALL_MASK (~0) -#define TIER_DEFAULT_IDX (31) -#define TIER_DEFAULT_MASK (1U << TIER_DEFAULT_IDX) - /* Initialization and application */ void swap_tiers_init(void); ssize_t swap_tiers_sysfs_show(char *buf); int swap_tiers_add(const char *name, int prio); -int swap_tiers_remove(const char *name); +int swap_tiers_remove(const char *name, int *mask); void swap_tiers_snapshot(void); void swap_tiers_snapshot_restore(void); -bool swap_tiers_update(void); +bool swap_tiers_update(int mask); /* Tier assignment */ void swap_tiers_assign_dev(struct swap_info_struct *swp); +#define TIER_ALL_MASK (~0) +#define TIER_DEFAULT_IDX (31) +#define TIER_DEFAULT_MASK (1U << TIER_DEFAULT_IDX) + +#if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) +/* Memcg related functions */ +void swap_tiers_mask_show(struct seq_file *m, int mask); +void swap_tiers_memcg_inherit_mask(struct mem_cgroup *memcg); +void swap_tiers_memcg_sync_mask(struct mem_cgroup *memcg); +int swap_tiers_mask_lookup(const char *name); +static inline int folio_tier_effective_mask(struct folio *folio) +{ + struct mem_cgroup *memcg; + int mask = TIER_ALL_MASK; + + rcu_read_lock(); + memcg = folio_memcg(folio); + if (memcg) + mask = READ_ONCE(memcg->tier_effective_mask); + rcu_read_unlock(); + + return mask; +} +#else +static inline void swap_tiers_mask_show(struct seq_file *m, int mask) {} +static inline void swap_tiers_memcg_inherit_mask(struct mem_cgroup *memcg) {} +static inline void swap_tiers_memcg_sync_mask(struct mem_cgroup *memcg) {} +static inline int swap_tiers_mask_lookup(const char *name) +{ + return 0; +} +static inline int folio_tier_effective_mask(struct folio *folio) +{ + return TIER_ALL_MASK; +} +#endif + +/** + * swap_tiers_mask_test - Check if the tier mask is valid + * @tier_mask: The tier mask to check + * @mask: The mask to compare against + * + * Return: true if condition matches, false otherwise + */ +static inline bool swap_tiers_mask_test(int tier_mask, int mask) +{ + return tier_mask & mask; +} #endif /* _SWAP_TIER_H */ -- 2.34.1