From: Baoquan He <baoquan.he@linux.dev>
To: Youngjun Park <youngjun.park@lge.com>
Cc: akpm@linux-foundation.org, chrisl@kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
gunho.lee@lge.com, taejoon.song@lge.com, hyungjun.cho@lge.com,
mkoutny@suse.com, baver.bae@lge.com, matia.kim@lge.com
Subject: Re: [PATCH v6 4/4] mm: swap: filter swap allocation by memcg tier mask
Date: Wed, 27 May 2026 09:42:54 +0800
Message-ID: <ahZMHmMbhnNPspQj@MiWiFi-R3L-srv>
In-Reply-To: <20260421055323.940344-5-youngjun.park@lge.com>
On 04/21/26 at 02:53pm, Youngjun Park wrote:
> Apply memcg tier effective mask during swap slot allocation to
> enforce per-cgroup swap tier restrictions.
>
> In the fast path, check the percpu cached swap_info's tier_mask
> against the folio's effective mask. If it does not match, fall
> through to the slow path. In the slow path, skip swap devices
> whose tier_mask is not covered by the folio's effective mask.
>
> This works correctly when there is only one non-rotational
> device in the system and no devices share the same priority.
> However, there are known limitations:
>
> - When non-rotational devices are distributed across multiple
> tiers, and different memcgs are configured to use those
> distinct tiers, they may constantly overwrite the shared
> percpu swap cache. This cache thrashing leads to frequent
> fast path misses.
>
> - Combined with the above issue, if same-priority devices exist
> among them, a percpu cache miss (overwritten by another memcg)
> forces the allocator to round-robin to the next device
> prematurely, even if the current cluster is not fully
> exhausted.
>
> These edge cases do not affect the primary use case of
> directing swap traffic per cgroup. Further optimization is
> planned for future work.
>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
> mm/swapfile.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d5abc831cde7..8734e5d26b08 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1352,15 +1352,22 @@ static bool swap_alloc_fast(struct folio *folio)
> struct swap_cluster_info *ci;
> struct swap_info_struct *si;
> unsigned int offset;
> + int mask = folio_tier_effective_mask(folio);
>
> /*
> * Once allocated, swap_info_struct will never be completely freed,
> * so checking it's liveness by get_swap_device_info is enough.
> */
> si = this_cpu_read(percpu_swap_cluster.si[order]);
> + if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
> + !get_swap_device_info(si))
> + return false;
> +
> offset = this_cpu_read(percpu_swap_cluster.offset[order]);
> - if (!si || !offset || !get_swap_device_info(si))
> + if (!offset) {
> + put_swap_device(si);
> return false;
> + }
The whole patch looks good to me except for one nitpick. Would it be a
little cleaner with the tiny adjustment below?
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2864cd8c2da9..cdf453bf6b80 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1359,15 +1359,12 @@ static bool swap_alloc_fast(struct folio *folio)
* so checking it's liveness by get_swap_device_info is enough.
*/
si = this_cpu_read(percpu_swap_cluster.si[order]);
- if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
- !get_swap_device_info(si))
+ if (!si || !swap_tiers_mask_test(si->tier_mask, mask))
return false;
offset = this_cpu_read(percpu_swap_cluster.offset[order]);
- if (!offset) {
- put_swap_device(si);
+ if (!offset || !get_swap_device_info(si))
return false;
- }
ci = swap_cluster_lock(si, offset);
if (cluster_is_usable(ci, order)) {
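
To make the difference between the two orderings concrete, here is a
minimal userspace sketch, not kernel code. Everything named model_* or
fast_check_* is hypothetical, and the mask test is only an assumption
about what swap_tiers_mask_test() does (device allowed when all of its
tier bits are set in the folio's effective mask). It shows that taking
the reference last leaves nothing to undo on the !offset exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct swap_info_struct; not the kernel type. */
struct model_dev {
	unsigned int tier_mask;	/* tiers this device belongs to */
	int refs;		/* models the reference taken by get_swap_device_info() */
	bool live;		/* false models a device that is going away */
};

/* ASSUMPTION: allowed when every tier bit of the device is set in mask. */
static bool model_mask_test(unsigned int tier_mask, unsigned int mask)
{
	return (tier_mask & ~mask) == 0;
}

/* Models get_swap_device_info(): fails without taking a reference. */
static bool model_get(struct model_dev *d)
{
	if (!d->live)
		return false;
	d->refs++;
	return true;
}

/* Models put_swap_device(). */
static void model_put(struct model_dev *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/* Ordering in the posted patch: reference first, so !offset must drop it. */
static bool fast_check_v6(struct model_dev *d, unsigned int offset,
			  unsigned int mask)
{
	if (!d || !model_mask_test(d->tier_mask, mask) || !model_get(d))
		return false;
	if (!offset) {
		model_put(d);
		return false;
	}
	return true;	/* caller now holds one reference */
}

/* Suggested ordering: cheap checks first, reference taken last. */
static bool fast_check_suggested(struct model_dev *d, unsigned int offset,
				 unsigned int mask)
{
	if (!d || !model_mask_test(d->tier_mask, mask))
		return false;
	if (!offset || !model_get(d))
		return false;
	return true;	/* caller now holds one reference */
}
```

In this model both variants return the same result for every input and
hold a reference only when they return true.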
>
> ci = swap_cluster_lock(si, offset);
> if (cluster_is_usable(ci, order)) {
> @@ -1379,10 +1386,14 @@ static bool swap_alloc_fast(struct folio *folio)
> static void swap_alloc_slow(struct folio *folio)
> {
> struct swap_info_struct *si, *next;
> + int mask = folio_tier_effective_mask(folio);
>
> spin_lock(&swap_avail_lock);
> start_over:
> plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> + if (!swap_tiers_mask_test(si->tier_mask, mask))
> + continue;
> +
> /* Rotate the device and switch to a new cluster */
> plist_requeue(&si->avail_list, &swap_avail_head);
> spin_unlock(&swap_avail_lock);
> --
> 2.34.1
>
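
For reference, the slow-path filtering quoted above can be sketched in
userspace as follows. This is a simplified model under stated
assumptions, not the kernel implementation: struct model_swapdev,
model_tier_covered() and model_alloc_slow() are hypothetical names, the
plist rotation and swap_avail_lock are left out, and "covered" is
assumed to mean that all tier bits of the device are set in the folio's
effective mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; the array is assumed sorted by priority. */
struct model_swapdev {
	int prio;
	unsigned int tier_mask;		/* tiers this device belongs to */
	unsigned int free_slots;	/* remaining capacity in this model */
};

/* ASSUMPTION: a device is covered when it has no tier bit outside mask. */
static int model_tier_covered(unsigned int tier_mask, unsigned int mask)
{
	return (tier_mask & ~mask) == 0;
}

/*
 * Walk the devices in priority order and allocate one slot from the
 * first device that the mask allows and that still has room.
 * Returns the index of the device used, or -1 if none qualifies.
 */
static int model_alloc_slow(struct model_swapdev *devs, size_t n,
			    unsigned int mask)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* The filter the patch adds: skip devices outside the mask. */
		if (!model_tier_covered(devs[i].tier_mask, mask))
			continue;
		if (devs[i].free_slots == 0)
			continue;
		devs[i].free_slots--;
		return (int)i;
	}
	return -1;
}
```

A cgroup whose mask excludes the highest-priority device simply falls
through to the next covered one, which is the per-cgroup steering the
commit message describes.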