From: Chris Li <chrisl@kernel.org>
To: kasong@tencent.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Hugh Dickins <hughd@google.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Youngjun Park <youngjun.park@lge.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
Yosry Ahmed <yosry@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
Dev Jain <dev.jain@arm.com>, Lance Yang <lance.yang@linux.dev>,
Michal Hocko <mhocko@suse.com>, Michal Hocko <mhocko@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Axel Rasmussen <axelrasmussen@google.com>
Subject: Re: [PATCH v3 08/12] mm, swap: delay and unify memcg lookup and charging for swapin
Date: Fri, 8 May 2026 06:46:59 +0200 [thread overview]
Message-ID: <CACePvbUsKUBF=inQDRfcp-_RGiADobAkGDmeMuUZOAxi3v_SAg@mail.gmail.com> (raw)
In-Reply-To: <20260421-swap-table-p4-v3-8-2f23759a76bc@tencent.com>
On Tue, Apr 21, 2026 at 2:16 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Instead of checking the cgroup private ID during page table walk in
> swap_pte_batch(), move the memcg lookup into __swap_cache_add_check()
> under the cluster lock.
>
> The first pre-alloc check is speculative and skips the memcg check,
> since the post-alloc stable check ensures all slots covered by the
> folio belong to the same memcg. It is very rare for contiguous,
> aligned entries within a page table region of the same process or
> within the same shmem mapping to belong to different memcgs.
>
> This also prepares for recording the memcg info in the cluster's table.
> Also make the order check and fallback more compact.
>
> There should be no user-observable behavior change.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
> ---
> include/linux/memcontrol.h | 6 +++---
> mm/internal.h | 10 +---------
> mm/memcontrol.c | 10 ++++------
> mm/swap_state.c | 28 +++++++++++++++++++---------
> 4 files changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7d08128de1fd..a013f37f24aa 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -646,8 +646,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
>
> int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
>
> -int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> - gfp_t gfp, swp_entry_t entry);
> +int mem_cgroup_swapin_charge_folio(struct folio *folio, unsigned short id,
> + struct mm_struct *mm, gfp_t gfp);
>
> void __mem_cgroup_uncharge(struct folio *folio);
>
> @@ -1137,7 +1137,7 @@ static inline int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp)
> }
>
> static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
> - struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
> + unsigned short id, struct mm_struct *mm, gfp_t gfp)
> {
> return 0;
> }
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..9d2fec696bd6 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -451,24 +451,16 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> {
> pte_t expected_pte = pte_next_swp_offset(pte);
> const pte_t *end_ptep = start_ptep + max_nr;
> - const softleaf_t entry = softleaf_from_pte(pte);
> pte_t *ptep = start_ptep + 1;
> - unsigned short cgroup_id;
>
> VM_WARN_ON(max_nr < 1);
> - VM_WARN_ON(!softleaf_is_swap(entry));
> + VM_WARN_ON(!softleaf_is_swap(softleaf_from_pte(pte)));
>
> - cgroup_id = lookup_swap_cgroup_id(entry);
> while (ptep < end_ptep) {
> - softleaf_t entry;
> -
> pte = ptep_get(ptep);
>
> if (!pte_same(pte, expected_pte))
> break;
> - entry = softleaf_from_pte(pte);
> - if (lookup_swap_cgroup_id(entry) != cgroup_id)
> - break;
> expected_pte = pte_next_swp_offset(expected_pte);
> ptep++;
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c7df30ca5aa7..641706fa47bf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5062,27 +5062,25 @@ int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
>
> /**
> * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
> - * @folio: folio to charge.
> + * @folio: the folio to charge
> + * @id: memory cgroup id
> * @mm: mm context of the victim
> * @gfp: reclaim mode
> - * @entry: swap entry for which the folio is allocated
> *
> * This function charges a folio allocated for swapin. Please call this before
> * adding the folio to the swapcache.
> *
> * Returns 0 on success. Otherwise, an error code is returned.
> */
> -int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> - gfp_t gfp, swp_entry_t entry)
> +int mem_cgroup_swapin_charge_folio(struct folio *folio, unsigned short id,
> + struct mm_struct *mm, gfp_t gfp)
> {
> struct mem_cgroup *memcg;
> - unsigned short id;
> int ret;
>
> if (mem_cgroup_disabled())
> return 0;
>
> - id = lookup_swap_cgroup_id(entry);
> rcu_read_lock();
> memcg = mem_cgroup_from_private_id(id);
> if (!memcg || !css_tryget_online(&memcg->css))
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 12b290d43e45..86d517a33a55 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -142,16 +142,20 @@ void *swap_cache_get_shadow(swp_entry_t entry)
> * @ci: The locked swap cluster
> * @targ_entry: The target swap entry to check, will be rounded down by @nr
> * @nr: Number of slots to check, must be a power of 2
> - * @shadowp: Returns the shadow value if one exists in the range.
> + * @shadowp: Returns the shadow value if one exists in the range
> + * @memcg_id: Returns the memory cgroup id, NULL to ignore cgroup check
> *
> * Check if all slots covered by given range have a swap count >= 1.
> - * Retrieves the shadow if there is one.
> + * Retrieves the shadow if there is one. If @memcg_id is not NULL, also
> + * checks if all slots belong to the same cgroup and return the cgroup
> + * private id.
> *
> * Context: Caller must lock the cluster.
> */
> static int __swap_cache_add_check(struct swap_cluster_info *ci,
> swp_entry_t targ_entry,
> - unsigned long nr, void **shadowp)
> + unsigned long nr, void **shadowp,
> + unsigned short *memcg_id)
> {
> unsigned int ci_off, ci_end;
> unsigned long old_tb;
> @@ -169,19 +173,24 @@ static int __swap_cache_add_check(struct swap_cluster_info *ci,
> return -EEXIST;
> if (!__swp_tb_get_count(old_tb))
> return -ENOENT;
> - if (swp_tb_is_shadow(old_tb) && shadowp)
> + if (shadowp && swp_tb_is_shadow(old_tb))
> *shadowp = swp_tb_to_shadow(old_tb);
> + if (memcg_id)
> + *memcg_id = lookup_swap_cgroup_id(targ_entry);
Nitpick: Consider also using a local variable to store the memcg_id value here.
>
> if (nr == 1)
> return 0;
>
> + targ_entry.val = round_down(targ_entry.val, nr);
> ci_off = round_down(ci_off, nr);
> ci_end = ci_off + nr;
> do {
> old_tb = __swap_table_get(ci, ci_off);
> if (unlikely(swp_tb_is_folio(old_tb) ||
> - !__swp_tb_get_count(old_tb)))
> + !__swp_tb_get_count(old_tb) ||
> + (memcg_id && *memcg_id != lookup_swap_cgroup_id(targ_entry))))
Nitpick: You can then use that local variable here to avoid a memory fetch.
These are just micro-optimizations.
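
Something like this untested sketch is what I mean (the local variable
name "id" is my own, everything else follows your patch):

	unsigned short id = 0;	/* local cache of the cgroup private id */
	...
	if (shadowp && swp_tb_is_shadow(old_tb))
		*shadowp = swp_tb_to_shadow(old_tb);
	if (memcg_id) {
		/* Look up once, keep the value in a local for the loop. */
		id = lookup_swap_cgroup_id(targ_entry);
		*memcg_id = id;
	}

	if (nr == 1)
		return 0;

	targ_entry.val = round_down(targ_entry.val, nr);
	ci_off = round_down(ci_off, nr);
	ci_end = ci_off + nr;
	do {
		old_tb = __swap_table_get(ci, ci_off);
		/* Compare against the local instead of refetching *memcg_id. */
		if (unlikely(swp_tb_is_folio(old_tb) ||
			     !__swp_tb_get_count(old_tb) ||
			     (memcg_id && id != lookup_swap_cgroup_id(targ_entry))))
		...

The compiler may not be able to prove *memcg_id is unchanged across the
lookup_swap_cgroup_id() calls, so the local avoids a reload on each
iteration.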
Chris