From: Chris Li <chrisl@kernel.org>
To: kasong@tencent.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Hugh Dickins <hughd@google.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Youngjun Park <youngjun.park@lge.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
Yosry Ahmed <yosry@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
Dev Jain <dev.jain@arm.com>, Lance Yang <lance.yang@linux.dev>,
Michal Hocko <mhocko@suse.com>, Michal Hocko <mhocko@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Axel Rasmussen <axelrasmussen@google.com>
Subject: Re: [PATCH v3 08/12] mm, swap: delay and unify memcg lookup and charging for swapin
Date: Fri, 8 May 2026 06:46:59 +0200 [thread overview]
Message-ID: <CACePvbUsKUBF=inQDRfcp-_RGiADobAkGDmeMuUZOAxi3v_SAg@mail.gmail.com> (raw)
In-Reply-To: <20260421-swap-table-p4-v3-8-2f23759a76bc@tencent.com>
On Tue, Apr 21, 2026 at 2:16 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Instead of checking the cgroup private ID during page table walk in
> swap_pte_batch(), move the memcg lookup into __swap_cache_add_check()
> under the cluster lock.
>
> The first pre-alloc check is speculative and skips the memcg check,
> since the post-alloc stable check ensures all slots covered by the
> folio belong to the same memcg. It is very rare for contiguous,
> aligned entries within a page table region of the same process or
> within the same shmem mapping to belong to different memcgs.
>
> This also prepares for recording the memcg info in the cluster's table.
> Also make the order check and fallback more compact.
>
> There should be no user-observable behavior change.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
> ---
> include/linux/memcontrol.h | 6 +++---
> mm/internal.h | 10 +---------
> mm/memcontrol.c | 10 ++++------
> mm/swap_state.c | 28 +++++++++++++++++++---------
> 4 files changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7d08128de1fd..a013f37f24aa 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -646,8 +646,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
>
> int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
>
> -int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> - gfp_t gfp, swp_entry_t entry);
> +int mem_cgroup_swapin_charge_folio(struct folio *folio, unsigned short id,
> + struct mm_struct *mm, gfp_t gfp);
>
> void __mem_cgroup_uncharge(struct folio *folio);
>
> @@ -1137,7 +1137,7 @@ static inline int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp)
> }
>
> static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
> - struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
> + unsigned short id, struct mm_struct *mm, gfp_t gfp)
> {
> return 0;
> }
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..9d2fec696bd6 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -451,24 +451,16 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> {
> pte_t expected_pte = pte_next_swp_offset(pte);
> const pte_t *end_ptep = start_ptep + max_nr;
> - const softleaf_t entry = softleaf_from_pte(pte);
> pte_t *ptep = start_ptep + 1;
> - unsigned short cgroup_id;
>
> VM_WARN_ON(max_nr < 1);
> - VM_WARN_ON(!softleaf_is_swap(entry));
> + VM_WARN_ON(!softleaf_is_swap(softleaf_from_pte(pte)));
>
> - cgroup_id = lookup_swap_cgroup_id(entry);
> while (ptep < end_ptep) {
> - softleaf_t entry;
> -
> pte = ptep_get(ptep);
>
> if (!pte_same(pte, expected_pte))
> break;
> - entry = softleaf_from_pte(pte);
> - if (lookup_swap_cgroup_id(entry) != cgroup_id)
> - break;
> expected_pte = pte_next_swp_offset(expected_pte);
> ptep++;
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c7df30ca5aa7..641706fa47bf 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5062,27 +5062,25 @@ int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
>
> /**
> * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
> - * @folio: folio to charge.
> + * @folio: the folio to charge
> + * @id: memory cgroup id
> * @mm: mm context of the victim
> * @gfp: reclaim mode
> - * @entry: swap entry for which the folio is allocated
> *
> * This function charges a folio allocated for swapin. Please call this before
> * adding the folio to the swapcache.
> *
> * Returns 0 on success. Otherwise, an error code is returned.
> */
> -int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> - gfp_t gfp, swp_entry_t entry)
> +int mem_cgroup_swapin_charge_folio(struct folio *folio, unsigned short id,
> + struct mm_struct *mm, gfp_t gfp)
> {
> struct mem_cgroup *memcg;
> - unsigned short id;
> int ret;
>
> if (mem_cgroup_disabled())
> return 0;
>
> - id = lookup_swap_cgroup_id(entry);
> rcu_read_lock();
> memcg = mem_cgroup_from_private_id(id);
> if (!memcg || !css_tryget_online(&memcg->css))
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 12b290d43e45..86d517a33a55 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -142,16 +142,20 @@ void *swap_cache_get_shadow(swp_entry_t entry)
> * @ci: The locked swap cluster
> * @targ_entry: The target swap entry to check, will be rounded down by @nr
> * @nr: Number of slots to check, must be a power of 2
> - * @shadowp: Returns the shadow value if one exists in the range.
> + * @shadowp: Returns the shadow value if one exists in the range
> + * @memcg_id: Returns the memory cgroup id, NULL to ignore cgroup check
> *
> * Check if all slots covered by given range have a swap count >= 1.
> - * Retrieves the shadow if there is one.
> + * Retrieves the shadow if there is one. If @memcg_id is not NULL, also
> + * checks if all slots belong to the same cgroup and return the cgroup
> + * private id.
> *
> * Context: Caller must lock the cluster.
> */
> static int __swap_cache_add_check(struct swap_cluster_info *ci,
> swp_entry_t targ_entry,
> - unsigned long nr, void **shadowp)
> + unsigned long nr, void **shadowp,
> + unsigned short *memcg_id)
> {
> unsigned int ci_off, ci_end;
> unsigned long old_tb;
> @@ -169,19 +173,24 @@ static int __swap_cache_add_check(struct swap_cluster_info *ci,
> return -EEXIST;
> if (!__swp_tb_get_count(old_tb))
> return -ENOENT;
> - if (swp_tb_is_shadow(old_tb) && shadowp)
> + if (shadowp && swp_tb_is_shadow(old_tb))
> *shadowp = swp_tb_to_shadow(old_tb);
> + if (memcg_id)
> + *memcg_id = lookup_swap_cgroup_id(targ_entry);
Nitpick: Consider also using a local variable to store the memcg_id value here.
>
> if (nr == 1)
> return 0;
>
> + targ_entry.val = round_down(targ_entry.val, nr);
> ci_off = round_down(ci_off, nr);
> ci_end = ci_off + nr;
> do {
> old_tb = __swap_table_get(ci, ci_off);
> if (unlikely(swp_tb_is_folio(old_tb) ||
> - !__swp_tb_get_count(old_tb)))
> + !__swp_tb_get_count(old_tb) ||
> + (memcg_id && *memcg_id != lookup_swap_cgroup_id(targ_entry))))
Nitpick: You can then use that local variable here to avoid a memory fetch.
These are just micro-optimizations.
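
Something like this untested sketch is what I mean (the local variable
name "id" is my own, everything else follows your patch):

	unsigned short id = 0;	/* local cache of the cgroup private id */
	...
	if (shadowp && swp_tb_is_shadow(old_tb))
		*shadowp = swp_tb_to_shadow(old_tb);
	if (memcg_id) {
		/* Look up once, keep the value in a local for the loop. */
		id = lookup_swap_cgroup_id(targ_entry);
		*memcg_id = id;
	}

	if (nr == 1)
		return 0;

	targ_entry.val = round_down(targ_entry.val, nr);
	ci_off = round_down(ci_off, nr);
	ci_end = ci_off + nr;
	do {
		old_tb = __swap_table_get(ci, ci_off);
		/* Compare against the local instead of refetching *memcg_id. */
		if (unlikely(swp_tb_is_folio(old_tb) ||
			     !__swp_tb_get_count(old_tb) ||
			     (memcg_id && id != lookup_swap_cgroup_id(targ_entry))))
		...

The compiler may not be able to prove *memcg_id is unchanged across the
lookup_swap_cgroup_id() calls, so the local avoids a reload on each
iteration.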
Chris