Re: [PATCH v2 11/11] mm, swap: merge zeromap into swap table


From: YoungJun Park <youngjun.park@lge.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Barry Song <baohua@kernel.org>, Hugh Dickins <hughd@google.com>,
	Chris Li <chrisl@kernel.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	Yosry Ahmed <yosry@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
	Dev Jain <dev.jain@arm.com>, Lance Yang <lance.yang@linux.dev>,
	Michal Hocko <mhocko@suse.com>, Michal Hocko <mhocko@kernel.org>,
	Qi Zheng <qi.zheng@linux.dev>
Subject: Re: [PATCH v2 11/11] mm, swap: merge zeromap into swap table
Date: Sun, 19 Apr 2026 21:50:47 +0900
Message-ID: <aeTPpwblIa6LoLk0@yjaykim-PowerEdge-T330>
In-Reply-To: <CAMgjq7CwaKWP_2yxorK88ZLZ-hRRpaLVm76jJE5mLNcX5eCg=w@mail.gmail.com>

On Sat, Apr 18, 2026 at 09:34:35PM +0800, Kairui Song wrote:
> On Sat, Apr 18, 2026 at 8:28 PM YoungJun Park <youngjun.park@lge.com> wrote:
> >
> > On Fri, Apr 17, 2026 at 02:34:41AM +0800, Kairui Song via B4 Relay wrote:
> >
> > >   *
> > >   * Usages:
> > >   *
> > > @@ -74,17 +76,22 @@ struct swap_memcg_table {
> > >  #define SWP_TB_PFN_MARK_BITS 2
> > >  #define SWP_TB_PFN_MARK_MASK (BIT(SWP_TB_PFN_MARK_BITS) - 1)
> > >
> > > -/* SWAP_COUNT part for PFN or shadow, the width can be shrunk or extended */
> > > -#define SWP_TB_COUNT_BITS      min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +/* SWAP_COUNT and flags for PFN or shadow, width can be shrunk or extended */
> > > +#define SWP_TB_FLAGS_BITS    min(5, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +#define SWP_TB_COUNT_BITS    (SWP_TB_FLAGS_BITS - 1)
> >
> > Hi Kairui :)
> >
> > Would this break the build on 32-bit arches with 40-bit phys
> > addrs (MAX_POSSIBLE_PHYSMEM_BITS = 40)?
> >
> > Architectures I checked.
> >   - ARM LPAE   (CONFIG_ARM_LPAE=y)
> >   - ARC PAE40  (CONFIG_ARC_HAS_PAE40=y)
> >   - MIPS XPA   (CONFIG_XPA=y)
> >
> > Calculations.
> >
> >   SWP_TB_PFN_BITS   = 28 + 2 = 30
> >   SWP_TB_FLAGS_BITS = min(5, 32 - 30) = 2
> >   SWP_TB_COUNT_BITS = 2 - 1 = 1
> >
> > The BUILD_BUG_ON looks like the real problem: it needs at
> > least 3 count values (free/used/overflow).
> >
> >   BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
> >
> > Confirmed with a cross build (multi_v7_defconfig + lpae.config).
> >
> >   error: BUILD_BUG_ON failed: SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2
> >     at __count_to_swp_tb (mm/swap_table.h:227)
> 
> Hi YoungJun
> 
> Nice catch! Thanks a lot :)
> 
> > I think the right fix is widening swap_tb to 64 bits
> > unconditionally (atomic64_t).
> 
> I'm a bit concerned that memory usage on 32 bits will bloat up...
> 
> >
> > (Or, uglier, these arches could always route counts through the
> > extend table.)
> >
> 
> Seems not ugly with a ci->zero_bitmap, looks clean to me, the
> definition will be:
> 
> SWP_TABLE_USE_INLINE_ZEROMAP is true when BITS_PER_LONG is not enough
> for SWP_TB_FLAGS_BITS, then:
> 
> struct swap_cluster_info {
> ...
> #ifdef SWP_TABLE_USE_INLINE_ZEROMAP
>         unsigned long *zero_bitmap;
> #endif
>         ...
> };
> 
> And helpers will be:
> static inline void __swap_table_set_zero(struct swap_cluster_info *ci,
>           unsigned int ci_off)
> {
> #ifdef SWP_TABLE_USE_INLINE_ZEROMAP
>         /* Entry has no spare bit: track zero pages in the cluster bitmap. */
>         bitmap_set(ci->zero_bitmap, ci_off, 1);
> #else
>         unsigned long swp_tb;
> 
>         /* Entry has a spare flag bit: mark the zero page in the table. */
>         swp_tb = __swap_table_get(ci, ci_off);
>         VM_WARN_ON(!swp_tb_is_countable(swp_tb));
>         swp_tb |= SWP_TB_ZERO_MARK;
>         __swap_table_set(ci, ci_off, swp_tb);
> #endif
> }
> 
> There are only three helpers in total, looks fine. Allocation part is
> just like the memcg_table. Compared to this version, it seems to
> need only a few dozen lines of change (a few #ifdef
> SWP_TABLE_USE_INLINE_ZEROMAP) and is not hard to understand. What do
> you think?

Hi Kairui,

Sounds good. It is easy to understand and does not need many changes.

Another option could be using a 64-bit entry only on LPAE-like arches
(rather than on every 32-bit arch), though that would mean adding a
separate set of atomic64 ops.

The direction you proposed seems cleaner, so I'm on board. 
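
To make the shape of that direction concrete, here is a rough userspace
sketch of the two storage paths. It is only an illustration under stated
assumptions, not the actual patch: struct cluster, the cluster_*_zero()
helpers, ZERO_MARK and USE_CLUSTER_ZERO_BITMAP are all hypothetical
stand-ins (the last one plays the role of the proposed config macro), and
the "three helpers" are assumed to be set/test/clear. No locking or atomic
ops are modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
#define SLOTS_PER_CLUSTER 512
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define ZERO_MARK (1UL << 0)	/* stand-in for SWP_TB_ZERO_MARK */

/* Defined here to exercise the fallback path; remove to use the entry bit. */
#define USE_CLUSTER_ZERO_BITMAP 1

struct cluster {
	unsigned long table[SLOTS_PER_CLUSTER];	/* one entry per swap slot */
#ifdef USE_CLUSTER_ZERO_BITMAP
	unsigned long *zero_bitmap;		/* one bit per swap slot */
#endif
};

static int cluster_init(struct cluster *c)
{
	memset(c->table, 0, sizeof(c->table));
#ifdef USE_CLUSTER_ZERO_BITMAP
	c->zero_bitmap = calloc((SLOTS_PER_CLUSTER + BITS_PER_WORD - 1) /
				BITS_PER_WORD, sizeof(unsigned long));
	if (!c->zero_bitmap)
		return -1;
#endif
	return 0;
}

static void cluster_destroy(struct cluster *c)
{
#ifdef USE_CLUSTER_ZERO_BITMAP
	free(c->zero_bitmap);
	c->zero_bitmap = NULL;
#else
	(void)c;
#endif
}

static void cluster_set_zero(struct cluster *c, unsigned int off)
{
	assert(off < SLOTS_PER_CLUSTER);
#ifdef USE_CLUSTER_ZERO_BITMAP
	c->zero_bitmap[off / BITS_PER_WORD] |= 1UL << (off % BITS_PER_WORD);
#else
	c->table[off] |= ZERO_MARK;
#endif
}

static void cluster_clear_zero(struct cluster *c, unsigned int off)
{
	assert(off < SLOTS_PER_CLUSTER);
#ifdef USE_CLUSTER_ZERO_BITMAP
	c->zero_bitmap[off / BITS_PER_WORD] &= ~(1UL << (off % BITS_PER_WORD));
#else
	c->table[off] &= ~ZERO_MARK;
#endif
}

static int cluster_test_zero(const struct cluster *c, unsigned int off)
{
	assert(off < SLOTS_PER_CLUSTER);
#ifdef USE_CLUSTER_ZERO_BITMAP
	return !!(c->zero_bitmap[off / BITS_PER_WORD] &
		  (1UL << (off % BITS_PER_WORD)));
#else
	return !!(c->table[off] & ZERO_MARK);
#endif
}
```

Callers only ever see the three helpers, so the choice between the
in-entry bit and the per-cluster bitmap stays local to a handful of
#ifdef blocks.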

Thanks,
YoungJun Park
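
For reference, the bit-budget arithmetic quoted earlier in this message
can be replayed with a small standalone program. This is a sketch under
assumptions: the PFN width is taken to be (physical address bits -
PAGE_SHIFT) plus the 2 mark bits, with 4 KiB pages, and SWP_TB_COUNT_MAX
is assumed to be (1 << SWP_TB_COUNT_BITS) - 1. The struct and function
names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for SWP_TB_PFN_MARK_BITS from the quoted hunk. */
#define PFN_MARK_BITS 2

struct swp_tb_layout {
	int pfn_bits;	/* models SWP_TB_PFN_BITS */
	int flags_bits;	/* models SWP_TB_FLAGS_BITS */
	int count_bits;	/* models SWP_TB_COUNT_BITS */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Recompute the widths from the word size and the physical address size. */
static struct swp_tb_layout swp_tb_layout(int bits_per_long, int physmem_bits,
					  int page_shift)
{
	struct swp_tb_layout l;

	assert(physmem_bits > page_shift);
	l.pfn_bits = (physmem_bits - page_shift) + PFN_MARK_BITS;
	l.flags_bits = min_int(5, bits_per_long - l.pfn_bits);
	l.count_bits = l.flags_bits - 1;
	return l;
}

/*
 * Mirrors the quoted check
 *   BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
 * and returns 1 when that check would fire.
 */
static int swp_tb_build_would_fail(struct swp_tb_layout l)
{
	int count_max;

	if (l.count_bits < 2)
		return 1;
	count_max = (1 << l.count_bits) - 1;	/* assumed SWP_TB_COUNT_MAX */
	return count_max < 2;
}
```

With swp_tb_layout(32, 40, 12) this gives 30 PFN bits, 2 flag bits and
1 count bit, so the check fires, matching the LPAE cross-build failure.
A 64-bit layout such as swp_tb_layout(64, 52, 12) keeps the full 5 flag
bits and passes.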


Thread overview: 18+ messages
2026-04-16 18:34 [PATCH v2 00/11] mm, swap: swap table phase IV: unify allocation and reduce static metadata Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 01/11] mm, swap: simplify swap cache allocation helper Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 02/11] mm, swap: move common swap cache operations into standalone helpers Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 03/11] mm/huge_memory: move THP gfp limit helper into header Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 04/11] mm, swap: add support for stable large allocation in swap cache directly Kairui Song via B4 Relay
2026-04-17  3:19   ` Kairui Song
2026-04-16 18:34 ` [PATCH v2 05/11] mm, swap: unify large folio allocation Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 06/11] mm/memcg, swap: tidy up cgroup v1 memsw swap helpers Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 07/11] mm, swap: support flexible batch freeing of slots in different memcg Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 08/11] mm/swap: delay and unify memcg lookup and charging for swapin Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 09/11] mm/memcg, swap: store cgroup id in cluster table directly Kairui Song via B4 Relay
2026-04-18 14:03   ` YoungJun Park
2026-04-19 15:55     ` Kairui Song
2026-04-16 18:34 ` [PATCH v2 10/11] mm/memcg: remove no longer used swap cgroup array Kairui Song via B4 Relay
2026-04-16 18:34 ` [PATCH v2 11/11] mm, swap: merge zeromap into swap table Kairui Song via B4 Relay
2026-04-18 12:27   ` YoungJun Park
2026-04-18 13:34     ` Kairui Song
2026-04-19 12:50       ` YoungJun Park [this message]
