From: YoungJun Park <youngjun.park@lge.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Hugh Dickins <hughd@google.com>,
Chris Li <chrisl@kernel.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
Yosry Ahmed <yosry@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
Dev Jain <dev.jain@arm.com>, Lance Yang <lance.yang@linux.dev>,
Michal Hocko <mhocko@suse.com>, Michal Hocko <mhocko@kernel.org>,
Qi Zheng <qi.zheng@linux.dev>
Subject: Re: [PATCH v2 11/11] mm, swap: merge zeromap into swap table
Date: Sun, 19 Apr 2026 21:50:47 +0900
Message-ID: <aeTPpwblIa6LoLk0@yjaykim-PowerEdge-T330>
In-Reply-To: <CAMgjq7CwaKWP_2yxorK88ZLZ-hRRpaLVm76jJE5mLNcX5eCg=w@mail.gmail.com>
On Sat, Apr 18, 2026 at 09:34:35PM +0800, Kairui Song wrote:
> On Sat, Apr 18, 2026 at 8:28 PM YoungJun Park <youngjun.park@lge.com> wrote:
> >
> > On Fri, Apr 17, 2026 at 02:34:41AM +0800, Kairui Song via B4 Relay wrote:
> >
> > > *
> > > * Usages:
> > > *
> > > @@ -74,17 +76,22 @@ struct swap_memcg_table {
> > > #define SWP_TB_PFN_MARK_BITS 2
> > > #define SWP_TB_PFN_MARK_MASK (BIT(SWP_TB_PFN_MARK_BITS) - 1)
> > >
> > > -/* SWAP_COUNT part for PFN or shadow, the width can be shrunk or extended */
> > > -#define SWP_TB_COUNT_BITS min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +/* SWAP_COUNT and flags for PFN or shadow, width can be shrunk or extended */
> > > +#define SWP_TB_FLAGS_BITS min(5, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +#define SWP_TB_COUNT_BITS (SWP_TB_FLAGS_BITS - 1)
> >
> > Hi Kairui :)
> >
> > Would this break the build on 32-bit arches with 40-bit phys
> > addrs (MAX_POSSIBLE_PHYSMEM_BITS = 40)?
> >
> > Architectures I checked:
> > - ARM LPAE (CONFIG_ARM_LPAE=y)
> > - ARC PAE40 (CONFIG_ARC_HAS_PAE40=y)
> > - MIPS XPA (CONFIG_XPA=y)
> >
> > Calculation:
> >
> > SWP_TB_PFN_BITS = 28 + 2 = 30
> > SWP_TB_FLAGS_BITS = min(5, 32 - 30) = 2
> > SWP_TB_COUNT_BITS = 2 - 1 = 1
> >
> > The BUILD_BUG_ON looks like the real problem. It needs at
> > least 3 count values (free/used/overflow).
> >
> > BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
> >
> > Confirmed with a cross build (multi_v7_defconfig + lpae.config).
> >
> > error: BUILD_BUG_ON failed: SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2
> > at __count_to_swp_tb (mm/swap_table.h:227)
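To double-check the arithmetic above, here is a minimal userspace sketch of the width computation. It assumes SWP_TB_PFN_BITS = (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) + SWP_TB_PFN_MARK_BITS with 4 KiB pages, which is how I read the "28 + 2" above. swp_tb_widths() and swp_tb_count_width_ok() are hypothetical names, not helpers from the patch, and only the SWP_TB_COUNT_BITS half of the BUILD_BUG_ON is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, so PAGE_SHIFT is 12. */
#define MODEL_PAGE_SHIFT	12
/* Value taken from the quoted patch. */
#define SWP_TB_PFN_MARK_BITS	2

struct swp_tb_widths {
	int pfn_bits;	/* models SWP_TB_PFN_BITS */
	int flags_bits;	/* models SWP_TB_FLAGS_BITS */
	int count_bits;	/* models SWP_TB_COUNT_BITS */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: recompute the three widths for a given word size
 * and MAX_POSSIBLE_PHYSMEM_BITS, assuming
 * SWP_TB_PFN_BITS = physmem_bits - PAGE_SHIFT + SWP_TB_PFN_MARK_BITS.
 */
static struct swp_tb_widths swp_tb_widths(int bits_per_long, int physmem_bits)
{
	struct swp_tb_widths w;

	assert(physmem_bits > MODEL_PAGE_SHIFT);
	w.pfn_bits = physmem_bits - MODEL_PAGE_SHIFT + SWP_TB_PFN_MARK_BITS;
	w.flags_bits = min_int(5, bits_per_long - w.pfn_bits);
	w.count_bits = w.flags_bits - 1;
	return w;
}

/* Mirrors only the "SWP_TB_COUNT_BITS < 2" half of the BUILD_BUG_ON. */
static bool swp_tb_count_width_ok(struct swp_tb_widths w)
{
	return w.count_bits >= 2;
}
```

Under these assumptions, swp_tb_widths(32, 40) gives 30/2/1 and fails the check, while swp_tb_widths(32, 32) gives 22/5/4 and swp_tb_widths(64, 52) gives 42/5/4, both of which pass.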
>
> Hi YoungJun
>
> Nice catch! Thanks a lot :)
>
> > I think the right fix is widening swap_tb to 64 bits
> > unconditionally (atomic64_t).
>
> I'm a bit concerned that memory usage on 32-bit arches will bloat up...
>
> >
> > (Or, uglier, these arches could always route counts through the
> > extend table.)
> >
>
> It doesn't seem ugly with a ci->zero_bitmap; it looks clean to me. The
> definition would be:
>
> SWP_TABLE_USE_INLINE_ZEROMAP is defined when BITS_PER_LONG is not enough
> for SWP_TB_FLAGS_BITS, then:
>
> struct swap_cluster_info {
> 	...
> #ifdef SWP_TABLE_USE_INLINE_ZEROMAP
> 	unsigned long *zero_bitmap;
> #endif
> 	...
> };
>
> And the helpers would be:
> static inline void __swap_table_set_zero(struct swap_cluster_info *ci,
> 					 unsigned int ci_off)
> {
> #ifdef SWP_TABLE_USE_INLINE_ZEROMAP
> 	bitmap_set(ci->zero_bitmap, ci_off, 1);
> #else
> 	unsigned long swp_tb = __swap_table_get(ci, ci_off);
>
> 	VM_WARN_ON(!swp_tb_is_countable(swp_tb));
> 	swp_tb |= SWP_TB_ZERO_MARK;
> 	__swap_table_set(ci, ci_off, swp_tb);
> #endif
> }
>
> There are only three helpers in total, so it looks fine. The allocation
> part is just like the memcg_table. Compared to this version, it seems
> to need only a few dozen lines of change (a few #ifdef
> SWP_TABLE_USE_INLINE_ZEROMAP) and isn't hard to understand. What do
> you think?

Hi Kairui,

Sounds good: it's easy to understand and doesn't need many changes.

Another option would be to use a 64-bit entry only on LPAE-like arches
(rather than on every 32-bit arch), though that would mean adding a
separate set of atomic64 ops.

The direction you proposed seems cleaner, so I'm on board.
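For reference, a minimal userspace model of the two ways to store the per-slot zero flag: as a mark bit inside the swap table entry, or in a separate per-cluster bitmap for configs whose entries have no spare bit. Everything here is hypothetical and only for illustration: the model_* names, the cluster size, the bit position of the zero mark, and the assumption that the three helpers are set/clear/test. The real code picks the variant at build time with #ifdef; the model uses a runtime bool so both paths can be exercised in one build.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model parameters, not the kernel's values. */
#define MODEL_CLUSTER_SLOTS	64
#define MODEL_ZERO_MARK		(1UL << 0)	/* hypothetical bit position */
#define MODEL_BITS_PER_UL	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITMAP_LONGS \
	((MODEL_CLUSTER_SLOTS + MODEL_BITS_PER_UL - 1) / MODEL_BITS_PER_UL)

struct model_cluster {
	unsigned long table[MODEL_CLUSTER_SLOTS];	/* stands in for the swap table */
	bool use_zero_bitmap;	/* true when entries have no spare flag bit */
	unsigned long zero_bitmap[MODEL_BITMAP_LONGS];	/* used only in that case */
};

static void model_set_zero(struct model_cluster *ci, unsigned int off)
{
	assert(off < MODEL_CLUSTER_SLOTS);
	if (ci->use_zero_bitmap)
		ci->zero_bitmap[off / MODEL_BITS_PER_UL] |=
			1UL << (off % MODEL_BITS_PER_UL);
	else
		ci->table[off] |= MODEL_ZERO_MARK;
}

static void model_clear_zero(struct model_cluster *ci, unsigned int off)
{
	assert(off < MODEL_CLUSTER_SLOTS);
	if (ci->use_zero_bitmap)
		ci->zero_bitmap[off / MODEL_BITS_PER_UL] &=
			~(1UL << (off % MODEL_BITS_PER_UL));
	else
		ci->table[off] &= ~MODEL_ZERO_MARK;
}

static bool model_test_zero(const struct model_cluster *ci, unsigned int off)
{
	assert(off < MODEL_CLUSTER_SLOTS);
	if (ci->use_zero_bitmap)
		return ci->zero_bitmap[off / MODEL_BITS_PER_UL] &
		       (1UL << (off % MODEL_BITS_PER_UL));
	return ci->table[off] & MODEL_ZERO_MARK;
}
```

This also shows why the bitmap fallback is cheaper than widening: growing every 32-bit entry to 64 bits costs 32 extra bits per slot on all such arches, while the bitmap costs one bit per slot and only on the configs that lack a spare flag bit.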
Thanks,
YoungJun Park