From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li, David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Baolin Wang, Baoquan He, Barry Song, Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 25/28] mm/workingset: leave highest 8 bits empty for anon shadow
Date: Thu, 15 May 2025 04:17:25 +0800
Message-ID: <20250514201729.48420-26-ryncsn@gmail.com>
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Swap table entries will need 8 bits reserved for the swap count, so the
highest 8 bits of an anon shadow must remain 0.

This should be fine for the foreseeable future. Take a 52-bit physical
address space as an example: with 4K pages, there are at most 40 bits'
worth of addressable pages. Currently 36 bits are available for the
eviction timestamp (with NODES_SHIFT set to 10; a smaller NODES_SHIFT
frees more bits), so in the worst case refault distances are compared
in 64K-sized buckets. This commit leaves anon shadows with 28 timestamp
bits, which may increase the anon bucket size to 16M. That should be
fine, as the working set on such large machines will be far larger than
the bucket size.

For MGLRU, 28 bits can already track a huge number of generations, so
there should be no problem there either. The 8 bits can also be reduced
to 6 or even fewer later.

Signed-off-by: Kairui Song
---
 mm/swap_table.h |  1 +
 mm/workingset.c | 39 ++++++++++++++++++++++++++-------------
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/mm/swap_table.h b/mm/swap_table.h
index 9356004d211a..afb2953d408a 100644
--- a/mm/swap_table.h
+++ b/mm/swap_table.h
@@ -65,6 +65,7 @@ static inline swp_te_t shadow_swp_te(void *shadow)
 	BUILD_BUG_ON((BITS_PER_XA_VALUE + 1) != BITS_PER_BYTE * sizeof(swp_te_t));
 	BUILD_BUG_ON((unsigned long)xa_mk_value(0) != ENTRY_SHADOW_MARK);
 	VM_WARN_ON_ONCE(shadow && !xa_is_value(shadow));
+	VM_WARN_ON((unsigned long)shadow & ENTRY_COUNT_MASK);
 	swp_te.counter |= ENTRY_SHADOW_MARK;
 	return swp_te;
 }
diff --git a/mm/workingset.c b/mm/workingset.c
index 6e7f4cb1b9a7..86a549a17ae1 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include "swap_table.h"
 #include "internal.h"
 
 /*
@@ -184,7 +185,9 @@
 #define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
 			 WORKINGSET_SHIFT + NODES_SHIFT + \
 			 MEM_CGROUP_ID_SHIFT)
+#define EVICTION_SHIFT_ANON	(EVICTION_SHIFT + SWAP_COUNT_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
+#define EVICTION_MASK_ANON	(~0UL >> EVICTION_SHIFT_ANON)
 
 /*
  * Eviction timestamps need to be able to cover the full range of
@@ -194,12 +197,16 @@
  * that case, we have to sacrifice granularity for distance, and group
  * evictions into coarser buckets by shaving off lower timestamp bits.
  */
-static unsigned int bucket_order __read_mostly;
+static unsigned int bucket_order[ANON_AND_FILE] __read_mostly;
 
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
-			 bool workingset)
+			 bool workingset, bool file)
 {
-	eviction &= EVICTION_MASK;
+	if (file)
+		eviction &= EVICTION_MASK;
+	else
+		eviction &= EVICTION_MASK_ANON;
+
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << WORKINGSET_SHIFT) | workingset;
@@ -244,7 +251,8 @@ static void *lru_gen_eviction(struct folio *folio)
 	struct mem_cgroup *memcg = folio_memcg(folio);
 	struct pglist_data *pgdat = folio_pgdat(folio);
 
-	BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH > BITS_PER_LONG - EVICTION_SHIFT);
+	BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH >
+		     BITS_PER_LONG - max(EVICTION_SHIFT, EVICTION_SHIFT_ANON));
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
@@ -254,7 +262,7 @@ static void *lru_gen_eviction(struct folio *folio)
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
 
-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset, type);
 }
 
 /*
@@ -381,6 +389,7 @@ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
 void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 {
 	struct pglist_data *pgdat = folio_pgdat(folio);
+	int file = folio_is_file_lru(folio);
 	unsigned long eviction;
 	struct lruvec *lruvec;
 	int memcgid;
@@ -397,10 +406,10 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 	/* XXX: target_memcg can be NULL, go through lruvec */
 	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
 	eviction = atomic_long_read(&lruvec->nonresident_age);
-	eviction >>= bucket_order;
+	eviction >>= bucket_order[file];
 	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 	return pack_shadow(memcgid, pgdat, eviction,
-				folio_test_workingset(folio));
+				folio_test_workingset(folio), file);
 }
 
 /**
@@ -438,7 +447,7 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 
 	rcu_read_lock();
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
-	eviction <<= bucket_order;
+	eviction <<= bucket_order[file];
 
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -780,8 +789,8 @@ static struct lock_class_key shadow_nodes_key;
 
 static int __init workingset_init(void)
 {
+	unsigned int timestamp_bits, timestamp_bits_anon;
 	struct shrinker *workingset_shadow_shrinker;
-	unsigned int timestamp_bits;
 	unsigned int max_order;
 	int ret = -ENOMEM;
 
@@ -794,11 +803,15 @@ static int __init workingset_init(void)
 	 * double the initial memory by using totalram_pages as-is.
 	 */
 	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+	timestamp_bits_anon = BITS_PER_LONG - EVICTION_SHIFT_ANON;
 	max_order = fls_long(totalram_pages() - 1);
-	if (max_order > timestamp_bits)
-		bucket_order = max_order - timestamp_bits;
-	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
-	       timestamp_bits, max_order, bucket_order);
+	if (max_order > timestamp_bits)
+		bucket_order[WORKINGSET_FILE] = max_order - timestamp_bits;
+	if (max_order > timestamp_bits_anon)
+		bucket_order[WORKINGSET_ANON] = max_order - timestamp_bits_anon;
+	pr_info("workingset: timestamp_bits=%d (anon: %d) max_order=%d bucket_order=%u (anon: %u)\n",
+		timestamp_bits, timestamp_bits_anon, max_order,
+		bucket_order[WORKINGSET_FILE], bucket_order[WORKINGSET_ANON]);
 
 	workingset_shadow_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
 						    SHRINKER_MEMCG_AWARE,
-- 
2.49.0
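The bucket-size figures in the commit message (36 vs. 28 timestamp bits, 64K vs. 16M buckets) can be reproduced with a small userspace model of the bucket_order setup in workingset_init(). This is a hedged sketch, not kernel code: the MODEL_* constants and model_* helpers are hypothetical, and the widths assume a 64-bit build with BITS_PER_XA_VALUE = 63, WORKINGSET_SHIFT = 1, NODES_SHIFT = 10, MEM_CGROUP_ID_SHIFT = 16 and an 8-bit SWAP_COUNT_SHIFT.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the bucket_order setup in
 * workingset_init(). The widths below are assumptions for a 64-bit
 * build, not values read from a real kernel configuration.
 */
#define MODEL_BITS_PER_LONG		64
#define MODEL_BITS_PER_XA_VALUE		(MODEL_BITS_PER_LONG - 1)
#define MODEL_WORKINGSET_SHIFT		1
#define MODEL_NODES_SHIFT		10
#define MODEL_MEM_CGROUP_ID_SHIFT	16
#define MODEL_SWAP_COUNT_SHIFT		8	/* bits reserved for the swap count */

/* Same shape as EVICTION_SHIFT / EVICTION_SHIFT_ANON in the patch. */
#define MODEL_EVICTION_SHIFT					\
	((MODEL_BITS_PER_LONG - MODEL_BITS_PER_XA_VALUE) +	\
	 MODEL_WORKINGSET_SHIFT + MODEL_NODES_SHIFT +		\
	 MODEL_MEM_CGROUP_ID_SHIFT)
#define MODEL_EVICTION_SHIFT_ANON	(MODEL_EVICTION_SHIFT + MODEL_SWAP_COUNT_SHIFT)

/* 1-based index of the highest set bit, 0 for x == 0 (like fls_long()). */
static unsigned int model_fls_long(unsigned long long x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Number of low timestamp bits shaved off: nonzero only when the page
 * count needs more bits than the timestamp field provides.
 */
static unsigned int model_bucket_order(unsigned int max_order,
				       unsigned int eviction_shift)
{
	unsigned int timestamp_bits = MODEL_BITS_PER_LONG - eviction_shift;

	return max_order > timestamp_bits ? max_order - timestamp_bits : 0;
}

/* Bytes covered by one refault-distance bucket for a given page size. */
static unsigned long long model_bucket_bytes(unsigned int bucket_order,
					     unsigned int page_shift)
{
	assert(bucket_order + page_shift < 64);	/* avoid an undefined shift */
	return 1ULL << (bucket_order + page_shift);
}
```

With 2^40 pages of 4K (a 52-bit physical address space), model_fls_long((1ULL << 40) - 1) is 40, so the file bucket order is 4 (model_bucket_bytes(4, 12) is 64K) and the anon bucket order is 12 (model_bucket_bytes(12, 12) is 16M).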
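The effect of EVICTION_MASK_ANON can be checked with a userspace model of pack_shadow() followed by xa_mk_value(), which shifts the packed word left by one and sets bit 0. This is a hedged sketch assuming a 64-bit build with WORKINGSET_SHIFT = 1, NODES_SHIFT = 10, MEM_CGROUP_ID_SHIFT = 16 and an 8-bit swap count; the M_* macros and model_* helpers are hypothetical, and M_COUNT_MASK is only an assumed stand-in for ENTRY_COUNT_MASK (taken to be the top 8 bits), whose real definition is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model widths for a 64-bit build (assumptions). */
#define M_WORKINGSET_SHIFT	1
#define M_NODES_SHIFT		10
#define M_MEM_CGROUP_ID_SHIFT	16
#define M_SWAP_COUNT_SHIFT	8
/* The leading 1 is BITS_PER_LONG - BITS_PER_XA_VALUE (xa_mk_value()'s tag bit). */
#define M_EVICTION_SHIFT	(1 + M_WORKINGSET_SHIFT + M_NODES_SHIFT + M_MEM_CGROUP_ID_SHIFT)
#define M_EVICTION_SHIFT_ANON	(M_EVICTION_SHIFT + M_SWAP_COUNT_SHIFT)
#define M_EVICTION_MASK		(~0ULL >> M_EVICTION_SHIFT)
#define M_EVICTION_MASK_ANON	(~0ULL >> M_EVICTION_SHIFT_ANON)
/* Assumed stand-in for ENTRY_COUNT_MASK: the top M_SWAP_COUNT_SHIFT bits. */
#define M_COUNT_MASK		(~0ULL << (64 - M_SWAP_COUNT_SHIFT))

/*
 * Pack the eviction timestamp, memcg ID, node ID and workingset flag,
 * then encode the result as an xarray value entry: (v << 1) | 1.
 */
static uint64_t model_pack_shadow(unsigned int memcgid, unsigned int nid,
				  uint64_t eviction, bool workingset, bool file)
{
	/* IDs must fit their fields, as the kernel assumes. */
	assert(memcgid < (1u << M_MEM_CGROUP_ID_SHIFT));
	assert(nid < (1u << M_NODES_SHIFT));

	eviction &= file ? M_EVICTION_MASK : M_EVICTION_MASK_ANON;
	eviction = (eviction << M_MEM_CGROUP_ID_SHIFT) | memcgid;
	eviction = (eviction << M_NODES_SHIFT) | nid;
	eviction = (eviction << M_WORKINGSET_SHIFT) | workingset;
	return (eviction << 1) | 1;
}

/* Mirrors the intent of the new VM_WARN_ON() in shadow_swp_te(). */
static bool model_count_bits_clear(uint64_t shadow)
{
	return (shadow & M_COUNT_MASK) == 0;
}
```

With every field saturated, the anon shadow comes out as 2^56 - 1, so the top 8 bits stay clear, while a file-LRU shadow packed with the wider EVICTION_MASK can use all 64 bits.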