From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song
To: linux-mm@kvack.org
Cc: Kairui Song, Andrew Morton, Baoquan He, Barry Song, Chris Li, Nhat Pham,
    Yosry Ahmed, David Hildenbrand, Johannes Weiner, Youngjun Park,
    Hugh Dickins, Baolin Wang, Ying Huang, Kemeng Shi, Lorenzo Stoakes,
    "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v5 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO
Date: Sat, 20 Dec 2025 03:57:51 +0800
Message-ID: <20251219195751.61328-1-ryncsn@gmail.com>
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>
References: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>

From: Kairui Song

Now that the overhead of the swap cache is trivial to none, bypassing
the swap cache is no longer a good optimization.

We have already removed the cache-bypass swapin path for anon memory;
now do the same for shmem. Many helpers and functions can be dropped
as a result.

Performance may drop slightly because of the coexistence and double
update of swap_map and the swap table; this will be improved very soon
in later commits by partially dropping the swap_map update:

Swapin of a 24 GB file on tmpfs with
transparent_hugepage_tmpfs=within_size and ZRAM, 3 test runs on my
machine:

  Before:   After this commit:   After this series:
  5.99s     6.29s                6.08s

Later swap table phases will drop the swap_map completely to avoid
this overhead and reduce memory usage.

Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Signed-off-by: Kairui Song
---
 mm/shmem.c    | 65 +++++++++++++++++----------------------------------------
 mm/swap.h     |  4 ----
 mm/swapfile.c | 35 +++++++++-----------------------
 3 files changed, 27 insertions(+), 77 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index dd136d40631c..d7eeeaa9580d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2014,10 +2014,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct folio *new, *swapcache;
 	int nr_pages = 1 << order;
-	struct folio *new;
 	gfp_t alloc_gfp;
-	void *shadow;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2057,34 +2056,19 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		goto fallback;
 	}
 
-	/*
-	 * Prevent parallel swapin from proceeding with the swap cache flag.
-	 *
-	 * Of course there is another possible concurrent scenario as well,
-	 * that is to say, the swap cache flag of a large folio has already
-	 * been set by swapcache_prepare(), while another thread may have
-	 * already split the large swap entry stored in the shmem mapping.
-	 * In this case, shmem_add_to_page_cache() will help identify the
-	 * concurrent swapin and return -EEXIST.
-	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	swapcache = swapin_folio(entry, new);
+	if (swapcache != new) {
 		folio_put(new);
-		new = ERR_PTR(-EEXIST);
-		/* Try smaller folio to avoid cache conflict */
-		goto fallback;
+		if (!swapcache) {
+			/*
+			 * The new folio is charged already, swapin can
+			 * only fail due to another raced swapin.
+			 */
+			new = ERR_PTR(-EEXIST);
+			goto fallback;
+		}
 	}
-
-	__folio_set_locked(new);
-	__folio_set_swapbacked(new);
-	new->swap = entry;
-
-	memcg1_swapin(entry, nr_pages);
-	shadow = swap_cache_get_shadow(entry);
-	if (shadow)
-		workingset_refault(new, shadow);
-	folio_add_lru(new);
-	swap_read_folio(new, NULL);
-	return new;
+	return swapcache;
 fallback:
 	/* Order 0 swapin failed, nothing to fallback to, abort */
 	if (!order)
@@ -2174,8 +2158,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 }
 
 static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
-		struct folio *folio, swp_entry_t swap,
-		bool skip_swapcache)
+		struct folio *folio, swp_entry_t swap)
 {
 	struct address_space *mapping = inode->i_mapping;
 	swp_entry_t swapin_error;
@@ -2191,8 +2174,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 
 	nr_pages = folio_nr_pages(folio);
 	folio_wait_writeback(folio);
-	if (!skip_swapcache)
-		swap_cache_del_folio(folio);
+	swap_cache_del_folio(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
 	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
@@ -2292,7 +2274,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	softleaf_t index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
-	bool skip_swapcache = false;
 	int error, nr_pages, order;
 	pgoff_t offset;
 
@@ -2335,7 +2316,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		}
-		skip_swapcache = true;
 	} else {
 		/* Cached swapin only supports order 0 folio */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
@@ -2391,9 +2371,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 * and swap cache folios are never partially freed.
 	 */
 	folio_lock(folio);
-	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    shmem_confirm_swap(mapping, index, swap) < 0 ||
-	    folio->swap.val != swap.val) {
+	if (!folio_matches_swap_entry(folio, swap) ||
+	    shmem_confirm_swap(mapping, index, swap) < 0) {
 		error = -EEXIST;
 		goto unlock;
 	}
@@ -2425,12 +2404,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	if (skip_swapcache) {
-		folio->swap.val = 0;
-		swapcache_clear(si, swap, nr_pages);
-	} else {
-		swap_cache_del_folio(folio);
-	}
+	swap_cache_del_folio(folio);
 	folio_mark_dirty(folio);
 	swap_free_nr(swap, nr_pages);
 	put_swap_device(si);
@@ -2441,14 +2415,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
-		shmem_set_folio_swapin_error(inode, index, folio, swap,
-					     skip_swapcache);
+		shmem_set_folio_swapin_error(inode, index, folio, swap);
 unlock:
 	if (folio)
 		folio_unlock(folio);
 failed_nolock:
-	if (skip_swapcache)
-		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 	if (folio)
 		folio_put(folio);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 214e7d041030..e0f05babe13a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -403,10 +403,6 @@ static inline int swap_writeout(struct folio *folio,
 	return 0;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-}
-
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e5284067a442..3762b8f3f9e9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1614,22 +1614,6 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static void swap_entries_put_cache(struct swap_info_struct *si,
-				   swp_entry_t entry, int nr)
-{
-	unsigned long offset = swp_offset(entry);
-	struct swap_cluster_info *ci;
-
-	ci = swap_cluster_lock(si, offset);
-	if (swap_only_has_cache(si, offset, nr)) {
-		swap_entries_free(si, ci, entry, nr);
-	} else {
-		for (int i = 0; i < nr; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
-	}
-	swap_cluster_unlock(ci);
-}
-
 static bool swap_entries_put_map(struct swap_info_struct *si,
 				 swp_entry_t entry, int nr)
 {
@@ -1765,13 +1749,21 @@ void swap_free_nr(swp_entry_t entry, int nr_pages)
 void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
 	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
 	int size = 1 << swap_entry_order(folio_order(folio));
 
 	si = _swap_info_get(entry);
 	if (!si)
 		return;
 
-	swap_entries_put_cache(si, entry, size);
+	ci = swap_cluster_lock(si, offset);
+	if (swap_only_has_cache(si, offset, size))
+		swap_entries_free(si, ci, entry, size);
+	else
+		for (int i = 0; i < size; i++, entry.val++)
+			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+	swap_cluster_unlock(ci);
 }
 
 int __swap_count(swp_entry_t entry)
@@ -3784,15 +3776,6 @@ int swapcache_prepare(swp_entry_t entry, int nr)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE, nr);
 }
 
-/*
- * Caller should ensure entries belong to the same folio so
- * the entries won't span cross cluster boundary.
- */
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
-{
-	swap_entries_put_cache(si, entry, nr);
-}
-
 /*
  * add_swap_count_continuation - called when a swap count is duplicated
  * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's

-- 
2.52.0
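[Editor's note] As a quick back-of-the-envelope check of the benchmark numbers quoted in the commit message, the relative slowdown can be computed as below; the percentages are my arithmetic, not claims made by the patch itself:

```python
# Swapin times (seconds) for a 24 GB tmpfs file with
# transparent_hugepage_tmpfs=within_size on ZRAM, as quoted above.
before = 5.99        # cache-bypass swapin, before this commit
after_commit = 6.29  # with this commit (swap_map and swap table both updated)
after_series = 6.08  # after the full series

def slowdown_pct(base: float, t: float) -> float:
    """Relative slowdown of t versus base, in percent."""
    return (t - base) / base * 100

print(f"after this commit: +{slowdown_pct(before, after_commit):.1f}%")
print(f"after this series: +{slowdown_pct(before, after_series):.1f}%")
```

So the transitional double bookkeeping costs roughly 5%, most of which the rest of the series recovers, consistent with the stated plan to drop the swap_map update in later commits.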