From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li, David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Baolin Wang, Baoquan He, Barry Song, Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 22/28] mm, swap: drop the SWAP_HAS_CACHE flag
Date: Thu, 15 May 2025 04:17:22 +0800
Message-ID: <20250514201729.48420-23-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song <ryncsn@gmail.com>

Now that the swap cache is managed with the swap table, users check whether an entry is cached by looking at the swap table type directly. SWAP_HAS_CACHE is only used to pin an entry temporarily so that it won't be used by anyone else.

Previous commits have converted every place that checked SWAP_HAS_CACHE to check the swap table directly; the only remaining place that sets SWAP_HAS_CACHE is folio freeing. Freeing a cached entry first sets its swap map to SWAP_HAS_CACHE, keeps the entry pinned that way, and then frees it.

Now that the swap cache has become a mandatory layer managed by the swap table, and all users check the swap table directly, this can be simplified considerably: when removing a folio from the swap cache, directly free every one of its entries whose count is zero instead of pinning them temporarily.

After the above change, SWAP_HAS_CACHE no longer has any users, so remove all related logic and helpers.
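To make the new freeing path easier to follow, here is a condensed sketch of what __swap_cache_del_folio() does once SWAP_HAS_CACHE is gone. The wrapper name del_folio_free_entries and the trimmed-down context are illustrative only; the actual hunk below also moves the shadow into the swap table and updates the folio and NR_SWAPCACHE accounting:

	static void del_folio_free_entries(struct swap_info_struct *si,
					   struct swap_cluster_info *ci,
					   unsigned long start, unsigned long end)
	{
		unsigned long offset = start;
		bool folio_swapped = false, need_free = false;

		/* Any entry with a non-zero swap count is still referenced. */
		do {
			if (__swap_count(swp_entry(si->type, offset)))
				folio_swapped = true;
			else
				need_free = true;
		} while (++offset < end);

		if (!folio_swapped) {
			/* No reference left at all: free the whole range in one go. */
			__swap_free_entries(si, ci, start, end - start);
		} else if (need_free) {
			/* Mixed case: free only the entries that already hit zero. */
			for (offset = start; offset < end; offset++)
				if (!__swap_count(swp_entry(si->type, offset)))
					__swap_free_entries(si, ci, offset, 1);
		}
	}

Previously this path would instead mark each swap_map slot with SWAP_HAS_CACHE via __swap_cache_put_entries() and drop the pin later.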
Signed-off-by: Kairui Song --- include/linux/swap.h | 1 - mm/swap.h | 12 ++- mm/swap_state.c | 22 ++++-- mm/swapfile.c | 184 +++++++++++-------------------------------- 4 files changed, 67 insertions(+), 152 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index adac6d51da05..60b126918399 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -224,7 +224,6 @@ enum { #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX /* Bit flag in swap_map */ -#define SWAP_HAS_CACHE 0x40 /* Flag page is cached, in first swap_map */ #define COUNT_CONTINUED 0x80 /* Flag swap_map continuation for full count */ /* Special value in first swap_map */ diff --git a/mm/swap.h b/mm/swap.h index b042609e6eb2..7cbfca39225f 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -135,13 +135,6 @@ static inline void swap_unlock_cluster_irq(struct swap_cluster_info *ci) spin_unlock_irq(&ci->lock); } -extern int __swap_cache_set_entry(struct swap_info_struct *si, - struct swap_cluster_info *ci, - unsigned long offset); -extern void __swap_cache_put_entries(struct swap_info_struct *si, - struct swap_cluster_info *ci, - swp_entry_t entry, unsigned int size); - /* * All swap entries starts getting allocated by folio_alloc_swap(), * and the folio will be added to swap cache. @@ -161,6 +154,11 @@ int folio_dup_swap(struct folio *folio, struct page *subpage); void folio_put_swap(struct folio *folio, struct page *subpage); void folio_free_swap_cache(struct folio *folio); +/* For internal use */ +extern void __swap_free_entries(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset, unsigned int nr_pages); + /* linux/mm/page_io.c */ int sio_pool_init(void); struct swap_iocb; diff --git a/mm/swap_state.c b/mm/swap_state.c index 9e7d40215958..2b145c0f7773 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -169,7 +169,7 @@ struct folio *swap_cache_add_folio(swp_entry_t entry, struct folio *folio, existing = swp_te_folio(exist); goto out_failed; } - if (__swap_cache_set_entry(si, ci, offset)) + if (!__swap_count(swp_entry(si->type, offset))) goto out_failed; if (shadow && swp_te_is_shadow(exist)) *shadow = swp_te_shadow(exist); @@ -191,10 +191,8 @@ struct folio *swap_cache_add_folio(swp_entry_t entry, struct folio *folio, * We may lose shadow here due to raced swapin, which is rare and OK, * caller better keep the previous returned shadow. 
*/ - while (offset-- > start) { + while (offset-- > start) __swap_table_set_shadow(ci, offset, NULL); - __swap_cache_put_entries(si, ci, swp_entry(si->type, offset), 1); - } swap_unlock_cluster(ci); /* @@ -219,6 +217,7 @@ void __swap_cache_del_folio(swp_entry_t entry, pgoff_t offset, start, end; struct swap_info_struct *si; struct swap_cluster_info *ci; + bool folio_swapped = false, need_free = false; unsigned long nr_pages = folio_nr_pages(folio); VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); @@ -235,13 +234,26 @@ void __swap_cache_del_folio(swp_entry_t entry, exist = __swap_table_get(ci, offset); VM_WARN_ON_ONCE(swp_te_folio(exist) != folio); __swap_table_set_shadow(ci, offset, shadow); + if (__swap_count(swp_entry(si->type, offset))) + folio_swapped = true; + else + need_free = true; } while (++offset < end); folio->swap.val = 0; folio_clear_swapcache(folio); node_stat_mod_folio(folio, NR_FILE_PAGES, -nr_pages); lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr_pages); - __swap_cache_put_entries(si, ci, entry, nr_pages); + + if (!folio_swapped) { + __swap_free_entries(si, ci, start, nr_pages); + } else if (need_free) { + offset = start; + do { + if (!__swap_count(swp_entry(si->type, offset))) + __swap_free_entries(si, ci, offset, 1); + } while (++offset < end); + } } /* diff --git a/mm/swapfile.c b/mm/swapfile.c index 91025ba98653..c2154f19c21b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,21 +49,18 @@ #include #include "swap_table.h" #include "internal.h" +#include "swap_table.h" #include "swap.h" static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); -static void swap_free_entries(struct swap_info_struct *si, - struct swap_cluster_info *ci, - unsigned long start, unsigned int nr_pages); static void swap_range_alloc(struct swap_info_struct *si, unsigned int nr_entries); static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr); static unsigned char swap_put_entry_locked(struct swap_info_struct *si, struct swap_cluster_info *ci, - swp_entry_t entry, - unsigned char usage); + swp_entry_t entry); static bool folio_swapcache_freeable(struct folio *folio); static DEFINE_SPINLOCK(swap_lock); @@ -145,11 +142,6 @@ static struct swap_info_struct *swp_get_info(swp_entry_t entry) return swp_type_get_info(swp_type(entry)); } -static inline unsigned char swap_count(unsigned char ent) -{ - return ent & ~SWAP_HAS_CACHE; /* may include COUNT_CONTINUED flag */ -} - /* * Use the second highest bit of inuse_pages counter as the indicator * if one swap device is on the available plist, so the atomic can @@ -190,7 +182,7 @@ static bool swap_only_has_cache(struct swap_info_struct *si, do { entry = __swap_table_get(ci, offset); - VM_BUG_ON(!(*map & SWAP_HAS_CACHE)); + VM_WARN_ON_ONCE(!swp_te_is_folio(entry)); if (*map) return false; offset++; @@ -600,7 +592,6 @@ static void partial_free_cluster(struct swap_info_struct *si, { VM_BUG_ON(!ci->count); VM_BUG_ON(ci->count == SWAPFILE_CLUSTER); - lockdep_assert_held(&ci->lock); if (ci->flags != CLUSTER_FLAG_NONFULL) @@ -664,7 +655,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si, spin_unlock(&ci->lock); do { - if (swap_count(READ_ONCE(map[offset]))) + if (READ_ONCE(map[offset])) break; nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY); if (nr_reclaim > 0) @@ -696,10 +687,9 @@ static bool cluster_scan_range(struct swap_info_struct *si, return true; for (offset = start; offset < end; offset++) { - if 
(swap_count(map[offset])) + if (map[offset]) return false; if (swp_te_is_folio(__swap_table_get(ci, offset))) { - VM_WARN_ON_ONCE(!(map[offset] & SWAP_HAS_CACHE)); if (!vm_swap_full()) return false; *need_reclaim = true; @@ -733,7 +723,6 @@ static bool cluster_alloc_range(struct swap_info_struct *si, if (folio) { /* from folio_alloc_swap */ __swap_cache_add_folio(entry, ci, folio); - memset(&si->swap_map[offset], SWAP_HAS_CACHE, nr_pages); } else { /* from get_swap_page_of_type */ VM_WARN_ON_ONCE(si->swap_map[offset] || swap_cache_check_folio(entry)); @@ -818,7 +807,7 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force) to_scan--; while (offset < end) { - if (!swap_count(map[offset]) && + if (!map[offset] && swp_te_is_folio(__swap_table_get(ci, offset))) { spin_unlock(&ci->lock); nr_reclaim = __try_to_reclaim_swap(si, offset, @@ -910,7 +899,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, * Scan only one fragment cluster is good enough. Order 0 * allocation will surely success, and mTHP allocation failure * is not critical, and scanning one cluster still keeps the - * list rotated and scanned (for reclaiming HAS_CACHE). + * list rotated and scanned (for reclaiming swap cachec). */ ci = isolate_lock_cluster(si, &si->frag_clusters[order]); if (ci) { @@ -1226,10 +1215,9 @@ static bool swap_put_entries(struct swap_info_struct *si, do { swp_te = __swap_table_get(ci, offset); count = si->swap_map[offset]; - if (WARN_ON_ONCE(!swap_count(count))) { + if (WARN_ON_ONCE(!count)) { goto skip; } else if (swp_te_is_folio(swp_te)) { - VM_WARN_ON_ONCE(!(count & SWAP_HAS_CACHE)); /* Let the swap cache (folio) handle the final free */ has_cache = true; } else if (count == 1) { @@ -1237,16 +1225,16 @@ static bool swap_put_entries(struct swap_info_struct *si, head = head ? head : offset; continue; } - swap_put_entry_locked(si, ci, swp_entry(si->type, offset), 1); + swap_put_entry_locked(si, ci, swp_entry(si->type, offset)); skip: if (head) { - swap_free_entries(si, ci, head, offset - head); + __swap_free_entries(si, ci, head, offset - head); head = SWAP_ENTRY_INVALID; } } while (++offset < cluster_end); if (head) { - swap_free_entries(si, ci, head, offset - head); + __swap_free_entries(si, ci, head, offset - head); head = SWAP_ENTRY_INVALID; } @@ -1296,12 +1284,10 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) local_unlock(&percpu_swap_cluster.lock); /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. 
*/ - if (mem_cgroup_try_charge_swap(folio, folio->swap)) { + if (mem_cgroup_try_charge_swap(folio, folio->swap)) folio_free_swap_cache(folio); - return -ENOMEM; - } - if (!folio->swap.val) + if (unlikely(!folio->swap.val)) return -ENOMEM; atomic_long_sub(size, &nr_swap_pages); @@ -1393,13 +1379,8 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry) offset = swp_offset(entry); if (offset >= si->max) goto bad_offset; - if (data_race(!si->swap_map[swp_offset(entry)])) - goto bad_free; return si; -bad_free: - pr_err("%s: %s%08lx\n", __func__, Unused_offset, entry.val); - goto out; bad_offset: pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val); goto out; @@ -1414,22 +1395,13 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry) static unsigned char swap_put_entry_locked(struct swap_info_struct *si, struct swap_cluster_info *ci, - swp_entry_t entry, - unsigned char usage) + swp_entry_t entry) { unsigned long offset = swp_offset(entry); unsigned char count; - unsigned char has_cache; count = si->swap_map[offset]; - - has_cache = count & SWAP_HAS_CACHE; - count &= ~SWAP_HAS_CACHE; - - if (usage == SWAP_HAS_CACHE) { - VM_BUG_ON(!has_cache); - has_cache = 0; - } else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) { + if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) { if (count == COUNT_CONTINUED) { if (swap_count_continued(si, offset, count)) count = SWAP_MAP_MAX | COUNT_CONTINUED; @@ -1439,13 +1411,11 @@ static unsigned char swap_put_entry_locked(struct swap_info_struct *si, count--; } - usage = count | has_cache; - if (usage) - WRITE_ONCE(si->swap_map[offset], usage); - else - swap_free_entries(si, ci, offset, 1); + WRITE_ONCE(si->swap_map[offset], count); + if (!count && !swp_te_is_folio(__swap_table_get(ci, offset))) + __swap_free_entries(si, ci, offset, 1); - return usage; + return count; } /* @@ -1514,25 +1484,12 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -/* - * Check if it's the last ref of swap entry in the freeing path. - */ -static inline bool __maybe_unused swap_is_last_ref(unsigned char count) -{ - return (count == SWAP_HAS_CACHE) || (count == 1); -} - -/* - * Drop the last ref of swap entries, caller have to ensure all entries - * belong to the same cgroup and cluster. - */ -static void swap_free_entries(struct swap_info_struct *si, - struct swap_cluster_info *ci, - unsigned long offset, unsigned int nr_pages) +void __swap_free_entries(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset, unsigned int nr_pages) { swp_entry_t entry = swp_entry(si->type, offset); - unsigned char *map = si->swap_map + offset; - unsigned char *map_end = map + nr_pages; + unsigned long end = offset + nr_pages; /* It should never free entries across different clusters */ VM_BUG_ON(ci != swp_offset_cluster(si, offset + nr_pages - 1)); @@ -1541,10 +1498,10 @@ static void swap_free_entries(struct swap_info_struct *si, ci->count -= nr_pages; do { - VM_BUG_ON(!swap_is_last_ref(*map)); - *map = 0; - } while (++map < map_end); + si->swap_map[offset] = 0; + } while (++offset < end); + offset = swp_offset(entry); mem_cgroup_uncharge_swap(entry, nr_pages); swap_range_free(si, offset, nr_pages); @@ -1554,46 +1511,12 @@ static void swap_free_entries(struct swap_info_struct *si, partial_free_cluster(si, ci); } -/* - * Caller has made sure that the swap device corresponding to entry - * is still around or has not been recycled. 
- */ -void __swap_cache_put_entries(struct swap_info_struct *si, - struct swap_cluster_info *ci, - swp_entry_t entry, unsigned int size) -{ - if (swap_only_has_cache(si, ci, swp_offset(entry), size)) - swap_free_entries(si, ci, swp_offset(entry), size); - else - for (int i = 0; i < size; i++, entry.val++) - swap_put_entry_locked(si, ci, entry, SWAP_HAS_CACHE); -} - -/* - * Called after dropping swapcache to decrease refcnt to swap entries. - */ -void put_swap_folio(struct folio *folio, swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - int size = 1 << swap_entry_order(folio_order(folio)); - - si = _swap_info_get(entry); - if (!si) - return; - - ci = swap_lock_cluster(si, offset); - __swap_cache_put_entries(si, ci, entry, size); - swap_unlock_cluster(ci); -} - int __swap_count(swp_entry_t entry) { struct swap_info_struct *si = swp_info(entry); pgoff_t offset = swp_offset(entry); - return swap_count(si->swap_map[offset]); + return si->swap_map[offset]; } /* @@ -1608,7 +1531,7 @@ bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry) int count; ci = swap_lock_cluster(si, offset); - count = swap_count(si->swap_map[offset]); + count = si->swap_map[offset]; swap_unlock_cluster(ci); return !!count; } @@ -1634,7 +1557,7 @@ int swp_swapcount(swp_entry_t entry) ci = swap_lock_cluster(si, offset); - count = swap_count(si->swap_map[offset]); + count = si->swap_map[offset]; if (!(count & COUNT_CONTINUED)) goto out; @@ -1672,12 +1595,12 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, ci = swap_lock_cluster(si, offset); if (nr_pages == 1) { - if (swap_count(map[roffset])) + if (map[roffset]) ret = true; goto unlock_out; } for (i = 0; i < nr_pages; i++) { - if (swap_count(map[offset + i])) { + if (map[offset + i]) { ret = true; break; } @@ -1777,6 +1700,7 @@ void do_put_swap_entries(swp_entry_t entry, int nr) swp_te_t swp_te; si = get_swap_device(entry); + if (WARN_ON_ONCE(!si)) return; if (WARN_ON_ONCE(end_offset > si->max)) @@ -1800,7 +1724,7 @@ void do_put_swap_entries(swp_entry_t entry, int nr) for (offset = start_offset; offset < end_offset; offset += nr) { nr = 1; swp_te = __swap_table_get(swp_offset_cluster(si, offset), offset); - if (!swap_count(si->swap_map[offset]) && swp_te_is_folio(swp_te)) { + if (!READ_ONCE(si->swap_map[offset]) && swp_te_is_folio(swp_te)) { /* * Folios are always naturally aligned in swap so * advance forward to the next boundary. 
Zero means no @@ -1818,7 +1742,6 @@ void do_put_swap_entries(swp_entry_t entry, int nr) nr = ALIGN(offset + 1, nr) - offset; } } - out: put_swap_device(si); } @@ -1860,7 +1783,7 @@ void free_swap_page_of_entry(swp_entry_t entry) if (!si) return; ci = swap_lock_cluster(si, offset); - WARN_ON(swap_count(swap_put_entry_locked(si, ci, entry, 1))); + WARN_ON(swap_put_entry_locked(si, ci, entry)); /* It might got added to swap cache accidentally by read ahead */ __try_to_reclaim_swap(si, offset, TTRS_ANYWAY); swap_unlock_cluster(ci); @@ -2261,6 +2184,7 @@ static unsigned int find_next_to_unuse(struct swap_info_struct *si, unsigned int prev) { unsigned int i; + swp_te_t swp_te; unsigned char count; /* @@ -2271,7 +2195,10 @@ static unsigned int find_next_to_unuse(struct swap_info_struct *si, */ for (i = prev + 1; i < si->max; i++) { count = READ_ONCE(si->swap_map[i]); - if (count && swap_count(count) != SWAP_MAP_BAD) + swp_te = __swap_table_get(swp_offset_cluster(si, i), i); + if (count == SWAP_MAP_BAD) + continue; + if (count || swp_te_is_folio(swp_te)) break; if ((i % LATENCY_LIMIT) == 0) cond_resched(); @@ -3530,7 +3457,7 @@ static int swap_dup_entries(struct swap_info_struct *si, unsigned char usage, int nr) { int i; - unsigned char count, has_cache; + unsigned char count; for (i = 0; i < nr; i++) { count = si->swap_map[offset + i]; @@ -3539,31 +3466,16 @@ static int swap_dup_entries(struct swap_info_struct *si, * swapin_readahead() doesn't check if a swap entry is valid, so the * swap entry could be SWAP_MAP_BAD. Check here with lock held. */ - if (unlikely(swap_count(count) == SWAP_MAP_BAD)) { + if (unlikely(count == SWAP_MAP_BAD)) return -ENOENT; - } - - has_cache = count & SWAP_HAS_CACHE; - count &= ~SWAP_HAS_CACHE; - if (!count && !has_cache) { + if (!count && !swp_te_is_folio(__swap_table_get(ci, offset))) return -ENOENT; - } else if (usage == SWAP_HAS_CACHE) { - if (has_cache) - return -EEXIST; - } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) { - return -EINVAL; - } } for (i = 0; i < nr; i++) { count = si->swap_map[offset + i]; - has_cache = count & SWAP_HAS_CACHE; - count &= ~SWAP_HAS_CACHE; - - if (usage == SWAP_HAS_CACHE) - has_cache = SWAP_HAS_CACHE; - else if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX) + if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX) count += usage; else if (swap_count_continued(si, offset + i, count)) count = COUNT_CONTINUED; @@ -3575,7 +3487,7 @@ static int swap_dup_entries(struct swap_info_struct *si, return -ENOMEM; } - WRITE_ONCE(si->swap_map[offset + i], count | has_cache); + WRITE_ONCE(si->swap_map[offset + i], count); } return 0; @@ -3625,12 +3537,6 @@ int do_dup_swap_entry(swp_entry_t entry) return err; } -int __swap_cache_set_entry(struct swap_info_struct *si, - struct swap_cluster_info *ci, unsigned long offset) -{ - return swap_dup_entries(si, ci, offset, SWAP_HAS_CACHE, 1); -} - /* * add_swap_count_continuation - called when a swap count is duplicated * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's @@ -3676,7 +3582,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask) ci = swap_lock_cluster(si, offset); - count = swap_count(si->swap_map[offset]); + count = si->swap_map[offset]; if ((count & ~COUNT_CONTINUED) != SWAP_MAP_MAX) { /* -- 2.49.0