From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li, David Hildenbrand,
    Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Baolin Wang,
    Baoquan He, Barry Song, Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 10/28] mm, swap: add a swap helper for bypassing only readahead
Date: Thu, 15 May 2025 04:17:10 +0800
Message-ID: <20250514201729.48420-11-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song

The swap cache now has very low overhead, so bypassing it is no longer
helpful. To prepare for unifying the swap-in path, introduce a new helper
that bypasses only readahead, not the swap cache.
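Not part of the patch, just an illustrative note for reviewers: a minimal
sketch of how a caller might use the new helper with a folio it allocated
itself. The example_swapin() function below is hypothetical, and error
handling and memcg charging are omitted:

	/*
	 * Hypothetical caller (illustration only): allocate a folio, then
	 * let swapin_entry() add it to the swap cache and read it in. If
	 * another task wins the race, swapin_entry() returns that task's
	 * folio instead and we drop our own copy.
	 */
	static struct folio *example_swapin(swp_entry_t entry, gfp_t gfp,
					    struct mempolicy *mpol, pgoff_t ilx)
	{
		struct folio *folio, *swapcache;

		folio = folio_alloc_mpol(gfp, 0, mpol, ilx, numa_node_id());
		if (!folio)
			return NULL;

		swapcache = swapin_entry(entry, folio);
		if (swapcache != folio)
			folio_put(folio);	/* lost the race or entry gone */
		return swapcache;
	}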
Signed-off-by: Kairui Song
---
 mm/swap.h       |   6 ++
 mm/swap_state.c | 158 ++++++++++++++++++++++++++++++------------------
 2 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index fec7d6e751ae..aab6bf9c3a8a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -217,6 +217,7 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_readahead(swp_entry_t entry, gfp_t flag,
 		struct vm_fault *vmf);
+struct folio *swapin_entry(swp_entry_t entry, struct folio *folio);
 void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
 			   unsigned long addr);
 
@@ -303,6 +304,11 @@ static inline struct folio *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
 	return NULL;
 }
 
+static inline struct folio *swapin_entry(swp_entry_t ent, struct folio *folio)
+{
+	return NULL;
+}
+
 static inline void swap_update_readahead(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long addr)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index fe71706e29d9..d68687295f52 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -353,54 +353,26 @@ void swap_update_readahead(struct folio *folio,
 	}
 }
 
-struct folio *__swapin_cache_alloc(swp_entry_t entry, gfp_t gfp_mask,
-		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
-		bool skip_if_exists)
+static struct folio *__swapin_cache_add_prepare(swp_entry_t entry,
+						struct folio *folio,
+						bool skip_if_exists)
 {
-	struct swap_info_struct *si = swp_info(entry);
-	struct folio *folio;
-	struct folio *new_folio = NULL;
-	struct folio *result = NULL;
+	int nr_pages = folio_nr_pages(folio);
+	struct folio *exist;
 	void *shadow = NULL;
+	int err;
 
-	*new_page_allocated = false;
 	for (;;) {
-		int err;
-
 		/*
-		 * Check the swap cache first, if a cached folio is found,
-		 * return it unlocked. The caller will lock and check it.
+		 * Caller should have checked swap cache and swap count
+		 * already, try prepare the swap map directly, it will still
+		 * fail with -ENOENT or -EEXIST if the entry is gone or raced.
 		 */
-		folio = swap_cache_get_folio(entry);
-		if (folio)
-			goto got_folio;
-
-		/*
-		 * Just skip read ahead for unused swap slot.
-		 */
-		if (!swap_entry_swapped(si, entry))
-			goto put_and_return;
-
-		/*
-		 * Get a new folio to read into from swap. Allocate it now if
-		 * new_folio not exist, before marking swap_map SWAP_HAS_CACHE,
-		 * when -EEXIST will cause any racers to loop around until we
-		 * add it to cache.
-		 */
-		if (!new_folio) {
-			new_folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
-			if (!new_folio)
-				goto put_and_return;
-		}
-
-		/*
-		 * Swap entry may have been freed since our caller observed it.
-		 */
-		err = swapcache_prepare(entry, 1);
+		err = swapcache_prepare(entry, nr_pages);
 		if (!err)
 			break;
 		else if (err != -EEXIST)
-			goto put_and_return;
+			return NULL;
 
 		/*
 		 * Protect against a recursive call to __swapin_cache_alloc()
@@ -411,7 +383,11 @@ struct folio *__swapin_cache_alloc(swp_entry_t entry, gfp_t gfp_mask,
 		 * __swapin_cache_alloc() in the writeback path.
 		 */
 		if (skip_if_exists)
-			goto put_and_return;
+			return NULL;
+
+		exist = swap_cache_get_folio(entry);
+		if (exist)
+			return exist;
 
 		/*
 		 * We might race against __swap_cache_del_folio(), and
@@ -426,35 +402,99 @@ struct folio *__swapin_cache_alloc(swp_entry_t entry, gfp_t gfp_mask,
 	/*
 	 * The swap entry is ours to swap in. Prepare the new folio.
 	 */
-	__folio_set_locked(new_folio);
-	__folio_set_swapbacked(new_folio);
+	__folio_set_locked(folio);
+	__folio_set_swapbacked(folio);
 
-	if (mem_cgroup_swapin_charge_folio(new_folio, NULL, gfp_mask, entry))
-		goto fail_unlock;
-
-	if (swap_cache_add_folio(entry, new_folio, &shadow))
+	if (swap_cache_add_folio(entry, folio, &shadow))
 		goto fail_unlock;
 
 	memcg1_swapin(entry, 1);
 	if (shadow)
-		workingset_refault(new_folio, shadow);
+		workingset_refault(folio, shadow);
 
 	/* Caller will initiate read into locked new_folio */
-	folio_add_lru(new_folio);
-	*new_page_allocated = true;
-	folio = new_folio;
-got_folio:
-	result = folio;
-	goto put_and_return;
+	folio_add_lru(folio);
+	return folio;
 
 fail_unlock:
-	put_swap_folio(new_folio, entry);
-	folio_unlock(new_folio);
-put_and_return:
-	if (!(*new_page_allocated) && new_folio)
-		folio_put(new_folio);
-	return result;
+	put_swap_folio(folio, entry);
+	folio_unlock(folio);
+	return NULL;
+}
+
+struct folio *__swapin_cache_alloc(swp_entry_t entry, gfp_t gfp_mask,
+		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
+		bool skip_if_exists)
+{
+	struct swap_info_struct *si = swp_info(entry);
+	struct folio *swapcache = NULL, *folio = NULL;
+
+	/*
+	 * Check the swap cache first, if a cached folio is found,
+	 * return it unlocked. The caller will lock and check it.
+	 */
+	swapcache = swap_cache_get_folio(entry);
+	if (swapcache)
+		goto out;
+
+	/*
+	 * Just skip readahead for unused swap slot.
+	 */
+	if (!swap_entry_swapped(si, entry))
+		goto out;
+
+	/*
+	 * Get a new folio to read into from swap. Allocate it now,
+	 * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will
+	 * cause any racers to loop around until we add it to cache.
+	 */
+	folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
+	if (!folio)
+		goto out;
+
+	if (mem_cgroup_swapin_charge_folio(folio, NULL, gfp_mask, entry))
+		goto out;
+
+	swapcache = __swapin_cache_add_prepare(entry, folio, skip_if_exists);
+out:
+	if (swapcache && swapcache == folio) {
+		*new_page_allocated = true;
+	} else {
+		if (folio)
+			folio_put(folio);
+		*new_page_allocated = false;
+	}
+
+	return swapcache;
+}
+
+/**
+ * swapin_entry - swap in one or multiple entries, skipping readahead
+ *
+ * @entry: swap entry to swap in
+ * @folio: pre-allocated folio
+ *
+ * Reads @entry into @folio. @folio will be added to the swap cache first.
+ * If this races with other users, only one user will successfully add its
+ * folio into the swap cache, and that folio will be returned to all readers.
+ *
+ * If @folio is a large folio, the entry will be rounded down to match
+ * the folio start and the whole folio will be read in.
+ */
+struct folio *swapin_entry(swp_entry_t entry, struct folio *folio)
+{
+	struct folio *swapcache;
+	pgoff_t offset = swp_offset(entry);
+	unsigned long nr_pages = folio_nr_pages(folio);
+
+	VM_WARN_ON_ONCE(nr_pages > SWAPFILE_CLUSTER);
+
+	entry = swp_entry(swp_type(entry), ALIGN_DOWN(offset, nr_pages));
+	swapcache = __swapin_cache_add_prepare(entry, folio, false);
+	if (swapcache == folio)
+		swap_read_folio(folio, NULL);
+	return swapcache;
 }
 
 /*
-- 
2.49.0
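
One more illustrative note, not part of the patch: how the entry alignment
in swapin_entry() behaves for a large folio. The helper and the numbers
below are made up for the example; ALIGN_DOWN is the existing kernel macro:

	/*
	 * Example only: the swap-in start slot computed for a large folio.
	 * With a 16-page (order-4) folio and an entry at swap offset 0x2153,
	 * the target is rounded down to 0x2150 so the whole 16-slot range
	 * backing the folio is read in.
	 */
	static pgoff_t example_swapin_start(pgoff_t offset, unsigned long nr_pages)
	{
		return ALIGN_DOWN(offset, nr_pages);	/* 0x2153, 16 => 0x2150 */
	}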