From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li, David Hildenbrand,
	Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Baolin Wang,
	Baoquan He, Barry Song, Kalesh Singh, Kemeng Shi, Tim Chen,
	Ryan Roberts, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 11/28] mm, swap: clean up and consolidate helper for mTHP swapin check
Date: Thu, 15 May 2025 04:17:11 +0800
Message-ID: <20250514201729.48420-12-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song

Move all mTHP swapin checks into can_swapin_thp() and use it for both the
pre-IO check and the post-IO check. This way the code is more consolidated,
and it makes later commits easier to maintain. Also clean up the comments
while at it.

The current comment on non_swapcache_batch() is not correct: a swapin that
bypasses the swap cache won't reach the swap device as long as the entry is
cached, because it still sets the SWAP_HAS_CACHE flag. If the folio is
already in the swap cache, a raced swapin will either fail with -EEXIST
from swapcache_prepare(), or see the cached folio.

The real reason non_swapcache_batch() is needed is that if a smaller folio
is in the swap cache but not mapped, mTHP swapin will be blocked forever:
it won't see the folio due to the index offset, nor can it set the
SWAP_HAS_CACHE bit, so it has to fall back to order 0 swapin.

Signed-off-by: Kairui Song
---
 mm/memory.c | 90 ++++++++++++++++++++++++-----------------------------
 1 file changed, 41 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f2897d9059f2..1b6e192de6ec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4319,12 +4319,6 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 	pgoff_t offset = swp_offset(entry);
 	int i;
 
-	/*
-	 * While allocating a large folio and doing swap_read_folio, which is
-	 * the case the being faulted pte doesn't have swapcache. We need to
-	 * ensure all PTEs have no cache as well, otherwise, we might go to
-	 * swap devices while the content is in swapcache.
-	 */
 	for (i = 0; i < max_nr; i++) {
 		if ((si->swap_map[offset + i] & SWAP_HAS_CACHE))
 			return i;
@@ -4334,34 +4328,30 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 }
 
 /*
- * Check if the PTEs within a range are contiguous swap entries
- * and have consistent swapcache, zeromap.
+ * Check if the page table is still suitable for large folio swap in.
+ * @vmf: The fault triggering the swap-in.
+ * @ptep: Pointer to the PTE that should be the head of the swap in folio.
+ * @addr: The address corresponding to the PTE.
+ * @nr_pages: Number of pages of the folio that is supposed to be swapped in.
  */
-static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
+static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep,
+			   unsigned long addr, unsigned int nr_pages)
 {
-	unsigned long addr;
-	swp_entry_t entry;
-	int idx;
-	pte_t pte;
-
-	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
-	idx = (vmf->address - addr) / PAGE_SIZE;
-	pte = ptep_get(ptep);
+	pte_t pte = ptep_get(ptep);
+	unsigned long addr_end = addr + (PAGE_SIZE * nr_pages);
+	unsigned long pte_offset = (vmf->address - addr) / PAGE_SIZE;
 
-	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -idx)))
+	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, PAGE_SIZE) ||
+			addr > vmf->address || addr_end <= vmf->address);
+	if (unlikely(addr < max(addr & PMD_MASK, vmf->vma->vm_start) ||
+		     addr_end > pmd_addr_end(addr, vmf->vma->vm_end)))
 		return false;
-	entry = pte_to_swp_entry(pte);
-	if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
-		return false;
-
 	/*
-	 * swap_read_folio() can't handle the case a large folio is hybridly
-	 * from different backends. And they are likely corner cases. Similar
-	 * things might be added once zswap support large folios.
+	 * All swap entries must be from the same swap device, in the same
+	 * cgroup, with the same exclusiveness, and only differ in offset.
 	 */
-	if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
-		return false;
-	if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
+	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -pte_offset)) ||
+	    swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
 		return false;
 
 	return true;
@@ -4441,13 +4431,24 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * completely swap entries with contiguous swap offsets.
 	 */
 	order = highest_order(orders);
-	while (orders) {
-		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-		if (can_swapin_thp(vmf, pte + pte_index(addr), 1 << order))
-			break;
-		order = next_order(&orders, order);
+	for (; orders; order = next_order(&orders, order)) {
+		unsigned long nr_pages = 1 << order;
+		swp_entry_t swap_entry = { .val = ALIGN_DOWN(entry.val, nr_pages) };
+		addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+		if (!can_swapin_thp(vmf, pte + pte_index(addr), addr, nr_pages))
+			continue;
+		/*
+		 * If there is already a smaller folio in cache, it will
+		 * conflict with the larger folio in the swap cache layer
+		 * and block the swap in.
+		 */
+		if (unlikely(non_swapcache_batch(swap_entry, nr_pages) != nr_pages))
+			continue;
+		/* Zero map doesn't work with large folios yet. */
+		if (unlikely(swap_zeromap_batch(swap_entry, nr_pages, NULL) != nr_pages))
+			continue;
+		break;
 	}
-
 	pte_unmap_unlock(pte, ptl);
 
 	/* Try allocating the highest of the remaining orders. */
@@ -4731,27 +4732,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	page_idx = 0;
 	address = vmf->address;
 	ptep = vmf->pte;
+
 	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
-		int nr = folio_nr_pages(folio);
+		unsigned long nr = folio_nr_pages(folio);
 		unsigned long idx = folio_page_idx(folio, page);
-		unsigned long folio_start = address - idx * PAGE_SIZE;
-		unsigned long folio_end = folio_start + nr * PAGE_SIZE;
-		pte_t *folio_ptep;
-		pte_t folio_pte;
+		unsigned long folio_address = address - idx * PAGE_SIZE;
+		pte_t *folio_ptep = vmf->pte - idx;
 
-		if (unlikely(folio_start < max(address & PMD_MASK, vma->vm_start)))
-			goto check_folio;
-		if (unlikely(folio_end > pmd_addr_end(address, vma->vm_end)))
-			goto check_folio;
-
-		folio_ptep = vmf->pte - idx;
-		folio_pte = ptep_get(folio_ptep);
-		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
-		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+		if (!can_swapin_thp(vmf, folio_ptep, folio_address, nr))
 			goto check_folio;
 
 		page_idx = idx;
-		address = folio_start;
+		address = folio_address;
 		ptep = folio_ptep;
 		nr_pages = nr;
 		entry = folio->swap;
-- 
2.49.0