From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <39c71ccf-669b-4d9f-923c-f6b9c4ceb8df@linux.alibaba.com>
Date: Tue, 27 Aug 2024 14:46:40 +0800
MIME-Version: 1.0
User-Agent: Mozilla
 Thunderbird
Subject: Re: [PATCH v5 8/9] mm: shmem: split large entry if the swapin folio is not large
To: Hugh Dickins, Andrew Morton
Cc: willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com,
 chrisl@kernel.org, ying.huang@intel.com, 21cnbao@gmail.com,
 ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com,
 ioworker0@gmail.com, da.gomez@samsung.com, p.raghav@samsung.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <4a0f12f27c54a62eb4d9ca1265fed3a62531a63e.1723434324.git.baolin.wang@linux.alibaba.com>
From: Baolin Wang
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2024/8/26 06:31, Hugh Dickins wrote:
> On Mon, 12 Aug 2024, Baolin Wang wrote:
> 
>> Now the swap device can only swap-in order 0 folio, even though a large
>> folio is swapped out. This requires us to split the large entry previously
>> saved in the shmem pagecache to support the swap in of small folios.
>>
>> Signed-off-by: Baolin Wang
>> ---
>>  mm/shmem.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 100 insertions(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 345e25425e37..996062dc196b 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1990,6 +1990,81 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
>>  	swap_free_nr(swap, nr_pages);
>>  }
>>  
>> +static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
>> +				   swp_entry_t swap, int new_order, gfp_t gfp)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	XA_STATE_ORDER(xas, &mapping->i_pages, index, new_order);
>> +	void *alloced_shadow = NULL;
>> +	int alloced_order = 0, i;
> 
> gfp needs to be adjusted: see fix patch below.

Ah, good catch. Thank you Hugh.
>> +
>> +	for (;;) {
>> +		int order = -1, split_order = 0;
>> +		void *old = NULL;
>> +
>> +		xas_lock_irq(&xas);
>> +		old = xas_load(&xas);
>> +		if (!xa_is_value(old) || swp_to_radix_entry(swap) != old) {
>> +			xas_set_err(&xas, -EEXIST);
>> +			goto unlock;
>> +		}
>> +
>> +		order = xas_get_order(&xas);
>> +
>> +		/* Swap entry may have changed before we re-acquire the lock */
>> +		if (alloced_order &&
>> +		    (old != alloced_shadow || order != alloced_order)) {
>> +			xas_destroy(&xas);
>> +			alloced_order = 0;
>> +		}
>> +
>> +		/* Try to split large swap entry in pagecache */
>> +		if (order > 0 && order > new_order) {
> 
> I have not even attempted to understand all the manipulations of order and
> new_order and alloced_order and split_order. And further down it turns out
> that this is only ever called with new_order 0.
> 
> You may be wanting to cater for more generality in future, but for now
> please cut this down to the new_order 0 case, and omit that parameter.
> It will be easier for us to think about the xa_get_order() races if
> the possibilities are more limited.

Sure. I will drop the 'new_order' parameter with the following fix.

> 
>> +			if (!alloced_order) {
>> +				split_order = order;
>> +				goto unlock;
>> +			}
>> +			xas_split(&xas, old, order);
>> +
>> +			/*
>> +			 * Re-set the swap entry after splitting, and the swap
>> +			 * offset of the original large entry must be continuous.
>> +			 */
>> +			for (i = 0; i < 1 << order; i += (1 << new_order)) {
>> +				pgoff_t aligned_index = round_down(index, 1 << order);
>> +				swp_entry_t tmp;
>> +
>> +				tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
>> +				__xa_store(&mapping->i_pages, aligned_index + i,
>> +					   swp_to_radix_entry(tmp), 0);
>> +			}
> 
> So that is done under xas lock: good. But is the intermediate state
> visible to RCU readers, and could that be a problem?

xas_split() splits the multi-index entry into smaller entries, each of
which is initialized with the old swap value. The __xa_store() loop then
re-sets these entries to the new swap values. So although RCU readers
might observe the old swap value transiently, I have not seen any
problems so far (maybe I missed something).

For concurrent shmem swap-in, there are checks in shmem_swapin_folio()
(including the folio->swap.val and shmem_confirm_swap() validations) to
ensure the correctness of the swap values.

For shmem_partial_swap_usage(), we may get racy swap usage, but that is
not a problem, per its comment:
"This is safe to call without i_rwsem or the i_pages lock thanks to RCU,
as long as the inode doesn't go away and racy results are not a problem."

For shmem truncation, when removing a racy swap entry from the shmem
page cache, it uses xa_cmpxchg_irq() to synchronize on the correct swap
state.

[PATCH] mm: shmem: split large entry if the swapin folio is not large fix 2

Now we only split the large swap entry to order 0, so drop the
'new_order' parameter.
Signed-off-by: Baolin Wang
---
 mm/shmem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d8038a66b110..f00b7b99ad09 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1998,10 +1998,10 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 }
 
 static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
-				   swp_entry_t swap, int new_order, gfp_t gfp)
+				   swp_entry_t swap, gfp_t gfp)
 {
 	struct address_space *mapping = inode->i_mapping;
-	XA_STATE_ORDER(xas, &mapping->i_pages, index, new_order);
+	XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);
 	void *alloced_shadow = NULL;
 	int alloced_order = 0, i;
 
@@ -2026,7 +2026,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 		}
 
 		/* Try to split large swap entry in pagecache */
-		if (order > 0 && order > new_order) {
+		if (order > 0) {
 			if (!alloced_order) {
 				split_order = order;
 				goto unlock;
 			}
@@ -2037,7 +2037,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 			 * Re-set the swap entry after splitting, and the swap
 			 * offset of the original large entry must be continuous.
 			 */
-			for (i = 0; i < 1 << order; i += (1 << new_order)) {
+			for (i = 0; i < 1 << order; i++) {
 				pgoff_t aligned_index = round_down(index, 1 << order);
 				swp_entry_t tmp;
 
@@ -2123,7 +2123,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		 * should split the large swap entry stored in the pagecache
 		 * if necessary.
 		 */
-		split_order = shmem_split_large_entry(inode, index, swap, 0, gfp);
+		split_order = shmem_split_large_entry(inode, index, swap, gfp);
 		if (split_order < 0) {
 			error = split_order;
 			goto failed;
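
[Editor's note] For readers following the race Hugh asked about, the sketch
below condenses the reader-side revalidation Baolin describes. It is not part
of the patch: shmem_confirm_swap() and the folio->swap.val check do exist in
mm/shmem.c's swap-in path, but the helper name and control flow here are
hypothetical and heavily simplified.

	/*
	 * Simplified sketch (hypothetical helper, not in the patch) of how
	 * the swap-in path tolerates a transiently stale entry: whatever an
	 * RCU reader loaded while shmem_split_large_entry() was re-setting
	 * the split slots is re-checked under the folio lock before use.
	 */
	static int shmem_revalidate_swap_sketch(struct address_space *mapping,
						pgoff_t index, swp_entry_t swap,
						struct folio *folio)
	{
		folio_lock(folio);
		/* Re-check the page cache slot and the folio's own swap entry. */
		if (!shmem_confirm_swap(mapping, index, swap) ||
		    folio->swap.val != swap.val) {
			folio_unlock(folio);
			return -EEXIST;	/* lost the race; caller retries */
		}
		/* Entry confirmed: safe to replace it with the folio. */
		return 0;
	}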