Subject: Re: [PATCH 1/4] mm/shmem, swap: improve cached mTHP handling and fix potential hung
To: Kairui Song, linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20250617183503.10527-1-ryncsn@gmail.com> <20250617183503.10527-2-ryncsn@gmail.com>
In-Reply-To: <20250617183503.10527-2-ryncsn@gmail.com>
From: Kemeng Shi
Date: Wed, 18 Jun 2025 10:08:19 +0800

on 6/18/2025 2:35 AM, Kairui Song wrote:
> From: Kairui Song
>
> The current swap-in code assumes that, when a swap entry in the shmem
> mapping is order 0, its cached folios (if present) must be order 0
> too, which turns out not to always be correct.
>
> The problem is that shmem_split_large_entry is called before verifying
> that the folio will eventually be swapped in. One possible race is:
>
> CPU1                                      CPU2
> shmem_swapin_folio
> /* swap in of order > 0 swap entry S1 */
>   folio = swap_cache_get_folio
>   /* folio = NULL */
>   order = xa_get_order
>   /* order > 0 */
>   folio = shmem_swap_alloc_folio
>   /* mTHP alloc failure, folio = NULL */
>   <... Interrupted ...>
>                                           shmem_swapin_folio
>                                           /* S1 is swapped in */
>                                           shmem_writeout
>                                           /* S1 is swapped out, folio cached */
>   shmem_split_large_entry(..., S1)
>   /* S1 is split, but the folio covering it has order > 0 now */
>
> Now any following swapin of S1 will hang: `xa_get_order` returns 0,
> while folio lookup returns a folio with order > 0, so the
> `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` check
> is always true, causing swap-in to keep returning -EEXIST.
>
> And this looks fragile. So fix this up by accepting a larger folio found
> in the swap cache, and by checking that the whole shmem mapping range
> covered by the swap-in has the right swap value when inserting the folio.
> Also drop the redundant tree walks before the insertion.
>
> This actually improves performance, as it avoids two redundant XArray
> tree walks in the hot path. The only side effect is that in the failure
> path, shmem may redundantly reallocate a few folios, causing slight
> temporary memory pressure.
>
> Also worth noting: it may seem that the order and value check before
> inserting would help reduce lock contention, but that is not true. The
> swap cache layer ensures that a raced swap-in will either see a swap
> cache folio or fail to do the swap-in (we have the SWAP_HAS_CACHE bit
> even if the swap cache is bypassed), so holding the folio lock and
> checking the folio flag is already good enough for avoiding lock
> contention. The chance that a folio passes the swap entry value check
> but the shmem mapping slot has changed should be very low.
>
> Cc: stable@vger.kernel.org
> Fixes: 058313515d5a ("mm: shmem: fix potential data corruption during shmem swapin")
> Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
> Signed-off-by: Kairui Song
> ---
>  mm/shmem.c | 30 +++++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index eda35be2a8d9..4e7ef343a29b 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
>                                     pgoff_t index, void *expected, gfp_t gfp)
>  {
>          XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
> -        long nr = folio_nr_pages(folio);
> +        unsigned long nr = folio_nr_pages(folio);
> +        swp_entry_t iter, swap;
> +        void *entry;
>
>          VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
>          VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> @@ -896,14 +898,24 @@ static int shmem_add_to_page_cache(struct folio *folio,
>
>          gfp &= GFP_RECLAIM_MASK;
>          folio_throttle_swaprate(folio, gfp);
> +        swap = iter = radix_to_swp_entry(expected);
>
>          do {
>                  xas_lock_irq(&xas);
> -                if (expected != xas_find_conflict(&xas)) {
> -                        xas_set_err(&xas, -EEXIST);
> -                        goto unlock;
> +                xas_for_each_conflict(&xas, entry) {
> +                        /*
> +                         * The range must either be empty, or filled with
> +                         * expected swap entries. Shmem swap entries are never
> +                         * partially freed without split of both entry and
> +                         * folio, so there shouldn't be any holes.
> +                         */
> +                        if (!expected || entry != swp_to_radix_entry(iter)) {
> +                                xas_set_err(&xas, -EEXIST);
> +                                goto unlock;
> +                        }
> +                        iter.val += 1 << xas_get_order(&xas);
>                  }
> -                if (expected && xas_find_conflict(&xas)) {
> +                if (expected && iter.val - nr != swap.val) {
>                          xas_set_err(&xas, -EEXIST);
>                          goto unlock;
>                  }
> @@ -2323,7 +2335,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>                          error = -ENOMEM;
>                          goto failed;
>                  }
> -        } else if (order != folio_order(folio)) {
> +        } else if (order > folio_order(folio)) {
>                  /*
>                   * Swap readahead may swap in order 0 folios into swapcache
>                   * asynchronously, while the shmem mapping can still stores
> @@ -2348,15 +2360,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>
>                          swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
>                  }
> +        } else if (order < folio_order(folio)) {
> +                swap.val = round_down(swp_type(swap), folio_order(folio));
>          }
>
>  alloced:
>          /* We have to do this with folio locked to prevent races */
>          folio_lock(folio);
>          if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
> -            folio->swap.val != swap.val ||
> -            !shmem_confirm_swap(mapping, index, swap) ||
> -            xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
> +            folio->swap.val != swap.val) {
>                  error = -EEXIST;
>                  goto unlock;
>          }
>
Nice catch! Reviewed-by: Kemeng Shi
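
To illustrate the idea behind the new range check in shmem_add_to_page_cache(),
here is a minimal user-space sketch. It is not the kernel code: struct slot,
range_matches() and the sample values are made up for illustration only. It
models the same arithmetic as the patch: every conflicting slot must hold the
swap value expected at its offset, and the visited slots must exactly tile the
nr pages covered by the folio being inserted.

#include <stdbool.h>
#include <stdio.h>

/* One mapping slot: the swap value it holds and the order it covers. */
struct slot { unsigned long val; unsigned int order; };

static bool range_matches(unsigned long expected, unsigned long nr,
                          const struct slot *slots, unsigned int nslots)
{
        unsigned long iter = expected;
        unsigned int i;

        for (i = 0; i < nslots; i++) {
                /* Each slot must hold the swap value expected at its offset. */
                if (slots[i].val != iter)
                        return false;
                /* Advance by the number of pages this slot covers. */
                iter += 1UL << slots[i].order;
        }
        /* The slots must cover exactly the nr pages being inserted. */
        return iter - nr == expected;
}

int main(void)
{
        /* Two order-1 slots exactly tiling a 4-page (order-2) insertion. */
        struct slot ok[] = { { 100, 1 }, { 102, 1 } };
        /* Second slot holds the wrong swap value: the check must fail. */
        struct slot bad[] = { { 100, 1 }, { 103, 1 } };

        printf("ok:  %d\n", range_matches(100, 4, ok, 2));
        printf("bad: %d\n", range_matches(100, 4, bad, 2));
        return 0;
}

Built with any C99 compiler this prints "ok:  1" and "bad: 0", mirroring how
the patched code sets -EEXIST when a slot holds an unexpected value or when
the walked slots do not cover exactly the inserted range.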