From: Barry Song <21cnbao@gmail.com>
Date: Wed, 21 May 2025 10:33:36 +1200
Subject: Re: [PATCH 05/28] mm, swap: sanitize swap cache lookup convention
To: Kairui Song
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, bhe@redhat.com,
    chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
    kaleshsingh@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    nphamcs@gmail.com, ryan.roberts@arm.com, shikemeng@huaweicloud.com,
    tim.c.chen@linux.intel.com, willy@infradead.org,
    ying.huang@linux.alibaba.com, yosryahmed@google.com
References: <20250514201729.48420-6-ryncsn@gmail.com> <20250519043847.1806-1-21cnbao@gmail.com>

On Wed, May 21, 2025 at 7:10 AM Kairui Song wrote:
>
> On Tue, May 20, 2025 at 12:41 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Tue, May 20, 2025 at 3:31 PM Kairui Song wrote:
> > >
> > > On Mon, May 19, 2025 at 12:38 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > > From: Kairui Song
> > > > >
> > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > > index e5a0db7f3331..5b4f01aecf35 100644
> > > > > --- a/mm/userfaultfd.c
> > > > > +++ b/mm/userfaultfd.c
> > > > > @@ -1409,6 +1409,10 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> > > > >                                 goto retry;
> > > > >                         }
> > > > >                 }
> > > > > +               if (!folio_swap_contains(src_folio, entry)) {
> > > > > +                       err = -EBUSY;
> > > > > +                       goto out;
> > > > > +               }
> > > >
> > > > It seems we don't need this. In move_swap_pte(), we have been checking
> > > > that the pte pages are stable:
> > > >
> > > >         if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
> > > >                                  dst_pmd, dst_pmdval)) {
> > > >                 double_pt_unlock(dst_ptl, src_ptl);
> > > >                 return -EAGAIN;
> > > >         }
> > >
> > > The tricky part is that when swap_cache_get_folio returns the folio, both
> > > the folio and the ptes are unlocked. So is it possible that someone else
> > > swapped in the entries, then swapped them out again using the same
> > > entries?
> > >
> > > The folio will be different here, but the PTEs still have the same value,
> > > so they will pass the is_pte_pages_stable check; we previously saw
> > > similar races with anon fault or shmem. I think stricter checking
> > > won't hurt here.
> >
> > This doesn't seem to be the same case as the one you fixed in
> > do_swap_page(). Here, we're hitting the swap cache, whereas in that
> > case, there was no one hitting the swap cache, and you used
> > swap_prepare() to set up the cache to fix the issue.
> >
> > By the way, if we're not hitting the swap cache, src_folio will be
> > NULL. Also, it seems that folio_swap_contains(src_folio, entry) does
> > not guard against that case either.
>
> Ah, that's true, it should be moved inside the if (folio) {...} block
> above. Thanks for catching this!
>
> > But I suspect we won't have a problem, since we're not swapping in --
> > we didn't read any stale data, right? Swap-in will only occur after we
> > move the PTEs.
>
> My concern is that a parallel swapin / swapout could result in the
> folio being a completely irrelevant or invalid folio.
>
> It's not about the dst, but on the move src side, something like:
>
> CPU1                                 CPU2
> move_pages_pte
> folio = swap_cache_get_folio(...)
> | Got folio A here
> move_swap_pte
>                                      <swapin src_pte, then swapout
>                                       src_pte again, reusing the entry>
> | Now folio A is no longer valid.
> | It's very unlikely but here SWAP
> | could reuse the same entry as above.

swap_cache_get_folio() does increment the folio's refcount, but it seems
this doesn't prevent do_swap_page() from freeing the swap entry after
swapping in src_pte with folio A, if it's a read fault.

For a write fault, folio_ref_count(folio) == (1 + folio_nr_pages(folio))
will be false:

static inline bool should_try_to_free_swap(struct folio *folio,
                                           struct vm_area_struct *vma,
                                           unsigned int fault_flags)
{
        ...
        /*
         * If we want to map a page that's in the swapcache writable, we
         * have to detect via the refcount if we're really the exclusive
         * user. Try freeing the swapcache to get rid of the swapcache
         * reference only in case it's likely that we'll be the exclusive user.
         */
        return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
                folio_ref_count(folio) == (1 + folio_nr_pages(folio));
}

and for swapout, __remove_mapping() does check the refcount as well:

static int __remove_mapping(struct address_space *mapping, struct folio *folio,
                            bool reclaimed, struct mem_cgroup *target_memcg)
{
        refcount = 1 + folio_nr_pages(folio);
        if (!folio_ref_freeze(folio, refcount))
                goto cannot_free;
}

However, since __remove_mapping() occurs after pageout(), it seems this
also doesn't prevent swapout from allocating a new swap entry to fill
src_pte.

It seems your concern is valid, unless I'm missing something. Do you
have a reproducer? If so, this will likely need a separate fix patch
rather than being hidden in this patchset.
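For concreteness, here is a minimal sketch of the kind of re-validation
being discussed -- this assumes folio_swap_contains() merely checks that
the folio is still the swap cache folio backing the entry; the actual
helper in the series may be named and implemented differently:

/*
 * Sketch only (assumed semantics, not necessarily the series' helper):
 * re-check that @folio is still the swap cache folio backing @entry.
 * is_pte_pages_stable() can pass even after the entry was freed and
 * reused, but in that case the folio looked up earlier no longer
 * matches the entry, and this check catches it.
 */
static inline bool folio_swap_contains(struct folio *folio, swp_entry_t entry)
{
        swp_entry_t folio_entry;

        /* Only swap cache folios carry a meaningful folio->swap. */
        if (!folio_test_swapcache(folio))
                return false;

        folio_entry = folio->swap;
        if (swp_type(folio_entry) != swp_type(entry))
                return false;

        /* A large folio covers folio_nr_pages() contiguous swap slots. */
        return swp_offset(entry) >= swp_offset(folio_entry) &&
               swp_offset(entry) < swp_offset(folio_entry) + folio_nr_pages(folio);
}

Called under the PTL after is_pte_pages_stable(), something like this
would close the entry-reuse window shown in the timeline above.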
> double_pt_lock
> is_pte_pages_stable
> | Passed because of entry reuse.
> folio_move_anon_rmap(...)
> | Moved invalid folio A.
>
> And could it be possible that swap_cache_get_folio returns NULL
> here, but later, right before the double_pt_lock, a folio is added to
> the swap cache? Maybe we'd better check the swap cache after clearing
> and releasing the dst lock, but before releasing the src lock?

It seems you're suggesting that a parallel swap-in allocates and adds a
folio to the swap cache, but the PTE has not yet been updated from a
swap entry to a present mapping?

As long as do_swap_page() adds the folio to the swap cache before
updating the PTE to present, this scenario seems possible. It seems we
need to double-check:

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc473ad21202..976053bd2bf1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1102,8 +1102,14 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
        if (src_folio) {
                folio_move_anon_rmap(src_folio, dst_vma);
                src_folio->index = linear_page_index(dst_vma, dst_addr);
+       } else {
+               struct folio *folio = filemap_get_folio(swap_address_space(entry),
+                               swap_cache_index(entry));
+               if (!IS_ERR_OR_NULL(folio)) {
+                       double_pt_unlock(dst_ptl, src_ptl);
+                       return -EAGAIN;
+               }
        }
-
        orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
 #ifdef CONFIG_MEM_SOFT_DIRTY
        orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);

Let me run test case [1] to check whether this ever happens. I guess I
need to hack the kernel a bit to always add the folio to the swap cache,
even for SYNC IO.

[1] https://lore.kernel.org/linux-mm/20250219112519.92853-1-21cnbao@gmail.com/

> > > >
> > > > Also, -EBUSY is somehow an incorrect error code.
> > >
> > > Yes, thanks, I'll use EAGAIN here just like move_swap_pte.
> > >
> > > > >                         err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
> > > > >                                             orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
> > > > >                                             dst_ptl, src_ptl, src_folio);

Thanks
Barry