Date: Mon, 14 Nov 2022 15:09:05 -0500
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli, Axel Rasmussen, Ives van Hoorne, Nadav Amit, Andrew Morton, Mike Rapoport, stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm/migrate: Fix read-only page got writable when recover pte
References: <20221110203132.1498183-1-peterx@redhat.com> <20221110203132.1498183-2-peterx@redhat.com> <9af36be3-313b-e39c-85bb-bf30011bccb8@redhat.com>
In-Reply-To: <9af36be3-313b-e39c-85bb-bf30011bccb8@redhat.com>

On Mon, Nov 14, 2022 at 05:09:32PM +0100, David Hildenbrand wrote:
> On 10.11.22 21:31, Peter Xu wrote:
> > Ives van Hoorne from codesandbox.io reported an issue regarding possible
> > data loss of uffd-wp when applied to memfds on heavily loaded systems.  The
> > symptom is some read page got data mismatch from the snapshot child VMs.
> >
> > Here I can also reproduce with a Rust reproducer that was provided by Ives
> > that keeps taking snapshot of a 256MB VM, on a 32G system when I initiate
> > 80 instances I can trigger the issues in ten minutes.
> >
> > It turns out that we got some pages write-through even if uffd-wp is
> > applied to the pte.
> >
> > The problem is, when removing migration entries, we didn't really worry
> > about write bit as long as we know it's not a write migration entry.  That
> > may not be true, for some memory types (e.g. writable shmem) mk_pte can
> > return a pte with write bit set, then to recover the migration entry to its
> > original state we need to explicitly wr-protect the pte or it'll have the
> > write bit set if it's a read migration entry.
> >
> > For uffd it can cause write-through.  I didn't verify, but I think it'll be
> > the same for mprotect()ed pages and after migration we can miss the sigbus
> > instead.
>
> I don't think so. mprotect() handling relies on vma->vm_page_prot, which is
> supposed to do the right thing. E.g., map the pte protnone without
> VM_READ/VM_WRITE/....

I removed that example in v3; feel free to have a look.

>
> >
> > The relevant code on uffd was introduced in the anon support, which is
> > commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
> > 2020-04-07).  However anon shouldn't suffer from this problem because anon
> > should already have the write bit cleared always, so that may not be a
> > proper Fixes target.  To satisfy the need on the backport, I'm attaching
> > the Fixes tag to the uffd-wp shmem support.  Since no one had issue with
> > mprotect, so I assume that's also the kernel version we should start to
> > backport for stable, and we shouldn't need to worry before that.
> >
> > Cc: Andrea Arcangeli
> > Cc: stable@vger.kernel.org
> > Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
> > Reported-by: Ives van Hoorne
> > Signed-off-by: Peter Xu
> > ---
> >  mm/migrate.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index dff333593a8a..8b6351c08c78 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
> >  			pte = pte_mkdirty(pte);
> >  		if (is_writable_migration_entry(entry))
> >  			pte = maybe_mkwrite(pte, vma);
> > -		else if (pte_swp_uffd_wp(*pvmw.pte))
> > +		else
> > +			/* NOTE: mk_pte can have write bit set */
> > +			pte = pte_wrprotect(pte);
>
>
> Any particular reason why not to simply glue this to pte_swp_uffd_wp(),
> because only that needs special care:
>
> if (pte_swp_uffd_wp(*pvmw.pte)) {
> 	pte = pte_wrprotect(pte);
> 	pte = pte_mkuffd_wp(pte);
> }
>
>
> And that would match what actually should have been done in commit
> f45ec5ff16a7 -- only special-case uffd-wp.
>
> Note that I think there are cases where we have a PTE that was !writable,
> but after migration we can map it writable.

To me, restoring the pte to its original form is the safest approach, so I
think we need a justification for why it's always safe to set the write bit.
Or do you have a solid reason to believe it's always safe?

>
> BTW, does unuse_pte() need similar care?
>
> new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
> if (pte_swp_uffd_wp(*pte))
> 	new_pte = pte_mkuffd_wp(new_pte);
> set_pte_at(vma->vm_mm, addr, pte, new_pte);

I think the unuse path is fine because unuse only applies to private
mappings, so mk_pte() should always return the pte without the W bit there.

Thanks,

-- 
Peter Xu
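
For reference, the difference between the two variants discussed above can be
modeled with a rough userspace sketch. This is a toy model, not kernel code:
the model_* names, the two flag bits and the bool parameters are all made up
for illustration. It also assumes that the part of the v2 hunk not quoted in
this thread sets the uffd-wp bit when the swap pte carried it, and that the
suggested variant keeps the existing maybe_mkwrite() branch.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: a "pte" is just two flag bits. */
typedef unsigned int model_pte_t;
#define MODEL_PTE_WRITE   (1u << 0)
#define MODEL_PTE_UFFD_WP (1u << 1)

static inline model_pte_t model_wrprotect(model_pte_t pte)
{
	return pte & ~MODEL_PTE_WRITE;
}

static inline model_pte_t model_mkuffd_wp(model_pte_t pte)
{
	return pte | MODEL_PTE_UFFD_WP;
}

/* Same idea as maybe_mkwrite(): set the write bit only if the VMA allows it. */
static inline model_pte_t model_maybe_mkwrite(model_pte_t pte, bool vma_write)
{
	return vma_write ? (pte | MODEL_PTE_WRITE) : pte;
}

/*
 * Variant from the v2 patch: a non-writable migration entry is always
 * write-protected, whatever the freshly built pte looked like.
 */
static inline model_pte_t restore_v2(model_pte_t pte, bool writable_entry,
				     bool swp_uffd_wp, bool vma_write)
{
	if (writable_entry)
		pte = model_maybe_mkwrite(pte, vma_write);
	else
		pte = model_wrprotect(pte);

	/* Assumption: the unquoted remainder of the hunk does this. */
	if (swp_uffd_wp)
		pte = model_mkuffd_wp(pte);
	return pte;
}

/*
 * Variant suggested in review: only the uffd-wp case is write-protected,
 * so a read entry without uffd-wp keeps whatever the new pte had.
 */
static inline model_pte_t restore_suggested(model_pte_t pte, bool writable_entry,
					    bool swp_uffd_wp, bool vma_write)
{
	if (writable_entry)
		pte = model_maybe_mkwrite(pte, vma_write);
	if (swp_uffd_wp) {
		pte = model_wrprotect(pte);
		pte = model_mkuffd_wp(pte);
	}
	return pte;
}
```

In this model the two variants agree whenever the swap pte carried uffd-wp.
They differ only for a read migration entry without uffd-wp whose new pte
already has the write bit set (the writable shmem case): restore_v2() clears
the bit, restore_suggested() leaves it set. That is the case the question
about "always safe to set the write bit" is about.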