Date: Thu, 1 Dec 2022 16:42:52 +0100
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte
To: Peter Xu, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport, Nadav Amit, Andrea Arcangeli, Ives van Hoorne, Axel Rasmussen, Alistair Popple, stable@vger.kernel.org
References: <20221114000447.1681003-1-peterx@redhat.com> <20221114000447.1681003-2-peterx@redhat.com> <5ddf1310-b49f-6e66-a22a-6de361602558@redhat.com> <20221130142425.6a7fdfa3e5954f3c305a77ee@linux-foundation.org>

On 01.12.22 16:28, Peter Xu wrote:
> Hi, Andrew,
>
> On Wed, Nov 30, 2022 at 02:24:25PM -0800, Andrew Morton wrote:
>> On Tue, 15 Nov 2022 19:17:43 +0100 David Hildenbrand wrote:
>>
>>> On 14.11.22 01:04, Peter Xu wrote:
>>>> Ives van Hoorne from codesandbox.io reported an issue regarding possible
>>>> data loss of uffd-wp when applied to memfds on heavily loaded systems. The
>>>> symptom is that some pages read in the snapshot child VMs show data
>>>> mismatches.
>>>>
>>>> Here I can also reproduce this with a Rust reproducer provided by Ives
>>>> that keeps taking snapshots of a 256MB VM; on a 32G system, when I start
>>>> 80 instances I can trigger the issue within ten minutes.
>>>>
>>>> It turns out that some pages get written through even if uffd-wp is
>>>> applied to the pte.
>>>>
>>>> The problem is that, when removing migration entries, we didn't really
>>>> worry about the write bit as long as we knew it wasn't a write migration
>>>> entry. That may not be true: for some memory types (e.g. writable shmem)
>>>> mk_pte can return a pte with the write bit set, so to recover the
>>>> migration entry to its original state we need to explicitly wr-protect
>>>> the pte, or it'll have the write bit set even if it was a read migration
>>>> entry. For uffd this means writes can go through even though the pte is
>>>> marked write-protected.
>>>>
>>>> The relevant uffd code was introduced with the anon support, in commit
>>>> f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
>>>> 2020-04-07). However, anon shouldn't suffer from this problem because
>>>> anon always has the write bit cleared already, so that may not be a
>>>> proper Fixes target; instead I'm pointing Fixes at the uffd shmem support.
>>>>
>>>
>>> ...
>>>
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
>>>>  			pte = pte_mkdirty(pte);
>>>>  		if (is_writable_migration_entry(entry))
>>>>  			pte = maybe_mkwrite(pte, vma);
>>>> -		else if (pte_swp_uffd_wp(*pvmw.pte))
>>>> +		else
>>>> +			/* NOTE: mk_pte can have write bit set */
>>>> +			pte = pte_wrprotect(pte);
>>>> +
>>>> +		if (pte_swp_uffd_wp(*pvmw.pte)) {
>>>> +			WARN_ON_ONCE(pte_write(pte));
>>
>> Will this warning trigger in the scenario you and Ives have discovered?
>
> Without the above newly added wr-protect, yes. This is the case where we
> found the write bit set even though the uffd-wp bit is also set, which
> allows the write to go through even though the page is marked protected.
>
>>
>>>>  			pte = pte_mkuffd_wp(pte);
>>>> +		}
>>>>
>>>>  		if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
>>>>  			rmap_flags |= RMAP_EXCLUSIVE;
>>>
>>> As raised, I don't agree to this generic non-uffd-wp change without
>>> further, clear justification.
>>
>> Peter, can you please work this further?
>
> I didn't reply here because I have already replied to this question on a
> previous version with a few attempts. Quoting myself:
>
> https://lore.kernel.org/all/Y3KgYeMTdTM0FN5W@x1n/
>
>          The thing is that recovering the pte into its original form is the
>          safest approach to me, so I think we need justification on why it's
>          always safe to set the write bit.
>
> I've also got another, longer email trying to explain why I think it's the
> other way round that needs to be justified, rather than justifying removal
> of the write bit for a read migration entry, here:
>

And I disagree for this patch, which is supposed to fix this hunk:

@@ -243,11 +243,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 	entry = pte_to_swp_entry(*pvmw.pte);
 	if (is_write_migration_entry(entry))
 		pte = maybe_mkwrite(pte, vma);
+	else if (pte_swp_uffd_wp(*pvmw.pte))
+		pte = pte_mkuffd_wp(pte);
 
 	if (unlikely(is_zone_device_page(new))) {
 		if (is_device_private_page(new)) {
 			entry = make_device_private_entry(new, pte_write(pte));
 			pte = swp_entry_to_pte(entry);
+			if (pte_swp_uffd_wp(*pvmw.pte))
+				pte = pte_mkuffd_wp(pte);
 		}
 	}

There is really nothing to justify the other way around here.
If it's broken, fix it independently and properly backport it independently.
But we don't know about any such broken case. I have no energy to spare to
argue further ;)

-- 
Thanks,

David / dhildenb
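
[Editorial sketch: the pte-recovery step under discussion, reduced to the
write-bit / uffd-wp handling only. This is a simplified illustration, not the
literal mm/migrate.c code; names mirror the hunks quoted above, and locking,
rmap and zone-device handling are omitted.]

	/*
	 * Simplified sketch of remove_migration_pte()'s pte reconstruction.
	 */
	pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
	/*
	 * On a writable shared (shmem) mapping, vm_page_prot can already
	 * include the write bit, so "pte" may be writable here even though
	 * the migration entry being restored was a read (non-writable) one.
	 */

	if (is_writable_migration_entry(entry))
		pte = maybe_mkwrite(pte, vma);	/* was writable before migration */
	else
		pte = pte_wrprotect(pte);	/* proposed fix: drop mk_pte()'s write bit */

	if (pte_swp_uffd_wp(*pvmw.pte)) {
		WARN_ON_ONCE(pte_write(pte));	/* a uffd-wp pte must not be writable */
		pte = pte_mkuffd_wp(pte);
	}
	/*
	 * Without the explicit pte_wrprotect(), a read migration entry can be
	 * restored as a writable, uffd-wp-marked pte, letting writes bypass
	 * the userfaultfd write-protection (the data loss Ives reported).
	 */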