From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <53e52007-e556-332d-ec4d-5fe48a90e9b0@redhat.com>
Date: Wed, 7 Dec 2022 21:10:41 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.5.0
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Ives van Hoorne <ives@codesandbox.io>, stable@vger.kernel.org,
 Andrew Morton <akpm@linux-foundation.org>, Hugh Dickins <hugh@veritas.com>,
 Alistair Popple <apopple@nvidia.com>, Mike Rapoport
 <rppt@linux.vnet.ibm.com>, Nadav Amit <nadav.amit@gmail.com>,
 Andrea Arcangeli <aarcange@redhat.com>
References: <20221202122748.113774-1-david@redhat.com> <Y4oo6cN1a4Yz5prh@x1n>
 <690afe0f-c9a0-9631-b365-d11d98fdf56f@redhat.com>
 <19800718-9cb6-9355-da1c-c7961b01e922@redhat.com> <Y45duzmGGUT0+u8t@x1n>
 <92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com> <Y4+zw4JU7JMlDHbM@x1n>
 <5a626d30-ccc9-6be3-29f7-78f83afbe5c4@redhat.com> <Y5C4Zu9sDvZ7KiCk@x1n>
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH RFC] mm/userfaultfd: enable writenotify while
 userfaultfd-wp is enabled for a VMA
In-Reply-To: <Y5C4Zu9sDvZ7KiCk@x1n>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

>> For example, libvhost-user.c in QEMU uses the following for ordinary postcopy:
>>
>>          /*
>>           * In postcopy we're using PROT_NONE here to catch anyone
>>           * accessing it before we userfault.
>>           */
>>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
>>                           PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>>                           vmsg->fds[0], 0);
> 
> I assume this is for missing mode only.  More on wr-protect mode below.
> 
> Personally, I don't immediately see whether this is needed.  If the
> process itself is trusted, then it should be in control of anyone who
> will be accessing the pages.  If the other threads are not trusted, then
> there's no way to stop anyone from calling mprotect(RW) after
> mprotect(NONE) anyway.

I think there is a difference between code that can read/write memory 
(e.g., rings/buffers in libvhost-user.c, where I think this was added to 
detect such early access) and code that can call mprotect() arbitrarily 
to deliberately break the system. I think that's the whole reason 
libvhost-user.c went in that direction.
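To illustrate the distinction, here is a minimal C sketch of the PROT_NONE pattern, assuming a shared anonymous mapping stands in for the fd that libvhost-user receives from QEMU. reserve_region() and open_region() are hypothetical helpers, not libvhost-user API: any plain read or write between the two calls faults, while only code that explicitly calls mprotect() can lift the protection early.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve the region with PROT_NONE so that any
 * premature read/write (e.g., of a ring or buffer) raises SIGSEGV instead
 * of silently touching memory that postcopy has not populated yet.
 * MAP_ANONYMOUS is an assumption standing in for the fd from QEMU.
 */
static void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: once the range is registered with userfaultfd,
 * make it accessible. Only code that explicitly calls mprotect() can do
 * this early; ordinary memory accesses cannot.
 */
static int open_region(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```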

> 
> So I may not really get the gist of it.
> 
> Another way to make sure no one accesses it: right after receiving the
> memory range from QEMU (VhostUserMemoryRegion), if VuDev.postcopy_listening
> is set, register the range with UFFD in missing mode immediately.  After
> all, if postcopy_listening is set, it means we already passed the advise
> phase (VHOST_USER_POSTCOPY_ADVISE).  Any potential access will then block
> until QEMU starts reading from that uffd.
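The suggestion above can be sketched roughly as follows, assuming an anonymous mapping stands in for the region QEMU hands over; map_and_register_missing() is a hypothetical helper, not libvhost-user code. The range is mapped read/write and immediately registered in MISSING mode, so no PROT_NONE window is needed. Creating a userfaultfd can fail (e.g., with EPERM when unprivileged userfaultfd is disabled), so the helper reports -errno instead of aborting.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: map @len bytes read/write and register them with a
 * new userfaultfd in MISSING mode right away. On success, returns the uffd
 * and stores the mapping in *out; on failure, returns -errno and leaves
 * nothing mapped. Any access to a missing page then blocks until someone
 * resolves the fault via the uffd (e.g., with UFFDIO_COPY).
 */
static int map_and_register_missing(size_t len, void **out)
{
    *out = NULL;
#ifdef SYS_userfaultfd
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) {
        int err = errno;
        munmap(p, len);
        return -err;
    }

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)p, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_API, &api) < 0 ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
        int err = errno;
        close(uffd);
        munmap(p, len);
        return -err;
    }
    *out = p;
    return uffd;
#else
    (void)len;
    return -ENOSYS;
#endif
}
```

Note that nothing in this sketch touches the registered pages: with no thread reading the uffd, such an access would block indefinitely.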
> 
>>
>> I'd imagine that, when using uffd-wp (VM snapshotting with shmem?), one
>> might use PROT_READ instead until the write protection is properly set,
>> because read access would be fine in the meantime.
> 
> It'll be different for wr-protect IIUC, because unlike missing protections,
> we don't worry about writes happening before UFFDIO_WRITEPROTECT.
> 
> IMHO the only thing the vhost-user process needs to do is issue one
> UFFDIO_WRITEPROTECT for each of the ranges when QEMU tells it to; then
> it'll be fine.  Pre-writes are fine.
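A minimal sketch of that flow, assuming private anonymous memory (uffd-wp support for shmem arrived later than for anonymous memory, so older kernels may refuse to register shmem in WP mode). prewrite_then_wp() is a hypothetical helper, not vhost-user code. Data written before the single UFFDIO_WRITEPROTECT call is simply part of the state being protected; only writes issued after it raise write-protect faults that are delivered through the uffd.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a fresh mapping (pre-writes are fine), then
 * register it for uffd-wp and write-protect the whole range with a single
 * UFFDIO_WRITEPROTECT. Returns 0 on success or -errno if userfaultfd or
 * uffd-wp is unavailable here. Cleans up before returning.
 */
static int prewrite_then_wp(size_t len)
{
#if defined(SYS_userfaultfd) && defined(UFFDIO_REGISTER_MODE_WP)
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;
    memset(p, 0xaa, len);  /* pre-writes: part of the state to protect */

    int ret;
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) {
        ret = -errno;
    } else {
        struct uffdio_api api = { .api = UFFD_API, .features = 0 };
        struct uffdio_register reg = {
            .range = { .start = (uintptr_t)p, .len = len },
            .mode = UFFDIO_REGISTER_MODE_WP,
        };
        struct uffdio_writeprotect wp = {
            .range = { .start = (uintptr_t)p, .len = len },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        if (ioctl(uffd, UFFDIO_API, &api) < 0 ||
            ioctl(uffd, UFFDIO_REGISTER, &reg) < 0 ||
            ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0)
            ret = -errno;
        else  /* reads of write-protected memory still work */
            ret = ((unsigned char)p[0] == 0xaa) ? 0 : -EIO;
        close(uffd);
    }
    munmap(p, len);
    return ret;
#else
    (void)len;
    return -ENOSYS;
#endif
}
```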
> 
> Sorry, I probably went a bit off-topic.  I just want to make sure I don't
> miss any real use case where mprotect() is useful on a uffd-wp registered
> range, because that used to be a grey area for me.
> 
>>
>> But I'm just pulling use cases out of my magic hat ;) Nothing stops user
>> space from doing things that are not clearly forbidden (well, even then
>> users might complain, but that's a different story).
> 
> Yes, I think those are always fine, but the user just cannot assume it'll
> work the way they expect it to.
> 
> If "doing things that are not clearly forbidden" triggers a host warning or
> crash, that's a bug.  OTOH, if the outcome is limited to the process itself,
> then from the kernel's point of view I think we're good.  I even used to
> think about forbidding mprotect() on uffd-wp, but I'm not sure whether
> that's a good idea either.
> 
> Let's see whether I missed something above; if so, I'll rethink.

Let's not get distracted too much. As a reminder, I wrote that test case 
to showcase that other kernel code behaves just like the migration code 
does. It was the low-hanging fruit to make a point; I'm happy to 
exclude it for now.


Now, my 2 cents on the whole topic regarding "supported", "not 
supported" etc:

(1) If something is not supported, we should bail out or at least warn
     the user. I'm pretty sure there are other uffd-wp dummy users like
     me. Skimming over the userfaultfd man page, I found nothing in
     particular regarding PROT_WRITE, mprotect(), ... maybe I looked at
     the wrong page.
(2) If something is easy to support, support it instead of having all
     these surprises for users and having to document them and warn the
     user. It makes all these discussions regarding what's supported,
     what's a valid use case, etc. much easier.
(3) Inconsistency confuses users. If something used to work for anon,
     then in an ideal world we make shmem behave in a similar,
     non-surprising way.
(4) There are much smarter people than me with much more advanced
     magical hats. I'm pretty sure they will come up with use cases that
     I am not even able to anticipate right now.
(5) Users will make every imaginable mistake and then point out that
     nothing in the docs speaks against it and that the system didn't
     bail out.
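Until the kernel bails out on its own, a careful user can at least probe what the running kernel advertises before relying on uffd-wp for shmem. A minimal sketch, assuming the UFFD_FEATURE_WP_HUGETLBFS_SHMEM bit from <linux/userfaultfd.h> is the relevant signal (it only says shmem/hugetlbfs ranges can be write-protected, not how mprotect() interacts with them); uffd_has_feature() and check_shmem_wp() are hypothetical helpers. With features set to 0, UFFDIO_API reports the supported feature set back in api.features.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the kernel advertises @feature via
 * UFFDIO_API, 0 if it does not, or -errno if userfaultfd is unavailable.
 * Requesting no features (0) makes the kernel report all it supports.
 */
static int uffd_has_feature(uint64_t feature)
{
#ifdef SYS_userfaultfd
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd < 0)
        return -errno;

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int ret;
    if (ioctl(uffd, UFFDIO_API, &api) < 0)
        ret = -errno;
    else
        ret = (api.features & feature) == feature;
    close(uffd);
    return ret;
#else
    (void)feature;
    return -ENOSYS;
#endif
}

/* Warn (the caller may bail out) instead of silently assuming support. */
static int check_shmem_wp(void)
{
#ifdef UFFD_FEATURE_WP_HUGETLBFS_SHMEM
    int ret = uffd_has_feature(UFFD_FEATURE_WP_HUGETLBFS_SHMEM);
#else
    int ret = 0;  /* headers predate the bit: cannot check, assume no */
#endif
    if (ret <= 0)
        fprintf(stderr, "uffd-wp on shmem not available (%d)\n", ret);
    return ret;
}
```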

Again, just my 2 cents. Maybe the dos and don'ts of userfaultfd-wp are 
properly documented already and we just don't bail out.

-- 
Thanks,

David / dhildenb