From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH 08/14] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing
Date: Mon, 27 Apr 2026 12:45:56 +0100
Message-ID: <20260427114607.4068647-9-kas@kernel.org>
X-Mailer: git-send-email 2.51.2
In-Reply-To: <20260427114607.4068647-1-kas@kernel.org>
References: <20260427114607.4068647-1-kas@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add the userspace interface for read-write protection
tracking:

  - UFFDIO_REGISTER_MODE_RWP   register a range for RWP tracking
  - UFFD_FEATURE_RWP           capability bit
  - UFFDIO_RWPROTECT           install / remove RWP on a range

Registration sets VM_UFFD_RWP on the VMA. Combining MODE_WP with
MODE_RWP is rejected because both modes claim the uffd PTE bit.

UFFDIO_RWPROTECT is the bidirectional counterpart of
UFFDIO_WRITEPROTECT:

  - MODE_RWP    change_protection() with MM_CP_UFFD_RWP installs
                PAGE_NONE and sets the uffd bit on present PTEs
  - !MODE_RWP   change_protection() with MM_CP_UFFD_RWP_RESOLVE
                restores vma->vm_page_prot and clears the bit

userfaultfd_clear_vma() runs the same resolve pass on unregister so RWP
state cannot outlive the uffd.

Re-registering a range must not drop a mode that installs per-PTE
markers (WP or RWP); doing so returns -EBUSY. This also closes a
pre-existing window where re-registering without MODE_WP would strand
uffd-wp markers: before, those caused extra write-faults but were
otherwise benign; with RWP preservation in place, a subsequent
mprotect() on a VM_UFFD_RWP VMA would silently promote the stale
markers to RWP.

The feature is not yet advertised. UFFDIO_REGISTER_MODE_RWP,
UFFD_FEATURE_RWP, and _UFFDIO_RWPROTECT are intentionally absent from
UFFD_API_REGISTER_MODES, UFFD_API_FEATURES, and UFFD_API_RANGE_IOCTLS,
so UFFDIO_API masks them out and the register-mode validator rejects
the bit. The follow-up patch adds fault dispatch and exposes the UAPI.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
---
 Documentation/admin-guide/mm/userfaultfd.rst | 10 +++
 fs/userfaultfd.c                             | 84 +++++++++++++++++++
 include/linux/userfaultfd_k.h                |  2 +
 include/uapi/linux/userfaultfd.h             | 19 +++++
 mm/userfaultfd.c                             | 88 +++++++++++++++++++-
 5 files changed, 200 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..1e533639fd50 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -131,6 +131,16 @@ userfaults on the range registered. Not all ioctls will necessarily be
 supported for all memory types (e.g. anonymous memory vs. shmem vs.
 hugetlbfs), or all types of intercepted faults.
 
+.. note::
+
+   Re-registering an already-registered range must not drop any of the
+   modes that install per-PTE markers — currently
+   ``UFFDIO_REGISTER_MODE_WP`` and ``UFFDIO_REGISTER_MODE_RWP``. Doing
+   so would strand markers with no flag to describe them, so the call
+   is rejected with ``-EBUSY``; userspace must issue
+   ``UFFDIO_UNREGISTER`` first. This differs from older kernels, which
+   silently replaced the mode bits on re-registration.
+
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
 memory from the ``userfaultfd`` registered range). This means a userfault
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0fdf28f62702..f2097c558165 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -215,6 +215,8 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason & VM_UFFD_WP)
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (reason & VM_UFFD_RWP)
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_RWP;
 	if (reason & VM_UFFD_MINOR)
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
 	if (features & UFFD_FEATURE_THREAD_ID)
@@ -1292,6 +1294,22 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vm_flags |= VM_UFFD_WP;
 	}
 
+	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP) {
+		if (!pgtable_supports_uffd() || VM_UFFD_RWP == VM_NONE)
+			goto out;
+		if (!(ctx->features & UFFD_FEATURE_RWP))
+			goto out;
+		vm_flags |= VM_UFFD_RWP;
+	}
+
+	/*
+	 * WP and RWP share the uffd PTE bit and
+	 * cannot coexist in the same VMA — the bit would carry ambiguous
+	 * semantics. Reject the combination up front.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP))
+		goto out;
+
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		goto out;
@@ -1385,6 +1403,16 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		    cur->vm_userfaultfd_ctx.ctx != ctx)
 			goto out_unlock;
 
+		/*
+		 * Mode switches that drop VM_UFFD_WP or VM_UFFD_RWP would
+		 * leave PTE markers without the flag that describes them;
+		 * subsequent mprotect() would then promote stale markers
+		 * into the other mode. Require an unregister first.
+		 */
+		if (cur->vm_userfaultfd_ctx.ctx == ctx &&
+		    cur->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags)
+			goto out_unlock;
+
 		/*
 		 * Note vmas containing huge pages
 		 */
@@ -1418,6 +1446,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR))
 			ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
 
+		/* RWPROTECT is only supported for RWP ranges */
+		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP))
+			ioctls_out &= ~((__u64)1 << _UFFDIO_RWPROTECT);
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
@@ -1765,6 +1797,55 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx,
+				 unsigned long arg)
+{
+	int ret;
+	struct uffdio_rwprotect uffdio_rwp;
+	struct userfaultfd_wake_range range;
+	bool mode_rwp, mode_dontwake;
+
+	if (atomic_read(&ctx->mmap_changing))
+		return -EAGAIN;
+
+	if (copy_from_user(&uffdio_rwp, (void __user *)arg,
+			   sizeof(uffdio_rwp)))
+		return -EFAULT;
+
+	ret = validate_range(ctx->mm, uffdio_rwp.range.start,
+			     uffdio_rwp.range.len);
+	if (ret)
+		return ret;
+
+	if (uffdio_rwp.mode & ~(UFFDIO_RWPROTECT_MODE_DONTWAKE |
+				UFFDIO_RWPROTECT_MODE_RWP))
+		return -EINVAL;
+
+	mode_rwp = uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_RWP;
+	mode_dontwake = uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_DONTWAKE;
+
+	if (mode_rwp && mode_dontwake)
+		return -EINVAL;
+
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mrwprotect_range(ctx, uffdio_rwp.range.start,
+				       uffdio_rwp.range.len, mode_rwp);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
+	if (ret)
+		return ret;
+
+	if (!mode_rwp && !mode_dontwake) {
+		range.start = uffdio_rwp.range.start;
+		range.len = uffdio_rwp.range.len;
+		wake_userfault(ctx, &range);
+	}
+	return ret;
+}
+
 static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 {
 	__s64 ret;
@@ -2071,6 +2152,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_POISON:
 		ret = userfaultfd_poison(ctx, arg);
 		break;
+	case UFFDIO_RWPROTECT:
+		ret = userfaultfd_rwprotect(ctx, arg);
+		break;
 	}
 	return ret;
 }
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 3725e61a7041..3dfcdc3a9b98 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -162,6 +162,8 @@ extern int mwriteprotect_range(struct userfaultfd_ctx *ctx, unsigned long start,
 			       unsigned long len, bool enable_wp);
 extern long uffd_wp_range(struct vm_area_struct *vma,
 			  unsigned long start, unsigned long len, bool enable_wp);
+extern int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start,
+			    unsigned long len, bool enable_rwp);
 
 /* move_pages */
 void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2841e4ea8f2c..7b78aa3b5318 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -79,6 +79,7 @@
 #define _UFFDIO_WRITEPROTECT		(0x06)
 #define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_POISON			(0x08)
+#define _UFFDIO_RWPROTECT		(0x09)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -103,6 +104,8 @@
 				      struct uffdio_continue)
 #define UFFDIO_POISON		_IOWR(UFFDIO, _UFFDIO_POISON, \
 				      struct uffdio_poison)
+#define UFFDIO_RWPROTECT	_IOWR(UFFDIO, _UFFDIO_RWPROTECT, \
+				      struct uffdio_rwprotect)
 
 /* read() structure */
 struct uffd_msg {
@@ -158,6 +161,7 @@ struct uffd_msg {
 #define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* If this was a write fault */
 #define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* If reason is VM_UFFD_WP */
 #define UFFD_PAGEFAULT_FLAG_MINOR	(1<<2)	/* If reason is VM_UFFD_MINOR */
+#define UFFD_PAGEFAULT_FLAG_RWP		(1<<3)	/* If reason is VM_UFFD_RWP */
 
 struct uffdio_api {
 	/* userland asks for an API number and the features to enable */
@@ -230,6 +234,11 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MOVE indicates that the kernel supports moving an
 	 * existing page contents from userspace.
+	 *
+	 * UFFD_FEATURE_RWP indicates that the kernel supports
+	 * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking.
+	 * Pages are made inaccessible via UFFDIO_RWPROTECT and faults
+	 * are delivered when the pages are re-accessed.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -248,6 +257,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_POISON			(1<<14)
 #define UFFD_FEATURE_WP_ASYNC			(1<<15)
 #define UFFD_FEATURE_MOVE			(1<<16)
+#define UFFD_FEATURE_RWP			(1<<17)
 	__u64 features;
 
 	__u64 ioctls;
@@ -263,6 +273,7 @@ struct uffdio_register {
 #define UFFDIO_REGISTER_MODE_MISSING	((__u64)1<<0)
 #define UFFDIO_REGISTER_MODE_WP		((__u64)1<<1)
 #define UFFDIO_REGISTER_MODE_MINOR	((__u64)1<<2)
+#define UFFDIO_REGISTER_MODE_RWP	((__u64)1<<3)
 	__u64 mode;
 
 	/*
@@ -356,6 +367,14 @@ struct uffdio_poison {
 	__s64 updated;
 };
 
+struct uffdio_rwprotect {
+	struct uffdio_range range;
+	/* !RWP means undo RWP-protection */
+#define UFFDIO_RWPROTECT_MODE_RWP		((__u64)1<<0)
+#define UFFDIO_RWPROTECT_MODE_DONTWAKE		((__u64)1<<1)
+	__u64 mode;
+};
+
 struct uffdio_move {
 	__u64 dst;
 	__u64 src;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d4a1d340dab3..facc2048bf07 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1072,6 +1072,67 @@ int mwriteprotect_range(struct userfaultfd_ctx *ctx, unsigned long start,
 	return err;
 }
 
+int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start,
+		     unsigned long len, bool enable_rwp)
+{
+	struct mm_struct *dst_mm = ctx->mm;
+	unsigned long end = start + len;
+	struct vm_area_struct *dst_vma;
+	unsigned int mm_cp_flags;
+	struct mmu_gather tlb;
+	long err;
+	VMA_ITERATOR(vmi, dst_mm, start);
+
+	VM_WARN_ON_ONCE(start & ~PAGE_MASK);
+	VM_WARN_ON_ONCE(len & ~PAGE_MASK);
+	VM_WARN_ON_ONCE(start + len <= start);
+
+	guard(mmap_read_lock)(dst_mm);
+	guard(rwsem_read)(&ctx->map_changing_lock);
+
+	if (atomic_read(&ctx->mmap_changing))
+		return -EAGAIN;
+
+	if (enable_rwp)
+		mm_cp_flags = MM_CP_UFFD_RWP;
+	else
+		mm_cp_flags = MM_CP_UFFD_RWP_RESOLVE | MM_CP_TRY_CHANGE_WRITABLE;
+
+	/*
+	 * Pre-scan the range: validate every spanned VMA before applying
+	 * any change_protection() so a partial failure cannot leave the
+	 * process with only a prefix of the range re-protected.
+	 */
+	err = -ENOENT;
+	for_each_vma_range(vmi, dst_vma, end) {
+		if (!userfaultfd_rwp(dst_vma))
+			return -ENOENT;
+
+		if (is_vm_hugetlb_page(dst_vma)) {
+			unsigned long page_mask;
+
+			page_mask = vma_kernel_pagesize(dst_vma) - 1;
+			if ((start & page_mask) || (len & page_mask))
+				return -EINVAL;
+		}
+		err = 0;
+	}
+	if (err)
+		return err;
+
+	vma_iter_set(&vmi, start);
+	tlb_gather_mmu(&tlb, dst_mm);
+	for_each_vma_range(vmi, dst_vma, end) {
+		unsigned long vma_start = max(dst_vma->vm_start, start);
+		unsigned long vma_end = min(dst_vma->vm_end, end);
+
+		change_protection(&tlb, dst_vma, vma_start, vma_end,
+				  mm_cp_flags);
+	}
+	tlb_finish_mmu(&tlb);
+
+	return 0;
+}
 
 void double_pt_lock(spinlock_t *ptl1,
 		    spinlock_t *ptl2)
@@ -2109,9 +2170,22 @@ struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi,
 	if (start == vma->vm_start && end == vma->vm_end)
 		give_up_on_oom = true;
 
-	/* Reset ptes for the whole vma range if wr-protected */
-	if (userfaultfd_wp(vma))
-		uffd_wp_range(vma, start, end - start, false);
+	/* Clear the uffd bit and/or restore protnone PTEs */
+	if (userfaultfd_protected(vma)) {
+		unsigned int mm_cp_flags = 0;
+		struct mmu_gather tlb;
+
+		if (userfaultfd_wp(vma))
+			mm_cp_flags |= MM_CP_UFFD_WP_RESOLVE;
+		if (userfaultfd_rwp(vma))
+			mm_cp_flags |= MM_CP_UFFD_RWP_RESOLVE;
+		if (vma_wants_manual_pte_write_upgrade(vma))
+			mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
+
+		tlb_gather_mmu(&tlb, vma->vm_mm);
+		change_protection(&tlb, vma, start, end, mm_cp_flags);
+		tlb_finish_mmu(&tlb);
+	}
 
 	ret = vma_modify_flags_uffd(vmi, prev, vma, start, end,
 				    &new_vma_flags, NULL_VM_UFFD_CTX,
@@ -2160,6 +2234,14 @@ int userfaultfd_register_range(struct userfaultfd_ctx *ctx,
 		    vma_test_all_mask(vma, vma_flags))
 			goto skip;
 
+		/*
+		 * Pre-scan in userfaultfd_register() already rejected mode
+		 * switches that would drop VM_UFFD_WP or VM_UFFD_RWP, so a
+		 * stray bit here is a bug.
+		 */
+		VM_WARN_ON_ONCE(vma->vm_userfaultfd_ctx.ctx == ctx &&
+				vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags);
+
 		if (vma->vm_start > start)
 			start = vma->vm_start;
 		vma_end = min(end, vma->vm_end);
-- 
2.51.2
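
For readers who want to reason about the new argument checks without
building a kernel, the sketch below models two of them in plain
userspace C: the mode validation in userfaultfd_rwprotect() and the
re-registration test that leads to -EBUSY. It is an illustration only,
not code from this series. The UFFDIO_RWPROTECT_MODE_* values are copied
from the uapi hunk above; the MODEL_VM_UFFD_* bits and both rwp_model_*()
helpers are hypothetical names made up for this sketch, since the real
VM_UFFD_* values are kernel-internal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mode bits as defined for struct uffdio_rwprotect in this patch. */
#define UFFDIO_RWPROTECT_MODE_RWP	((uint64_t)1 << 0)
#define UFFDIO_RWPROTECT_MODE_DONTWAKE	((uint64_t)1 << 1)

static_assert(UFFDIO_RWPROTECT_MODE_RWP != UFFDIO_RWPROTECT_MODE_DONTWAKE,
	      "mode bits must be distinct");

/* Hypothetical stand-ins for the kernel-internal VM_UFFD_* flags. */
#define MODEL_VM_UFFD_WP	(1UL << 0)
#define MODEL_VM_UFFD_RWP	(1UL << 1)

/*
 * Model of the mode checks in userfaultfd_rwprotect(): returns 0 for
 * an accepted mode and -EINVAL otherwise. On success, *wake reports
 * whether the call would go on to wake waiters, which the patch does
 * only when resolving (no RWP bit) without DONTWAKE.
 */
static int rwp_model_check_mode(uint64_t mode, bool *wake)
{
	bool mode_rwp, mode_dontwake;

	*wake = false;

	/* Unknown bits are rejected. */
	if (mode & ~(UFFDIO_RWPROTECT_MODE_DONTWAKE |
		     UFFDIO_RWPROTECT_MODE_RWP))
		return -EINVAL;

	mode_rwp = mode & UFFDIO_RWPROTECT_MODE_RWP;
	mode_dontwake = mode & UFFDIO_RWPROTECT_MODE_DONTWAKE;

	/* The patch rejects RWP combined with DONTWAKE. */
	if (mode_rwp && mode_dontwake)
		return -EINVAL;

	*wake = !mode_rwp && !mode_dontwake;
	return 0;
}

/*
 * Model of the re-registration pre-scan: true when the requested flags
 * would drop a marker-installing mode (WP or RWP) that the VMA already
 * carries, which is the case the patch refuses.
 */
static bool rwp_model_drops_marker_mode(unsigned long cur_flags,
					unsigned long new_flags)
{
	return cur_flags & (MODEL_VM_UFFD_WP | MODEL_VM_UFFD_RWP) & ~new_flags;
}
```

As a usage note: rwp_model_check_mode(0, &wake) models a plain resolve
and sets wake to true, while passing both mode bits yields -EINVAL.
rwp_model_drops_marker_mode(MODEL_VM_UFFD_WP, MODEL_VM_UFFD_RWP) is true
because the WP bit would be lost, matching the "unregister first" rule
described in the commit message.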