From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Peter Xu <peterx@redhat.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Nadav Amit <nadav.amit@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.6 49/67] mm/page_table_check: support userfault wr-protect entries
Date: Thu, 15 Aug 2024 15:26:03 +0200
Message-ID: <20240815131840.196517595@linuxfoundation.org>
In-Reply-To: <20240815131838.311442229@linuxfoundation.org>
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Xu <peterx@redhat.com>
[ Upstream commit 8430557fc584657559bfbd5150b6ae1bb90f35a0 ]
Allow the page_table_check hooks to check userfaultfd wr-protect criteria
upon pgtable updates. The rule is that no writable flag may coexist with
the userfault wr-protect flag.
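The rule can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions: struct model_entry and
model_entry_violates_uffd_wp() are hypothetical names invented for
illustration, and the boolean fields stand in for the real pte helpers
(pte_present(), pte_write(), pte_uffd_wp() and the swap entry checks in
the diff below).

```c
#include <stdbool.h>

/*
 * Simplified, hypothetical model of a page table entry. The field names
 * are illustrative and do not match any kernel structure.
 */
struct model_entry {
	bool none;          /* entry is empty */
	bool present;       /* maps a page (like pte_present()) */
	bool write;         /* write permission (like pte_write()) */
	bool uffd_wp;       /* userfault wr-protect marker */
	bool swap_writable; /* non-present entry that caches "writable" */
};

/* Returns true when the entry breaks the rule described above. */
static bool model_entry_violates_uffd_wp(const struct model_entry *e)
{
	if (e->none)
		return false;
	if (e->present)
		return e->uffd_wp && e->write;
	/* Non-present, non-empty: a swap-style entry. */
	return e->uffd_wp && e->swap_writable;
}
```

In this model a wr-protected entry is only a violation when it also
carries write permission, either directly or cached in a swap-style
entry; every other combination passes.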
This should be better than c2da319c2e, where we only sanitized such issues
during a pgtable walk. When such an issue is hit there, we have little
chance of knowing where the writable bit came from [1]; the pgtable walk
exposes a kernel bug (which still helps with triage), but the bug is not
easy to track and debug.
Now we switch to tracking the source, which is much easier with the recent
introduction of page table check.
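To illustrate why checking at update time identifies the source, here is
a minimal userspace sketch. Everything in it is an assumption made for
illustration: the SRC_* bits, struct src_report and src_set_pte() are
hypothetical, and __FILE__/__LINE__ stand in for the backtrace that a
kernel warning would print.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag bits for illustration; not the x86 layout. */
#define SRC_PRESENT (1u << 0)
#define SRC_WRITE   (1u << 1)
#define SRC_UFFD_WP (1u << 2)

/* Remembers the most recent offending call site. */
struct src_report {
	const char *file;
	int line;
	bool fired;
};

static struct src_report src_last;

/* Check the new value before it is installed, then install it. */
static void src_set_pte_at(unsigned int *slot, unsigned int val,
			   const char *file, int line)
{
	const unsigned int bad = SRC_PRESENT | SRC_WRITE | SRC_UFFD_WP;

	assert(slot != NULL);
	if ((val & bad) == bad) {
		src_last = (struct src_report){ file, line, true };
		fprintf(stderr, "uffd-wp entry set writable at %s:%d\n",
			file, line);
	}
	*slot = val;
}

/* Wrapper that records where the update came from. */
#define src_set_pte(slot, val) \
	src_set_pte_at((slot), (val), __FILE__, __LINE__)
```

Because the check runs inside the setter, the report names the caller
that installed the bad entry, whereas a check during a later walk can
only say that a bad entry exists.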
There are some limitations with using the page table check here for
userfaultfd wr-protect purposes:
- It is only active when page table check is explicitly enabled through
its config options and/or boot parameters. That should be good enough to
track at least syzbot issues, as syzbot should enable
PAGE_TABLE_CHECK[_ENFORCED] for x86 [1]. We used to rely on DEBUG_VM, but
it is now off for most distros; distros normally do not enable
PAGE_TABLE_CHECK[_ENFORCED] either, so the coverage is similar.
- It works only conditionally with the ptep_modify_prot API. It is
bypassed when e.g. XEN PV is enabled, but it still works in most other
scenarios, which should be the common cases, so this should be good
enough.
- The hugetlb check is a bit hairy: the page table check cannot tell a
hugetlb pte from a normal pte when trapping at set_pte_at(), because in
the current design hugetlb maps every level to pte_t. For example, the
default set_huge_pte_at() can invoke set_pte_at() directly and lose the
hugetlb context, treating the entry the same as a normal pte_t. So far
that is fine because huge_pte_uffd_wp() always equals pte_uffd_wp()
wherever it is supported (x86 only). It will become a bigger problem once
_PAGE_UFFD_WP is defined differently at various pgtable levels, because
then a single per-arch huge_pte_uffd_wp() will stop making sense first.
For now we can leave this for later.
This patch also removes commit c2da319c2e altogether, as we have something
better now.
[1] https://lore.kernel.org/all/000000000000dce0530615c89210@google.com/
Link: https://lkml.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Documentation/mm/page_table_check.rst | 9 +++++++-
arch/x86/include/asm/pgtable.h | 18 +---------------
mm/page_table_check.c | 30 +++++++++++++++++++++++++++
3 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/Documentation/mm/page_table_check.rst b/Documentation/mm/page_table_check.rst
index c12838ce6b8de..c59f22eb6a0f9 100644
--- a/Documentation/mm/page_table_check.rst
+++ b/Documentation/mm/page_table_check.rst
@@ -14,7 +14,7 @@ Page table check performs extra verifications at the time when new pages become
accessible from the userspace by getting their page table entries (PTEs PMDs
etc.) added into the table.
-In case of detected corruption, the kernel is crashed. There is a small
+In case of most detected corruption, the kernel is crashed. There is a small
performance and memory overhead associated with the page table check. Therefore,
it is disabled by default, but can be optionally enabled on systems where the
extra hardening outweighs the performance costs. Also, because page table check
@@ -22,6 +22,13 @@ is synchronous, it can help with debugging double map memory corruption issues,
by crashing kernel at the time wrong mapping occurs instead of later which is
often the case with memory corruptions bugs.
+It can also be used to do page table entry checks over various flags, dump
+warnings when illegal combinations of entry flags are detected. Currently,
+userfaultfd is the only user of such to sanity check wr-protect bit against
+any writable flags. Illegal flag combinations will not directly cause data
+corruption in this case immediately, but that will cause read-only data to
+be writable, leading to corrupt when the page content is later modified.
+
Double mapping detection logic
==============================
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e02b179ec6598..d03fe4fb41f43 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -387,23 +387,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
static inline int pte_uffd_wp(pte_t pte)
{
- bool wp = pte_flags(pte) & _PAGE_UFFD_WP;
-
-#ifdef CONFIG_DEBUG_VM
- /*
- * Having write bit for wr-protect-marked present ptes is fatal,
- * because it means the uffd-wp bit will be ignored and write will
- * just go through.
- *
- * Use any chance of pgtable walking to verify this (e.g., when
- * page swapped out or being migrated for all purposes). It means
- * something is already wrong. Tell the admin even before the
- * process crashes. We also nail it with wrong pgtable setup.
- */
- WARN_ON_ONCE(wp && pte_write(pte));
-#endif
-
- return wp;
+ return pte_flags(pte) & _PAGE_UFFD_WP;
}
static inline pte_t pte_mkuffd_wp(pte_t pte)
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 6363f93a47c69..509c6ef8de400 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -7,6 +7,8 @@
#include <linux/kstrtox.h>
#include <linux/mm.h>
#include <linux/page_table_check.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
#undef pr_fmt
#define pr_fmt(fmt) "page_table_check: " fmt
@@ -191,6 +193,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
}
EXPORT_SYMBOL(__page_table_check_pud_clear);
+/* Whether the swap entry cached writable information */
+static inline bool swap_cached_writable(swp_entry_t entry)
+{
+ return is_writable_device_exclusive_entry(entry) ||
+ is_writable_device_private_entry(entry) ||
+ is_writable_migration_entry(entry);
+}
+
+static inline void page_table_check_pte_flags(pte_t pte)
+{
+ if (pte_present(pte) && pte_uffd_wp(pte))
+ WARN_ON_ONCE(pte_write(pte));
+ else if (is_swap_pte(pte) && pte_swp_uffd_wp(pte))
+ WARN_ON_ONCE(swap_cached_writable(pte_to_swp_entry(pte)));
+}
+
void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr)
{
@@ -199,6 +217,8 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
if (&init_mm == mm)
return;
+ page_table_check_pte_flags(pte);
+
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, ptep_get(ptep + i));
if (pte_user_accessible_page(pte))
@@ -206,11 +226,21 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
}
EXPORT_SYMBOL(__page_table_check_ptes_set);
+static inline void page_table_check_pmd_flags(pmd_t pmd)
+{
+ if (pmd_present(pmd) && pmd_uffd_wp(pmd))
+ WARN_ON_ONCE(pmd_write(pmd));
+ else if (is_swap_pmd(pmd) && pmd_swp_uffd_wp(pmd))
+ WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
+}
+
void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
{
if (&init_mm == mm)
return;
+ page_table_check_pmd_flags(pmd);
+
__page_table_check_pmd_clear(mm, *pmdp);
if (pmd_user_accessible_page(pmd)) {
page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
--
2.43.0