From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, xen-devel@lists.xenproject.org,
	linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
	David Hildenbrand, Andrew Morton, Juergen Gross,
	Stefano Stabellini, Oleksandr Tyshchenko, Dan Williams,
	Matthew Wilcox, Jan Kara, Alexander Viro, Christian Brauner,
	Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan,
	Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Jann Horn, Pedro Falcato, Hugh Dickins, Oscar Salvador,
	Lance Yang
Subject: [PATCH v2 6/9] mm/memory: convert print_bad_pte() to print_bad_page_map()
Date: Thu, 17 Jul 2025 13:52:09 +0200
Message-ID: <20250717115212.1825089-7-david@redhat.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20250717115212.1825089-1-david@redhat.com>
References: <20250717115212.1825089-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit

print_bad_pte() looks like something that should actually be a WARN
or similar, but historically it has apparently proven useful for
detecting page table corruption even on production systems: report the
issue and keep the system running, to make it easier to figure out
what is going wrong (e.g., multiple such messages might shed some
light).

As we want to unify vm_normal_page_*() handling for PTE/PMD/PUD, we'll
have to take care of print_bad_pte() as well.

Let's prepare for using print_bad_pte() also for non-PTEs by adjusting
the implementation and renaming the function; we'll rename it to what
we actually print: bad (page) mappings. Maybe it should be called
"print_bad_table_entry()"? We'll just call it "print_bad_page_map()",
because the assumption is that we are dealing with some (previously)
present page table entry that got corrupted in weird ways.

Whether it is a PTE or something else will usually become obvious from
the page table dump or from the dumped stack. If ever required in the
future, we could pass the page table level, similar to
"enum rmap_level". For now, let's keep it simple.

To make the function a bit more readable, factor out the ratelimit
check into is_bad_page_map_ratelimited() and move the dumping of page
table content into __dump_bad_page_map_pgtable(). We'll now dump the
entries of all levels on a single line, and stop the table walk as soon
as we hit something that is not a present page table entry.
Use print_bad_page_map() in vm_normal_page_pmd(), similar to how we do
it for vm_normal_page(), now that we have a function that can handle
it.

The report will now look something like this (dumping pgd to pmd
values):

[   77.943408] BUG: Bad page map in process XXX entry:80000001233f5867
[   77.944077] addr:00007fd84bb1c000 vm_flags:08100071 anon_vma: ...
[   77.945186] pgd:10a89f067 p4d:10a89f067 pud:10e5a2067 pmd:105327067

Not using pgdp_get(), because that does not work properly on some arm
configs where pgd_t is an array. Note that, for simplicity, we dump all
levels even when some of them are folded.
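To illustrate that asymmetry, here is the relevant excerpt of the new
walk, with editorial comments added based on the reasoning above (the
comments are not part of the patch itself):

	/*
	 * Top level: plain dereference; works even on the arm configs
	 * mentioned above where pgd_t is an array.
	 */
	pgdp = pgd_offset(mm, addr);
	pgdv = pgd_val(*pgdp);

	/* Lower levels: read the entry via the pXdp_get() helpers. */
	p4dp = p4d_offset(pgdp, addr);
	p4d = p4dp_get(p4dp);
	p4dv = p4d_val(p4d);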
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 120 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 94 insertions(+), 26 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 173eb6267e0ac..08d16ed7b4cc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -473,22 +473,8 @@ static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss)
 		add_mm_counter(mm, i, rss[i]);
 }
 
-/*
- * This function is called to print an error when a bad pte
- * is found. For example, we might have a PFN-mapped pte in
- * a region that doesn't allow it.
- *
- * The calling function must still handle the error.
- */
-static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
-			  pte_t pte, struct page *page)
+static bool is_bad_page_map_ratelimited(void)
 {
-	pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
-	p4d_t *p4d = p4d_offset(pgd, addr);
-	pud_t *pud = pud_offset(p4d, addr);
-	pmd_t *pmd = pmd_offset(pud, addr);
-	struct address_space *mapping;
-	pgoff_t index;
 	static unsigned long resume;
 	static unsigned long nr_shown;
 	static unsigned long nr_unshown;
@@ -500,7 +486,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 	if (nr_shown == 60) {
 		if (time_before(jiffies, resume)) {
 			nr_unshown++;
-			return;
+			return true;
 		}
 		if (nr_unshown) {
 			pr_alert("BUG: Bad page map: %lu messages suppressed\n",
@@ -511,15 +497,87 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 	}
 	if (nr_shown++ == 0)
 		resume = jiffies + 60 * HZ;
+	return false;
+}
+
+static void __dump_bad_page_map_pgtable(struct mm_struct *mm, unsigned long addr)
+{
+	unsigned long long pgdv, p4dv, pudv, pmdv;
+	p4d_t p4d, *p4dp;
+	pud_t pud, *pudp;
+	pmd_t pmd, *pmdp;
+	pgd_t *pgdp;
+
+	/*
+	 * This looks like a fully lockless walk; however, the caller is
+	 * expected to hold the leaf page table lock in addition to other
+	 * rmap/mm/vma locks. So this is just a re-walk to dump page table
+	 * content while any concurrent modifications should be completely
+	 * prevented.
+	 */
+	pgdp = pgd_offset(mm, addr);
+	pgdv = pgd_val(*pgdp);
+
+	if (!pgd_present(*pgdp) || pgd_leaf(*pgdp)) {
+		pr_alert("pgd:%08llx\n", pgdv);
+		return;
+	}
+
+	p4dp = p4d_offset(pgdp, addr);
+	p4d = p4dp_get(p4dp);
+	p4dv = p4d_val(p4d);
+
+	if (!p4d_present(p4d) || p4d_leaf(p4d)) {
+		pr_alert("pgd:%08llx p4d:%08llx\n", pgdv, p4dv);
+		return;
+	}
+
+	pudp = pud_offset(p4dp, addr);
+	pud = pudp_get(pudp);
+	pudv = pud_val(pud);
+
+	if (!pud_present(pud) || pud_leaf(pud)) {
+		pr_alert("pgd:%08llx p4d:%08llx pud:%08llx\n", pgdv, p4dv, pudv);
+		return;
+	}
+
+	pmdp = pmd_offset(pudp, addr);
+	pmd = pmdp_get(pmdp);
+	pmdv = pmd_val(pmd);
+
+	/*
+	 * Dumping the PTE would be nice, but it's tricky with CONFIG_HIGHPTE,
+	 * because the table should already be mapped by the caller and
+	 * doing another map would be bad. print_bad_page_map() should
+	 * already take care of printing the PTE.
+	 */
+	pr_alert("pgd:%08llx p4d:%08llx pud:%08llx pmd:%08llx\n", pgdv,
+		 p4dv, pudv, pmdv);
+}
+
+/*
+ * This function is called to print an error when a bad page table entry (e.g.,
+ * a corrupted page table entry) is found. For example, we might have a
+ * PFN-mapped pte in a region that doesn't allow it.
+ *
+ * The calling function must still handle the error.
+ */
+static void print_bad_page_map(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long long entry, struct page *page)
+{
+	struct address_space *mapping;
+	pgoff_t index;
+
+	if (is_bad_page_map_ratelimited())
+		return;
 
 	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
 	index = linear_page_index(vma, addr);
 
-	pr_alert("BUG: Bad page map in process %s pte:%08llx pmd:%08llx\n",
-		 current->comm,
-		 (long long)pte_val(pte), (long long)pmd_val(*pmd));
+	pr_alert("BUG: Bad page map in process %s entry:%08llx", current->comm, entry);
+	__dump_bad_page_map_pgtable(vma->vm_mm, addr);
 	if (page)
-		dump_page(page, "bad pte");
+		dump_page(page, "bad page map");
 	pr_alert("addr:%px vm_flags:%08lx anon_vma:%px mapping:%px index:%lx\n",
 		 (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
 	pr_alert("file:%pD fault:%ps mmap:%ps mmap_prepare: %ps read_folio:%ps\n",
@@ -597,7 +655,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		if (is_zero_pfn(pfn))
 			return NULL;
 
-		print_bad_pte(vma, addr, pte, NULL);
+		print_bad_page_map(vma, addr, pte_val(pte), NULL);
 		return NULL;
 	}
 
@@ -625,7 +683,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 
 check_pfn:
 	if (unlikely(pfn > highest_memmap_pfn)) {
-		print_bad_pte(vma, addr, pte, NULL);
+		print_bad_page_map(vma, addr, pte_val(pte), NULL);
 		return NULL;
 	}
 
@@ -654,8 +712,15 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
-	if (unlikely(pmd_special(pmd)))
+	if (unlikely(pmd_special(pmd))) {
+		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
+			return NULL;
+		if (is_huge_zero_pfn(pfn))
+			return NULL;
+
+		print_bad_page_map(vma, addr, pmd_val(pmd), NULL);
 		return NULL;
+	}
 
 	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
 		if (vma->vm_flags & VM_MIXEDMAP) {
@@ -674,8 +739,10 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (is_huge_zero_pfn(pfn))
 		return NULL;
 
-	if (unlikely(pfn > highest_memmap_pfn))
+	if (unlikely(pfn > highest_memmap_pfn)) {
+		print_bad_page_map(vma, addr, pmd_val(pmd), NULL);
 		return NULL;
+	}
 
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
@@ -1509,7 +1576,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 		folio_remove_rmap_ptes(folio, page, nr, vma);
 		if (unlikely(folio_mapcount(folio) < 0))
-			print_bad_pte(vma, addr, ptent, page);
+			print_bad_page_map(vma, addr, pte_val(ptent), page);
 	}
 
 	if (unlikely(__tlb_remove_folio_pages(tlb, page, nr, delay_rmap))) {
 		*force_flush = true;
@@ -4507,7 +4574,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	} else if (is_pte_marker_entry(entry)) {
 		ret = handle_pte_marker(vmf);
 	} else {
-		print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
+		print_bad_page_map(vma, vmf->address,
+				   pte_val(vmf->orig_pte), NULL);
 		ret = VM_FAULT_SIGBUS;
 	}
 	goto out;
-- 
2.50.1