From: Matthew Wilcox <willy@infradead.org>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oscar Salvador <osalvador@suse.de>, Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 0/5] Remove some races around folio_test_hugetlb
Date: Mon, 4 Mar 2024 17:08:07 +0000 [thread overview]
Message-ID: <ZeX_922nCa1KtsuR@casper.infradead.org> (raw)
In-Reply-To: <a9b40ae3-6e2d-56bb-ba75-8cfd2ace4b33@huawei.com>
On Mon, Mar 04, 2024 at 05:09:58PM +0800, Miaohe Lin wrote:
> I encountered similar issues with PageSwapCache check when doing memory-failure test:
>
> [66258.945079] page:00000000135e1205 refcount:1 mapcount:0 mapping:0000000000000000 index:0x9b pfn:0xa04e9a
> [66258.949096] head:0000000038449724 order:9 entire_mapcount:1 nr_pages_mapped:0 pincount:0
> [66258.949485] memcg:ffff95fb43379000
> [66258.950334] anon flags: 0x6fffc00000a0068(uptodate|lru|head|mappedtodisk|swapbacked|node=1|zone=2|lastcpupid=0x3fff)
> [66258.951212] page_type: 0xffffffff()
> [66258.951882] raw: 06fffc0000000000 ffffc89628138001 dead000000000122 dead000000000400
> [66258.952273] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> [66258.952884] head: 06fffc00000a0068 ffffc896218a8008 ffffc89621680008 ffff95fb4349c439
> [66258.953239] head: 0000000700000600 0000000000000000 00000001ffffffff ffff95fb43379000
> [66258.953725] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
> [66258.954497] ------------[ cut here ]------------
> [66258.954937] kernel BUG at include/linux/page-flags.h:313!
> [66258.956502] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [66258.957001] CPU: 14 PID: 174237 Comm: page-types Kdump: loaded Not tainted 6.8.0-rc1-00162-gd162e170f118 #11
> [66258.957001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [66258.958415] RIP: 0010:folio_flags.constprop.0+0x1c/0x50
> [66258.958415] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 8b 57 08 48 89 f8 83 e2 01 74 12 48 c7 c6 a0 59 34 a7 48 89 c7 e8 b5 60 e8 ff 90 <0f> 0b 66 90 c3 cc cc cc cc f7 c7 ff 0f 00 00 75 1a 48 8b 17 83 e2
> [66258.958415] RSP: 0018:ffffa0f38ae53e00 EFLAGS: 00000282
> [66258.958415] RAX: 0000000000000033 RBX: 0000000000000000 RCX: ffff96031fd9c9c8
> [66258.958415] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff96031fd9c9c0
> [66258.958415] RBP: ffffc8962813a680 R08: ffffffffa7756f88 R09: 0000000000009ffb
> [66258.962155] R10: 000000000000054a R11: ffffffffa7726fa0 R12: 06fffc0000000000
> [66258.962155] R13: 0000000000000000 R14: 00007fff93bf1348 R15: 0000000000a04e9a
> [66258.962155] FS: 00007f47cc5c4740(0000) GS:ffff96031fd80000(0000) knlGS:0000000000000000
> [66258.962155] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [66258.962155] CR2: 00007fff93c7b000 CR3: 0000000850c28000 CR4: 00000000000006f0
> [66258.962155] Call Trace:
> [66258.962155] <TASK>
> [66258.965730] stable_page_flags+0x210/0x940
> [66258.965730] kpageflags_read+0x97/0xf0
Clearly nobody has loved kpageflags_read() in a long time. It's
absolutely full of bugs, some harmless, others less so.
The heart of the problem is that nobody has a refcount on the page, so
literally everything can change under us. The old implementations of
PageSwapCache (etc) would silently give bad information; the folio
reimplementations warn you when you hit this race.
We have a few options:
- We could grab a reference. That would probaby be unwelcome.
- We can grab a snapshot. Might be a bit overkill.
- We can grab the parts of the page/folio we need and open-code our
tests. This actually seems easiest.
Here's option 3. Compile-tested only. Some notes ...
- Slab no longer uses page mapcount, so we can remove that test.
- We still want to check page_mapped(), not folio_mapped() because
each page can be mapped individually.
- Our reporting of THP is probably wrong, but I've preserved the
current behaviour here.
- Now report the entire HZP with the ZERO_PAGE flag instead of just the
head page.
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 195b077c0fac..f5e3cc6509be 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -107,10 +107,18 @@ static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
return ((kflags >> kbit) & 1) << ubit;
}
+/*
+ * We do not have a reference on the struct page! We must be very careful
+ * with what functions we call. Some inaccuracy is tolerable here but the
+ * helper functions may warn.
+ */
u64 stable_page_flags(struct page *page)
{
- u64 k;
- u64 u;
+ struct folio *folio;
+ unsigned long k;
+ unsigned long mapping;
+ bool anon;
+ u64 u = 0;
/*
* pseudo flag: KPF_NOPAGE
@@ -118,45 +126,39 @@ u64 stable_page_flags(struct page *page)
*/
if (!page)
return 1 << KPF_NOPAGE;
+ folio = page_folio(page);
- k = page->flags;
- u = 0;
+ k = folio->flags;
+ mapping = (unsigned long)folio->mapping;
+ anon = mapping & PAGE_MAPPING_ANON;
/*
* pseudo flags for the well known (anonymous) memory mapped pages
- *
- * Note that page->_mapcount is overloaded in SLAB, so the
- * simple test in page_mapped() is not enough.
*/
- if (!PageSlab(page) && page_mapped(page))
+ if (page_mapped(page))
u |= 1 << KPF_MMAP;
- if (PageAnon(page))
+ if (anon)
u |= 1 << KPF_ANON;
- if (PageKsm(page))
+ if (mapping & PAGE_MAPPING_KSM)
u |= 1 << KPF_KSM;
/*
* compound pages: export both head/tail info
* they together define a compound page's start/end pos and order
*/
- if (PageHead(page))
- u |= 1 << KPF_COMPOUND_HEAD;
+ u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
if (PageTail(page))
u |= 1 << KPF_COMPOUND_TAIL;
- if (PageHuge(page))
+ if (folio_test_hugetlb(folio))
u |= 1 << KPF_HUGE;
/*
- * PageTransCompound can be true for non-huge compound pages (slab
- * pages or pages allocated by drivers with __GFP_COMP) because it
- * just checks PG_head/PG_tail, so we need to check PageLRU/PageAnon
+ * We need to check LRU/Anon
* to make sure a given page is a thp, not a non-huge compound page.
*/
- else if (PageTransCompound(page)) {
- struct page *head = compound_head(page);
-
- if (PageLRU(head) || PageAnon(head))
+ else if (folio_test_large(folio)) {
+ if ((k & PG_lru) || anon)
u |= 1 << KPF_THP;
- else if (is_huge_zero_page(head)) {
+ else if (is_huge_zero_page(&folio->page)) {
u |= 1 << KPF_ZERO_PAGE;
u |= 1 << KPF_THP;
}
@@ -178,15 +180,15 @@ u64 stable_page_flags(struct page *page)
if (PageTable(page))
u |= 1 << KPF_PGTABLE;
- if (page_is_idle(page))
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
+ u |= kpf_copy_bit(k, KPF_IDLE, PG_idle);
+#else
+ if (folio_test_idle(folio))
u |= 1 << KPF_IDLE;
+#endif
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
-
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
- if (PageTail(page) && PageSlab(page))
- u |= 1 << KPF_SLAB;
-
u |= kpf_copy_bit(k, KPF_ERROR, PG_error);
u |= kpf_copy_bit(k, KPF_DIRTY, PG_dirty);
u |= kpf_copy_bit(k, KPF_UPTODATE, PG_uptodate);
@@ -197,7 +199,8 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_ACTIVE, PG_active);
u |= kpf_copy_bit(k, KPF_RECLAIM, PG_reclaim);
- if (PageSwapCache(page))
+ if ((k & ((1 << PG_swapbacked) | (1 << PG_swapcache))) ==
+ ((1 << PG_swapbacked) | (1 << PG_swapcache)))
u |= 1 << KPF_SWAPCACHE;
u |= kpf_copy_bit(k, KPF_SWAPBACKED, PG_swapbacked);
next prev parent reply other threads:[~2024-03-04 17:08 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-01 21:47 [PATCH 0/5] Remove some races around folio_test_hugetlb Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 1/5] hugetlb: Make folio_test_hugetlb safer to call Matthew Wilcox (Oracle)
2024-03-05 6:43 ` Oscar Salvador
2024-03-05 8:39 ` David Hildenbrand
2024-03-01 21:47 ` [PATCH 2/5] hugetlb: Add hugetlb_pfn_folio Matthew Wilcox (Oracle)
2024-03-05 6:58 ` Oscar Salvador
2024-03-01 21:47 ` [PATCH 3/5] memory-failure: Use hugetlb_pfn_folio Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 4/5] memory-failure: Reorganise get_huge_page_for_hwpoison() Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 5/5] compaction: Use hugetlb_pfn_folio in isolate_migratepages_block Matthew Wilcox (Oracle)
2024-03-04 9:09 ` [PATCH 0/5] Remove some races around folio_test_hugetlb Miaohe Lin
2024-03-04 17:08 ` Matthew Wilcox [this message]
2024-03-06 7:58 ` Miaohe Lin
2024-03-07 21:16 ` Matthew Wilcox
2024-03-05 9:10 ` David Hildenbrand
2024-03-05 20:35 ` Matthew Wilcox
2024-03-06 15:18 ` David Hildenbrand
2024-03-07 4:31 ` Matthew Wilcox
2024-03-07 9:20 ` David Hildenbrand
2024-03-07 21:14 ` Matthew Wilcox
2024-03-07 21:38 ` David Hildenbrand
2024-03-08 4:31 ` Matthew Wilcox
2024-03-08 8:46 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZeX_922nCa1KtsuR@casper.infradead.org \
--to=willy@infradead.org \
--cc=linmiaohe@huawei.com \
--cc=linux-mm@kvack.org \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).