All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Aishwarya TCV <Aishwarya.TCV@arm.com>,
	Mark Brown <broonie@kernel.org>, Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH 4/8] mm: Add __dump_folio()
Date: Fri, 1 Mar 2024 10:21:10 +0000	[thread overview]
Message-ID: <6de0d026-cd8d-4152-97ca-d33d2a4e2e84@arm.com> (raw)
In-Reply-To: <20240227192337.757313-5-willy@infradead.org>

Hi Matthew,

On 27/02/2024 19:23, Matthew Wilcox (Oracle) wrote:
> Turn __dump_page() into a wrapper around __dump_folio().  Snapshot the
> page & folio into a stack variable so we don't hit BUG_ON() if an
> allocation is freed under us and what was a folio pointer becomes a
> pointer to a tail page.

I'm seeing a couple of panics caused by this patch. I already raised the first one at [1], and it looks like there is a bug in Ard's patch (which he now has a proposed fix for) which provokes the bug in this.

[1] https://lore.kernel.org/linux-arm-kernel/fc691e8d-1a50-4be6-a3b2-d60d6f2e2487@arm.com/

The other way to trigger it is to run the mm kselftest:

  gup_test -ct -F 0x1 0 19 0x1000

This calls into the kernel and deliberately calls dump_page() via dump_pages_test() in gup_test.c.

Panic is as follows (root cause identified below):

[   22.994800] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   22.995428] Mem abort info:
[   22.995617]   ESR = 0x0000000096000005
[   22.995867]   EC = 0x25: DABT (current EL), IL = 32 bits
[   22.996215]   SET = 0, FnV = 0
[   22.996419]   EA = 0, S1PTW = 0
[   22.996628]   FSC = 0x05: level 1 translation fault
[   22.996951] Data abort info:
[   22.997145]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[   22.997541]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   22.998025]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   22.998438] user pgtable: 4k pages, 39-bit VAs, pgdp=000000019f2d6000
[   22.998937] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   22.999608] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[   23.000083] Modules linked in:
[   23.000319] CPU: 6 PID: 1222 Comm: gup_test Not tainted 6.8.0-rc6-00915-g7f43e0f76e47 #2
[   23.000883] Hardware name: linux,dummy-virt (DT)
[   23.001209] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   23.001621] pc : get_pfnblock_flags_mask+0x3c/0x68
[   23.001929] lr : __dump_page+0x188/0x400
[   23.002168] sp : ffffffc0885ebb40
[   23.002370] x29: ffffffc0885ebb40 x28: 0000000000ffffc0 x27: 0000000000000000
[   23.002839] x26: 0000000000000000 x25: ffffffc0885ebba0 x24: 00000000ffffffff
[   23.003335] x23: ffffffeabbbc1000 x22: 0000000000000030 x21: ffffffeabb9f5e98
[   23.003869] x20: ffffffc0885ebba0 x19: ffffffc0885ebba0 x18: ffffffffffffffff
[   23.004489] x17: 34383833383a6465 x16: 7070616d5f736567 x15: ffffffeabd058782
[   23.004989] x14: 0000000000000000 x13: 3030303730303039 x12: 3138666666666666
[   23.005501] x11: fffffffffffe0000 x10: ffffffeabca9c018 x9 : ffffffeab9f0c798
[   23.005980] x8 : 00000000ffffefff x7 : ffffffeabca9c018 x6 : 0000000000000000
[   23.006448] x5 : 000003fffffffc28 x4 : 0001fffffffe144a x3 : 0000000000000000
[   23.006931] x2 : 0000000000000007 x1 : ffffffff0a257aee x0 : 00000000001fffff
[   23.007436] Call trace:
[   23.007623]  get_pfnblock_flags_mask+0x3c/0x68
[   23.007928]  dump_page+0x2c/0x70
[   23.008156]  gup_test_ioctl+0xb34/0xc40
[   23.008416]  __arm64_sys_ioctl+0xb0/0x100
[   23.008694]  invoke_syscall+0x50/0x128
[   23.008944]  el0_svc_common.constprop.0+0x48/0xf8
[   23.009259]  do_el0_svc+0x28/0x40
[   23.009499]  el0_svc+0x34/0xb8
[   23.009720]  el0t_64_sync_handler+0x13c/0x158
[   23.010029]  el0t_64_sync+0x190/0x198
[   23.010293] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460) 
[   23.010714] ---[ end trace 0000000000000000 ]---


> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/debug.c | 120 +++++++++++++++++++++++++++++------------------------
>  1 file changed, 66 insertions(+), 54 deletions(-)
> 
> diff --git a/mm/debug.c b/mm/debug.c
> index ee533a5ceb79..96555fc78f1a 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -51,84 +51,96 @@ const struct trace_print_flags vmaflag_names[] = {
>  	{0, NULL}
>  };
>  
> -static void __dump_page(struct page *page)
> +static void __dump_folio(struct folio *folio, struct page *page,
> +		unsigned long pfn, unsigned long idx)
>  {
> -	struct folio *folio = page_folio(page);
> -	struct page *head = &folio->page;
> -	struct address_space *mapping;
> -	bool compound = PageCompound(page);
> -	/*
> -	 * Accessing the pageblock without the zone lock. It could change to
> -	 * "isolate" again in the meantime, but since we are just dumping the
> -	 * state for debugging, it should be fine to accept a bit of
> -	 * inaccuracy here due to racing.
> -	 */
> -	bool page_cma = is_migrate_cma_page(page);
> -	int mapcount;
> +	struct address_space *mapping = folio_mapping(folio);
> +	bool page_cma;
> +	int mapcount = 0;
>  	char *type = "";
>  
> -	if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
> -		/*
> -		 * Corrupt page, so we cannot call page_mapping. Instead, do a
> -		 * safe subset of the steps that page_mapping() does. Caution:
> -		 * this will be misleading for tail pages, PageSwapCache pages,
> -		 * and potentially other situations. (See the page_mapping()
> -		 * implementation for what's missing here.)
> -		 */
> -		unsigned long tmp = (unsigned long)page->mapping;
> -
> -		if (tmp & PAGE_MAPPING_ANON)
> -			mapping = NULL;
> -		else
> -			mapping = (void *)(tmp & ~PAGE_MAPPING_FLAGS);
> -		head = page;
> -		folio = (struct folio *)page;
> -		compound = false;
> -	} else {
> -		mapping = page_mapping(page);
> -	}
> -
>  	/*
> -	 * Avoid VM_BUG_ON() in page_mapcount().
> -	 * page->_mapcount space in struct page is used by sl[aou]b pages to
> -	 * encode own info.
> +	 * page->_mapcount space in struct page is used by slab pages to
> +	 * encode own info, and we must avoid calling page_folio() again.
>  	 */
> -	mapcount = PageSlab(head) ? 0 : page_mapcount(page);
> -
> -	pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
> -			page, page_ref_count(head), mapcount, mapping,
> -			page_to_pgoff(page), page_to_pfn(page));
> -	if (compound) {
> -		pr_warn("head:%p order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
> -				head, compound_order(head),
> +	if (!folio_test_slab(folio)) {
> +		mapcount = atomic_read(&page->_mapcount) + 1;
> +		if (folio_test_large(folio))
> +			mapcount += folio_entire_mapcount(folio);
> +	}
> +
> +	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
> +			folio_ref_count(folio), mapcount, mapping,
> +			folio->index + idx, pfn);
> +	if (folio_test_large(folio)) {
> +		pr_warn("head: order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
> +				folio_order(folio),
>  				folio_entire_mapcount(folio),
>  				folio_nr_pages_mapped(folio),
>  				atomic_read(&folio->_pincount));
>  	}
>  
>  #ifdef CONFIG_MEMCG
> -	if (head->memcg_data)
> -		pr_warn("memcg:%lx\n", head->memcg_data);
> +	if (folio->memcg_data)
> +		pr_warn("memcg:%lx\n", folio->memcg_data);
>  #endif
> -	if (PageKsm(page))
> +	if (folio_test_ksm(folio))
>  		type = "ksm ";
> -	else if (PageAnon(page))
> +	else if (folio_test_anon(folio))
>  		type = "anon ";
>  	else if (mapping)
>  		dump_mapping(mapping);
>  	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>  
> -	pr_warn("%sflags: %pGp%s\n", type, &head->flags,
> +	/*
> +	 * Accessing the pageblock without the zone lock. It could change to
> +	 * "isolate" again in the meantime, but since we are just dumping the
> +	 * state for debugging, it should be fine to accept a bit of
> +	 * inaccuracy here due to racing.
> +	 */
> +	page_cma = is_migrate_cma_page(page);

Problem is here: is_migrate_cma_page() is a macro that resolves to this:

page_cma = get_pfnblock_flags_mask(page, page_to_pfn(page), MIGRATETYPE_MASK) == MIGRATE_CMA;

And since page is on the stack, page_to_pfn() gives a very wrong answer.

I confirmed that the problem goes away for both cases above, when changing the line to:

page_cma = get_pfnblock_flags_mask(page, pfn, MIGRATETYPE_MASK) == MIGRATE_CMA;

Thanks,
Ryan

> +	pr_warn("%sflags: %pGp%s\n", type, &folio->flags,
>  		page_cma ? " CMA" : "");
> -	pr_warn("page_type: %pGt\n", &head->page_type);
> +	pr_warn("page_type: %pGt\n", &folio->page.page_type);
>  
>  	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>  			sizeof(unsigned long), page,
>  			sizeof(struct page), false);
> -	if (head != page)
> +	if (folio_test_large(folio))
>  		print_hex_dump(KERN_WARNING, "head: ", DUMP_PREFIX_NONE, 32,
> -			sizeof(unsigned long), head,
> -			sizeof(struct page), false);
> +			sizeof(unsigned long), folio,
> +			2 * sizeof(struct page), false);
> +}
> +
> +static void __dump_page(const struct page *page)
> +{
> +	struct folio *foliop, folio;
> +	struct page precise;
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long idx, nr_pages = 1;
> +	int loops = 5;
> +
> +again:
> +	memcpy(&precise, page, sizeof(*page));
> +	foliop = page_folio(&precise);
> +	idx = folio_page_idx(foliop, page);
> +	if (idx != 0) {
> +		if (idx < (1UL << PUD_ORDER)) {
> +			memcpy(&folio, foliop, 2 * sizeof(struct page));
> +			nr_pages = folio_nr_pages(&folio);
> +		}
> +
> +		if (idx > nr_pages) {
> +			if (loops-- > 0)
> +				goto again;
> +			printk("page does not match folio\n");
> +			precise.compound_head &= ~1UL;
> +			foliop = (struct folio *)&precise;
> +			idx = 0;
> +		}
> +	}
> +
> +	__dump_folio(foliop, &precise, pfn, idx);
>  }
>  
>  void dump_page(struct page *page, const char *reason)



  parent reply	other threads:[~2024-03-01 10:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-27 19:23 [PATCH 0/8] PageFlags cleanups Matthew Wilcox (Oracle)
2024-02-27 19:23 ` [PATCH 1/8] mm: Separate out FOLIO_FLAGS from PAGEFLAGS Matthew Wilcox (Oracle)
2024-03-01 11:23   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 2/8] mm: Remove PageWaiters, PageSetWaiters and PageClearWaiters Matthew Wilcox (Oracle)
2024-03-01 11:24   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 3/8] mm: Remove PageYoung and PageIdle definitions Matthew Wilcox (Oracle)
2024-03-01 11:25   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 4/8] mm: Add __dump_folio() Matthew Wilcox (Oracle)
2024-02-28 21:34   ` SeongJae Park
2024-02-29  4:37     ` Matthew Wilcox
2024-02-29  5:05       ` SeongJae Park
2024-03-01 10:21   ` Ryan Roberts [this message]
2024-03-01 21:32     ` Matthew Wilcox
2024-03-04 19:02       ` Matthew Wilcox
2024-05-14  4:33   ` Kees Cook
2024-05-14  4:53     ` Matthew Wilcox
2024-05-14 14:25     ` Matthew Wilcox
2024-02-27 19:23 ` [PATCH 5/8] mm: Make dump_page() take a const argument Matthew Wilcox (Oracle)
2024-03-01 11:26   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 6/8] mm: Constify testing page/folio flags Matthew Wilcox (Oracle)
2024-03-01 11:28   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 7/8] mm: Constify more page/folio tests Matthew Wilcox (Oracle)
2024-03-01 11:28   ` David Hildenbrand
2024-02-27 19:23 ` [PATCH 8/8] mm: Remove cast from page_to_nid() Matthew Wilcox (Oracle)
2024-03-01 11:27   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6de0d026-cd8d-4152-97ca-d33d2a4e2e84@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=Aishwarya.TCV@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=broonie@kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.