public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Will Deacon <will.deacon@arm.com>, Robert Richter <rrichter@cavium.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	David Daney <david.daney@cavium.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Hanjun Guo <hanjun.guo@linaro.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] arm64: mm: Fix memmap to be initialized for the entire section
Date: Thu, 01 Dec 2016 17:26:55 +0000	[thread overview]
Message-ID: <58405D5F.90805@arm.com> (raw)
In-Reply-To: <20161201164538.GB1236@arm.com>

Hi Robert, Will,

On 01/12/16 16:45, Will Deacon wrote:
> On Wed, Nov 30, 2016 at 07:21:31PM +0100, Robert Richter wrote:
>> On ThunderX systems with certain memory configurations we see the
>> following BUG_ON():
>>
>>  kernel BUG at mm/page_alloc.c:1848!
>>
>> This happens for some configs with 64k page size enabled. The BUG_ON()
>> checks if start and end page of a memmap range belongs to the same
>> zone.
>>
>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>> this case the node information of those pages is not initialized. This
>> causes an inconsistency of the page links with wrong zone and node
>> information for that pages. NOMAP pages from node 1 still point to the
>> mem zone from node 0 and have the wrong nid assigned.
>>
>> The reason for the mis-configuration is a change in pfn_valid() which
>> reports pages marked NOMAP as invalid:
>>
>>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
>>
>> This causes pages marked as nomap being no long reassigned to the new
>> zone in memmap_init_zone() by calling __init_single_pfn().
>>
>> Fixing this by restoring the old behavior of pfn_valid() to use
>> memblock_is_memory(). Also changing users of pfn_valid() in arm64 code
>> to use memblock_is_map_memory() where necessary. This only affects
>> code in ioremap.c. The code in mmu.c still can use the new version of
>> pfn_valid().
>>
>> As a consequence, pfn_valid() can not be used to check if a physical
>> page is RAM. It just checks if there is an underlying memmap with a
>> valid struct page. Moreover, for performance reasons the whole memmap
>> (with pageblock_nr_pages number of pages) has valid pfns (SPARSEMEM
>> config). The memory range is extended to fit the alignment of the
>> memmap. Thus, pfn_valid() may return true for pfns that do not map to
>> physical memory. Those pages are simply not reported to the mm, they
>> are not marked reserved nor added to the list of free pages. Other
>> functions such a page_is_ram() or memblock_is_map_ memory() must be
>> used to check for memory and if the page can be mapped with the linear
>> mapping.

[...]

> Thanks for sending out the new patch. Whilst I'm still a bit worried about
> changing pfn_valid like this, I guess we'll just have to fix up any callers
> which suffer from this change.

Hibernate's core code falls foul of this. This patch causes a panic when copying
memory to build the 'image'[0].
saveable_page() in kernel/power/snapshot.c broadly assumes that pfn_valid()
pages can be accessed.

Fortunately the core code exposes pfn_is_nosave() which we can extend to catch
'nomap' pages, but only if they are also marked as PageReserved().

Are there any side-effects of marking all the nomap regions with
mark_page_reserved()? (it doesn't appear to be the case today).


Patches incoming...


Thanks,

James


[0] panic trace
root@juno-r1:~# echo disk > /sys/power/state
[   56.914184] PM: Syncing filesystems ... [   56.918853] done.
[   56.920826] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   56.930383] PM: Preallocating image memory... done (allocated 97481 pages)
[   60.566084] PM: Allocated 389924 kbytes in 3.62 seconds (107.71 MB/s)
[   60.572576] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   60.604877] PM: freeze of devices complete after 23.146 msecs
[   60.611230] PM: late freeze of devices complete after 0.578 msecs
[   60.618609] PM: noirq freeze of devices complete after 1.247 msecs
[   60.624833] Disabling non-boot CPUs ...
[   60.649112] CPU1: shutdown
[   60.651823] psci: CPU1 killed.
[   60.701055] CPU2: shutdown
[   60.703766] psci: CPU2 killed.
[   60.745002] IRQ11 no longer affine to CPU3
[   60.745043] CPU3: shutdown
[   60.751890] psci: CPU3 killed.
[   60.784966] CPU4: shutdown
[   60.787676] psci: CPU4 killed.
[   60.824916] IRQ8 no longer affine to CPU5
[   60.824920] IRQ9 no longer affine to CPU5
[   60.824927] IRQ18 no longer affine to CPU5
[   60.824931] IRQ20 no longer affine to CPU5
[   60.824951] CPU5: shutdown
[   60.843975] psci: CPU5 killed.
[   60.857989] PM: Creating hibernation image:
[   60.857989] PM: Need to copy 96285 pages
[   60.857989] Unable to handle kernel paging request at virtual address
ffff8000794a0000
[   60.857989] pgd = ffff800975190000
[   60.857989] [ffff8000794a0000] *pgd=0000000000000000[   60.857989]
[   60.857989] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   60.857989] Modules linked in:
[   60.857989] CPU: 0 PID: 2366 Comm: bash Not tainted
4.9.0-rc7-00001-gecf7c47af54d #6346
[   60.857989] Hardware name: ARM Juno development board (r1) (DT)
[   60.857989] task: ffff8009766d3200 task.stack: ffff800975fec000
[   60.857989] PC is at swsusp_save+0x250/0x2c8
[   60.857989] LR is at swsusp_save+0x214/0x2c8
[   60.857989] pc : [<ffff000008100bd0>] lr : [<ffff000008100b94>] pstate: 200003c5
[   60.857989] sp : ffff800975fefb50
[   60.857989] x29: ffff800975fefb50 x28: ffff800975fec000
[   60.857989] x27: ffff0000088c2000 x26: 0000000000000040
[   60.857989] x25: 00000000000f94a0 x24: ffff000008e437e8
[   60.857989] x23: 000000000001781d x22: ffff000008bee000
[   60.857989] x21: ffff000008e437f8 x20: ffff000008e43878
[   60.857989] x19: ffff7e0000000000 x18: 0000000000000006
[   60.857989] x17: 0000000000000000 x16: 00000000000005d0
[   60.857989] x15: ffff000008e43e95 x14: 00000000000001d9
[   60.857989] x13: 0000000000000001 x12: ffff7e0000000000
[   60.857989] x11: ffff7e0025ffffc0 x10: 0000000025ffffc0
[   60.857989] x9 : 000000000000012f x8 : ffff80096cd04ce0
[   60.857989] x7 : 0000000000978000 x6 : 0000000000000076
[   60.857989] x5 : fffffffffffffff8 x4 : 0000000000080000
[   60.857989] x3 : ffff8000794a0000 x2 : ffff800959d83000
[   60.857989] x1 : 0000000000000000 x0 : ffff7e0001e52800

[   60.857989] Process bash (pid: 2366, stack limit = 0xffff800975fec020)
[   60.857989] Stack: (0xffff800975fefb50 to 0xffff800975ff0000)
[   60.857989] Call trace:
[   60.857989] [<ffff000008100bd0>] swsusp_save+0x250/0x2c8
[   60.857989] [<ffff0000080936ec>] swsusp_arch_suspend+0xb4/0x100
[   60.857989] [<ffff0000080fe670>] hibernation_snapshot+0x278/0x318
[   60.857989] [<ffff0000080fef10>] hibernate+0x1d0/0x268
[   60.857989] [<ffff0000080fc954>] state_store+0xdc/0x100
[   60.857989] [<ffff00000838419c>] kobj_attr_store+0x14/0x28
[   60.857989] [<ffff00000825be68>] sysfs_kf_write+0x48/0x58
[   60.857989] [<ffff00000825b1f8>] kernfs_fop_write+0xb0/0x1d8
[   60.857989] [<ffff0000081e2ddc>] __vfs_write+0x1c/0x110
[   60.857989] [<ffff0000081e3bd8>] vfs_write+0xa0/0x1b8
[   60.857989] [<ffff0000081e4f44>] SyS_write+0x44/0xa0
[   60.857989] [<ffff000008082ef0>] el0_svc_naked+0x24/0x28
[   60.857989] Code: d37ae442 d37ae463 b2514042 b2514063 (f8636820)
[   60.857989] ---[ end trace d0265b757c9dd571 ]---
[   60.857989] ------------[ cut here ]------------

  reply	other threads:[~2016-12-01 17:27 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-30 18:21 [PATCH v2] arm64: mm: Fix memmap to be initialized for the entire section Robert Richter
2016-12-01 16:45 ` Will Deacon
2016-12-01 17:26   ` James Morse [this message]
2016-12-02  7:11     ` Robert Richter
2016-12-02 14:48       ` James Morse
2016-12-02 14:49 ` [PATCH 0/2] Hibernate fixes for 'Fix memmap to be initialized for the entire section' James Morse
2016-12-02 14:49   ` [PATCH 1/2] arm64: mm: Mark nomap regions with the PG_reserved flag James Morse
2016-12-02 14:49   ` [PATCH 2/2] arm64: hibernate: report nomap regions as being pfn_nosave James Morse
2016-12-05 15:42   ` [PATCH 0/2] Hibernate fixes for 'Fix memmap to be initialized for the entire section' Ard Biesheuvel
2016-12-06 17:38     ` Will Deacon
2016-12-07  9:06       ` Robert Richter
2016-12-07 14:32         ` Will Deacon
2016-12-09 12:14 ` [PATCH v2] arm64: mm: Fix memmap to be initialized for the entire section Yisheng Xie
2016-12-09 12:19   ` Ard Biesheuvel
2016-12-09 12:23     ` Hanjun Guo
2016-12-09 13:15       ` Yisheng Xie
2016-12-09 14:52         ` Robert Richter
2016-12-12 12:02           ` Ard Biesheuvel
2016-12-09 14:51   ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58405D5F.90805@arm.com \
    --to=james.morse@arm.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=catalin.marinas@arm.com \
    --cc=david.daney@cavium.com \
    --cc=hanjun.guo@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=rrichter@cavium.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox