From: Michal Hocko <mhocko@kernel.org>
To: James Morse <james.morse@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
Pavel Tatashin <Pavel.Tatashin@microsoft.com>,
Mikulas Patocka <mpatocka@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [PATCH] arm64: mm: always enable CONFIG_HOLES_IN_ZONE
Date: Mon, 3 Sep 2018 21:47:31 +0200 [thread overview]
Message-ID: <20180903194731.GE14951@dhcp22.suse.cz> (raw)
In-Reply-To: <20180830150532.22745-1-james.morse@arm.com>
On Thu 30-08-18 16:05:32, James Morse wrote:
> Commit 6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
> only enabled HOLES_IN_ZONE for NUMA systems because the NUMA code was
> choking on the missing zone for nomap pages. This problem doesn't just
> apply to NUMA systems.
>
> If the architecture doesn't set HAVE_ARCH_PFN_VALID, pfn_valid() will
> return true if the pfn is part of a valid sparsemem section.
>
> When working with multiple pages, the mm code uses pfn_valid_within()
> to test each page it uses within the sparsemem section is valid. On
> most systems memory comes in MAX_ORDER_NR_PAGES chunks which all
> have valid/initialised struct pages. In this case pfn_valid_within()
> is optimised out.
>
> Systems where this isn't true (e.g. due to nomap) should set
> HOLES_IN_ZONE and provide HAVE_ARCH_PFN_VALID so that mm tests each
> page as it works with it.
>
> Currently non-NUMA arm64 systems can't enable HOLES_IN_ZONE, leading to
> VM_BUG_ON():
> | page:fffffdff802e1780 is uninitialized and poisoned
> | raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> | raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> | page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> | ------------[ cut here ]------------
> | kernel BUG at include/linux/mm.h:978!
> | Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [...]
> | CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
> | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> | pstate: 40000085 (nZcv daIf -PAN -UAO)
> | pc : move_freepages_block+0x144/0x248
> | lr : move_freepages_block+0x144/0x248
> | sp : fffffe0071177680
> [...]
> | Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
> | Call trace:
> | move_freepages_block+0x144/0x248
> | steal_suitable_fallback+0x100/0x16c
> | get_page_from_freelist+0x440/0xb20
> | __alloc_pages_nodemask+0xe8/0x838
> | new_slab+0xd4/0x418
> | ___slab_alloc.constprop.27+0x380/0x4a8
> | __slab_alloc.isra.21.constprop.26+0x24/0x34
> | kmem_cache_alloc+0xa8/0x180
> | alloc_buffer_head+0x1c/0x90
> | alloc_page_buffers+0x68/0xb0
> | create_empty_buffers+0x20/0x1ec
> | create_page_buffers+0xb0/0xf0
> | __block_write_begin_int+0xc4/0x564
> | __block_write_begin+0x10/0x18
> | block_write_begin+0x48/0xd0
> | blkdev_write_begin+0x28/0x30
> | generic_perform_write+0x98/0x16c
> | __generic_file_write_iter+0x138/0x168
> | blkdev_write_iter+0x80/0xf0
> | __vfs_write+0xe4/0x10c
> | vfs_write+0xb4/0x168
> | ksys_write+0x44/0x88
> | sys_write+0xc/0x14
> | el0_svc_naked+0x30/0x34
> | Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
> | ---[ end trace 1601ba47f6e883fe ]---
>
> Remove the NUMA dependency.
>
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Link: https://www.spinics.net/lists/arm-kernel/msg671851.html
> Fixes: 6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
> CC: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: James Morse <james.morse@arm.com>
OK. I guess you are also going to post a patch to drop
ARCH_HAS_HOLES_MEMORYMODEL, right?
Anyway
Acked-by: Michal Hocko <mhocko@suse.com>
I wish we could simplify the pfn validation code a bit. I find
pfn_valid_within quite confusing and I would bet it is not used
consistently. This will require a non trivial audit. I am wondering
whether we really need to make the code more complicated rather than
simply establish a contract that we always have a pageblock worth of
struct pages always available. Even when there is no physical memory
backing it. Such a page can be reserved and never used by the page
allocator. pfn walkers should back off for reserved pages already.
> ---
> arch/arm64/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 29e75b47becd..1b1a0e95c751 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -763,7 +763,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>
> config HOLES_IN_ZONE
> def_bool y
> - depends on NUMA
>
> source kernel/Kconfig.hz
>
> --
> 2.18.0
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2018-09-03 19:47 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-30 15:05 [PATCH] arm64: mm: always enable CONFIG_HOLES_IN_ZONE James Morse
2018-08-30 15:09 ` Pasha Tatashin
2018-08-30 18:23 ` Mikulas Patocka
2018-09-03 19:47 ` Michal Hocko [this message]
2018-09-07 17:47 ` James Morse
2018-09-10 9:25 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180903194731.GE14951@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=Pavel.Tatashin@microsoft.com \
--cc=ard.biesheuvel@linaro.org \
--cc=catalin.marinas@arm.com \
--cc=james.morse@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=mpatocka@redhat.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).