public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: travis@sgi.com, mike.travis@hpe.com
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, x86@kernel.org, thgarnie@google.com,
	linux-kernel@vger.kernel.org, keescook@chromium.org,
	akpm@linux-foundation.org, yamada.masahiro@socionext.com,
	kirill@shutemov.name
Subject: Re: [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system
Date: Sun, 17 Feb 2019 10:09:46 +0800	[thread overview]
Message-ID: <20190217020904.GF14858@MiWiFi-R3L-srv> (raw)
In-Reply-To: <20190216140008.28671-7-bhe@redhat.com>

Hi Mike,

On 02/16/19 at 10:00pm, Baoquan He wrote:
> On SGI UV system, kernel often hangs when KASLR is enabled. Disabling
> KASLR makes kernel work well.

I wrap codes which calculate the size of the direct mapping section
into a new function calc_direct_mapping_size() as Ingo suggested. This
code change has passed basic testing, but hasn't been tested on a
SGI UV machine after reproducing since it needs UV machine with UV
module installed of enough size.

To reproduce it, we can apply patches 0001~0005. If reproduced, patch
0006 can be applied on top to check if bug is fixed. Please help check
if the code is OK, if you have a machine, I can have a test.

Thanks
Baoquan

> 
> The back trace is:
> 
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
>  init_extra_mapping_uc+0x13/0x15
>  map_high+0x67/0x75
>  map_mmioh_high_uv3+0x20a/0x219
>  uv_system_init_hub+0x12d9/0x1496
>  uv_system_init+0x27/0x29
>  native_smp_prepare_cpus+0x28d/0x2d8
>  kernel_init_freeable+0xdd/0x253
>  ? rest_init+0x80/0x80
>  kernel_init+0xe/0x110
>  ret_from_fork+0x2c/0x40
> 
> This is because the SGI UV system need map its MMIOH region to the direct
> mapping section, and the mapping happens in rest_init() which is much
> later than the calling of kernel_randomize_memory() to do mm KASLR. So
> mm KASLR can't count in the size of the MMIOH region when calculate the
> needed size of address space for the direct mapping section.
> 
> When KASLR is disabled, there are 64TB address space for both system RAM
> and the MMIOH regions to share. When KASLR is enabled, the current code
> of mm KASLR only reserves the actual size of system RAM plus extra 10TB
> for the direct mapping. Thus later the MMIOH mapping could go beyond
> the upper bound of the direct mapping to step into VMALLOC or VMEMMAP area.
> Then BUG_ON() in __init_extra_mapping() will be triggered.
> 
> E.g on the SGI UV3 machine where this bug was reported , there are two
> MMIOH regions:
> 
> [    1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [    1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
> 
> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G RAM are
> spread out to 1TB regions. Then above two SGI MMIOH regions also will be
> mapped into the direct mapping section.
> 
> To fix it, we need check if it's SGI UV system by calling
> is_early_uv_system() in kernel_randomize_memory(). If yes, do not adapt
> thesize of the direct mapping section, just keep it as is, e.g in level-4
> paging mode, 64TB.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 42 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index ca12ed4e5239..754b5da91d43 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -29,6 +29,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/setup.h>
>  #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>  
>  #include "mm_internal.h"
>  
> @@ -113,15 +114,51 @@ static inline bool kaslr_memory_enabled(void)
>  	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
>  }
>  
> +/*
> + * Even though a huge virtual address space is reserved for the direct
> + * mapping of physical memory, e.g in 4-level pageing mode, it's 64TB,
> + * rare system can own enough physical memory to use it up, most are
> + * even less than 1TB. So with KASLR enabled, we adapt the size of
> + * direct mapping area to size of actual physical memory plus the
> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> + * The left part will be taken out to join memory randomization.
> + *
> + * Note that UV system is an exception, its MMIOH region need be mapped
> + * into the direct mapping area too, while the size can't be got until
> + * rest_init() calling. Hence for UV system, do not adapt the size
> + * of direct mapping area.
> + */
> +static inline unsigned long calc_direct_mapping_size(void)
> +{
> +	unsigned long size_tb, memory_tb;
> +
> +	/*
> +	 * Update Physical memory mapping to available and
> +	 * add padding if needed (especially for memory hotplug support).
> +	 */
> +	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> +		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> +	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +
> +	/*
> +	 * Adapt phyiscal memory region size based on available memory if
> +	 * it's not UV system.
> +	 */
> +	if (memory_tb < size_tb && !is_early_uv_system())
> +		size_tb = memory_tb;
> +
> +	return size_tb;
> +}
> +
>  /* Initialize base and padding for each memory region randomized with KASLR */
>  void __init kernel_randomize_memory(void)
>  {
> -	size_t i;
> -	unsigned long vaddr_start, vaddr;
> -	unsigned long rand, memory_tb;
> -	struct rnd_state rand_state;
> +	unsigned long vaddr_start, vaddr, rand;
>  	unsigned long remain_entropy;
>  	unsigned long vmemmap_size;
> +	struct rnd_state rand_state;
> +	size_t i;
>  
>  	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
>  	vaddr = vaddr_start;
> @@ -138,20 +175,10 @@ void __init kernel_randomize_memory(void)
>  	if (!kaslr_memory_enabled())
>  		return;
>  
> -	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +	kaslr_regions[0].size_tb = calc_direct_mapping_size();
>  	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>  
> -	/*
> -	 * Update Physical memory mapping to available and
> -	 * add padding if needed (especially for memory hotplug support).
> -	 */
>  	BUG_ON(kaslr_regions[0].base != &page_offset_base);
> -	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> -		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> -
> -	/* Adapt phyiscal memory region size based on available memory */
> -	if (memory_tb < kaslr_regions[0].size_tb)
> -		kaslr_regions[0].size_tb = memory_tb;
>  
>  	/*
>  	 * Calculate how many TB vmemmap region needs, and align to
> -- 
> 2.17.2
> 

  reply	other threads:[~2019-02-17  2:10 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-16 14:00 [PATCH v3 0/6] Several patches to fix code bugs, improve documents and clean up Baoquan He
2019-02-16 14:00 ` [PATCH v3 1/6] x86/mm/KASLR: Improve code comments about struct kaslr_memory_region Baoquan He
2019-02-17 17:07   ` Kees Cook
2019-02-18  3:17     ` Baoquan He
2019-03-12  3:45       ` Baoquan He
2019-03-12  0:55     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 2/6] x86/mm/KASLR: Open code unnecessary function get_padding Baoquan He
2019-02-17 17:14   ` Kees Cook
2019-02-16 14:00 ` [PATCH v3 3/6] mm: Add build time sanity check for struct page size Baoquan He
2019-02-17 16:50   ` Kees Cook
2019-02-18  8:07     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 4/6] x86/mm/KASLR: Fix the wrong calculation of memory region initial size Baoquan He
2019-02-17 16:53   ` Kees Cook
2019-02-18  8:30     ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 5/6] x86/mm/KASLR: Calculate the actual size of vmemmap region Baoquan He
2019-02-17 17:25   ` Kees Cook
2019-02-18  9:50     ` Baoquan He
2019-02-18 10:09       ` Baoquan He
2019-02-18 10:11       ` Baoquan He
2019-02-16 14:00 ` [PATCH v3 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping section for SGI UV system Baoquan He
2019-02-17  2:09   ` Baoquan He [this message]
2019-02-18 19:24     ` Mike Travis
2019-02-19  0:04       ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190217020904.GF14858@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mike.travis@hpe.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=thgarnie@google.com \
    --cc=travis@sgi.com \
    --cc=x86@kernel.org \
    --cc=yamada.masahiro@socionext.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox