All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Dan Williams <dan.j.williams@intel.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, max8rr8@gmail.com,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	jhubbard@nvidia.com, Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH 1/1] x86/ioremap: Use is_vmalloc_addr in iounmap
Date: Sat, 10 Aug 2024 19:45:56 +0200	[thread overview]
Message-ID: <87zfpks23v.ffs@tglx> (raw)
In-Reply-To: <66b59314b3d4_c1448294d3@dwillia2-xfh.jf.intel.com.notmuch>

On Thu, Aug 08 2024 at 20:55, Dan Williams wrote:
> Alistair Popple wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>> > Thomas Gleixner wrote:
>> >> Wait. MAX_PHYSMEM_BITS is either 46 (4-level) or 52 (5-level), which is
>> >> fully covered by the identity map space.
>> >> 
>> >> So even if the search starts from top of that space, how do we end up
>> >> with high_memory > VMALLOC_START?
>> >> 
>> >> That does not make any sense at all
>> >
>> > Max, or Alistair can you provide more details of how private memory spills over
>> > into the VMALLOC space on these platforms?
>> 
>> Well I was hoping pleading ignorance on x86 memory maps would get me out
>> of having to look too deeply :-) But alas...
>> 
>> It appears the problem originates in KASLR which can cause the VMALLOC
>> region to overlap with the top of the linear map.
>> 
>> > I too would have thought that MAX_PHYSMEM_BITS protects against this?
>> 
>> Me too, until about an hour ago. As noted above
>> request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
>> down. Therefore VMALLOC_START needs to be greater than PAGE_OFFSET + (1
>> << MAX_PHYSMEM_BITS) - 1.  However the default configuration for KASLR
>> as set by RANDOMIZE_MEMORY_PHYSICAL_PADDING is to only provide 10TB
>> above what max_pfn is set to at boot time (and even then only if
>> CONFIG_MEMORY_HOTPLUG is enabled).

Duh.

>> Obviously ZONE_DEVICE memory ends up being way above that and crosses
>> into the VMALLOC region.

So we need to go through all usage sites of MAX_PHYSMEM_BITS and fix
them up...

>> 	@@ -2277,6 +2277,7 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
>> 		depends on RANDOMIZE_MEMORY
>> 		default "0xa" if MEMORY_HOTPLUG
>> 		default "0x0"
>> 	+       range 0x40 0x40 if GET_FREE_REGION
>> 		range 0x1 0x40 if MEMORY_HOTPLUG
>> 		range 0x0 0x40
>> 		help

That reduces the entropy to the minimum and papers over the problem with
4-level paging, but it won't solve the problem on systems with 5-level
paging.

There the 64T of padding are just not cutting it. MAX_PHYSMEM_BITS is 52
for 5 level ....

> 	 @@ -1824,10 +1824,11 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
> 					  resource_size_t align, unsigned long flags)
> 	  {
> 		 if (flags & GFR_DESCENDING) {
> 	 +               u64 kaslr_pad = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING << 40;
> 			 resource_size_t end;
> 
> 			 end = min_t(resource_size_t, base->end,
> 	 -                           (1ULL << MAX_PHYSMEM_BITS) - 1);
> 	 +                           (1ULL << MAX_PHYSMEM_BITS) - kaslr_pad - 1);
> 			 return end - size + 1;
> 		 }

This needs a fixup of gfr_continue() too, but it does not work at
all. The size of the direct map is calculated as:

       memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
                CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;

That size is used to calculate the address above which the vmalloc area
is placed.

So assume 6T RAM installed + 10T space for hotplug memory. That means
the vmalloc area can start right above 16T. But 64T - 10T = 54T which is
slightly larger than 16T :)

I came up with the uncompiled below. I fixed up the obvious places, but
did not go through all usage sites of MAX_PHYSMEM_BITS yet.

Thanks,

        tglx
---
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -187,6 +187,8 @@ extern unsigned int ptrs_per_p4d;
 #define KMSAN_MODULES_ORIGIN_START	(KMSAN_MODULES_SHADOW_START + MODULES_LEN)
 #endif /* CONFIG_KMSAN */
 
+#define DIRECT_MAP_END		(VMALLOC_START - 1)
+
 #define MODULES_VADDR		(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 /* The module sections ends with the start of the fixmap */
 #ifndef CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_ma
 extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
+#ifndef DIRECT_MAP_END
+# define DIRECT_MAP_END	((1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT)) - 1)
+#endif
+
 #include <asm/page.h>
 #include <asm/processor.h>
 
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct
 	if (flags & GFR_DESCENDING) {
 		resource_size_t end;
 
-		end = min_t(resource_size_t, base->end,
-			    (1ULL << MAX_PHYSMEM_BITS) - 1);
+		end = min_t(resource_size_t, base->end, DIRECT_MAP_END);
 		return end - size + 1;
 	}
 
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource
 	 * @size did not wrap 0.
 	 */
 	return addr > addr - size &&
-	       addr <= min_t(resource_size_t, base->end,
-			     (1ULL << MAX_PHYSMEM_BITS) - 1);
+	       addr <= min_t(resource_size_t, base->end, DIRECT_MAP_END);
 }
 
 static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1681,7 +1681,7 @@ struct range __weak arch_get_mappable_ra
 
 struct range mhp_get_pluggable_range(bool need_mapping)
 {
-	const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+	const u64 max_phys = DIRECT_MAP_END;
 	struct range mhp_range;
 
 	if (need_mapping) {
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struc
 static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
 						unsigned long *end_pfn)
 {
-	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+	unsigned long max_sparsemem_pfn = (DIRECT_MAP_END + 1) >> PAGE_SHIFT;
 
 	/*
 	 * Sanity checks - do not allow an architecture to pass

  reply	other threads:[~2024-08-10 17:46 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10 10:00 [PATCH 1/1] x86/ioremap: Use is_vmalloc_addr in iounmap Max Ramanouski
2024-08-08  6:12 ` Alistair Popple
2024-08-08  6:44   ` John Hubbard
2024-08-12  6:15   ` Christoph Hellwig
2024-08-08 14:18 ` Thomas Gleixner
2024-08-08 15:58   ` Dan Williams
2024-08-08 16:15     ` Thomas Gleixner
2024-08-08 16:32       ` Dan Williams
2024-08-08 16:39         ` Dan Williams
2024-08-08 18:44           ` Thomas Gleixner
2024-08-08 19:59             ` Dan Williams
2024-08-09  2:28               ` Alistair Popple
2024-08-09  3:55                 ` Dan Williams
2024-08-10 17:45                   ` Thomas Gleixner [this message]
2024-08-12  7:41                     ` Alistair Popple
2024-08-12 10:03                       ` Thomas Gleixner
2024-08-12 11:46                         ` Alistair Popple
2024-08-12 12:10                         ` Max R
2024-08-12 13:23                         ` Thomas Gleixner
2024-08-13  1:33                           ` Alistair Popple
2024-08-13  8:20                             ` Thomas Gleixner
2024-08-13 20:37                               ` Thomas Gleixner
2024-08-13 22:29                                 ` x86/kaslr: Expose and use the end of the physical memory address space Thomas Gleixner
2024-08-14  0:26                                   ` Alistair Popple
2024-08-14 14:33                                   ` Dan Williams
2024-08-15 16:11                                   ` Kees Cook
2024-08-15 22:48                                   ` Max R
2024-08-16  9:42                                   ` David Hildenbrand
2024-08-16  9:43                                   ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2024-08-20 20:38                                   ` tip-bot2 for Thomas Gleixner
2024-09-22 22:31                                   ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zfpks23v.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=apopple@nvidia.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=max8rr8@gmail.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.