From mboxrd@z Thu Jan 1 00:00:00 1970
From: wefu@redhat.com (Fu Wei)
Date: Tue, 16 Feb 2016 16:55:46 +0800
Subject: [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
In-Reply-To: <56AF8797.5000908@redhat.com>
References: <1454324093-15998-1-git-send-email-ard.biesheuvel@linaro.org> <1454324093-15998-9-git-send-email-ard.biesheuvel@linaro.org> <20160201145014.GK674@leverpostej> <56AF8797.5000908@redhat.com>
Message-ID: <56C2E412.3000402@redhat.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 02/02/2016 12:28 AM, Fu Wei wrote:
> Hi Mark
>
> On 02/01/2016 10:50 PM, Mark Rutland wrote:
>> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>>> This relaxes the kernel Image placement requirements, so that it
>>> may be placed at any 2 MB aligned offset in physical memory.
>>>
>>> This is accomplished by ignoring PHYS_OFFSET when installing
>>> memblocks, and accounting for the apparent virtual offset of
>>> the kernel Image. As a result, virtual address references
>>> below PAGE_OFFSET are correctly mapped onto physical references
>>> into the kernel Image regardless of where it sits in memory.
>>>
>>> Note that limiting memory using mem= is no longer unambiguous after
>>> this change, considering that the kernel may be at the top of physical
>>> memory, and clipping from the bottom rather than the top will discard
>>> any 32-bit DMA addressable memory first. To deal with this, the handling
>>> of mem= is reimplemented to clip top down, taking special care not
>>> to clip memory that covers the kernel image.
>>>
>>> Since mem= should not be considered a production feature, a panic
>>> notifier handler is installed that dumps the memory limit at panic
>>> time if one was set.
>>
>> Good idea!
>>
>> It would be great if we could follow up with a sizes.h update for SZ_4G,
>> though that's only a nice-to-have, and in no way should block this.
>>
>> Other than that, this looks good. Thanks for putting this together!
>>
>> Reviewed-by: Mark Rutland
>>
>> For the Documentation/arm64 parts we'll need to ask Fu Wei to update the
>> zh_CN/ translation to match.
>
> Great, thanks for the info
> Yes, I will work on it

The zh_CN patch has been prepared; once the English version is merged
into mainline, I will upstream it immediately. There is another zh_CN
patch for booting.txt already going through the upstream process:
https://lkml.org/lkml/2016/2/16/164
Sorry for the delay. :-)

>
>>
>> Mark.
>>
>>>
>>> Signed-off-by: Ard Biesheuvel
>>> ---
>>> Documentation/arm64/booting.txt | 20 ++--
>>> arch/arm64/include/asm/boot.h | 6 ++
>>> arch/arm64/include/asm/kernel-pgtable.h | 12 +++
>>> arch/arm64/include/asm/kvm_asm.h | 2 +-
>>> arch/arm64/include/asm/memory.h | 15 +--
>>> arch/arm64/kernel/head.S | 6 +-
>>> arch/arm64/kernel/image.h | 13 ++-
>>> arch/arm64/mm/init.c | 100 +++++++++++++++++++-
>>> arch/arm64/mm/mmu.c | 3 +
>>> 9 files changed, 155 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/Documentation/arm64/booting.txt
>>> b/Documentation/arm64/booting.txt
>>> index 701d39d3171a..56d6d8b796db 100644
>>> --- a/Documentation/arm64/booting.txt
>>> +++ b/Documentation/arm64/booting.txt
>>> @@ -109,7 +109,13 @@ Header notes:
>>> 1 - 4K
>>> 2 - 16K
>>> 3 - 64K
>>> - Bits 3-63: Reserved. 
>>> + Bit 3: Kernel physical placement
>>> + 0 - 2MB aligned base should be as close as possible
>>> + to the base of DRAM, since memory below it is not
>>> + accessible via the linear mapping
>>> + 1 - 2MB aligned base may be anywhere in physical
>>> + memory
>>> + Bits 4-63: Reserved.
>>>
>>> - When image_size is zero, a bootloader should attempt to keep as much
>>> memory as possible free for use by the kernel immediately after the
>>> @@ -117,14 +123,14 @@ Header notes:
>>> depending on selected features, and is effectively unbound.
>>>
>>> The Image must be placed text_offset bytes from a 2MB aligned base
>>> -address near the start of usable system RAM and called there. Memory
>>> -below that base address is currently unusable by Linux, and
>>> therefore it
>>> -is strongly recommended that this location is the start of system RAM.
>>> -The region between the 2 MB aligned base address and the start of the
>>> -image has no special significance to the kernel, and may be used for
>>> -other purposes.
>>> +address anywhere in usable system RAM and called there. The region
>>> +between the 2 MB aligned base address and the start of the image has no
>>> +special significance to the kernel, and may be used for other purposes.
>>> At least image_size bytes from the start of the image must be free for
>>> use by the kernel.
>>> +NOTE: versions prior to v4.6 cannot make use of memory below the
>>> +physical offset of the Image, so it is recommended that the Image be
>>> +placed as close as possible to the start of system RAM.
>>>
>>> Any memory described to the kernel (even that below the start of the
>>> image) which is not marked as reserved from the kernel (e.g., with a
>>> diff --git a/arch/arm64/include/asm/boot.h
>>> b/arch/arm64/include/asm/boot.h
>>> index 81151b67b26b..ebf2481889c3 100644
>>> --- a/arch/arm64/include/asm/boot.h
>>> +++ b/arch/arm64/include/asm/boot.h
>>> @@ -11,4 +11,10 @@
>>> #define MIN_FDT_ALIGN 8
>>> #define MAX_FDT_SIZE SZ_2M
>>>
>>> +/*
>>> + * arm64 requires the kernel image to be placed
>>> + * TEXT_OFFSET bytes beyond a 2 MB aligned base
>>> + */
>>> +#define MIN_KIMG_ALIGN SZ_2M
>>> +
>>> #endif
>>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h
>>> b/arch/arm64/include/asm/kernel-pgtable.h
>>> index a459714ee29e..5c6375d8528b 100644
>>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>>> @@ -79,5 +79,17 @@
>>> #define SWAPPER_MM_MMUFLAGS (PTE_ATTRINDX(MT_NORMAL) |
>>> SWAPPER_PTE_FLAGS)
>>> #endif
>>>
>>> +/*
>>> + * To make optimal use of block mappings when laying out the linear
>>> + * mapping, round down the base of physical memory to a size that can
>>> + * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or
>>> PMD_SIZE
>>> + * (64k granule), or a multiple that can be mapped using contiguous
>>> bits
>>> + * in the page tables: 32 * PMD_SIZE (16k granule)
>>> + */
>>> +#ifdef CONFIG_ARM64_64K_PAGES
>>> +#define ARM64_MEMSTART_ALIGN SZ_512M
>>> +#else
>>> +#define ARM64_MEMSTART_ALIGN SZ_1G
>>> +#endif
>>>
>>> #endif /* __ASM_KERNEL_PGTABLE_H */
>>> diff --git a/arch/arm64/include/asm/kvm_asm.h
>>> b/arch/arm64/include/asm/kvm_asm.h
>>> index f5aee6e764e6..054ac25e7c2e 100644
>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>> @@ -26,7 +26,7 @@
>>> #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0
>>> #define KVM_ARM64_DEBUG_DIRTY (1 <<
>>> KVM_ARM64_DEBUG_DIRTY_SHIFT)
>>>
>>> -#define kvm_ksym_ref(sym) ((void *)&sym - KIMAGE_VADDR +
>>> PAGE_OFFSET)
>>> +#define 
kvm_ksym_ref(sym) phys_to_virt((u64)&sym -
>>> kimage_voffset)
>>>
>>> #ifndef __ASSEMBLY__
>>> struct kvm;
>>> diff --git a/arch/arm64/include/asm/memory.h
>>> b/arch/arm64/include/asm/memory.h
>>> index 4388651d1f0d..61005e7dd6cb 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -88,10 +88,10 @@
>>> #define __virt_to_phys(x) ({ \
>>> phys_addr_t __x = (phys_addr_t)(x); \
>>> __x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) : \
>>> - (__x - KIMAGE_VADDR + PHYS_OFFSET); })
>>> + (__x - kimage_voffset); })
>>>
>>> #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET +
>>> PAGE_OFFSET))
>>> -#define __phys_to_kimg(x) ((unsigned long)((x) - PHYS_OFFSET +
>>> KIMAGE_VADDR))
>>> +#define __phys_to_kimg(x) ((unsigned long)((x) + kimage_voffset))
>>>
>>> /*
>>> * Convert a page to/from a physical address
>>> @@ -127,13 +127,14 @@ extern phys_addr_t memstart_addr;
>>> /* PHYS_OFFSET - the physical address of the start of memory. */
>>> #define PHYS_OFFSET ({ memstart_addr; })
>>>
>>> +/* the offset between the kernel virtual and physical mappings */
>>> +extern u64 kimage_voffset;
>>> +
>>> /*
>>> - * The maximum physical address that the linear direct mapping
>>> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
>>> - * a 2's complement signed quantity and negated to derive the
>>> - * maximum size of the linear mapping.)
>>> + * Allow all memory at the discovery stage. We will clip it later.
>>> */
>>> -#define MAX_MEMBLOCK_ADDR ({ memstart_addr - PAGE_OFFSET - 1; })
>>> +#define MIN_MEMBLOCK_ADDR 0
>>> +#define MAX_MEMBLOCK_ADDR U64_MAX
>>>
>>> /*
>>> * PFNs are used to describe any physical page; this means
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index 04d38a058b19..05b98289093e 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>>> @@ -428,7 +428,11 @@ __mmap_switched:
>>> and x4, x4, #~(THREAD_SIZE - 1)
>>> msr sp_el0, x4 // Save thread_info
>>> str_l x21, __fdt_pointer, x5 // Save FDT pointer
>>> - str_l x24, memstart_addr, x6 // Save PHYS_OFFSET
>>> +
>>> + ldr x4, =KIMAGE_VADDR // Save the offset between
>>> + sub x4, x4, x24 // the kernel virtual and
>>> + str_l x4, kimage_voffset, x5 // physical mappings
>>> +
>>> mov x29, #0
>>> #ifdef CONFIG_KASAN
>>> bl kasan_early_init
>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>> index 999633bd7294..c9c62cab25a4 100644
>>> --- a/arch/arm64/kernel/image.h
>>> +++ b/arch/arm64/kernel/image.h
>>> @@ -42,15 +42,18 @@
>>> #endif
>>>
>>> #ifdef CONFIG_CPU_BIG_ENDIAN
>>> -#define __HEAD_FLAG_BE 1
>>> +#define __HEAD_FLAG_BE 1
>>> #else
>>> -#define __HEAD_FLAG_BE 0
>>> +#define __HEAD_FLAG_BE 0
>>> #endif
>>>
>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>
>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) | \
>>> - (__HEAD_FLAG_PAGE_SIZE << 1))
>>> +#define __HEAD_FLAG_PHYS_BASE 1
>>> +
>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) | \
>>> + (__HEAD_FLAG_PAGE_SIZE << 1) | \
>>> + (__HEAD_FLAG_PHYS_BASE << 3))
>>>
>>> /*
>>> * These will output as part of the Image header, which should be
>>> little-endian
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 1d627cd8121c..e8e853a1024c 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -35,8 +35,10 @@
>>> #include <linux/efi.h>
>>> #include <linux/swiotlb.h>
>>>
>>> +#include <asm/boot.h>
>>> #include <asm/fixmap.h>
>>> #include <asm/kasan.h>
>>> +#include <asm/kernel-pgtable.h>
>>> #include <asm/memory.h>
>>> #include <asm/sections.h>
>>> #include <asm/setup.h>
>>> @@ 
-158,9 +160,80 @@ static int __init early_mem(char *p) >>> } >>> early_param("mem", early_mem); >>> >>> +/* >>> + * clip_mem_range() - remove memblock memory between @min and @max >>> until >>> + * we meet the limit in 'memory_limit'. >>> + */ >>> +static void __init clip_mem_range(u64 min, u64 max) >>> +{ >>> + u64 mem_size, to_remove; >>> + int i; >>> + >>> +again: >>> + mem_size = memblock_phys_mem_size(); >>> + if (mem_size <= memory_limit || max <= min) >>> + return; >>> + >>> + to_remove = mem_size - memory_limit; >>> + >>> + for (i = memblock.memory.cnt - 1; i >= 0; i--) { >>> + struct memblock_region *r = memblock.memory.regions + i; >>> + u64 start = max(min, r->base); >>> + u64 end = min(max, r->base + r->size); >>> + >>> + if (start >= max || end <= min) >>> + continue; >>> + >>> + if (end > min) { >>> + u64 size = min(to_remove, end - max(start, min)); >>> + >>> + memblock_remove(end - size, size); >>> + } else { >>> + memblock_remove(start, min(max - start, to_remove)); >>> + } >>> + goto again; >>> + } >>> +} >>> + >>> void __init arm64_memblock_init(void) >>> { >>> - memblock_enforce_memory_limit(memory_limit); >>> + const s64 linear_region_size = -(s64)PAGE_OFFSET; >>> + >>> + /* >>> + * Select a suitable value for the base of physical memory. >>> + */ >>> + memstart_addr = round_down(memblock_start_of_DRAM(), >>> + ARM64_MEMSTART_ALIGN); >>> + >>> + /* >>> + * Remove the memory that we will not be able to cover with the >>> + * linear mapping. Take care not to clip the kernel which may be >>> + * high in memory. >>> + */ >>> + memblock_remove(max(memstart_addr + linear_region_size, >>> __pa(_end)), >>> + ULLONG_MAX); >>> + if (memblock_end_of_DRAM() > linear_region_size) >>> + memblock_remove(0, memblock_end_of_DRAM() - >>> linear_region_size); >>> + >>> + if (memory_limit != (phys_addr_t)ULLONG_MAX) { >>> + u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN); >>> + u64 kend = PAGE_ALIGN(__pa(_end)); >>> + u64 const sz_4g = 0x100000000UL; >>> + >>> + /* >>> + * Clip memory in order of preference: >>> + * - above the kernel and above 4 GB >>> + * - between 4 GB and the start of the kernel (if the kernel >>> + * is loaded high in memory) >>> + * - between the kernel and 4 GB (if the kernel is loaded >>> + * low in memory) >>> + * - below 4 GB >>> + */ >>> + clip_mem_range(max(sz_4g, kend), ULLONG_MAX); >>> + clip_mem_range(sz_4g, kbase); >>> + clip_mem_range(kend, sz_4g); >>> + clip_mem_range(0, min(kbase, sz_4g)); >>> + } >>> >>> /* >>> * Register the kernel text, kernel data, initrd, and initial >>> @@ -381,3 +454,28 @@ static int __init keepinitrd_setup(char *__unused) >>> >>> __setup("keepinitrd", keepinitrd_setup); >>> #endif >>> + >>> +/* >>> + * Dump out memory limit information on panic. 
>>> + */ >>> +static int dump_mem_limit(struct notifier_block *self, unsigned long >>> v, void *p) >>> +{ >>> + if (memory_limit != (phys_addr_t)ULLONG_MAX) { >>> + pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20); >>> + } else { >>> + pr_emerg("Memory Limit: none\n"); >>> + } >>> + return 0; >>> +} >>> + >>> +static struct notifier_block mem_limit_notifier = { >>> + .notifier_call = dump_mem_limit, >>> +}; >>> + >>> +static int __init register_mem_limit_dumper(void) >>> +{ >>> + atomic_notifier_chain_register(&panic_notifier_list, >>> + &mem_limit_notifier); >>> + return 0; >>> +} >>> +__initcall(register_mem_limit_dumper); >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c >>> index 4c4b15932963..8dda38378959 100644 >>> --- a/arch/arm64/mm/mmu.c >>> +++ b/arch/arm64/mm/mmu.c >>> @@ -46,6 +46,9 @@ >>> >>> u64 idmap_t0sz = TCR_T0SZ(VA_BITS); >>> >>> +u64 kimage_voffset __read_mostly; >>> +EXPORT_SYMBOL(kimage_voffset); >>> + >>> /* >>> * Empty_zero_page is a special page that is used for >>> zero-initialized data >>> * and COW. >>> -- >>> 2.5.0 >>>
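
A side note on the mem= handling above, for readers reconstructing the
logic: the order-of-preference clipping is easier to see in isolation.
Below is a small stand-alone user-space sketch of the same idea. It is
only an illustration: struct region, region_remove() and clip() are
made-up stand-ins, not the kernel's memblock API, and the memory map
and kernel placement in main() are invented values. It applies the same
four clip windows in the same order as arm64_memblock_init() and shows
that, with the kernel loaded above 4 GB and mem=1G, the kernel image
range and the low 32-bit DMA addressable bank are what survive.

#include <stdint.h>
#include <stdio.h>

#define SZ_4G 0x100000000ULL

struct region { uint64_t base, size; };

static struct region mem[8];		/* room for one split */
static int nregs;

static uint64_t total_size(void)
{
	uint64_t s = 0;

	for (int i = 0; i < nregs; i++)
		s += mem[i].size;
	return s;
}

/* Remove [start, end) from every region, splitting one if needed. */
static void region_remove(uint64_t start, uint64_t end)
{
	for (int i = nregs - 1; i >= 0; i--) {
		uint64_t lo = mem[i].base, hi = lo + mem[i].size;
		uint64_t s = start > lo ? start : lo;
		uint64_t e = end < hi ? end : hi;

		if (s >= e) {			/* no overlap */
			continue;
		} else if (s == lo && e == hi) {
			mem[i] = mem[--nregs];	/* whole region gone */
		} else if (e == hi) {
			mem[i].size = s - lo;	/* trim the top */
		} else if (s == lo) {
			mem[i].base = e;	/* trim the bottom */
			mem[i].size = hi - e;
		} else {			/* punch a hole: split */
			mem[i].size = s - lo;
			mem[nregs].base = e;
			mem[nregs].size = hi - e;
			nregs++;
		}
	}
}

/* Discard memory inside [min, max), top down, until 'limit' is met. */
static void clip(uint64_t min, uint64_t max, uint64_t limit)
{
	while (total_size() > limit) {
		uint64_t excess = total_size() - limit;
		uint64_t top = 0, bottom = 0;

		/* find the highest region top inside [min, max) */
		for (int i = 0; i < nregs; i++) {
			uint64_t lo = mem[i].base, hi = lo + mem[i].size;
			uint64_t s = min > lo ? min : lo;
			uint64_t e = max < hi ? max : hi;

			if (s < e && e > top) {
				top = e;
				bottom = s;
			}
		}
		if (!top)			/* window exhausted */
			return;
		if (top - bottom > excess)	/* only shave the excess */
			bottom = top - excess;
		region_remove(bottom, top);
	}
}

int main(void)
{
	uint64_t limit = SZ_4G / 4;		/* mem=1G */
	uint64_t kbase = 0x140000000ULL;	/* kernel loaded at 5 GB */
	uint64_t kend  = kbase + 0x800000;	/* 8 MB image */

	mem[nregs++] = (struct region){ 0x80000000ULL,  0x80000000ULL };
	mem[nregs++] = (struct region){ 0x100000000ULL, 0x80000000ULL };

	/* same order of preference as arm64_memblock_init() above */
	clip(kend > SZ_4G ? kend : SZ_4G, UINT64_MAX, limit);
	clip(SZ_4G, kbase, limit);
	clip(kend, SZ_4G, limit);
	clip(0, kbase < SZ_4G ? kbase : SZ_4G, limit);

	for (int i = 0; i < nregs; i++)
		printf("region: [%#llx-%#llx)\n",
		       (unsigned long long)mem[i].base,
		       (unsigned long long)(mem[i].base + mem[i].size));
	return 0;
}

Built with e.g. "gcc -std=c99 clip_demo.c", this prints the surviving
1 GB: roughly [0x80000000-0xbf800000) plus the 8 MB kernel image range
at [0x140000000-0x140800000).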