From mboxrd@z Thu Jan 1 00:00:00 1970
From: bhsharma@redhat.com (Bhupesh Sharma)
Date: Wed, 15 Nov 2017 16:28:55 +0530
Subject: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
In-Reply-To:
References: <20171113092730.GA29552@linaro.org>
Message-ID: <3df4c6c5-0abe-01ee-730d-2edaa5f497d2@redhat.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Ard, Akashi,

On 11/14/2017 04:50 PM, Ard Biesheuvel wrote:
> On 13 November 2017 at 09:27, AKASHI Takahiro wrote:
>> Hi,
>>
>> On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
>>> Resent with Akashi's correct email address.
>>>
>>> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma wrote:
>>>> Hi Ard, Akashi
>>>>
>>>> I have met an issue on an arm64 board using the latest master branch from Linus.
>> (snip)
>>>>
>>>> 8. Also, I think now the crashkernel handling changed by
>>>> e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
>>>> memblock regions explicitly in iomem), needs to be changed to handle
>>>> the change added by Ard to fix this issue on ACPI only machines.
>>>>
>>>> I have a dirty hack in place, but I would like to have your opinions
>>>> about what can be a more concrete fix to this issue (as we mark these
>>>> regions as System RAM now rather than NOMAP) and I don't have a DTB
>>>> based machine to test on currently.
>>
>> I don't know much about acpi reclaim regions,
>> can you please tell me how your change affects your panic case?

Sorry, I was away yesterday and couldn't get back to you with the details of the dirty hack. I see Ard has already proposed the following change. It looks similar to the change I made locally; however, that doesn't completely fix the issue at my end so far. Here are more details:

>
> Does this help at all?
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 7768423b39d3..61d867647cca 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
>
>         for_each_memblock(memory, region) {
>                 res = alloc_bootmem_low(sizeof(*res));
> -               if (memblock_is_nomap(region)) {
> +               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
>                         res->name = "reserved";
>                         res->flags = IORESOURCE_MEM;
>                 } else {
> ..

So, I tried using the 'memblock_is_reserved' check in 'request_standard_resources'; however, as 'memblock_is_reserved' expects a physical address as its input argument, I changed mine to something like this:

-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) || memblock_is_reserved(__pfn_to_phys(memblock_region_reserved_base_pfn(region)))) {

However, I see that I am still hitting the issue, and it is quite a peculiar one.

First, some more background on what is happening on this Huawei Taishan arm64 board that I have:

1a. I see from the boot logs that one of the ACPI tables (DSDT) is at physical address 0x39710000:

# dmesg | grep -i "DSDT"
[ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124)

1b. This DSDT table is correctly marked as ACPI Reclaim Memory; however, I see that just preceding this entry there is also a 'Boot Code' entry covering the range '0x0000396c0000-0x00003970ffff':

# dmesg | grep -B 2 -i "ACPI reclaim"
[ 0.000000] efi: 0x000039670000-0x0000396bffff [Runtime Code |RUN| | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]

2.
Now, I am not sure which kernel layer makes the following change (I am still digging into it), but I see that the 'Boot Code' and ACPI DSDT table regions are somehow merged into one memblock_region and appear as the range '396c0000-3975ffff' in the '/proc/iomem' interface:

# cat /proc/iomem | grep -A 2 -B 2 39
00000000-3961ffff : System RAM
  00080000-00b6ffff : Kernel code
  00cb0000-0167ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : System RAM
39760000-3976ffff : reserved
39770000-397affff : reserved
397b0000-3989ffff : reserved
398a0000-398bffff : reserved
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM

3. As to why this merged region appears as a System RAM area rather than a RESERVED one, the following code path explains it:

3a. The check we added in 'arch/arm64/kernel/setup.c' doesn't handle the ACPI DSDT table properly and does not mark it as 'RESERVED'. This is because 'memblock_is_reserved' calls 'memblock_search' internally, which is currently implemented as:

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
        unsigned int left = 0, right = type->cnt;

        do {
                unsigned int mid = (right + left) / 2;

                if (addr < type->regions[mid].base)
                        right = mid;
                else if (addr >= (type->regions[mid].base +
                                  type->regions[mid].size))
                        left = mid + 1;
                else
                        return mid;
        } while (left < right);
        return -1;
}

3b. The 'addr' passed to 'memblock_search', calculated via '__pfn_to_phys(memblock_region_memory_base_pfn(region))', is 0x396c0000 in this case (see the iomem entry in point 2 above), so we never see that this memblock is reserved for the ACPI DSDT entry at 0x39710000.

4. Now, when we run kexec-tools to load a crashdump kernel, it doesn't find an entry for the ACPI DSDT table in the reserved ranges (but instead finds it inside a System RAM range):

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
...
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 00000000396c0000 - 000000003975ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 0000000039770000 - 00000000397affff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398a0000 - 00000000398bffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
..

5. Now, when a crash is triggered to boot the crashkernel, we see it panic while trying to access the ACPI tables (note that the logs below have been snipped for clarity):

# echo c > /proc/sysrq-trigger
...
[ 419.495621] Bye!
...
[ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
...
[ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124)
...
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000010200000-0x00000000301fffff]
[ 0.000000] node 0: [mem 0x0000000039620000-0x00000000396bffff]
[ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff]
[ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff]
[ 0.000000] node 0: [mem 0x00000000398c0000-0x0000000039d3ffff]
[ 0.000000] node 0: [mem 0x000000003ed30000-0x000000003ed5ffff]
...
[ 0.039309] ACPI: Core revision 20170728
[ 0.044383] Unable to handle kernel paging request at virtual address ffff000009f10027
[ 0.052386] Mem abort info:
[ 0.055201] Exception class = DABT (current EL), IL = 32 bits
[ 0.061179] SET = 0, FnV = 0
[ 0.064258] EA = 0, S1PTW = 0
[ 0.067424] Data abort info:
[ 0.070326] ISV = 0, ISS = 0x00000021
[ 0.074195] CM = 0, WnR = 0
[ 0.077187] swapper pgtable: 64k pages, 48-bit VAs, pgd = ffff000009650000
[ 0.084133] [ffff000009f10027] *pgd=00000000301d0003, *pud=00000000301d0003, *pmd=00000000301c0003, *pte=00e8000039710707
[ 0.095215] Internal error: Oops: 96000021 [#1] SMP
[ 0.100139] Modules linked in:
[ 0.103219] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0+ #30
[ 0.109373] task: ffff000008d05580 task.stack: ffff000008cc0000
[ 0.115356] PC is at acpi_ns_lookup+0x25c/0x3c0
[ 0.119929] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[ 0.125117] pc : [] lr : [] pstate: 60000045
[ 0.132589] sp : ffff000008ccfb40
[ 0.135930] x29: ffff000008ccfb40 x28: ffff000008a9c18c
[ 0.141295] x27: ffff0000088be820 x26: 0000000000000000
[ 0.146659] x25: 000000000000001b x24: 0000000000000001
[ 0.152024] x23: 0000000000000001 x22: ffff000009f10027
[ 0.157389] x21: ffff000008ccfc50 x20: 0000000000000001
[ 0.162753] x19: 000000000000001b x18: 0000000000000005
[ 0.168117] x17: 0000000000000000 x16: 0000000000000000
[ 0.173481] x15: 0000000000000000 x14: 000000000000038e
[ 0.178846] x13: ffffffff00000000 x12: ffffffffffffffff
[ 0.184210] x11: 0000000000000006 x10: 00000000ffffff76
[ 0.189574] x9 : 000000000000005f x8 :
ffff800014670140
[ 0.194939] x7 : 0000000000000000 x6 : ffff000008ccfc50
[ 0.200303] x5 : ffff800012d45000 x4 : 0000000000000001
[ 0.205668] x3 : ffff000008ccfbe0 x2 : ffff0000095e3a00
[ 0.211032] x1 : ffff000009f10027 x0 : 0000000000000000
[ 0.216397] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[ 0.223166] Call trace:
[ 0.225629] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[ 0.232136] fa00: 0000000000000000 ffff000009f10027 ffff0000095e3a00 ffff000008ccfbe0
[ 0.240048] fa20: 0000000000000001 ffff800012d45000 ffff000008ccfc50 0000000000000000
[ 0.247960] fa40: ffff800014670140 000000000000005f 00000000ffffff76 0000000000000006
[ 0.255872] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[ 0.263785] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[ 0.271697] faa0: 0000000000000001 ffff000008ccfc50 ffff000009f10027 0000000000000001
[ 0.279609] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[ 0.287521] fae0: ffff000008a9c18c ffff000008ccfb40 ffff00000849d3c0 ffff000008ccfb40
[ 0.295433] fb00: ffff0000084a862c 0000000060000045 ffff000008ccfb40 ffff000008261918
[ 0.303345] fb20: ffffffffffffffff ffff0000087f193c ffff000008ccfb40 ffff0000084a862c
[ 0.311258] [] acpi_ns_lookup+0x25c/0x3c0
[ 0.316885] [] acpi_ds_load1_begin_op+0xa4/0x294
[ 0.323128] [] acpi_ps_build_named_op+0xc4/0x198
[ 0.329371] [] acpi_ps_create_op+0x14c/0x270
[ 0.335262] [] acpi_ps_parse_loop+0x188/0x5c8
[ 0.341241] [] acpi_ps_parse_aml+0xb0/0x2b8
[ 0.347044] [] acpi_ns_one_complete_parse+0x144/0x184
[ 0.353726] [] acpi_ns_parse_table+0x48/0x68
[ 0.359616] [] acpi_ns_load_table+0x4c/0xdc
[ 0.365420] [] acpi_tb_load_namespace+0xe4/0x264
[ 0.371664] [] acpi_load_tables+0x48/0xc0
[ 0.377292] [] acpi_early_init+0x9c/0xd0
[ 0.382832] [] start_kernel+0x3b4/0x43c

So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT table' ranges to be merged into a single region at
'396c0000-3975ffff' (see the iomem entry in point 2 above), which cannot be marked as RESERVED using 'memblock_is_reserved'.

Any pointers?

Regards,
Bhupesh
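
To illustrate point 3b, here is a minimal userspace sketch (not kernel code) of why a point lookup at the merged region's base address misses a reserved range that starts later inside that region. The reserved list and the helper names region_search(), addr_is_reserved() and range_overlaps_reserved() are assumptions made for illustration only; the bisection mirrors the memblock_search() quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of a memblock region: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * ASSUMPTION, for illustration only: the reserved list holds a single
 * entry covering the ACPI Reclaim range 0x39710000-0x3975ffff.
 */
static const struct region reserved[] = {
	{ 0x39710000ULL, 0x50000ULL },
};
static const size_t reserved_cnt = sizeof(reserved) / sizeof(reserved[0]);

/* Point lookup by bisection, modelled on memblock_search(). */
static int region_search(const struct region *r, size_t cnt, uint64_t addr)
{
	size_t left = 0, right = cnt;

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (addr < r[mid].base)
			right = mid;
		else if (addr >= r[mid].base + r[mid].size)
			left = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}

/* Point check: is this single address inside a reserved region? */
static bool addr_is_reserved(uint64_t addr)
{
	return region_search(reserved, reserved_cnt, addr) != -1;
}

/* Range check: does [base, base + size) overlap any reserved region? */
static bool range_overlaps_reserved(uint64_t base, uint64_t size)
{
	assert(size > 0);
	for (size_t i = 0; i < reserved_cnt; i++) {
		if (base < reserved[i].base + reserved[i].size &&
		    reserved[i].base < base + size)
			return true;
	}
	return false;
}
```

With these assumed values, addr_is_reserved(0x396c0000) is false, while range_overlaps_reserved(0x396c0000, 0xa0000) is true for the merged range 396c0000-3975ffff. The kernel has a range-based helper, memblock_is_region_reserved(base, size), that performs this kind of overlap test; whether using it in request_standard_resources() is the right fix is untested here, and it would also mark the 'Boot Code' part of the merged region as reserved.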