public inbox for linux-mips@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
       [not found]         ` <20210228230811.wdae7oaaf3mbpgwl@mobilestation>
@ 2021-03-01  3:50           ` Florian Fainelli
  2021-03-01  9:22             ` Serge Semin
  2021-03-01  9:45             ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
  0 siblings, 2 replies; 17+ messages in thread
From: Florian Fainelli @ 2021-03-01  3:50 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

Hi Serge,

On 2/28/2021 3:08 PM, Serge Semin wrote:
> Hi folks,
> What you've got here seems a more complicated problem than it
> could originally look like. Please, see my comments below.
> 
> (Note I've discarded some of the email logs, which of no interest
> to the discovered problem. Please also note that I haven't got any
> Broadcom hardware to test out a solution suggested below.)
> 
> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>> Hi Mike,
>>
>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>> Hi Florian,
>>>
>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>
> 
>>>> [...]
> 
>>>>
>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>
>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>> not see a warning like this one in the commit message, instead what
>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>> registered, and the system eventually not booting.
>>>>
>>>> The Device Tree is built-into the kernel image and resides at
>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>
>>>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Most likely the problem you've discovered has been there for quite
> some time. The patch you are referring to just caused it to be
> triggered by extending the early allocation range. See before that
> patch was accepted the early memory allocations had been performed
> in the range:
> [kernel_end, RAM_END].
> The patch changed that, so the early allocations are done within
> [RAM_START + PAGE_SIZE, RAM_END].
> 
> In normal situations it's safe to do that as long as all the critical
> memory regions (including the memory residing a space below the
> kernel) have been reserved. But as soon as a memory with some critical
> structures haven't been reserved, the kernel may allocate it to be used
> for instance for early initializations with obviously unpredictable but
> most of the times unpleasant consequences.
> 
>>>
>>> Apparently there is a memblock allocation in one of the functions called
>>> from arch_mem_init() between plat_mem_setup() and
>>> early_init_fdt_reserve_self().
> 
> Mike, alas according to the log provided by Florian that's not the reason
> of the problem. Please, see my considerations below.
> 
>> [...]
>>
>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>> Feb 28 10:01:50 PST 2021
>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>> [    0.000000] FPU revision is: 00130001
> 
>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
> 
> Here the memory has been added to the memblock allocator.
> 
>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> 
>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>> setup_arch+0x128/0x69c
> 
> Here the fdt memory has been reserved. (Note it's built into the
> kernel.)
> 
>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>> setup_arch+0x1f8/0x69c
> 
> Here the kernel itself together with built-in dtb have been reserved.
> So far so good.
> 
>> [    0.000000] Initrd not found or empty - disabling initrd
> 
>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>> memblock_alloc_range_nid+0xf8/0x198
> 
> The log above most likely belongs to the call-chain:
> setup_arch()
> +-> arch_mem_init()
>     +-> device_tree_init() - BMIPS specific method
>         +-> unflatten_and_copy_device_tree()
> 
> So to speak here we've copied the fdt from the original space
> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> it to [0x00003aa4-0x0000ba4b].
> 
> The problem is that a bit later the next call-chain is performed:
> setup_arch()
> +-> plat_smp_setup()
>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>         +-> if (!board_ebase_setup)
>                  board_ebase_setup = &bmips_ebase_setup;
> 
> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> method is called. What trap_init() does isn't compatible with the
> allocation performed by the unflatten_and_copy_device_tree() method.
> See the next comment.
> 
>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>> setup_arch+0x3fc/0x69c
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>> bytes.
>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>> linesize 32 bytes
>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Zone ranges:
>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000000000-0x00000000cfffffff]
>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000
>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] MEMBLOCK configuration:
>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>> [    0.000000]  memory.cnt  = 0x3
>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>> bytes flags: 0x0
>> [    0.000000]  reserved.cnt  = 0xa
>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>> bytes flags: 0x0
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>> pcpu_embed_first_chunk+0x838/0x884
>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>> pcpu_embed_first_chunk+0x850/0x884
>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>> bytes, linear)
>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>> bytes, linear)
> 
>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>> trap_init+0x70/0x4e8
> 
> Most likely someplace here the corruption has happened. The log above
> has just reserved a memory for NMI/reset vectors:
> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> 
> But then the board_ebase_setup() pointer is dereferenced and called,
> which has been initialized with bmips_ebase_setup() earlier and which
> overwrites the ebase variable with: 0x80001000 as this is
> CPU_BMIPS5000 CPU. So any further calls of the functions like
> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> corruption of the memory above 0x80001000, which as we have discovered
> belongs to fdt and unflattened device tree.
> 
>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>> cma-reserved, 1835008K highmem)
>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> [    0.000000] rcu: Hierarchical RCU implementation.
>> [    0.000000] rcu:     RCU event tracing is enabled.
>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>> is 25 jiffies.
>> [    0.000000] NR_IRQS: 256
> 
>> [    0.000000] OF: Bad cell count for /rdb
>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>> [    0.000000] OF: of_irq_init: children remain, but no parents
> 
> So here is the first time we have got the consequence of the corruption
> popped up. Luckily it's just the "Bad cells count" error. We could have
> got much less obvious log here up to getting a crash at some place
> further...
> 
>> [    0.000000] random: get_random_bytes called from
>> start_kernel+0x444/0x654 with crng_init=0
>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>> wraps every 8589934590000000ns
> 
>>
>> and with your patch applied which unfortunately did not work we have the
>> following:
>>
>> [...]
> 
> So a patch like this shall workaround the corruption:
> 
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {

This patch works, thanks a lot for the troubleshooting and analysis! How
about the following which would be more generic and works as well and
should be more universal since it does not require each architecture to
provide an appropriate call to memblock_reserve():

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..b0a173b500e8 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2367,10 +2367,7 @@ void __init trap_init(void)

        if (!cpu_has_mips_r2_r6) {
                ebase = CAC_BASE;
-               ebase_pa = virt_to_phys((void *)ebase);
                vec_size = 0x400;
-
-               memblock_reserve(ebase_pa, vec_size);
        } else {
                if (cpu_has_veic || cpu_has_vint)
                        vec_size = 0x200 + VECTORSPACING*64;
@@ -2410,6 +2407,14 @@ void __init trap_init(void)

        if (board_ebase_setup)
                board_ebase_setup();
+
+       /* board_ebase_setup() can change the exception base address
+        * reserve it now after changes were made.
+        */
+       if (!cpu_has_mips_r2_r6) {
+               ebase_pa = virt_to_phys((void *)ebase);
+               memblock_reserve(ebase_pa, vec_size);
+       }
        per_cpu_trap_init(true);
        memblock_set_bottom_up(false);
-- 
Florian

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  3:50           ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Florian Fainelli
@ 2021-03-01  9:22             ` Serge Semin
  2021-03-02  4:09               ` Florian Fainelli
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  2021-03-01  9:45             ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
  1 sibling, 2 replies; 17+ messages in thread
From: Serge Semin @ 2021-03-01  9:22 UTC (permalink / raw)
  To: Florian Fainelli, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >> setup_arch+0x3fc/0x69c
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >> bytes.
> >> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >> linesize 32 bytes
> >> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> >> [    0.000000] Initmem setup node 0 [mem
> >> 0x0000000000000000-0x00000000cfffffff]
> >> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000
> >> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] MEMBLOCK configuration:
> >> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> >> [    0.000000]  memory.cnt  = 0x3
> >> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved.cnt  = 0xa
> >> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> >> bytes flags: 0x0
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >> pcpu_embed_first_chunk+0x838/0x884
> >> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> >> pcpu_embed_first_chunk+0x850/0x884
> >> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> >> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >> bytes, linear)
> >> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 

> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():

Hm, are you sure it's working? If so, my analysis hasn't been quite
correct. My suggestion was based on the memory initializations,
allocations and reservations trace. So here is the sequence of most
crucial of them:
1) Memblock initialization:
   start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
   (At this point I suggested to place the exceptions memory
    reservation.)
2) Base FDT memory reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
   reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
4) Reserve kernel itself, some critical sections like initrd and
   crash-kernel:
   start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
5) Copy and unflatten the built-into the kernel device tree
   (BMIPS-platform code):
   start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
   This is the very first time an allocation from the memblock pool
   is performed. Since we haven't reserved a memory for the exception
   vectors yet, the memblock allocator is free to return that memory
   range for any other use. Needless to say if we try to use that memory
   later without consulting with memblock, we may and in our case
   will get into troubles.
6) Many random early memblock allocations for kernel use before
   buddy and sl*b allocators are up and running...
   Note if for some fortunate reason the allocations made in 5) didn't
   overlap the exceptions memory, here we have much more chances to
   do that with obviously fatal consequences of the ranges independent
   usage.
7) Trap/exception vectors initialization and !memory reservation! for
   them:
   start_kernel()->trap_init()
   Only at this point we get to reserve the memory for the vectors.
8) Init and run buddy/sl*b allocators:
   start_kernel()->mm_init()->...mem_init()...

There are a lot of allocations done in 5) and 6) before the
trap_init() is called in 7). You can see that in your log. That's why
I have doubts that your patch worked well. Most likely you've
forgotten to revert the workaround suggested by me in the previous
message. Could you make sure that you didn't and re-test your patch
again? If it still works then I might have confused something and it's
strange that my patch worked in the first place...

A food for thoughts for everyone (Thomas, Mark, please join the
discussion). What we've got here is a bit bigger problem. AFAICS
if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
performs the allocation above the very first PAGE_SIZE memory chunk
(see that method code for details). So we are currently on a safe side
for some older MIPS platforms. But the platform with VEIC/VINT may get
into the same troubles here if they didn't reserve exception memory
early enough before the kernel starts random allocations from
memblock. So we either need to provide a generic workaround for that
or make sure each platform gets to reserve vectors itself for instance
in the plat_mem_setup() method.

-Sergey

> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }
>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);
> -- 
> Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  3:50           ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Florian Fainelli
  2021-03-01  9:22             ` Serge Semin
@ 2021-03-01  9:45             ` Mike Rapoport
  2021-03-02  3:55               ` Roman Gushchin
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Rapoport @ 2021-03-01  9:45 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Serge Semin, Thomas Bogendoerfer, Serge Semin, Roman Gushchin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84

...

> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 
> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():
> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }

With this it's still possible to have memblock allocations around ebase_pa
before it is reserved.

I think we have two options here to solve it in more or less generic way:

* split the reservation of ebase from traps_init() and move it earlier to
setup_arch(). I didn't check what board_ebase_setup() do, if they need to
allocate memory it would not work.

* add an API to memblock to set lower limit for allocations and then set
the lower limit, to e.g. kernel load address in arch_mem_init(). This may
add complexity for configurations with relocatable kernel and kaslr.

>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  9:45             ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
@ 2021-03-02  3:55               ` Roman Gushchin
  2021-03-02 13:08                 ` Serge Semin
  0 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2021-03-02  3:55 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Florian Fainelli, Serge Semin, Thomas Bogendoerfer, Serge Semin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > Hi Serge,
> > 
> > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > Hi folks,
> > > What you've got here seems a more complicated problem than it
> > > could originally look like. Please, see my comments below.
> > > 
> > > (Note I've discarded some of the email logs, which of no interest
> > > to the discovered problem. Please also note that I haven't got any
> > > Broadcom hardware to test out a solution suggested below.)
> > > 
> > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > >> Hi Mike,
> > >>
> > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > >>> Hi Florian,
> > >>>
> > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > >>>>
> > > 
> > >>>> [...]
> > > 
> > >>>>
> > >>>> Hi Roman, Thomas and other linux-mips folks,
> > >>>>
> > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> > >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> > >>>> not see a warning like this one in the commit message, instead what
> > >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> > >>>> of the "rdb" node and leading to the interrupt controllers not being
> > >>>> registered, and the system eventually not booting.
> > >>>>
> > >>>> The Device Tree is built-into the kernel image and resides at
> > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > >>>>
> > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > > 
> > > Most likely the problem you've discovered has been there for quite
> > > some time. The patch you are referring to just caused it to be
> > > triggered by extending the early allocation range. See before that
> > > patch was accepted the early memory allocations had been performed
> > > in the range:
> > > [kernel_end, RAM_END].
> > > The patch changed that, so the early allocations are done within
> > > [RAM_START + PAGE_SIZE, RAM_END].
> > > 
> > > In normal situations it's safe to do that as long as all the critical
> > > memory regions (including the memory residing a space below the
> > > kernel) have been reserved. But as soon as a memory with some critical
> > > structures haven't been reserved, the kernel may allocate it to be used
> > > for instance for early initializations with obviously unpredictable but
> > > most of the times unpleasant consequences.
> > > 
> > >>>
> > >>> Apparently there is a memblock allocation in one of the functions called
> > >>> from arch_mem_init() between plat_mem_setup() and
> > >>> early_init_fdt_reserve_self().
> > > 
> > > Mike, alas according to the log provided by Florian that's not the reason
> > > of the problem. Please, see my considerations below.
> > > 
> > >> [...]
> > >>
> > >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> > >> Feb 28 10:01:50 PST 2021
> > >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> > >> [    0.000000] FPU revision is: 00130001
> > > 
> > >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > > 
> > > Here the memory has been added to the memblock allocator.
> > > 
> > >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> > >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> > >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > 
> > >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> > >> setup_arch+0x128/0x69c
> > > 
> > > Here the fdt memory has been reserved. (Note it's built into the
> > > kernel.)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> > >> setup_arch+0x1f8/0x69c
> > > 
> > > Here the kernel itself together with built-in dtb have been reserved.
> > > So far so good.
> > > 
> > >> [    0.000000] Initrd not found or empty - disabling initrd
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > > 
> > > The log above most likely belongs to the call-chain:
> > > setup_arch()
> > > +-> arch_mem_init()
> > >     +-> device_tree_init() - BMIPS specific method
> > >         +-> unflatten_and_copy_device_tree()
> > > 
> > > So to speak here we've copied the fdt from the original space
> > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > > it to [0x00003aa4-0x0000ba4b].
> > > 
> > > The problem is that a bit later the next call-chain is performed:
> > > setup_arch()
> > > +-> plat_smp_setup()
> > >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> > >         +-> if (!board_ebase_setup)
> > >                  board_ebase_setup = &bmips_ebase_setup;
> > > 
> > > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > > method is called. What trap_init() does isn't compatible with the
> > > allocation performed by the unflatten_and_copy_device_tree() method.
> > > See the next comment.
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> 
> ...
> 
> > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > >> bytes, linear)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > >> trap_init+0x70/0x4e8
> > > 
> > > Most likely someplace here the corruption has happened. The log above
> > > has just reserved a memory for NMI/reset vectors:
> > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > 
> > > But then the board_ebase_setup() pointer is dereferenced and called,
> > > which has been initialized with bmips_ebase_setup() earlier and which
> > > overwrites the ebase variable with: 0x80001000 as this is
> > > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > > corruption of the memory above 0x80001000, which as we have discovered
> > > belongs to fdt and unflattened device tree.
> > > 
> > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > >> cma-reserved, 1835008K highmem)
> > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > >> is 25 jiffies.
> > >> [    0.000000] NR_IRQS: 256
> > > 
> > >> [    0.000000] OF: Bad cell count for /rdb
> > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > 
> > > So here is the first time we have got the consequence of the corruption
> > > popped up. Luckily it's just the "Bad cells count" error. We could have
> > > got much less obvious log here up to getting a crash at some place
> > > further...
> > > 
> > >> [    0.000000] random: get_random_bytes called from
> > >> start_kernel+0x444/0x654 with crng_init=0
> > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > >> wraps every 8589934590000000ns
> > > 
> > >>
> > >> and with your patch applied which unfortunately did not work we have the
> > >> following:
> > >>
> > >> [...]
> > > 
> > > So a patch like this shall workaround the corruption:
> > > 
> > > --- a/arch/mips/bmips/setup.c
> > > +++ b/arch/mips/bmips/setup.c
> > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > >  
> > >  	__dt_setup_arch(dtb);
> > >  
> > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > +
> > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > >  					     q->compatible)) {
> > 
> > This patch works, thanks a lot for the troubleshooting and analysis! How
> > about the following which would be more generic and works as well and
> > should be more universal since it does not require each architecture to
> > provide an appropriate call to memblock_reserve():
> > 
> > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > index e0352958e2f7..b0a173b500e8 100644
> > --- a/arch/mips/kernel/traps.c
> > +++ b/arch/mips/kernel/traps.c
> > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > 
> >         if (!cpu_has_mips_r2_r6) {
> >                 ebase = CAC_BASE;
> > -               ebase_pa = virt_to_phys((void *)ebase);
> >                 vec_size = 0x400;
> > -
> > -               memblock_reserve(ebase_pa, vec_size);
> >         } else {
> >                 if (cpu_has_veic || cpu_has_vint)
> >                         vec_size = 0x200 + VECTORSPACING*64;
> > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > 
> >         if (board_ebase_setup)
> >                 board_ebase_setup();
> > +
> > +       /* board_ebase_setup() can change the exception base address
> > +        * reserve it now after changes were made.
> > +        */
> > +       if (!cpu_has_mips_r2_r6) {
> > +               ebase_pa = virt_to_phys((void *)ebase);
> > +               memblock_reserve(ebase_pa, vec_size);
> > +       }

Hi folks!

First, I'm really sorry for breaking things and also being silent for last
couple of days: I was almost completely offline. Thank you for working on
this!

> 
> With this it's still possible to have memblock allocations around ebase_pa
> before it is reserved.
> 
> I think we have two options here to solve it in more or less generic way:
> 
> * split the reservation of ebase from traps_init() and move it earlier to
> setup_arch(). I didn't check what board_ebase_setup() do, if they need to
> allocate memory it would not work.

It seems that it doesn't allocate any memory, so it sounds like a good option.
But doesn't the ebase initialization depend on the memblock allocator?

I see in trap_init():
    if (!cpu_has_mips_r2_r6) {
        ...
    } else {
        ...
	ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
	...
	if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
	    ebase = CKSEG0ADDR(ebase_pa);
        else
            ebase = (unsigned long)phys_to_virt(ebase_pa);


> 
> * add an API to memblock to set lower limit for allocations and then set
> the lower limit, to e.g. kernel load address in arch_mem_init(). This may
> add complexity for configurations with relocatable kernel and kaslr.

This option looks more like a workaround to me, but maybe it's ok too.

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  9:22             ` Serge Semin
@ 2021-03-02  4:09               ` Florian Fainelli
  2021-03-02 13:26                 ` Serge Semin
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Fainelli @ 2021-03-02  4:09 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE



On 3/1/2021 1:22 AM, Serge Semin wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
>> Hi Serge,
>>
>> On 2/28/2021 3:08 PM, Serge Semin wrote:
>>> Hi folks,
>>> What you've got here seems a more complicated problem than it
>>> could originally look like. Please, see my comments below.
>>>
>>> (Note I've discarded some of the email logs, which of no interest
>>> to the discovered problem. Please also note that I haven't got any
>>> Broadcom hardware to test out a solution suggested below.)
>>>
>>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>>>> Hi Mike,
>>>>
>>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>>>
>>>
>>>>>> [...]
>>>
>>>>>>
>>>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>>>
>>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>>>> not see a warning like this one in the commit message, instead what
>>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>>>> registered, and the system eventually not booting.
>>>>>>
>>>>>> The Device Tree is built-into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See before that
>>> patch was accepted the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing a space below the
>>> kernel) have been reserved. But as soon as a memory with some critical
>>> structures haven't been reserved, the kernel may allocate it to be used
>>> for instance for early initializations with obviously unpredictable but
>>> most of the times unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas according to the log provided by Florian that's not the reason
>>> of the problem. Please, see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [    0.000000] FPU revision is: 00130001
>>>
>>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with built-in dtb have been reserved.
>>> So far so good.
>>>
>>>> [    0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                  board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called. What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000] Early memory node ranges
>>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [    0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] MEMBLOCK configuration:
>>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>>>> [    0.000000]  memory.cnt  = 0x3
>>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved.cnt  = 0xa
>>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely someplace here the corruption has happened. The log above
>>> has just reserved a memory for NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called,
>>> which has been initialized with bmips_ebase_setup() earlier and which
>>> overwrites the ebase variable with: 0x80001000 as this is
>>> CPU_BMIPS5000 CPU. So any further calls of the functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
>>> corruption of the memory above 0x80001000, which as we have discovered
>>> belongs to fdt and unflattened device tree.
>>>
>>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [    0.000000] rcu: Hierarchical RCU implementation.
>>>> [    0.000000] rcu:     RCU event tracing is enabled.
>>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [    0.000000] NR_IRQS: 256
>>>
>>>> [    0.000000] OF: Bad cell count for /rdb
>>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [    0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So here is the first time we have got the consequence of the corruption
>>> popped up. Luckily it's just the "Bad cells count" error. We could have
>>> got much less obvious log here up to getting a crash at some place
>>> further...
>>>
>>>> [    0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied which unfortunately did not work we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this shall workaround the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>  
>>>  	__dt_setup_arch(dtb);
>>>  
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
> 
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following which would be more generic and works as well and
>> should be more universal since it does not require each architecture to
>> provide an appropriate call to memblock_reserve():
> 
> Hm, are you sure it's working?

I was until I noticed that I was working on top of a revert of Roman's
patch sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the memory initializations,
> allocations and reservations trace. So here is the sequence of most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (At this point I suggested to place the exceptions memory
>     reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve kernel itself, some critical sections like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the built-into the kernel device tree
>    (BMIPS-platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved a memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say if we try to use that memory
>    later without consulting with memblock, we may and in our case
>    will get into troubles.
> 6) Many random early memblock allocations for kernel use before
>    buddy and sl*b allocators are up and running...
>    Note if for some fortunate reason the allocations made in 5) didn't
>    overlap the exceptions memory, here we have much more chances to
>    do that with obviously fatal consequences of the ranges independent
>    usage.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
> 
> There are a lot of allocations done in 5) and 6) before the
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well. Most likely you've
> forgotten to revert the workaround suggested by me in the previous
> message. Could you make sure that you didn't and re-test your patch
> again? If it still works then I might have confused something and it's
> strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it back ported into
5.11 so we have BMIPS machines boot again, that will be essentially your
earlier proposed fix.

BMIPS is the only "legacy" MIPS platform that defines an exception base,
so while this problem may certainly exist with other platforms, I do
wonder how likely it is there, though?

> 
> A food for thoughts for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a bit bigger problem. AFAICS
> if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that method code for details). So we are currently on a safe side
> for some older MIPS platforms. But the platform with VEIC/VINT may get
> into the same troubles here if they didn't reserve exception memory
> early enough before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that
> or make sure each platform gets to reserve vectors itself for instance
> in the plat_mem_setup() method.
> 
> -Sergey
> 
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>         if (!cpu_has_mips_r2_r6) {
>>                 ebase = CAC_BASE;
>> -               ebase_pa = virt_to_phys((void *)ebase);
>>                 vec_size = 0x400;
>> -
>> -               memblock_reserve(ebase_pa, vec_size);
>>         } else {
>>                 if (cpu_has_veic || cpu_has_vint)
>>                         vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>         if (board_ebase_setup)
>>                 board_ebase_setup();
>> +
>> +       /* board_ebase_setup() can change the exception base address
>> +        * reserve it now after changes were made.
>> +        */
>> +       if (!cpu_has_mips_r2_r6) {
>> +               ebase_pa = virt_to_phys((void *)ebase);
>> +               memblock_reserve(ebase_pa, vec_size);
>> +       }
>>         per_cpu_trap_init(true);
>>         memblock_set_bottom_up(false);
>> -- 
>> Florian

-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-01  9:22             ` Serge Semin
  2021-03-02  4:09               ` Florian Fainelli
@ 2021-03-02  4:19               ` Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
                                   ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Florian Fainelli @ 2021-03-02  4:19 UTC (permalink / raw)
  To: linux-mips
  Cc: rppt, fancer.lancer, guro, akpm, paul, Florian Fainelli,
	Serge Semin, Kamal Dasu, Thomas Bogendoerfer, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception that is going to be installed and this needs to happen at
plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
finds a space that is suitable, away from reserved memory.

Huge thanks to Serget for analysing and proposing a solution to this
issue.

Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Thomas,

This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
by the stable team for 5.11. We should find a safer way to avoid these
problems for 5.13 maybe.

 arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
 arch/mips/include/asm/traps.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index 31bcfa4e08b9..0088bd45b892 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -149,6 +149,26 @@ void __init plat_time_init(void)
 	mips_hpt_frequency = freq;
 }
 
+static void __init bmips_ebase_reserve(void)
+{
+	phys_addr_t base, size = VECTORSPACING * 64;
+
+	switch (current_cpu_type()) {
+	default:
+	case CPU_BMIPS4350:
+		return;
+	case CPU_BMIPS3300:
+	case CPU_BMIPS4380:
+		base = 0x0400;
+		break;
+	case CPU_BMIPS5000:
+		base = 0x1000;
+		break;
+	}
+
+	memblock_reserve(base, size);
+}
+
 void __init plat_mem_setup(void)
 {
 	void *dtb;
@@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	bmips_ebase_reserve();
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..0ba6bb7f9618 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -14,6 +14,8 @@
 #define MIPS_BE_FIXUP	1		/* return to the fixup code */
 #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 extern void (*board_be_init)(void);
 extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
@ 2021-03-02  8:09                 ` Mike Rapoport
  2021-03-02 13:54                 ` Serge Semin
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Rapoport @ 2021-03-02  8:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.
> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  
> +#define VECTORSPACING 0x100	/* for EI/VI mode */
> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-02  3:55               ` Roman Gushchin
@ 2021-03-02 13:08                 ` Serge Semin
  0 siblings, 0 replies; 17+ messages in thread
From: Serge Semin @ 2021-03-02 13:08 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Serge Semin, Mike Rapoport, Florian Fainelli, Thomas Bogendoerfer,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 07:55:21PM -0800, Roman Gushchin wrote:
> On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > > Hi Serge,
> > > 
> > > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > > Hi folks,
> > > > What you've got here seems a more complicated problem than it
> > > > could originally look like. Please, see my comments below.
> > > > 
> > > > (Note I've discarded some of the email logs, which of no interest
> > > > to the discovered problem. Please also note that I haven't got any
> > > > Broadcom hardware to test out a solution suggested below.)
> > > > 
> > > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > > >> Hi Mike,
> > > >>
> > > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > > >>> Hi Florian,
> > > >>>
> > > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > > >>>>
> > > > 
> > > >>>> [...]
> > > > 
> > > >>>>
> > > >>>> Hi Roman, Thomas and other linux-mips folks,
> > > >>>>
> > > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> > > >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> > > >>>> not see a warning like this one in the commit message, instead what
> > > >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> > > >>>> of the "rdb" node and leading to the interrupt controllers not being
> > > >>>> registered, and the system eventually not booting.
> > > >>>>
> > > >>>> The Device Tree is built-into the kernel image and resides at
> > > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > > >>>>
> > > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > > > 
> > > > Most likely the problem you've discovered has been there for quite
> > > > some time. The patch you are referring to just caused it to be
> > > > triggered by extending the early allocation range. See before that
> > > > patch was accepted the early memory allocations had been performed
> > > > in the range:
> > > > [kernel_end, RAM_END].
> > > > The patch changed that, so the early allocations are done within
> > > > [RAM_START + PAGE_SIZE, RAM_END].
> > > > 
> > > > In normal situations it's safe to do that as long as all the critical
> > > > memory regions (including the memory residing a space below the
> > > > kernel) have been reserved. But as soon as a memory with some critical
> > > > structures haven't been reserved, the kernel may allocate it to be used
> > > > for instance for early initializations with obviously unpredictable but
> > > > most of the times unpleasant consequences.
> > > > 
> > > >>>
> > > >>> Apparently there is a memblock allocation in one of the functions called
> > > >>> from arch_mem_init() between plat_mem_setup() and
> > > >>> early_init_fdt_reserve_self().
> > > > 
> > > > Mike, alas according to the log provided by Florian that's not the reason
> > > > of the problem. Please, see my considerations below.
> > > > 
> > > >> [...]
> > > >>
> > > >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> > > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> > > >> Feb 28 10:01:50 PST 2021
> > > >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> > > >> [    0.000000] FPU revision is: 00130001
> > > > 
> > > >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > > 
> > > > Here the memory has been added to the memblock allocator.
> > > > 
> > > >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> > > >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> > > >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> > > >> setup_arch+0x128/0x69c
> > > > 
> > > > Here the fdt memory has been reserved. (Note it's built into the
> > > > kernel.)
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> > > >> setup_arch+0x1f8/0x69c
> > > > 
> > > > Here the kernel itself together with built-in dtb have been reserved.
> > > > So far so good.
> > > > 
> > > >> [    0.000000] Initrd not found or empty - disabling initrd
> > > > 
> > > >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > > >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> > > >> memblock_alloc_range_nid+0xf8/0x198
> > > >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > > >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> > > >> memblock_alloc_range_nid+0xf8/0x198
> > > > 
> > > > The log above most likely belongs to the call-chain:
> > > > setup_arch()
> > > > +-> arch_mem_init()
> > > >     +-> device_tree_init() - BMIPS specific method
> > > >         +-> unflatten_and_copy_device_tree()
> > > > 
> > > > So to speak here we've copied the fdt from the original space
> > > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > > > it to [0x00003aa4-0x0000ba4b].
> > > > 
> > > > The problem is that a bit later the next call-chain is performed:
> > > > setup_arch()
> > > > +-> plat_smp_setup()
> > > >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> > > >         +-> if (!board_ebase_setup)
> > > >                  board_ebase_setup = &bmips_ebase_setup;
> > > > 
> > > > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > > > method is called. What trap_init() does isn't compatible with the
> > > > allocation performed by the unflatten_and_copy_device_tree() method.
> > > > See the next comment.
> > > > 
> > > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > 
> > ...
> > 
> > > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > > >> bytes, linear)
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > > >> trap_init+0x70/0x4e8
> > > > 
> > > > Most likely someplace here the corruption has happened. The log above
> > > > has just reserved a memory for NMI/reset vectors:
> > > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > > 
> > > > But then the board_ebase_setup() pointer is dereferenced and called,
> > > > which has been initialized with bmips_ebase_setup() earlier and which
> > > > overwrites the ebase variable with: 0x80001000 as this is
> > > > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > > > corruption of the memory above 0x80001000, which as we have discovered
> > > > belongs to fdt and unflattened device tree.
> > > > 
> > > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > > >> cma-reserved, 1835008K highmem)
> > > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > > >> is 25 jiffies.
> > > >> [    0.000000] NR_IRQS: 256
> > > > 
> > > >> [    0.000000] OF: Bad cell count for /rdb
> > > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > > 
> > > > So here is the first time we have got the consequence of the corruption
> > > > popped up. Luckily it's just the "Bad cells count" error. We could have
> > > > got much less obvious log here up to getting a crash at some place
> > > > further...
> > > > 
> > > >> [    0.000000] random: get_random_bytes called from
> > > >> start_kernel+0x444/0x654 with crng_init=0
> > > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > > >> wraps every 8589934590000000ns
> > > > 
> > > >>
> > > >> and with your patch applied which unfortunately did not work we have the
> > > >> following:
> > > >>
> > > >> [...]
> > > > 
> > > > So a patch like this shall workaround the corruption:
> > > > 
> > > > --- a/arch/mips/bmips/setup.c
> > > > +++ b/arch/mips/bmips/setup.c
> > > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > > >  
> > > >  	__dt_setup_arch(dtb);
> > > >  
> > > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > > +
> > > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > > >  					     q->compatible)) {
> > > 
> > > This patch works, thanks a lot for the troubleshooting and analysis! How
> > > about the following which would be more generic and works as well and
> > > should be more universal since it does not require each architecture to
> > > provide an appropriate call to memblock_reserve():
> > > 
> > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > > index e0352958e2f7..b0a173b500e8 100644
> > > --- a/arch/mips/kernel/traps.c
> > > +++ b/arch/mips/kernel/traps.c
> > > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > > 
> > >         if (!cpu_has_mips_r2_r6) {
> > >                 ebase = CAC_BASE;
> > > -               ebase_pa = virt_to_phys((void *)ebase);
> > >                 vec_size = 0x400;
> > > -
> > > -               memblock_reserve(ebase_pa, vec_size);
> > >         } else {
> > >                 if (cpu_has_veic || cpu_has_vint)
> > >                         vec_size = 0x200 + VECTORSPACING*64;
> > > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > > 
> > >         if (board_ebase_setup)
> > >                 board_ebase_setup();
> > > +
> > > +       /* board_ebase_setup() can change the exception base address
> > > +        * reserve it now after changes were made.
> > > +        */
> > > +       if (!cpu_has_mips_r2_r6) {
> > > +               ebase_pa = virt_to_phys((void *)ebase);
> > > +               memblock_reserve(ebase_pa, vec_size);
> > > +       }
> 
> Hi folks!
> 
> First, I'm really sorry for breaking things and also being silent for last
> couple of days: I was almost completely offline. Thank you for working on
> this!
> 
> > 
> > With this it's still possible to have memblock allocations around ebase_pa
> > before it is reserved.
> > 
> > I think we have two options here to solve it in more or less generic way:
> > 
> > * split the reservation of ebase from traps_init() and move it earlier to
> > setup_arch(). I didn't check what board_ebase_setup() do, if they need to
> > allocate memory it would not work.
> 

> It seems that it doesn't allocate any memory, so it sounds like a good option.
> But doesn't the ebase initialization depend on the memblock allocator?
> 
> I see in trap_init():
>     if (!cpu_has_mips_r2_r6) {
>         ...
>     } else {
>         ...
> 	ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
> 	...
> 	if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
> 	    ebase = CKSEG0ADDR(ebase_pa);
>         else
>             ebase = (unsigned long)phys_to_virt(ebase_pa);

Yeap, this seems like the best option for now. Of course we need to
reserve the memory only if the system needs that like in case of non
MIPS R2-R5 archs. In addition a custom ebase value must be taken into
account. The later is the hardest part to achieve. ebase is a global
variable. So we need to thoroughly scan all the MIPS platforms which
update it and make sure it's done before the reservation is
performed. 

> 
> 
> > 
> > * add an API to memblock to set lower limit for allocations and then set
> > the lower limit, to e.g. kernel load address in arch_mem_init(). This may
> > add complexity for configurations with relocatable kernel and kaslr.
> 
> This option looks more like a workaround to me, but maybe it's ok too.

Agree. The first one is better.

-Sergey

> 
> Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-02  4:09               ` Florian Fainelli
@ 2021-03-02 13:26                 ` Serge Semin
  0 siblings, 0 replies; 17+ messages in thread
From: Serge Semin @ 2021-03-02 13:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Serge Semin, Mike Rapoport, Thomas Bogendoerfer, Roman Gushchin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 08:09:52PM -0800, Florian Fainelli wrote:
> 
> 
> On 3/1/2021 1:22 AM, Serge Semin wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> >> Hi Serge,
> >>
> >> On 2/28/2021 3:08 PM, Serge Semin wrote:
> >>> Hi folks,
> >>> What you've got here seems a more complicated problem than it
> >>> could originally look like. Please, see my comments below.
> >>>
> >>> (Note I've discarded some of the email logs, which of no interest
> >>> to the discovered problem. Please also note that I haven't got any
> >>> Broadcom hardware to test out a solution suggested below.)
> >>>
> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >>>> Hi Mike,
> >>>>
> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>>>> Hi Florian,
> >>>>>
> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>>>
> >>>
> >>>>>> [...]
> >>>
> >>>>>>
> >>>>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>>>
> >>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>>>> not see a warning like this one in the commit message, instead what
> >>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>>>> registered, and the system eventually not booting.
> >>>>>>
> >>>>>> The Device Tree is built-into the kernel image and resides at
> >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>>>
> >>>>>> Do you have any idea what could be wrong with MIPS specifically here?
> >>>
> >>> Most likely the problem you've discovered has been there for quite
> >>> some time. The patch you are referring to just caused it to be
> >>> triggered by extending the early allocation range. See before that
> >>> patch was accepted the early memory allocations had been performed
> >>> in the range:
> >>> [kernel_end, RAM_END].
> >>> The patch changed that, so the early allocations are done within
> >>> [RAM_START + PAGE_SIZE, RAM_END].
> >>>
> >>> In normal situations it's safe to do that as long as all the critical
> >>> memory regions (including the memory residing a space below the
> >>> kernel) have been reserved. But as soon as a memory with some critical
> >>> structures haven't been reserved, the kernel may allocate it to be used
> >>> for instance for early initializations with obviously unpredictable but
> >>> most of the times unpleasant consequences.
> >>>
> >>>>>
> >>>>> Apparently there is a memblock allocation in one of the functions called
> >>>>> from arch_mem_init() between plat_mem_setup() and
> >>>>> early_init_fdt_reserve_self().
> >>>
> >>> Mike, alas according to the log provided by Florian that's not the reason
> >>> of the problem. Please, see my considerations below.
> >>>
> >>>> [...]
> >>>>
> >>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >>>> Feb 28 10:01:50 PST 2021
> >>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >>>> [    0.000000] FPU revision is: 00130001
> >>>
> >>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>
> >>> Here the memory has been added to the memblock allocator.
> >>>
> >>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >>>> setup_arch+0x128/0x69c
> >>>
> >>> Here the fdt memory has been reserved. (Note it's built into the
> >>> kernel.)
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >>>> setup_arch+0x1f8/0x69c
> >>>
> >>> Here the kernel itself together with built-in dtb have been reserved.
> >>> So far so good.
> >>>
> >>>> [    0.000000] Initrd not found or empty - disabling initrd
> >>>
> >>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>
> >>> The log above most likely belongs to the call-chain:
> >>> setup_arch()
> >>> +-> arch_mem_init()
> >>>     +-> device_tree_init() - BMIPS specific method
> >>>         +-> unflatten_and_copy_device_tree()
> >>>
> >>> So to speak here we've copied the fdt from the original space
> >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> >>> it to [0x00003aa4-0x0000ba4b].
> >>>
> >>> The problem is that a bit later the next call-chain is performed:
> >>> setup_arch()
> >>> +-> plat_smp_setup()
> >>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >>>         +-> if (!board_ebase_setup)
> >>>                  board_ebase_setup = &bmips_ebase_setup;
> >>>
> >>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> >>> method is called. What trap_init() does isn't compatible with the
> >>> allocation performed by the unflatten_and_copy_device_tree() method.
> >>> See the next comment.
> >>>
> >>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >>>> setup_arch+0x3fc/0x69c
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >>>> bytes.
> >>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >>>> linesize 32 bytes
> >>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Zone ranges:
> >>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> >>>> [    0.000000] Movable zone start for each node
> >>>> [    0.000000] Early memory node ranges
> >>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> >>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> >>>> [    0.000000] Initmem setup node 0 [mem
> >>>> 0x0000000000000000-0x00000000cfffffff]
> >>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] MEMBLOCK configuration:
> >>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> >>>> [    0.000000]  memory.cnt  = 0x3
> >>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved.cnt  = 0xa
> >>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> >>>> bytes flags: 0x0
> >>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >>>> pcpu_embed_first_chunk+0x838/0x884
> >>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> >>>> pcpu_embed_first_chunk+0x850/0x884
> >>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> >>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >>>> bytes, linear)
> >>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >>>> bytes, linear)
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >>>> trap_init+0x70/0x4e8
> >>>
> >>> Most likely someplace here the corruption has happened. The log above
> >>> has just reserved a memory for NMI/reset vectors:
> >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> >>>
> >>> But then the board_ebase_setup() pointer is dereferenced and called,
> >>> which has been initialized with bmips_ebase_setup() earlier and which
> >>> overwrites the ebase variable with: 0x80001000 as this is
> >>> CPU_BMIPS5000 CPU. So any further calls of the functions like
> >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> >>> corruption of the memory above 0x80001000, which as we have discovered
> >>> belongs to fdt and unflattened device tree.
> >>>
> >>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >>>> cma-reserved, 1835008K highmem)
> >>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >>>> [    0.000000] rcu: Hierarchical RCU implementation.
> >>>> [    0.000000] rcu:     RCU event tracing is enabled.
> >>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >>>> is 25 jiffies.
> >>>> [    0.000000] NR_IRQS: 256
> >>>
> >>>> [    0.000000] OF: Bad cell count for /rdb
> >>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >>>> [    0.000000] OF: of_irq_init: children remain, but no parents
> >>>
> >>> So here is the first time we have got the consequence of the corruption
> >>> popped up. Luckily it's just the "Bad cells count" error. We could have
> >>> got much less obvious log here up to getting a crash at some place
> >>> further...
> >>>
> >>>> [    0.000000] random: get_random_bytes called from
> >>>> start_kernel+0x444/0x654 with crng_init=0
> >>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >>>> wraps every 8589934590000000ns
> >>>
> >>>>
> >>>> and with your patch applied which unfortunately did not work we have the
> >>>> following:
> >>>>
> >>>> [...]
> >>>
> >>> So a patch like this shall workaround the corruption:
> >>>
> >>> --- a/arch/mips/bmips/setup.c
> >>> +++ b/arch/mips/bmips/setup.c
> >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >>>  
> >>>  	__dt_setup_arch(dtb);
> >>>  
> >>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> >>> +
> >>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >>>  					     q->compatible)) {
> >>
> > 
> >> This patch works, thanks a lot for the troubleshooting and analysis! How
> >> about the following which would be more generic and works as well and
> >> should be more universal since it does not require each architecture to
> >> provide an appropriate call to memblock_reserve():
> > 
> > Hm, are you sure it's working?
> 
> I was until I noticed that I was working on top of a revert of Roman's
> patch sorry about the brain fart here.
> 
> > If so, my analysis hasn't been quite
> > correct. My suggestion was based on the memory initializations,
> > allocations and reservations trace. So here is the sequence of most
> > crucial of them:
> > 1) Memblock initialization:
> >    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
> >    (At this point I suggested to place the exceptions memory
> >     reservation.)
> > 2) Base FDT memory reservation:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> > 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
> >    reservation:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> > 4) Reserve kernel itself, some critical sections like initrd and
> >    crash-kernel:
> >    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> > 5) Copy and unflatten the built-into the kernel device tree
> >    (BMIPS-platform code):
> >    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
> >    This is the very first time an allocation from the memblock pool
> >    is performed. Since we haven't reserved a memory for the exception
> >    vectors yet, the memblock allocator is free to return that memory
> >    range for any other use. Needless to say if we try to use that memory
> >    later without consulting with memblock, we may and in our case
> >    will get into troubles.
> > 6) Many random early memblock allocations for kernel use before
> >    buddy and sl*b allocators are up and running...
> >    Note if for some fortunate reason the allocations made in 5) didn't
> >    overlap the exceptions memory, here we have much more chances to
> >    do that with obviously fatal consequences of the ranges independent
> >    usage.
> > 7) Trap/exception vectors initialization and !memory reservation! for
> >    them:
> >    start_kernel()->trap_init()
> >    Only at this point we get to reserve the memory for the vectors.
> > 8) Init and run buddy/sl*b allocators:
> >    start_kernel()->mm_init()->...mem_init()...
> > 
> > There are a lot of allocations done in 5) and 6) before the
> > trap_init() is called in 7). You can see that in your log. That's why
> > I have doubts that your patch worked well. Most likely you've
> > forgotten to revert the workaround suggested by me in the previous
> > message. Could you make sure that you didn't and re-test your patch
> > again? If it still works then I might have confused something and it's
> > strange that my patch worked in the first place...
> 

> I would like to submit a fix for 5.12-rc1 and get it back ported into
> 5.11 so we have BMIPS machines boot again, that will be essentially your
> earlier proposed fix.
> 
> BMIPS is the only "legacy" MIPS platform that defines an exception base,
> so while this problem may certainly exist with other platforms, I do
> wonder how likely it is there, though?

Hm, at least we can be sure that the problem exists for each platform,
which conforms to the !cpu_has_mips_r2_r6 condition and which have VEIC/
VINT capability. Those platforms may get out of the first PAGE_SIZE
memory in initializing the exceptions table thus corrupting the memory
possibly allocated for something else. In my case the problem doesn't
manifest itself because the CPU is MIPS32r5.

-Sergey

> 
> > 
> > A food for thoughts for everyone (Thomas, Mark, please join the
> > discussion). What we've got here is a bit bigger problem. AFAICS
> > if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
> > performs the allocation above the very first PAGE_SIZE memory chunk
> > (see that method code for details). So we are currently on a safe side
> > for some older MIPS platforms. But the platform with VEIC/VINT may get
> > into the same troubles here if they didn't reserve exception memory
> > early enough before the kernel starts random allocations from
> > memblock. So we either need to provide a generic workaround for that
> > or make sure each platform gets to reserve vectors itself for instance
> > in the plat_mem_setup() method.
> > 
> > -Sergey
> > 
> >>
> >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> >> index e0352958e2f7..b0a173b500e8 100644
> >> --- a/arch/mips/kernel/traps.c
> >> +++ b/arch/mips/kernel/traps.c
> >> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> >>
> >>         if (!cpu_has_mips_r2_r6) {
> >>                 ebase = CAC_BASE;
> >> -               ebase_pa = virt_to_phys((void *)ebase);
> >>                 vec_size = 0x400;
> >> -
> >> -               memblock_reserve(ebase_pa, vec_size);
> >>         } else {
> >>                 if (cpu_has_veic || cpu_has_vint)
> >>                         vec_size = 0x200 + VECTORSPACING*64;
> >> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> >>
> >>         if (board_ebase_setup)
> >>                 board_ebase_setup();
> >> +
> >> +       /* board_ebase_setup() can change the exception base address
> >> +        * reserve it now after changes were made.
> >> +        */
> >> +       if (!cpu_has_mips_r2_r6) {
> >> +               ebase_pa = virt_to_phys((void *)ebase);
> >> +               memblock_reserve(ebase_pa, vec_size);
> >> +       }
> >>         per_cpu_trap_init(true);
> >>         memblock_set_bottom_up(false);
> >> -- 
> >> Florian
> 
> -- 
> Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
@ 2021-03-02 13:54                 ` Serge Semin
  2021-03-02 19:04                 ` Roman Gushchin
  2021-03-02 23:54                 ` Thomas Bogendoerfer
  3 siblings, 0 replies; 17+ messages in thread
From: Serge Semin @ 2021-03-02 13:54 UTC (permalink / raw)
  To: Florian Fainelli, Thomas Bogendoerfer
  Cc: Serge Semin, Mike Rapoport, linux-mips, guro, akpm, paul,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")

> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>

I'd change the order of these two tags... 

> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---

> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.

Thomas, could you join the discussion? If we had a more clever
solution to reserve the exceptions table for each possibly affected
platform this patch could have been omitted.

> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  

> +#define VECTORSPACING 0x100	/* for EI/VI mode */

What about the same macro declared in arch/mips/kernel/traps.c? I'd suggest
to remove it from there and explicitly #include this header file into
the arch/mips/bmips/setup.c file.

-Sergey

> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
  2021-03-02 13:54                 ` Serge Semin
@ 2021-03-02 19:04                 ` Roman Gushchin
  2021-03-02 23:54                 ` Thomas Bogendoerfer
  3 siblings, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-03-02 19:04 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, akpm, paul, Serge Semin,
	Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thank you!

> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.
> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  
> +#define VECTORSPACING 0x100	/* for EI/VI mode */
> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
                                   ` (2 preceding siblings ...)
  2021-03-02 19:04                 ` Roman Gushchin
@ 2021-03-02 23:54                 ` Thomas Bogendoerfer
  2021-03-03  1:30                   ` Florian Fainelli
  3 siblings, 1 reply; 17+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-02 23:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.

let's try to make it in one ago. Hwo about reserving vector space in
cpu_probe, if it's known there and leave the rest to trap_init() ?

Below patch got a quick test on IP22 (real hardware) and malta (qemu).
Not sure, if I got all BMIPS parts correct, so please check/test.
BTW. do we really need to EXPORT_SYMBOL ebase ?

Thomas,


diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h
index bb36a400203d..3ef62c23c34f 100644
--- a/arch/mips/include/asm/setup.h
+++ b/arch/mips/include/asm/setup.h
@@ -23,7 +23,7 @@ typedef void (*vi_handler_t)(void);
 extern void *set_vi_handler(int n, vi_handler_t addr);
 
 extern void *set_except_vector(int n, void *addr);
-extern unsigned long ebase;
+extern unsigned long ebase, ebase_size;
 extern unsigned int hwrena;
 extern void per_cpu_trap_init(bool);
 extern void cpu_cache_init(void);
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..f7d59831aae3 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -26,6 +26,8 @@ extern void (*board_cache_error_setup)(void);
 extern int register_nmi_notifier(struct notifier_block *nb);
 extern char except_vec_nmi[];
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 #define nmi_notifier(fn, pri)						\
 ({									\
 	static struct notifier_block fn##_nb = {			\
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..eef1a4e304da 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -13,6 +13,7 @@
 #include <linux/smp.h>
 #include <linux/stddef.h>
 #include <linux/export.h>
+#include <linux/memblock.h>
 
 #include <asm/bugs.h>
 #include <asm/cpu.h>
@@ -25,7 +26,9 @@
 #include <asm/watch.h>
 #include <asm/elf.h>
 #include <asm/pgtable-bits.h>
+#include <asm/setup.h>
 #include <asm/spram.h>
+#include <asm/traps.h>
 #include <linux/uaccess.h>
 
 #include "fpu-probe.h"
@@ -1628,6 +1631,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 		c->cputype = CPU_BMIPS3300;
 		__cpu_name[cpu] = "Broadcom BMIPS3300";
 		set_elf_platform(cpu, "bmips3300");
+		ebase = 0x80000400;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	case PRID_IMP_BMIPS43XX: {
 		int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1643,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "Broadcom BMIPS4380";
 			set_elf_platform(cpu, "bmips4380");
 			c->options |= MIPS_CPU_RIXI;
+			ebase = 0x80000400;
+			ebase_size = VECTORSPACING * 64;
 		} else {
 			c->cputype = CPU_BMIPS4350;
 			__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1661,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "Broadcom BMIPS5000";
 		set_elf_platform(cpu, "bmips5000");
 		c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+		ebase = 0x80001000;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	}
 }
@@ -2133,6 +2142,13 @@ void cpu_probe(void)
 	if (cpu == 0)
 		__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+	if (ebase_size == 0 && !cpu_has_mips_r2_r6) {
+		ebase = CAC_BASE;
+		ebase_size = 0x400;
+	}
+	if (ebase_size)
+		memblock_reserve(__pa((void *)ebase), ebase_size);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index b6ef5f7312cf..ad3f2282a65a 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -528,10 +528,6 @@ static void bmips_set_reset_vec(int cpu, u32 val)
 
 void bmips_ebase_setup(void)
 {
-	unsigned long new_ebase = ebase;
-
-	BUG_ON(ebase != CKSEG0);
-
 	switch (current_cpu_type()) {
 	case CPU_BMIPS4350:
 		/*
@@ -554,7 +550,6 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_0400: normal vectors
 		 */
-		new_ebase = 0x80000400;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
 		break;
 	case CPU_BMIPS5000:
@@ -562,16 +557,14 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_1000: normal vectors
 		 */
-		new_ebase = 0x80001000;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
-		write_c0_ebase(new_ebase);
+		write_c0_ebase(ebase);
 		break;
 	default:
 		return;
 	}
 
 	board_nmi_handler_setup = &bmips_nmi_handler_setup;
-	ebase = new_ebase;
 }
 
 asmlinkage void __weak plat_wired_tlb_setup(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..21ba9d04683e 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,10 +2009,10 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs)
 	nmi_exit();
 }
 
-#define VECTORSPACING 0x100	/* for EI/VI mode */
-
 unsigned long ebase;
 EXPORT_SYMBOL_GPL(ebase);
+unsigned long ebase_size;
+EXPORT_SYMBOL_GPL(ebase_size);
 unsigned long exception_handlers[32];
 unsigned long vi_handlers[64];
 
@@ -2360,27 +2360,22 @@ void __init trap_init(void)
 	extern char except_vec3_generic;
 	extern char except_vec4;
 	extern char except_vec3_r4000;
-	unsigned long i, vec_size;
 	phys_addr_t ebase_pa;
+	unsigned long i;
 
 	check_wait();
 
-	if (!cpu_has_mips_r2_r6) {
-		ebase = CAC_BASE;
-		ebase_pa = virt_to_phys((void *)ebase);
-		vec_size = 0x400;
-
-		memblock_reserve(ebase_pa, vec_size);
-	} else {
+	if (cpu_has_mips_r2_r6) {
 		if (cpu_has_veic || cpu_has_vint)
-			vec_size = 0x200 + VECTORSPACING*64;
+			ebase_size = 0x200 + VECTORSPACING*64;
 		else
-			vec_size = PAGE_SIZE;
+			ebase_size = PAGE_SIZE;
 
-		ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
+		ebase_pa = memblock_phys_alloc(ebase_size,
+					       1 << fls(ebase_size));
 		if (!ebase_pa)
 			panic("%s: Failed to allocate %lu bytes align=0x%x\n",
-			      __func__, vec_size, 1 << fls(vec_size));
+			      __func__, ebase_size, 1 << fls(ebase_size));
 
 		/*
 		 * Try to ensure ebase resides in KSeg0 if possible.
@@ -2534,7 +2529,7 @@ void __init trap_init(void)
 	else
 		set_handler(0x080, &except_vec3_generic, 0x80);
 
-	local_flush_icache_range(ebase, ebase + vec_size);
+	local_flush_icache_range(ebase, ebase + ebase_size);
 
 	sort_extable(__start___dbe_table, __stop___dbe_table);
 



-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02 23:54                 ` Thomas Bogendoerfer
@ 2021-03-03  1:30                   ` Florian Fainelli
  2021-03-03  9:41                     ` Thomas Bogendoerfer
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Fainelli @ 2021-03-03  1:30 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list



On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
>> BMIPS is one of the few platforms that do change the exception base.
>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
>> with kernel_end") we started seeing BMIPS boards fail to boot with the
>> built-in FDT being corrupted.
>>
>> Before the cited commit, early allocations would be in the [kernel_end,
>> RAM_END] range, but after commit they would be within [RAM_START +
>> PAGE_SIZE, RAM_END].
>>
>> The custom exception base handler that is installed by
>> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
>> memory region allocated by unflatten_and_copy_device_tree() thus
>> corrupting the FDT used by the kernel.
>>
>> To fix this, we need to perform an early reservation of the custom
>> exception that is going to be installed and this needs to happen at
>> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
>> finds a space that is suitable, away from reserved memory.
>>
>> Huge thanks to Serget for analysing and proposing a solution to this
>> issue.
>>
>> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
>> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Thomas,
>>
>> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
>> by the stable team for 5.11. We should find a safer way to avoid these
>> problems for 5.13 maybe.
> 
> let's try to make it in one ago. Hwo about reserving vector space in
> cpu_probe, if it's known there and leave the rest to trap_init() ?
> 
> Below patch got a quick test on IP22 (real hardware) and malta (qemu).
> Not sure, if I got all BMIPS parts correct, so please check/test.

Works for me here:

Tested-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks!

> BTW. do we really need to EXPORT_SYMBOL ebase ?

It seems like MIPS KVM support can be built as a module which is why
ebase was exported to modules with
878edf014e29de38c49153aba20273fbc9ae31af ("MIPS: KVM: Restore host EBase
from ebase variable")?
-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03  1:30                   ` Florian Fainelli
@ 2021-03-03  9:41                     ` Thomas Bogendoerfer
  2021-03-03 17:45                       ` Maciej W. Rozycki
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-03  9:41 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Tue, Mar 02, 2021 at 05:30:18PM -0800, Florian Fainelli wrote:
> 
> 
> On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> > On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> >> BMIPS is one of the few platforms that do change the exception base.
> >> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> >> with kernel_end") we started seeing BMIPS boards fail to boot with the
> >> built-in FDT being corrupted.
> >>
> >> Before the cited commit, early allocations would be in the [kernel_end,
> >> RAM_END] range, but after commit they would be within [RAM_START +
> >> PAGE_SIZE, RAM_END].
> >>
> >> The custom exception base handler that is installed by
> >> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> >> memory region allocated by unflatten_and_copy_device_tree() thus
> >> corrupting the FDT used by the kernel.
> >>
> >> To fix this, we need to perform an early reservation of the custom
> >> exception that is going to be installed and this needs to happen at
> >> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> >> finds a space that is suitable, away from reserved memory.
> >>
> >> Huge thanks to Serget for analysing and proposing a solution to this
> >> issue.
> >>
> >> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> >> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> >> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >> ---
> >> Thomas,
> >>
> >> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> >> by the stable team for 5.11. We should find a safer way to avoid these
> >> problems for 5.13 maybe.
> > 
> > let's try to make it in one ago. Hwo about reserving vector space in
> > cpu_probe, if it's known there and leave the rest to trap_init() ?
> > 
> > Below patch got a quick test on IP22 (real hardware) and malta (qemu).
> > Not sure, if I got all BMIPS parts correct, so please check/test.
> 
> Works for me here:

perfect, I only forgot about R3k... I'll submit a formal patch submission
later today.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03  9:41                     ` Thomas Bogendoerfer
@ 2021-03-03 17:45                       ` Maciej W. Rozycki
  2021-03-03 18:15                         ` Thomas Bogendoerfer
  0 siblings, 1 reply; 17+ messages in thread
From: Maciej W. Rozycki @ 2021-03-03 17:45 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> perfect, I only forgot about R3k... I'll submit a formal patch submission
> later today.

 What's up with the R3k (the usual trigger for me) here?

  Maciej

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03 17:45                       ` Maciej W. Rozycki
@ 2021-03-03 18:15                         ` Thomas Bogendoerfer
  2021-03-03 21:50                           ` Maciej W. Rozycki
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-03 18:15 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, Mar 03, 2021 at 06:45:52PM +0100, Maciej W. Rozycki wrote:
> On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:
> 
> > perfect, I only forgot about R3k... I'll submit a formal patch submission
> > later today.
> 
>  What's up with the R3k (the usual trigger for me) here?

I've moved r3k cpu_probe() to it's own file and when moving ebase
reservation to cpu_probe(), I need to add it there as well. So just
a mechanic step, I've missed.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03 18:15                         ` Thomas Bogendoerfer
@ 2021-03-03 21:50                           ` Maciej W. Rozycki
  0 siblings, 0 replies; 17+ messages in thread
From: Maciej W. Rozycki @ 2021-03-03 21:50 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> >  What's up with the R3k (the usual trigger for me) here?
> 
> I've moved r3k cpu_probe() to it's own file and when moving ebase
> reservation to cpu_probe(), I need to add it there as well. So just
> a mechanic step, I've missed.

 Ah, right, I didn't notice the split.  Thanks for taking care of it!

  Maciej

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-04  0:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20201217201214.3414100-1-guro@fb.com>
     [not found] ` <20201217201214.3414100-2-guro@fb.com>
     [not found]   ` <23fc1ef9-7342-8bc2-d184-d898107c52b2@gmail.com>
     [not found]     ` <20210228090041.GO1447004@kernel.org>
     [not found]       ` <8cbafe95-0f8c-a9b7-2dc9-cded846622fd@gmail.com>
     [not found]         ` <20210228230811.wdae7oaaf3mbpgwl@mobilestation>
2021-03-01  3:50           ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Florian Fainelli
2021-03-01  9:22             ` Serge Semin
2021-03-02  4:09               ` Florian Fainelli
2021-03-02 13:26                 ` Serge Semin
2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
2021-03-02  8:09                 ` Mike Rapoport
2021-03-02 13:54                 ` Serge Semin
2021-03-02 19:04                 ` Roman Gushchin
2021-03-02 23:54                 ` Thomas Bogendoerfer
2021-03-03  1:30                   ` Florian Fainelli
2021-03-03  9:41                     ` Thomas Bogendoerfer
2021-03-03 17:45                       ` Maciej W. Rozycki
2021-03-03 18:15                         ` Thomas Bogendoerfer
2021-03-03 21:50                           ` Maciej W. Rozycki
2021-03-01  9:45             ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
2021-03-02  3:55               ` Roman Gushchin
2021-03-02 13:08                 ` Serge Semin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox