From: Qian Cai <cai@lca.pw>
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org,
linux-mm@kvack.org, Andrew Banman <andrew.banman@hpe.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Arun KS <arunks@codeaurora.org>, Baoquan He <bhe@redhat.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Juergen Gross <jgross@suse.com>,
Keith Busch <keith.busch@intel.com>, Len Brown <lenb@kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
Michael Ellerman <mpe@ellerman.id.au>,
Michael Neuling <mikey@neuling.org>,
Michal Hocko <mhocko@suse.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
"mike.travis@hpe.com" <mike.travis@hpe.com>,
Oscar Salvador <osalvador@suse.com>,
Oscar Salvador <osalvador@suse.de>,
Paul Mackerras <paulus@samba.org>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Pavel Tatashin <pavel.tatashin@microsoft.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Rashmica Gupta <rashmica.g@gmail.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>,
Wei Yang <richard.weiyang@gmail.com>
Subject: Re: [PATCH v3 0/6] mm: Further memory block device cleanups
Date: Fri, 21 Jun 2019 15:07:39 -0400 [thread overview]
Message-ID: <1561144059.5154.52.camel@lca.pw> (raw)
In-Reply-To: <5b4e4f85-8f3e-0a15-6c85-704512205a42@redhat.com>
On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
> On 21.06.19 20:24, David Hildenbrand wrote:
> > On 21.06.19 17:15, Qian Cai wrote:
> > > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > >
> > > > Some further cleanups around memory block devices. Especially, clean up
> > > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > >
> > > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > >
> > > > v2 -> v3:
> > > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > > -- Avoid warning on ppc.
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > > -- Fixup a comment regarding hinted devices.
> > > >
> > > > v1 -> v2:
> > > > - "mm: Section numbers use the type "unsigned long""
> > > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > > -- Fix compilation error
> > > > -- Get rid of the "hint" parameter completely
> > > >
> > > > David Hildenbrand (6):
> > > > mm: Section numbers use the type "unsigned long"
> > > > drivers/base/memory: Use "unsigned long" for block ids
> > > > mm: Make register_mem_sect_under_node() static
> > > > mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > > > instead of pfns
> > > > mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > > > drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > >
> > > > arch/powerpc/platforms/powernv/memtrace.c | 23 ++---
> > > > drivers/acpi/acpi_memhotplug.c | 19 +---
> > > > drivers/base/memory.c | 120 +++++++++++++---------
> > > > drivers/base/node.c | 8 +-
> > > > include/linux/memory.h | 5 +-
> > > > include/linux/memory_hotplug.h | 2 -
> > > > include/linux/mmzone.h | 4 +-
> > > > include/linux/node.h | 7 --
> > > > mm/memory_hotplug.c | 57 +---------
> > > > mm/sparse.c | 12 +--
> > > > 10 files changed, 106 insertions(+), 151 deletions(-)
> > > >
> > >
> > > This series causes a few machines are unable to boot triggering endless
> > > soft
> > > lockups. Reverted those commits fixed the issue.
> > >
> > > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
> > > pass
> > > start+size instead of pfns"
> > > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > > startsize-instead-of-pfns-fix"
> > > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > > walk_memory_blocks()"
> > > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > > find_memory_block_hinted()"
> > > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > > find_memory_block_hinted-
> > > v3"
> > >
> > > [ 4.582081][ T1] ACPI FADT declares the system doesn't support PCIe
> > > ASPM,
> > > so disable it
> > > [ 4.590405][ T1] ACPI: bus type PCI registered
> > > [ 4.592908][ T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > > 0x80000000-0x8fffffff] (base 0x80000000)
> > > [ 4.601860][ T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
> > > reserved in
> > > E820
> > > [ 4.601860][ T1] PCI: Using configuration type 1 for base access
> > > [ 28.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [ 28.671351][ C16] Modules linked in:
> > > [ 28.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
> > > rc5-
> > > next-20190621+ #1
> > > [ 28.681366][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 28.691334][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [ 28.701334][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > > 48 8b
> > > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > > <65>
> > > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > > [ 28.711354][ C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [ 28.721372][ C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > > ffffffffb6f2a3b8
> > > [ 28.731371][ C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > > ffff8882053d6138
> > > [ 28.741371][ C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [ 28.751334][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000000246
> > > [ 28.751370][ C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > > 0000000000000000
> > > [ 28.761368][ C16] FS: 0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [ 28.771373][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 28.781334][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [ 28.791333][ C16] Call Trace:
> > > [ 28.791374][ C16] klist_next+0xd8/0x1c0
> > > [ 28.791374][ C16] subsys_find_device_by_id+0x13b/0x1f0
> > > [ 28.801334][ C16] ? bus_find_device_by_name+0x20/0x20
> > > [ 28.801370][ C16] ? kobject_put+0x23/0x250
> > > [ 28.811333][ C16] walk_memory_blocks+0x6c/0xb8
> > > [ 28.811353][ C16] ? write_policy_show+0x40/0x40
> > > [ 28.821334][ C16] link_mem_sections+0x7e/0xa0
> > > [ 28.821369][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> > > [ 28.831353][ C16] ? __register_one_node+0x3bd/0x600
> > > [ 28.831353][ C16] topology_init+0xbf/0x126
> > > [ 28.841364][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> > > [ 28.841368][ C16] do_one_initcall+0xfe/0x45a
> > > [ 28.851334][ C16] ? initcall_blacklisted+0x150/0x150
> > > [ 28.851353][ C16] ? kasan_check_write+0x14/0x20
> > > [ 28.861333][ C16] ? up_write+0x75/0x140
> > > [ 28.861369][ C16] kernel_init_freeable+0x619/0x6ac
> > > [ 28.871333][ C16] ? rest_init+0x188/0x188
> > > [ 28.871353][ C16] kernel_init+0x11/0x138
> > > [ 28.881363][ C16] ? rest_init+0x188/0x188
> > > [ 28.881363][ C16] ret_from_fork+0x22/0x40
> > > [ 56.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [ 56.671352][ C16] Modules linked in:
> > > [ 56.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G L 5.2.0-rc5-next-20190621+ #1
> > > [ 56.681357][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 56.691356][ C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > > [ 56.701334][ C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
> > > 8b 7e
> > > 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
> > > <75>
> > > c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > > [ 56.721333][ C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [ 56.721370][ C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> > > ffffffffb74c9dc1
> > > [ 56.731370][ C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> > > ffff8888774ec3e0
> > > [ 56.741371][ C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [ 56.751335][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000085c1b
> > > [ 56.761334][ C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> > > ffff8888774ec040
> > > [ 56.761372][ C16] FS: 0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [ 56.771374][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 56.781370][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [ 56.791373][ C16] Call Trace:
> > > [ 56.791373][ C16] ? bus_find_device_by_name+0x20/0x20
> > > [ 56.801334][ C16] ? kobject_put+0x23/0x250
> > > [ 56.801334][ C16] walk_memory_blocks+0x6c/0xb8
> > > [ 56.811333][ C16] ? write_policy_show+0x40/0x40
> > > [ 56.811353][ C16] link_mem_sections+0x7e/0xa0
> > > [ 56.811353][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> > > [ 56.821333][ C16] ? __register_one_node+0x3bd/0x600
> > > [ 56.831333][ C16] topology_init+0xbf/0x126
> > > [ 56.831355][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> > > [ 56.841334][ C16] do_one_initcall+0xfe/0x45a
> > > [ 56.841334][ C16] ? initcall_blacklisted+0x150/0x150
> > > [ 56.851333][ C16] ? kasan_check_write+0x14/0x20
> > > [ 56.851354][ C16] ? up_write+0x75/0x140
> > > [ 56.861333][ C16] kernel_init_freeable+0x619/0x6ac
> > > [ 56.861333][ C16] ? rest_init+0x188/0x188
> > > [ 56.861369][ C16] kernel_init+0x11/0x138
> > > [ 56.871333][ C16] ? rest_init+0x188/0x188
> > > [ 56.871354][ C16] ret_from_fork+0x22/0x40
> > > [ 64.601362][ C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > > [ 64.611335][ C16] rcu: 16-....: (5958 ticks this GP)
> > > idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000
> > > [ 64.621334][ C16] (t=6002 jiffies g=-1079 q=25)
> > > [ 64.621334][ C16] NMI backtrace for cpu 16
> > > [ 64.621374][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G L 5.2.0-rc5-next-20190621+ #1
> > > [ 64.631372][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 64.641371][ C16] Call Trace:
> > > [ 64.651337][ C16] <IRQ>
> > > [ 64.651376][ C16] dump_stack+0x62/0x9a
> > > [ 64.651376][ C16] nmi_cpu_backtrace.cold.0+0x2e/0x33
> > > [ 64.661337][ C16] ? nmi_cpu_backtrace_handler+0x20/0x20
> > > [ 64.661337][ C16] nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > > [ 64.671353][ C16] arch_trigger_cpumask_backtrace+0x19/0x20
> > > [ 64.681366][ C16] rcu_dump_cpu_stacks+0x18b/0x1d6
> > > [ 64.681366][ C16] rcu_sched_clock_irq.cold.64+0x368/0x791
> > > [ 64.691336][ C16] ? kasan_check_read+0x11/0x20
> > > [ 64.691354][ C16] ? __raise_softirq_irqoff+0x66/0x150
> > > [ 64.701336][ C16] update_process_times+0x2f/0x60
> > > [ 64.701362][ C16] tick_periodic+0x38/0xe0
> > > [ 64.711334][ C16] tick_handle_periodic+0x2e/0x80
> > > [ 64.711353][ C16] smp_apic_timer_interrupt+0xfb/0x370
> > > [ 64.721367][ C16] apic_timer_interrupt+0xf/0x20
> > > [ 64.721367][ C16] </IRQ>
> > > [ 64.721367][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [ 64.731370][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > >
> >
> > @Qian Cai, unfortunately I can't reproduce.
> >
> > If you get the chance, it would be great if you could retry with
> >
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 972c5336bebf..742f99ddd148 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> > long size,
> > unsigned long block_id;
> > int ret = 0;
> >
> > + if (!size)
> > + return;
> > +
> > for (block_id = start_block_id; block_id <= end_block_id;
> > block_id++) {
> > mem = find_memory_block_by_id(block_id);
> > if (!mem)
> >
> >
> >
> > If both, start and size are 0, we would get a veeeery long loop. This
> > would mean that we have an online node that does not span any pages at
> > all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> >
>
>
> ...trying to reproduce with QEMU (setting 0MB for the second node):
>
> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
> -smp sockets=2,cores=1 \
> -numa node,nodeid=0,cpus=0,mem=4G \
> -numa node,nodeid=1,cpus=1,mem=0 ...
>
> I can indeed see that the node is online and
> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
>
> However, the kernel segfaults in an unrelated code path, so I can't
> verify if this solves this problem:
>
> [ 0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 0.313479] #PF: supervisor read access in kernel mode
> [ 0.313479] #PF: error_code(0x0000) - not-present page
> [ 0.313479] PGD 0 P4D 0
> [ 0.313479] Oops: 0000 [#1] SMP PTI
> [ 0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
> 20190620+ #56
> [ 0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [ 0.313479] RIP: 0010:bus_add_device+0x59/0x110
> [ 0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
> [ 0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
> [ 0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
> 0000000000000000
> [ 0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
> ffff8b61bac23800
> [ 0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
> 0000000000000001
> [ 0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
> 0000000000000000
> [ 0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [ 0.313479] FS: 0000000000000000(0000) GS:ffff8b61bba00000(0000)
> knlGS:0000000000000000
> [ 0.313479] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
> 00000000000006f0
> [ 0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 0.313479] Call Trace:
> [ 0.313479] device_add+0x304/0x660
> [ 0.313479] ? __init_waitqueue_head+0x31/0x50
> [ 0.313479] __register_one_node+0x67/0x170
> [ 0.313479] __try_online_node.cold+0x3e/0x78
> [ 0.313479] try_online_node+0x25/0x40
> [ 0.313479] do_cpu_up+0x36/0xc0
> [ 0.313479] smp_init+0x59/0xb3
> [ 0.313479] kernel_init_freeable+0x11a/0x247
> [ 0.313479] ? rest_init+0x23f/0x23f
> [ 0.313479] kernel_init+0x5/0xf1
> [ 0.313479] ret_from_fork+0x3a/0x50
> [ 0.313479] Modules linked in:
>
> Figuring out what goes wrong here (maybe QEMU creating a weird
> system configuration) is a different journey :)
>
That is a separate issue need to revert,
"x86, numa: always initialize all possible nodes"
and then, you should be able to reproduce.
WARNING: multiple messages have this Message-ID (diff)
From: Qian Cai <cai@lca.pw>
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: Oscar Salvador <osalvador@suse.com>,
Michal Hocko <mhocko@suse.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Wei Yang <richard.weiyang@gmail.com>,
Keith Busch <keith.busch@intel.com>,
linux-mm@kvack.org, Arun KS <arunks@codeaurora.org>,
Rashmica Gupta <rashmica.g@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Michael Neuling <mikey@neuling.org>, Baoquan He <bhe@redhat.com>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
linux-acpi@vger.kernel.org, Len Brown <lenb@kernel.org>,
Pavel Tatashin <pavel.tatashin@microsoft.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
"mike.travis@hpe.com" <mike.travis@hpe.com>,
linuxppc-dev@lists.ozlabs.org,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Dan Williams <dan.j.williams@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Oscar Salvador <osalvador@suse.de>,
Juergen Gross <jgross@suse.com>,
Andrew Banman <andrew.banman@hpe.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Johannes Weiner <hannes@cmpxchg.org>,
Paul Mackerras <paulus@samba.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v3 0/6] mm: Further memory block device cleanups
Date: Fri, 21 Jun 2019 15:07:39 -0400 [thread overview]
Message-ID: <1561144059.5154.52.camel@lca.pw> (raw)
In-Reply-To: <5b4e4f85-8f3e-0a15-6c85-704512205a42@redhat.com>
On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
> On 21.06.19 20:24, David Hildenbrand wrote:
> > On 21.06.19 17:15, Qian Cai wrote:
> > > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > >
> > > > Some further cleanups around memory block devices. Especially, clean up
> > > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > >
> > > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > >
> > > > v2 -> v3:
> > > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > > -- Avoid warning on ppc.
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > > -- Fixup a comment regarding hinted devices.
> > > >
> > > > v1 -> v2:
> > > > - "mm: Section numbers use the type "unsigned long""
> > > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > > -- Fix compilation error
> > > > -- Get rid of the "hint" parameter completely
> > > >
> > > > David Hildenbrand (6):
> > > > mm: Section numbers use the type "unsigned long"
> > > > drivers/base/memory: Use "unsigned long" for block ids
> > > > mm: Make register_mem_sect_under_node() static
> > > > mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > > > instead of pfns
> > > > mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > > > drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > >
> > > > arch/powerpc/platforms/powernv/memtrace.c | 23 ++---
> > > > drivers/acpi/acpi_memhotplug.c | 19 +---
> > > > drivers/base/memory.c | 120 +++++++++++++---------
> > > > drivers/base/node.c | 8 +-
> > > > include/linux/memory.h | 5 +-
> > > > include/linux/memory_hotplug.h | 2 -
> > > > include/linux/mmzone.h | 4 +-
> > > > include/linux/node.h | 7 --
> > > > mm/memory_hotplug.c | 57 +---------
> > > > mm/sparse.c | 12 +--
> > > > 10 files changed, 106 insertions(+), 151 deletions(-)
> > > >
> > >
> > > This series causes a few machines are unable to boot triggering endless
> > > soft
> > > lockups. Reverted those commits fixed the issue.
> > >
> > > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
> > > pass
> > > start+size instead of pfns"
> > > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > > startsize-instead-of-pfns-fix"
> > > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > > walk_memory_blocks()"
> > > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > > find_memory_block_hinted()"
> > > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > > find_memory_block_hinted-
> > > v3"
> > >
> > > [ 4.582081][ T1] ACPI FADT declares the system doesn't support PCIe
> > > ASPM,
> > > so disable it
> > > [ 4.590405][ T1] ACPI: bus type PCI registered
> > > [ 4.592908][ T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > > 0x80000000-0x8fffffff] (base 0x80000000)
> > > [ 4.601860][ T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
> > > reserved in
> > > E820
> > > [ 4.601860][ T1] PCI: Using configuration type 1 for base access
> > > [ 28.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [ 28.671351][ C16] Modules linked in:
> > > [ 28.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
> > > rc5-
> > > next-20190621+ #1
> > > [ 28.681366][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 28.691334][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [ 28.701334][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > > 48 8b
> > > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > > <65>
> > > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > > [ 28.711354][ C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [ 28.721372][ C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > > ffffffffb6f2a3b8
> > > [ 28.731371][ C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > > ffff8882053d6138
> > > [ 28.741371][ C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [ 28.751334][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000000246
> > > [ 28.751370][ C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > > 0000000000000000
> > > [ 28.761368][ C16] FS: 0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [ 28.771373][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 28.781334][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [ 28.791333][ C16] Call Trace:
> > > [ 28.791374][ C16] klist_next+0xd8/0x1c0
> > > [ 28.791374][ C16] subsys_find_device_by_id+0x13b/0x1f0
> > > [ 28.801334][ C16] ? bus_find_device_by_name+0x20/0x20
> > > [ 28.801370][ C16] ? kobject_put+0x23/0x250
> > > [ 28.811333][ C16] walk_memory_blocks+0x6c/0xb8
> > > [ 28.811353][ C16] ? write_policy_show+0x40/0x40
> > > [ 28.821334][ C16] link_mem_sections+0x7e/0xa0
> > > [ 28.821369][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> > > [ 28.831353][ C16] ? __register_one_node+0x3bd/0x600
> > > [ 28.831353][ C16] topology_init+0xbf/0x126
> > > [ 28.841364][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> > > [ 28.841368][ C16] do_one_initcall+0xfe/0x45a
> > > [ 28.851334][ C16] ? initcall_blacklisted+0x150/0x150
> > > [ 28.851353][ C16] ? kasan_check_write+0x14/0x20
> > > [ 28.861333][ C16] ? up_write+0x75/0x140
> > > [ 28.861369][ C16] kernel_init_freeable+0x619/0x6ac
> > > [ 28.871333][ C16] ? rest_init+0x188/0x188
> > > [ 28.871353][ C16] kernel_init+0x11/0x138
> > > [ 28.881363][ C16] ? rest_init+0x188/0x188
> > > [ 28.881363][ C16] ret_from_fork+0x22/0x40
> > > [ 56.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [ 56.671352][ C16] Modules linked in:
> > > [ 56.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G L 5.2.0-rc5-next-20190621+ #1
> > > [ 56.681357][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 56.691356][ C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > > [ 56.701334][ C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
> > > 8b 7e
> > > 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
> > > <75>
> > > c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > > [ 56.721333][ C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [ 56.721370][ C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> > > ffffffffb74c9dc1
> > > [ 56.731370][ C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> > > ffff8888774ec3e0
> > > [ 56.741371][ C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [ 56.751335][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000085c1b
> > > [ 56.761334][ C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> > > ffff8888774ec040
> > > [ 56.761372][ C16] FS: 0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [ 56.771374][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 56.781370][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [ 56.791373][ C16] Call Trace:
> > > [ 56.791373][ C16] ? bus_find_device_by_name+0x20/0x20
> > > [ 56.801334][ C16] ? kobject_put+0x23/0x250
> > > [ 56.801334][ C16] walk_memory_blocks+0x6c/0xb8
> > > [ 56.811333][ C16] ? write_policy_show+0x40/0x40
> > > [ 56.811353][ C16] link_mem_sections+0x7e/0xa0
> > > [ 56.811353][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> > > [ 56.821333][ C16] ? __register_one_node+0x3bd/0x600
> > > [ 56.831333][ C16] topology_init+0xbf/0x126
> > > [ 56.831355][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> > > [ 56.841334][ C16] do_one_initcall+0xfe/0x45a
> > > [ 56.841334][ C16] ? initcall_blacklisted+0x150/0x150
> > > [ 56.851333][ C16] ? kasan_check_write+0x14/0x20
> > > [ 56.851354][ C16] ? up_write+0x75/0x140
> > > [ 56.861333][ C16] kernel_init_freeable+0x619/0x6ac
> > > [ 56.861333][ C16] ? rest_init+0x188/0x188
> > > [ 56.861369][ C16] kernel_init+0x11/0x138
> > > [ 56.871333][ C16] ? rest_init+0x188/0x188
> > > [ 56.871354][ C16] ret_from_fork+0x22/0x40
> > > [ 64.601362][ C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > > [ 64.611335][ C16] rcu: 16-....: (5958 ticks this GP)
> > > idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000
> > > [ 64.621334][ C16] (t=6002 jiffies g=-1079 q=25)
> > > [ 64.621334][ C16] NMI backtrace for cpu 16
> > > [ 64.621374][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G L 5.2.0-rc5-next-20190621+ #1
> > > [ 64.631372][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [ 64.641371][ C16] Call Trace:
> > > [ 64.651337][ C16] <IRQ>
> > > [ 64.651376][ C16] dump_stack+0x62/0x9a
> > > [ 64.651376][ C16] nmi_cpu_backtrace.cold.0+0x2e/0x33
> > > [ 64.661337][ C16] ? nmi_cpu_backtrace_handler+0x20/0x20
> > > [ 64.661337][ C16] nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > > [ 64.671353][ C16] arch_trigger_cpumask_backtrace+0x19/0x20
> > > [ 64.681366][ C16] rcu_dump_cpu_stacks+0x18b/0x1d6
> > > [ 64.681366][ C16] rcu_sched_clock_irq.cold.64+0x368/0x791
> > > [ 64.691336][ C16] ? kasan_check_read+0x11/0x20
> > > [ 64.691354][ C16] ? __raise_softirq_irqoff+0x66/0x150
> > > [ 64.701336][ C16] update_process_times+0x2f/0x60
> > > [ 64.701362][ C16] tick_periodic+0x38/0xe0
> > > [ 64.711334][ C16] tick_handle_periodic+0x2e/0x80
> > > [ 64.711353][ C16] smp_apic_timer_interrupt+0xfb/0x370
> > > [ 64.721367][ C16] apic_timer_interrupt+0xf/0x20
> > > [ 64.721367][ C16] </IRQ>
> > > [ 64.721367][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [ 64.731370][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > >
> >
> > @Qian Cai, unfortunately I can't reproduce.
> >
> > If you get the chance, it would be great if you could retry with
> >
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 972c5336bebf..742f99ddd148 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> > long size,
> > unsigned long block_id;
> > int ret = 0;
> >
> > + if (!size)
> > + return;
> > +
> > for (block_id = start_block_id; block_id <= end_block_id;
> > block_id++) {
> > mem = find_memory_block_by_id(block_id);
> > if (!mem)
> >
> >
> >
> > If both, start and size are 0, we would get a veeeery long loop. This
> > would mean that we have an online node that does not span any pages at
> > all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> >
>
>
> ...trying to reproduce with QEMU (setting 0MB for the second node):
>
> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
> -smp sockets=2,cores=1 \
> -numa node,nodeid=0,cpus=0,mem=4G \
> -numa node,nodeid=1,cpus=1,mem=0 ...
>
> I can indeed see that the node is online and
> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
>
> However, the kernel segfaults in an unrelated code path, so I can't
> verify if this solves this problem:
>
> [ 0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 0.313479] #PF: supervisor read access in kernel mode
> [ 0.313479] #PF: error_code(0x0000) - not-present page
> [ 0.313479] PGD 0 P4D 0
> [ 0.313479] Oops: 0000 [#1] SMP PTI
> [ 0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
> 20190620+ #56
> [ 0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [ 0.313479] RIP: 0010:bus_add_device+0x59/0x110
> [ 0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
> [ 0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
> [ 0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
> 0000000000000000
> [ 0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
> ffff8b61bac23800
> [ 0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
> 0000000000000001
> [ 0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
> 0000000000000000
> [ 0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [ 0.313479] FS: 0000000000000000(0000) GS:ffff8b61bba00000(0000)
> knlGS:0000000000000000
> [ 0.313479] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
> 00000000000006f0
> [ 0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 0.313479] Call Trace:
> [ 0.313479] device_add+0x304/0x660
> [ 0.313479] ? __init_waitqueue_head+0x31/0x50
> [ 0.313479] __register_one_node+0x67/0x170
> [ 0.313479] __try_online_node.cold+0x3e/0x78
> [ 0.313479] try_online_node+0x25/0x40
> [ 0.313479] do_cpu_up+0x36/0xc0
> [ 0.313479] smp_init+0x59/0xb3
> [ 0.313479] kernel_init_freeable+0x11a/0x247
> [ 0.313479] ? rest_init+0x23f/0x23f
> [ 0.313479] kernel_init+0x5/0xf1
> [ 0.313479] ret_from_fork+0x3a/0x50
> [ 0.313479] Modules linked in:
>
> Figuring out what goes wrong here (maybe QEMU creating a weird
> system configuration) is a different journey :)
>
That is a separate issue need to revert,
"x86, numa: always initialize all possible nodes"
and then, you should be able to reproduce.
next prev parent reply other threads:[~2019-06-21 19:07 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-21 15:26 ` David Hildenbrand
2019-06-21 15:26 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted() David Hildenbrand
2019-06-20 18:31 ` David Hildenbrand
2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
2019-06-21 15:15 ` Qian Cai
2019-06-21 15:22 ` David Hildenbrand
2019-06-21 15:22 ` David Hildenbrand
2019-06-21 18:24 ` David Hildenbrand
2019-06-21 18:24 ` David Hildenbrand
2019-06-21 18:56 ` David Hildenbrand
2019-06-21 18:56 ` David Hildenbrand
2019-06-21 19:07 ` Qian Cai [this message]
2019-06-21 19:07 ` Qian Cai
2019-06-21 19:25 ` David Hildenbrand
2019-06-21 19:25 ` David Hildenbrand
2019-06-21 19:29 ` Qian Cai
2019-06-21 19:29 ` Qian Cai
2019-06-21 23:42 ` Andrew Morton
2019-06-21 23:42 ` Andrew Morton
2019-06-21 19:38 ` [PATCH] drivers/base/memory.c: Fix "mm/memory_hotplug: Move and simplify walk_memory_blocks()" David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1561144059.5154.52.camel@lca.pw \
--to=cai@lca.pw \
--cc=akpm@linux-foundation.org \
--cc=andrew.banman@hpe.com \
--cc=anshuman.khandual@arm.com \
--cc=arunks@codeaurora.org \
--cc=benh@kernel.crashing.org \
--cc=bhe@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=jgross@suse.com \
--cc=keith.busch@intel.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=mike.travis@hpe.com \
--cc=mikey@neuling.org \
--cc=mpe@ellerman.id.au \
--cc=osalvador@suse.com \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@oracle.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulus@samba.org \
--cc=pavel.tatashin@microsoft.com \
--cc=rafael@kernel.org \
--cc=rashmica.g@gmail.com \
--cc=richard.weiyang@gmail.com \
--cc=rjw@rjwysocki.net \
--cc=rppt@linux.vnet.ibm.com \
--cc=sfr@canb.auug.org.au \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.