From: David Hildenbrand <david@redhat.com>
To: Qian Cai <cai@lca.pw>, linux-kernel@vger.kernel.org
Cc: Oscar Salvador <osalvador@suse.com>,
Michal Hocko <mhocko@suse.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Wei Yang <richard.weiyang@gmail.com>,
Keith Busch <keith.busch@intel.com>,
linux-mm@kvack.org, Arun KS <arunks@codeaurora.org>,
Rashmica Gupta <rashmica.g@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Michael Neuling <mikey@neuling.org>, Baoquan He <bhe@redhat.com>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
linux-acpi@vger.kernel.org, Len Brown <lenb@kernel.org>,
Pavel Tatashin <pavel.tatashin@microsoft.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
"mike.travis@hpe.com" <mike.travis@hpe.com>,
linuxppc-dev@lists.ozlabs.org,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Dan Williams <dan.j.williams@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Oscar Salvador <osalvador@suse.de>,
Juergen Gross <jgross@suse.com>,
Andrew Banman <andrew.banman@hpe.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Johannes Weiner <hannes@cmpxchg.org>,
Paul Mackerras <paulus@samba.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v3 0/6] mm: Further memory block device cleanups
Date: Fri, 21 Jun 2019 17:22:13 +0200
Message-ID: <f4d5979b-f7c4-0249-6c09-06539cc5824c@redhat.com>
In-Reply-To: <1561130120.5154.47.camel@lca.pw>
On 21.06.19 17:15, Qian Cai wrote:
> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>
>> Some further cleanups around memory block devices. Especially, clean up
>> and simplify walk_memory_range(). Including some other minor cleanups.
>>
>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>
>> v2 -> v3:
>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>> -- Avoid warning on ppc.
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fixup a comment regarding hinted devices.
>>
>> v1 -> v2:
>> - "mm: Section numbers use the type "unsigned long""
>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fix compilation error
>> -- Get rid of the "hint" parameter completely
>>
>> David Hildenbrand (6):
>> mm: Section numbers use the type "unsigned long"
>> drivers/base/memory: Use "unsigned long" for block ids
>> mm: Make register_mem_sect_under_node() static
>> mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>> instead of pfns
>> mm/memory_hotplug: Move and simplify walk_memory_blocks()
>> drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>
>> arch/powerpc/platforms/powernv/memtrace.c | 23 ++---
>> drivers/acpi/acpi_memhotplug.c | 19 +---
>> drivers/base/memory.c | 120 +++++++++++++---------
>> drivers/base/node.c | 8 +-
>> include/linux/memory.h | 5 +-
>> include/linux/memory_hotplug.h | 2 -
>> include/linux/mmzone.h | 4 +-
>> include/linux/node.h | 7 --
>> mm/memory_hotplug.c | 57 +---------
>> mm/sparse.c | 12 +--
>> 10 files changed, 106 insertions(+), 151 deletions(-)
>>
>
> This series leaves a few machines unable to boot, triggering endless soft
> lockups. Reverting these commits fixed the issue.
>
> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> start+size instead of pfns"
> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> startsize-instead-of-pfns-fix"
> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> find_memory_block_hinted()"
> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
> v3"
>
> [ 4.582081][ T1] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [ 4.590405][ T1] ACPI: bus type PCI registered
> [ 4.592908][ T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [ 4.601860][ T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
> E820
> [ 4.601860][ T1] PCI: Using configuration type 1 for base access
> [ 28.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [ 28.671351][ C16] Modules linked in:
> [ 28.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> next-20190621+ #1
> [ 28.681366][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [ 28.691334][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [ 28.701334][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> [ 28.711354][ C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [ 28.721372][ C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> ffffffffb6f2a3b8
> [ 28.731371][ C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> ffff8882053d6138
> [ 28.741371][ C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [ 28.751334][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000000246
> [ 28.751370][ C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> 0000000000000000
> [ 28.761368][ C16] FS: 0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [ 28.771373][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 28.781334][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [ 28.791333][ C16] Call Trace:
> [ 28.791374][ C16] klist_next+0xd8/0x1c0
> [ 28.791374][ C16] subsys_find_device_by_id+0x13b/0x1f0
> [ 28.801334][ C16] ? bus_find_device_by_name+0x20/0x20
> [ 28.801370][ C16] ? kobject_put+0x23/0x250
> [ 28.811333][ C16] walk_memory_blocks+0x6c/0xb8
> [ 28.811353][ C16] ? write_policy_show+0x40/0x40
> [ 28.821334][ C16] link_mem_sections+0x7e/0xa0
> [ 28.821369][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> [ 28.831353][ C16] ? __register_one_node+0x3bd/0x600
> [ 28.831353][ C16] topology_init+0xbf/0x126
> [ 28.841364][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> [ 28.841368][ C16] do_one_initcall+0xfe/0x45a
> [ 28.851334][ C16] ? initcall_blacklisted+0x150/0x150
> [ 28.851353][ C16] ? kasan_check_write+0x14/0x20
> [ 28.861333][ C16] ? up_write+0x75/0x140
> [ 28.861369][ C16] kernel_init_freeable+0x619/0x6ac
> [ 28.871333][ C16] ? rest_init+0x188/0x188
> [ 28.871353][ C16] kernel_init+0x11/0x138
> [ 28.881363][ C16] ? rest_init+0x188/0x188
> [ 28.881363][ C16] ret_from_fork+0x22/0x40
> [ 56.661336][ C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [ 56.671352][ C16] Modules linked in:
> [ 56.671354][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G L 5.2.0-rc5-next-20190621+ #1
> [ 56.681357][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [ 56.691356][ C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> [ 56.701334][ C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> [ 56.721333][ C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> ffffffffffffff13
> [ 56.721370][ C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> ffffffffb74c9dc1
> [ 56.731370][ C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> ffff8888774ec3e0
> [ 56.741371][ C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [ 56.751335][ C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000085c1b
> [ 56.761334][ C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> ffff8888774ec040
> [ 56.761372][ C16] FS: 0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [ 56.771374][ C16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 56.781370][ C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [ 56.791373][ C16] Call Trace:
> [ 56.791373][ C16] ? bus_find_device_by_name+0x20/0x20
> [ 56.801334][ C16] ? kobject_put+0x23/0x250
> [ 56.801334][ C16] walk_memory_blocks+0x6c/0xb8
> [ 56.811333][ C16] ? write_policy_show+0x40/0x40
> [ 56.811353][ C16] link_mem_sections+0x7e/0xa0
> [ 56.811353][ C16] ? unregister_memory_block_under_nodes+0x210/0x210
> [ 56.821333][ C16] ? __register_one_node+0x3bd/0x600
> [ 56.831333][ C16] topology_init+0xbf/0x126
> [ 56.831355][ C16] ? enable_cpu0_hotplug+0x1a/0x1a
> [ 56.841334][ C16] do_one_initcall+0xfe/0x45a
> [ 56.841334][ C16] ? initcall_blacklisted+0x150/0x150
> [ 56.851333][ C16] ? kasan_check_write+0x14/0x20
> [ 56.851354][ C16] ? up_write+0x75/0x140
> [ 56.861333][ C16] kernel_init_freeable+0x619/0x6ac
> [ 56.861333][ C16] ? rest_init+0x188/0x188
> [ 56.861369][ C16] kernel_init+0x11/0x138
> [ 56.871333][ C16] ? rest_init+0x188/0x188
> [ 56.871354][ C16] ret_from_fork+0x22/0x40
> [ 64.601362][ C16] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 64.611335][ C16] rcu: 16-....: (5958 ticks this GP)
> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000
> [ 64.621334][ C16] (t=6002 jiffies g=-1079 q=25)
> [ 64.621334][ C16] NMI backtrace for cpu 16
> [ 64.621374][ C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G L 5.2.0-rc5-next-20190621+ #1
> [ 64.631372][ C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [ 64.641371][ C16] Call Trace:
> [ 64.651337][ C16] <IRQ>
> [ 64.651376][ C16] dump_stack+0x62/0x9a
> [ 64.651376][ C16] nmi_cpu_backtrace.cold.0+0x2e/0x33
> [ 64.661337][ C16] ? nmi_cpu_backtrace_handler+0x20/0x20
> [ 64.661337][ C16] nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> [ 64.671353][ C16] arch_trigger_cpumask_backtrace+0x19/0x20
> [ 64.681366][ C16] rcu_dump_cpu_stacks+0x18b/0x1d6
> [ 64.681366][ C16] rcu_sched_clock_irq.cold.64+0x368/0x791
> [ 64.691336][ C16] ? kasan_check_read+0x11/0x20
> [ 64.691354][ C16] ? __raise_softirq_irqoff+0x66/0x150
> [ 64.701336][ C16] update_process_times+0x2f/0x60
> [ 64.701362][ C16] tick_periodic+0x38/0xe0
> [ 64.711334][ C16] tick_handle_periodic+0x2e/0x80
> [ 64.711353][ C16] smp_apic_timer_interrupt+0xfb/0x370
> [ 64.721367][ C16] apic_timer_interrupt+0xf/0x20
> [ 64.721367][ C16] </IRQ>
> [ 64.721367][ C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [ 64.731370][ C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
>
Thanks for the report. Man, this series is nastier than I thought. This
is making more noise than I was hoping for.

@Andrew, can you revert patches 4-6 for now? I'll be on vacation soon and
don't want cleanups to constantly break things. Just nasty.
--
Thanks,
David / dhildenb
Thread overview: 16+ messages
2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
2019-06-21 15:26 ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted() David Hildenbrand
2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
2019-06-21 15:22 ` David Hildenbrand [this message]
2019-06-21 18:24 ` David Hildenbrand
2019-06-21 18:56 ` David Hildenbrand
2019-06-21 19:07 ` Qian Cai
2019-06-21 19:25 ` David Hildenbrand
2019-06-21 19:29 ` Qian Cai
2019-06-21 23:42 ` Andrew Morton