From: Michal Hocko <mhocko@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@gmail.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH] memory_hotplug: cond_resched in __remove_pages
Date: Wed, 31 Oct 2018 13:58:40 +0100
Message-ID: <20181031125840.23982-1-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

We have received a bug report that unbinding a large (>1TB) pmem device
can result in a soft lockup:
[  380.339203] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365]
[...]
[  380.339316] Supported: Yes
[  380.339318] CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4
[  380.339318] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
[  380.339319] task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000
[  380.339325] RIP: 0010:__put_page+0x62/0x80
[  380.339326] RSP: 0018:ffffbe9eb1bc7d30 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
[  380.339327] RAX: 000040540081c0d3 RBX: ffffeb8f03557200 RCX: 000063af40000000
[  380.339328] RDX: 0000000000000002 RSI: ffff9cce75bff498 RDI: ffff9e4a76072ff8
[  380.339329] RBP: 0000000a43557200 R08: 0000000000000000 R09: ffffbe9eb1bc7bb0
[  380.339329] R10: ffffbe9eb1bc7d08 R11: 0000000000000000 R12: ffff9e194a22a0e0
[  380.339330] R13: ffff9cce7062fc10 R14: ffff9e194a22a0a0 R15: ffff9cce6559c0e0
[  380.339331] FS:  00007fd132368880(0000) GS:ffff9cce7ea40000(0000) knlGS:0000000000000000
[  380.339332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  380.339332] CR2: 00000000020820a0 CR3: 000000017ef7a003 CR4: 00000000007606e0
[  380.339333] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  380.339334] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  380.339334] PKRU: 55555554
[  380.339334] Call Trace:
[  380.339338]  devm_memremap_pages_release+0x152/0x260
[  380.339342]  release_nodes+0x18d/0x1d0
[  380.339347]  device_release_driver_internal+0x160/0x210
[  380.339350]  unbind_store+0xb3/0xe0
[  380.339355]  kernfs_fop_write+0x102/0x180
[  380.339358]  __vfs_write+0x26/0x150
[  380.339363]  ? security_file_permission+0x3c/0xc0
[  380.339364]  vfs_write+0xad/0x1a0
[  380.339366]  SyS_write+0x42/0x90
[  380.339370]  do_syscall_64+0x74/0x150
[  380.339375]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  380.339377] RIP: 0033:0x7fd13166b3d0

This was reported on an older (4.12) kernel, but current upstream code
does not call cond_resched() anywhere in the hot-remove path, and the
range being removed can be very large. Fix the issue by calling
cond_resched() once per memory section.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7e6509a53d79..1d87724fa558 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -587,6 +587,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
 
+		cond_resched();
 		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
 				altmap);
 		map_offset = 0;
-- 
2.19.1
