* [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list
@ 2015-09-21 11:09 Tetsuo Handa
2015-09-21 11:13 ` Tetsuo Handa
2015-09-21 21:52 ` Dave Chinner
0 siblings, 2 replies; 3+ messages in thread
From: Tetsuo Handa @ 2015-09-21 11:09 UTC (permalink / raw)
To: david; +Cc: xfs, linux-mm, Tetsuo Handa
This is a difficult-to-trigger, silent hang-up bug.
kswapd is allowed to bypass the too_many_isolated() check in
shrink_inactive_list(), but it can still be blocked by locks taken in
shrink_page_list(), which is called from shrink_inactive_list(). If the
task blocking kswapd is itself trying to allocate memory while holding
those locks, a memory reclaim deadlock forms.
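For clarity, below is a minimal userspace sketch of the asymmetry that makes
this possible (an editorial illustration, not the actual mm/vmscan.c code;
the helper functions and the counter values are stand-ins): kswapd skips the
too_many_isolated() throttle that direct reclaimers loop on.
----------
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for kernel state; the values are illustrative only. */
static bool current_is_kswapd(void) { return false; }  /* a direct reclaimer */
static unsigned long nr_isolated(void) { return 400; } /* NR_ISOLATED_* */
static unsigned long nr_inactive(void) { return 300; } /* NR_INACTIVE_* */

/*
 * Roughly the decision too_many_isolated() makes: kswapd always proceeds,
 * while other reclaimers must wait once isolated pages outnumber the
 * remaining inactive pages.
 */
static bool too_many_isolated(void)
{
	if (current_is_kswapd())
		return false;                 /* kswapd bypasses the check */
	return nr_isolated() > nr_inactive(); /* others may have to wait */
}

int main(void)
{
	/*
	 * A direct reclaimer loops on congestion_wait() while this returns
	 * true; if kswapd (which skipped the check) is meanwhile blocked on
	 * a lock held by that reclaimer, neither side can make progress.
	 */
	printf("direct reclaimer must wait: %s\n",
	       too_many_isolated() ? "yes" : "no");
	return 0;
}
----------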
----------
[ 142.870301] kswapd0 D ffff88007fcd5b80 0 51 2 0x00000000
[ 142.871941] ffff88007c98f660 0000000000000046 ffff88007cde4c80 ffff88007c990000
[ 142.873772] ffff880035d08b40 ffff880035d08b58 ffff880079e4a828 ffff88007c98f890
[ 142.875544] ffff88007c98f678 ffffffff81632c68 ffff88007cde4c80 ffff88007c98f6d8
[ 142.877338] Call Trace:
[ 142.878220] [<ffffffff81632c68>] schedule+0x38/0x90
[ 142.879477] [<ffffffff81636163>] rwsem_down_read_failed+0xd3/0x140
[ 142.880937] [<ffffffff81328314>] call_rwsem_down_read_failed+0x14/0x30
[ 142.882595] [<ffffffff81635b12>] ? down_read+0x12/0x20
[ 142.883882] [<ffffffff8126488b>] xfs_log_commit_cil+0x5b/0x460
[ 142.885326] [<ffffffff8125f67b>] __xfs_trans_commit+0x10b/0x1f0
[ 142.886756] [<ffffffff8125f9eb>] xfs_trans_commit+0xb/0x10
[ 142.888085] [<ffffffff81251505>] xfs_iomap_write_allocate+0x165/0x320
[ 142.889657] [<ffffffff8123f4aa>] xfs_map_blocks+0x15a/0x170
[ 142.891002] [<ffffffff8124045b>] xfs_vm_writepage+0x18b/0x5a0
[ 142.892372] [<ffffffff811295bc>] pageout.isra.42+0x18c/0x250
[ 142.893813] [<ffffffff8112a720>] shrink_page_list+0x650/0xa10
[ 142.895182] [<ffffffff8112b1f2>] shrink_inactive_list+0x1f2/0x560
[ 142.896606] [<ffffffff8112bedf>] shrink_lruvec+0x59f/0x760
[ 142.898037] [<ffffffff8112c146>] shrink_zone+0xa6/0x2d0
[ 142.899320] [<ffffffff8112d162>] kswapd+0x4c2/0x8e0
[ 142.900545] [<ffffffff8112cca0>] ? mem_cgroup_shrink_node_zone+0xe0/0xe0
[ 142.902152] [<ffffffff81086fb3>] kthread+0xd3/0xf0
[ 142.903381] [<ffffffff81086ee0>] ? kthread_create_on_node+0x1a0/0x1a0
[ 142.904919] [<ffffffff81637b9f>] ret_from_fork+0x3f/0x70
[ 142.906237] [<ffffffff81086ee0>] ? kthread_create_on_node+0x1a0/0x1a0
(...snipped...)
[ 148.995189] a.out D ffffffff813360c7 0 7821 7788 0x00000080
[ 148.996854] ffff88007c6b73d8 0000000000000086 ffff880078e7f2c0 ffff88007c6b8000
[ 148.998583] ffff88007c6b7410 ffff88007fc8dfc0 00000000fffd94f9 0000000000000002
[ 149.000560] ffff88007c6b73f0 ffffffff81632c68 ffff88007fc8dfc0 ffff88007c6b7470
[ 149.002415] Call Trace:
[ 149.003285] [<ffffffff81632c68>] schedule+0x38/0x90
[ 149.004624] [<ffffffff816366e2>] schedule_timeout+0x122/0x1c0
[ 149.006003] [<ffffffff8108fc63>] ? preempt_count_add+0x43/0x90
[ 149.007412] [<ffffffff810c81b0>] ? cascade+0x90/0x90
[ 149.008704] [<ffffffff81632291>] io_schedule_timeout+0xa1/0x110
[ 149.010109] [<ffffffff811359bd>] congestion_wait+0x7d/0xd0
[ 149.011536] [<ffffffff810a64a0>] ? wait_woken+0x80/0x80
[ 149.012891] [<ffffffff8112b519>] shrink_inactive_list+0x519/0x560
[ 149.014327] [<ffffffff8109aa6e>] ? check_preempt_wakeup+0x10e/0x1f0
[ 149.015867] [<ffffffff8112bedf>] shrink_lruvec+0x59f/0x760
[ 149.017340] [<ffffffff8117bb4f>] ? mem_cgroup_iter+0xef/0x4e0
[ 149.018742] [<ffffffff8112c146>] shrink_zone+0xa6/0x2d0
[ 149.020150] [<ffffffff8112c6e4>] do_try_to_free_pages+0x164/0x420
[ 149.021605] [<ffffffff8112ca34>] try_to_free_pages+0x94/0xc0
[ 149.022968] [<ffffffff8112101b>] __alloc_pages_nodemask+0x4fb/0x930
[ 149.024476] [<ffffffff811626bc>] alloc_pages_current+0x8c/0x100
[ 149.025883] [<ffffffff81169b68>] new_slab+0x458/0x4d0
[ 149.027209] [<ffffffff8116bdbe>] ___slab_alloc+0x49e/0x610
[ 149.028580] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 149.029864] [<ffffffff81099548>] ? update_curr+0x58/0xe0
[ 149.031327] [<ffffffff8109969d>] ? update_cfs_shares+0xad/0xf0
[ 149.032808] [<ffffffff81099af9>] ? dequeue_entity+0x1e9/0x800
[ 149.034301] [<ffffffff811889be>] __slab_alloc.isra.67+0x53/0x6f
[ 149.035780] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 149.037076] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 149.038344] [<ffffffff8116c23d>] __kmalloc+0x14d/0x1a0
[ 149.039677] [<ffffffff81260014>] kmem_alloc+0x74/0xe0
[ 149.040940] [<ffffffff81264b82>] xfs_log_commit_cil+0x352/0x460
[ 149.042321] [<ffffffff8125f67b>] __xfs_trans_commit+0x10b/0x1f0
[ 149.043733] [<ffffffff8125f9eb>] xfs_trans_commit+0xb/0x10
[ 149.045065] [<ffffffff81251b9f>] xfs_vn_update_time+0xdf/0x130
[ 149.046430] [<ffffffff811a4768>] file_update_time+0xb8/0x110
[ 149.047833] [<ffffffff81249cde>] xfs_file_aio_write_checks+0x16e/0x1c0
[ 149.049386] [<ffffffff8124a089>] xfs_file_buffered_aio_write+0x79/0x1f0
[ 149.051031] [<ffffffff81636fd5>] ? _raw_spin_lock_irqsave+0x25/0x50
[ 149.052581] [<ffffffff8163709f>] ? _raw_spin_unlock_irqrestore+0x1f/0x40
[ 149.054084] [<ffffffff8124a274>] xfs_file_write_iter+0x74/0x110
[ 149.055729] [<ffffffff8118ada7>] __vfs_write+0xc7/0x100
[ 149.057023] [<ffffffff8118b574>] vfs_write+0xa4/0x190
[ 149.058330] [<ffffffff8118c200>] SyS_write+0x50/0xc0
[ 149.059553] [<ffffffff811b7c78>] ? do_fsync+0x38/0x60
[ 149.060811] [<ffffffff8163782e>] entry_SYSCALL_64_fastpath+0x12/0x71
(...snipped...)
[ 264.199092] kswapd0 D ffff88007fcd5b80 0 51 2 0x00000000
[ 264.200724] ffff88007c98f660 0000000000000046 ffff88007cde4c80 ffff88007c990000
[ 264.202469] ffff880035d08b40 ffff880035d08b58 ffff880079e4a828 ffff88007c98f890
[ 264.204233] ffff88007c98f678 ffffffff81632c68 ffff88007cde4c80 ffff88007c98f6d8
[ 264.206173] Call Trace:
[ 264.207202] [<ffffffff81632c68>] schedule+0x38/0x90
[ 264.208536] [<ffffffff81636163>] rwsem_down_read_failed+0xd3/0x140
[ 264.210044] [<ffffffff81328314>] call_rwsem_down_read_failed+0x14/0x30
[ 264.211602] [<ffffffff81635b12>] ? down_read+0x12/0x20
[ 264.212929] [<ffffffff8126488b>] xfs_log_commit_cil+0x5b/0x460
[ 264.214369] [<ffffffff8125f67b>] __xfs_trans_commit+0x10b/0x1f0
[ 264.215820] [<ffffffff8125f9eb>] xfs_trans_commit+0xb/0x10
[ 264.217193] [<ffffffff81251505>] xfs_iomap_write_allocate+0x165/0x320
[ 264.218721] [<ffffffff8123f4aa>] xfs_map_blocks+0x15a/0x170
[ 264.220109] [<ffffffff8124045b>] xfs_vm_writepage+0x18b/0x5a0
[ 264.221586] [<ffffffff811295bc>] pageout.isra.42+0x18c/0x250
[ 264.222989] [<ffffffff8112a720>] shrink_page_list+0x650/0xa10
[ 264.224404] [<ffffffff8112b1f2>] shrink_inactive_list+0x1f2/0x560
[ 264.225876] [<ffffffff8112bedf>] shrink_lruvec+0x59f/0x760
[ 264.227248] [<ffffffff8112c146>] shrink_zone+0xa6/0x2d0
[ 264.228573] [<ffffffff8112d162>] kswapd+0x4c2/0x8e0
[ 264.229840] [<ffffffff8112cca0>] ? mem_cgroup_shrink_node_zone+0xe0/0xe0
[ 264.231407] [<ffffffff81086fb3>] kthread+0xd3/0xf0
[ 264.232662] [<ffffffff81086ee0>] ? kthread_create_on_node+0x1a0/0x1a0
[ 264.234185] [<ffffffff81637b9f>] ret_from_fork+0x3f/0x70
[ 264.235527] [<ffffffff81086ee0>] ? kthread_create_on_node+0x1a0/0x1a0
(...snipped...)
[ 270.339774] a.out D ffffffff813360c7 0 7821 7788 0x00000080
[ 270.341391] ffff88007c6b73d8 0000000000000086 ffff880078e7f2c0 ffff88007c6b8000
[ 270.343114] ffff88007c6b7410 ffff88007fc4dfc0 00000000ffff8b29 0000000000000002
[ 270.344859] ffff88007c6b73f0 ffffffff81632c68 ffff88007fc4dfc0 ffff88007c6b7470
[ 270.346670] Call Trace:
[ 270.347608] [<ffffffff81632c68>] schedule+0x38/0x90
[ 270.348929] [<ffffffff816366e2>] schedule_timeout+0x122/0x1c0
[ 270.350354] [<ffffffff8108fc63>] ? preempt_count_add+0x43/0x90
[ 270.351790] [<ffffffff810c81b0>] ? cascade+0x90/0x90
[ 270.353106] [<ffffffff81632291>] io_schedule_timeout+0xa1/0x110
[ 270.354558] [<ffffffff811359bd>] congestion_wait+0x7d/0xd0
[ 270.355958] [<ffffffff810a64a0>] ? wait_woken+0x80/0x80
[ 270.357298] [<ffffffff8112b519>] shrink_inactive_list+0x519/0x560
[ 270.358779] [<ffffffff8109aa6e>] ? check_preempt_wakeup+0x10e/0x1f0
[ 270.360307] [<ffffffff8112bedf>] shrink_lruvec+0x59f/0x760
[ 270.361687] [<ffffffff8117bb4f>] ? mem_cgroup_iter+0xef/0x4e0
[ 270.363147] [<ffffffff8112c146>] shrink_zone+0xa6/0x2d0
[ 270.364462] [<ffffffff8112c6e4>] do_try_to_free_pages+0x164/0x420
[ 270.365898] [<ffffffff8112ca34>] try_to_free_pages+0x94/0xc0
[ 270.367261] [<ffffffff8112101b>] __alloc_pages_nodemask+0x4fb/0x930
[ 270.368744] [<ffffffff811626bc>] alloc_pages_current+0x8c/0x100
[ 270.370151] [<ffffffff81169b68>] new_slab+0x458/0x4d0
[ 270.371420] [<ffffffff8116bdbe>] ___slab_alloc+0x49e/0x610
[ 270.372769] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 270.374053] [<ffffffff81099548>] ? update_curr+0x58/0xe0
[ 270.375351] [<ffffffff8109969d>] ? update_cfs_shares+0xad/0xf0
[ 270.376748] [<ffffffff81099af9>] ? dequeue_entity+0x1e9/0x800
[ 270.378200] [<ffffffff811889be>] __slab_alloc.isra.67+0x53/0x6f
[ 270.379604] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 270.380879] [<ffffffff81260014>] ? kmem_alloc+0x74/0xe0
[ 270.382148] [<ffffffff8116c23d>] __kmalloc+0x14d/0x1a0
[ 270.383424] [<ffffffff81260014>] kmem_alloc+0x74/0xe0
[ 270.384668] [<ffffffff81264b82>] xfs_log_commit_cil+0x352/0x460
[ 270.386049] [<ffffffff8125f67b>] __xfs_trans_commit+0x10b/0x1f0
[ 270.387449] [<ffffffff8125f9eb>] xfs_trans_commit+0xb/0x10
[ 270.388761] [<ffffffff81251b9f>] xfs_vn_update_time+0xdf/0x130
[ 270.390126] [<ffffffff811a4768>] file_update_time+0xb8/0x110
[ 270.391484] [<ffffffff81249cde>] xfs_file_aio_write_checks+0x16e/0x1c0
[ 270.392962] [<ffffffff8124a089>] xfs_file_buffered_aio_write+0x79/0x1f0
[ 270.394728] [<ffffffff81636fd5>] ? _raw_spin_lock_irqsave+0x25/0x50
[ 270.396218] [<ffffffff8163709f>] ? _raw_spin_unlock_irqrestore+0x1f/0x40
[ 270.397769] [<ffffffff8124a274>] xfs_file_write_iter+0x74/0x110
[ 270.399194] [<ffffffff8118ada7>] __vfs_write+0xc7/0x100
[ 270.400507] [<ffffffff8118b574>] vfs_write+0xa4/0x190
[ 270.401788] [<ffffffff8118c200>] SyS_write+0x50/0xc0
[ 270.403048] [<ffffffff811b7c78>] ? do_fsync+0x38/0x60
[ 270.404324] [<ffffffff8163782e>] entry_SYSCALL_64_fastpath+0x12/0x71
----------
While an OOM-killer deadlock produces OOM-killer messages and keeps CPU usage
at 100%, this hang-up produces no kernel messages and keeps CPU usage at 0%,
as if the system were completely idle.
This patch reports the progress of shrinking the inactive list so that a
possible deadlock can be noticed. I have not yet managed to reproduce this
bug with the patch applied, so unfortunately I cannot show example output.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/vmscan.c | 45 +++++++++++++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index db5339d..0464537 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1476,20 +1476,12 @@ int isolate_lru_page(struct page *page)
return ret;
}
-static int __too_many_isolated(struct zone *zone, int file,
- struct scan_control *sc, int safe)
+static inline unsigned long inactive_pages(struct zone *zone, int file,
+ struct scan_control *sc, int safe)
{
- unsigned long inactive, isolated;
-
- if (safe) {
- inactive = zone_page_state_snapshot(zone,
- NR_INACTIVE_ANON + 2 * file);
- isolated = zone_page_state_snapshot(zone,
- NR_ISOLATED_ANON + file);
- } else {
- inactive = zone_page_state(zone, NR_INACTIVE_ANON + 2 * file);
- isolated = zone_page_state(zone, NR_ISOLATED_ANON + file);
- }
+ unsigned long inactive = safe ?
+ zone_page_state_snapshot(zone, NR_INACTIVE_ANON + 2 * file) :
+ zone_page_state(zone, NR_INACTIVE_ANON + 2 * file);
/*
* GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so they
@@ -1498,8 +1490,21 @@ static int __too_many_isolated(struct zone *zone, int file,
*/
if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
inactive >>= 3;
+ return inactive;
+}
- return isolated > inactive;
+static inline unsigned long isolated_pages(struct zone *zone, int file,
+ int safe)
+{
+ return safe ? zone_page_state_snapshot(zone, NR_ISOLATED_ANON + file) :
+ zone_page_state(zone, NR_ISOLATED_ANON + file);
+}
+
+static int __too_many_isolated(struct zone *zone, int file,
+ struct scan_control *sc, int safe)
+{
+ return isolated_pages(zone, file, safe) >
+ inactive_pages(zone, file, sc, safe);
}
/*
@@ -1619,8 +1624,20 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
int file = is_file_lru(lru);
struct zone *zone = lruvec_zone(lruvec);
struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
+ unsigned long start = jiffies;
+ unsigned long prev = start + 30 * HZ;
while (unlikely(too_many_isolated(zone, file, sc))) {
+ unsigned long now = jiffies;
+
+ if (time_after(now, prev)) {
+ pr_warn("vmscan: %s(%u) is waiting for %lu seconds at %s (mode:0x%x,isolated:%lu,inactive:%lu)\n",
+ current->comm, current->pid, (now - start) / HZ,
+ __func__, sc->gfp_mask,
+ isolated_pages(zone, file, 1),
+ inactive_pages(zone, file, sc, 1));
+ prev = now + 30 * HZ;
+ }
congestion_wait(BLK_RW_ASYNC, HZ/10);
/* We are about to die and free our memory. Return now. */
--
1.8.3.1
* Re: [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list
2015-09-21 11:09 [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list Tetsuo Handa
@ 2015-09-21 11:13 ` Tetsuo Handa
2015-09-21 21:52 ` Dave Chinner
1 sibling, 0 replies; 3+ messages in thread
From: Tetsuo Handa @ 2015-09-21 11:13 UTC (permalink / raw)
To: david; +Cc: xfs, linux-mm
(Oops. I forgot to append the description below.)
David, I got a backtrace where the system stalled forever with 0% CPU usage
because all memory-allocating tasks were sleeping in congestion_wait()
in shrink_inactive_list() without triggering the OOM killer
(the uptime > 100 part of http://I-love.SAKURA.ne.jp/tmp/serial-20150920.txt.xz ).
Could you please have a look at whether this is really a deadlock?
(Even if it is not a deadlock, sleeping for 2 minutes in congestion_wait()
is unusable...)
Tetsuo Handa wrote:
> This is a difficult-to-trigger, silent hang-up bug.
>
> kswapd is allowed to bypass the too_many_isolated() check in
> shrink_inactive_list(), but it can still be blocked by locks taken in
> shrink_page_list(), which is called from shrink_inactive_list(). If the
> task blocking kswapd is itself trying to allocate memory while holding
> those locks, a memory reclaim deadlock forms.
>
> (...snipped...)
* Re: [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list
2015-09-21 11:09 [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list Tetsuo Handa
2015-09-21 11:13 ` Tetsuo Handa
@ 2015-09-21 21:52 ` Dave Chinner
1 sibling, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2015-09-21 21:52 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: xfs, linux-mm
On Mon, Sep 21, 2015 at 08:09:54PM +0900, Tetsuo Handa wrote:
> This is a difficult-to-trigger, silent hang-up bug.
>
> kswapd is allowed to bypass the too_many_isolated() check in
> shrink_inactive_list(), but it can still be blocked by locks taken in
> shrink_page_list(), which is called from shrink_inactive_list(). If the
> task blocking kswapd is itself trying to allocate memory while holding
> those locks, a memory reclaim deadlock forms.
It's a known problem in XFS and I'm currently working on patches to
fix it by hoisting the memory allocations outside of the CIL context
lock.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
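To illustrate the approach Dave describes, here is a minimal, hypothetical
userspace sketch of hoisting an allocation outside a lock; the lock, structure
and function names below are invented for the illustration and are not the
actual XFS CIL code.
----------
#include <pthread.h>
#include <stdlib.h>

static pthread_rwlock_t ctx_lock = PTHREAD_RWLOCK_INITIALIZER;

struct log_vector { char buf[256]; };

/*
 * Before: allocating while holding the lock can recurse into direct
 * reclaim, which may then wait on kswapd, while kswapd may itself be
 * blocked on this very lock, as in the traces above.
 */
static void commit_alloc_under_lock(void)
{
	pthread_rwlock_rdlock(&ctx_lock);
	struct log_vector *lv = malloc(sizeof(*lv)); /* allocation under lock */
	/* ... format the transaction into lv and insert it ... */
	free(lv);
	pthread_rwlock_unlock(&ctx_lock);
}

/*
 * After: perform the allocation first and take the lock only for the
 * insertion, so reclaim can no longer recurse under the lock.
 */
static void commit_alloc_hoisted(void)
{
	struct log_vector *lv = malloc(sizeof(*lv)); /* allocation hoisted */
	pthread_rwlock_rdlock(&ctx_lock);
	/* ... insert the pre-allocated lv ... */
	pthread_rwlock_unlock(&ctx_lock);
	free(lv);
}

int main(void)
{
	commit_alloc_under_lock();
	commit_alloc_hoisted();
	return 0;
}
----------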
Thread overview: 3+ messages
2015-09-21 11:09 [PATCH] mm, vmscan: Warn about possible deadlock at shrink_inactive_list Tetsuo Handa
2015-09-21 11:13 ` Tetsuo Handa
2015-09-21 21:52 ` Dave Chinner