linux-mm.kvack.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan.kim@gmail.com>
Subject: Re: [RFC PATCH 1/2] vmscan don't isolate too many pages
Date: Wed, 8 Jul 2009 11:19:01 +0800
Message-ID: <20090708031901.GA9924@localhost>
In-Reply-To: <4A539B11.5020803@redhat.com>

On Wed, Jul 08, 2009 at 02:59:29AM +0800, Rik van Riel wrote:
> KOSAKI Motohiro wrote:
> 
> > FAQ
> > -------
> > Q: Why do you compare pages accumulated across zones, not individual zone pages?
> > A: If we check individual zones, the number of reclaimers is restricted by the
> >    smallest zone. That would hurt performance on systems with a small DMA zone.
> 
> That is a clever solution!  I was playing around a bit with
> doing it on a per-zone basis.  Your idea is much nicer.
> 
> However, I can see one potential problem with your patch:
> 
> +		nr_inactive += zone_page_state(zone, NR_INACTIVE_ANON);
> +		nr_inactive += zone_page_state(zone, NR_INACTIVE_FILE);
> +		nr_isolated += zone_page_state(zone, NR_ISOLATED_ANON);
> +		nr_isolated += zone_page_state(zone, NR_ISOLATED_FILE);
> +	}
> +
> +	return nr_isolated > nr_inactive;
> 
> What if we ran out of swap space, or are not scanning the
> anon list at all for some reason?
> 
> It is possible that there are no inactive_file pages left,
> with all file pages already isolated, and your function
> still letting reclaimers through.

Good catch!

If swap is always off, NR_ISOLATED_ANON = 0, so the check reduces to

        NR_ISOLATED_FILE > NR_INACTIVE_FILE + NR_INACTIVE_ANON

which can never be true if there are more inactive anon pages than
file pages, because the isolated file pages cannot outnumber the
file pages.

If swap is on but fills up at some point, comparing the *ANON counters
is also meaningless, because the anon list won't be scanned.
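
To make the swap-off case concrete, here is a minimal userspace sketch of
the comparison. It is not kernel code: struct lru_counts and
too_many_isolated_orig() are hypothetical names that only stand in for
the counters summed over the zonelist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters summed over all zones. */
struct lru_counts {
	unsigned long inactive_anon;
	unsigned long inactive_file;
	unsigned long isolated_anon;
	unsigned long isolated_file;
};

/* Same comparison as the quoted patch: anon and file are always summed. */
static bool too_many_isolated_orig(const struct lru_counts *c)
{
	unsigned long nr_inactive;
	unsigned long nr_isolated;

	assert(c != NULL);

	nr_inactive = c->inactive_anon + c->inactive_file;
	nr_isolated = c->isolated_anon + c->isolated_file;

	return nr_isolated > nr_inactive;
}
```

With inactive_anon = 1000, inactive_file = 0, isolated_anon = 0 and
isolated_file = 500, the test is 500 > 1000, so the function returns
false even though every file page is already isolated.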

> This means you could still get a spurious OOM.
> 
> I guess I should mail out my (ugly) approach, so we can
> compare the two :)

And it helps to be aware of all the alternatives, now and in the future :)

KOSAKI, I tested this updated patch. The OOM seems to be gone, but
now a process can sleep for too long:

[  316.756006] BUG: soft lockup - CPU#1 stuck for 61s! [msgctl11:12497]
[  316.756006] Modules linked in: drm snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device iwlagn snd iwlcore soundcore snd_page_alloc video
[  316.756006] irq event stamp: 269858
[  316.756006] hardirqs last  enabled at (269857): [<ffffffff8100cc50>] restore_args+0x0/0x30
[  316.756006] hardirqs last disabled at (269858): [<ffffffff8100bf6a>] save_args+0x6a/0x70
[  316.756006] softirqs last  enabled at (269856): [<ffffffff81055d9e>] __do_softirq+0x19e/0x1f0
[  316.756006] softirqs last disabled at (269841): [<ffffffff8100d3cc>] call_softirq+0x1c/0x50
[  316.756006] CPU 1:
[  316.756006] Modules linked in: drm snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device iwlagn snd iwlcore soundcore snd_page_alloc video
[  316.756006] Pid: 12497, comm: msgctl11 Not tainted 2.6.31-rc1 #33 HP Compaq 6910p
[  316.756006] RIP: 0010:[<ffffffff810804a9>]  [<ffffffff810804a9>] lock_acquire+0xf9/0x120
[  316.756006] RSP: 0000:ffff880013a9fcd8  EFLAGS: 00000246
[  316.756006] RAX: ffff880013a7c500 RBX: ffff880013a9fd28 RCX: ffffffff81b6c928
[  316.756006] RDX: 0000000000000002 RSI: ffffffff82130ff0 RDI: 0000000000000246
[  316.756006] RBP: ffffffff8100cb8e R08: ffffff18f84dc1fb R09: 0000000000000001
[  316.756006] R10: 00000000000001ce R11: 0000000000000001 R12: 0000000000000002
[  316.756006] R13: ffff880013a7cc90 R14: 000000008107eca9 R15: ffff880013a9fd08
[  316.756006] FS:  00007f91a8bf76f0(0000) GS:ffff88000272f000(0000) knlGS:0000000000000000
[  316.756006] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  316.756006] CR2: 00007f91a8c079a0 CR3: 0000000013a81000 CR4: 00000000000006e0
[  316.756006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  316.756006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  316.756006] Call Trace:
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff8158e0e6>] ? _spin_lock+0x36/0x70
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff810faf43>] ? swapcache_prepare+0x13/0x20
[  316.756006]  [<ffffffff810fa423>] ? read_swap_cache_async+0x63/0x120
[  316.756006]  [<ffffffff810fa567>] ? swapin_readahead+0x87/0xc0
[  316.756006]  [<ffffffff810ec9f9>] ? handle_mm_fault+0x719/0x840
[  316.756006]  [<ffffffff815911cb>] ? do_page_fault+0x1cb/0x330
[  316.756006]  [<ffffffff8158e9e5>] ? page_fault+0x25/0x30
[  316.756006] Kernel panic - not syncing: softlockup: hung tasks
[  316.756006] Pid: 12497, comm: msgctl11 Not tainted 2.6.31-rc1 #33
[  316.756006] Call Trace:
[  316.756006]  <IRQ>  [<ffffffff8158a01a>] panic+0xa5/0x173
[  316.756006]  [<ffffffff8100cb8e>] ? common_interrupt+0xe/0x13
[  316.756006]  [<ffffffff81012e69>] ? sched_clock+0x9/0x10
[  316.756006]  [<ffffffff8107b745>] ? lock_release_holdtime+0x35/0x1c0
[  316.756006]  [<ffffffff8158df1b>] ? _spin_unlock+0x2b/0x40
[  316.756006]  [<ffffffff810a733d>] softlockup_tick+0x1ad/0x1e0
[  316.756006]  [<ffffffff8105b91d>] run_local_timers+0x1d/0x30
[  316.756006]  [<ffffffff8105b96c>] update_process_times+0x3c/0x80
[  316.756006]  [<ffffffff810773fc>] tick_periodic+0x2c/0x80
[  316.756006]  [<ffffffff81077476>] tick_handle_periodic+0x26/0x90
[  316.756006]  [<ffffffff81077848>] tick_do_broadcast+0x88/0x90
[  316.756006]  [<ffffffff810779a9>] tick_do_periodic_broadcast+0x39/0x50
[  316.756006]  [<ffffffff81077f34>] tick_handle_periodic_broadcast+0x14/0x50
[  316.756006]  [<ffffffff8100f5ef>] timer_interrupt+0x1f/0x30
[  316.756006]  [<ffffffff810a7e70>] handle_IRQ_event+0x70/0x180
[  316.756006]  [<ffffffff810a9cf1>] handle_edge_irq+0xc1/0x160
[  316.756006]  [<ffffffff8100ee6b>] handle_irq+0x4b/0xb0
[  316.756006]  [<ffffffff8159346f>] do_IRQ+0x6f/0xf0
[  316.756006]  [<ffffffff8100cb93>] ret_from_intr+0x0/0x16
[  316.756006]  <EOI>  [<ffffffff810804a9>] ? lock_acquire+0xf9/0x120
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff8158e0e6>] ? _spin_lock+0x36/0x70
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff810fade9>] ? __swap_duplicate+0x59/0x1a0
[  316.756006]  [<ffffffff810faf43>] ? swapcache_prepare+0x13/0x20
[  316.756006]  [<ffffffff810fa423>] ? read_swap_cache_async+0x63/0x120
[  316.756006]  [<ffffffff810fa567>] ? swapin_readahead+0x87/0xc0
[  316.756006]  [<ffffffff810ec9f9>] ? handle_mm_fault+0x719/0x840
[  316.756006]  [<ffffffff815911cb>] ? do_page_fault+0x1cb/0x330
[  316.756006]  [<ffffffff8158e9e5>] ? page_fault+0x25/0x30
[  316.756006] Rebooting in 100 seconds..


---
 mm/page_alloc.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1721,6 +1721,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	return alloc_flags;
 }
 
+static bool too_many_isolated(struct zonelist *zonelist,
+			      enum zone_type high_zoneidx, nodemask_t *nodemask)
+{
+	unsigned long nr_inactive = 0;
+	unsigned long nr_isolated = 0;
+	struct zoneref *z;
+	struct zone *zone;
+
+	for_each_zone_zonelist_nodemask(zone, z, zonelist,
+					high_zoneidx, nodemask) {
+		if (!populated_zone(zone))
+			continue;
+
+		nr_inactive += zone_page_state(zone, NR_INACTIVE_FILE);
+		nr_isolated += zone_page_state(zone, NR_ISOLATED_FILE);
+		if (nr_swap_pages) {
+			nr_inactive += zone_page_state(zone, NR_INACTIVE_ANON);
+			nr_isolated += zone_page_state(zone, NR_ISOLATED_ANON);
+		}
+	}
+
+	return nr_isolated > nr_inactive;
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -1789,6 +1813,11 @@ rebalance:
 	if (p->flags & PF_MEMALLOC)
 		goto nopage;
 
+	if (too_many_isolated(zonelist, high_zoneidx, nodemask)) {
+		schedule_timeout_uninterruptible(HZ/10);
+		goto restart;
+	}
+
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
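
For comparison, a minimal userspace sketch of the updated check above.
It is only a model under stated assumptions: struct lru_counts,
too_many_isolated_model() and the free_swap_pages parameter are
hypothetical stand-ins for the zone counters and for the kernel's
nr_swap_pages; nothing here is the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone counters (stand-in for zone_page_state()). */
struct lru_counts {
	unsigned long inactive_anon;
	unsigned long inactive_file;
	unsigned long isolated_anon;
	unsigned long isolated_file;
};

/*
 * Model of the updated check: file counters always count, anon counters
 * only count while free swap remains (free_swap_pages != 0).
 */
static bool too_many_isolated_model(const struct lru_counts *zones,
				    size_t nr_zones, long free_swap_pages)
{
	unsigned long nr_inactive = 0;
	unsigned long nr_isolated = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	for (i = 0; i < nr_zones; i++) {
		nr_inactive += zones[i].inactive_file;
		nr_isolated += zones[i].isolated_file;
		if (free_swap_pages) {
			nr_inactive += zones[i].inactive_anon;
			nr_isolated += zones[i].isolated_anon;
		}
	}

	return nr_isolated > nr_inactive;
}
```

For a single zone with inactive_anon = 1000, inactive_file = 0 and
isolated_file = 500, the model returns true when free_swap_pages is 0
(500 > 0), and false when free swap remains (500 > 1000 fails).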
