From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Vlastimil Babka <vbabka@suse.cz>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v1
Date: Thu, 21 Jul 2016 16:07:14 +0900
Message-ID: <20160721070714.GC31865@bbox>
In-Reply-To: <1469028111-1622-1-git-send-email-mgorman@techsingularity.net>

Hi Mel,

On Wed, Jul 20, 2016 at 04:21:46PM +0100, Mel Gorman wrote:
> Both Joonsoo Kim and Minchan Kim have reported premature OOM kills on
> a 32-bit platform. The common element is a zone-constrained high-order
> allocation failing. Two factors appear to be at fault -- pgdat being

Strictly speaking, my case is an order-0 allocation failing, not a high-order one.
;)

> considered unreclaimable prematurely and insufficient rotation of the
> active list.
> 
> Unfortunately to date I have been unable to reproduce this with a variety
> of stress workloads on a 2G 32-bit KVM instance. It's not clear why as
> the steps are similar to what was described. It means I've been unable to
> determine if this series addresses the problem or not. I'm hoping they can
> test and report back before these are merged to mmotm. What I have checked
> is that a basic parallel DD workload completed successfully on the same
> machine I used for the node-lru performance tests. I'll leave the other
> tests running just in case anything interesting falls out.
> 
> The series is in three basic parts;
> 
> Patch 1 does not account for skipped pages as scanned. This avoids the pgdat
> 	being prematurely marked unreclaimable
> 
> Patches 2-4 add per-zone stats back in. The actual stats patch is different
> 	to Minchan's as the original patch did not account for unevictable
> 	LRU which would corrupt counters. The second two patches remove
> 	approximations based on pgdat statistics. It's effectively a
> 	revert of "mm, vmstat: remove zone and node double accounting by
> 	approximating retries" but different LRU stats are used. This
> 	is better than a full revert or a reworking of the series as
> 	it preserves history of why the zone stats are necessary.
> 
> 	If this work out, we may have to leave the double accounting in
> 	place for now until an alternative cheap solution presents itself.
> 
> Patch 5 rotates inactive/active lists for lowmem allocations. This is also
> 	quite different to Minchan's patch as the original patch did not
> 	account for memcg and would rotate if *any* eligible zone needed
> 	rotation which may rotate excessively. The new patch considers
> 	the ratio for all eligible zones which is more in line with
> 	node-lru in general.
> 

I have now tested the series and confirmed that it works for me from the OOM
point of view. IOW, I no longer see any OOM kills. But note that I tested it
without [1/5], which has a problem I mentioned in that thread.

If you want to merge [1/5], please resend an updated version, but
I doubt we need it at this moment.
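
For readers following the Patch 5 description quoted above, here is a rough
user-space sketch of what "considers the ratio for all eligible zones" could
look like. It is not the patch itself. The zone layout, the page counts, the
function names and the 4 KiB page size are all made up for illustration, and
the sqrt(10 * GB) ratio is assumed from the kernel's inactive-list heuristic
of that era.

```python
import math

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def inactive_ratio(total_pages):
    """Assumed heuristic: ratio = int_sqrt(10 * size_in_GB), or 1 below 1 GB."""
    gb = total_pages >> (30 - PAGE_SHIFT)
    return math.isqrt(10 * gb) if gb else 1


def inactive_is_low(zones, reclaim_idx):
    """Sum LRU sizes over zones eligible for this allocation (idx <= reclaim_idx)
    and report whether the inactive list is too small relative to the active one."""
    eligible = [z for z in zones if z["idx"] <= reclaim_idx]
    inactive = sum(z["inactive"] for z in eligible)
    active = sum(z["active"] for z in eligible)
    return inactive * inactive_ratio(inactive + active) < active


# Hypothetical 32-bit node: small lowmem zones, large highmem zone (page counts).
zones = [
    {"name": "DMA",     "idx": 0, "inactive": 100,     "active": 200},
    {"name": "Normal",  "idx": 1, "inactive": 50_000,  "active": 150_000},
    {"name": "HighMem", "idx": 2, "inactive": 300_000, "active": 100_000},
]

lowmem_needs_rotation = inactive_is_low(zones, reclaim_idx=1)
whole_node_needs_rotation = inactive_is_low(zones, reclaim_idx=2)
```

With these made-up numbers the node-wide check sees a healthy inactive list,
while the check restricted to lowmem-eligible zones says the active list needs
to be rotated. That is the kind of case a zone-constrained allocation can hit
when only node-wide counts are consulted.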
