OOM at low page cache?

public inbox for linux-kernel@vger.kernel.org

* OOM at low page cache?
@ 2015-01-23 22:18 John Moser
  2015-01-27 11:03 ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: John Moser @ 2015-01-23 22:18 UTC (permalink / raw)
  To: linux-kernel

Why is there no tunable to OOM at low page cache?

I have no swap configured.  I have 16GB RAM.  If Chrome or Gimp or some
other stupid program goes off the deep end and eats up my RAM, I hit
some 15.5GB or 15.75GB usage and stay there for about 40 minutes.  Every
time the program tries to do something to eat more RAM, it cranks disk
hard; the disk starts thrashing, the mouse pointer stops moving, and
nothing goes on.  It's like swapping like crazy, except you're reading
library files instead of paged anonymous RAM.

If only I could tell the system to OOM kill at 512MB or 1GB or 95%
non-evictable RAM, it would recover on its own.  As-is, I need to wait
or trigger the OOM killer by sysrq.

Am I just the only person in the world who's ever had that problem?  Or
is it a matter of questions fast popping up when you try to do this
*and* enable paging to disk?  (In my experience, that's a matter of too
much swap space:  if you have 16GB RAM and your computer dies at 15.25GB
usage, your swap space should be no larger than 750MB plus inactive
working RAM; obviously, your computer can't handle paging 750MB back and
forth.  If you make it 8GB wide and you start swap thrashing at 2GB
usage, you have too much swap available).

I guess you could try to detect excessive swap and page cache thrashing,
but that's complex; if anyone really wanted to do that, it would be done
by now.  A low-barrier OOM is much simpler.
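
A low-barrier check of this kind can be approximated from userspace. What follows is a minimal sketch, not an existing tool: it assumes the usual "Key:  value kB" layout of /proc/meminfo, uses hypothetical helper names (meminfo_kb, below_floor, trigger_oom_kill), and treats "MemFree + Active(file) + Inactive(file)" as a rough stand-in for "free plus evictable page cache". Writing 'f' to /proc/sysrq-trigger is the same request as the sysrq key mentioned above; it needs root and sysrq enabled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value (in kB) of a "Key:   12345 kB" line from
 * /proc/meminfo-style text, or -1 if the key is not present. */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *p;

	assert(text && key);
	klen = strlen(key);
	for (p = text; p && *p; ) {
		/* Match the whole key, followed directly by ':' */
		if (strncmp(p, key, klen) == 0 && p[klen] == ':')
			return strtol(p + klen + 1, NULL, 10);
		/* Advance to the start of the next line, if any */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/* Hypothetical policy: free memory plus file-backed page cache has
 * dropped below a user-chosen floor (in kB). */
int below_floor(const char *meminfo, long floor_kb)
{
	long free_kb = meminfo_kb(meminfo, "MemFree");
	long act_kb = meminfo_kb(meminfo, "Active(file)");
	long inact_kb = meminfo_kb(meminfo, "Inactive(file)");

	if (free_kb < 0 || act_kb < 0 || inact_kb < 0)
		return 0;	/* unexpected format: do nothing */
	return free_kb + act_kb + inact_kb < floor_kb;
}

/* Ask the kernel to run the OOM killer once, like Alt+SysRq+F.
 * Returns 0 on success, -1 on failure. */
int trigger_oom_kill(void)
{
	FILE *f = fopen("/proc/sysrq-trigger", "w");

	if (!f)
		return -1;
	fputc('f', f);
	return fclose(f) == 0 ? 0 : -1;
}
```

A small daemon could read /proc/meminfo into a buffer about once a second and call trigger_oom_kill() whenever below_floor() reports true for, say, a 512MB floor. Whether such a daemon stays responsive while the machine is already thrashing is a separate question.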

* Re: OOM at low page cache?
  2015-01-23 22:18 OOM at low page cache? John Moser
@ 2015-01-27 11:03 ` Vlastimil Babka
  2015-01-28  6:26   ` Minchan Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2015-01-27 11:03 UTC (permalink / raw)
  To: John Moser; +Cc: linux-kernel, linux-mm@kvack.org

CC linux-mm in case somebody has a good answer but missed this in lkml traffic

On 01/23/2015 11:18 PM, John Moser wrote:
> Why is there no tunable to OOM at low page cache?
> 
> I have no swap configured.  I have 16GB RAM.  If Chrome or Gimp or some
> other stupid program goes off the deep end and eats up my RAM, I hit
> some 15.5GB or 15.75GB usage and stay there for about 40 minutes.  Every
> time the program tries to do something to eat more RAM, it cranks disk
> hard; the disk starts thrashing, the mouse pointer stops moving, and
> nothing goes on.  It's like swapping like crazy, except you're reading
> library files instead of paged anonymous RAM.
> 
> If only I could tell the system to OOM kill at 512MB or 1GB or 95%
> non-evictable RAM, it would recover on its own.  As-is, I need to wait
> or trigger the OOM killer by sysrq.
> 
> Am I just the only person in the world who's ever had that problem?  Or
> is it a matter of questions fast popping up when you try to do this
> *and* enable paging to disk?  (In my experience, that's a matter of too
> much swap space:  if you have 16GB RAM and your computer dies at 15.25GB
> usage, your swap space should be no larger than 750MB plus inactive
> working RAM; obviously, your computer can't handle paging 750MB back and
> forth.  If you make it 8GB wide and you start swap thrashing at 2GB
> usage, you have too much swap available).
> 
> I guess you could try to detect excessive swap and page cache thrashing,
> but that's complex; if anyone really wanted to do that, it would be done
> by now.  A low-barrier OOM is much simpler.

* Re: OOM at low page cache?
  2015-01-27 11:03 ` Vlastimil Babka
@ 2015-01-28  6:26   ` Minchan Kim
  2015-01-28 12:36     ` Rik van Riel
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Minchan Kim @ 2015-01-28  6:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: John Moser, linux-kernel, linux-mm@kvack.org, Rik van Riel,
	Johannes Weiner, KOSAKI Motohiro, Kamezawa Hiroyuki,
	Christoph Lameter

Hello,

On Tue, Jan 27, 2015 at 12:03:34PM +0100, Vlastimil Babka wrote:
> CC linux-mm in case somebody has a good answer but missed this in lkml traffic
> 
> On 01/23/2015 11:18 PM, John Moser wrote:
> > Why is there no tunable to OOM at low page cache?

AFAIR, there were several attempts, although none was accepted
at that time. One I can remember is min_filelist_kbytes.
FYI, http://lwn.net/Articles/412313/

> > 
> > I have no swap configured.  I have 16GB RAM.  If Chrome or Gimp or some
> > other stupid program goes off the deep end and eats up my RAM, I hit
> > some 15.5GB or 15.75GB usage and stay there for about 40 minutes.  Every
> > time the program tries to do something to eat more RAM, it cranks disk
> > hard; the disk starts thrashing, the mouse pointer stops moving, and
> > nothing goes on.  It's like swapping like crazy, except you're reading
> > library files instead of paged anonymous RAM.
> > 
> > If only I could tell the system to OOM kill at 512MB or 1GB or 95%
> > non-evictable RAM, it would recover on its own.  As-is, I need to wait
> > or trigger the OOM killer by sysrq.
> > 
> > Am I just the only person in the world who's ever had that problem?  Or
> > is it a matter of questions fast popping up when you try to do this
> > *and* enable paging to disk?  (In my experience, that's a matter of too
> > much swap space:  if you have 16GB RAM and your computer dies at 15.25GB
> > usage, your swap space should be no larger than 750MB plus inactive
> > working RAM; obviously, your computer can't handle paging 750MB back and
> > forth.  If you make it 8GB wide and you start swap thrashing at 2GB
> > usage, you have too much swap available).
> > 
> > I guess you could try to detect excessive swap and page cache thrashing,
> > but that's complex; if anyone really wanted to do that, it would be done
> > by now.  A low-barrier OOM is much simpler.

I have been away from the reclaim code for a long time, but when I read
it again I found something strange.

With swap enabled, get_scan_count keeps the amount of file LRU + free
pages above the high watermark to prevent file LRU thrashing, but it
does not do so when there is no swap. Why?
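
The condition in question is small enough to model outside the kernel. The sketch below is a userspace stand-in, not kernel code: struct zone_model and its field names are illustrative replacements for the per-zone counters the kernel reads with zone_page_state() and high_wmark_pages(), and all values are in pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the per-zone counters; the field names
 * are assumptions made for this sketch. */
struct zone_model {
	unsigned long nr_free;		/* NR_FREE_PAGES */
	unsigned long nr_active_file;	/* NR_ACTIVE_FILE */
	unsigned long nr_inactive_file;	/* NR_INACTIVE_FILE */
	unsigned long high_wmark;	/* high watermark of the zone */
};

/* True when even reclaiming every file page could not lift the zone
 * above its high watermark, i.e. file reclaim can only thrash. */
bool file_cache_too_small(const struct zone_model *z)
{
	unsigned long zonefile;

	assert(z);
	zonefile = z->nr_active_file + z->nr_inactive_file;
	return zonefile + z->nr_free <= z->high_wmark;
}
```

With swap available, the kernel reacts to this condition by scanning anonymous pages instead of file pages; the question raised here is what should happen when there is no swap to fall back on.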

Anyway, I believe we should fix it. We now have workingset.c, so
there may be smarter ways to do this than before (although I am
concerned that under heavy memory pressure, such as page thrashing,
the shadow shrinker throws away much of the information that would be
useful for detecting it).

Could the patch below serve as a band-aid until we find an elegant
solution?

From c51787f7d75340b54bab2b5e3c587f4a483da51a Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 28 Jan 2015 14:01:57 +0900
Subject: [PATCH] mm: prevent page thrashing

No-Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 671e47edb584..b258df552e3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2143,6 +2143,25 @@ out:
 							denominator);
 				break;
 			case SCAN_FILE:
+				if (file && global_reclaim(sc)) {
+					unsigned long zonefile;
+					unsigned long zonefree;
+
+					zonefree = zone_page_state(zone,
+								NR_FREE_PAGES);
+					zonefile = zone_page_state(zone,
+							NR_ACTIVE_FILE) +
+							zone_page_state(zone,
+							NR_INACTIVE_FILE);
+
+					/* OOM is better than code thrashing */
+					if (zonefile + zonefree <=
+						high_wmark_pages(zone)) {
+						size = 0;
+						scan = 0;
+					}
+					break;
+				}
 			case SCAN_ANON:
 				/* Scan one type exclusively */
 				if ((scan_balance == SCAN_FILE) != file) {
-- 
1.9.1


-- 
Kind regards,
Minchan Kim

* Re: OOM at low page cache?
  2015-01-28  6:26   ` Minchan Kim
@ 2015-01-28 12:36     ` Rik van Riel
  2015-01-28 14:15     ` John Moser
  2015-01-28 14:27     ` John Moser
  2 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2015-01-28 12:36 UTC (permalink / raw)
  To: Minchan Kim, Vlastimil Babka
  Cc: John Moser, linux-kernel, linux-mm@kvack.org, Johannes Weiner,
	KOSAKI Motohiro, Kamezawa Hiroyuki, Christoph Lameter

On 01/28/2015 01:26 AM, Minchan Kim wrote:
> Hello,
> 
> On Tue, Jan 27, 2015 at 12:03:34PM +0100, Vlastimil Babka wrote:
>> CC linux-mm in case somebody has a good answer but missed this in
>> lkml traffic
>> 
>> On 01/23/2015 11:18 PM, John Moser wrote:
>>> Why is there no tunable to OOM at low page cache?
> 
> AFAIR, there were several attempts, although none was accepted at
> that time. One I can remember is min_filelist_kbytes. FYI,
> http://lwn.net/Articles/412313/

The Android low memory killer does exactly what you want, and
for very much the same reasons.

See drivers/staging/android/lowmemorykiller.c

However, in the mainline kernel I think it does make sense to
apply something like the patch that Minchan cooked up, to OOM
if freeing all the page cache could not bring us back up to the high
watermark, across all the memory zones.

-- 
All rights reversed

* Re: OOM at low page cache?
  2015-01-28  6:26   ` Minchan Kim
  2015-01-28 12:36     ` Rik van Riel
@ 2015-01-28 14:15     ` John Moser
  2015-01-29  1:24       ` Minchan Kim
  2015-01-28 14:27     ` John Moser
  2 siblings, 1 reply; 7+ messages in thread
From: John Moser @ 2015-01-28 14:15 UTC (permalink / raw)
  To: Minchan Kim, Vlastimil Babka
  Cc: linux-kernel, linux-mm@kvack.org, Rik van Riel, Johannes Weiner,
	KOSAKI Motohiro, Kamezawa Hiroyuki, Christoph Lameter

On 01/28/2015 01:26 AM, Minchan Kim wrote:
> Hello,
>
> On Tue, Jan 27, 2015 at 12:03:34PM +0100, Vlastimil Babka wrote:
>> CC linux-mm in case somebody has a good answer but missed this in lkml traffic
>>
>> On 01/23/2015 11:18 PM, John Moser wrote:
>>> Why is there no tunable to OOM at low page cache?
> AFAIR, there were several attempts, although none was accepted
> at that time. One I can remember is min_filelist_kbytes.
> FYI, http://lwn.net/Articles/412313/
>

That looks more straight-forward than http://lwn.net/Articles/422291/


> I have been away from the reclaim code for a long time, but when I read
> it again I found something strange.
>
> With swap enabled, get_scan_count keeps the amount of file LRU + free
> pages above the high watermark to prevent file LRU thrashing, but it
> does not do so when there is no swap. Why?
>

That's ... strange.  That means having a token 1MB swap file changes the
system's practical memory reclaim behavior dramatically?

* Re: OOM at low page cache?
  2015-01-28 14:15     ` John Moser
@ 2015-01-29  1:24       ` Minchan Kim
  0 siblings, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2015-01-29  1:24 UTC (permalink / raw)
  To: John Moser
  Cc: Vlastimil Babka, linux-kernel, linux-mm@kvack.org, Rik van Riel,
	Johannes Weiner, KOSAKI Motohiro, Kamezawa Hiroyuki,
	Christoph Lameter

Hello,

On Wed, Jan 28, 2015 at 09:15:50AM -0500, John Moser wrote:
> On 01/28/2015 01:26 AM, Minchan Kim wrote:
> > Hello,
> >
> > On Tue, Jan 27, 2015 at 12:03:34PM +0100, Vlastimil Babka wrote:
> >> CC linux-mm in case somebody has a good answer but missed this in lkml traffic
> >>
> >> On 01/23/2015 11:18 PM, John Moser wrote:
> >>> Why is there no tunable to OOM at low page cache?
> > AFAIR, there were several attempts, although none was accepted
> > at that time. One I can remember is min_filelist_kbytes.
> > FYI, http://lwn.net/Articles/412313/
> >
> 
> That looks more straight-forward than http://lwn.net/Articles/422291/
> 
> 
> > I have been away from the reclaim code for a long time, but when I read
> > it again I found something strange.
> >
> > With swap enabled, get_scan_count keeps the amount of file LRU + free
> > pages above the high watermark to prevent file LRU thrashing, but it
> > does not do so when there is no swap. Why?
> >
> 
> That's ... strange.  That means having a token 1MB swap file changes the
> system's practical memory reclaim behavior dramatically?

Basically, yes, but 1MB is too small. Once all of the swap is consumed,
the behavior will be the same, so I think we need more explicit logic to
prevent cache thrashing. Could you test the patch below?

Thanks.

From d7659ff20f065b89633037652042968ba9c9f5c2 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 28 Jan 2015 14:01:57 +0900
Subject: [PATCH] mm: prevent page thrashing for non-swap

John reported

"I have no swap configured.  I have 16GB RAM.  If Chrome or Gimp or some
other stupid program goes off the deep end and eats up my RAM, I hit
some 15.5GB or 15.75GB usage and stay there for about 40 minutes.  Every
time the program tries to do something to eat more RAM, it cranks disk
hard; the disk starts thrashing, the mouse pointer stops moving, and
nothing goes on.  It's like swapping like crazy, except you're reading
library files instead of paged anonymous RAM."

With swap enabled, get_scan_count has logic to prevent cache thrashing,
but it has none for the no-swap case. This patch adds the check for the
no-swap case so that we do not drop all of the page cache there either,
which prevents cache thrashing.

Reported-by: John Moser <john.r.moser@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 671e47edb584..2a2236fceaee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1957,6 +1957,22 @@ enum scan_balance {
 	SCAN_FILE,
 };
 
+bool enough_file_pages(struct zone *zone)
+{
+	bool ret = true;
+	unsigned long zonefile;
+	unsigned long zonefree;
+
+	zonefree = zone_page_state(zone, NR_FREE_PAGES);
+	zonefile = zone_page_state(zone, NR_ACTIVE_FILE) +
+		   zone_page_state(zone, NR_INACTIVE_FILE);
+
+	if (unlikely(zonefile + zonefree <= high_wmark_pages(zone)))
+		ret = false;
+
+	return ret;
+}
+
 /*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
@@ -2039,18 +2055,9 @@ static void get_scan_count(struct lruvec *lruvec, int swappiness,
 	 * thrashing file LRU becomes infinitely more attractive than
 	 * anon pages.  Try to detect this based on file LRU size.
 	 */
-	if (global_reclaim(sc)) {
-		unsigned long zonefile;
-		unsigned long zonefree;
-
-		zonefree = zone_page_state(zone, NR_FREE_PAGES);
-		zonefile = zone_page_state(zone, NR_ACTIVE_FILE) +
-			   zone_page_state(zone, NR_INACTIVE_FILE);
-
-		if (unlikely(zonefile + zonefree <= high_wmark_pages(zone))) {
-			scan_balance = SCAN_ANON;
-			goto out;
-		}
+	if (global_reclaim(sc) && !enough_file_pages(zone)) {
+		scan_balance = SCAN_ANON;
+		goto out;
 	}
 
 	/*
@@ -2143,6 +2150,17 @@ out:
 							denominator);
 				break;
 			case SCAN_FILE:
+				/*
+				 * If there isn't enough page cache to prevent
+				 * cache thrashing, OOM is better than a system
+				 * that stays unresponsive for a long time.
+				 */
+				if (global_reclaim(sc) && file &&
+						!enough_file_pages(zone)) {
+					size = 0;
+					scan = 0;
+					break;
+				}
 			case SCAN_ANON:
 				/* Scan one type exclusively */
 				if ((scan_balance == SCAN_FILE) != file) {
-- 
1.9.1

-- 
Kind regards,
Minchan Kim
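
The effect of the SCAN_FILE hunk above can be modelled in userspace. This is a simplified sketch under stated assumptions, not the kernel function: scan_target() is a hypothetical name, the "size >> priority" starting value is assumed from the surrounding get_scan_count code, and the SCAN_EQUAL/SCAN_FRACT cases are reduced to returning that starting value.

```c
#include <assert.h>
#include <stdbool.h>

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Number of pages to scan from one LRU list.
 *   file        - true if this list holds file pages, false for anon
 *   global      - true for global (not memcg-limited) reclaim
 *   enough_file - result of the enough_file_pages() style check
 */
unsigned long scan_target(enum scan_balance balance, bool file,
			  bool global, bool enough_file,
			  unsigned long size, int priority)
{
	unsigned long scan = size >> priority;

	switch (balance) {
	case SCAN_FILE:
		/* Patched behaviour: do not shrink a file cache that
		 * is already too small; scanning it would only thrash. */
		if (global && file && !enough_file)
			return 0;
		/* fall through */
	case SCAN_ANON:
		/* Scan one type exclusively: skip the other list */
		if ((balance == SCAN_FILE) != file)
			return 0;
		return scan;
	default:
		/* Simplification: proportional cases are not modelled */
		return scan;
	}
}
```

In the no-swap case the balance is SCAN_FILE, so once the file cache is too small both the anon and the file lists yield a scan target of zero. Reclaim then makes no progress, which is what is expected to lead to the OOM killer instead of a long stall.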

* Re: OOM at low page cache?
  2015-01-28  6:26   ` Minchan Kim
  2015-01-28 12:36     ` Rik van Riel
  2015-01-28 14:15     ` John Moser
@ 2015-01-28 14:27     ` John Moser
  2 siblings, 0 replies; 7+ messages in thread
From: John Moser @ 2015-01-28 14:27 UTC (permalink / raw)
  To: Minchan Kim, Vlastimil Babka
  Cc: linux-kernel, linux-mm@kvack.org, Rik van Riel, Johannes Weiner,
	KOSAKI Motohiro, Kamezawa Hiroyuki, Christoph Lameter

On 01/28/2015 01:26 AM, Minchan Kim wrote:
>
> Could the patch below serve as a band-aid until we find an elegant
> solution?
>
>

I don't know about elegant; but I'd be impressed if anyone figured out
how to just go Windows 95 with it and build a Task Master interface.  It
would be useful to have a kernel interface that allows a service to
attach, delegate an interface program, etc., and then pull it up under
certain conditions (low memory, heavy scheduling due to lots of
fork()ing, etc.) and assign temporary high priority.  Basically,
nearly-pause the system and allow the user to select and kill/term
processes, or bring a process forward (for like 10 seconds, then kick it
back again) so the user can save their work and exit gracefully.  At
hard OOM, you could either OOM or pause everything (you'd need a
zero-allocation path to kill things in a user-end OOM handler).

Yeah, imaginative fantasies.  Totally doable, but probably too complex
to bother.  There's all kinds of semaphore inversion or some such to
worry about; how do you ensure an X11 program is 100% snappy when the
system is being thrashed by fork() bombs and memory pressure?

Actually, I have no idea what I'm talking about.
