Re: readahead and oom - Wu Fengguang

From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan.kim@gmail.com>,
	Dave Young <hidave.darkstar@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Mel Gorman <mel@linux.vnet.ibm.com>
Subject: Re: readahead and oom
Date: Thu, 28 Apr 2011 12:19:47 +0800
Message-ID: <20110428041947.GA8761@localhost>
In-Reply-To: <20110426124743.e58d9746.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3278 bytes --]

On Wed, Apr 27, 2011 at 03:47:43AM +0800, Andrew Morton wrote:
> On Tue, 26 Apr 2011 17:20:29 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
> > 
> > readahead page allocations are completely optional. They are OK to
> > fail and in particular shall not trigger OOM on themselves.
> 
> I have distinct recollections of trying this many years ago, finding
> that it caused problems then deciding not to do it.  But I can't find
> an email trail and I don't remember the reasons :(

The most likely reason is that page allocations can fail even when
there are plenty of _global_ reclaimable pages.

> If the system is so stressed for memory that the oom-killer might get
> involved then the readahead pages may well be getting reclaimed before
> the application actually gets to use them.  But that's just an aside.

Yes, when direct reclaim is working as expected, readahead thrashing
should happen long before NORETRY page allocation failures and OOM.

With that assumption I think it's OK to do this patch.  For readahead,
sporadic allocation failures are acceptable.  There is one problem,
though; see below.

> Ho hum.  The patch *seems* good (as it did 5-10 years ago ;)) but there
> may be surprising side-effects which could be exposed under heavy
> testing.  Testing which I'm sure hasn't been performed...

NORETRY direct reclaim does tend to fail a lot more under concurrent
reclaim, where the pages one task has just reclaimed can be stolen by
other tasks before it is able to allocate them.

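        __alloc_pages_direct_reclaim()
        {
                /* reclaim some pages on behalf of this task */
                did_some_progress = try_to_free_pages();

                /* window: the freed pages can be stolen by other tasks */

                /* may fail even though reclaim made progress */
                page = get_page_from_freelist();
        }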
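
To make the window in the sketch above concrete, here is a toy user-space
model of it.  It is not kernel code: struct toy_zone, toy_direct_reclaim()
and toy_alloc() are hypothetical names invented for this illustration, and
treating __GFP_NORETRY as "exactly one reclaim attempt" is a simplifying
assumption.

```c
#include <assert.h>

/* Toy model state: number of pages currently on the free list. */
struct toy_zone {
	long free_pages;
};

/*
 * One direct-reclaim attempt, following the three steps of the sketch:
 * reclaim, a window in which other tasks allocate, then our own
 * allocation.  Returns 1 if the caller got a page, 0 if the free list
 * was already empty again.
 */
int toy_direct_reclaim(struct toy_zone *zone, long reclaimed, long stolen)
{
	assert(reclaimed >= 0 && stolen >= 0);

	/* try_to_free_pages(): our reclaim puts pages on the free list */
	zone->free_pages += reclaimed;

	/* window: concurrent tasks take up to `stolen` of those pages */
	zone->free_pages -= (stolen < zone->free_pages) ? stolen
							: zone->free_pages;

	/* get_page_from_freelist(): succeeds only if something is left */
	if (zone->free_pages > 0) {
		zone->free_pages--;
		return 1;
	}
	return 0;
}

/*
 * Allocation with a bounded number of reclaim attempts.  In this toy
 * model max_attempts == 1 stands in for __GFP_NORETRY; steal[i] is how
 * many pages other tasks grab during attempt i.
 */
int toy_alloc(struct toy_zone *zone, long reclaimed, const long *steal,
	      int max_attempts)
{
	for (int i = 0; i < max_attempts; i++)
		if (toy_direct_reclaim(zone, reclaimed, steal[i]))
			return 1;
	return 0;
}
```

With steal[] = { 32, 0 } and 32 pages reclaimed per attempt, a single
attempt fails although reclaim made progress, while allowing a second
attempt succeeds.  The model only shows why a one-shot allocation is more
exposed to concurrent allocators; it says nothing about real failure rates.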

The following tests demonstrate the problem.

Out of 1000GB of reads and the corresponding page allocations:

        test-ra-thrash.sh: read 1000 1G files interleaved in a single task:

        nr_alloc_fail 733

        test-dd-sparse.sh: read 1000 1G files concurrently in 1000 tasks:

        nr_alloc_fail 11799
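
The nr_alloc_fail numbers come from the debugging counter added by the
patch below, which is listed in vmstat_text[] and therefore shows up in
/proc/vmstat.  As a rough sketch of how a test harness could pick the value
out of that file's "name value" lines (vmstat_lookup() is a hypothetical
helper, not part of the patch or the attached scripts):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up a counter by name in /proc/vmstat-style text, one
 * "name value" pair per line.  Returns 0 and stores the value on
 * success, -1 if the name is absent or its value does not parse.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *value)
{
	size_t len = strlen(name);
	const char *line = text;

	assert(len > 0);

	while (line && *line) {
		/* require a space after the name so prefixes do not match */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			if (sscanf(line + len + 1, "%llu", value) == 1)
				return 0;
			return -1;
		}
		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Sampling the counter before and after a test run and subtracting gives the
number of failures for that run.  The counter only exists on a kernel
carrying this patch; on other kernels the lookup simply returns -1.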


Thanks,
Fengguang
---

--- linux-next.orig/include/linux/mmzone.h	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/include/linux/mmzone.h	2011-04-27 21:58:39.000000000 +0800
@@ -106,6 +106,7 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_ALLOC_FAIL,
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
--- linux-next.orig/mm/page_alloc.c	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/mm/page_alloc.c	2011-04-27 21:58:39.000000000 +0800
@@ -2176,6 +2176,8 @@ rebalance:
 	}
 
 nopage:
+	inc_zone_state(preferred_zone, NR_ALLOC_FAIL);
+	/* count_zone_vm_events(PGALLOCFAIL, preferred_zone, 1 << order); */
 	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
 		unsigned int filter = SHOW_MEM_FILTER_NODES;
 
--- linux-next.orig/mm/vmstat.c	2011-04-27 21:58:27.000000000 +0800
+++ linux-next/mm/vmstat.c	2011-04-27 21:58:53.000000000 +0800
@@ -879,6 +879,7 @@ static const char * const vmstat_text[] 
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_written",
+	"nr_alloc_fail",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",

[-- Attachment #2: test-dd-sparse.sh --]
[-- Type: application/x-sh, Size: 135 bytes --]

[-- Attachment #3: test-ra-thrash.sh --]
[-- Type: application/x-sh, Size: 124 bytes --]
