Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures

From: Wu Fengguang <fengguang.wu@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@linux.vnet.ibm.com>,
	Dave Young <hidave.darkstar@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux.com>,
	Dave Chinner <david@fromorbit.com>,
	David Rientjes <rientjes@google.com>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
Date: Tue, 3 May 2011 11:51:12 +0800
Message-ID: <20110503035112.GA10906@localhost>
In-Reply-To: <BANLkTinXnhh5V0eH71=6PxZWpQxvti7QVw@mail.gmail.com>

Hi Minchan,

On Tue, May 03, 2011 at 08:49:20AM +0800, Minchan Kim wrote:
> Hi Wu, sorry for the slow response.
> I guess you know why I am slow. :)

Yeah, never mind :)

> Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
> I read your mail, which states that it doesn't help even when it considers
> order-0 pages and drain.
> Actually, I tried to look into that, but on my modest system (Core 2 Duo, 2G
> RAM), nr_alloc_fail never happens. :(

I'm running a 4-core, 8-thread CPU with 3G RAM.

Did you run with this patch?

[PATCH] mm: readahead page allocations are OK to fail
https://lkml.org/lkml/2011/4/26/129

It's very good at generating lots of __GFP_NORETRY order-0 page
allocation requests.

> I will try it on another desktop, but I am not sure I can reproduce it.
> 
> >
> > root@fat /home/wfg# ./test-dd-sparse.sh
> > start time: 246
> > total time: 531
> > nr_alloc_fail 14097
> > allocstall 1578332
> > LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
> > RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
> > CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
> > TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
> >
> > root@fat /home/wfg# getdelays -dip `pidof dd`
> > print delayacct stats ON
> > printing IO accounting
> > PID     5202
> >
> >
> > CPU             count     real total  virtual total    delay total
> >                 1132     3635447328     3627947550   276722091605
> > IO              count    delay total  delay average
> >                    2      187809974             62ms
> > SWAP            count    delay total  delay average
> >                    0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                 1334    35304580824             26ms
> > dd: read=278528, write=0, cancelled_write=0
> >
> > I guess your patch is mainly fixing the high order allocations while
> > my workload is mainly order 0 readahead page allocations. There are
> > 1000 forks, however the "start time: 246" seems to indicate that the
> > order-1 reclaim latency is not improved.
> 
> Maybe 8K * 1000 (about 8MB) isn't a big footprint, so I think reclaim doesn't happen.

It's mainly a guess. In an earlier experiment that simply raised
nr_to_reclaim to high_wmark_pages(), without any other constraints,
the start time did drop to about 25 seconds.
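
To make that experiment concrete, here is a minimal userspace sketch of
the idea, not the actual change that was tested. It assumes the stock
direct reclaim target is SWAP_CLUSTER_MAX (32 pages) and that the
experiment raised it to the zone's high watermark. struct zone_model,
pick_nr_to_reclaim() and the raise_to_high_wmark switch are hypothetical
names, and keeping the target from dropping below SWAP_CLUSTER_MAX is my
assumption:

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL	/* stock direct reclaim batch size */

/* Hypothetical stand-in for struct zone; only the high watermark matters. */
struct zone_model {
	unsigned long high_wmark;	/* what high_wmark_pages(zone) would return */
};

/*
 * Pick the number of pages one direct reclaim pass should try to free.
 * With raise_to_high_wmark set, aim for the high watermark instead of
 * the small default batch (assumption: never go below the default).
 */
static unsigned long pick_nr_to_reclaim(const struct zone_model *zone,
					int raise_to_high_wmark)
{
	unsigned long target = SWAP_CLUSTER_MAX;

	if (raise_to_high_wmark && zone->high_wmark > target)
		target = zone->high_wmark;

	assert(target >= SWAP_CLUSTER_MAX);
	return target;
}
```

A larger target means each reclaimer frees more pages per pass, so
concurrent allocators are less likely to steal the few pages it just
freed, at the cost of more work per direct reclaim.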

> > I'll try modifying your patch and see how it works out. The obvious
> > change is to apply it to the order-0 case. Hope this won't create much
> > more isolated pages.
> >
> > Attached is your patch rebased to 2.6.39-rc3, after resolving some
> > merge conflicts and fixing a trivial NULL pointer bug.
> 
> Thanks!
> I would like to look at it in detail on my system if I can reproduce it.

OK.

> >> > no cond_resched():
> >>
> >> What's this?
> >
> > I tried a modified patch that also removes the cond_resched() call in
> > __alloc_pages_direct_reclaim(), between try_to_free_pages() and
> > get_page_from_freelist(). It does not seem to help noticeably.
> >
> > It looks safe to remove that cond_resched() as we already have such
> > calls in shrink_page_list().
> 
> I tried a similar thing but Andrew has a concern about it.
> https://lkml.org/lkml/2011/3/24/138

Yeah, cond_resched() is at least not the root cause of our problems.

> >> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
> >> > +                             goto out;
> >>
> >> What if there are lots of dirty pages in the LRU?
> >> What if there are lots of unevictable pages in the LRU?
> >> What if there are lots of mapped pages in the LRU but may_unmap = 0?
> >> I mean it's a rather risky early conclusion.
> >
> > That test is meant to avoid scanning too much on __GFP_NORETRY direct
> > reclaims. My assumption for __GFP_NORETRY is that it should fail fast
> > when the LRU pages seem hard to reclaim. And the problem in the 1000 dd
> > case is that the LRU pages are all easy to reclaim, yet __GFP_NORETRY
> > still fails from time to time, with lots of IPIs that may hurt large
> > machines a lot.
> 
> I don't have enough time or an environment to test it.
> So I can't be sure of it, but my concern is latency.
> If you solve the latency problem with CPU scaling in mind, I won't oppose it. :)

OK, let's head in that direction :)
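
For reference, the bail-out test quoted above can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions, not the patch itself: the loop mimics the priority loop of
direct reclaim (DEF_PRIORITY is 12 in the kernel), while struct
scan_model, struct scan_result, direct_reclaim_model(), the fixed
reclaimable percentage and the exact position of the check inside the
loop are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* same starting priority as the kernel uses */

/* Hypothetical description of an LRU: its size and how easy it is to reclaim. */
struct scan_model {
	unsigned long lru_pages;	/* pages on the LRU */
	unsigned long reclaimable_pct;	/* percent of scanned pages that get freed */
};

struct scan_result {
	unsigned long scanned;
	unsigned long reclaimed;
	bool bailed_out;	/* true if the fail-fast test ended the scan */
};

/* Scan with increasing pressure; fail fast for __GFP_NORETRY style callers. */
static struct scan_result direct_reclaim_model(const struct scan_model *m,
					       unsigned long nr_to_reclaim,
					       bool noretry)
{
	struct scan_result r = { 0, 0, false };

	assert(m->reclaimable_pct <= 100);

	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		unsigned long nr_scan = m->lru_pages >> priority;

		r.scanned += nr_scan;
		r.reclaimed += nr_scan * m->reclaimable_pct / 100;

		if (r.reclaimed >= nr_to_reclaim)
			break;		/* enough pages freed */

		/* The quoted test: give up once we scanned 2x the target. */
		if (noretry && r.scanned > 2 * nr_to_reclaim) {
			r.bailed_out = true;
			break;
		}
	}
	return r;
}
```

With a hard-to-reclaim LRU (reclaimable_pct = 0) the noretry caller
stops after the first pass, while a normal caller walks every priority
level. That is the latency/failure-rate trade-off being debated: dirty,
unevictable or unmappable pages at the head of the LRU would trigger the
early exit even when reclaimable pages exist further down.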

Thanks,
Fengguang
