Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures

From: Wu Fengguang <fengguang.wu@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@linux.vnet.ibm.com>,
	Dave Young <hidave.darkstar@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux.com>,
	Dave Chinner <david@fromorbit.com>,
	David Rientjes <rientjes@google.com>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
Date: Tue, 3 May 2011 11:51:12 +0800
Message-ID: <20110503035112.GA10906@localhost>
In-Reply-To: <BANLkTinXnhh5V0eH71=6PxZWpQxvti7QVw@mail.gmail.com>

Hi Minchan,

On Tue, May 03, 2011 at 08:49:20AM +0800, Minchan Kim wrote:
> Hi Wu, Sorry for slow response.
> I guess you know why I am slow. :)

Yeah, never mind :)

> Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
> I read your mail which states it doesn't help although it considers
> order-0 pages and drain.
> Actually, I tried to look into that, but on my poor system (Core 2 Duo, 2G
> RAM), nr_alloc_fail never happens. :(

I'm running a 4-core, 8-thread CPU with 3G of RAM.

Did you run with this patch?

[PATCH] mm: readahead page allocations are OK to fail
https://lkml.org/lkml/2011/4/26/129

It's very good at generating lots of __GFP_NORETRY order-0 page
allocation requests.

> I will try it in other desktop but I am not sure I can reproduce it.
> 
> >
> > root@fat /home/wfg# ./test-dd-sparse.sh
> > start time: 246
> > total time: 531
> > nr_alloc_fail 14097
> > allocstall 1578332
> > LOC:     542698     538947     536986     567118     552114     539605     541201     537623   Local timer interrupts
> > RES:       3368       1908       1474       1476       2809       1602       1500       1509   Rescheduling interrupts
> > CAL:     223844     224198     224268     224436     223952     224056     223700     223743   Function call interrupts
> > TLB:        381         27         22         19         96        404        111         67   TLB shootdowns
> >
> > root@fat /home/wfg# getdelays -dip `pidof dd`
> > print delayacct stats ON
> > printing IO accounting
> > PID     5202
> >
> >
> > CPU             count     real total  virtual total    delay total
> >                 1132     3635447328     3627947550   276722091605
> > IO              count    delay total  delay average
> >                    2      187809974             62ms
> > SWAP            count    delay total  delay average
> >                    0              0              0ms
> > RECLAIM         count    delay total  delay average
> >                 1334    35304580824             26ms
> > dd: read=278528, write=0, cancelled_write=0
> >
> > I guess your patch is mainly fixing the high order allocations while
> > my workload is mainly order 0 readahead page allocations. There are
> > 1000 forks, however the "start time: 246" seems to indicate that the
> > order-1 reclaim latency is not improved.
> 
> Maybe, 8K * 1000 isn't big footprint so I think reclaim doesn't happen.

It's mainly a guess. In an earlier experiment that simply raised
nr_to_reclaim to high_wmark_pages() without any other constraints, the
start time did drop to about 25 seconds.

> > I'll try modifying your patch and see how it works out. The obvious
> > change is to apply it to the order-0 case. Hope this won't create much
> > more isolated pages.
> >
> > Attached is your patch rebased to 2.6.39-rc3, after resolving some
> > merge conflicts and fixing a trivial NULL pointer bug.
> 
> Thanks!
> I would like to see detail with it in my system if I can reproduce it.

OK.

> >> > no cond_resched():
> >>
> >> What's this?
> >
> > I tried a modified patch that also removes the cond_resched() call in
> > __alloc_pages_direct_reclaim(), between try_to_free_pages() and
> > get_page_from_freelist(). It seems not helping noticeably.
> >
> > It looks safe to remove that cond_resched() as we already have such
> > calls in shrink_page_list().
> 
> I tried a similar thing but Andrew had a concern about it.
> https://lkml.org/lkml/2011/3/24/138

Yeah, cond_resched() is at least not the root cause of our problems.

> >> > +                     if (total_scanned > 2 * sc->nr_to_reclaim)
> >> > +                             goto out;
> >>
> >> If there are lots of dirty pages in LRU?
> >> If there are lots of unevictable pages in LRU?
> >> If there are lots of mapped page in LRU but may_unmap = 0 cases?
> >> I means it's rather risky early conclusion.
> >
> > That test means to avoid scanning too much on __GFP_NORETRY direct
> > reclaims. My assumption for __GFP_NORETRY is, it should fail fast when
> > the LRU pages seem hard to reclaim. And the problem in the 1000 dd
> > case is, it's all easy to reclaim LRU pages but __GFP_NORETRY still
> > fails from time to time, with lots of IPIs that may hurt large
> > machines a lot.
> 
> I don't have enough time or an environment to test it.
> So I can't be sure of it, but my concern is latency.
> If you solve the latency problem considering CPU scaling, I won't oppose it. :)

OK, let's head in that direction :)
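
For illustration only, the fail-fast test quoted above can be modelled
in user space roughly as below. This is a toy sketch, not the kernel
code: struct toy_scan_control, toy_direct_reclaim() and the boolean
"reclaimable" array are made-up stand-ins, and the real reclaim path
scans in batches per zone and priority rather than page by page. It
only shows the intended effect: a __GFP_NORETRY-style caller stops once
it has scanned more than twice its reclaim target without meeting it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct scan_control. */
struct toy_scan_control {
	unsigned long nr_to_reclaim;	/* pages the caller wants freed */
	unsigned long nr_reclaimed;	/* pages freed so far */
	bool noretry;			/* models a __GFP_NORETRY request */
};

/*
 * Walk a toy LRU in which reclaimable[i] says whether page i can be
 * freed. Returns the number of pages scanned before stopping.
 */
static unsigned long toy_direct_reclaim(const bool *reclaimable,
					size_t nr_pages,
					struct toy_scan_control *sc)
{
	unsigned long total_scanned = 0;
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		total_scanned++;
		if (reclaimable[i])
			sc->nr_reclaimed++;

		/* Reclaim target met: the allocation can be retried. */
		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;

		/* The quoted test: give up early on a hard-to-reclaim LRU. */
		if (sc->noretry && total_scanned > 2 * sc->nr_to_reclaim)
			break;
	}

	return total_scanned;
}
```

With an easy-to-reclaim LRU the early exit never triggers, which matches
the 1000 dd case: the failures there have to come from somewhere other
than scanning too little.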

Thanks,
Fengguang

