From: Wu Fengguang <fengguang.wu@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@linux.vnet.ibm.com>,
Dave Young <hidave.darkstar@gmail.com>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Christoph Lameter <cl@linux.com>,
Dave Chinner <david@fromorbit.com>,
David Rientjes <rientjes@google.com>,
"Li, Shaohua" <shaohua.li@intel.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [RFC][PATCH] mm: cut down __GFP_NORETRY page allocation failures
Date: Tue, 3 May 2011 11:51:12 +0800
Message-ID: <20110503035112.GA10906@localhost>
In-Reply-To: <BANLkTinXnhh5V0eH71=6PxZWpQxvti7QVw@mail.gmail.com>
Hi Minchan,
On Tue, May 03, 2011 at 08:49:20AM +0800, Minchan Kim wrote:
> Hi Wu, Sorry for slow response.
> I guess you know why I am slow. :)
Yeah, never mind :)
> Unfortunately, my patch doesn't consider order-0 pages, as you mentioned below.
> I read your mail, which says it doesn't help even when it considers
> order-0 pages and drain.
> Actually, I tried to look into that, but on my poor system (core2duo, 2G
> RAM), nr_alloc_fail never happens. :(
I'm running a 4-core 8-thread CPU with 3G ram.
Did you run with this patch?
[PATCH] mm: readahead page allocations are OK to fail
https://lkml.org/lkml/2011/4/26/129
It's very good at generating lots of __GFP_NORETRY order-0 page
allocation requests.
> I will try it on another desktop, but I am not sure I can reproduce it.
>
> >
> > root@fat /home/wfg# ./test-dd-sparse.sh
> > start time: 246
> > total time: 531
> > nr_alloc_fail 14097
> > allocstall 1578332
> > LOC:  542698  538947  536986  567118  552114  539605  541201  537623   Local timer interrupts
> > RES:    3368    1908    1474    1476    2809    1602    1500    1509   Rescheduling interrupts
> > CAL:  223844  224198  224268  224436  223952  224056  223700  223743   Function call interrupts
> > TLB:     381      27      22      19      96     404     111      67   TLB shootdowns
> >
> > root@fat /home/wfg# getdelays -dip `pidof dd`
> > print delayacct stats ON
> > printing IO accounting
> > PID 5202
> >
> >
> > CPU               count     real total  virtual total    delay total
> >                    1132     3635447328     3627947550   276722091605
> > IO                count    delay total  delay average
> >                       2      187809974           62ms
> > SWAP              count    delay total  delay average
> >                       0              0            0ms
> > RECLAIM           count    delay total  delay average
> >                    1334    35304580824           26ms
> > dd: read=278528, write=0, cancelled_write=0
> >
> > I guess your patch is mainly fixing the high-order allocations, while
> > my workload is mainly order-0 readahead page allocations. There are
> > 1000 forks; however, the "start time: 246" seems to indicate that the
> > order-1 reclaim latency is not improved.
>
> Maybe 8K * 1000 isn't a big footprint, so I think reclaim doesn't happen.
It's mainly a guess. An earlier experiment that simply increased
nr_to_reclaim to high_wmark_pages(), without any other constraints,
did manage to reduce the start time to about 25 seconds.
> > I'll try modifying your patch and see how it works out. The obvious
> > change is to apply it to the order-0 case. I hope this won't create
> > many more isolated pages.
> >
> > Attached is your patch rebased to 2.6.39-rc3, after resolving some
> > merge conflicts and fixing a trivial NULL pointer bug.
>
> Thanks!
> I would like to look at it in detail on my system if I can reproduce it.
OK.
> >> > no cond_resched():
> >>
> >> What's this?
> >
> > I tried a modified patch that also removes the cond_resched() call in
> > __alloc_pages_direct_reclaim(), between try_to_free_pages() and
> > get_page_from_freelist(). It does not seem to help noticeably.
> >
> > It looks safe to remove that cond_resched() as we already have such
> > calls in shrink_page_list().
>
> I tried a similar thing, but Andrew has a concern about it.
> https://lkml.org/lkml/2011/3/24/138
Yeah, cond_resched() is at least not the root cause of our problems.
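For anyone following along, the ordering discussed above can be modelled in user space. This is only a sketch: the *_stub functions, the trace buffer and direct_reclaim_model() are made up for illustration and merely record call order. They are not the kernel code, and the stub return values are arbitrary.

```c
#include <stdbool.h>
#include <string.h>

/* Records the order in which the stubs run (illustration only). */
char trace[128];

static void record(const char *name)
{
	if (trace[0] != '\0')
		strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Stubs standing in for the kernel functions named in the thread. */
static unsigned long try_to_free_pages_stub(void)
{
	record("try_to_free_pages");
	return 32;	/* arbitrary "some progress was made" value */
}

static void cond_resched_stub(void)
{
	record("cond_resched");
}

static bool get_page_from_freelist_stub(void)
{
	record("get_page_from_freelist");
	return true;	/* pretend the allocation succeeds */
}

/*
 * Hypothetical model of the direct reclaim path: reclaim, optionally
 * yield the CPU, then retry the allocation from the free lists.
 */
bool direct_reclaim_model(bool keep_resched)
{
	unsigned long progress;

	trace[0] = '\0';
	progress = try_to_free_pages_stub();
	if (keep_resched)
		cond_resched_stub();
	if (progress == 0)
		return false;
	return get_page_from_freelist_stub();
}
```

Calling direct_reclaim_model(false) corresponds to the modified patch: the allocation retry follows reclaim directly, with no scheduling point in between.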
> >> > + if (total_scanned > 2 * sc->nr_to_reclaim)
> >> > + goto out;
> >>
> >> If there are lots of dirty pages in LRU?
> >> If there are lots of unevictable pages in LRU?
> >> If there are lots of mapped pages in LRU but may_unmap = 0?
> >> I mean it's a rather risky early conclusion.
> >
> > That test is meant to avoid scanning too much on __GFP_NORETRY direct
> > reclaims. My assumption for __GFP_NORETRY is that it should fail fast
> > when the LRU pages seem hard to reclaim. The problem in the 1000 dd
> > case is that the LRU pages are all easy to reclaim, yet __GFP_NORETRY
> > still fails from time to time, with lots of IPIs that may hurt large
> > machines a lot.
>
> I don't have enough time or an environment to test it.
> So I can't be sure of it, but my concern is latency.
> If you solve the latency problem considering CPU scaling, I won't oppose it. :)
OK, let's head in that direction :)
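To make the proposed cut-off concrete, here is a minimal user-space model of the check quoted above. It assumes a reclaim loop that scans and reclaims a fixed number of pages per pass. reclaim_passes(), its parameters and the fixed per-pass numbers are hypothetical; only the `total_scanned > 2 * nr_to_reclaim` test comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: returns the number of passes run before the loop
 * stops, either because enough pages were reclaimed, because the
 * __GFP_NORETRY scan budget was exceeded, or because passes ran out.
 */
int reclaim_passes(unsigned long nr_to_reclaim, unsigned long scan_per_pass,
		   unsigned long reclaim_per_pass, int max_passes, bool noretry)
{
	unsigned long total_scanned = 0;
	unsigned long nr_reclaimed = 0;
	int pass;

	assert(max_passes > 0);

	for (pass = 1; pass <= max_passes; pass++) {
		total_scanned += scan_per_pass;
		nr_reclaimed += reclaim_per_pass;

		/* Normal exit: the reclaim target has been met. */
		if (nr_reclaimed >= nr_to_reclaim)
			return pass;

		/* Early exit from the patch: fail fast for __GFP_NORETRY. */
		if (noretry && total_scanned > 2 * nr_to_reclaim)
			return pass;
	}
	return max_passes;
}
```

With nr_to_reclaim = 32 and 32 pages scanned per pass but none reclaimed, the __GFP_NORETRY case gives up on the third pass (96 > 64), while the other case keeps going through all the passes it is given. This is the hard-to-reclaim situation the check is meant to cut short.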
Thanks,
Fengguang