* Re: Re: Performance degradation seen after using one list for hot/cold pages.
@ 2009-06-22 10:42 NARAYANAN GOPALAKRISHNAN
2009-06-22 11:09 ` Mel Gorman
0 siblings, 1 reply; 3+ messages in thread
From: NARAYANAN GOPALAKRISHNAN @ 2009-06-22 10:42 UTC (permalink / raw)
To: Mel Gorman, KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, cl@linux-foundation.org,
akpm@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com
Hi,
We had also tried this patch and it fixes the issue. The read/write performance is regained.
The patch looks OK.
Can this be merged?
Regards,
Narayanan
------- Original Message -------
Sender : Mel Gorman<mel@csn.ul.ie>
Date : Jun 22, 2009 15:06 (GMT+05:00)
Title : Re: Performance degradation seen after using one list for hot/cold pages.
On Mon, Jun 22, 2009 at 04:41:47PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Jun 2009 11:20:14 +0530
> Narayanan Gopalakrishnan <narayanan.g@samsung.com> wrote:
>
> > Hi,
> >
> > We are facing a performance degradation of 2 MBps in kernels 2.6.25 and
> > above.
> > We were able to zero in on the fact that the exact patch that affected us
> > is this one
> > (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3dfa5721f12c3d5a441448086bee156887daa961),
> > which changes to a single list for hot/cold pages.
> >
> > We see that at the block driver the pages we get are not contiguous; hence
> > the number of LLD requests we make has increased, which is the cause of
> > this problem.
> >
> > The page allocation in our case is called from aio_read and hence it always
> > calls page_cache_alloc_cold(mapping) from readahead.
> >
> > We have found a hack for this: removing the __GFP_COLD flag when
> > __page_cache_alloc() is called helps us regain the performance, as we then
> > see contiguous pages in the block driver.
> >
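For reference, the two page-cache helpers involved look roughly like this in
include/linux/pagemap.h around 2.6.30 (quoted from memory, so treat it as a
sketch); the hack described above amounts to not OR-ing __GFP_COLD into the
cold variant:

static inline struct page *page_cache_alloc(struct address_space *x)
{
	return __page_cache_alloc(mapping_gfp_mask(x));
}

static inline struct page *page_cache_alloc_cold(struct address_space *x)
{
	/* mainline: return __page_cache_alloc(mapping_gfp_mask(x) | __GFP_COLD); */
	return __page_cache_alloc(mapping_gfp_mask(x));	/* the hack: __GFP_COLD dropped */
}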
> > Has anyone faced this problem, or can anyone suggest a possible solution?
> >
I've seen this problem before. In the 2.6.24 timeframe, performance degradation
of IO was reported when I broke the property of the buddy allocator that
returns contiguous pages in some cases. IIRC, some IO devices can automatically
merge requests if the pages happen to be physically contiguous.
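As a rough stand-alone illustration of why the ordering matters (not taken
from any real driver; count_segments() and the sample PFNs are invented for
the demonstration), consider how many low-level requests a batch of pages
needs if only ascending-adjacent PFNs can be merged into one segment:

/* Stand-alone sketch: assumes a driver merges a page into the current
 * segment only when its PFN immediately follows the previous page's PFN. */
#include <stdio.h>

static int count_segments(const unsigned long *pfn, int nr)
{
	int segs = nr ? 1 : 0;
	int i;

	for (i = 1; i < nr; i++)
		if (pfn[i] != pfn[i - 1] + 1)
			segs++;		/* a discontiguity starts a new request */
	return segs;
}

int main(void)
{
	unsigned long in_order[] = { 100, 101, 102, 103 };	/* A, B, C, D */
	unsigned long reversed[] = { 103, 102, 101, 100 };	/* D, C, B, A */

	printf("in order: %d segment(s)\n", count_segments(in_order, 4));
	printf("reversed: %d segment(s)\n", count_segments(reversed, 4));
	return 0;
}

With the batch in ascending order one request can cover all four pages;
reversed, each page needs its own request, which matches the extra LLD
requests reported above.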
> > Our target is OMAP2430 custom board with 128MB RAM.
> >
> Added some CCs.
>
> My understanding is this:
>
> Assume A, B, C, D are the PFNs of contiguous pages (B=A+1, C=A+2, D=A+3).
>
> 1) When there are 2 lists for hot and cold pages, the pcp list is constructed in
> the following order after rmqueue_bulk():
>
> pcp_list[cold] (next) <-> A <-> B <-> C <-> D <-(prev) pcp_list[cold]
>
> The pages are drained from "next" and are given in the sequence
> A, B, C, D...
>
> 2) Now, the pcp list is constructed as follows after rmqueue_bulk():
>
> pcp_list (next) <-> A <-> B <-> C <-> D <-> (prev) pcp_list
>
> With __GFP_COLD, pages are drained via "prev" and the sequence of given pages
> is D, C, B, A...
>
> Then, removing __GFP_COLD allows you to allocate pages in the sequence
> A, B, C, D.
>
> Looking into page_alloc.c::rmqueue_bulk(),
> 871 /*
> 872 * Split buddy pages returned by expand() are received here
> 873 * in physical page order. The page is added to the callers and
> 874 * list and the list head then moves forward. From the callers
> 875 * perspective, the linked list is ordered by page number in
> 876 * some conditions. This is useful for IO devices that can
> 877 * merge IO requests if the physical pages are ordered
> 878 * properly.
> 879 */
>
> The order of PFNs is taken into account, but this doesn't work well for
> __GFP_COLD allocations (it works well for non-__GFP_COLD allocations).
> Should we use 2 lists again, or modify the current behaviour?
>
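To make the draining order concrete, here is a minimal user-space sketch
(illustration only; it re-implements just enough of the kernel's list_head
helpers, and fill_pcp()/drain_from_tail() are invented names) showing that a
head-inserted batch drained from the tail comes out in reverse PFN order,
while a tail-inserted batch, as in the patch below, comes out ascending:

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void __list_add(struct list_head *new, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);	/* insert right after head */
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);	/* insert right before head */
}

struct page { unsigned long pfn; struct list_head lru; };

/* Mimic rmqueue_bulk(): the insertion point advances to the page just added */
static void fill_pcp(struct list_head *pcp, struct page pages[], int n, int use_tail)
{
	struct list_head *pos = pcp;
	int i;

	for (i = 0; i < n; i++) {
		if (use_tail)
			list_add_tail(&pages[i].lru, pos);
		else
			list_add(&pages[i].lru, pos);
		pos = &pages[i].lru;
	}
}

/* The __GFP_COLD side of the pcp list is drained from "prev" */
static void drain_from_tail(struct list_head *pcp)
{
	struct list_head *p;

	for (p = pcp->prev; p != pcp; p = p->prev) {
		struct page *page = (struct page *)((char *)p - offsetof(struct page, lru));
		printf(" %lu", page->pfn);
	}
	printf("\n");
}

int main(void)
{
	struct page pages[4] = { {100}, {101}, {102}, {103} };	/* A, B, C, D */
	struct list_head pcp;

	INIT_LIST_HEAD(&pcp);
	fill_pcp(&pcp, pages, 4, 0);		/* current code: list_add() */
	printf("cold drain, head-inserted batch:");
	drain_from_tail(&pcp);			/* prints 103 102 101 100 */

	INIT_LIST_HEAD(&pcp);
	fill_pcp(&pcp, pages, 4, 1);		/* patched cold path: list_add_tail() */
	printf("cold drain, tail-inserted batch:");
	drain_from_tail(&pcp);			/* prints 100 101 102 103 */

	return 0;
}

Compiled and run, the first drain prints 103 102 101 100 and the second
prints 100 101 102 103, which is exactly the difference between the current
cold path and the patched one.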
This analysis looks spot-on. The lack of physical contiguity is what is
critical, not that the pages are hot or cold in cache. I think it would be
overkill to reintroduce two separate lists to preserve the ordering in
that case. How about something like the following?
==== CUT HERE ====
[PATCH] page-allocator: Preserve PFN ordering when __GFP_COLD is set
The page allocator tries to preserve contiguous PFN ordering when returning
pages such that repeated callers to the allocator have a strong chance of
getting physically contiguous pages, particularly when external fragmentation
is low. However, if the bulk of the allocations have __GFP_COLD set, as
they are when they come from aio_read() for example, then the PFNs are in
reverse order. This can cause performance degradation when used with IO
controllers that could have merged the requests.
This patch attempts to preserve the contiguous ordering of PFNs for
users of __GFP_COLD.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/page_alloc.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a5f3c27..9cd32c8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -882,7 +882,7 @@ retry_reserve:
  */
 static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
-			int migratetype)
+			int migratetype, int cold)
 {
 	int i;

@@ -901,7 +901,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 * merge IO requests if the physical pages are ordered
 		 * properly.
 		 */
-		list_add(&page->lru, list);
+		if (likely(cold == 0))
+			list_add(&page->lru, list);
+		else
+			list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 		list = &page->lru;
 	}
@@ -1119,7 +1122,8 @@ again:
 	local_irq_save(flags);
 	if (!pcp->count) {
 		pcp->count = rmqueue_bulk(zone, 0,
-				pcp->batch, &pcp->list, migratetype);
+				pcp->batch, &pcp->list,
+				migratetype, cold);
 		if (unlikely(!pcp->count))
 			goto failed;
 	}
@@ -1138,7 +1142,8 @@ again:
 	/* Allocate more to the pcp list if necessary */
 	if (unlikely(&page->lru == &pcp->list)) {
 		pcp->count += rmqueue_bulk(zone, 0,
-				pcp->batch, &pcp->list, migratetype);
+				pcp->batch, &pcp->list,
+				migratetype, cold);
 		page = list_entry(pcp->list.next, struct page, lru);
 	}
--
Narayanan Gopalakrishnan
Memory Solutions Division,
Samsung India Software Operations,
Phone: (91) 80-41819999 Extn: 5148
Mobile: 91-93410-42022
* Re: Re: Performance degradation seen after using one list for hot/cold pages.
2009-06-22 10:42 Re: Performance degradation seen after using one list for hot/cold pages NARAYANAN GOPALAKRISHNAN
@ 2009-06-22 11:09 ` Mel Gorman
0 siblings, 0 replies; 3+ messages in thread
From: Mel Gorman @ 2009-06-22 11:09 UTC (permalink / raw)
To: NARAYANAN GOPALAKRISHNAN
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, cl@linux-foundation.org,
akpm@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com
On Mon, Jun 22, 2009 at 10:42:40AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
> Hi,
>
> We had also tried this patch and it fixes the issue. The read/write performance is regained.
> The patch looks OK.
>
> Can this be merged?
>
Not just yet. Is there any chance you could provide a simple test program
using AIO read and tell me which filesystem you are using, please? I'd
like to at least look at identifying when the readahead is happening due to
aio_read() and using page_cache_alloc() instead of page_cache_alloc_cold()
in that case. It would avoid adding a branch to the page allocator itself.
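For what it's worth, here is a minimal sketch of such a test program, using
libaio's io_setup()/io_submit() interface against a buffered (non-O_DIRECT)
file so that readahead is driven from aio_read(); the file name and chunk
size are placeholders, and error handling is deliberately minimal:

/* Minimal buffered AIO sequential-read loop; build with: gcc aio_read_test.c -laio */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (128 * 1024)

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "testfile";
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	char *buf = malloc(CHUNK);
	long long off = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || !buf || io_setup(8, &ctx) < 0) {
		fprintf(stderr, "setup failed\n");
		return 1;
	}

	/* Sequential reads through the page cache; readahead comes from aio_read() */
	for (;;) {
		io_prep_pread(&cb, fd, buf, CHUNK, off);
		if (io_submit(ctx, 1, cbs) != 1)
			break;
		if (io_getevents(ctx, 1, 1, &ev, NULL) != 1)
			break;
		if ((long)ev.res <= 0)
			break;			/* EOF or error */
		off += (long)ev.res;
	}
	printf("read %lld bytes from %s\n", off, path);

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}

Running something like this against a large file on the affected filesystem,
while watching the request sizes at the block driver, should make the merge
behaviour visible.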
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: Re: Performance degradation seen after using one list for hot/cold pages.
@ 2009-07-09 4:54 NARAYANAN GOPALAKRISHNAN
0 siblings, 0 replies; 3+ messages in thread
From: NARAYANAN GOPALAKRISHNAN @ 2009-07-09 4:54 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman
Cc: kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
cl@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com,
stable@kernel.org, linux-kernel@vger.kernel.org,
linux-scsi@vger.kernel.org
> I'll add the rather important text:
>
> Fix a post-2.6.24 performance regression caused by
> 3dfa5721f12c3d5a441448086bee156887daa961 ("page-allocator: preserve PFN
> ordering when __GFP_COLD is set").
>
> This was a pretty major screwup.
>
> This is why changing core MM is so worrisome - there's so much secret and
> subtle history to it, and performance dependencies are unobvious and quite
> indirect and the lag time to discover regressions is long.
>
> Narayanan, are you able to quantify the regression more clearly? All I
> have is "2 MBps lower" which isn't very useful. What is this as a
> percentage, and with what sort of disk controller? Thanks.
It is around 15%. There is no disk controller, as our setup is based on Samsung
OneNAND used as a memory-mapped device on an OMAP2430-based board.
Narayanan