Re: Re: Re: Performance degradation seen after using one list for hot/cold pages.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: Re: Re: Performance degradation seen after using one list for hot/cold pages.
@ 2009-06-22 11:32 NARAYANAN GOPALAKRISHNAN
  2009-06-22 16:52 ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: NARAYANAN GOPALAKRISHNAN @ 2009-06-22 11:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, cl@linux-foundation.org,
	akpm@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com

Hi,

We are running on VFAT.
We are using iozone performance benchmarking tool (http://www.iozone.org/src/current/iozone3_326.tar) for testing.

The parameters are 
/iozone -A -s10M -e -U /tmp -f /tmp/iozone_file

Our block driver requires requests to be merged to get the best performance.
This was not happening due to non-contiguous pages in all kernels >= 2.6.25.

Regards,

Narayanan 
 

------- Original Message -------
Sender : Mel Gorman<mel@csn.ul.ie>
Date   : Jun 22, 2009 16:09 (GMT+05:00)
Title  : Re: Re: Performance degradation seen after using one list for	hot/cold pages.

On Mon, Jun 22, 2009 at 10:42:40AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
> Hi,
>  
> We had also tried this patch and it fixes the issue. The read/write performance is regained.
> The patch looks OK.
> 
> Can this be merged? 
> 

Not just yet. Is there any chance you could provide a simple test program
using AIO read and tell me what filesystem you are based on please? I&#39;d
like to at least look at identifying when the readahead is happening due to
aio_read() and using page_cache_alloc() instead of page_cache_alloc_cold()
in that case. It would avoid adding a branch to the page allocator itself.

> ------- Original Message -------
> Sender : Mel Gorman<mel@csn.ul.ie>
> Date   : Jun 22, 2009 15:06 (GMT+05:00)
> Title  : Re: Performance degradation seen after using one list for hot/cold    pages.
> 
> On Mon, Jun 22, 2009 at 04:41:47PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 22 Jun 2009 11:20:14 +0530
> > Narayanan Gopalakrishnan <narayanan.g@samsung.com> wrote:
> > 
> > > Hi,
> > > 
> > > We are facing a performance degradation of 2 MBps in kernels 2.6.25 and
> > > above.
> > > We were able to zero on the fact that the exact patch that has affected us
> > > is this
> > > (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdi
> > > ff;h=3dfa5721f12c3d5a441448086bee156887daa961), that changes to have one
> > > list for hot/cold pages. 
> > > 
> > > We see the at the block driver the pages we get are not contiguous hence the
> > > number of LLD requests we are making have increased which is the cause of
> > > this problem.
> > > 
> > > The page allocation in our case is called from aio_read and hence it always
> > > calls page_cache_alloc_cold(mapping) from readahead.
> > > 
> > > We have found a hack for this that is, removing the __GFP_COLD macro when
> > > __page_cache_alloc()is called helps us to regain the performance as we see
> > > contiguous pages in block driver.
> > > 
> > > Has anyone faced this problem or can give a possible solution for this?
> > > 
> 
> I&&#35;39;ve seen this problem before. In the 2.6.24 timeframe, performance degradation
> of IO was reported when I broke the property of the buddy allocator that
> returns contiguous pages in some cases. IIRC, some IO devices can automatically
> merge requests if the pages happen to be physically contiguous.
> 
> > > Our target is OMAP2430 custom board with 128MB RAM.
> > > 
> > Added some CCs.
> > 
> > My understanding is this: 
> > 
> > Assume A,B,C,D are pfn of continuous pages. (B=A+1, C=A+2, D=A+3)
> > 
> > 1) When there are 2 lists for hot and cold pages, pcp list is constracted in
> >    following order after rmqueue_bulk().
> > 
> >    pcp_list[cold] (next) <-> A <-> B <-> C <-> D <-(prev) pcp_list[cold]
> > 
> >    The pages are drained from "next" and pages were given in sequence of
> >    A, B, C, D...
> > 
> > 2) Now, pcp list is constracted as following after  rmqueue_bulk()
> > 
> >     pcp_list (next) <-> A <-> B <-> C <-> D <-> (prev) pcp_list
> > 
> >    When __GFP_COLD, the page is drained via "prev" and sequence of given pages
> >    is D,C,B,A...
> > 
> >    Then, removing __GFP_COLD allows you to allocate pages in sequence of
> >    A, B, C, D.
> > 
> > Looking into page_alloc.c::rmqueue_bulk(),
> >  871     /*
> >  872      * Split buddy pages returned by expand() are received here
> >  873      * in physical page order. The page is added to the callers and
> >  874      * list and the list head then moves forward. From the callers
> >  875      * perspective, the linked list is ordered by page number in
> >  876      * some conditions. This is useful for IO devices that can
> >  877      * merge IO requests if the physical pages are ordered
> >  878      * properly.
> >  879      */
> > 
> > Order of pfn is taken into account but doesn&&#35;39;t work well for __GFP_COLD
> > allocation. (works well for not __GFP_COLD allocation.)
> > Using 2 lists again or modify current behavior ?
> > 
> 
> This analysis looks spot-on. The lack of physical contiguity is what is
> critical, not that the pages are hot or cold in cache. I think it would be
> overkill to reintroduce two separate lists to preserve the ordering in
> that case. How about something like the following?
> 
> ==== CUT HERE ====
> [PATCH] page-allocator: Preserve PFN ordering when __GFP_COLD is set
> 
> The page allocator tries to preserve contiguous PFN ordering when returning
> pages such that repeated callers to the allocator have a strong chance of
> getting physically contiguous pages, particularly when external fragmentation
> is low. However, of the bulk of the allocations have __GFP_COLD set as
> they are due to aio_read() for example, then the PFNs are in reverse PFN
> order. This can cause performance degration when used with IO
> controllers that could have merged the requests.
> 
> This patch attempts to preserve the contiguous ordering of PFNs for
> users of __GFP_COLD.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> --- 
>  mm/page_alloc.c |   13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a5f3c27..9cd32c8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -882,7 +882,7 @@ retry_reserve:
>   */
>  static int rmqueue_bulk(struct zone *zone, unsigned int order, 
>              unsigned long count, struct list_head *list,
> -            int migratetype)
> +            int migratetype, int cold)
>  {
>      int i;
>      
> @@ -901,7 +901,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>           * merge IO requests if the physical pages are ordered
>           * properly.
>           */
> -        list_add(&page->lru, list);
> +        if (likely(cold == 0))
> +            list_add(&page->lru, list);
> +        else
> +            list_add_tail(&page->lru, list);
>          set_page_private(page, migratetype);
>          list = &page->lru;
>      }
> @@ -1119,7 +1122,8 @@ again:
>          local_irq_save(flags);
>          if (!pcp->count) {
>              pcp->count = rmqueue_bulk(zone, 0,
> -                    pcp->batch, &pcp->list, migratetype);
> +                    pcp->batch, &pcp->list,
> +                    migratetype, cold);
>              if (unlikely(!pcp->count))
>                  goto failed;
>          }
> @@ -1138,7 +1142,8 @@ again:
>          /* Allocate more to the pcp list if necessary */
>          if (unlikely(&page->lru == &pcp->list)) {
>              pcp->count += rmqueue_bulk(zone, 0,
> -                    pcp->batch, &pcp->list, migratetype);
> +                    pcp->batch, &pcp->list,
> +                    migratetype, cold);
>              page = list_entry(pcp->list.next, struct page, lru);
>          }
>  
> 
> --
> To unsubscribe, send a message with &&#35;39;unsubscribe linux-mm&&#35;39; in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don&&#35;39;t email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
>  
>  
> Narayanan Gopalakrishnan
> Memory Solutions Division,
> Samsung India Software Operations,
> Phone: (91) 80-41819999 Extn: 5148
> Mobile: 91-93410-42022
>  
>  
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with &#39;unsubscribe linux-mm&#39; in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don&#39;t email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

 N‹§²æìr¸›zÇ§u©ž²Æ {\b†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-22 11:32 Re: Re: Performance degradation seen after using one list for hot/cold pages NARAYANAN GOPALAKRISHNAN
@ 2009-06-22 16:52 ` Mel Gorman
  2009-06-23  0:06   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2009-06-22 16:52 UTC (permalink / raw)
  To: NARAYANAN GOPALAKRISHNAN
  Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, cl@linux-foundation.org,
	akpm@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com

On Mon, Jun 22, 2009 at 11:32:03AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
> Hi,
> 
> We are running on VFAT.
> We are using iozone performance benchmarking tool (http://www.iozone.org/src/current/iozone3_326.tar) for testing.
> 
> The parameters are 
> /iozone -A -s10M -e -U /tmp -f /tmp/iozone_file
> 
> Our block driver requires requests to be merged to get the best performance.
> This was not happening due to non-contiguous pages in all kernels >= 2.6.25.
> 

Ok, by the looks of things, all the aio_read() requests are due to readahead
as opposed to explicit AIO  requests from userspace. In this case, nothing
springs to mind that would avoid excessive requests for cold pages.

It looks like the simpliest solution is to go with the patch I posted.
Does anyone see a better alternative that doesn't branch in rmqueue_bulk()
or add back the hot/cold PCP lists?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-22 16:52 ` Mel Gorman
@ 2009-06-23  0:06   ` KAMEZAWA Hiroyuki
  2009-06-23 17:41     ` Christoph Lameter
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-23  0:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: NARAYANAN GOPALAKRISHNAN, linux-mm@kvack.org,
	cl@linux-foundation.org, akpm@linux-foundation.org,
	kosaki.motohiro@jp.fujitsu.com

On Mon, 22 Jun 2009 17:52:36 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> On Mon, Jun 22, 2009 at 11:32:03AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
> > Hi,
> > 
> > We are running on VFAT.
> > We are using iozone performance benchmarking tool (http://www.iozone.org/src/current/iozone3_326.tar) for testing.
> > 
> > The parameters are 
> > /iozone -A -s10M -e -U /tmp -f /tmp/iozone_file
> > 
> > Our block driver requires requests to be merged to get the best performance.
> > This was not happening due to non-contiguous pages in all kernels >= 2.6.25.
> > 
> 
> Ok, by the looks of things, all the aio_read() requests are due to readahead
> as opposed to explicit AIO  requests from userspace. In this case, nothing
> springs to mind that would avoid excessive requests for cold pages.
> 
> It looks like the simpliest solution is to go with the patch I posted.
> Does anyone see a better alternative that doesn't branch in rmqueue_bulk()
> or add back the hot/cold PCP lists?
> 
No objection.  But 2 questions...

> -        list_add(&page->lru, list);
> +        if (likely(cold == 0))
> +            list_add(&page->lru, list);
> +        else
> +            list_add_tail(&page->lru, list);
>          set_page_private(page, migratetype);
>          list = &page->lru;
>      }

1. if (likely(coild == 0))
	"likely" is necessary ?

2. Why moving pointer "list" rather than following ?

	if (cold)
		list_add(&page->lru, list);
	else
		list_add_tail(&page->lru, list);


Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-23  0:06   ` KAMEZAWA Hiroyuki
@ 2009-06-23 17:41     ` Christoph Lameter
  2009-06-25  9:11     ` Minchan Kim
  2009-06-29  9:15     ` Mel Gorman
  2 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2009-06-23 17:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, NARAYANAN GOPALAKRISHNAN, linux-mm@kvack.org,
	akpm@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com

On Tue, 23 Jun 2009, KAMEZAWA Hiroyuki wrote:

> > Ok, by the looks of things, all the aio_read() requests are due to readahead
> > as opposed to explicit AIO  requests from userspace. In this case, nothing
> > springs to mind that would avoid excessive requests for cold pages.
> >
> > It looks like the simpliest solution is to go with the patch I posted.
> > Does anyone see a better alternative that doesn't branch in rmqueue_bulk()
> > or add back the hot/cold PCP lists?
> >
> No objection.  But 2 questions...

Also no objections here. Readahead makes sense.

> 1. if (likely(coild == 0))
> 	"likely" is necessary ?

Would not think so. Code is sufficiently compact so that the
processor "readahead" will have both branches in cache.

> 2. Why moving pointer "list" rather than following ?
>
> 	if (cold)
> 		list_add(&page->lru, list);
> 	else
> 		list_add_tail(&page->lru, list);

Not sure what your point is here. Can you pickup the patch fix it up and
resubmit? Mel is out for now it seems.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-23  0:06   ` KAMEZAWA Hiroyuki
  2009-06-23 17:41     ` Christoph Lameter
@ 2009-06-25  9:11     ` Minchan Kim
  2009-06-29  9:15     ` Mel Gorman
  2 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2009-06-25  9:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, NARAYANAN GOPALAKRISHNAN, linux-mm@kvack.org,
	cl@linux-foundation.org, akpm@linux-foundation.org,
	kosaki.motohiro@jp.fujitsu.com

On Tue, Jun 23, 2009 at 9:06 AM, KAMEZAWA
Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 22 Jun 2009 17:52:36 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
>> On Mon, Jun 22, 2009 at 11:32:03AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
>> > Hi,
>> >
>> > We are running on VFAT.
>> > We are using iozone performance benchmarking tool (http://www.iozone.org/src/current/iozone3_326.tar) for testing.
>> >
>> > The parameters are
>> > /iozone -A -s10M -e -U /tmp -f /tmp/iozone_file
>> >
>> > Our block driver requires requests to be merged to get the best performance.
>> > This was not happening due to non-contiguous pages in all kernels >= 2.6.25.
>> >
>>
>> Ok, by the looks of things, all the aio_read() requests are due to readahead
>> as opposed to explicit AIO  requests from userspace. In this case, nothing
>> springs to mind that would avoid excessive requests for cold pages.
>>
>> It looks like the simpliest solution is to go with the patch I posted.
>> Does anyone see a better alternative that doesn't branch in rmqueue_bulk()
>> or add back the hot/cold PCP lists?
>>
> No objection.  But 2 questions...
>
>> -        list_add(&page->lru, list);
>> +        if (likely(cold == 0))
>> +            list_add(&page->lru, list);
>> +        else
>> +            list_add_tail(&page->lru, list);
>>          set_page_private(page, migratetype);
>>          list = &page->lru;
>>      }
>
> 1. if (likely(coild == 0))
>        "likely" is necessary ?
>
> 2. Why moving pointer "list" rather than following ?
>
>        if (cold)
>                list_add(&page->lru, list);
>        else
>                list_add_tail(&page->lru, list);


I agree.
We can remove unnecessary list head moving forward.


-- 
Kinds regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-23  0:06   ` KAMEZAWA Hiroyuki
  2009-06-23 17:41     ` Christoph Lameter
  2009-06-25  9:11     ` Minchan Kim
@ 2009-06-29  9:15     ` Mel Gorman
  2009-07-08  2:37       ` Performance degradation seen after using one list for hot/coldpages Narayanan Gopalakrishnan
  2 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2009-06-29  9:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: NARAYANAN GOPALAKRISHNAN, linux-mm@kvack.org,
	cl@linux-foundation.org, akpm@linux-foundation.org,
	kosaki.motohiro@jp.fujitsu.com

On Tue, Jun 23, 2009 at 09:06:30AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Jun 2009 17:52:36 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Mon, Jun 22, 2009 at 11:32:03AM +0000, NARAYANAN GOPALAKRISHNAN wrote:
> > > Hi,
> > > 
> > > We are running on VFAT.
> > > We are using iozone performance benchmarking tool (http://www.iozone.org/src/current/iozone3_326.tar) for testing.
> > > 
> > > The parameters are 
> > > /iozone -A -s10M -e -U /tmp -f /tmp/iozone_file
> > > 
> > > Our block driver requires requests to be merged to get the best performance.
> > > This was not happening due to non-contiguous pages in all kernels >= 2.6.25.
> > > 
> > 
> > Ok, by the looks of things, all the aio_read() requests are due to readahead
> > as opposed to explicit AIO  requests from userspace. In this case, nothing
> > springs to mind that would avoid excessive requests for cold pages.
> > 
> > It looks like the simpliest solution is to go with the patch I posted.
> > Does anyone see a better alternative that doesn't branch in rmqueue_bulk()
> > or add back the hot/cold PCP lists?
> > 
> No objection.  But 2 questions...
> 
> > -        list_add(&page->lru, list);
> > +        if (likely(cold == 0))
> > +            list_add(&page->lru, list);
> > +        else
> > +            list_add_tail(&page->lru, list);
> >          set_page_private(page, migratetype);
> >          list = &page->lru;
> >      }
> 
> 1. if (likely(coild == 0))
> 	"likely" is necessary ?
> 

Is likely/unlikely ever really necessary? The branch is small so maybe it
doesn't matter but the expectation is that the !cold path is hotter and more
commonly used. I can drop this is you like.

> 2. Why moving pointer "list" rather than following ?
> 
> 	if (cold)
> 		list_add(&page->lru, list);
> 	else
> 		list_add_tail(&page->lru, list);
> 

So that the list head from the caller is at the beginning of the newly
allocated contiguous pages. Lets say the free list looked something like

head -> 212 -> 200 -> 198

and then we add a few pages that are contiguous using list_add_tail

1 -> 2 -> 3 -> head -> 212 -> 200 -> 198

With this arrangement, we have to consume the existing pages before we get to
the contiguous pages and the struct pages that were more recently accessed
are further down the list so we potentially access a new cache line after
returning so the struct page for PFN 212 is accessed. With the list head
moving forward it, the returned list should look more like

head -> 1 -> 2 -> 3 -> 212 -> 200 -> 198

so we are accessing the contiguous pages first and a recently accessed struct
page. This was the intention at least of the list head moving forward.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Performance degradation seen after using one list for hot/coldpages.
  2009-06-29  9:15     ` Mel Gorman
@ 2009-07-08  2:37       ` Narayanan Gopalakrishnan
  2009-07-08 14:53         ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: Narayanan Gopalakrishnan @ 2009-07-08  2:37 UTC (permalink / raw)
  To: 'Mel Gorman', 'KAMEZAWA Hiroyuki'
  Cc: linux-mm, cl, akpm, kosaki.motohiro

Hi,

We have done some stress testing using fsstress (LTP).
This patch seems to work fine with our OMAP based targets.
Can we have this merged?

Narayanan 

-----Original Message-----
From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf
Of Mel Gorman
Sent: Monday, June 29, 2009 2:46 PM
To: KAMEZAWA Hiroyuki
Cc: NARAYANAN GOPALAKRISHNAN; linux-mm@kvack.org; cl@linux-foundation.org;
akpm@linux-foundation.org; kosaki.motohiro@jp.fujitsu.com
Subject: Re: Performance degradation seen after using one list for
hot/coldpages.

On Tue, Jun 23, 2009 at 09:06:30AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Jun 2009 17:52:36 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Mon, Jun 22, 2009 at 11:32:03AM +0000, NARAYANAN GOPALAKRISHNAN
wrote:
> > > Hi,
> > > 
> > > We are running on VFAT.
> > > We are using iozone performance benchmarking tool
(http://www.iozone.org/src/current/iozone3_326.tar) for testing.
> > > 
> > > The parameters are 
> > > /iozone -A -s10M -e -U /tmp -f /tmp/iozone_file
> > > 
> > > Our block driver requires requests to be merged to get the best
performance.
> > > This was not happening due to non-contiguous pages in all kernels >=
2.6.25.
> > > 
> > 
> > Ok, by the looks of things, all the aio_read() requests are due to
readahead
> > as opposed to explicit AIO  requests from userspace. In this case,
nothing
> > springs to mind that would avoid excessive requests for cold pages.
> > 
> > It looks like the simpliest solution is to go with the patch I posted.
> > Does anyone see a better alternative that doesn't branch in
rmqueue_bulk()
> > or add back the hot/cold PCP lists?
> > 
> No objection.  But 2 questions...
> 
> > -        list_add(&page->lru, list);
> > +        if (likely(cold == 0))
> > +            list_add(&page->lru, list);
> > +        else
> > +            list_add_tail(&page->lru, list);
> >          set_page_private(page, migratetype);
> >          list = &page->lru;
> >      }
> 
> 1. if (likely(coild == 0))
> 	"likely" is necessary ?
> 

Is likely/unlikely ever really necessary? The branch is small so maybe it
doesn't matter but the expectation is that the !cold path is hotter and more
commonly used. I can drop this is you like.

> 2. Why moving pointer "list" rather than following ?
> 
> 	if (cold)
> 		list_add(&page->lru, list);
> 	else
> 		list_add_tail(&page->lru, list);
> 

So that the list head from the caller is at the beginning of the newly
allocated contiguous pages. Lets say the free list looked something like

head -> 212 -> 200 -> 198

and then we add a few pages that are contiguous using list_add_tail

1 -> 2 -> 3 -> head -> 212 -> 200 -> 198

With this arrangement, we have to consume the existing pages before we get
to
the contiguous pages and the struct pages that were more recently accessed
are further down the list so we potentially access a new cache line after
returning so the struct page for PFN 212 is accessed. With the list head
moving forward it, the returned list should look more like

head -> 1 -> 2 -> 3 -> 212 -> 200 -> 198

so we are accessing the contiguous pages first and a recently accessed
struct
page. This was the intention at least of the list head moving forward.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Performance degradation seen after using one list for hot/coldpages.
  2009-07-08  2:37       ` Performance degradation seen after using one list for hot/coldpages Narayanan Gopalakrishnan
@ 2009-07-08 14:53         ` Christoph Lameter
  2009-07-08 15:27           ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2009-07-08 14:53 UTC (permalink / raw)
  To: Narayanan Gopalakrishnan
  Cc: 'Mel Gorman', 'KAMEZAWA Hiroyuki', linux-mm, akpm,
	kosaki.motohiro

On Wed, 8 Jul 2009, Narayanan Gopalakrishnan wrote:

> We have done some stress testing using fsstress (LTP).
> This patch seems to work fine with our OMAP based targets.
> Can we have this merged?

Please post the patch that you tested. I am a bit confused due to
topposting. There were several outstanding issues in the message you
included.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/coldpages.
  2009-07-08 14:53         ` Christoph Lameter
@ 2009-07-08 15:27           ` Mel Gorman
  2009-07-08 21:32             ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2009-07-08 15:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Narayanan Gopalakrishnan, 'KAMEZAWA Hiroyuki', linux-mm,
	akpm, kosaki.motohiro

On Wed, Jul 08, 2009 at 10:53:38AM -0400, Christoph Lameter wrote:
> On Wed, 8 Jul 2009, Narayanan Gopalakrishnan wrote:
> 
> > We have done some stress testing using fsstress (LTP).
> > This patch seems to work fine with our OMAP based targets.
> > Can we have this merged?
> 
> Please post the patch that you tested. I am a bit confused due to
> topposting. There were several outstanding issues in the message you
> included.
> 

I know which patch he is on about, it's entitled "page-allocator: Preserve
PFN ordering when __GFP_COLD is set". There are a number of patches that
I don't believe have made it upstream or into mmotm but I've lost track
of what is in flight and what isn't. When an mmotm against 2.6.31-rc2 is
out, I'll be going through it again to see what made it in and resending
patches as appropriate.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/coldpages.
  2009-07-08 15:27           ` Mel Gorman
@ 2009-07-08 21:32             ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2009-07-08 21:32 UTC (permalink / raw)
  To: Mel Gorman; +Cc: cl, narayanan.g, kamezawa.hiroyu, linux-mm, kosaki.motohiro

> On Wed, 8 Jul 2009 16:27:55 +0100 Mel Gorman <mel@csn.ul.ie> wrote:
> There are a number of patches that
> I don't believe have made it upstream or into mmotm but I've lost track
> of what is in flight and what isn't. When an mmotm against 2.6.31-rc2 is
> out, I'll be going through it again to see what made it in and resending
> patches as appropriate.

I appear to be stuck in the wrong country again and won't be very
functional until next week, sorry.  As usual, resending stuff doesn't hurt,
especially when that stuff was buried in the middle of a long email trail
under a quite different Subject:.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Performance degradation seen after using one list for hot/cold pages.
@ 2009-06-22  5:50 Narayanan Gopalakrishnan
  2009-06-22  7:41 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 14+ messages in thread
From: Narayanan Gopalakrishnan @ 2009-06-22  5:50 UTC (permalink / raw)
  To: linux-mm

Hi,

We are facing a performance degradation of 2 MBps in kernels 2.6.25 and
above.
We were able to zero on the fact that the exact patch that has affected us
is this
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdi
ff;h=3dfa5721f12c3d5a441448086bee156887daa961), that changes to have one
list for hot/cold pages. 

We see the at the block driver the pages we get are not contiguous hence the
number of LLD requests we are making have increased which is the cause of
this problem.

The page allocation in our case is called from aio_read and hence it always
calls page_cache_alloc_cold(mapping) from readahead.

We have found a hack for this that is, removing the __GFP_COLD macro when
__page_cache_alloc()is called helps us to regain the performance as we see
contiguous pages in block driver.

Has anyone faced this problem or can give a possible solution for this?

Our target is OMAP2430 custom board with 128MB RAM.

Regards,

Narayanan Gopalakrishnan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-22  5:50 Performance degradation seen after using one list for hot/cold pages Narayanan Gopalakrishnan
@ 2009-06-22  7:41 ` KAMEZAWA Hiroyuki
  2009-06-22 10:06   ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-22  7:41 UTC (permalink / raw)
  To: Narayanan Gopalakrishnan
  Cc: linux-mm, mel, cl, akpm@linux-foundation.org,
	kosaki.motohiro@jp.fujitsu.com

On Mon, 22 Jun 2009 11:20:14 +0530
Narayanan Gopalakrishnan <narayanan.g@samsung.com> wrote:

> Hi,
> 
> We are facing a performance degradation of 2 MBps in kernels 2.6.25 and
> above.
> We were able to zero on the fact that the exact patch that has affected us
> is this
> (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdi
> ff;h=3dfa5721f12c3d5a441448086bee156887daa961), that changes to have one
> list for hot/cold pages. 
> 
> We see the at the block driver the pages we get are not contiguous hence the
> number of LLD requests we are making have increased which is the cause of
> this problem.
> 
> The page allocation in our case is called from aio_read and hence it always
> calls page_cache_alloc_cold(mapping) from readahead.
> 
> We have found a hack for this that is, removing the __GFP_COLD macro when
> __page_cache_alloc()is called helps us to regain the performance as we see
> contiguous pages in block driver.
> 
> Has anyone faced this problem or can give a possible solution for this?
> 
> Our target is OMAP2430 custom board with 128MB RAM.
> 
Added some CCs.

My understanding is this: 

Assume A,B,C,D are pfn of continuous pages. (B=A+1, C=A+2, D=A+3)

1) When there are 2 lists for hot and cold pages, pcp list is constracted in
   following order after rmqueue_bulk().

   pcp_list[cold] (next) <-> A <-> B <-> C <-> D <-(prev) pcp_list[cold]

   The pages are drained from "next" and pages were given in sequence of
   A, B, C, D...

2) Now, pcp list is constracted as following after  rmqueue_bulk()

	pcp_list (next) <-> A <-> B <-> C <-> D <-> (prev) pcp_list

   When __GFP_COLD, the page is drained via "prev" and sequence of given pages
   is D,C,B,A...

   Then, removing __GFP_COLD allows you to allocate pages in sequence of
   A, B, C, D.

Looking into page_alloc.c::rmqueue_bulk(),
 871     /*
 872      * Split buddy pages returned by expand() are received here
 873      * in physical page order. The page is added to the callers and
 874      * list and the list head then moves forward. From the callers
 875      * perspective, the linked list is ordered by page number in
 876      * some conditions. This is useful for IO devices that can
 877      * merge IO requests if the physical pages are ordered
 878      * properly.
 879      */

Order of pfn is taken into account but doesn't work well for __GFP_COLD
allocation. (works well for not __GFP_COLD allocation.)
Using 2 lists again or modify current behavior ?

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-22  7:41 ` KAMEZAWA Hiroyuki
@ 2009-06-22 10:06   ` Mel Gorman
  2009-07-08 21:29     ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2009-06-22 10:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Narayanan Gopalakrishnan, linux-mm, cl, akpm@linux-foundation.org,
	kosaki.motohiro@jp.fujitsu.com

On Mon, Jun 22, 2009 at 04:41:47PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Jun 2009 11:20:14 +0530
> Narayanan Gopalakrishnan <narayanan.g@samsung.com> wrote:
> 
> > Hi,
> > 
> > We are facing a performance degradation of 2 MBps in kernels 2.6.25 and
> > above.
> > We were able to zero on the fact that the exact patch that has affected us
> > is this
> > (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdi
> > ff;h=3dfa5721f12c3d5a441448086bee156887daa961), that changes to have one
> > list for hot/cold pages. 
> > 
> > We see the at the block driver the pages we get are not contiguous hence the
> > number of LLD requests we are making have increased which is the cause of
> > this problem.
> > 
> > The page allocation in our case is called from aio_read and hence it always
> > calls page_cache_alloc_cold(mapping) from readahead.
> > 
> > We have found a hack for this that is, removing the __GFP_COLD macro when
> > __page_cache_alloc()is called helps us to regain the performance as we see
> > contiguous pages in block driver.
> > 
> > Has anyone faced this problem or can give a possible solution for this?
> > 

I've seen this problem before. In the 2.6.24 timeframe, performance degradation
of IO was reported when I broke the property of the buddy allocator that
returns contiguous pages in some cases. IIRC, some IO devices can automatically
merge requests if the pages happen to be physically contiguous.

> > Our target is OMAP2430 custom board with 128MB RAM.
> > 
> Added some CCs.
> 
> My understanding is this: 
> 
> Assume A,B,C,D are pfn of continuous pages. (B=A+1, C=A+2, D=A+3)
> 
> 1) When there are 2 lists for hot and cold pages, pcp list is constracted in
>    following order after rmqueue_bulk().
> 
>    pcp_list[cold] (next) <-> A <-> B <-> C <-> D <-(prev) pcp_list[cold]
> 
>    The pages are drained from "next" and pages were given in sequence of
>    A, B, C, D...
> 
> 2) Now, pcp list is constracted as following after  rmqueue_bulk()
> 
> 	pcp_list (next) <-> A <-> B <-> C <-> D <-> (prev) pcp_list
> 
>    When __GFP_COLD, the page is drained via "prev" and sequence of given pages
>    is D,C,B,A...
> 
>    Then, removing __GFP_COLD allows you to allocate pages in sequence of
>    A, B, C, D.
> 
> Looking into page_alloc.c::rmqueue_bulk(),
>  871     /*
>  872      * Split buddy pages returned by expand() are received here
>  873      * in physical page order. The page is added to the callers and
>  874      * list and the list head then moves forward. From the callers
>  875      * perspective, the linked list is ordered by page number in
>  876      * some conditions. This is useful for IO devices that can
>  877      * merge IO requests if the physical pages are ordered
>  878      * properly.
>  879      */
> 
> Order of pfn is taken into account but doesn't work well for __GFP_COLD
> allocation. (works well for not __GFP_COLD allocation.)
> Using 2 lists again or modify current behavior ?
> 

This analysis looks spot-on. The lack of physical contiguity is what is
critical, not that the pages are hot or cold in cache. I think it would be
overkill to reintroduce two separate lists to preserve the ordering in
that case. How about something like the following?

==== CUT HERE ====
[PATCH] page-allocator: Preserve PFN ordering when __GFP_COLD is set

The page allocator tries to preserve contiguous PFN ordering when returning
pages such that repeated callers to the allocator have a strong chance of
getting physically contiguous pages, particularly when external fragmentation
is low. However, of the bulk of the allocations have __GFP_COLD set as
they are due to aio_read() for example, then the PFNs are in reverse PFN
order. This can cause performance degration when used with IO
controllers that could have merged the requests.

This patch attempts to preserve the contiguous ordering of PFNs for
users of __GFP_COLD.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 mm/page_alloc.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a5f3c27..9cd32c8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -882,7 +882,7 @@ retry_reserve:
  */
 static int rmqueue_bulk(struct zone *zone, unsigned int order, 
 			unsigned long count, struct list_head *list,
-			int migratetype)
+			int migratetype, int cold)
 {
 	int i;
 	
@@ -901,7 +901,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 * merge IO requests if the physical pages are ordered
 		 * properly.
 		 */
-		list_add(&page->lru, list);
+		if (likely(cold == 0))
+			list_add(&page->lru, list);
+		else
+			list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 		list = &page->lru;
 	}
@@ -1119,7 +1122,8 @@ again:
 		local_irq_save(flags);
 		if (!pcp->count) {
 			pcp->count = rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list, migratetype);
+					pcp->batch, &pcp->list,
+					migratetype, cold);
 			if (unlikely(!pcp->count))
 				goto failed;
 		}
@@ -1138,7 +1142,8 @@ again:
 		/* Allocate more to the pcp list if necessary */
 		if (unlikely(&page->lru == &pcp->list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list, migratetype);
+					pcp->batch, &pcp->list,
+					migratetype, cold);
 			page = list_entry(pcp->list.next, struct page, lru);
 		}
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Performance degradation seen after using one list for hot/cold pages.
  2009-06-22 10:06   ` Mel Gorman
@ 2009-07-08 21:29     ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2009-07-08 21:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kamezawa.hiroyu, narayanan.g, linux-mm, cl, kosaki.motohiro,
	stable, linux-kernel, linux-scsi

(cc stable, linux-kernel and linux-scsi)

> On Mon, 22 Jun 2009 11:06:32 +0100 Mel Gorman <mel@csn.ul.ie> wrote:
> [PATCH] page-allocator: Preserve PFN ordering when __GFP_COLD is set
> 
> The page allocator tries to preserve contiguous PFN ordering when returning
> pages such that repeated callers to the allocator have a strong chance of
> getting physically contiguous pages, particularly when external fragmentation
> is low. However, of the bulk of the allocations have __GFP_COLD set as
> they are due to aio_read() for example, then the PFNs are in reverse PFN
> order. This can cause performance degration when used with IO
> controllers that could have merged the requests.
> 
> This patch attempts to preserve the contiguous ordering of PFNs for
> users of __GFP_COLD.

Thanks.

I'll add the rather important text:

  Fix a post-2.6.24 performance regression caused by
  3dfa5721f12c3d5a441448086bee156887daa961 ("page-allocator: preserve PFN
  ordering when __GFP_COLD is set").

This was a pretty major screwup.

This is why changing core MM is so worrisome - there's so much secret and
subtle history to it, and performance dependencies are unobvious and quite
indirect and the lag time to discover regressions is long.

Narayanan, are you able to quantify the regression more clearly?  All I
have is "2 MBps lower" which isn't very useful.  What is this as a
percentage, and with what sort of disk controller?  Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-07-08 21:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-06-22 11:32 Re: Re: Performance degradation seen after using one list for hot/cold pages NARAYANAN GOPALAKRISHNAN
2009-06-22 16:52 ` Mel Gorman
2009-06-23  0:06   ` KAMEZAWA Hiroyuki
2009-06-23 17:41     ` Christoph Lameter
2009-06-25  9:11     ` Minchan Kim
2009-06-29  9:15     ` Mel Gorman
2009-07-08  2:37       ` Performance degradation seen after using one list for hot/coldpages Narayanan Gopalakrishnan
2009-07-08 14:53         ` Christoph Lameter
2009-07-08 15:27           ` Mel Gorman
2009-07-08 21:32             ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2009-06-22  5:50 Performance degradation seen after using one list for hot/cold pages Narayanan Gopalakrishnan
2009-06-22  7:41 ` KAMEZAWA Hiroyuki
2009-06-22 10:06   ` Mel Gorman
2009-07-08 21:29     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).