Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
       [not found]                   ` <20071213200958.GK10104@kernel.dk>
@ 2007-12-13 22:02                     ` Andrew Morton
  2007-12-13 22:15                       ` James Bottomley
  2007-12-13 22:17                       ` Jens Axboe
  2007-12-13 22:02                     ` VM allocates pages in reverse order again Matthew Wilcox
  1 sibling, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-13 22:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: liml, lkml, matthew, linux-ide, linux-kernel, linux-scsi,
	linux-mm, Mel Gorman

On Thu, 13 Dec 2007 21:09:59 +0100
Jens Axboe <jens.axboe@oracle.com> wrote:

>
> OK, it's a vm issue,

cc linux-mm and probable culprit.

>  I have tens of thousand "backward" pages after a
> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
> reverse. So it looks like that bug got reintroduced.

Bill Irwin fixed this a couple of years back: changed the page allocator so
that it mostly hands out pages in ascending physical-address order.

I guess we broke that, quite possibly in Mel's page allocator rework.

It would help if you could provide us with a simple recipe for
demonstrating this problem, please.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* VM allocates pages in reverse order again
       [not found]                   ` <20071213200958.GK10104@kernel.dk>
  2007-12-13 22:02                     ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Andrew Morton
@ 2007-12-13 22:02                     ` Matthew Wilcox
  1 sibling, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2007-12-13 22:02 UTC (permalink / raw)
  To: Jens Axboe, linux-mm
  Cc: Mark Lord, Mark Lord, IDE/ATA development list, Linux Kernel,
	linux-scsi

On Thu, Dec 13, 2007 at 09:09:59PM +0100, Jens Axboe wrote:
> > >diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > >index e30b1a4..1e34b6f 100644
> > >--- a/block/ll_rw_blk.c
> > >+++ b/block/ll_rw_blk.c
> > >@@ -1349,6 +1351,8 @@ new_segment:
> > > 				sg = sg_next(sg);
> > > 			}
> > > 
> > >+			if (bvprv && (page_address(bvprv->bv_page) + 
> > >bvprv->bv_len == page_address(bvec->bv_page)))
> > >+				printk("missed merge\n");
> > > 			sg_set_page(sg, bvec->bv_page, nbytes, 
> > > 			bvec->bv_offset);
> > > 			nsegs++;
> > > 		}
> > >
> > ..
> > 
> > Yeah, the first part is similar to my own hack.
> > 
> > For testing, try "dd if=/dev/sda of=/dev/null bs=4096k".
> > That *really* should end up using contiguous pages on most systems.
> > 
> > I figured out the git thing, and am now building some in-between kernels to 
> > try.
> 
> OK, it's a vm issue, I have tens of thousand "backward" pages after a
> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
> reverse. So it looks like that bug got reintroduced.

Perhaps we should ask the -mm folks if they happen to have an idea what
caused it ...

Background: we're seeing pages allocated in reverse order after boot.
This causes IO performance problems on machines without IOMMUs as we
can't merge pages when they're allocated in the wrong order.  This is
something that went wrong between 2.6.23 and 2.6.24-rc5.

Bill Irwin had a patch that fixed this; it was merged months ago, but
the effects of it seem to have been undone.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:02                     ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Andrew Morton
@ 2007-12-13 22:15                       ` James Bottomley
  2007-12-13 22:29                         ` Andrew Morton
  2007-12-13 22:17                       ` Jens Axboe
  1 sibling, 1 reply; 33+ messages in thread
From: James Bottomley @ 2007-12-13 22:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jens Axboe, liml, lkml, matthew, linux-ide, linux-kernel,
	linux-scsi, linux-mm, Mel Gorman

On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
> On Thu, 13 Dec 2007 21:09:59 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> >
> > OK, it's a vm issue,
> 
> cc linux-mm and probable culprit.
> 
> >  I have tens of thousand "backward" pages after a
> > boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
> > reverse. So it looks like that bug got reintroduced.
> 
> Bill Irwin fixed this a couple of years back: changed the page allocator so
> that it mostly hands out pages in ascending physical-address order.
> 
> I guess we broke that, quite possibly in Mel's page allocator rework.
> 
> It would help if you could provide us with a simple recipe for
> demonstrating this problem, please.

The simple way seems to be to malloc a large area, touch every page and
then look at the physical pages assigned ... they now mostly seem to be
descending in physical address.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:02                     ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Andrew Morton
  2007-12-13 22:15                       ` James Bottomley
@ 2007-12-13 22:17                       ` Jens Axboe
  1 sibling, 0 replies; 33+ messages in thread
From: Jens Axboe @ 2007-12-13 22:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: liml, lkml, matthew, linux-ide, linux-kernel, linux-scsi,
	linux-mm, Mel Gorman

On Thu, Dec 13 2007, Andrew Morton wrote:
> On Thu, 13 Dec 2007 21:09:59 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> >
> > OK, it's a vm issue,
> 
> cc linux-mm and probable culprit.
> 
> >  I have tens of thousand "backward" pages after a
> > boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
> > reverse. So it looks like that bug got reintroduced.
> 
> Bill Irwin fixed this a couple of years back: changed the page allocator so
> that it mostly hands out pages in ascending physical-address order.
> 
> I guess we broke that, quite possibly in Mel's page allocator rework.
> 
> It would help if you could provide us with a simple recipe for
> demonstrating this problem, please.

Basically anything involving IO :-). A boot here showed a handful of
good merges, and probably in the order of 100,000 descending
allocations. A kernel make is a fine test as well.

Something like the below should work fine - if you see oodles of these
basicaly doing any type of IO, then you are screwed.

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index e30b1a4..8ce3fcc 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1349,6 +1349,10 @@ new_segment:
 				sg = sg_next(sg);
 			}
 
+			if (bvprv) {
+				if (page_address(bvec->bv_page) + PAGE_SIZE == page_address(bvprv->bv_page) && printk_ratelimit())
+					printk("page alloc order backwards\n");
+			}
 			sg_set_page(sg, bvec->bv_page, nbytes, bvec->bv_offset);
 			nsegs++;
 		}

-- 
Jens Axboe

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:15                       ` James Bottomley
@ 2007-12-13 22:29                         ` Andrew Morton
  2007-12-13 22:33                           ` Mark Lord
  2007-12-15  1:09                           ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mel Gorman
  0 siblings, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-13 22:29 UTC (permalink / raw)
  To: James Bottomley
  Cc: jens.axboe, liml, lkml, matthew, linux-ide, linux-kernel,
	linux-scsi, linux-mm, mel

On Thu, 13 Dec 2007 17:15:06 -0500
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> 
> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
> > On Thu, 13 Dec 2007 21:09:59 +0100
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > >
> > > OK, it's a vm issue,
> > 
> > cc linux-mm and probable culprit.
> > 
> > >  I have tens of thousand "backward" pages after a
> > > boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
> > > reverse. So it looks like that bug got reintroduced.
> > 
> > Bill Irwin fixed this a couple of years back: changed the page allocator so
> > that it mostly hands out pages in ascending physical-address order.
> > 
> > I guess we broke that, quite possibly in Mel's page allocator rework.
> > 
> > It would help if you could provide us with a simple recipe for
> > demonstrating this problem, please.
> 
> The simple way seems to be to malloc a large area, touch every page and
> then look at the physical pages assigned ... they now mostly seem to be
> descending in physical address.
> 

OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:29                         ` Andrew Morton
@ 2007-12-13 22:33                           ` Mark Lord
  2007-12-13 23:13                             ` Mark Lord
  2007-12-15  1:09                           ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mel Gorman
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Lord @ 2007-12-13 22:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Andrew Morton wrote:
> On Thu, 13 Dec 2007 17:15:06 -0500
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>
>>>> OK, it's a vm issue,
>>> cc linux-mm and probable culprit.
>>>
>>>>  I have tens of thousand "backward" pages after a
>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
>>>> reverse. So it looks like that bug got reintroduced.
>>> Bill Irwin fixed this a couple of years back: changed the page allocator so
>>> that it mostly hands out pages in ascending physical-address order.
>>>
>>> I guess we broke that, quite possibly in Mel's page allocator rework.
>>>
>>> It would help if you could provide us with a simple recipe for
>>> demonstrating this problem, please.
>> The simple way seems to be to malloc a large area, touch every page and
>> then look at the physical pages assigned ... they now mostly seem to be
>> descending in physical address.
>>
> 
> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
..

I'm actually running the treadmill right now (have been for many hours, actually,
to bisect it to a specific commit.

Thought I was almost done, and then noticed that git-bisect doesn't keep
the Makefile VERSION lines the same, so I was actually running the wrong
kernel after the first few times.. duh.

Wrote a script to fix it now.

-ml

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:33                           ` Mark Lord
@ 2007-12-13 23:13                             ` Mark Lord
  2007-12-14  0:05                               ` Mark Lord
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Lord @ 2007-12-13 23:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Mark Lord wrote:
> Andrew Morton wrote:
>> On Thu, 13 Dec 2007 17:15:06 -0500
>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>
>>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>>
>>>>> OK, it's a vm issue,
>>>> cc linux-mm and probable culprit.
>>>>
>>>>>  I have tens of thousand "backward" pages after a
>>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
>>>>> reverse. So it looks like that bug got reintroduced.
>>>> Bill Irwin fixed this a couple of years back: changed the page 
>>>> allocator so
>>>> that it mostly hands out pages in ascending physical-address order.
>>>>
>>>> I guess we broke that, quite possibly in Mel's page allocator rework.
>>>>
>>>> It would help if you could provide us with a simple recipe for
>>>> demonstrating this problem, please.
>>> The simple way seems to be to malloc a large area, touch every page and
>>> then look at the physical pages assigned ... they now mostly seem to be
>>> descending in physical address.
>>>
>>
>> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
> ..
> 
> I'm actually running the treadmill right now (have been for many hours, 
> actually,
> to bisect it to a specific commit.
> 
> Thought I was almost done, and then noticed that git-bisect doesn't keep
> the Makefile VERSION lines the same, so I was actually running the wrong
> kernel after the first few times.. duh.
> 
> Wrote a script to fix it now.
..

Well, that was a waste of three hours.

Somebody else can try it now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 23:13                             ` Mark Lord
@ 2007-12-14  0:05                               ` Mark Lord
  2007-12-14  0:30                                 ` Mark Lord
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Mark Lord wrote:
> Mark Lord wrote:
>> Andrew Morton wrote:
>>> On Thu, 13 Dec 2007 17:15:06 -0500
>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>
>>>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>>>
>>>>>> OK, it's a vm issue,
>>>>> cc linux-mm and probable culprit.
>>>>>
>>>>>>  I have tens of thousand "backward" pages after a
>>>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
>>>>>> reverse. So it looks like that bug got reintroduced.
>>>>> Bill Irwin fixed this a couple of years back: changed the page 
>>>>> allocator so
>>>>> that it mostly hands out pages in ascending physical-address order.
>>>>>
>>>>> I guess we broke that, quite possibly in Mel's page allocator rework.
>>>>>
>>>>> It would help if you could provide us with a simple recipe for
>>>>> demonstrating this problem, please.
>>>> The simple way seems to be to malloc a large area, touch every page and
>>>> then look at the physical pages assigned ... they now mostly seem to be
>>>> descending in physical address.
>>>>
>>>
>>> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
>> ..
>>
>> I'm actually running the treadmill right now (have been for many 
>> hours, actually,
>> to bisect it to a specific commit.
>>
>> Thought I was almost done, and then noticed that git-bisect doesn't keep
>> the Makefile VERSION lines the same, so I was actually running the wrong
>> kernel after the first few times.. duh.
>>
>> Wrote a script to fix it now.
> ..
> 
> Well, that was a waste of three hours.
..

Ahh.. it seems to be sensitive to one/both of these:

CONFIG_HIGHMEM64G=y with 4GB RAM:  not so bad, frequently does 20KB - 48KB segments.
CONFIG_HIGHMEM4G=y  with 2GB RAM:  very severe, rarely does more than 8KB segments.
CONFIG_HIGHMEM4G=y  with 3GB RAM:  very severe, rarely does more than 8KB segments.

So if you want to reproduce this on a large memory machine, use "mem=2GB" for starters.

Still testing..



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14  0:05                               ` Mark Lord
@ 2007-12-14  0:30                                 ` Mark Lord
  2007-12-14  0:37                                   ` Andrew Morton
  2007-12-14  0:40                                   ` [PATCH] fix page_alloc for larger I/O segments Mark Lord
  0 siblings, 2 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Mark Lord wrote:
> Mark Lord wrote:
>> Mark Lord wrote:
>>> Andrew Morton wrote:
>>>> On Thu, 13 Dec 2007 17:15:06 -0500
>>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>>
>>>>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>>>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>>>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>>>>
>>>>>>> OK, it's a vm issue,
>>>>>> cc linux-mm and probable culprit.
>>>>>>
>>>>>>>  I have tens of thousand "backward" pages after a
>>>>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
>>>>>>> reverse. So it looks like that bug got reintroduced.
>>>>>> Bill Irwin fixed this a couple of years back: changed the page 
>>>>>> allocator so
>>>>>> that it mostly hands out pages in ascending physical-address order.
>>>>>>
>>>>>> I guess we broke that, quite possibly in Mel's page allocator rework.
>>>>>>
>>>>>> It would help if you could provide us with a simple recipe for
>>>>>> demonstrating this problem, please.
>>>>> The simple way seems to be to malloc a large area, touch every page 
>>>>> and
>>>>> then look at the physical pages assigned ... they now mostly seem 
>>>>> to be
>>>>> descending in physical address.
>>>>>
>>>>
>>>> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
>>> ..
>>>
>>> I'm actually running the treadmill right now (have been for many 
>>> hours, actually,
>>> to bisect it to a specific commit.
>>>
>>> Thought I was almost done, and then noticed that git-bisect doesn't keep
>>> the Makefile VERSION lines the same, so I was actually running the wrong
>>> kernel after the first few times.. duh.
>>>
>>> Wrote a script to fix it now.
>> ..
>>
>> Well, that was a waste of three hours.
> ..
> 
> Ahh.. it seems to be sensitive to one/both of these:
> 
> CONFIG_HIGHMEM64G=y with 4GB RAM:  not so bad, frequently does 20KB - 
> 48KB segments.
> CONFIG_HIGHMEM4G=y  with 2GB RAM:  very severe, rarely does more than 
> 8KB segments.
> CONFIG_HIGHMEM4G=y  with 3GB RAM:  very severe, rarely does more than 
> 8KB segments.
> 
> So if you want to reproduce this on a large memory machine, use 
> "mem=2GB" for starters.
..

Here's the commit that causes the regression:

535131e6925b4a95f321148ad7293f496e0e58d7 Choose pages from the per-cpu list based on migration type


From: Mel Gorman <mel@csn.ul.ie>
Date: Tue, 16 Oct 2007 08:25:49 +0000 (-0700)
Subject: Choose pages from the per-cpu list based on migration type
X-Git-Tag: v2.6.24-rc1~1141
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=535131e6925b4a95f321148ad7293f496e0e58d7;hp=b2a0ac8875a0a3b9f0739b60526f8c5977d2200f

Choose pages from the per-cpu list based on migration type

The freelists for each migrate type can slowly become polluted due to the
per-cpu list.  Consider what happens when the following happens

1. A 2^(MAX_ORDER-1) list is reserved for __GFP_MOVABLE pages
2. An order-0 page is allocated from the newly reserved block
3. The page is freed and placed on the per-cpu list
4. alloc_page() is called with GFP_KERNEL as the gfp_mask
5. The per-cpu list is used to satisfy the allocation

This results in a kernel page is in the middle of a migratable region. This
patch prevents this leak occuring by storing the MIGRATE_ type of the page in
page->private. On allocate, a page will only be returned of the desired type,
else more pages will be allocated. This may temporarily allow a per-cpu list
to go over the pcp->high limit but it'll be corrected on the next free. Care
is taken to preserve the hotness of pages recently freed.

The additional code is not measurably slower for the workloads we've tested.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d54ecf4..e3e726b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
-		list_add_tail(&page->lru, list);
+		list_add(&page->lru, list);
+		set_page_private(page, migratetype);
 	}
 	spin_unlock(&zone->lock);
 	return i;
@@ -887,6 +888,7 @@ static void fastcall free_hot_cold_page(struct page *page, int cold)
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 	list_add(&page->lru, &pcp->list);
+	set_page_private(page, get_pageblock_migratetype(page));
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
 		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
@@ -951,9 +953,27 @@ again:
 			if (unlikely(!pcp->count))
 				goto failed;
 		}
-		page = list_entry(pcp->list.next, struct page, lru);
-		list_del(&page->lru);
-		pcp->count--;
+		/* Find a page of the appropriate migrate type */
+		list_for_each_entry(page, &pcp->list, lru) {
+			if (page_private(page) == migratetype) {
+				list_del(&page->lru);
+				pcp->count--;
+				break;
+			}
+		}
+
+		/*
+		 * Check if a page of the appropriate migrate type
+		 * was found. If not, allocate more to the pcp list
+		 */
+		if (&page->lru == &pcp->list) {
+			pcp->count += rmqueue_bulk(zone, 0,
+					pcp->batch, &pcp->list, migratetype);
+			page = list_entry(pcp->list.next, struct page, lru);
+			VM_BUG_ON(page_private(page) != migratetype);
+			list_del(&page->lru);
+			pcp->count--;
+		}
 	} else {
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order, migratetype);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14  0:30                                 ` Mark Lord
@ 2007-12-14  0:37                                   ` Andrew Morton
  2007-12-14  0:42                                     ` Mark Lord
  2007-12-14 11:50                                     ` Mel Gorman
  2007-12-14  0:40                                   ` [PATCH] fix page_alloc for larger I/O segments Mark Lord
  1 sibling, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-14  0:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

On Thu, 13 Dec 2007 19:30:00 -0500
Mark Lord <liml@rtr.ca> wrote:

> Here's the commit that causes the regression:
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> -		list_add_tail(&page->lru, list);
> +		list_add(&page->lru, list);

well that looks fishy.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] fix page_alloc for larger I/O segments
  2007-12-14  0:30                                 ` Mark Lord
  2007-12-14  0:37                                   ` Andrew Morton
@ 2007-12-14  0:40                                   ` Mark Lord
  2007-12-14  1:03                                     ` Andrew Morton
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Mark Lord wrote:
> Mark Lord wrote:
>> Mark Lord wrote:
>>> Mark Lord wrote:
>>>> Andrew Morton wrote:
>>>>> On Thu, 13 Dec 2007 17:15:06 -0500
>>>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>>>
>>>>>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>>>>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>>>>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>>>>>
>>>>>>>> OK, it's a vm issue,
>>>>>>> cc linux-mm and probable culprit.
>>>>>>>
>>>>>>>>  I have tens of thousand "backward" pages after a
>>>>>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not
>>>>>>>> reverse. So it looks like that bug got reintroduced.
>>>>>>> Bill Irwin fixed this a couple of years back: changed the page 
>>>>>>> allocator so
>>>>>>> that it mostly hands out pages in ascending physical-address order.
>>>>>>>
>>>>>>> I guess we broke that, quite possibly in Mel's page allocator 
>>>>>>> rework.
>>>>>>>
>>>>>>> It would help if you could provide us with a simple recipe for
>>>>>>> demonstrating this problem, please.
>>>>>> The simple way seems to be to malloc a large area, touch every 
>>>>>> page and
>>>>>> then look at the physical pages assigned ... they now mostly seem 
>>>>>> to be
>>>>>> descending in physical address.
>>>>>>
>>>>>
>>>>> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
>>>> ..
>>>>
>>>> I'm actually running the treadmill right now (have been for many 
>>>> hours, actually,
>>>> to bisect it to a specific commit.
>>>>
>>>> Thought I was almost done, and then noticed that git-bisect doesn't 
>>>> keep
>>>> the Makefile VERSION lines the same, so I was actually running the 
>>>> wrong
>>>> kernel after the first few times.. duh.
>>>>
>>>> Wrote a script to fix it now.
>>> ..
>>>
>>> Well, that was a waste of three hours.
>> ..
>>
>> Ahh.. it seems to be sensitive to one/both of these:
>>
>> CONFIG_HIGHMEM64G=y with 4GB RAM:  not so bad, frequently does 20KB - 
>> 48KB segments.
>> CONFIG_HIGHMEM4G=y  with 2GB RAM:  very severe, rarely does more than 
>> 8KB segments.
>> CONFIG_HIGHMEM4G=y  with 3GB RAM:  very severe, rarely does more than 
>> 8KB segments.
>>
>> So if you want to reproduce this on a large memory machine, use 
>> "mem=2GB" for starters.
> ..
> 
> Here's the commit that causes the regression:
> 
> 535131e6925b4a95f321148ad7293f496e0e58d7 Choose pages from the per-cpu 
> list based on migration type
> 

And here is a patch that seems to fix it for me here:

* * * *

Fix page allocator to give better change of larger contiguous segments (again).

Signed-off-by: Mark Lord <mlord@pobox.com
---


--- old/mm/page_alloc.c.orig	2007-12-13 19:25:15.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2007-12-13 19:35:50.000000000 -0500
@@ -954,7 +954,7 @@
 				goto failed;
 		}
 		/* Find a page of the appropriate migrate type */
-		list_for_each_entry(page, &pcp->list, lru) {
+		list_for_each_entry_reverse(page, &pcp->list, lru) {
 			if (page_private(page) == migratetype) {
 				list_del(&page->lru);
 				pcp->count--;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14  0:37                                   ` Andrew Morton
@ 2007-12-14  0:42                                     ` Mark Lord
  2007-12-14  0:46                                       ` [PATCH] fix page_alloc for larger I/O segments (improved) Mark Lord
  2007-12-14  0:47                                       ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mark Lord
  2007-12-14 11:50                                     ` Mel Gorman
  1 sibling, 2 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Andrew Morton wrote:
> On Thu, 13 Dec 2007 19:30:00 -0500
> Mark Lord <liml@rtr.ca> wrote:
> 
>> Here's the commit that causes the regression:
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>>  		struct page *page = __rmqueue(zone, order, migratetype);
>>  		if (unlikely(page == NULL))
>>  			break;
>> -		list_add_tail(&page->lru, list);
>> +		list_add(&page->lru, list);
> 
> well that looks fishy.
..

Yeah.  I missed that, and instead just posted a patch
to search the list in reverse order, which seems to work for me.

I'll try just reversing that line above here now.. gimme 5 minutes or so.

Cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14  0:42                                     ` Mark Lord
@ 2007-12-14  0:46                                       ` Mark Lord
  2007-12-14  0:57                                         ` James Bottomley
  2007-12-14 17:42                                         ` Mel Gorman
  2007-12-14  0:47                                       ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mark Lord
  1 sibling, 2 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

"Improved version", more similar to the 2.6.23 code:

Fix page allocator to give better chance of larger contiguous segments (again).

Signed-off-by: Mark Lord <mlord@pobox.com
---

--- old/mm/page_alloc.c	2007-12-13 19:25:15.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.000000000 -0500
@@ -760,7 +760,7 @@
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
-		list_add(&page->lru, list);
+		list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 	}
 	spin_unlock(&zone->lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14  0:42                                     ` Mark Lord
  2007-12-14  0:46                                       ` [PATCH] fix page_alloc for larger I/O segments (improved) Mark Lord
@ 2007-12-14  0:47                                       ` Mark Lord
  1 sibling, 0 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14  0:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Mark Lord wrote:
> Andrew Morton wrote:
>> On Thu, 13 Dec 2007 19:30:00 -0500
>> Mark Lord <liml@rtr.ca> wrote:
>>
>>> Here's the commit that causes the regression:
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, 
>>> unsigned int order,
>>>          struct page *page = __rmqueue(zone, order, migratetype);
>>>          if (unlikely(page == NULL))
>>>              break;
>>> -        list_add_tail(&page->lru, list);
>>> +        list_add(&page->lru, list);
>>
>> well that looks fishy.
> ..
> 
> Yeah.  I missed that, and instead just posted a patch
> to search the list in reverse order, which seems to work for me.
> 
> I'll try just reversing that line above here now.. gimme 5 minutes or so.
..

Yep, that works too.  Alternative "improved" patch now posted.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14  0:46                                       ` [PATCH] fix page_alloc for larger I/O segments (improved) Mark Lord
@ 2007-12-14  0:57                                         ` James Bottomley
  2007-12-14  1:11                                           ` Andrew Morton
  2007-12-14 17:42                                         ` Mel Gorman
  1 sibling, 1 reply; 33+ messages in thread
From: James Bottomley @ 2007-12-14  0:57 UTC (permalink / raw)
  To: Mark Lord
  Cc: Andrew Morton, jens.axboe, lkml, matthew, linux-ide, linux-kernel,
	linux-scsi, linux-mm, mel

On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> "Improved version", more similar to the 2.6.23 code:
> 
> Fix page allocator to give better chance of larger contiguous segments (again).
> 
> Signed-off-by: Mark Lord <mlord@pobox.com
> ---
> 
> --- old/mm/page_alloc.c	2007-12-13 19:25:15.000000000 -0500
> +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.000000000 -0500
> @@ -760,7 +760,7 @@
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> -		list_add(&page->lru, list);
> +		list_add_tail(&page->lru, list);

Could we put a big comment above this explaining to the would be vm
tweakers why this has to be a list_add_tail, so we don't end up back in
this position after another two years?

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments
  2007-12-14  0:40                                   ` [PATCH] fix page_alloc for larger I/O segments Mark Lord
@ 2007-12-14  1:03                                     ` Andrew Morton
  2007-12-14  4:00                                       ` Matthew Wilcox
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2007-12-14  1:03 UTC (permalink / raw)
  To: Mark Lord
  Cc: James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

On Thu, 13 Dec 2007 19:40:09 -0500
Mark Lord <liml@rtr.ca> wrote:

> And here is a patch that seems to fix it for me here:
> 
> * * * *
> 
> Fix page allocator to give better change of larger contiguous segments (again).
> 
> Signed-off-by: Mark Lord <mlord@pobox.com
> ---
> 
> 
> --- old/mm/page_alloc.c.orig	2007-12-13 19:25:15.000000000 -0500
> +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:35:50.000000000 -0500
> @@ -954,7 +954,7 @@
>  				goto failed;
>  		}
>  		/* Find a page of the appropriate migrate type */
> -		list_for_each_entry(page, &pcp->list, lru) {
> +		list_for_each_entry_reverse(page, &pcp->list, lru) {
>  			if (page_private(page) == migratetype) {
>  				list_del(&page->lru);
>  				pcp->count--;

- needs help to make it apply to mainline

- needs a comment, methinks...


--- a/mm/page_alloc.c~fix-page-allocator-to-give-better-chance-of-larger-contiguous-segments-again
+++ a/mm/page_alloc.c
@@ -1060,8 +1060,12 @@ again:
 				goto failed;
 		}
 
-		/* Find a page of the appropriate migrate type */
-		list_for_each_entry(page, &pcp->list, lru)
+		/*
+		 * Find a page of the appropriate migrate type.  Doing a
+		 * reverse-order search here helps us to hand out pages in
+		 * ascending physical-address order.
+		 */
+		list_for_each_entry_reverse(page, &pcp->list, lru)
 			if (page_private(page) == migratetype)
 				break;
 
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14  0:57                                         ` James Bottomley
@ 2007-12-14  1:11                                           ` Andrew Morton
  2007-12-14  2:23                                             ` Mark Lord
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2007-12-14  1:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: liml, jens.axboe, lkml, matthew, linux-ide, linux-kernel,
	linux-scsi, linux-mm, mel

On Thu, 13 Dec 2007 19:57:29 -0500
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> 
> On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> > "Improved version", more similar to the 2.6.23 code:
> > 
> > Fix page allocator to give better chance of larger contiguous segments (again).
> > 
> > Signed-off-by: Mark Lord <mlord@pobox.com
> > ---
> > 
> > --- old/mm/page_alloc.c	2007-12-13 19:25:15.000000000 -0500
> > +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.000000000 -0500
> > @@ -760,7 +760,7 @@
> >  		struct page *page = __rmqueue(zone, order, migratetype);
> >  		if (unlikely(page == NULL))
> >  			break;
> > -		list_add(&page->lru, list);
> > +		list_add_tail(&page->lru, list);
> 
> Could we put a big comment above this explaining to the would be vm
> tweakers why this has to be a list_add_tail, so we don't end up back in
> this position after another two years?
> 

Already done ;)

--- a/mm/page_alloc.c~fix-page_alloc-for-larger-i-o-segments-fix
+++ a/mm/page_alloc.c
@@ -847,6 +847,10 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+		/*
+		 * Doing a list_add_tail() here helps us to hand out pages in
+		 * ascending physical-address order.
+		 */
 		list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 	}
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14  1:11                                           ` Andrew Morton
@ 2007-12-14  2:23                                             ` Mark Lord
  0 siblings, 0 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

Andrew Morton wrote:
> On Thu, 13 Dec 2007 19:57:29 -0500
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
>> On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
>>> "Improved version", more similar to the 2.6.23 code:
>>>
>>> Fix page allocator to give better chance of larger contiguous segments (again).
>>>
>>> Signed-off-by: Mark Lord <mlord@pobox.com
>>> ---
>>>
>>> --- old/mm/page_alloc.c	2007-12-13 19:25:15.000000000 -0500
>>> +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.000000000 -0500
>>> @@ -760,7 +760,7 @@
>>>  		struct page *page = __rmqueue(zone, order, migratetype);
>>>  		if (unlikely(page == NULL))
>>>  			break;
>>> -		list_add(&page->lru, list);
>>> +		list_add_tail(&page->lru, list);
>> Could we put a big comment above this explaining to the would be vm
>> tweakers why this has to be a list_add_tail, so we don't end up back in
>> this position after another two years?
>>
> 
> Already done ;)
..

I thought of the comment as I rushed off for dinner.
Thanks, Andrew!

> --- a/mm/page_alloc.c~fix-page_alloc-for-larger-i-o-segments-fix
> +++ a/mm/page_alloc.c
> @@ -847,6 +847,10 @@ static int rmqueue_bulk(struct zone *zon
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> +		/*
> +		 * Doing a list_add_tail() here helps us to hand out pages in
> +		 * ascending physical-address order.
> +		 */
>  		list_add_tail(&page->lru, list);
>  		set_page_private(page, migratetype);
>  	}
> _

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments
  2007-12-14  1:03                                     ` Andrew Morton
@ 2007-12-14  4:00                                       ` Matthew Wilcox
  0 siblings, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2007-12-14  4:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mark Lord, James.Bottomley, jens.axboe, lkml, linux-ide,
	linux-kernel, linux-scsi, linux-mm, mel

On Thu, Dec 13, 2007 at 05:03:08PM -0800, Andrew Morton wrote:
> +		/*
> +		 * Find a page of the appropriate migrate type.  Doing a
> +		 * reverse-order search here helps us to hand out pages in
> +		 * ascending physical-address order.
> +		 */
> +		list_for_each_entry_reverse(page, &pcp->list, lru)

It's not obvious why ascending physical order is a good thing.  How
about:

+		/*
+		 * Find a page of the appropriate migrate type.  Doing a
+		 * reverse-order search here helps us to hand out pages in
+		 * ascending physical-address order, which improves our
+		 * chances of coalescing scatter-gather pages.
+		 */

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14  0:37                                   ` Andrew Morton
  2007-12-14  0:42                                     ` Mark Lord
@ 2007-12-14 11:50                                     ` Mel Gorman
  2007-12-14 13:57                                       ` Mark Lord
  1 sibling, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2007-12-14 11:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mark Lord, James.Bottomley, jens.axboe, lkml, matthew, linux-ide,
	linux-kernel, linux-scsi, linux-mm

On (13/12/07 16:37), Andrew Morton didst pronounce:
> On Thu, 13 Dec 2007 19:30:00 -0500
> Mark Lord <liml@rtr.ca> wrote:
> 
> > Here's the commit that causes the regression:
> > 
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> >  		struct page *page = __rmqueue(zone, order, migratetype);
> >  		if (unlikely(page == NULL))
> >  			break;
> > -		list_add_tail(&page->lru, list);
> > +		list_add(&page->lru, list);
> 
> well that looks fishy.
> 

The reasoning behind the change was the first page encountered on the list
by the caller would have a matching migratetype. I failed to take into
account the physical ordering of pages returned. I'm setting up to run some
performance benchmarks of the candidate fix merged into the -mm tree to see
if the search shows up or not. I'm testing against 2.6.25-rc5 but it'll
take a few hours to complete.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-14 11:50                                     ` Mel Gorman
@ 2007-12-14 13:57                                       ` Mark Lord
  0 siblings, 0 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14 13:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, James.Bottomley, jens.axboe, lkml, matthew,
	linux-ide, linux-kernel, linux-scsi, linux-mm

Mel Gorman wrote:
> On (13/12/07 16:37), Andrew Morton didst pronounce:
>> On Thu, 13 Dec 2007 19:30:00 -0500
>> Mark Lord <liml@rtr.ca> wrote:
>>
>>> Here's the commit that causes the regression:
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -760,7 +760,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>>>  		struct page *page = __rmqueue(zone, order, migratetype);
>>>  		if (unlikely(page == NULL))
>>>  			break;
>>> -		list_add_tail(&page->lru, list);
>>> +		list_add(&page->lru, list);
>> well that looks fishy.
>>
> 
> The reasoning behind the change was the first page encountered on the list
> by the caller would have a matching migratetype. I failed to take into
> account the physical ordering of pages returned. I'm setting up to run some
> performance benchmarks of the candidate fix merged into the -mm tree to see
> if the search shows up or not. I'm testing against 2.6.25-rc5 but it'll
> take a few hours to complete.
..

Thanks, Mel.  This is all with CONFIG_SLAB=y, by the way.

Note that it did appear to behave better with CONFIG_SLUB=y when I accidently
used that .config on my 4GB machine here.  Physical segments of 4-10 pages
happended much more common than with CONFIG_SLAB=y on my 3GB machine
Slightly "apples and oranges" there, I know, but at least both were x86-32.  :)

So I would expect CONFIG_SLAB to be well off with this patch under most (all?)
conditions, but dunno about CONFIG_SLUB.

Cheers




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14  0:46                                       ` [PATCH] fix page_alloc for larger I/O segments (improved) Mark Lord
  2007-12-14  0:57                                         ` James Bottomley
@ 2007-12-14 17:42                                         ` Mel Gorman
  2007-12-14 18:07                                           ` Mark Lord
  2007-12-14 18:13                                           ` Matthew Wilcox
  1 sibling, 2 replies; 33+ messages in thread
From: Mel Gorman @ 2007-12-14 17:42 UTC (permalink / raw)
  To: Mark Lord
  Cc: Andrew Morton, James.Bottomley, jens.axboe, lkml, matthew,
	linux-ide, linux-kernel, linux-scsi, linux-mm

On (13/12/07 19:46), Mark Lord didst pronounce:
> 
> "Improved version", more similar to the 2.6.23 code:
> 
> Fix page allocator to give better chance of larger contiguous segments 
> (again).
> 
> Signed-off-by: Mark Lord <mlord@pobox.com

Regrettably this interferes with anti-fragmentation because the "next" page
on the list on return from rmqueue_bulk is not guaranteed to be of the right
mobility type. I fixed it as an additional patch but it adds additional cost
that should not be necessary and it's visible in microbenchmark results on
at least one machine.

The following patch should fix the page ordering problem without incurring an
additional cost or interfering with anti-fragmentation. However, I haven't
anything in place yet to verify that the physical page ordering is correct
but it makes sense. Can you verify it fixes the problem please?

It'll still be some time before I have a full set of performance results
but initially at least, this fix seems to avoid any impact.

======
Subject: Fix page allocation for larger I/O segments

In some cases the IO subsystem is able to merge requests if the pages are
adjacent in physical memory. This was achieved in the allocator by having
expand() return pages in physically contiguous order in situations were
a large buddy was split. However, list-based anti-fragmentation changed
the order pages were returned in to avoid searching in buffered_rmqueue()
for a page of the appropriate migrate type.

This patch restores behaviour of rmqueue_bulk() preserving the physical order
of pages returned by the allocator without incurring increased search costs for
anti-fragmentation.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 page_alloc.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc5-clean/mm/page_alloc.c linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c
--- linux-2.6.24-rc5-clean/mm/page_alloc.c	2007-12-14 11:55:13.000000000 +0000
+++ linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c	2007-12-14 15:33:12.000000000 +0000
@@ -847,8 +847,19 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+
+		/*
+		 * Split buddy pages returned by expand() are received here
+		 * in physical page order. The page is added to the callers and
+		 * list and the list head then moves forward. From the callers
+		 * perspective, the linked list is ordered by page number in
+		 * some conditions. This is useful for IO devices that can
+		 * merge IO requests if the physical pages are ordered
+		 * properly.
+		 */
 		list_add(&page->lru, list);
 		set_page_private(page, migratetype);
+		list = &page->lru;
 	}
 	spin_unlock(&zone->lock);
 	return i;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14 17:42                                         ` Mel Gorman
@ 2007-12-14 18:07                                           ` Mark Lord
  2007-12-16 21:56                                             ` Mel Gorman
  2007-12-14 18:13                                           ` Matthew Wilcox
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Lord @ 2007-12-14 18:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, James.Bottomley, jens.axboe, lkml, matthew,
	linux-ide, linux-kernel, linux-scsi, linux-mm

Mel Gorman wrote:
> On (13/12/07 19:46), Mark Lord didst pronounce:
>> "Improved version", more similar to the 2.6.23 code:
>>
>> Fix page allocator to give better chance of larger contiguous segments 
>> (again).
>>
>> Signed-off-by: Mark Lord <mlord@pobox.com
> 
> Regrettably this interferes with anti-fragmentation because the "next" page
> on the list on return from rmqueue_bulk is not guaranteed to be of the right
> mobility type. I fixed it as an additional patch but it adds additional cost
> that should not be necessary and it's visible in microbenchmark results on
> at least one machine.
> 
> The following patch should fix the page ordering problem without incurring an
> additional cost or interfering with anti-fragmentation. However, I haven't
> anything in place yet to verify that the physical page ordering is correct
> but it makes sense. Can you verify it fixes the problem please?
> 
> It'll still be some time before I have a full set of performance results
> but initially at least, this fix seems to avoid any impact.
> 
> ======
> Subject: Fix page allocation for larger I/O segments
> 
> In some cases the IO subsystem is able to merge requests if the pages are
> adjacent in physical memory. This was achieved in the allocator by having
> expand() return pages in physically contiguous order in situations were
> a large buddy was split. However, list-based anti-fragmentation changed
> the order pages were returned in to avoid searching in buffered_rmqueue()
> for a page of the appropriate migrate type.
> 
> This patch restores behaviour of rmqueue_bulk() preserving the physical order
> of pages returned by the allocator without incurring increased search costs for
> anti-fragmentation.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> --- 
>  page_alloc.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc5-clean/mm/page_alloc.c linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c
> --- linux-2.6.24-rc5-clean/mm/page_alloc.c	2007-12-14 11:55:13.000000000 +0000
> +++ linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c	2007-12-14 15:33:12.000000000 +0000
> @@ -847,8 +847,19 @@ static int rmqueue_bulk(struct zone *zon
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> +
> +		/*
> +		 * Split buddy pages returned by expand() are received here
> +		 * in physical page order. The page is added to the callers and
> +		 * list and the list head then moves forward. From the callers
> +		 * perspective, the linked list is ordered by page number in
> +		 * some conditions. This is useful for IO devices that can
> +		 * merge IO requests if the physical pages are ordered
> +		 * properly.
> +		 */
>  		list_add(&page->lru, list);
>  		set_page_private(page, migratetype);
> +		list = &page->lru;
>  	}
>  	spin_unlock(&zone->lock);
>  	return i;
..

That (also) works for me here, regularly generating 64KB I/O segments with SLAB.

Cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14 17:42                                         ` Mel Gorman
  2007-12-14 18:07                                           ` Mark Lord
@ 2007-12-14 18:13                                           ` Matthew Wilcox
  2007-12-14 18:30                                             ` Mark Lord
  2007-12-20 22:37                                             ` Matthew Wilcox
  1 sibling, 2 replies; 33+ messages in thread
From: Matthew Wilcox @ 2007-12-14 18:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Mark Lord, Andrew Morton, James.Bottomley, jens.axboe, lkml,
	linux-ide, linux-kernel, linux-scsi, linux-mm

On Fri, Dec 14, 2007 at 05:42:37PM +0000, Mel Gorman wrote:
> Regrettably this interferes with anti-fragmentation because the "next" page
> on the list on return from rmqueue_bulk is not guaranteed to be of the right
> mobility type. I fixed it as an additional patch but it adds additional cost
> that should not be necessary and it's visible in microbenchmark results on
> at least one machine.

Is this patch to be preferred to the one Andrew Morton posted to do
list_for_each_entry_reverse?

I'll send it to our DB team to see if this improves our numbers at all.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14 18:13                                           ` Matthew Wilcox
@ 2007-12-14 18:30                                             ` Mark Lord
  2007-12-20 22:37                                             ` Matthew Wilcox
  1 sibling, 0 replies; 33+ messages in thread
From: Mark Lord @ 2007-12-14 18:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mel Gorman, Andrew Morton, James.Bottomley, jens.axboe, lkml,
	linux-ide, linux-kernel, linux-scsi, linux-mm

Matthew Wilcox wrote:
> On Fri, Dec 14, 2007 at 05:42:37PM +0000, Mel Gorman wrote:
>> Regrettably this interferes with anti-fragmentation because the "next" page
>> on the list on return from rmqueue_bulk is not guaranteed to be of the right
>> mobility type. I fixed it as an additional patch but it adds additional cost
>> that should not be necessary and it's visible in microbenchmark results on
>> at least one machine.
> 
> Is this patch to be preferred to the one Andrew Morton posted to do
> list_for_each_entry_reverse?
..

This patch replaces my earlier patch that Andrew has:

-               list_add(&page->lru, list);
+               list_add_tail(&page->lru, list);

Which, in turn, replaced the even-earlier list_for_each_entry_reverse patch.

-ml

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-13 22:29                         ` Andrew Morton
  2007-12-13 22:33                           ` Mark Lord
@ 2007-12-15  1:09                           ` Mel Gorman
  2007-12-15  2:02                             ` Andrew Morton
  1 sibling, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2007-12-15  1:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, liml, lkml, matthew, linux-kernel,
	linux-scsi, linux-mm

On (13/12/07 14:29), Andrew Morton didst pronounce:
> > The simple way seems to be to malloc a large area, touch every page and
> > then look at the physical pages assigned ... they now mostly seem to be
> > descending in physical address.
> > 
> 
> OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
> 

I tried using pagemap to verify the patch but it triggered BUG_ON
checks. Perhaps I am using the interface wrong but I would still not
expect it to break in this fashion. I tried 2.6.24-rc4-mm1, 2.6.24-rc5-mm1,
2.6.24-rc5 with just the maps4 patches applied and 2.6.23 with maps4 patches
applied. Each time I get errors like this;

[   90.108315] BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:457
[   90.211227] in_atomic():1, irqs_disabled():0
[   90.262251] no locks held by showcontiguous/2814.
[   90.318475] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
[   90.395344]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
[   90.456948]  [<c0105bb2>] show_trace+0x12/0x20
[   90.510173]  [<c0105eee>] dump_stack+0x6e/0x80
[   90.563409]  [<c01205b3>] __might_sleep+0xc3/0xe0
[   90.619765]  [<c02264fd>] copy_to_user+0x3d/0x60
[   90.675153]  [<c01b3e9c>] add_to_pagemap+0x5c/0x80
[   90.732513]  [<c01b43e8>] pagemap_pte_range+0x68/0xb0
[   90.793010]  [<c0175ed2>] walk_page_range+0x112/0x210
[   90.853482]  [<c01b47c6>] pagemap_read+0x176/0x220
[   90.910863]  [<c0182dc4>] vfs_read+0x94/0x150
[   90.963058]  [<c01832fd>] sys_read+0x3d/0x70
[   91.014219]  [<c0104262>] syscall_call+0x7/0xb
[   91.067433]  =======================
[   91.110137] BUG: scheduling while atomic: showcontiguous/2814/0x00000001
[   91.190169] no locks held by showcontiguous/2814.
[   91.246293] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
[   91.323145]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
[   91.384633]  [<c0105bb2>] show_trace+0x12/0x20
[   91.437878]  [<c0105eee>] dump_stack+0x6e/0x80
[   91.491116]  [<c0123816>] __schedule_bug+0x66/0x70
[   91.548467]  [<c033ba96>] schedule+0x556/0x7b0
[   91.601698]  [<c01042e6>] work_resched+0x5/0x21
[   91.655977]  =======================
[   91.704927] showcontiguous[2814]: segfault at b7eaa900 eip b7eaa900 esp bfa02e8c error 4
[   91.801633] BUG: scheduling while atomic: showcontiguous/2814/0x00000001
[   91.881634] no locks held by showcontiguous/2814.
[   91.937779] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
[   92.014606]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
[   92.076123]  [<c0105bb2>] show_trace+0x12/0x20
[   92.129354]  [<c0105eee>] dump_stack+0x6e/0x80
[   92.182567]  [<c0123816>] __schedule_bug+0x66/0x70
[   92.239959]  [<c033ba96>] schedule+0x556/0x7b0
[   92.293187]  [<c01042e6>] work_resched+0x5/0x21
[   92.347452]  =======================
[   92.392697] note: showcontiguous[2814] exited with preempt_count 1
[   92.468611] BUG: scheduling while atomic: showcontiguous/2814/0x10000001
[   92.548588] no locks held by showcontiguous/2814.
[   92.604732] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
[   92.681665]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
[   92.743180]  [<c0105bb2>] show_trace+0x12/0x20
[   92.796409]  [<c0105eee>] dump_stack+0x6e/0x80
[   92.849621]  [<c0123816>] __schedule_bug+0x66/0x70
[   92.907014]  [<c033ba96>] schedule+0x556/0x7b0
[   92.960349]  [<c0123847>] __cond_resched+0x27/0x40
[   93.017804]  [<c033be3a>] cond_resched+0x2a/0x40
[   93.073122]  [<c016e22c>] unmap_vmas+0x4ec/0x540
[   93.128418]  [<c017132f>] exit_mmap+0x6f/0xf0
[   93.180611]  [<c01254d1>] mmput+0x31/0xb0
[   93.228665]  [<c01295fd>] exit_mm+0x8d/0xf0
[   93.278788]  [<c012ac8f>] do_exit+0x15f/0x7e0
[   93.330965]  [<c012b339>] do_group_exit+0x29/0x70
[   93.387321]  [<c0133e07>] get_signal_to_deliver+0x2b7/0x490
[   93.454013]  [<c010373d>] do_notify_resume+0x7d/0x760
[   93.514476]  [<c0104315>] work_notifysig+0x13/0x1a
[   93.571869]  =======================

Just using cp to read the file is enough to cause problems but I included
a very basic program below that produces the BUG_ON checks. Is this a known
issue or am I using the interface incorrectly?

#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/types.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define MAPSIZE (4*1048576)
#define PM_ENTRY_BYTES sizeof(__u64)

int main(int argc, char **argv)
{
	int pagemap_fd;
	unsigned long *anonmapping;
	__u64 pagemap_entry = 0ULL;

	unsigned long vpfn, ppfn;
	size_t mmap_offset;
	int pagesize = getpagesize();

	/* Open the pagemap interface */
	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
	if (pagemap_fd == -1) {
		perror("fopen");
		exit(EXIT_FAILURE);
	}

	/* Create an anonymous mapping */
	anonmapping = mmap(NULL, MAPSIZE,
			PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE,
			-1, 0);
	if (anonmapping == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	/* Work out the VPN the mapping is at and seek to it in pagemap */
	vpfn = ((unsigned long)anonmapping) / pagesize;
	mmap_offset = lseek(pagemap_fd, vpfn * PM_ENTRY_BYTES, SEEK_SET);
	if (mmap_offset == -1) {
		perror("fseek");
		exit(EXIT_FAILURE);
	}

	/* Read the PFN of each page in the mapping */
	for (mmap_offset = 0; mmap_offset < MAPSIZE; mmap_offset += pagesize) {
		vpfn = ((unsigned long)anonmapping + mmap_offset) / pagesize;

		if (read(pagemap_fd, &pagemap_entry, PM_ENTRY_BYTES) == 0) {
			perror("fread");
			exit(EXIT_FAILURE);
		}

		ppfn = (unsigned long)pagemap_entry;
		printf("vpfn = %8lu ppfn = %8lu\n", vpfn, ppfn);
	}

	close(pagemap_fd);
	munmap(anonmapping, MAPSIZE);
	exit(EXIT_SUCCESS);
}

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-15  1:09                           ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mel Gorman
@ 2007-12-15  2:02                             ` Andrew Morton
  2007-12-15  5:55                               ` Matt Mackall
  2007-12-16 21:55                               ` Mel Gorman
  0 siblings, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-15  2:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: James Bottomley, jens.axboe, liml, lkml, matthew, linux-kernel,
	linux-scsi, linux-mm, Matt Mackall

On Sat, 15 Dec 2007 01:09:41 +0000 Mel Gorman <mel@csn.ul.ie> wrote:

> On (13/12/07 14:29), Andrew Morton didst pronounce:
> > > The simple way seems to be to malloc a large area, touch every page and
> > > then look at the physical pages assigned ... they now mostly seem to be
> > > descending in physical address.
> > > 
> > 
> > OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
> > 
> 
> I tried using pagemap to verify the patch but it triggered BUG_ON
> checks. Perhaps I am using the interface wrong but I would still not
> expect it to break in this fashion. I tried 2.6.24-rc4-mm1, 2.6.24-rc5-mm1,
> 2.6.24-rc5 with just the maps4 patches applied and 2.6.23 with maps4 patches
> applied. Each time I get errors like this;
> 
> [   90.108315] BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:457
> [   90.211227] in_atomic():1, irqs_disabled():0
> [   90.262251] no locks held by showcontiguous/2814.
> [   90.318475] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
> [   90.395344]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
> [   90.456948]  [<c0105bb2>] show_trace+0x12/0x20
> [   90.510173]  [<c0105eee>] dump_stack+0x6e/0x80
> [   90.563409]  [<c01205b3>] __might_sleep+0xc3/0xe0
> [   90.619765]  [<c02264fd>] copy_to_user+0x3d/0x60
> [   90.675153]  [<c01b3e9c>] add_to_pagemap+0x5c/0x80
> [   90.732513]  [<c01b43e8>] pagemap_pte_range+0x68/0xb0
> [   90.793010]  [<c0175ed2>] walk_page_range+0x112/0x210
> [   90.853482]  [<c01b47c6>] pagemap_read+0x176/0x220
> [   90.910863]  [<c0182dc4>] vfs_read+0x94/0x150
> [   90.963058]  [<c01832fd>] sys_read+0x3d/0x70
> [   91.014219]  [<c0104262>] syscall_call+0x7/0xb
> 
> ...
>
> Just using cp to read the file is enough to cause problems but I included
> a very basic program below that produces the BUG_ON checks. Is this a known
> issue or am I using the interface incorrectly?

I'd say you're using it correctly but you've found a hitherto unknown bug. 
On i386 highmem machines with CONFIG_HIGHPTE (at least) pte_offset_map()
takes kmap_atomic(), so pagemap_pte_range() can't do copy_to_user() as it
presently does.

Drat.

Still, that shouldn't really disrupt the testing which you're doing.  You
could disable CONFIG_HIGHPTE to shut it up.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-15  2:02                             ` Andrew Morton
@ 2007-12-15  5:55                               ` Matt Mackall
  2007-12-16 21:55                               ` Mel Gorman
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Mackall @ 2007-12-15  5:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, James Bottomley, jens.axboe, liml, lkml, matthew,
	linux-kernel, linux-scsi, linux-mm

On Fri, Dec 14, 2007 at 06:02:06PM -0800, Andrew Morton wrote:
> On Sat, 15 Dec 2007 01:09:41 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On (13/12/07 14:29), Andrew Morton didst pronounce:
> > > > The simple way seems to be to malloc a large area, touch every page and
> > > > then look at the physical pages assigned ... they now mostly seem to be
> > > > descending in physical address.
> > > > 
> > > 
> > > OIC.  -mm's /proc/pid/pagemap can be used to get the pfn's...
> > > 
> > 
> > I tried using pagemap to verify the patch but it triggered BUG_ON
> > checks. Perhaps I am using the interface wrong but I would still not
> > expect it to break in this fashion. I tried 2.6.24-rc4-mm1, 2.6.24-rc5-mm1,
> > 2.6.24-rc5 with just the maps4 patches applied and 2.6.23 with maps4 patches
> > applied. Each time I get errors like this;
> > 
> > [   90.108315] BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:457
> > [   90.211227] in_atomic():1, irqs_disabled():0
> > [   90.262251] no locks held by showcontiguous/2814.
> > [   90.318475] Pid: 2814, comm: showcontiguous Not tainted 2.6.24-rc5 #1
> > [   90.395344]  [<c010522a>] show_trace_log_lvl+0x1a/0x30
> > [   90.456948]  [<c0105bb2>] show_trace+0x12/0x20
> > [   90.510173]  [<c0105eee>] dump_stack+0x6e/0x80
> > [   90.563409]  [<c01205b3>] __might_sleep+0xc3/0xe0
> > [   90.619765]  [<c02264fd>] copy_to_user+0x3d/0x60
> > [   90.675153]  [<c01b3e9c>] add_to_pagemap+0x5c/0x80
> > [   90.732513]  [<c01b43e8>] pagemap_pte_range+0x68/0xb0
> > [   90.793010]  [<c0175ed2>] walk_page_range+0x112/0x210
> > [   90.853482]  [<c01b47c6>] pagemap_read+0x176/0x220
> > [   90.910863]  [<c0182dc4>] vfs_read+0x94/0x150
> > [   90.963058]  [<c01832fd>] sys_read+0x3d/0x70
> > [   91.014219]  [<c0104262>] syscall_call+0x7/0xb
> > 
> > ...
> >
> > Just using cp to read the file is enough to cause problems but I included
> > a very basic program below that produces the BUG_ON checks. Is this a known
> > issue or am I using the interface incorrectly?
> 
> I'd say you're using it correctly but you've found a hitherto unknown bug. 
> On i386 highmem machines with CONFIG_HIGHPTE (at least) pte_offset_map()
> takes kmap_atomic(), so pagemap_pte_range() can't do copy_to_user() as it
> presently does.

Damn, I coulda sworn I fixed that.

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-15  2:02                             ` Andrew Morton
  2007-12-15  5:55                               ` Matt Mackall
@ 2007-12-16 21:55                               ` Mel Gorman
  2007-12-17 19:24                                 ` Randy Dunlap
  1 sibling, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2007-12-16 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, jens.axboe, liml, lkml, matthew, linux-kernel,
	linux-scsi, linux-mm, Matt Mackall

> > Just using cp to read the file is enough to cause problems but I included
> > a very basic program below that produces the BUG_ON checks. Is this a known
> > issue or am I using the interface incorrectly?
> 
> I'd say you're using it correctly but you've found a hitherto unknown bug. 
> On i386 highmem machines with CONFIG_HIGHPTE (at least) pte_offset_map()
> takes kmap_atomic(), so pagemap_pte_range() can't do copy_to_user() as it
> presently does.
> 
> Drat.
> 
> Still, that shouldn't really disrupt the testing which you're doing.  You
> could disable CONFIG_HIGHPTE to shut it up.
> 

Yes, that did the trick. Using pagemap, it was trivial to show that the
2.6.24-rc5-mm1 kernel was placing pages in reverse physical order like
the following output shows

b:  32763 v:   753091 p:    65559 . 65558 contig: 1
b:  32764 v:   753092 p:    65558 . 65557 contig: 1
b:  32765 v:   753093 p:    65557 . 65556 contig: 1
b:  32766 v:   753094 p:    65556 . 65555 contig: 1
b:  32767 v:   753095 p:    65555 . 65555 contig: 1

p: is the PFN of the page v: is the page offset within an anonymous
mapping and b: is the number of non-contiguous blocks in the anonymous
mapping. With the patch applied, it looks more like;

b:   1232 v:   752964 p:    58944 ................ 87328 contig: 15
b:   1233 v:   752980 p:    87328 ................ 91200 contig: 15
b:   1234 v:   752996 p:    91200 ................ 40272 contig: 15
b:   1235 v:   753012 p:    40272 ................ 85664 contig: 15
b:   1236 v:   753028 p:    85664 ................ 87312 contig: 15

so mappings are using contiguous pages again. This was the final test
program I used in case it's of any interest.

Thanks

/*
 * showcontiguous.c
 *
 * Use the /proc/pid/pagemap interface to give an indication of how contiguous
 * physical memory is in an anonymous virtual memory mapping
 */
#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/types.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define MAPSIZE (128*1048576)
#define PM_ENTRY_BYTES sizeof(__u64)

int main(int argc, char **argv)
{
	int pagemap_fd;
	unsigned long *anonmapping;
	__u64 pagemap_entry = 0ULL;

	unsigned long vpfn, ppfn, ppfn_last;
	int block_number = 0;
	int contig_count = 1;
	size_t mmap_offset;
	int pagesize = getpagesize();

	if (sizeof(pagemap_entry) < PM_ENTRY_BYTES) {
		printf("ERROR: Failed assumption on size of pagemap_entry\n");
		exit(EXIT_FAILURE);
	}

	/* Open the pagemap interface */
	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
	if (pagemap_fd == -1) {
		perror("fopen");
		exit(EXIT_FAILURE);
	}

	/* Create an anonymous mapping */
	anonmapping = mmap(NULL, MAPSIZE,
			PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE,
			-1, 0);
	if (anonmapping == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	/* Work out the VPN the mapping is at and seek to it in pagemap */
	vpfn = ((unsigned long)anonmapping) / pagesize;
	mmap_offset = lseek(pagemap_fd, vpfn * PM_ENTRY_BYTES, SEEK_SET);
	if (mmap_offset == -1) {
		perror("fseek");
		exit(EXIT_FAILURE);
	}
	ppfn_last = 0;

	/* Read the PFN of each page in the mapping */
	for (mmap_offset = 0; mmap_offset < MAPSIZE; mmap_offset += pagesize) {
		vpfn = ((unsigned long)anonmapping + mmap_offset) / pagesize;

		if (read(pagemap_fd, &pagemap_entry, PM_ENTRY_BYTES) == 0) {
			perror("fread");
			exit(EXIT_FAILURE);
		}

		ppfn = (unsigned long)pagemap_entry;
		if (ppfn == ppfn_last + 1) {
			printf(".");
			contig_count++;
		} else {
			printf(" %lu contig: %d\nb: %6d v: %8lu p: %8lu .",
				ppfn, contig_count,
				block_number, vpfn, ppfn);
			contig_count = 1;
			block_number++;
		}
		ppfn_last = ppfn;
	}
	printf(" %lu config: %d\n", ppfn, contig_count);

	close(pagemap_fd);
	munmap(anonmapping, MAPSIZE);
	exit(EXIT_SUCCESS);
}
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14 18:07                                           ` Mark Lord
@ 2007-12-16 21:56                                             ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2007-12-16 21:56 UTC (permalink / raw)
  To: Mark Lord
  Cc: Andrew Morton, James.Bottomley, jens.axboe, lkml, matthew,
	linux-ide, linux-kernel, linux-scsi, linux-mm

On (14/12/07 13:07), Mark Lord didst pronounce:
> <SNIP>
> 
> That (also) works for me here, regularly generating 64KB I/O segments with 
> SLAB.
> 

Brilliant. Thanks a lot Mark.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-16 21:55                               ` Mel Gorman
@ 2007-12-17 19:24                                 ` Randy Dunlap
  2007-12-18  2:42                                   ` Matt Mackall
  0 siblings, 1 reply; 33+ messages in thread
From: Randy Dunlap @ 2007-12-17 19:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, James Bottomley, jens.axboe, liml, lkml, matthew,
	linux-kernel, linux-scsi, linux-mm, Matt Mackall

On Sun, 16 Dec 2007 21:55:20 +0000 Mel Gorman wrote:

> > > Just using cp to read the file is enough to cause problems but I included
> > > a very basic program below that produces the BUG_ON checks. Is this a known
> > > issue or am I using the interface incorrectly?
> > 
> > I'd say you're using it correctly but you've found a hitherto unknown bug. 
> > On i386 highmem machines with CONFIG_HIGHPTE (at least) pte_offset_map()
> > takes kmap_atomic(), so pagemap_pte_range() can't do copy_to_user() as it
> > presently does.
> > 
> > Drat.
> > 
> > Still, that shouldn't really disrupt the testing which you're doing.  You
> > could disable CONFIG_HIGHPTE to shut it up.
> > 
> 
> Yes, that did the trick. Using pagemap, it was trivial to show that the
> 2.6.24-rc5-mm1 kernel was placing pages in reverse physical order like
> the following output shows
> 
> b:  32763 v:   753091 p:    65559 . 65558 contig: 1
> b:  32764 v:   753092 p:    65558 . 65557 contig: 1
> b:  32765 v:   753093 p:    65557 . 65556 contig: 1
> b:  32766 v:   753094 p:    65556 . 65555 contig: 1
> b:  32767 v:   753095 p:    65555 . 65555 contig: 1
> 
> p: is the PFN of the page v: is the page offset within an anonymous
> mapping and b: is the number of non-contiguous blocks in the anonymous
> mapping. With the patch applied, it looks more like;
> 
> b:   1232 v:   752964 p:    58944 ................ 87328 contig: 15
> b:   1233 v:   752980 p:    87328 ................ 91200 contig: 15
> b:   1234 v:   752996 p:    91200 ................ 40272 contig: 15
> b:   1235 v:   753012 p:    40272 ................ 85664 contig: 15
> b:   1236 v:   753028 p:    85664 ................ 87312 contig: 15
> 
> so mappings are using contiguous pages again. This was the final test
> program I used in case it's of any interest.
> 
> Thanks
> 
> /*
>  * showcontiguous.c
>  *
>  * Use the /proc/pid/pagemap interface to give an indication of how contiguous
>  * physical memory is in an anonymous virtual memory mapping
>  */

Matt,
Did you ever make your python pagemap scripts available?
If not, would you?

---
~Randy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
  2007-12-17 19:24                                 ` Randy Dunlap
@ 2007-12-18  2:42                                   ` Matt Mackall
  0 siblings, 0 replies; 33+ messages in thread
From: Matt Mackall @ 2007-12-18  2:42 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Mel Gorman, Andrew Morton, James Bottomley, jens.axboe, liml,
	lkml, matthew, linux-kernel, linux-scsi, linux-mm

On Mon, Dec 17, 2007 at 11:24:57AM -0800, Randy Dunlap wrote:
> On Sun, 16 Dec 2007 21:55:20 +0000 Mel Gorman wrote:
> 
> > > > Just using cp to read the file is enough to cause problems but I included
> > > > a very basic program below that produces the BUG_ON checks. Is this a known
> > > > issue or am I using the interface incorrectly?
> > > 
> > > I'd say you're using it correctly but you've found a hitherto unknown bug. 
> > > On i386 highmem machines with CONFIG_HIGHPTE (at least) pte_offset_map()
> > > takes kmap_atomic(), so pagemap_pte_range() can't do copy_to_user() as it
> > > presently does.
> > > 
> > > Drat.
> > > 
> > > Still, that shouldn't really disrupt the testing which you're doing.  You
> > > could disable CONFIG_HIGHPTE to shut it up.
> > > 
> > 
> > Yes, that did the trick. Using pagemap, it was trivial to show that the
> > 2.6.24-rc5-mm1 kernel was placing pages in reverse physical order like
> > the following output shows
> > 
> > b:  32763 v:   753091 p:    65559 . 65558 contig: 1
> > b:  32764 v:   753092 p:    65558 . 65557 contig: 1
> > b:  32765 v:   753093 p:    65557 . 65556 contig: 1
> > b:  32766 v:   753094 p:    65556 . 65555 contig: 1
> > b:  32767 v:   753095 p:    65555 . 65555 contig: 1
> > 
> > p: is the PFN of the page v: is the page offset within an anonymous
> > mapping and b: is the number of non-contiguous blocks in the anonymous
> > mapping. With the patch applied, it looks more like;
> > 
> > b:   1232 v:   752964 p:    58944 ................ 87328 contig: 15
> > b:   1233 v:   752980 p:    87328 ................ 91200 contig: 15
> > b:   1234 v:   752996 p:    91200 ................ 40272 contig: 15
> > b:   1235 v:   753012 p:    40272 ................ 85664 contig: 15
> > b:   1236 v:   753028 p:    85664 ................ 87312 contig: 15
> > 
> > so mappings are using contiguous pages again. This was the final test
> > program I used in case it's of any interest.
> > 
> > Thanks
> > 
> > /*
> >  * showcontiguous.c
> >  *
> >  * Use the /proc/pid/pagemap interface to give an indication of how contiguous
> >  * physical memory is in an anonymous virtual memory mapping
> >  */
> 
> Matt,
> Did you ever make your python pagemap scripts available?
> If not, would you?

There's a collection of them at http://selenic.com/repo/pagemap.
They're largely proof of concept, and I'm not sure I finished adapting
them all to the final 64-bit interface.

As it happens, the above regression I actually spotted immediately by
doing a simple hexdump on my very first test of the interface - lots
of pfns counting backwards. Mentioned it a few times to various people
in the cc: list and on lkml but never got around to tracking it down
myself..

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] fix page_alloc for larger I/O segments (improved)
  2007-12-14 18:13                                           ` Matthew Wilcox
  2007-12-14 18:30                                             ` Mark Lord
@ 2007-12-20 22:37                                             ` Matthew Wilcox
  1 sibling, 0 replies; 33+ messages in thread
From: Matthew Wilcox @ 2007-12-20 22:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Mark Lord, Andrew Morton, James.Bottomley, jens.axboe, lkml,
	linux-ide, linux-kernel, linux-scsi, linux-mm

On Fri, Dec 14, 2007 at 11:13:40AM -0700, Matthew Wilcox wrote:
> I'll send it to our DB team to see if this improves our numbers at all.

It does, by approximately 0.67%.  This is about double the margin of
error, and a significant improvement.  Thanks!

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2007-12-20 22:37 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20071213185326.GQ26334@parisc-linux.org>
     [not found] ` <4761821F.3050602@rtr.ca>
     [not found]   ` <20071213192633.GD10104@kernel.dk>
     [not found]     ` <4761883A.7050908@rtr.ca>
     [not found]       ` <476188C4.9030802@rtr.ca>
     [not found]         ` <20071213193937.GG10104@kernel.dk>
     [not found]           ` <47618B0B.8020203@rtr.ca>
     [not found]             ` <20071213195350.GH10104@kernel.dk>
     [not found]               ` <20071213200219.GI10104@kernel.dk>
     [not found]                 ` <476190BE.9010405@rtr.ca>
     [not found]                   ` <20071213200958.GK10104@kernel.dk>
2007-12-13 22:02                     ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Andrew Morton
2007-12-13 22:15                       ` James Bottomley
2007-12-13 22:29                         ` Andrew Morton
2007-12-13 22:33                           ` Mark Lord
2007-12-13 23:13                             ` Mark Lord
2007-12-14  0:05                               ` Mark Lord
2007-12-14  0:30                                 ` Mark Lord
2007-12-14  0:37                                   ` Andrew Morton
2007-12-14  0:42                                     ` Mark Lord
2007-12-14  0:46                                       ` [PATCH] fix page_alloc for larger I/O segments (improved) Mark Lord
2007-12-14  0:57                                         ` James Bottomley
2007-12-14  1:11                                           ` Andrew Morton
2007-12-14  2:23                                             ` Mark Lord
2007-12-14 17:42                                         ` Mel Gorman
2007-12-14 18:07                                           ` Mark Lord
2007-12-16 21:56                                             ` Mel Gorman
2007-12-14 18:13                                           ` Matthew Wilcox
2007-12-14 18:30                                             ` Mark Lord
2007-12-20 22:37                                             ` Matthew Wilcox
2007-12-14  0:47                                       ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mark Lord
2007-12-14 11:50                                     ` Mel Gorman
2007-12-14 13:57                                       ` Mark Lord
2007-12-14  0:40                                   ` [PATCH] fix page_alloc for larger I/O segments Mark Lord
2007-12-14  1:03                                     ` Andrew Morton
2007-12-14  4:00                                       ` Matthew Wilcox
2007-12-15  1:09                           ` QUEUE_FLAG_CLUSTER: not working in 2.6.24 ? Mel Gorman
2007-12-15  2:02                             ` Andrew Morton
2007-12-15  5:55                               ` Matt Mackall
2007-12-16 21:55                               ` Mel Gorman
2007-12-17 19:24                                 ` Randy Dunlap
2007-12-18  2:42                                   ` Matt Mackall
2007-12-13 22:17                       ` Jens Axboe
2007-12-13 22:02                     ` VM allocates pages in reverse order again Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).