[patch] not to disturb page LRU state when unmapping memory range


* [patch] not to disturb page LRU state when unmapping memory range
@ 2007-01-31  4:41 Ken Chen
  2007-01-31 12:26 ` Peter Zijlstra
  2007-01-31 18:02 ` Hugh Dickins
  0 siblings, 2 replies; 14+ messages in thread
From: Ken Chen @ 2007-01-31  4:41 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: linux-mm

I stumbled on another piece of code in zap_pte_range() that is a bit
questionable: when the kernel unmaps an address range, it needs to transfer
PTE state into the page struct. Currently, the kernel transfers both the dirty
bit and the access bit, via set_page_dirty and mark_page_accessed.

set_page_dirty is necessary and required.  However, transferring the access
bit doesn't look logical.  The kernel usually marks the page accessed at the
time of fault; for example, shmem_nopage() does so.  At unmap,
mark_page_accessed is called again, and this bumps the page LRU state
one step closer to the more-recently-used state. It causes quite a
headache in a scenario where a process creates a shmem segment,
touches a whole bunch of pages, then unmaps it. The unmapping takes a long
time because mark_page_accessed() will start moving pages from the inactive
to the active list.

I'm not too concerned with moving the page from one list to another
in the LRU. Sooner or later it might be moved because of multiple mappings
from various processes.  But it just doesn't look logical: when a user
asks for a range to be unmapped, the intention is that the process is no
longer interested in these pages. Moving those pages to the active list (or
bumping their state towards more active) seems to be an overreaction. It
also prolongs unmapping latency, which is the core issue I'm trying to solve.

Given that the LRU state is maintained properly at fault time, I think we
should remove the call in the unmap path.

Signed-off-by: Ken Chen <kenchen@google.com>

---
Hugh, would you please review?

diff -Nurp linux-2.6.20-rc6/mm/memory.c linux-2.6.20-rc6.unmap/mm/memory.c
--- linux-2.6.20-rc6/mm/memory.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/memory.c	2007-01-30 19:25:38.000000000 -0800
@@ -677,8 +677,6 @@ static unsigned long zap_pte_range(struc
 			else {
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
-				if (pte_young(ptent))
-					mark_page_accessed(page);
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
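
The two-step promotion described above can be sketched in userspace. This is
only a simplified model, not the kernel code: struct page_model and the
model_* functions are hypothetical names, and the PageLRU check and all
locking are left out. It assumes mark_page_accessed() sets the referenced
flag on the first call and activates an inactive page on the second.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page flags discussed here. */
struct page_model {
	bool referenced;	/* models PG_referenced */
	bool active;		/* models PG_active (page is on the active list) */
};

/*
 * Simplified model of mark_page_accessed(): the first call sets the
 * referenced flag; a second call on an inactive page activates it and
 * clears the flag again. PageLRU checks and locking are omitted.
 */
static void model_mark_page_accessed(struct page_model *page)
{
	if (!page->active && page->referenced) {
		page->active = true;
		page->referenced = false;
	} else if (!page->referenced) {
		page->referenced = true;
	}
}

/*
 * One mark at fault time followed by one mark at unmap time (young PTE).
 * Returns true if the page ends up on the active list.
 */
static bool model_fault_then_unmap(void)
{
	struct page_model page = { false, false };

	model_mark_page_accessed(&page);	/* at fault, as shmem_nopage() does */
	model_mark_page_accessed(&page);	/* at unmap, because the PTE is young */
	return page.active;
}
```

In this model the unmap-time call is the one that triggers the move to the
active list, which is the extra work the patch tries to avoid.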

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31  4:41 [patch] not to disturb page LRU state when unmapping memory range Ken Chen
@ 2007-01-31 12:26 ` Peter Zijlstra
  2007-01-31 19:15   ` Balbir Singh
  2007-01-31 18:02 ` Hugh Dickins
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2007-01-31 12:26 UTC (permalink / raw)
  To: Ken Chen; +Cc: Hugh Dickins, Andrew Morton, linux-mm

On Tue, 2007-01-30 at 20:41 -0800, Ken Chen wrote:
> I stomped on another piece of code in zap_pte_range() that is a bit
> questionable: when kernel unmaps an address range, it needs to transfer
> PTE state into page struct. Currently, kernel transfer both dirty bit
> and access bit via set_page_dirty and mark_page_accessed.
> 
> set_page_dirty is necessary and required.  However, transfering access
> bit doesn't look logical.  Kernel usually mark the page accessed at the
> time of fault, for example shmem_nopage() does so.  At unmap, another
> call to mark_page_accessed is called and this causes page LRU state to
> be bumped up one step closer to more recently used state. It is causing
> quite a bit headache in a scenario when a process creates a shmem segment,
> touch a whole bunch of pages, then unmaps it. The unmapping takes a long
> time because mark_page_accessed() will start moving pages from inactive
> to active list.
> 
> I'm not too much concerned with moving the page from one list to another
> in LRU. Sooner or later it might be moved because of multiple mappings
> from various processes.  But it just doesn't look logical that when user
> asks a range to be unmapped, it's his intention that the process is no
> longer interested in these pages. Moving those pages to active list (or
> bumping up a state towards more active) seems to be an over reaction. It
> also prolongs unmapping latency which is the core issue I'm trying to solve.
> 
> Given that the LRU state is maintained properly at fault time, I think we
> should remove it in the unmap path.

We do not maintain the accessed state with faults. We might set an
initial ref bit, but thereafter it is up to page reclaim to scan for pte
young pages.

So by blindly removing the mark_page_accessed() call we do lose
information: the page might have been recently referenced and it might still
be relevant (think of sliding mmaps and such).

That said, I think mark_page_accessed() does the wrong thing here: if the
page scanner were to pass by instead, it would only act as if
PageReferenced() were set.

So may I suggest the following?

It preserves the information, but not more.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/mm/memory.c b/mm/memory.c
index ef09f0a..b1f9129 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -678,7 +678,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent))
-					mark_page_accessed(page);
+					SetPageReferenced(page);
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
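
A minimal userspace sketch of what the patched branch does, under stated
assumptions: struct pte_model, struct page_model and model_zap_file_pte()
are hypothetical names, and the real set_page_dirty() and SetPageReferenced()
are reduced to plain flag assignments. It only illustrates that the PTE bits
are copied into the page and that the list position is left alone.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PTE bits and page flags involved. */
struct pte_model {
	bool dirty;	/* models pte_dirty() */
	bool young;	/* models pte_young() */
};

struct page_model {
	bool dirty;		/* models the page dirty state */
	bool referenced;	/* models PG_referenced */
	bool active;		/* models PG_active */
};

/*
 * Model of the file-page branch of zap_pte_range() with the
 * SetPageReferenced() change applied.
 */
static void model_zap_file_pte(const struct pte_model *pte,
			       struct page_model *page)
{
	if (pte->dirty)
		page->dirty = true;		/* stands in for set_page_dirty() */
	if (pte->young)
		page->referenced = true;	/* stands in for SetPageReferenced() */
	/* page->active is never written: no list movement at unmap time. */
}
```

Whatever list the page was on before the unmap, it is on the same list
afterwards; only the referenced information is carried over.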



* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31  4:41 [patch] not to disturb page LRU state when unmapping memory range Ken Chen
  2007-01-31 12:26 ` Peter Zijlstra
@ 2007-01-31 18:02 ` Hugh Dickins
  2007-01-31 21:43   ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2007-01-31 18:02 UTC (permalink / raw)
  To: Ken Chen; +Cc: Andrew Morton, linux-mm

On Tue, 30 Jan 2007, Ken Chen wrote:

> I stomped on another piece of code in zap_pte_range() that is a bit
> questionable: when kernel unmaps an address range, it needs to transfer
> PTE state into page struct. Currently, kernel transfer both dirty bit
> and access bit via set_page_dirty and mark_page_accessed.
> 
> set_page_dirty is necessary and required.  However, transfering access
> bit doesn't look logical.  Kernel usually mark the page accessed at the
> time of fault, for example shmem_nopage() does so.  At unmap, another
> call to mark_page_accessed is called and this causes page LRU state to
> be bumped up one step closer to more recently used state. It is causing
> quite a bit headache in a scenario when a process creates a shmem segment,
> touch a whole bunch of pages, then unmaps it. The unmapping takes a long
> time because mark_page_accessed() will start moving pages from inactive
> to active list.
> 
> I'm not too much concerned with moving the page from one list to another
> in LRU. Sooner or later it might be moved because of multiple mappings
> from various processes.  But it just doesn't look logical that when user
> asks a range to be unmapped, it's his intention that the process is no
> longer interested in these pages. Moving those pages to active list (or
> bumping up a state towards more active) seems to be an over reaction. It
> also prolongs unmapping latency which is the core issue I'm trying to solve.
> 
> Given that the LRU state is maintained properly at fault time, I think we
> should remove it in the unmap path.
> 
> Signed-off-by: Ken Chen <kenchen@google.com>
> 
> ---
> Hugh, would you please review?

I'm sympathetic, but I'm going to chicken out on this one.  It was
me who made that set_page_dirty and mark_page_accessed conditional on
!PageAnon: because I didn't like the waste of time either, and could
see it was pointless in the PageAnon case.  But the situation is much
less clear to me in the file case, and it is very longstanding code.

If we had a large and representative set of test cases, I'd ask you to
run that with and without your change, and report back timings.  If.

Peter and Nick (and Rik and Andrea) are much better people to ask than
me, on such balancing matters - they have a much better feel for how
those LRUs end up working.

Peter's SetPageReferenced compromise seems appealing: I'd feel better
about it if we had other raw uses of SetPageReferenced in the balancing
code, to follow as precedents.  There used to be one in do_anonymous_page,
but Nick and I found that an odd-one-out and conspired to have it removed
in 2.6.16.

Hugh

> 
> diff -Nurp linux-2.6.20-rc6/mm/memory.c linux-2.6.20-rc6.unmap/mm/memory.c
> --- linux-2.6.20-rc6/mm/memory.c	2007-01-30 19:23:45.000000000 -0800
> +++ linux-2.6.20-rc6.unmap/mm/memory.c	2007-01-30 19:25:38.000000000 -0800
> @@ -677,8 +677,6 @@ static unsigned long zap_pte_range(struc
> 			else {
> 				if (pte_dirty(ptent))
> 					set_page_dirty(page);
> -				if (pte_young(ptent))
> -					mark_page_accessed(page);
> 				file_rss--;
> 			}
> 			page_remove_rmap(page, vma);
> 


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 12:26 ` Peter Zijlstra
@ 2007-01-31 19:15   ` Balbir Singh
  2007-01-31 19:30     ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2007-01-31 19:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ken Chen, Hugh Dickins, Andrew Morton, linux-mm

Peter Zijlstra wrote:
[snip]

> It preserves the information, but not more.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index ef09f0a..b1f9129 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -678,7 +678,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  				if (pte_dirty(ptent))
>  					set_page_dirty(page);
>  				if (pte_young(ptent))
> -					mark_page_accessed(page);
> +					SetPageReferenced(page);
>  				file_rss--;
>  			}
>  			page_remove_rmap(page, vma);

Does it make sense to do this only for shared mapped pages?

if (pte_young(ptent) && (page_mapcount(page) > 1))
	SetPageReferenced(page);


	Balbir Singh


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 19:15   ` Balbir Singh
@ 2007-01-31 19:30     ` Christoph Lameter
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-01-31 19:30 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Peter Zijlstra, Ken Chen, Hugh Dickins, Andrew Morton, linux-mm

On Thu, 1 Feb 2007, Balbir Singh wrote:

> Does it make sense to do this only for shared mapped pages?
> 
> if (pte_young(ptent) && (page_mapcount(page) > 1))
> 	SetPageReferenced(page);

If the page is only mapped by the process releasing the memory then it may
be considered less likely that the page is reused. But the basic issue
that Hugh mentioned remains.


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 18:02 ` Hugh Dickins
@ 2007-01-31 21:43   ` Peter Zijlstra
  2007-01-31 21:51     ` Ken Chen
  2007-01-31 22:04     ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2007-01-31 21:43 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Ken Chen, Andrew Morton, linux-mm

On Wed, 2007-01-31 at 18:02 +0000, Hugh Dickins wrote:

> I'm sympathetic, but I'm going to chicken out on this one.  It was
> me who made that set_page_dirty and mark_page_accessed conditional on
> !PageAnon: because I didn't like the waste of time either, and could
> see it was pointless in the PageAnon case.  But the situation is much
> less clear to me in the file case, and it is very longstanding code.

> Peter's SetPageReferenced compromise seems appealing: I'd feel better
> about it if we had other raw uses of SetPageReferenced in the balancing
> code, to follow as precedents.  There used to be one in do_anonymous_page,
> but Nick and I found that an odd-one-out and conspired to have it removed
> in 2.6.16.

The trouble seems to be that mark_page_accessed() is deformed by this
use-once magic, and that really works against us in this case.

The fact is that these pages can have multiple mappings, each triggering
a call to mark_page_accessed() and launching the pages into the
active set, which clearly seems wrong to me.

I'll go over the other callers tomorrow, but I'd really like to change this
to SetPageReferenced(); that will just preserve the PTE young state and
let page reclaim do its usual thing.

Andrew, any strong opinions?

NOTE - the page_mapcount(page) > 1 idea seems interesting, but let's not
go there yet.

NOTE - recall, that in the PG_useonce patches mark_page_accessed() will
again be a simple:

  if (!PageReferenced(page))
    SetPageReferenced(page);

If only I could come up with a proper set of tests that covers all
this...


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 21:43   ` Peter Zijlstra
@ 2007-01-31 21:51     ` Ken Chen
  2007-01-31 22:04     ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Ken Chen @ 2007-01-31 21:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Hugh Dickins, Andrew Morton, linux-mm

On 1/31/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2007-01-31 at 18:02 +0000, Hugh Dickins wrote:
>
> > I'm sympathetic, but I'm going to chicken out on this one.  It was
> > me who made that set_page_dirty and mark_page_accessed conditional on
> > !PageAnon: because I didn't like the waste of time either, and could
> > see it was pointless in the PageAnon case.  But the situation is much
> > less clear to me in the file case, and it is very longstanding code.
>
> > Peter's SetPageReferenced compromise seems appealing: I'd feel better
> > about it if we had other raw uses of SetPageReferenced in the balancing
> > code, to follow as precedents.  There used to be one in do_anonymous_page,
> > but Nick and I found that an odd-one-out and conspired to have it removed
> > in 2.6.16.
>
> The trouble seems to be that mark_page_accessed() is deformed by this
> use once magick. And that really works against us in this case.
>
> The fact is that these pages can have multiple mappings triggering
> multiple calls to mark_page_accessed() launching these pages into the
> active set. Which clearly seems wrong to me.
>
> I'll go over other callers tomorrow, but I'd really like to change this
> to SetPageReferenced(), this will just preserve the PTE young state and
> let page reclaim do its usual thing.

I agree with Peter on changing it to SetPageReferenced() as a middle
ground.  I tested it, and it relieves the majority of the problem by
eliminating the calls to activate_page().  Ack'ing Peter's earlier patch.

Acked-by: Ken Chen <kenchen@google.com>


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 21:43   ` Peter Zijlstra
  2007-01-31 21:51     ` Ken Chen
@ 2007-01-31 22:04     ` Andrew Morton
  2007-01-31 22:25       ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2007-01-31 22:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Hugh Dickins, Ken Chen, linux-mm

On Wed, 31 Jan 2007 22:43:31 +0100
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2007-01-31 at 18:02 +0000, Hugh Dickins wrote:
> 
> > I'm sympathetic, but I'm going to chicken out on this one.  It was
> > me who made that set_page_dirty and mark_page_accessed conditional on
> > !PageAnon: because I didn't like the waste of time either, and could
> > see it was pointless in the PageAnon case.  But the situation is much
> > less clear to me in the file case, and it is very longstanding code.
> 
> > Peter's SetPageReferenced compromise seems appealing: I'd feel better
> > about it if we had other raw uses of SetPageReferenced in the balancing
> > code, to follow as precedents.  There used to be one in do_anonymous_page,
> > but Nick and I found that an odd-one-out and conspired to have it removed
> > in 2.6.16.
> 
> The trouble seems to be that mark_page_accessed() is deformed by this
> use once magick. And that really works against us in this case.
> 
> The fact is that these pages can have multiple mappings triggering
> multiple calls to mark_page_accessed() launching these pages into the
> active set. Which clearly seems wrong to me.
> 
> I'll go over other callers tomorrow, but I'd really like to change this
> to SetPageReferenced(), this will just preserve the PTE young state and
> let page reclaim do its usual thing.
> 
> Andrew, any strong opinions?

Not really.  If we change something in there, some workloads will get
better, some will get worse and most will be unaffected and any regressions
we cause won't be known until six months later.  The usual deal.

Remember that all this info is supposed to be estimating what is likely to
happen to this page in the future - we're not interested in what happened
in the past, per-se.

I'd have thought that if multiple processes are touching the same
page, this is a reason to think that the page will be required again in the
immediate future.  But you seem to think otherwise?

> NOTE - the page_mapcount(page) > 1, idea seems interesting but lets not
> go there, yet..
> 
> NOTE - recall, that in the PG_useonce patches mark_page_accessed() will
> again be a simple:
> 
>   if (!PageReferenced(page))
>     SetPageReferenced(page);
> 
> If only I could come up with a proper set of tests that covers all
> this...

Well yes, that's rather a sore point.  It's tough.  I wonder what $OTHER_OS
developers have done.  Probably their tests are priority ordered by
$market_share of their user's applications :(


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 22:04     ` Andrew Morton
@ 2007-01-31 22:25       ` Peter Zijlstra
  2007-01-31 22:48         ` Andrew Morton
  2007-02-01  3:13         ` Rik van Riel
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2007-01-31 22:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Ken Chen, linux-mm

On Wed, 2007-01-31 at 14:04 -0800, Andrew Morton wrote:

> > Andrew, any strong opinions?
> 
> Not really.  If we change something in there, some workloads will get
> better, some will get worse and most will be unaffected and any regressions
> we cause won't be known until six months later.  The usual deal.
> 
> Remember that all this info is supposed to be estimating what is likely to
> happen to this page in the future - we're not interested in what happened
> in the past, per-se.
> 
> I'd have thought that if multiple processes are touching the same
> page, this is a reason to think that the page will be required again in the
> immediate future.  But you seem to think otherwise?

Yes, why would unmapping a range make the pages more likely to be used
in the immediate future than otherwise indicated by their individual
young bits?

Even the opposite was suggested, that unmapping a range makes it less
likely to be used again.

> > If only I could come up with a proper set of tests that covers all
> > this...
> 
> Well yes, that's rather a sore point.  It's tough.  I wonder what $OTHER_OS
> developers have done.  Probably their tests are priority ordered by
> $market_share of their user's applications :(

Still requires them to set up and run said programs. If we could get a
suite of programs that we consider interesting....

Just hoping, I seem to be stuck with quite a lot of code without means
of evaluation.


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 22:25       ` Peter Zijlstra
@ 2007-01-31 22:48         ` Andrew Morton
  2007-01-31 23:52           ` Peter Zijlstra
  2007-02-01  3:21           ` Rik van Riel
  2007-02-01  3:13         ` Rik van Riel
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2007-01-31 22:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Hugh Dickins, Ken Chen, linux-mm

On Wed, 31 Jan 2007 23:25:00 +0100
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2007-01-31 at 14:04 -0800, Andrew Morton wrote:
> 
> > > Andrew, any strong opinions?
> > 
> > Not really.  If we change something in there, some workloads will get
> > better, some will get worse and most will be unaffected and any regressions
> > we cause won't be known until six months later.  The usual deal.
> > 
> > Remember that all this info is supposed to be estimating what is likely to
> > happen to this page in the future - we're not interested in what happened
> > in the past, per-se.
> > 
> > I'd have thought that if multiple processes are touching the same
> > page, this is a reason to think that the page will be required again in the
> > immediate future.  But you seem to think otherwise?
> 
> Yes, why would unmapping a range make the pages more likely to be used
> in the immediate future than otherwise indicated by their individual
> young bits?
> 
> Even the opposite was suggested, that unmapping a range makes it less
> likely to be used again.

Ah, yes, well, that's different.

Our handling of page referenced information is basically random: we had
something in place in 2.4.midway, then use-once went in and churned things
around, then we turned VM upside-down in 2.5 and I basically tried to keep
what we then had in an unaltered state in the fond belief that someone
would one day get down and actually apply some design and thought to what
we're doing.  That has yet to happen.

Take a simple mmap+pagefault+munmap path.  The initial fault will leave the
page pte-referenced+PageReferenced+!PageActive.  If the vm scanner sees the
page it will become !pte-referenced+!PageReferenced+PageActive.  If it gets
unmapped it becomes !PageReferenced+PageActive.

These things at least seem to be somewhat consistent.  But I'm not sure
there's any logic behind it.

Perhaps we're approaching this from the wrong direction.  Rather than
looking at the code and saying "hey, we should change that", we should be
looking at workloads and seeing how they can be improved.  Perhaps.

In the above (simple, common) scenario the proposed
s/mark_page_accessed/SetPageReferenced/ change will cause the page to end
up PageReferenced+!PageActive.  ie: it ends up on the inactive list and not
the active list.  <tests it, confirms>.  That's a substantial change in
behaviour: inactive-list pages are considerably more reclaimable than
active-list ones and we might well alter things for people by making this
change.  Whether that alteration is net-good or net-bad is unknown ;)

> > > If only I could come up with a proper set of tests that covers all
> > > this...
> > 
> > Well yes, that's rather a sore point.  It's tough.  I wonder what $OTHER_OS
> > developers have done.  Probably their tests are priority ordered by
> > $market_share of their user's applications :(
> 
> Still requires them to set up and run said programs. If we could get a
> suite of programs that we consider interesting....
> 
> Just hoping, I seem to be stuck with quite a lot of code without means
> of evaluation.

We don't _have_ to use live applications.  Often they are hard to set up,
and do complex things and are hard to understand.  A more controllable and
ultimately more useful result could be achieved by defining *workloads*:
particular scenarios for the VM.  Then write simple and easily observable
testcases for each scenario.  That's basically what people do, I think, but
it's all a bit ad-hoc and uncoordinated.
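
One such workload, the touch-then-unmap case that started this thread, might
be sketched as below. This is an illustration under assumptions, not an
agreed test case: time_touch_then_unmap() is a hypothetical helper, a shared
anonymous mapping is used on the assumption that Linux backs it with shmem,
and clock() is a coarse CPU-time measure chosen only to keep the sketch
short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Map a shared anonymous region, write one byte per page so that every
 * page is faulted in, then unmap the region.
 * Returns the CPU seconds spent in munmap(), or -1.0 on error.
 */
static double time_touch_then_unmap(size_t nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *mem;
	clock_t start, end;
	size_t len;

	if (page_size <= 0 || nr_pages == 0)
		return -1.0;
	len = nr_pages * (size_t)page_size;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1.0;

	/* Touch every page so the unmap has young, dirty PTEs to process. */
	for (size_t i = 0; i < nr_pages; i++)
		mem[i * (size_t)page_size] = 1;

	start = clock();
	if (munmap(mem, len) != 0)
		return -1.0;
	end = clock();

	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with a large page count on kernels with and without the change
would give a rough before/after comparison of unmap latency; a serious
measurement would need a finer clock and repeated runs.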


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 22:48         ` Andrew Morton
@ 2007-01-31 23:52           ` Peter Zijlstra
  2007-02-01  0:33             ` Andrew Morton
  2007-02-01  3:21           ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2007-01-31 23:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Ken Chen, linux-mm

On Wed, 2007-01-31 at 14:48 -0800, Andrew Morton wrote:
> On Wed, 31 Jan 2007 23:25:00 +0100
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > On Wed, 2007-01-31 at 14:04 -0800, Andrew Morton wrote:
> > 
> > > > Andrew, any strong opinions?
> > > 
> > > Not really.  If we change something in there, some workloads will get
> > > better, some will get worse and most will be unaffected and any regressions
> > > we cause won't be known until six months later.  The usual deal.
> > > 
> > > Remember that all this info is supposed to be estimating what is likely to
> > > happen to this page in the future - we're not interested in what happened
> > > in the past, per-se.
> > > 
> > > I'd have thought that if multiple processes are touching the same
> > > page, this is a reason to think that the page will be required again in the
> > > immediate future.  But you seem to think otherwise?
> > 
> > Yes, why would unmapping a range make the pages more likely to be used
> > in the immediate future than otherwise indicated by their individual
> > young bits?
> > 
> > Even the opposite was suggested, that unmapping a range makes it less
> > likely to be used again.
> 
> Ah, yes, well, that's different.
> 
> Our handling of page referenced information is basically random: we had
> something in place in 2.4.midway, then use-once went in and churned things
> around, then we turned VM upside-down in 2.5 and I basically tried to keep
> what we then had in an unaltered state in the fond belief that someone
> would one day get down and actually apply some design and thought to what
> we're doing.  That has yet to happen.
> 
> Take a simple mmap+pagefault+munmap path.  The initial fault will leave the
> page pte-referenced+PageReferenced+!PageActive. 

Assuming a major fault; a minor fault might well map an active page.

>  If the vm scanner sees the
> page it will become !pte-referenced+!PageReferenced+PageActive.  If it gets
> unmapped it becomes !PageReferenced+PageActive.

scanner does:

1) referenced,   inactive -> unreferenced, active
2) referenced,   active   -> unreferenced, active

3) unreferenced, active   -> unreferenced, inactive
4) unreferenced, inactive -> reclaimed
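
The four scanner transitions listed above can be written as a small
userspace model. The names (struct page_model, model_scan_page(), the enum)
are hypothetical, and "referenced" here folds together the pte young bit and
PageReferenced, as defined later in this mail.

```c
#include <stdbool.h>

enum scan_result { PAGE_KEPT, PAGE_RECLAIMED };

/* Hypothetical model of one page as the scanner sees it. */
struct page_model {
	bool referenced;	/* pte young || PageReferenced */
	bool active;		/* on the active list */
};

/* Apply one scanner pass to a page, following transitions 1) to 4). */
static enum scan_result model_scan_page(struct page_model *page)
{
	if (page->referenced) {
		/* 1) and 2): consume the reference, page ends up active */
		page->referenced = false;
		page->active = true;
		return PAGE_KEPT;
	}
	if (page->active) {
		/* 3): unreferenced active page is demoted */
		page->active = false;
		return PAGE_KEPT;
	}
	/* 4): unreferenced inactive page is reclaimed */
	return PAGE_RECLAIMED;
}
```

In this model a referenced, inactive page that is never touched again
survives two passes and is reclaimed on the third.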

> These things at least seem to be somewhat consistent.  But I'm not sure
> there's any logic behind it.

Seems rather logical, 2 level state, each clock period you either
promote or demote depending on activity.

> Perhaps we're approaching this from the wrong direction.  Rather than
> looking at the code and saying "hey, we should change that", we should be
> looking at workloads and seeing how they can be improved.  Perhaps.

Any which way I'm turning it, it keeps being a blind shot. But I get the
idea.

> In the above (simple, common) scenario the proposed
> s/mark_page_accessed/SetPageReferenced/ change will cause the page to end
> up PageReferenced+!PageActive. 

How so? It will not demote the page to inactive.

Currently unmap can promote a page to active; with the change it cannot.
Neither version will ever demote; only page reclaim does that.

currently with mark_page_accessed:

 referenced := (pte young || PageReferenced) 

1 active pte

  referenced (pte, !PG_referenced), inactive -> referenced,   inactive
  referenced (pte, PG_referenced),  inactive -> unreferenced, active
  *,                                active   -> referenced,   active

2 active ptes

  referenced (pte, !PG_referenced), inactive -> unreferenced, active
  referenced (pte, PG_referenced),  inactive -> referenced, active
  *,                                active   -> referenced, active

3+ active ptes

  *, * -> referenced, active

which I find quite horrid for unmap...

Or, with the proposed SetPageReferenced:

1+ active pte(s)
  referenced (pte, !PG_referenced), * -> referenced (PG_referenced), *
  referenced (pte, PG_referenced), * -> referenced (PG_referenced), *

It's actually an identity map; it just moves pte young bits into the
referenced bit, which is all the same to page_referenced().
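
Both tables can be reproduced with a small userspace model. It is a sketch
under assumptions: struct page_model, model_mark_page_accessed() and
model_unmap_all() are hypothetical names, mark_page_accessed() is reduced to
its two-step flag logic, and each young pte being unmapped results in exactly
one call.

```c
#include <stdbool.h>

/* Hypothetical model of the two page flags that matter here. */
struct page_model {
	bool referenced;	/* models PG_referenced */
	bool active;		/* models PG_active */
};

/* Simplified two-step logic of mark_page_accessed(). */
static void model_mark_page_accessed(struct page_model *page)
{
	if (!page->active && page->referenced) {
		page->active = true;
		page->referenced = false;
	} else if (!page->referenced) {
		page->referenced = true;
	}
}

/*
 * Unmap nr_young_ptes young mappings of one page and return the resulting
 * flags, using either the current mark_page_accessed() behaviour or the
 * proposed SetPageReferenced() behaviour.
 */
static struct page_model model_unmap_all(struct page_model page,
					 int nr_young_ptes,
					 bool use_set_page_referenced)
{
	for (int i = 0; i < nr_young_ptes; i++) {
		if (use_set_page_referenced)
			page.referenced = true;
		else
			model_mark_page_accessed(&page);
	}
	return page;
}
```

With mark_page_accessed() the end state depends on how many young ptes
happen to be unmapped; with SetPageReferenced() the active flag comes back
unchanged for any count.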

>  ie: it ends up on the inactive list and not
> the active list.  <tests it, confirms>. 

It will stay on whatever list it was on.

>  That's a substantial change in
> behaviour: inactive-list pages are considerably more reclaimable than
> active-list ones and we might well alter things for people my making this
> change.  Whether that alteration is net-good or net-bad is unknown ;)

It's quite a change indeed, but either I'm not quite parsing what you're
saying and we're in violent agreement, or I should go sleep ;-)

I hope this state machinery makes sense, I feel asleep already.

> We don't _have_ to use live applications.  Often they are hard to set up,
> and do complex things and are hard to understand.  

> A more controllable and
> ultimately more useful result could be achieved by defining *workloads*:
> particular scenarios for the VM.  

> Then write simple and easily observeable
> testcases for each scenario.  That's basically what people do, I think, but
> it's all a bit ad-hoc and uncoordinated.

I have started writing an application that can perform simple patterns,
perhaps we should discuss interesting patterns during the VM summit.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 23:52           ` Peter Zijlstra
@ 2007-02-01  0:33             ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2007-02-01  0:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Hugh Dickins, Ken Chen, linux-mm

On Thu, 01 Feb 2007 00:52:14 +0100
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > In the above (simple, common) scenario the proposed
> > s/mark_page_accessed/SetPageReferenced/ change will cause the page to end
> > up PageReferenced+!PageActive. 
> 
> How so, it will not demote the page to inactive. 
> 
> Now unmap could promote to active, with the change not so. Neither will
> ever demote, only page reclaim will do that.
> 
> currently with mark_page_accessed:
> 
>  referenced := (pte young || PageReferenced) 
> 
> 1 active pte
> 
>   referenced (pte, !PG_referenced), inactive -> referenced,   inactive
>   referenced (pte, PG_referenced),  inactive -> unreferenced, active
>   *,                                active   -> referenced,   active
> 
> 2 active ptes
> 
>   referenced (pte, !PG_referenced), inactive -> unreferenced, active
>   referenced (pte, PG_referenced),  inactive -> referenced, active
>   *,                                active   -> referenced, active
> 
> 3+ active ptes
> 
>   *, * -> referenced, active
> 
> which I find quite horrid for unmap...
> 
> Or, with the proposed SetPageReferenced:
> 
> 1+ active pte(s)
>   referenced (pte,!PG_referenced), * -> referenced (PG_referenced), *
>   referenced (pte, PG_referenced), * -> referenced (PG_referenced), *
> 
> It's actually an identity map, it just moves pte young bits into the
> referenced bit, which is all the same to page_referenced().

<head spins>


Test it.  On the major fault the pages start out on the inactive list.  On
the munmap they go onto the active list.  Taking the mark_page_accessed()
out of munmap() causes them to remain on the inactive list.

> >  ie: it ends up on the inactive list and not
> > the active list.  <tests it, confirms>. 
> 
> it will stay on whatever list it was.

Namely the inactive list.  Unlike 2.6.20-rc7.  That's a big change.



* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 22:25       ` Peter Zijlstra
  2007-01-31 22:48         ` Andrew Morton
@ 2007-02-01  3:13         ` Rik van Riel
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2007-02-01  3:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, Hugh Dickins, Ken Chen, linux-mm

Peter Zijlstra wrote:

> Yes, why would unmapping a range make the pages more likely to be used
> in the immediate future than otherwise indicated by their individual
> young bits?
> 
> Even the opposite was suggested, that unmapping a range makes it less
> likely to be used again.

I agree, the VM looks at the usage of individual pages and makes
decisions based on that.  We can only see how often individual
pages are referenced, and do not have much additional information
(except from the readahead code).

Making sweeping generalizations like "unmapping makes pages less
likely to be needed again" is bound to cause trouble for some
workloads.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


* Re: [patch] not to disturb page LRU state when unmapping memory range
  2007-01-31 22:48         ` Andrew Morton
  2007-01-31 23:52           ` Peter Zijlstra
@ 2007-02-01  3:21           ` Rik van Riel
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2007-02-01  3:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zijlstra, Hugh Dickins, Ken Chen, linux-mm

Andrew Morton wrote:

> Perhaps we're approaching this from the wrong direction.  Rather than
> looking at the code and saying "hey, we should change that", we should be
> looking at workloads and seeing how they can be improved.  Perhaps.

I think this makes a lot of sense.  It may not be benchmarkable,
because there is no exhaustive test of workloads, but we can at
least come up with several conceptual groups of workloads that
should be kept in mind when changing things to the VM.

I could think of a few workloads and their characteristics and
desired behaviour:

1) desktop workload - program working sets need to be kept in
    memory and protected from pressure by streaming IO

2) database workload - some pages get accessed more frequently
    than others, those need to be kept resident in memory

3) file server workload - some pages get accessed more frequently
    than others, those need to be kept resident in memory.  This
    is similar to the database workload, except the inter-reference
    distance on a file server is WAY larger and an LRU queue is
    likely not large enough to catch even the frequently accessed
    pages.

4) web server workload - somewhere in-between the desktop and the
    file server, the working sets of the server programs need to be
    kept in memory, and we want to cache the frequently accessed
    data pages

5) developer desktop - like the desktop workload, except we have
    programs like git and rsync doing streaming IO with double
    accesses next to each other, which will push the working sets
    of the desktop programs out of memory if our use-once algorithm
    gets fooled

6) realtime data processing - this kind of workload is usually
    mlocked, but sometimes still wants to do lots of file IO.
    We need to make sure the VM does not get upset by the sometimes
    large amount of mlocked data

7) ... fill in your own workload here :)

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


end of thread, other threads:[~2007-02-01  7:17 UTC | newest]

Thread overview: 14+ messages
2007-01-31  4:41 [patch] not to disturb page LRU state when unmapping memory range Ken Chen
2007-01-31 12:26 ` Peter Zijlstra
2007-01-31 19:15   ` Balbir Singh
2007-01-31 19:30     ` Christoph Lameter
2007-01-31 18:02 ` Hugh Dickins
2007-01-31 21:43   ` Peter Zijlstra
2007-01-31 21:51     ` Ken Chen
2007-01-31 22:04     ` Andrew Morton
2007-01-31 22:25       ` Peter Zijlstra
2007-01-31 22:48         ` Andrew Morton
2007-01-31 23:52           ` Peter Zijlstra
2007-02-01  0:33             ` Andrew Morton
2007-02-01  3:21           ` Rik van Riel
2007-02-01  3:13         ` Rik van Riel
