Re: KSM: current madvise rollup

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Izik Eidus <ieidus@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>, Chris Wright <chrisw@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: KSM: current madvise rollup
Date: Mon, 13 Jul 2009 21:28:21 +0300	[thread overview]
Message-ID: <4A5B7CC5.4030908@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0907121459150.7417@sister.anvils>

Hugh Dickins wrote:
> On Sun, 12 Jul 2009, Izik Eidus wrote:
>   
>> On Sat, 11 Jul 2009 20:22:11 +0100 (BST)
>> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
>>     
>>> We may want to do that anyway.  It concerned me a lot when I was
>>> first testing (and often saw kernel_pages_allocated greater than
>>> pages_shared - probably because of the original KSM's eagerness to
>>> merge forked pages, though I think there may have been more to it
>>> than that).  But seems much less of an issue now (that ratio is much
>>> healthier), and even less of an issue once KSM pages can be swapped.
>>> So I'm not bothering about it at the moment, but it may make sense.
>>>       
>
> I realized since writing that with the current statistics you really
> cannot tell how big an issue the orphaned (count 1) KSM pages are -
> good sharing of a few will completely hide non-sharing of many.
>
> But I've hacked in more stats (not something I'd care to share yet!),
> and those confirm that for my loads at least, the orphaned KSM pages
> are few compared with the shared ones.
>
>   
>> We could add patch like the below, but I think we should leave it as it
>> is now,
>>     
>
> I agree we should leave it as is for now.  My guess is that we'll
> prefer to leave them around, until approaching max_kernel_pages_alloc,
> pruning them only at that stage (rather as we free swap more aggressively
> when it's 50% full).  There may be benefit in not removing them too soon,
> there may be benefit in holding on to stable pages for longer (holding a
> reference in the stable tree for a while).  Or maybe not, just an idea.
>   

Well, I personaly dont see any real soultion to this problem without 
real swapping support,
So for now I dont mind not to deal with it at all (as long as the admin 
can decide how many unswappable pages he allow)
But, If you have a stronger opnion than me for that case, I really dont 
care to change it.

>   
>> and solve it all (like you have said) with the ksm pages
>> swapping support in next kernel release.
>> (Right now ksm can limit itself with max_kernel_pages_alloc)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index a0fbdb2..ee80861 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1261,8 +1261,13 @@ static void ksm_do_scan(unsigned int scan_npages)
>>  		rmap_item = scan_get_next_rmap_item(&page);
>>  		if (!rmap_item)
>>  			return;
>> -		if (!PageKsm(page) || !in_stable_tree(rmap_item))
>> +		if (!PageKsm(page) || !in_stable_tree(rmap_item)) {
>>  			cmp_and_merge_page(page, rmap_item);
>> +		} else if (page_mapcount(page) == 0) {
>>     
>
> If we did that (but we agree not for now), shouldn't it be
> 			   page_mapcount(page) == 1
> ?  The mapcount 0 ones already got freed by the zap/unmap code.
>   

Yea, I dont know what i thought when i compare it to zero..., 1 is the 
right value here indeed.

>   
>> +			break_cow(rmap_item->mm,
>> +				  rmap_item->address & PAGE_MASK);
>>     
>
> Just a note on that " & PAGE_MASK": it's unnecessary there and
> almost everywhere else.  One of the pleasures of putting flags into
> the bottom bits of the address, in code concerned with faulting, is
> that the faulting address can be anywhere within the page, so we
> don't have to bother to mask off the flags.
>
>   
>> +			remove_rmap_item_from_tree(rmap_item);
>> +		}
>>  		put_page(page);
>>  	}
>>  }
>>
>>     
>>> Oh, something that might be making it higher, that I didn't highlight
>>> (and can revert if you like, it was just more straightforward this
>>> way): with scan_get_next_rmap skipping the non-present ptes,
>>> pages_to_scan is currently a limit on the _present_ pages scanned in
>>> one batch.
>>>       
>> You mean that now when you say: pages_to_scan = 512, it wont count the
>> none present ptes as part of the counter, so if we have 500 not present
>> ptes in the begining and then 512 ptes later, before it used to call
>> cmp_and_merge_page() only for 12 pages while now it will get called on
>> 512 pages?
>>     
>
> If I understand you right, yes, before it would do those 500 absent then
> 512 present in two batches, first 512 (of which only 12 present) then 500;
> whereas now it'll skip the 500 absent without counting them, and handle
> the 512 present in that same one batch.
>
>   
>> If yes, then I liked this change, it is more logical from cpu
>> consumption point of view,
>>     
>
> Yes, although it does spend a little time on the absent ones, it should
> be much less time than it spends comparing or checksumming on present ones.
>   

Yea, compare to the other stuff it is minor...

>   
>> and in addition we have that cond_reched()
>> so I dont see a problem with this.
>>     
>
> Right, that cond_resched() is vital in this case.
>
> By the way, something else I didn't highlight, a significant benefit
> from avoiding get_user_pages(): that was doing a mark_page_accessed()
> on every present pte that it found, interfering with pageout decisions.
>   

That is a great value, i didn't even thought about that...

> Hugh
>

WARNING: multiple messages have this Message-ID (diff)

From: Izik Eidus <ieidus@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>, Chris Wright <chrisw@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: KSM: current madvise rollup
Date: Mon, 13 Jul 2009 21:28:21 +0300	[thread overview]
Message-ID: <4A5B7CC5.4030908@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0907121459150.7417@sister.anvils>

Hugh Dickins wrote:
> On Sun, 12 Jul 2009, Izik Eidus wrote:
>   
>> On Sat, 11 Jul 2009 20:22:11 +0100 (BST)
>> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
>>     
>>> We may want to do that anyway.  It concerned me a lot when I was
>>> first testing (and often saw kernel_pages_allocated greater than
>>> pages_shared - probably because of the original KSM's eagerness to
>>> merge forked pages, though I think there may have been more to it
>>> than that).  But seems much less of an issue now (that ratio is much
>>> healthier), and even less of an issue once KSM pages can be swapped.
>>> So I'm not bothering about it at the moment, but it may make sense.
>>>       
>
> I realized since writing that with the current statistics you really
> cannot tell how big an issue the orphaned (count 1) KSM pages are -
> good sharing of a few will completely hide non-sharing of many.
>
> But I've hacked in more stats (not something I'd care to share yet!),
> and those confirm that for my loads at least, the orphaned KSM pages
> are few compared with the shared ones.
>
>   
>> We could add patch like the below, but I think we should leave it as it
>> is now,
>>     
>
> I agree we should leave it as is for now.  My guess is that we'll
> prefer to leave them around, until approaching max_kernel_pages_alloc,
> pruning them only at that stage (rather as we free swap more aggressively
> when it's 50% full).  There may be benefit in not removing them too soon,
> there may be benefit in holding on to stable pages for longer (holding a
> reference in the stable tree for a while).  Or maybe not, just an idea.
>   

Well, I personaly dont see any real soultion to this problem without 
real swapping support,
So for now I dont mind not to deal with it at all (as long as the admin 
can decide how many unswappable pages he allow)
But, If you have a stronger opnion than me for that case, I really dont 
care to change it.

>   
>> and solve it all (like you have said) with the ksm pages
>> swapping support in next kernel release.
>> (Right now ksm can limit itself with max_kernel_pages_alloc)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index a0fbdb2..ee80861 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1261,8 +1261,13 @@ static void ksm_do_scan(unsigned int scan_npages)
>>  		rmap_item = scan_get_next_rmap_item(&page);
>>  		if (!rmap_item)
>>  			return;
>> -		if (!PageKsm(page) || !in_stable_tree(rmap_item))
>> +		if (!PageKsm(page) || !in_stable_tree(rmap_item)) {
>>  			cmp_and_merge_page(page, rmap_item);
>> +		} else if (page_mapcount(page) == 0) {
>>     
>
> If we did that (but we agree not for now), shouldn't it be
> 			   page_mapcount(page) == 1
> ?  The mapcount 0 ones already got freed by the zap/unmap code.
>   

Yea, I dont know what i thought when i compare it to zero..., 1 is the 
right value here indeed.

>   
>> +			break_cow(rmap_item->mm,
>> +				  rmap_item->address & PAGE_MASK);
>>     
>
> Just a note on that " & PAGE_MASK": it's unnecessary there and
> almost everywhere else.  One of the pleasures of putting flags into
> the bottom bits of the address, in code concerned with faulting, is
> that the faulting address can be anywhere within the page, so we
> don't have to bother to mask off the flags.
>
>   
>> +			remove_rmap_item_from_tree(rmap_item);
>> +		}
>>  		put_page(page);
>>  	}
>>  }
>>
>>     
>>> Oh, something that might be making it higher, that I didn't highlight
>>> (and can revert if you like, it was just more straightforward this
>>> way): with scan_get_next_rmap skipping the non-present ptes,
>>> pages_to_scan is currently a limit on the _present_ pages scanned in
>>> one batch.
>>>       
>> You mean that now when you say: pages_to_scan = 512, it wont count the
>> none present ptes as part of the counter, so if we have 500 not present
>> ptes in the begining and then 512 ptes later, before it used to call
>> cmp_and_merge_page() only for 12 pages while now it will get called on
>> 512 pages?
>>     
>
> If I understand you right, yes, before it would do those 500 absent then
> 512 present in two batches, first 512 (of which only 12 present) then 500;
> whereas now it'll skip the 500 absent without counting them, and handle
> the 512 present in that same one batch.
>
>   
>> If yes, then I liked this change, it is more logical from cpu
>> consumption point of view,
>>     
>
> Yes, although it does spend a little time on the absent ones, it should
> be much less time than it spends comparing or checksumming on present ones.
>   

Yea, compare to the other stuff it is minor...

>   
>> and in addition we have that cond_reched()
>> so I dont see a problem with this.
>>     
>
> Right, that cond_resched() is vital in this case.
>
> By the way, something else I didn't highlight, a significant benefit
> from avoiding get_user_pages(): that was doing a mark_page_accessed()
> on every present pte that it found, interfering with pageout decisions.
>   

That is a great value, i didn't even thought about that...

> Hugh
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-07-13 18:25 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-29 14:14 KSM: current madvise rollup Hugh Dickins
2009-06-29 14:14 ` Hugh Dickins
2009-06-30  9:52 ` Izik Eidus
2009-06-30  9:52   ` Izik Eidus
2009-06-30 15:57   ` Hugh Dickins
2009-06-30 15:57     ` Hugh Dickins
2009-06-30 18:41     ` Izik Eidus
2009-06-30 18:41       ` Izik Eidus
2009-07-01  2:03       ` Hugh Dickins
2009-07-01  2:03         ` Hugh Dickins
2009-07-01  9:50         ` Izik Eidus
2009-07-01  9:50           ` Izik Eidus
2009-07-08 21:07           ` Hugh Dickins
2009-07-08 21:07             ` Hugh Dickins
2009-07-09 15:37             ` Izik Eidus
2009-07-09 15:37               ` Izik Eidus
2009-07-10 13:43               ` Hugh Dickins
2009-07-10 13:43                 ` Hugh Dickins
2009-07-10 22:42             ` Izik Eidus
2009-07-10 22:42               ` Izik Eidus
2009-07-10 22:49               ` Izik Eidus
2009-07-10 22:49                 ` Izik Eidus
2009-07-11 19:22               ` Hugh Dickins
2009-07-11 19:22                 ` Hugh Dickins
2009-07-11 21:22                 ` Izik Eidus
2009-07-11 21:22                   ` Izik Eidus
2009-07-12 14:44                   ` Hugh Dickins
2009-07-12 14:44                     ` Hugh Dickins
2009-07-13 18:28                     ` Izik Eidus [this message]
2009-07-13 18:28                       ` Izik Eidus
2009-07-01 10:33         ` Andrea Arcangeli
2009-07-01 10:33           ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A5B7CC5.4030908@redhat.com \
    --to=ieidus@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisw@redhat.com \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.