public inbox for linux-kernel@vger.kernel.org
* [PATCH] emulate accessed bit for EPT
@ 2010-02-03 21:11 Rik van Riel
  2010-02-04  4:12 ` Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rik van Riel @ 2010-02-03 21:11 UTC (permalink / raw)
  To: jdike; +Cc: kvm, linux-kernel, avi, aarcange, mtosatti

Currently KVM pretends that pages with EPT mappings never got
accessed.  This has some side effects in the VM, like swapping
out actively used guest pages and needlessly breaking up actively
used hugepages.

We can avoid those very costly side effects by emulating the
accessed bit for EPT PTEs, which should only be slightly costly
because pages pass through page_referenced infrequently.

TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().

This seems to help prevent KVM guests from being swapped out when
they should not on my system.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
Jeff, does this patch fix the issue you saw a few months ago, with
a 256MB KVM guest in a cgroup limited to 128MB memory?

 arch/x86/kvm/mmu.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 89a49fb..6101615 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	u64 *spte;
 	int young = 0;
 
-	/* always return old for EPT */
+	/*
+	 * Emulate the accessed bit for EPT, by checking if this page has
+	 * an EPT mapping, and clearing it if it does. On the next access,
+	 * a new EPT mapping will be established.
+	 * This has some overhead, but not as much as the cost of swapping
+	 * out actively used pages or breaking up actively used hugepages.
+	 */
 	if (!shadow_accessed_mask)
-		return 0;
+		return kvm_unmap_rmapp(kvm, rmapp, data);
 
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-03 21:11 [PATCH] emulate accessed bit for EPT Rik van Riel
@ 2010-02-04  4:12 ` Balbir Singh
  2010-02-04 13:40   ` Rik van Riel
  2010-02-04 16:17 ` Jeff Dike
  2010-02-08 10:27 ` Avi Kivity
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2010-02-04  4:12 UTC (permalink / raw)
  To: Rik van Riel; +Cc: jdike, kvm, linux-kernel, avi, aarcange, mtosatti

* Rik van Riel <riel@redhat.com> [2010-02-03 16:11:03]:

> Currently KVM pretends that pages with EPT mappings never got
> accessed.  This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
> 
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
> 
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
> 
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?
> 
>  arch/x86/kvm/mmu.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 89a49fb..6101615 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	u64 *spte;
>  	int young = 0;
> 
> -	/* always return old for EPT */
> +	/*
> +	 * Emulate the accessed bit for EPT, by checking if this page has
> +	 * an EPT mapping, and clearing it if it does. On the next access,
> +	 * a new EPT mapping will be established.
> +	 * This has some overhead, but not as much as the cost of swapping
> +	 * out actively used pages or breaking up actively used hugepages.
> +	 */
>  	if (!shadow_accessed_mask)
> -		return 0;
> +		return kvm_unmap_rmapp(kvm, rmapp, data);
>

Quite a clever implementation, one side effect is that one would see a
larger number of minor faults with EPT enabled and an increase in
allocation/frees of rmap entries, but that can be easily explained.

-- 
	Balbir


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04  4:12 ` Balbir Singh
@ 2010-02-04 13:40   ` Rik van Riel
  2010-02-04 15:30     ` Balbir Singh
  2010-02-04 17:47     ` Andrea Arcangeli
  0 siblings, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2010-02-04 13:40 UTC (permalink / raw)
  To: balbir; +Cc: jdike, kvm, linux-kernel, avi, aarcange, mtosatti

On 02/03/2010 11:12 PM, Balbir Singh wrote:
> * Rik van Riel<riel@redhat.com>  [2010-02-03 16:11:03]:
>
>> Currently KVM pretends that pages with EPT mappings never got
>> accessed.  This has some side effects in the VM, like swapping
>> out actively used guest pages and needlessly breaking up actively
>> used hugepages.
>>
>> We can avoid those very costly side effects by emulating the
>> accessed bit for EPT PTEs, which should only be slightly costly
>> because pages pass through page_referenced infrequently.

> Quite a clever implementation, one side effect is that one would see a
> larger number of minor faults with EPT enabled and an increase in
> allocation/frees of rmap entries, but that can be easily explained.

I suspect it won't be very many. I have been monitoring
/proc/meminfo on my system while testing this patch, and
it is quite typical that the size of the inactive anon
list does not change for minutes at a time.

In other words, no pages are moved onto or off of the
inactive anon list for several minutes. That corresponds
to a very small number of minor faults introduced by my
patch.

Of course, when the system is swapping, we will have more
minor faults.  However, minor faults should be less of a
performance issue than major faults :)

-- 
All rights reversed.


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04 13:40   ` Rik van Riel
@ 2010-02-04 15:30     ` Balbir Singh
  2010-02-04 15:41       ` Rik van Riel
  2010-02-04 17:47     ` Andrea Arcangeli
  1 sibling, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2010-02-04 15:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: jdike, kvm, linux-kernel, avi, aarcange, mtosatti

* Rik van Riel <riel@redhat.com> [2010-02-04 08:40:43]:

> On 02/03/2010 11:12 PM, Balbir Singh wrote:
> >* Rik van Riel<riel@redhat.com>  [2010-02-03 16:11:03]:
> >
> >>Currently KVM pretends that pages with EPT mappings never got
> >>accessed.  This has some side effects in the VM, like swapping
> >>out actively used guest pages and needlessly breaking up actively
> >>used hugepages.
> >>
> >>We can avoid those very costly side effects by emulating the
> >>accessed bit for EPT PTEs, which should only be slightly costly
> >>because pages pass through page_referenced infrequently.
> 
> >Quite a clever implementation, one side effect is that one would see a
> >larger number of minor faults with EPT enabled and an increase in
> >allocation/frees of rmap entries, but that can be easily explained.
> 
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
> 
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.
> 
> Of course, when the system is swapping, we will have more
> minor faults.  However, minor faults should be less of a
> performance issue than major faults :)
>

I do agree with you. 

-- 
	Balbir


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04 15:30     ` Balbir Singh
@ 2010-02-04 15:41       ` Rik van Riel
  2010-02-04 15:52         ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2010-02-04 15:41 UTC (permalink / raw)
  To: balbir; +Cc: jdike, kvm, linux-kernel, avi, aarcange, mtosatti

Balbir Singh wrote:
> * Rik van Riel <riel@redhat.com> [2010-02-04 08:40:43]:
> 
>> On 02/03/2010 11:12 PM, Balbir Singh wrote:
>>> * Rik van Riel<riel@redhat.com>  [2010-02-03 16:11:03]:
>>>
>>>> Currently KVM pretends that pages with EPT mappings never got
>>>> accessed.  This has some side effects in the VM, like swapping
>>>> out actively used guest pages and needlessly breaking up actively
>>>> used hugepages.
>>>>
>>>> We can avoid those very costly side effects by emulating the
>>>> accessed bit for EPT PTEs, which should only be slightly costly
>>>> because pages pass through page_referenced infrequently.
>>> Quite a clever implementation, one side effect is that one would see a
>>> larger number of minor faults with EPT enabled and an increase in
>>> allocation/frees of rmap entries, but that can be easily explained.
>> I suspect it won't be very many. I have been monitoring
>> /proc/meminfo on my system while testing this patch, and
>> it is quite typical that the size of the inactive anon
>> list does not change for minutes at a time.
>>
>> In other words, no pages are moved onto or off of the
>> inactive anon list for several minutes. That corresponds
>> to a very small number of minor faults introduced by my
>> patch.
>>
>> Of course, when the system is swapping, we will have more
>> minor faults.  However, minor faults should be less of a
>> performance issue than major faults :)
>>
> 
> I do agree with you. 

After 20 hours of uptime, it appears that this patch has
resolved the "KVM guests get swapped while buffer and page
cache stay in memory" problem my home system was experiencing.


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04 15:41       ` Rik van Riel
@ 2010-02-04 15:52         ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2010-02-04 15:52 UTC (permalink / raw)
  To: Rik van Riel; +Cc: jdike, kvm, linux-kernel, avi, aarcange, mtosatti

* Rik van Riel <riel@redhat.com> [2010-02-04 10:41:14]:

> Balbir Singh wrote:
> >* Rik van Riel <riel@redhat.com> [2010-02-04 08:40:43]:
> >
> >>On 02/03/2010 11:12 PM, Balbir Singh wrote:
> >>>* Rik van Riel<riel@redhat.com>  [2010-02-03 16:11:03]:
> >>>
> >>>>Currently KVM pretends that pages with EPT mappings never got
> >>>>accessed.  This has some side effects in the VM, like swapping
> >>>>out actively used guest pages and needlessly breaking up actively
> >>>>used hugepages.
> >>>>
> >>>>We can avoid those very costly side effects by emulating the
> >>>>accessed bit for EPT PTEs, which should only be slightly costly
> >>>>because pages pass through page_referenced infrequently.
> >>>Quite a clever implementation, one side effect is that one would see a
> >>>larger number of minor faults with EPT enabled and an increase in
> >>>allocation/frees of rmap entries, but that can be easily explained.
> >>I suspect it won't be very many. I have been monitoring
> >>/proc/meminfo on my system while testing this patch, and
> >>it is quite typical that the size of the inactive anon
> >>list does not change for minutes at a time.
> >>
> >>In other words, no pages are moved onto or off of the
> >>inactive anon list for several minutes. That corresponds
> >>to a very small number of minor faults introduced by my
> >>patch.
> >>
> >>Of course, when the system is swapping, we will have more
> >>minor faults.  However, minor faults should be less of a
> >>performance issue than major faults :)
> >>
> >
> >I do agree with you.
> 
> After 20 hours of uptime, it appears that this patch has
> resolved the "KVM guests get swapped while buffer and page
> cache stay in memory" problem my home system was experiencing.

Is this with cgroups enabled as defined by the setup Jeff had?

-- 
	Balbir


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-03 21:11 [PATCH] emulate accessed bit for EPT Rik van Riel
  2010-02-04  4:12 ` Balbir Singh
@ 2010-02-04 16:17 ` Jeff Dike
  2010-02-08 10:27 ` Avi Kivity
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff Dike @ 2010-02-04 16:17 UTC (permalink / raw)
  To: Rik van Riel; +Cc: kvm, linux-kernel, avi, aarcange, mtosatti

On Wed, Feb 03, 2010 at 04:11:03PM -0500, Rik van Riel wrote:
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?

Hum, let me dust off that workload and give it a shot...

				Jeff


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04 13:40   ` Rik van Riel
  2010-02-04 15:30     ` Balbir Singh
@ 2010-02-04 17:47     ` Andrea Arcangeli
  2010-02-05 17:34       ` Marcelo Tosatti
  1 sibling, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2010-02-04 17:47 UTC (permalink / raw)
  To: Rik van Riel; +Cc: balbir, jdike, kvm, linux-kernel, avi, mtosatti

On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
> 
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.

When there's light VM pressure, ideally there should be zero overhead
caused by the patch. When there is VM pressure this will avoid some
unnecessary I/O which should outweigh the minor faults. It should be
a good default behavior.


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-04 17:47     ` Andrea Arcangeli
@ 2010-02-05 17:34       ` Marcelo Tosatti
  2010-02-05 18:14         ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2010-02-05 17:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, balbir, jdike, kvm, linux-kernel, avi

On Thu, Feb 04, 2010 at 06:47:15PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> > I suspect it won't be very many. I have been monitoring
> > /proc/meminfo on my system while testing this patch, and
> > it is quite typical that the size of the inactive anon
> > list does not change for minutes at a time.
> > 
> > In other words, no pages are moved onto or off of the
> > inactive anon list for several minutes. That corresponds
> > to a very small number of minor faults introduced by my
> > patch.
> 
> When there's light VM pressure, ideally there should be zero overhead
> caused by the patch. When there is VM pressure this will avoid some
> > unnecessary I/O which should outweigh the minor faults. It should be
> a good default behavior.

Agree.

But perhaps a module parameter to turn accessed bit emulation off might
be handy in the future?



* Re: [PATCH] emulate accessed bit for EPT
  2010-02-05 17:34       ` Marcelo Tosatti
@ 2010-02-05 18:14         ` Andrea Arcangeli
  2010-02-07 19:21           ` Marcelo Tosatti
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2010-02-05 18:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Rik van Riel, balbir, jdike, kvm, linux-kernel, avi

On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> But perhaps a module parameter to turn accessed bit emulation off might
> be handy in the future?

Maybe, but somebody should show that this can overall become a
downside, which I doubt... I think if it does, the VM is to blame for
calling page_referenced when there is no point to do so just yet.


* Re: [PATCH] emulate accessed bit for EPT
  2010-02-05 18:14         ` Andrea Arcangeli
@ 2010-02-07 19:21           ` Marcelo Tosatti
  0 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2010-02-07 19:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, balbir, jdike, kvm, linux-kernel, avi

On Fri, Feb 05, 2010 at 07:14:13PM +0100, Andrea Arcangeli wrote:
> On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> > But perhaps a module parameter to turn accessed bit emulation off might
> > be handy in the future?
> 
> Maybe, but somebody should show that this can overall become a
> downside, which I doubt... I think if it does, the VM is to blame for
> calling page_referenced when there is no point to do so just yet.

Agreed. ACK.



* Re: [PATCH] emulate accessed bit for EPT
  2010-02-03 21:11 [PATCH] emulate accessed bit for EPT Rik van Riel
  2010-02-04  4:12 ` Balbir Singh
  2010-02-04 16:17 ` Jeff Dike
@ 2010-02-08 10:27 ` Avi Kivity
  2 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2010-02-08 10:27 UTC (permalink / raw)
  To: Rik van Riel; +Cc: jdike, kvm, linux-kernel, aarcange, mtosatti

On 02/03/2010 11:11 PM, Rik van Riel wrote:
> Currently KVM pretends that pages with EPT mappings never got
> accessed.  This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
>
>    

Applied, thanks.

>
> -	/* always return old for EPT */
> +	/*
> +	 * Emulate the accessed bit for EPT, by checking if this page has
> +	 * an EPT mapping, and clearing it if it does. On the next access,
> +	 * a new EPT mapping will be established.
> +	 * This has some overhead, but not as much as the cost of swapping
> +	 * out actively used pages or breaking up actively used hugepages.
> +	 */
>   	if (!shadow_accessed_mask)
> -		return 0;
> +		return kvm_unmap_rmapp(kvm, rmapp, data);
>    

This could be optimized by using a software-available bit for 'present' 
and the rwx bits for young, that is:

   (present, rwx) -> the page is present and recently accessed, will not 
cause EPT violation
   (present, !rwx) -> page is present but old, will cause EPT violation 
but not rmap games and get_user_pages_fast().

However that's best done later if ever.

-- 
error compiling committee.c: too many arguments to function

