public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC] respect the referenced bit of KVM guest pages?
       [not found] ` <4A793B92.9040204@redhat.com>
@ 2009-08-05  8:17   ` Avi Kivity
  2009-08-05 14:33     ` Rik van Riel
  0 siblings, 1 reply; 3+ messages in thread
From: Avi Kivity @ 2009-08-05  8:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Rik van Riel, Dike, Jeffrey G, Yu, Wilfred, Kleen, Andi,
	Andrea Arcangeli, Hugh Dickins, Andrew Morton, Christoph Lameter,
	KOSAKI Motohiro, Mel Gorman, LKML, linux-mm, Andrea Arcangeli,
	KVM list

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On 08/05/2009 10:58 AM, Avi Kivity wrote:
> On 08/05/2009 05:40 AM, Wu Fengguang wrote:
>> Greetings,
>>
>> Jeff Dike found that many KVM pages are being refaulted in 2.6.29:
>>
>> "Lots of pages between discarded due to memory pressure only to be
>> faulted back in soon after. These pages are nearly all stack pages.
>> This is not consistent - sometimes there are relatively few such pages
>> and they are spread out between processes."
>>
>> The refaults can be drastically reduced by the following patch, which
>> respects the referenced bit of all anonymous pages (including the KVM
>> pages).
>>
>> However it risks reintroducing the problem addressed by commit 7e9cd4842
>> (fix reclaim scalability problem by ignoring the referenced bit,
>> mainly the pte young bit). I wonder if there are better solutions?
>
> How do you distinguish between kvm pages and non-kvm anonymous pages?  
> More importantly, why should you?
>
> Jeff, do you see the refaults on Nehalem systems?  If so, that's 
> likely due to the lack of an accessed bit on EPT pagetables.  It would 
> be interesting to compare with Barcelona  (which does).
>
> If that's indeed the case, we can have the EPT ageing mechanism give 
> pages a bit more time around by using an available bit in the EPT PTEs 
> to return accessed on the first pass and not-accessed on the second.
>

The attached patch implements this.

-- 
error compiling committee.c: too many arguments to function


[-- Attachment #2: ept-emulate-accessed-bit.patch --]
[-- Type: text/x-patch, Size: 2115 bytes --]

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7b53614..310938a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -195,6 +195,7 @@ static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static int __read_mostly shadow_accessed_shift;
 
 static inline u64 rsvd_bits(int s, int e)
 {
@@ -219,6 +220,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
+	shadow_accessed_shift
+		= find_first_bit((void *)&shadow_accessed_mask, 64);
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
@@ -817,11 +820,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	while (spte) {
 		int _young;
 		u64 _spte = *spte;
-		BUG_ON(!(_spte & PT_PRESENT_MASK));
-		_young = _spte & PT_ACCESSED_MASK;
+		BUG_ON(!(_spte & shadow_accessed_mask));
+		_young = _spte & shadow_accessed_mask;
 		if (_young) {
 			young = 1;
-			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+			clear_bit(shadow_accessed_shift, (unsigned long *)spte);
 		}
 		spte = rmap_next(kvm, rmapp, spte);
 	}
@@ -2572,7 +2575,7 @@ static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 	    && shadow_accessed_mask
 	    && !(*spte & shadow_accessed_mask)
 	    && is_shadow_present_pte(*spte))
-		set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+		set_bit(shadow_accessed_shift, (unsigned long *)spte);
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0ba706e..bc99367 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4029,7 +4029,7 @@ static int __init vmx_init(void)
 		bypass_guest_pf = 0;
 		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
 			VMX_EPT_WRITABLE_MASK);
-		kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
+		kvm_mmu_set_mask_ptes(0ull, 1ull << 63, 0ull, 0ull,
 				VMX_EPT_EXECUTABLE_MASK);
 		kvm_enable_tdp();
 	} else

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC] respect the referenced bit of KVM guest pages?
  2009-08-05  8:17   ` [RFC] respect the referenced bit of KVM guest pages? Avi Kivity
@ 2009-08-05 14:33     ` Rik van Riel
  2009-08-05 15:37       ` Avi Kivity
  0 siblings, 1 reply; 3+ messages in thread
From: Rik van Riel @ 2009-08-05 14:33 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Wu Fengguang, Dike, Jeffrey G, Yu, Wilfred, Kleen, Andi,
	Andrea Arcangeli, Hugh Dickins, Andrew Morton, Christoph Lameter,
	KOSAKI Motohiro, Mel Gorman, LKML, linux-mm, KVM list

Avi Kivity wrote:

> The attached patch implements this.

The attached page requires each page to go around twice
before it is evicted, but they will still get evicted in
the order in which they were made present.

FIFO page replacement was shown to be a bad idea in the
1960's and it is still a terrible idea today.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] respect the referenced bit of KVM guest pages?
  2009-08-05 14:33     ` Rik van Riel
@ 2009-08-05 15:37       ` Avi Kivity
  0 siblings, 0 replies; 3+ messages in thread
From: Avi Kivity @ 2009-08-05 15:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Wu Fengguang, Dike, Jeffrey G, Yu, Wilfred, Kleen, Andi,
	Andrea Arcangeli, Hugh Dickins, Andrew Morton, Christoph Lameter,
	KOSAKI Motohiro, Mel Gorman, LKML, linux-mm, KVM list

On 08/05/2009 05:33 PM, Rik van Riel wrote:
> Avi Kivity wrote:
>
>> The attached patch implements this.
>
> The attached page requires each page to go around twice
> before it is evicted, but they will still get evicted in
> the order in which they were made present.
>
> FIFO page replacement was shown to be a bad idea in the
> 1960's and it is still a terrible idea today.
>

Which is why we have accessed bits in page tables... but emulating the 
accessed bit via RWX (note no present bit in EPT) is better than 
ignoring it.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-08-05 15:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20090805024058.GA8886@localhost>
     [not found] ` <4A793B92.9040204@redhat.com>
2009-08-05  8:17   ` [RFC] respect the referenced bit of KVM guest pages? Avi Kivity
2009-08-05 14:33     ` Rik van Riel
2009-08-05 15:37       ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox