From: Avi Kivity <avi@redhat.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>,
"Dike, Jeffrey G" <jeffrey.g.dike@intel.com>,
"Yu, Wilfred" <wilfred.yu@intel.com>,
"Kleen, Andi" <andi.kleen@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mel@csn.ul.ie>, LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>Andrea Arcangeli
<aarcange@redhat.com>, KVM list <kvm@vger.kernel.org>
Subject: Re: [RFC] respect the referenced bit of KVM guest pages?
Date: Wed, 05 Aug 2009 11:17:12 +0300 [thread overview]
Message-ID: <4A794008.6030204@redhat.com> (raw)
In-Reply-To: <4A793B92.9040204@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]
On 08/05/2009 10:58 AM, Avi Kivity wrote:
> On 08/05/2009 05:40 AM, Wu Fengguang wrote:
>> Greetings,
>>
>> Jeff Dike found that many KVM pages are being refaulted in 2.6.29:
>>
>> "Lots of pages between discarded due to memory pressure only to be
>> faulted back in soon after. These pages are nearly all stack pages.
>> This is not consistent - sometimes there are relatively few such pages
>> and they are spread out between processes."
>>
>> The refaults can be drastically reduced by the following patch, which
>> respects the referenced bit of all anonymous pages (including the KVM
>> pages).
>>
>> However it risks reintroducing the problem addressed by commit 7e9cd4842
>> (fix reclaim scalability problem by ignoring the referenced bit,
>> mainly the pte young bit). I wonder if there are better solutions?
>
> How do you distinguish between kvm pages and non-kvm anonymous pages?
> More importantly, why should you?
>
> Jeff, do you see the refaults on Nehalem systems? If so, that's
> likely due to the lack of an accessed bit on EPT pagetables. It would
> be interesting to compare with Barcelona (which does).
>
> If that's indeed the case, we can have the EPT ageing mechanism give
> pages a bit more time around by using an available bit in the EPT PTEs
> to return accessed on the first pass and not-accessed on the second.
>
The attached patch implements this.
--
error compiling committee.c: too many arguments to function
[-- Attachment #2: ept-emulate-accessed-bit.patch --]
[-- Type: text/x-patch, Size: 2115 bytes --]
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7b53614..310938a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -195,6 +195,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
+static int __read_mostly shadow_accessed_shift;
static inline u64 rsvd_bits(int s, int e)
{
@@ -219,6 +220,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
{
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
+ shadow_accessed_shift
+ = find_first_bit((void *)&shadow_accessed_mask, 64);
shadow_dirty_mask = dirty_mask;
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
@@ -817,11 +820,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
while (spte) {
int _young;
u64 _spte = *spte;
- BUG_ON(!(_spte & PT_PRESENT_MASK));
- _young = _spte & PT_ACCESSED_MASK;
+ BUG_ON(!(_spte & shadow_accessed_mask));
+ _young = _spte & shadow_accessed_mask;
if (_young) {
young = 1;
- clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+ clear_bit(shadow_accessed_shift, (unsigned long *)spte);
}
spte = rmap_next(kvm, rmapp, spte);
}
@@ -2572,7 +2575,7 @@ static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
&& shadow_accessed_mask
&& !(*spte & shadow_accessed_mask)
&& is_shadow_present_pte(*spte))
- set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+ set_bit(shadow_accessed_shift, (unsigned long *)spte);
}
void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0ba706e..bc99367 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4029,7 +4029,7 @@ static int __init vmx_init(void)
bypass_guest_pf = 0;
kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
VMX_EPT_WRITABLE_MASK);
- kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
+ kvm_mmu_set_mask_ptes(0ull, 1ull << 63, 0ull, 0ull,
VMX_EPT_EXECUTABLE_MASK);
kvm_enable_tdp();
} else
next prev parent reply other threads:[~2009-08-05 8:12 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-05 2:40 [RFC] respect the referenced bit of KVM guest pages? Wu Fengguang
2009-08-05 4:15 ` KOSAKI Motohiro
2009-08-05 4:41 ` Wu Fengguang
2009-08-05 7:58 ` Avi Kivity
2009-08-05 8:17 ` Avi Kivity [this message]
2009-08-05 14:33 ` Rik van Riel
2009-08-05 15:37 ` Avi Kivity
2009-08-05 14:15 ` Rik van Riel
2009-08-05 15:12 ` Avi Kivity
2009-08-05 15:15 ` Rik van Riel
2009-08-05 15:25 ` Avi Kivity
2009-08-05 16:35 ` Andrea Arcangeli
2009-08-05 16:31 ` Andrea Arcangeli
2009-08-05 17:25 ` Rik van Riel
2009-08-05 15:45 ` Dike, Jeffrey G
2009-08-05 16:05 ` Andrea Arcangeli
2009-08-05 16:12 ` Dike, Jeffrey G
2009-08-05 16:19 ` Andrea Arcangeli
2009-08-05 15:58 ` Andrea Arcangeli
2009-08-05 17:20 ` Rik van Riel
2009-08-05 17:42 ` Rik van Riel
2009-08-06 10:15 ` Andrea Arcangeli
2009-08-06 10:08 ` Andrea Arcangeli
2009-08-06 10:18 ` Avi Kivity
2009-08-06 10:20 ` Andrea Arcangeli
2009-08-06 10:59 ` Wu Fengguang
2009-08-06 11:44 ` Avi Kivity
2009-08-06 13:06 ` Wu Fengguang
2009-08-06 13:16 ` Rik van Riel
2009-08-16 3:28 ` Wu Fengguang
2009-08-16 3:56 ` Rik van Riel
2009-08-16 4:43 ` Balbir Singh
2009-08-16 4:55 ` Wu Fengguang
2009-08-16 5:59 ` Balbir Singh
2009-08-17 19:47 ` Dike, Jeffrey G
2009-08-21 18:24 ` Balbir Singh
2009-08-31 19:43 ` Dike, Jeffrey G
2009-08-31 19:52 ` Rik van Riel
2009-08-31 20:06 ` Dike, Jeffrey G
2009-08-31 20:09 ` Rik van Riel
2009-08-31 20:11 ` Dike, Jeffrey G
2009-08-31 20:42 ` Balbir Singh
2009-08-06 13:46 ` Avi Kivity
2009-08-06 21:09 ` Jeff Dike
2009-08-16 3:18 ` Wu Fengguang
2009-08-16 3:53 ` Rik van Riel
2009-08-16 5:15 ` Wu Fengguang
2009-08-16 11:29 ` Wu Fengguang
2009-08-17 14:33 ` Minchan Kim
2009-08-18 2:34 ` Wu Fengguang
2009-08-18 4:17 ` Minchan Kim
2009-08-18 9:31 ` Wu Fengguang
2009-08-18 9:52 ` Minchan Kim
2009-08-18 10:00 ` Wu Fengguang
2009-08-18 11:00 ` Minchan Kim
2009-08-18 11:11 ` Wu Fengguang
2009-08-18 14:03 ` Minchan Kim
2009-08-18 16:27 ` KOSAKI Motohiro
2009-08-18 15:57 ` KOSAKI Motohiro
2009-08-19 12:01 ` Wu Fengguang
2009-08-19 12:05 ` KOSAKI Motohiro
2009-08-19 12:10 ` Wu Fengguang
2009-08-19 12:25 ` Minchan Kim
2009-08-19 13:19 ` KOSAKI Motohiro
2009-08-19 13:28 ` Minchan Kim
2009-08-21 11:17 ` KOSAKI Motohiro
2009-08-19 13:24 ` Wu Fengguang
2009-08-19 13:38 ` Minchan Kim
2009-08-19 14:00 ` Wu Fengguang
2009-08-06 13:13 ` Rik van Riel
2009-08-06 13:49 ` Avi Kivity
2009-08-07 3:11 ` KOSAKI Motohiro
2009-08-07 7:54 ` Balbir Singh
2009-08-07 8:24 ` KAMEZAWA Hiroyuki
2009-08-06 13:11 ` Rik van Riel
2009-08-06 13:08 ` Rik van Riel
2009-08-07 3:17 ` KOSAKI Motohiro
2009-08-12 7:48 ` Wu Fengguang
2009-08-12 14:31 ` Rik van Riel
2009-08-13 1:03 ` Wu Fengguang
2009-08-13 15:46 ` Rik van Riel
2009-08-13 16:12 ` Avi Kivity
2009-08-13 16:26 ` Rik van Riel
2009-08-13 19:12 ` Avi Kivity
2009-08-13 21:16 ` Johannes Weiner
2009-08-14 7:16 ` Avi Kivity
2009-08-14 9:10 ` Johannes Weiner
2009-08-14 9:51 ` Wu Fengguang
2009-08-14 13:19 ` Rik van Riel
2009-08-15 5:45 ` Wu Fengguang
2009-08-16 5:09 ` Balbir Singh
2009-08-16 5:41 ` Wu Fengguang
2009-08-16 5:50 ` Wu Fengguang
2009-08-18 15:57 ` KOSAKI Motohiro
2009-08-17 18:04 ` Dike, Jeffrey G
2009-08-18 2:26 ` Wu Fengguang
2009-09-02 19:30 ` Dike, Jeffrey G
2009-09-03 2:04 ` Wu Fengguang
2009-09-04 20:06 ` Dike, Jeffrey G
2009-09-04 20:57 ` Rik van Riel
2009-08-18 15:57 ` KOSAKI Motohiro
2009-08-19 12:08 ` Wu Fengguang
2009-08-19 13:40 ` [RFC] memcg: move definitions to .h and inline some functions Wu Fengguang
2009-08-19 14:18 ` KAMEZAWA Hiroyuki
2009-08-19 14:27 ` Balbir Singh
2009-08-20 1:34 ` Wu Fengguang
2009-08-14 21:42 ` [RFC] respect the referenced bit of KVM guest pages? Dike, Jeffrey G
2009-08-14 22:37 ` Rik van Riel
2009-08-15 5:32 ` Wu Fengguang
2009-09-13 16:23 ` KOSAKI Motohiro
2009-08-05 17:53 ` Rik van Riel
2009-08-05 19:00 ` Dike, Jeffrey G
2009-08-05 19:07 ` Rik van Riel
2009-08-05 19:18 ` Dike, Jeffrey G
2009-08-06 9:22 ` Avi Kivity
2009-08-06 9:25 ` Wu Fengguang
2009-08-06 9:35 ` Avi Kivity
2009-08-06 9:35 ` Wu Fengguang
2009-08-06 9:59 ` Avi Kivity
2009-08-06 9:59 ` Wu Fengguang
2009-08-06 10:14 ` Avi Kivity
2009-08-07 1:25 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A794008.6030204@redhat.com \
--to=avi@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi.kleen@intel.com \
--cc=cl@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=hugh.dickins@tiscali.co.uk \
--cc=jeffrey.g.dike@intel.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=riel@redhat.com \
--cc=wilfred.yu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).