xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org,
	len.brown@intel.com, tglx@linutronix.de, jeremy@goop.org,
	hpa@zytor.com, bp@alien8.de, tj@kernel.org, trenn@suse.de
Cc: mingo@redhat.com, xen-devel@lists.xensource.com, stable@kernel.org
Subject: Re: [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
Date: Fri, 2 Dec 2011 18:31:22 -0500	[thread overview]
Message-ID: <20111202233122.GA12556@phenom.dumpdata.com> (raw)
In-Reply-To: <1320786914-10541-3-git-send-email-konrad.wilk@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 2139 bytes --]

> The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
> code with newly introduced pte_attrs which can go through the pvops interface
> to get the "emulated" value instead of the raw. Naturally if CONFIG_PARAVIRT is
> not set, it would end calling native_pte_val.
> 
> The other way to fix this is by wrapping pte_flags and go through the pvops
> interface and it really is the Right Thing to do.  The problem is, that past
> experience with mprotect stuff demonstrates that it be really expensive in inner
> loops, and pte_flags() is used in some very perf-critical areas.

I did not get to verify the mprotect stuff as I need to chase down the details of it,
but I did run some benchmarks using kernbench on three different boxes:

 AMD A8-3850 (8GB) - tst005
 Intel i3-2100 (8GB) - tst007
 Nehelem EX (32logical cpus) (32GB) - tst010

I've put all the kernebench results in https://www.dumpdata.com/results/baseline_pte_flags_pte_attrs/
(and the chart for the AMD is attached).

The boxes have a fresh install of F16, with a 3.2-rc3 variant kernel using the
.config that F16 came with. I just hit Enter when oldconfig asked me to choose.

The baseline is virgin v3.2-rc3. The pte_attrs is the patch that this email is
replaying too (on top of v3.2-rc3). The pte_flags are two patches that wrap pte_flags
in paravirt and use alternative_asm to patch the code (on top of v3.2-rc3).

The patches are in the URL mentioned or in my git branch as
devel/pte_attrs.v1 ( git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git).
I am also attaching them in this email.

The summary is that I could only get the numbers to show some difference when
the maximum load was run - and _only_ on the AMD machine. The small SandyBridge
and the big SandyBridge had no trouble with. The AMD machine the difference was
13% worst if pte_flags (so alternative_asm) was used instead of pte_attrs.

The way I did these tests is to bootup with 'init=/bin/bash', remount / as rw, activate
swap disk and run kernbench on the v3.2-rc3 linux tree. Then unplug the machine for a tea
break and then repeat the cycle with a different kernel.

[-- Attachment #2: AMD-A8-3850.png --]
[-- Type: image/png, Size: 9629 bytes --]

[-- Attachment #3: 0001-x86-paravirt-xen-Introduce-pte_flags.patch --]
[-- Type: text/plain, Size: 5164 bytes --]

>From 6f30ff305df4d36160641ad7232c39f58afc5c4a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon, 28 Nov 2011 17:33:47 -0500
Subject: x86/paravirt/xen: Introduce pte_flags.

We are providing a version of pte_flags where the processing
under Xen goes couple of extra steps (if pat is enabled).

The baremetal version looks quite identical to the pte_val
one and the idea is to piggyback on the DEF_NATIVE/PTE_IDENT
mechanism that pte_val uses for baremetal.

Specifically we move the logical processing of "& PTE_FLAGS_MASK"
out of the functions to minimize the amount of operations
the paravirt operations have to do. This benefits us greatly
as this way we don't have do that logical operations in the function:
(native_pte_flags):

	pushq	%rbp
	movq	%rdi, %rax
	movabsq	$-70368744173569, %rdx
	andq	%rdx, %rax
	movq	%rsp, %rbp
	popq	%rbp
	ret

but instead the paravirt function ends up being:

	pushq	%rbp
	movq	%rdi, %rax
	movq	%rsp, %rbp
	popq	%rbp
	ret

and the logical "and" operation is done outside the call.

This means that the only real operation that native_pte_flags
is "movq %rdi, %rax" we can take advantage of the PTE_IDENT
functionality introduced in patch:
"x86/paravirt/xen: Optimize pte_flags by marking it as paravirt_ident_[32|64] type."

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/paravirt.h       |    5 +++++
 arch/x86/include/asm/paravirt_types.h |    1 +
 arch/x86/include/asm/pgtable.h        |    1 +
 arch/x86/include/asm/pgtable_types.h  |    4 ++--
 arch/x86/kernel/paravirt.c            |    1 +
 arch/x86/xen/mmu.c                    |   13 +++++++++++++
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a7d2db9..6533409 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -462,6 +462,11 @@ static inline void pmd_update_defer(struct mm_struct *mm, unsigned long addr,
 	PVOP_VCALL3(pv_mmu_ops.pmd_update_defer, mm, addr, pmdp);
 }
 
+static inline pteval_t pte_flags(pte_t pte)
+{
+	return pv_mmu_ops.pte_flags(pte) & PTE_FLAGS_MASK;
+}
+
 static inline pte_t __pte(pteval_t val)
 {
 	pteval_t ret;
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8e8b9a4..86bce9d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -288,6 +288,7 @@ struct pv_mmu_ops {
 	void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
 					pte_t *ptep, pte_t pte);
 
+	pteval_t (*pte_flags)(pte_t pte);
 	struct paravirt_callee_save pte_val;
 	struct paravirt_callee_save make_pte;
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 18601c8..eebf625 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -76,6 +76,7 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
 #define __pmd(x)	native_make_pmd(x)
 #endif
 
+#define pte_flags(x)	(native_pte_val(x) & PTE_FLAGS_MASK)
 #define pte_val(x)	native_pte_val(x)
 #define __pte(x)	native_make_pte(x)
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 013286a..4c957e8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -270,9 +270,9 @@ static inline pteval_t native_pte_val(pte_t pte)
 	return pte.pte;
 }
 
-static inline pteval_t pte_flags(pte_t pte)
+static inline pteval_t native_pte_flags(pte_t pte)
 {
-	return native_pte_val(pte) & PTE_FLAGS_MASK;
+	return pte.pte;
 }
 
 #define pgprot_val(x)	((x).pgprot)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index d90272e..4780367 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -462,6 +462,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 #endif
 #endif /* PAGETABLE_LEVELS >= 3 */
 
+	.pte_flags = native_pte_flags,
 	.pte_val = PTE_IDENT,
 	.pgd_val = PTE_IDENT,
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 87f6673..5b79048 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -412,6 +412,16 @@ static pteval_t iomap_pte(pteval_t val)
 	return val;
 }
 
+static pteval_t xen_pte_flags(pte_t pte)
+{
+	pteval_t pteval = pte.pte;
+
+	/* If this is a WC pte, convert back from Xen WC to Linux WC */
+	if ((pteval & (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)) == _PAGE_PAT)
+		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
+	return pteval;
+}
+
 static pteval_t xen_pte_val(pte_t pte)
 {
 	pteval_t pteval = pte.pte;
@@ -1975,6 +1985,8 @@ static void __init xen_post_allocator_init(void)
 	pv_mmu_ops.release_pud = xen_release_pud;
 #endif
 
+	if (!pat_enabled)
+		pv_mmu_ops.pte_flags = native_pte_flags;
 #ifdef CONFIG_X86_64
 	SetPagePinned(virt_to_page(level3_user_vsyscall));
 #endif
@@ -2023,6 +2035,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.ptep_modify_prot_start = __ptep_modify_prot_start,
 	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
 
+	.pte_flags = xen_pte_flags,
 	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
 	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
 
-- 
1.7.7.3


[-- Attachment #4: 0002-x86-paravirt-xen-Optimize-pte_flags-by-marking-it-as.patch --]
[-- Type: text/plain, Size: 4345 bytes --]

>From 7e218b96dc908362afb501755f3f4b8ec530b700 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon, 28 Nov 2011 17:58:38 -0500
Subject: x86/paravirt/xen: Optimize pte_flags by marking it as
 paravirt_ident_[32|64] type.

Which means that we have to use:
 __PV_IS_CALLEE_SAVE(_paravirt_ident_64)
which requires that the pte_flags on the 'pv_mmu_ops' structure
be defined as struct paravirt_callee_save. We end up using the
_paravirt_ident_64 function for both baremetal and for Xen guests
that have PAT disabled. For those that have pat_enabled, we end
up calling xen_pte_flags.

The big benefit of making the .pte_flags as struct paravirt_callee_save
and making it PTE_IDENT is that while the code looks as so when compiled:

  ff 14 25 e8 52 c1 81 	callq  *0xffffffff81c152e8

That however, on baremetal and under Xen when !pat_enable ends up
being patched over (by using the DEF_NATIVE macro) to be:

   48 89 f8             	mov    %rdi,%rax
   66 66 66 90          	data32 data32 xchg %ax,%ax

(the 32-bit version is of course different).
For details refer to 'paravirt_patch_ident_64' function.

The Xen version which requires pat_enable==1, ends up patched
to be:

	call xen_pte_flags;

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/paravirt.h       |   11 ++++++++++-
 arch/x86/include/asm/paravirt_types.h |    2 +-
 arch/x86/kernel/paravirt.c            |    2 +-
 arch/x86/xen/mmu.c                    |   11 +++++++++--
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6533409..b113feb 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -464,7 +464,16 @@ static inline void pmd_update_defer(struct mm_struct *mm, unsigned long addr,
 
 static inline pteval_t pte_flags(pte_t pte)
 {
-	return pv_mmu_ops.pte_flags(pte) & PTE_FLAGS_MASK;
+	pteval_t ret;
+
+	if (sizeof(pteval_t) > sizeof(long))
+		ret =  PVOP_CALLEE2(pteval_t, pv_mmu_ops.pte_flags,
+				    pte.pte, (u64)pte.pte >> 32);
+	else
+		ret =  PVOP_CALLEE1(pteval_t, pv_mmu_ops.pte_flags,
+				    pte.pte);
+
+	return ret & PTE_FLAGS_MASK;
 }
 
 static inline pte_t __pte(pteval_t val)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 86bce9d..5890cd7c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -288,7 +288,7 @@ struct pv_mmu_ops {
 	void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
 					pte_t *ptep, pte_t pte);
 
-	pteval_t (*pte_flags)(pte_t pte);
+	struct paravirt_callee_save pte_flags;
 	struct paravirt_callee_save pte_val;
 	struct paravirt_callee_save make_pte;
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 4780367..ba22188 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -462,7 +462,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 #endif
 #endif /* PAGETABLE_LEVELS >= 3 */
 
-	.pte_flags = native_pte_flags,
+	.pte_flags = PTE_IDENT,
 	.pte_val = PTE_IDENT,
 	.pgd_val = PTE_IDENT,
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5b79048..24e275a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1965,6 +1965,13 @@ void __init xen_ident_map_ISA(void)
 	xen_flush_tlb();
 }
 
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+/* 32-bit pagetable entries */
+#define PTE_IDENT	__PV_IS_CALLEE_SAVE(_paravirt_ident_32)
+#else
+/* 64-bit pagetable entries */
+#define PTE_IDENT	__PV_IS_CALLEE_SAVE(_paravirt_ident_64)
+#endif
 static void __init xen_post_allocator_init(void)
 {
 	pv_mmu_ops.set_pte = xen_set_pte;
@@ -1986,7 +1993,7 @@ static void __init xen_post_allocator_init(void)
 #endif
 
 	if (!pat_enabled)
-		pv_mmu_ops.pte_flags = native_pte_flags;
+		pv_mmu_ops.pte_flags = PTE_IDENT;
 #ifdef CONFIG_X86_64
 	SetPagePinned(virt_to_page(level3_user_vsyscall));
 #endif
@@ -2035,7 +2042,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.ptep_modify_prot_start = __ptep_modify_prot_start,
 	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
 
-	.pte_flags = xen_pte_flags,
+	.pte_flags = __PV_IS_CALLEE_SAVE(xen_pte_flags),
 	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
 	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
 
-- 
1.7.7.3


  reply	other threads:[~2011-12-02 23:31 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-08 21:15 [PATCH] x86/acpi fixes for 3.2 (v1) impacting distributions Konrad Rzeszutek Wilk
2011-11-08 21:15 ` [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle Konrad Rzeszutek Wilk
2011-11-09 10:19   ` Deepthi Dharwar
2011-11-09 10:51     ` Deepthi Dharwar
2011-11-09 14:44     ` Konrad Rzeszutek Wilk
2011-11-11 11:41       ` Deepthi Dharwar
2011-11-15 14:40         ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-11-22 13:46           ` Konrad Rzeszutek Wilk
2011-11-23 11:06             ` Deepthi Dharwar
2011-11-10  4:06   ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-11-13  6:00     ` Len Brown
2011-11-14 14:37       ` Konrad Rzeszutek Wilk
2011-11-08 21:15 ` [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations Konrad Rzeszutek Wilk
2011-12-02 23:31   ` Konrad Rzeszutek Wilk [this message]
2011-11-08 21:15 ` [PATCH 3/3] x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111202233122.GA12556@phenom.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=stable@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=trenn@suse.de \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).