All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] respin of mprotect performance patch
@ 2008-01-08  1:05 Bruce Rogers
  2008-01-10 15:52 ` Keir Fraser
  2008-01-12 11:47 ` Keir Fraser
  0 siblings, 2 replies; 12+ messages in thread
From: Bruce Rogers @ 2008-01-08  1:05 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 310 bytes --]

I've incorporated the comments of Jan and Keir into the attached patches, with the exception of moving mmu_ops related entries from public/xen.h into architecture specific headers.
Thanks for your review.  Let me know if I've missed anything.

Signed-off-by: Bruce Rogers <brogers@novell.com>

- Bruce


[-- Attachment #2: linux_mprotect_perf.patch --]
[-- Type: text/plain, Size: 7308 bytes --]

diff -r 61c96456a3e1 include/xen/interface/xen.h
--- a/include/xen/interface/xen.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/xen/interface/xen.h	Mon Jan 07 17:46:41 2008 -0700
@@ -168,9 +168,39 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_ATOMIC_PT_UPDATE:
+ * Updates an entry in an L1 page table such that concurrent hardware mmu
+ * updates to that entry are not lost. If the new table entry is valid/present,
+ * the mapped frame must belong to the FD, if an FD has been specified. If
+ * attempting to map an I/O page then the caller assumes the privilege of the
+ * FD.
+ * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller.
+ * FD == DOMID_XEN: Map restricted areas of Xen's heap space.
+ * ptr[:2]  -- Machine address of the page-table entry to modify.
+ * val      -- Value to write.
+ *
+ * ptr[1:0] == MMU_FLAG_RANGE_UPDATE:
+ * Updates a range of entries in an L1 page table such that concurrent hardware 
+ * mmu updates to those entries are not lost. The update value is created by
+ * updating specific bits in the the current page table entry, as specified by
+ * val. If the new table entry is valid/present, the mapped frame must belong
+ * to the FD, if an FD has been specified. If attempting to map an I/O page
+ * then the caller assumes the privilege of the FD.
+ * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller.
+ * FD == DOMID_XEN: Map restricted areas of Xen's heap space.
+ * ptr[:2]  -- Machine address of the first page-table entry to modify.
+ * val[11:0] = mask pte bits 11:0 (1 means update; 0 means don't change)
+ * val[23:12] = mask pte bits 63:52
+ * val[35:24] = new pte bits 11:0 (where corresp. Mask bit == 1)
+ * val[47:36] = new pte bits 63:52
+ * val[57:48] = number of ptes in range
+ * val[63:58] = MBZ
  */
 #define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
 #define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+#define MMU_ATOMIC_PT_UPDATE     2 /* checked atomic '*ptr = val'. ptr is MA.*/
+#define MMU_FLAG_RANGE_UPDATE    3 /* range atomic checked '*ptr flags = val'*/
 
 /*
  * MMU EXTENDED OPERATIONS
diff -r 61c96456a3e1 include/asm-i386/mach-xen/asm/pgtable.h
--- a/include/asm-i386/mach-xen/asm/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-i386/mach-xen/asm/pgtable.h	Mon Jan 07 17:47:31 2008 -0700
@@ -512,6 +512,12 @@ int touch_pte_range(struct mm_struct *mm
                     unsigned long address,
                     unsigned long size);
 
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long addr, unsigned long end, pgprot_t newprot);
+
+#define arch_change_pte_range(mm, pmd, addr, end, newprot)	\
+		xen_change_pte_range(mm, pmd, addr, end, newprot)
+
 #define io_remap_pfn_range(vma,from,pfn,size,prot) \
 direct_remap_pfn_range(vma,from,pfn,size,prot,DOMID_IO)
 
diff -r 61c96456a3e1 include/asm-x86_64/mach-xen/asm/pgtable.h
--- a/include/asm-x86_64/mach-xen/asm/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-x86_64/mach-xen/asm/pgtable.h	Mon Jan 07 17:47:41 2008 -0700
@@ -541,6 +541,12 @@ int touch_pte_range(struct mm_struct *mm
                     unsigned long address,
                     unsigned long size);
 
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long addr, unsigned long end, pgprot_t newprot);
+
+#define arch_change_pte_range(mm, pmd, addr, end, newprot)	\
+		xen_change_pte_range(mm, pmd, addr, end, newprot)
+
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
 		direct_remap_pfn_range(vma,vaddr,pfn,size,prot,DOMID_IO)
 
diff -r 61c96456a3e1 include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-generic/pgtable.h	Mon Jan 07 17:48:00 2008 -0700
@@ -188,6 +188,10 @@ static inline void ptep_set_wrprotect(st
 })
 #endif
 
+#ifndef arch_change_pte_range
+#define arch_change_pte_range(mm, pmd, addr, end, newprot) 0
+#endif
+
 #ifndef __ASSEMBLY__
 /*
  * When walking page tables, we usually want to skip any p?d_none entries;
diff -r 61c96456a3e1 arch/i386/mm/hypervisor.c
--- a/arch/i386/mm/hypervisor.c	Thu Dec 20 16:58:14 2007 +0000
+++ b/arch/i386/mm/hypervisor.c	Mon Jan 07 17:48:30 2008 -0700
@@ -548,3 +548,78 @@ int write_ldt_entry(void *ldt, int entry
 		mach_lp, (u64)entry_a | ((u64)entry_b<<32));
 }
 #endif
+
+#define MAX_BATCHED_FULL_PTES 32
+
+#define PTE_FLAGS 0xfff0000000000fffULL
+
+#define map_mask_bits(X) (((X & 0xfffULL) << 0) | \
+	((X & PTE_FLAGS) >> 40))
+#define map_flag_bits(X) (((X & PTE_FLAGS) << 24) | \
+	((X & PTE_FLAGS) >> 16))
+
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+			 unsigned long addr, unsigned long end, pgprot_t newprot)
+{
+	int rc = 0;
+	u64 nr_full_ptes = 0, nr_flag_ptes = 0;
+	mmu_update_t u[MAX_BATCHED_FULL_PTES], v;
+	u64 newval, flags, mask, mask_bits;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	mask_bits = ~(PTE_MASK | _PAGE_DIRTY);
+#if defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
+	mask_bits |= ~__supported_pte_mask;
+#endif
+	mask = map_mask_bits(mask_bits);
+
+	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+
+	for (;;) {
+		if (addr != end && pte_present(*pte)) {
+			newval = __pte_val(pte_modify(*pte, newprot));
+			if (((__pte_val(*pte) ^ newval) & ~PTE_FLAGS) || 
+			    (nr_flag_ptes && (newval & PTE_FLAGS) != flags)) {
+				if (nr_flag_ptes) {
+					v.val |= nr_flag_ptes << 48;
+					if ((rc = HYPERVISOR_mmu_update(
+						&v, 1, NULL, DOMID_SELF)) != 0)
+						break;
+					nr_flag_ptes = 0;
+				}
+				u[nr_full_ptes].ptr = virt_to_machine(pte) | MMU_ATOMIC_PT_UPDATE;
+				u[nr_full_ptes].val = newval;
+				if (++nr_full_ptes == MAX_BATCHED_FULL_PTES) {
+					if ((rc = HYPERVISOR_mmu_update(
+						&u[0], nr_full_ptes, NULL, DOMID_SELF)) != 0)
+						break;
+					nr_full_ptes = 0;
+				}
+			} else if (nr_flag_ptes++ == 0) {
+				flags = newval & PTE_FLAGS;
+				v.ptr = virt_to_machine(pte) | MMU_FLAG_RANGE_UPDATE;
+				v.val = mask | map_flag_bits(flags);
+			}
+		} else {
+			if (nr_flag_ptes) {
+				v.val |= nr_flag_ptes << 48;
+				if ((rc = HYPERVISOR_mmu_update(
+					&v, 1, NULL, DOMID_SELF)) != 0)
+					break;
+				nr_flag_ptes = 0;
+			}
+			if (addr == end) {
+				if (nr_full_ptes)
+					rc = HYPERVISOR_mmu_update(
+						&u[0], nr_full_ptes, NULL, DOMID_SELF);
+				break;
+			}
+		}
+		pte++;
+		addr += PAGE_SIZE;
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	BUG_ON(rc && rc != -ENOSYS);
+	return !rc;
+}
diff -r 61c96456a3e1 mm/mprotect.c
--- a/mm/mprotect.c	Thu Dec 20 16:58:14 2007 +0000
+++ b/mm/mprotect.c	Mon Jan 07 17:48:42 2008 -0700
@@ -75,6 +75,8 @@ static inline void change_pmd_range(stru
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		if (arch_change_pte_range(mm, pmd, addr, next, newprot))
 			continue;
 		change_pte_range(mm, pmd, addr, next, newprot);
 	} while (pmd++, addr = next, addr != end);

[-- Attachment #3: xen_mprotect_perf.patch --]
[-- Type: text/plain, Size: 14269 bytes --]

diff -r 2491691e3e69 xen/include/public/xen.h
--- a/xen/include/public/xen.h	Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/include/public/xen.h	Mon Jan 07 18:05:07 2008 -0700
@@ -168,9 +168,39 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_ATOMIC_PT_UPDATE:
+ * Updates an entry in an L1 page table such that concurrent hardware mmu
+ * updates to that entry are not lost. If the new table entry is valid/present,
+ * the mapped frame must belong to the FD, if an FD has been specified. If
+ * attempting to map an I/O page then the caller assumes the privilege of the
+ * FD.
+ * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller.
+ * FD == DOMID_XEN: Map restricted areas of Xen's heap space.
+ * ptr[:2]  -- Machine address of the page-table entry to modify.
+ * val      -- Value to write.
+ *
+ * ptr[1:0] == MMU_FLAG_RANGE_UPDATE:
+ * Updates a range of entries in an L1 page table such that concurrent hardware 
+ * mmu updates to those entries are not lost. The update value is created by
+ * updating specific bits in the the current page table entry, as specified by
+ * val. If the new table entry is valid/present, the mapped frame must belong
+ * to the FD, if an FD has been specified. If attempting to map an I/O page
+ * then the caller assumes the privilege of the FD.
+ * FD == DOMID_IO: Permit /only/ I/O mappings, at the priv level of the caller.
+ * FD == DOMID_XEN: Map restricted areas of Xen's heap space.
+ * ptr[:2]  -- Machine address of the first page-table entry to modify.
+ * val[11:0] = mask pte bits 11:0 (1 means update; 0 means don't change)
+ * val[23:12] = mask pte bits 63:52
+ * val[35:24] = new pte bits 11:0 (where corresp. Mask bit == 1)
+ * val[47:36] = new pte bits 63:52
+ * val[57:48] = number of ptes in range
+ * val[63:58] = MBZ
  */
 #define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
 #define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+#define MMU_ATOMIC_PT_UPDATE     2 /* checked atomic '*ptr = val'. ptr is MA.*/
+#define MMU_FLAG_RANGE_UPDATE    3 /* range atomic checked '*ptr flags = val'*/
 
 /*
  * MMU EXTENDED OPERATIONS
diff -r 2491691e3e69 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/arch/x86/mm.c	Mon Jan 07 18:05:19 2008 -0700
@@ -1338,6 +1338,97 @@ static void free_l4_table(struct page_in
 
 /* How to write an entry to the guest pagetables.
  * Returns 0 for failure (pointer not valid), 1 for success. */
+static inline int update_intpte_sync(intpte_t *p, 
+                                intpte_t old, 
+                                intpte_t new,
+                                unsigned long mfn,
+                                struct vcpu *v)
+{
+    int rv;
+    intpte_t t = old;
+    for ( ; ; )
+    {
+        rv = paging_cmpxchg_guest_entry(v, p, &t, new, _mfn(mfn));
+        if ( unlikely(rv == 0) )
+        {
+            MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte
+                    ": saw %" PRIpte, old, new, t);
+            break;
+        }
+
+        if ( t == old )
+            break;
+
+        /* Allowed to change in Accessed/Dirty flags only. */
+        BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+        new |= t ^ old;
+        old = t;
+    }
+    return rv;
+}
+
+/* Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu */
+#define UPDATE_ENTRY_SYNC(_t,_p,_o,_n,_m,_v)                             \
+    update_intpte_sync(&_t ## e_get_intpte(*(_p)),                       \
+                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                  (_m), (_v))
+
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry_sync(l1_pgentry_t *pl1e, l1_pgentry_t ol1e,
+                             l1_pgentry_t nl1e, unsigned long gl1mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    unsigned long mfn;
+
+    if ( unlikely(paging_mode_refcounts(d)) )
+        return UPDATE_ENTRY_SYNC(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        mfn = gmfn_to_mfn(FOREIGNDOM, l1e_get_pfn(nl1e));
+        if ( unlikely(mfn == INVALID_MFN) )
+            return 0;
+        ASSERT((mfn & ~(PADDR_MASK >> PAGE_SHIFT)) == 0);
+        nl1e = l1e_from_pfn(mfn, l1e_get_flags(nl1e));
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(d)) )
+        {
+            MEM_LOG("Bad L1 flags %x",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(d));
+            return 0;
+        }
+
+        adjust_guest_l1e(nl1e, d);
+
+        /* Fast path for identical mapping, r/w and presence. */
+        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+            return UPDATE_ENTRY_SYNC(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+
+        if ( unlikely(!get_page_from_l1e(nl1e, FOREIGNDOM)) )
+            return 0;
+        
+        if ( unlikely(!UPDATE_ENTRY_SYNC(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+        {
+            put_page_from_l1e(nl1e, d);
+            return 0;
+        }
+    }
+    else
+    {
+        if ( unlikely(!UPDATE_ENTRY_SYNC(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+            return 0;
+    }
+
+    put_page_from_l1e(ol1e, d);
+    return 1;
+}
+
+/* How to write an entry to the guest pagetables.
+ * Returns 0 for failure (pointer not valid), 1 for success. */
 static inline int update_intpte(intpte_t *p, 
                                 intpte_t old, 
                                 intpte_t new,
@@ -2403,11 +2494,13 @@ int do_mmu_update(
     unsigned long gpfn, gmfn, mfn;
     struct page_info *page;
     int rc = 0, okay = 1, i = 0;
-    unsigned int cmd, done = 0;
+    unsigned int cmd, done = 0, pte_cnt;
     struct vcpu *v = current;
     struct domain *d = v->domain;
     unsigned long type_info;
     struct domain_mmap_cache mapcache;
+    uint64_t mask_flags, add_flags;
+    l1_pgentry_t ol1e;
 
     if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
     {
@@ -2577,6 +2670,185 @@ int do_mmu_update(
             paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
+            break;
+
+        case MMU_ATOMIC_PT_UPDATE:
+
+            rc = xsm_mmu_atomic_update(d, req.val);
+            if ( rc )
+                break;
+
+            req.ptr -= cmd;
+
+            gmfn = req.ptr >> PAGE_SHIFT;
+            mfn = gmfn_to_mfn(d, gmfn);
+
+            if ( unlikely(!get_page_from_pagenr(mfn, d)) )
+            {
+                MEM_LOG("Could not get page for normal update");
+                break;
+            }
+
+            va = map_domain_page_with_cache(mfn, &mapcache);
+            va = (void *)((unsigned long)va +
+                          (unsigned long)(req.ptr & ~PAGE_MASK));
+            page = mfn_to_page(mfn);
+
+            switch ( (type_info = page->u.inuse.type_info) & PGT_type_mask )
+            {
+            case PGT_l1_page_table:
+            {
+                l1_pgentry_t l1e = l1e_from_intpte(req.val);
+
+                if ( paging_mode_refcounts(d) )
+                {
+                    MEM_LOG("mmu update on auto-refcounted domain!");
+                    break;
+                }
+
+                if ( unlikely(!get_page_type(
+                    page, type_info & (PGT_type_mask|PGT_pae_xen_l2))) )
+                    goto atomic_not_a_pt;
+
+                ol1e = *(l1_pgentry_t *)va;
+                okay = mod_l1_entry_sync(va, ol1e, l1e, mfn);
+                put_page_type(page);
+            }
+            break;
+
+            case PGT_l2_page_table:
+            case PGT_l3_page_table:
+            case PGT_l4_page_table:
+                MEM_LOG("Not L1 page table");
+                break;
+
+            default:
+            atomic_not_a_pt:
+            {
+                if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+                    break;
+
+                okay = paging_write_guest_entry(v, va, req.val, _mfn(mfn));
+
+                put_page_type(page);
+            }
+            break;
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+
+            put_page(page);
+            break;
+
+        case MMU_FLAG_RANGE_UPDATE:
+
+            rc = xsm_mmu_flag_range_update(d, req.val);
+            if ( rc )
+                break;
+
+            req.ptr -= cmd;
+
+            pte_cnt = (req.val >> 48) & 0x3ff;
+
+            if (pte_cnt == 0)
+            {
+                okay = 1;
+                break;
+            }
+            if ((req.ptr & ~PAGE_MASK) + (pte_cnt * sizeof(l1_pgentry_t)) > PAGE_SIZE)
+            {
+                // count extends beyond end of page table
+                MEM_LOG("Count value is too large");
+                break;
+            }
+
+            gmfn = req.ptr >> PAGE_SHIFT;
+            mfn = gmfn_to_mfn(d, gmfn);
+
+            page = mfn_to_page(mfn);
+
+            if ( unlikely(!get_page_from_pagenr(mfn, d)) )
+            {
+                MEM_LOG("Could not get page for normal update");
+                break;
+            }
+
+            va = map_domain_page_with_cache(mfn, &mapcache);
+
+            mask_flags = ((req.val & 0xfff000ULL) << 40) | 
+               (req.val & 0xfffULL);
+            add_flags = ((req.val & 0xfff000000000ULL) << 16) |
+               ((req.val >> 24) & 0xfffULL);
+
+            switch ( (type_info = page->u.inuse.type_info) & PGT_type_mask )
+            {
+            case PGT_l1_page_table:
+            {
+                if ( paging_mode_refcounts(d) )
+                {
+                    MEM_LOG("mmu update on auto-refcounted domain!");
+                    break;
+                }
+
+                if ( unlikely(!get_page_type(
+                    page, type_info & (PGT_type_mask|PGT_pae_xen_l2))) )
+                    goto range_not_a_pt;
+
+                va = (void *)((unsigned long)va +
+                    (unsigned long)(req.ptr & ~PAGE_MASK) -
+                    sizeof(l1_pgentry_t));
+                do
+                {
+                    l1_pgentry_t nl1e;
+    
+                    va += sizeof(l1_pgentry_t);
+                    ol1e = *(l1_pgentry_t *)va;
+                    nl1e = ol1e;
+                    nl1e.l1 &= ~mask_flags;
+                    nl1e.l1 |= add_flags;
+                    okay = mod_l1_entry_sync(va, ol1e, nl1e, mfn);
+                    if (!okay)
+                        break;
+                } while (--pte_cnt);
+
+                put_page_type(page);
+            }
+            break;
+
+            case PGT_l2_page_table:
+            case PGT_l3_page_table:
+            case PGT_l4_page_table:
+                MEM_LOG("Operation permitted on l1 page table only");
+                break;
+
+
+            default:
+            range_not_a_pt:
+            {
+                if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+                    break;
+                va = (void *)((unsigned long)va +
+                    (unsigned long)(req.ptr & ~PAGE_MASK) -
+                    sizeof(l1_pgentry_t));
+                do
+                {
+                    l1_pgentry_t l1e;
+                    va += sizeof(l1_pgentry_t);
+                    l1e = *(l1_pgentry_t *)va;
+                    l1e.l1 &= ~mask_flags;
+                    l1e.l1 |= add_flags;
+                    okay = paging_write_guest_entry(v, va, l1e.l1, _mfn(mfn));
+                    if (!okay)
+                        break;
+                } while (--pte_cnt);
+                put_page_type(page);
+            }
+            break;
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+
+            put_page(page);
             break;
 
         default:
diff -r 2491691e3e69 xen/include/xsm/xsm.h
--- a/xen/include/xsm/xsm.h	Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/include/xsm/xsm.h	Mon Jan 07 18:05:56 2008 -0700
@@ -136,6 +136,8 @@ struct xsm_operations {
     int (*mmu_machphys_update) (struct domain *d, unsigned long mfn);
     int (*update_va_mapping) (struct domain *d, l1_pgentry_t pte);
     int (*add_to_physmap) (struct domain *d1, struct domain *d2);
+    int (*mmu_atomic_update) (struct domain *d, intpte_t fpte);
+    int (*mmu_flag_range_update) (struct domain *d, intpte_t fpte);
 #endif
 };
 
@@ -532,6 +534,16 @@ static inline int xsm_add_to_physmap(str
 {
     return xsm_call(add_to_physmap(d1, d2));
 }
+
+static inline int xsm_mmu_atomic_update (struct domain *d, intpte_t fpte)
+{
+    return xsm_call(mmu_atomic_update(d, fpte));
+}
+
+static inline int xsm_mmu_flag_range_update (struct domain *d, intpte_t fpte)
+{
+    return xsm_call(mmu_flag_range_update(d, fpte));
+}
 #endif /* CONFIG_X86 */
 
 #endif /* __XSM_H */
diff -r 2491691e3e69 xen/xsm/dummy.c
--- a/xen/xsm/dummy.c	Sat Dec 29 17:57:47 2007 +0000
+++ b/xen/xsm/dummy.c	Mon Jan 07 18:06:07 2008 -0700
@@ -385,6 +385,16 @@ static int dummy_add_to_physmap (struct 
 {
     return 0;
 }
+
+static int dummy_mmu_atomic_update (struct domain *d, intpte_t fpte)
+{
+    return 0;
+}
+
+static int dummy_mmu_flag_range_update (struct domain *d, intpte_t fpte)
+{
+    return 0;
+}
 #endif
 
 struct xsm_operations dummy_xsm_ops;
@@ -484,5 +494,7 @@ void xsm_fixup_ops (struct xsm_operation
     set_to_dummy_if_null(ops, mmu_machphys_update);
     set_to_dummy_if_null(ops, update_va_mapping);
     set_to_dummy_if_null(ops, add_to_physmap);
+    set_to_dummy_if_null(ops, mmu_atomic_update);
+    set_to_dummy_if_null(ops, mmu_flag_range_update);
 #endif
 }

[-- Attachment #4: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-08  1:05 [PATCH] respin of mprotect performance patch Bruce Rogers
@ 2008-01-10 15:52 ` Keir Fraser
  2008-01-10 17:13   ` Bruce Rogers
  2008-01-12 11:47 ` Keir Fraser
  1 sibling, 1 reply; 12+ messages in thread
From: Keir Fraser @ 2008-01-10 15:52 UTC (permalink / raw)
  To: Bruce Rogers, xen-devel

It's taken me a while to work out that MMU_ATOMIC_PT_UPDATE is actually
'atomic' in that it preserves the existing A/D bits (and should ignore the
ones specified in the given new value). Am I correct?

In which case:
 (1) The name and description in xen.h should be better. E.g.,
MMU_PT_UPDATE_PRESERVE_AD, with a corresponding comment explaining the
atomicity/consistency guarantee.
 (2) The implementation is actually wrong, as it only attempts to propagate
A/D bits into the 'new' value if the first cmpxchg attempt fails. Otherwise
the A/D values passed in by the guest are used, but the PTE might change
before Xen itself reads the ol1e.
 (3) In general, the hypercall implementation can probably be slimmed down a
bunch. There's lots of code duplication.

I can take a swing at the Xen side of this, if I haven't got the wrong end
of the stick?

 -- Keir

On 8/1/08 01:05, "Bruce Rogers" <brogers@novell.com> wrote:

> I've incorporated the comments of Jan and Keir into the attached patches, with
> the exception of moving mmu_ops related entries from public/xen.h into
> architecture specific headers.
> Thanks for your review.  Let me know if I've missed anything.
> 
> Signed-off-by: Bruce Rogers <brogers@novell.com>
> 
> - Bruce
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-10 15:52 ` Keir Fraser
@ 2008-01-10 17:13   ` Bruce Rogers
  2008-01-10 17:42     ` Keir Fraser
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Rogers @ 2008-01-10 17:13 UTC (permalink / raw)
  To: Keir Fraser, xen-devel



>>> On 1/10/2008 at 8:52 AM, in message <C3ABEFCB.1A5C8%Keir.Fraser@cl.cam.ac.uk>,
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> It's taken me a while to work out that MMU_ATOMIC_PT_UPDATE is actually
> 'atomic' in that it preserves the existing A/D bits (and should ignore the
> ones specified in the given new value). Am I correct?
>From the point where the new bits are determined to the point where the pte is actually updated, the only way that the new value can be out of date is if the hardware mmu has updated (set) the A/D bits.  I don't believe they could have been cleared.  So I see it as not ignoring the A/D bits, but accepting that they may have been updated (set) and allowing for that change.
> 
> In which case:
>  (1) The name and description in xen.h should be better. E.g.,
> MMU_PT_UPDATE_PRESERVE_AD, with a corresponding comment explaining the
> atomicity/consistency guarantee.
That would be fine.  I didn't know what other architectures might have slightly different semantics so I stuck with a general (wider coverage) *atomic* nomenclature.  Since it's been pointed out that mmu_ops is x86 specific, then specifying that we're talking about the treatment of the A/D bits would be fine.
>  (2) The implementation is actually wrong, as it only attempts to propagate
> A/D bits into the 'new' value if the first cmpxchg attempt fails. Otherwise
> the A/D values passed in by the guest are used, but the PTE might change
> before Xen itself reads the ol1e.
Yes - I see that. So we need to OR in those bits from ol1e into the new value before the cmpxchg.  Thanks for catching that blunder.
>  (3) In general, the hypercall implementation can probably be slimmed down a
> bunch. There's lots of code duplication.
Agreed.  I wasn't sure how much I wanted to disrupt the other (existing) code paths, but it could certainly be combined.
> 
> I can take a swing at the Xen side of this, if I haven't got the wrong end
> of the stick?
That would be great.

Do you have any comment on whether the second hypercall is useful enough to keep?  It would be quite simple for me to rip it out if not.

- Bruce
> 
>  -- Keir
> 
> On 8/1/08 01:05, "Bruce Rogers" <brogers@novell.com> wrote:
> 
>> I've incorporated the comments of Jan and Keir into the attached patches, 
> with
>> the exception of moving mmu_ops related entries from public/xen.h into
>> architecture specific headers.
>> Thanks for your review.  Let me know if I've missed anything.
>> 
>> Signed-off-by: Bruce Rogers <brogers@novell.com>
>> 
>> - Bruce
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com 
>> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-10 17:13   ` Bruce Rogers
@ 2008-01-10 17:42     ` Keir Fraser
  0 siblings, 0 replies; 12+ messages in thread
From: Keir Fraser @ 2008-01-10 17:42 UTC (permalink / raw)
  To: Bruce Rogers, xen-devel

On 10/1/08 17:13, "Bruce Rogers" <BROGERS@novell.com> wrote:

>> I can take a swing at the Xen side of this, if I haven't got the wrong end
>> of the stick?
> That would be great.

All right, I'll take a look.

> Do you have any comment on whether the second hypercall is useful enough to
> keep?  It would be quite simple for me to rip it out if not.

It's kind of odd, being a batched sub-command within a batched hypercall. I
think we'll leave it out for now unless there's a really compelling reason
to introduce it.

 -- Keir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-08  1:05 [PATCH] respin of mprotect performance patch Bruce Rogers
  2008-01-10 15:52 ` Keir Fraser
@ 2008-01-12 11:47 ` Keir Fraser
  2008-01-13  3:08   ` Bruce Rogers
  1 sibling, 1 reply; 12+ messages in thread
From: Keir Fraser @ 2008-01-12 11:47 UTC (permalink / raw)
  To: Bruce Rogers, xen-devel

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

Bruce,

Attached is my equivalent implementation of ATOMIC_PT_UPDATE. Notice that I
changed the name to MMU_PT_UPDATE_PRESERVE_AD, also that I removed a bunch
of code duplication albeit at the expense of passing yet another parameter
around.

Could you rev your Linux patch and give it a spin?

 Thanks,
 Keir

On 8/1/08 01:05, "Bruce Rogers" <brogers@novell.com> wrote:

> I've incorporated the comments of Jan and Keir into the attached patches, with
> the exception of moving mmu_ops related entries from public/xen.h into
> architecture specific headers.
> Thanks for your review.  Let me know if I've missed anything.
> 
> Signed-off-by: Bruce Rogers <brogers@novell.com>
> 
> - Bruce
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


[-- Attachment #2: 00-preserve-ad.patch --]
[-- Type: application/octet-stream, Size: 13290 bytes --]

diff -r 533a8e6cebd0 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Sat Jan 12 11:26:04 2008 +0000
+++ b/xen/arch/x86/mm.c	Sat Jan 12 11:46:20 2008 +0000
@@ -1342,16 +1342,24 @@ static inline int update_intpte(intpte_t
                                 intpte_t old, 
                                 intpte_t new,
                                 unsigned long mfn,
-                                struct vcpu *v)
+                                struct vcpu *v,
+                                int preserve_ad)
 {
     int rv = 1;
 #ifndef PTE_UPDATE_WITH_CMPXCHG
-    rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
 #else
     {
         intpte_t t = old;
         for ( ; ; )
         {
+            if ( preserve_ad )
+                new ^= (old ^ new) & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
             rv = paging_cmpxchg_guest_entry(v, p, &t, new, _mfn(mfn));
             if ( unlikely(rv == 0) )
             {
@@ -1375,14 +1383,14 @@ static inline int update_intpte(intpte_t
 
 /* Macro that wraps the appropriate type-changes around update_intpte().
  * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v)                             \
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
     update_intpte(&_t ## e_get_intpte(*(_p)),                       \
                   _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v))
+                  (_m), (_v), (_ad))
 
 /* Update the L1 entry at pl1e to new value nl1e. */
 static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, 
-                        unsigned long gl1mfn)
+                        unsigned long gl1mfn, int preserve_ad)
 {
     l1_pgentry_t ol1e;
     struct vcpu *curr = current;
@@ -1393,7 +1401,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
         return 0;
 
     if ( unlikely(paging_mode_refcounts(d)) )
-        return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+        return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr, preserve_ad);
 
     if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
     {
@@ -1415,12 +1423,14 @@ static int mod_l1_entry(l1_pgentry_t *pl
 
         /* Fast path for identical mapping, r/w and presence. */
         if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
-            return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+            return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                preserve_ad);
 
         if ( unlikely(!get_page_from_l1e(nl1e, FOREIGNDOM)) )
             return 0;
         
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l1e(nl1e, d);
             return 0;
@@ -1428,7 +1438,8 @@ static int mod_l1_entry(l1_pgentry_t *pl
     }
     else
     {
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                    preserve_ad)) )
             return 0;
     }
 
@@ -1441,7 +1452,8 @@ static int mod_l2_entry(l2_pgentry_t *pl
 static int mod_l2_entry(l2_pgentry_t *pl2e, 
                         l2_pgentry_t nl2e, 
                         unsigned long pfn,
-                        unsigned long type)
+                        unsigned long type,
+                        int preserve_ad)
 {
     l2_pgentry_t ol2e;
     struct vcpu *curr = current;
@@ -1469,18 +1481,20 @@ static int mod_l2_entry(l2_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if ( !l2e_has_changed(ol2e, nl2e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr);
+            return UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l2e(nl2e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l2e(nl2e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1494,7 +1508,8 @@ static int mod_l2_entry(l2_pgentry_t *pl
 /* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
 static int mod_l3_entry(l3_pgentry_t *pl3e, 
                         l3_pgentry_t nl3e, 
-                        unsigned long pfn)
+                        unsigned long pfn,
+                        int preserve_ad)
 {
     l3_pgentry_t ol3e;
     struct vcpu *curr = current;
@@ -1532,18 +1547,20 @@ static int mod_l3_entry(l3_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if (!l3e_has_changed(ol3e, nl3e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr);
+            return UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l3e(nl3e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l3e(nl3e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1564,7 +1581,8 @@ static int mod_l3_entry(l3_pgentry_t *pl
 /* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
 static int mod_l4_entry(l4_pgentry_t *pl4e, 
                         l4_pgentry_t nl4e, 
-                        unsigned long pfn)
+                        unsigned long pfn,
+                        int preserve_ad)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
@@ -1592,18 +1610,20 @@ static int mod_l4_entry(l4_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if (!l4e_has_changed(ol4e, nl4e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr);
+            return UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l4e(nl4e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l4e(nl4e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1946,7 +1966,7 @@ int new_guest_cr3(unsigned long mfn)
                     l4e_from_pfn(
                         mfn,
                         (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
-                    pagetable_get_pfn(v->arch.guest_table));
+                    pagetable_get_pfn(v->arch.guest_table), 0);
         if ( unlikely(!okay) )
         {
             MEM_LOG("Error while installing new compat baseptr %lx", mfn);
@@ -2458,9 +2478,10 @@ int do_mmu_update(
         {
             /*
              * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_UPDATE_PT_PRESERVE_AD: As above but do not change A/D bits.
              */
         case MMU_NORMAL_PT_UPDATE:
-
+        case MMU_PT_UPDATE_PRESERVE_AD:
             rc = xsm_mmu_normal_update(d, req.val);
             if ( rc )
                 break;
@@ -2501,20 +2522,23 @@ int do_mmu_update(
                 case PGT_l1_page_table:
                 {
                     l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    okay = mod_l1_entry(va, l1e, mfn);
+                    okay = mod_l1_entry(va, l1e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
                 case PGT_l2_page_table:
                 {
                     l2_pgentry_t l2e = l2e_from_intpte(req.val);
-                    okay = mod_l2_entry(va, l2e, mfn, type_info);
+                    okay = mod_l2_entry(va, l2e, mfn, type_info,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #if CONFIG_PAGING_LEVELS >= 3
                 case PGT_l3_page_table:
                 {
                     l3_pgentry_t l3e = l3e_from_intpte(req.val);
-                    okay = mod_l3_entry(va, l3e, mfn);
+                    okay = mod_l3_entry(va, l3e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #endif
@@ -2522,7 +2546,8 @@ int do_mmu_update(
                 case PGT_l4_page_table:
                 {
                     l4_pgentry_t l4e = l4e_from_intpte(req.val);
-                    okay = mod_l4_entry(va, l4e, mfn);
+                    okay = mod_l4_entry(va, l4e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #endif
@@ -2652,7 +2677,7 @@ static int create_grant_pte_mapping(
     }
 
     ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v) )
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
     {
         put_page_type(page);
         rc = GNTST_general_error;
@@ -2720,9 +2745,11 @@ static int destroy_grant_pte_mapping(
     }
 
     /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, 
-                      (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
-                      d->vcpu[0] /* Change if we go to per-vcpu shadows. */)) )
+    if ( unlikely(!UPDATE_ENTRY
+                  (l1, 
+                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
+                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                   0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", va);
         put_page_type(page);
@@ -2758,7 +2785,7 @@ static int create_grant_va_mapping(
         return GNTST_general_error;
     }
     ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v);
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
     guest_unmap_l1e(v, pl1e);
     pl1e = NULL;
 
@@ -2796,7 +2823,7 @@ static int replace_grant_va_mapping(
     }
 
     /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v)) )
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
         rc = GNTST_general_error;
@@ -2860,7 +2887,8 @@ int replace_grant_host_mapping(
     }
     ol1e = *pl1e;
 
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), gl1mfn, curr)) )
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
         guest_unmap_l1e(curr, pl1e);
@@ -2948,7 +2976,7 @@ int do_update_va_mapping(unsigned long v
 
     pl1e = guest_map_l1e(v, va, &gl1mfn);
 
-    if ( unlikely(!pl1e || !mod_l1_entry(pl1e, val, gl1mfn)) )
+    if ( unlikely(!pl1e || !mod_l1_entry(pl1e, val, gl1mfn, 0)) )
         rc = -EINVAL;
 
     if ( pl1e )
@@ -3517,7 +3545,7 @@ static int ptwr_emulated_update(
     else
     {
         ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v) )
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
             BUG();
     }
 
diff -r 533a8e6cebd0 xen/include/public/xen.h
--- a/xen/include/public/xen.h	Sat Jan 12 11:26:04 2008 +0000
+++ b/xen/include/public/xen.h	Sat Jan 12 11:47:46 2008 +0000
@@ -168,9 +168,13 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
- */
-#define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
-#define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+ * 
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits in the PTE are not affected.
+ */
+#define MMU_NORMAL_PT_UPDATE      0 /* checked '*ptr = val'. ptr is MA.      */
+#define MMU_MACHPHYS_UPDATE       1 /* ptr = MA of frame to modify entry for */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* '*ptr = val', preserve A/D bits       */
 
 /*
  * MMU EXTENDED OPERATIONS

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-12 11:47 ` Keir Fraser
@ 2008-01-13  3:08   ` Bruce Rogers
  2008-01-13  9:21     ` Keir Fraser
  2008-01-21 15:41     ` Rik van Riel
  0 siblings, 2 replies; 12+ messages in thread
From: Bruce Rogers @ 2008-01-13  3:08 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1628 bytes --]

I made a few tweaks to your patch as follows:
As I see it we need to preserve the existing AD bits but additionally OR in the AD bits being passed in (see behavior of pte_modify(), where it preserves current AD in use of _PAGE_CHG_MASK along with ORing in newprot bits, which include AD bits (A at least).)
See update_intpte() for this change.

Also we need to remove cmd value from req.ptr in do_mmu_update..

Updated Linux patch is also included.

Let me know if I'm missing the point some how.

Signed-off-by: Bruce Rogers <brogers@novell.com>

- Bruce

>>> On Sat, Jan 12, 2008 at  4:47 AM, in message
<C3AE5951.12129%Keir.Fraser@cl.cam.ac.uk>, Keir Fraser
<Keir.Fraser@cl.cam.ac.uk> wrote: 
> Bruce,
> 
> Attached is my equivalent implementation of ATOMIC_PT_UPDATE. Notice that I
> changed the name to MMU_PT_UPDATE_PRESERVE_AD, also that I removed a bunch
> of code duplication albeit at the expense of passing yet another parameter
> around.
> 
> Could you rev your Linux patch and give it a spin?
> 
>  Thanks,
>  Keir
> 
> On 8/1/08 01:05, "Bruce Rogers" <brogers@novell.com> wrote:
> 
>> I've incorporated the comments of Jan and Keir into the attached patches, 
> with
>> the exception of moving mmu_ops related entries from public/xen.h into
>> architecture specific headers.
>> Thanks for your review.  Let me know if I've missed anything.
>> 
>> Signed-off-by: Bruce Rogers <brogers@novell.com>
>> 
>> - Bruce
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel



[-- Attachment #2: 01-preserve-ad.patch --]
[-- Type: text/plain, Size: 13849 bytes --]

diff -r 533a8e6cebd0 xen/include/public/xen.h
--- a/xen/include/public/xen.h	Sat Jan 12 11:26:04 2008 +0000
+++ b/xen/include/public/xen.h	Sat Jan 12 19:35:35 2008 -0700
@@ -168,9 +168,13 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
+ * 
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits in the PTE are preserved (ORed).
  */
-#define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
-#define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+#define MMU_NORMAL_PT_UPDATE      0 /* checked '*ptr = val'. ptr is MA.      */
+#define MMU_MACHPHYS_UPDATE       1 /* ptr = MA of frame to modify entry for */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* '*ptr = val', preserve (OR) A/D bits  */
 
 /*
  * MMU EXTENDED OPERATIONS
diff -r 533a8e6cebd0 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Sat Jan 12 11:26:04 2008 +0000
+++ b/xen/arch/x86/mm.c	Sat Jan 12 19:35:39 2008 -0700
@@ -1342,16 +1342,24 @@ static inline int update_intpte(intpte_t
                                 intpte_t old, 
                                 intpte_t new,
                                 unsigned long mfn,
-                                struct vcpu *v)
+                                struct vcpu *v,
+                                int preserve_ad)
 {
     int rv = 1;
 #ifndef PTE_UPDATE_WITH_CMPXCHG
-    rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-#else
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
     {
         intpte_t t = old;
         for ( ; ; )
         {
+            if ( preserve_ad )
+                new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
             rv = paging_cmpxchg_guest_entry(v, p, &t, new, _mfn(mfn));
             if ( unlikely(rv == 0) )
             {
@@ -1369,20 +1377,19 @@ static inline int update_intpte(intpte_t
             old = t;
         }
     }
-#endif
     return rv;
 }
 
 /* Macro that wraps the appropriate type-changes around update_intpte().
  * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v)                             \
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
     update_intpte(&_t ## e_get_intpte(*(_p)),                       \
                   _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v))
+                  (_m), (_v), (_ad))
 
 /* Update the L1 entry at pl1e to new value nl1e. */
 static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, 
-                        unsigned long gl1mfn)
+                        unsigned long gl1mfn, int preserve_ad)
 {
     l1_pgentry_t ol1e;
     struct vcpu *curr = current;
@@ -1393,7 +1400,7 @@ static int mod_l1_entry(l1_pgentry_t *pl
         return 0;
 
     if ( unlikely(paging_mode_refcounts(d)) )
-        return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+        return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr, preserve_ad);
 
     if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
     {
@@ -1415,12 +1422,14 @@ static int mod_l1_entry(l1_pgentry_t *pl
 
         /* Fast path for identical mapping, r/w and presence. */
         if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
-            return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr);
+            return UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                preserve_ad);
 
         if ( unlikely(!get_page_from_l1e(nl1e, FOREIGNDOM)) )
             return 0;
         
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l1e(nl1e, d);
             return 0;
@@ -1428,7 +1437,8 @@ static int mod_l1_entry(l1_pgentry_t *pl
     }
     else
     {
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr,
+                                    preserve_ad)) )
             return 0;
     }
 
@@ -1441,7 +1451,8 @@ static int mod_l2_entry(l2_pgentry_t *pl
 static int mod_l2_entry(l2_pgentry_t *pl2e, 
                         l2_pgentry_t nl2e, 
                         unsigned long pfn,
-                        unsigned long type)
+                        unsigned long type,
+                        int preserve_ad)
 {
     l2_pgentry_t ol2e;
     struct vcpu *curr = current;
@@ -1469,18 +1480,20 @@ static int mod_l2_entry(l2_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if ( !l2e_has_changed(ol2e, nl2e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr);
+            return UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l2e(nl2e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l2e(nl2e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1494,7 +1507,8 @@ static int mod_l2_entry(l2_pgentry_t *pl
 /* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
 static int mod_l3_entry(l3_pgentry_t *pl3e, 
                         l3_pgentry_t nl3e, 
-                        unsigned long pfn)
+                        unsigned long pfn,
+                        int preserve_ad)
 {
     l3_pgentry_t ol3e;
     struct vcpu *curr = current;
@@ -1532,18 +1546,20 @@ static int mod_l3_entry(l3_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if (!l3e_has_changed(ol3e, nl3e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr);
+            return UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l3e(nl3e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l3e(nl3e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1564,7 +1580,8 @@ static int mod_l3_entry(l3_pgentry_t *pl
 /* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
 static int mod_l4_entry(l4_pgentry_t *pl4e, 
                         l4_pgentry_t nl4e, 
-                        unsigned long pfn)
+                        unsigned long pfn,
+                        int preserve_ad)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
@@ -1592,18 +1609,20 @@ static int mod_l4_entry(l4_pgentry_t *pl
 
         /* Fast path for identical mapping and presence. */
         if (!l4e_has_changed(ol4e, nl4e, _PAGE_PRESENT))
-            return UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr);
+            return UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr, preserve_ad);
 
         if ( unlikely(!get_page_from_l4e(nl4e, pfn, d)) )
             return 0;
 
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr)) )
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr,
+                                    preserve_ad)) )
         {
             put_page_from_l4e(nl4e, pfn);
             return 0;
         }
     }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr)) )
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, curr,
+                                     preserve_ad)) )
     {
         return 0;
     }
@@ -1946,7 +1965,7 @@ int new_guest_cr3(unsigned long mfn)
                     l4e_from_pfn(
                         mfn,
                         (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
-                    pagetable_get_pfn(v->arch.guest_table));
+                    pagetable_get_pfn(v->arch.guest_table), 0);
         if ( unlikely(!okay) )
         {
             MEM_LOG("Error while installing new compat baseptr %lx", mfn);
@@ -2458,13 +2477,16 @@ int do_mmu_update(
         {
             /*
              * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
              */
         case MMU_NORMAL_PT_UPDATE:
-
+        case MMU_PT_UPDATE_PRESERVE_AD:
             rc = xsm_mmu_normal_update(d, req.val);
             if ( rc )
                 break;
 
+            req.ptr -= cmd;
             gmfn = req.ptr >> PAGE_SHIFT;
             mfn = gmfn_to_mfn(d, gmfn);
 
@@ -2501,20 +2523,23 @@ int do_mmu_update(
                 case PGT_l1_page_table:
                 {
                     l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    okay = mod_l1_entry(va, l1e, mfn);
+                    okay = mod_l1_entry(va, l1e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
                 case PGT_l2_page_table:
                 {
                     l2_pgentry_t l2e = l2e_from_intpte(req.val);
-                    okay = mod_l2_entry(va, l2e, mfn, type_info);
+                    okay = mod_l2_entry(va, l2e, mfn, type_info,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #if CONFIG_PAGING_LEVELS >= 3
                 case PGT_l3_page_table:
                 {
                     l3_pgentry_t l3e = l3e_from_intpte(req.val);
-                    okay = mod_l3_entry(va, l3e, mfn);
+                    okay = mod_l3_entry(va, l3e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #endif
@@ -2522,7 +2547,8 @@ int do_mmu_update(
                 case PGT_l4_page_table:
                 {
                     l4_pgentry_t l4e = l4e_from_intpte(req.val);
-                    okay = mod_l4_entry(va, l4e, mfn);
+                    okay = mod_l4_entry(va, l4e, mfn,
+                                        cmd == MMU_PT_UPDATE_PRESERVE_AD);
                 }
                 break;
 #endif
@@ -2652,7 +2678,7 @@ static int create_grant_pte_mapping(
     }
 
     ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v) )
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
     {
         put_page_type(page);
         rc = GNTST_general_error;
@@ -2720,9 +2746,11 @@ static int destroy_grant_pte_mapping(
     }
 
     /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, 
-                      (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
-                      d->vcpu[0] /* Change if we go to per-vcpu shadows. */)) )
+    if ( unlikely(!UPDATE_ENTRY
+                  (l1, 
+                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
+                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                   0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", va);
         put_page_type(page);
@@ -2758,7 +2786,7 @@ static int create_grant_va_mapping(
         return GNTST_general_error;
     }
     ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v);
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
     guest_unmap_l1e(v, pl1e);
     pl1e = NULL;
 
@@ -2796,7 +2824,7 @@ static int replace_grant_va_mapping(
     }
 
     /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v)) )
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
         rc = GNTST_general_error;
@@ -2860,7 +2888,8 @@ int replace_grant_host_mapping(
     }
     ol1e = *pl1e;
 
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), gl1mfn, curr)) )
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
     {
         MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
         guest_unmap_l1e(curr, pl1e);
@@ -2948,7 +2977,7 @@ int do_update_va_mapping(unsigned long v
 
     pl1e = guest_map_l1e(v, va, &gl1mfn);
 
-    if ( unlikely(!pl1e || !mod_l1_entry(pl1e, val, gl1mfn)) )
+    if ( unlikely(!pl1e || !mod_l1_entry(pl1e, val, gl1mfn, 0)) )
         rc = -EINVAL;
 
     if ( pl1e )
@@ -3517,7 +3546,7 @@ static int ptwr_emulated_update(
     else
     {
         ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v) )
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
             BUG();
     }
 

[-- Attachment #3: linux_mprotect_perf.patch --]
[-- Type: text/plain, Size: 4573 bytes --]

diff -r 61c96456a3e1 include/xen/interface/xen.h
--- a/include/xen/interface/xen.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/xen/interface/xen.h	Sat Jan 12 20:01:14 2008 -0700
@@ -168,9 +168,13 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits in the PTE are preserved (ORed).
  */
-#define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
-#define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+#define MMU_NORMAL_PT_UPDATE      0 /* checked '*ptr = val'. ptr is MA.      */
+#define MMU_MACHPHYS_UPDATE       1 /* ptr = MA of frame to modify entry for */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* '*ptr = val', preserve (OR) A/D bits  */
 
 /*
  * MMU EXTENDED OPERATIONS
diff -r 61c96456a3e1 include/asm-i386/mach-xen/asm/pgtable.h
--- a/include/asm-i386/mach-xen/asm/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-i386/mach-xen/asm/pgtable.h	Mon Jan 07 17:47:31 2008 -0700
@@ -512,6 +512,12 @@ int touch_pte_range(struct mm_struct *mm
                     unsigned long address,
                     unsigned long size);
 
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long addr, unsigned long end, pgprot_t newprot);
+
+#define arch_change_pte_range(mm, pmd, addr, end, newprot)	\
+		xen_change_pte_range(mm, pmd, addr, end, newprot)
+
 #define io_remap_pfn_range(vma,from,pfn,size,prot) \
 direct_remap_pfn_range(vma,from,pfn,size,prot,DOMID_IO)
 
diff -r 61c96456a3e1 include/asm-x86_64/mach-xen/asm/pgtable.h
--- a/include/asm-x86_64/mach-xen/asm/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-x86_64/mach-xen/asm/pgtable.h	Mon Jan 07 17:47:41 2008 -0700
@@ -541,6 +541,12 @@ int touch_pte_range(struct mm_struct *mm
                     unsigned long address,
                     unsigned long size);
 
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long addr, unsigned long end, pgprot_t newprot);
+
+#define arch_change_pte_range(mm, pmd, addr, end, newprot)	\
+		xen_change_pte_range(mm, pmd, addr, end, newprot)
+
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
 		direct_remap_pfn_range(vma,vaddr,pfn,size,prot,DOMID_IO)
 
diff -r 61c96456a3e1 include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h	Thu Dec 20 16:58:14 2007 +0000
+++ b/include/asm-generic/pgtable.h	Mon Jan 07 17:48:00 2008 -0700
@@ -188,6 +188,10 @@ static inline void ptep_set_wrprotect(st
 })
 #endif
 
+#ifndef arch_change_pte_range
+#define arch_change_pte_range(mm, pmd, addr, end, newprot) 0
+#endif
+
 #ifndef __ASSEMBLY__
 /*
  * When walking page tables, we usually want to skip any p?d_none entries;
diff -r 61c96456a3e1 arch/i386/mm/hypervisor.c
--- a/arch/i386/mm/hypervisor.c	Thu Dec 20 16:58:14 2007 +0000
+++ b/arch/i386/mm/hypervisor.c	Sat Jan 12 20:01:40 2008 -0700
@@ -548,3 +548,33 @@ int write_ldt_entry(void *ldt, int entry
 		mach_lp, (u64)entry_a | ((u64)entry_b<<32));
 }
 #endif
+
+#define MAX_BATCHED_FULL_PTES 32
+
+int xen_change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+			 unsigned long addr, unsigned long end, pgprot_t newprot)
+{
+	int rc = 0, i = 0;
+	mmu_update_t u[MAX_BATCHED_FULL_PTES];
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	do {
+		if (pte_present(*pte)) {
+			u[i].ptr = virt_to_machine(pte) | MMU_PT_UPDATE_PRESERVE_AD;
+			u[i].val = __pte_val(pte_modify(*pte, newprot));
+			if (++i == MAX_BATCHED_FULL_PTES) {
+				if ((rc = HYPERVISOR_mmu_update(
+					&u[0], i, NULL, DOMID_SELF)) != 0)
+					break;
+				i = 0;
+			}
+		}
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	if (i)
+		rc = HYPERVISOR_mmu_update( &u[0], i, NULL, DOMID_SELF);
+	pte_unmap_unlock(pte - 1, ptl);
+	BUG_ON(rc && rc != -ENOSYS);
+	return !rc;
+}
diff -r 61c96456a3e1 mm/mprotect.c
--- a/mm/mprotect.c	Thu Dec 20 16:58:14 2007 +0000
+++ b/mm/mprotect.c	Mon Jan 07 17:48:42 2008 -0700
@@ -75,6 +75,8 @@ static inline void change_pmd_range(stru
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		if (arch_change_pte_range(mm, pmd, addr, next, newprot))
 			continue;
 		change_pte_range(mm, pmd, addr, next, newprot);
 	} while (pmd++, addr = next, addr != end);

[-- Attachment #4: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-13  3:08   ` Bruce Rogers
@ 2008-01-13  9:21     ` Keir Fraser
  2008-01-21 15:41     ` Rik van Riel
  1 sibling, 0 replies; 12+ messages in thread
From: Keir Fraser @ 2008-01-13  9:21 UTC (permalink / raw)
  To: Bruce Rogers, xen-devel

On 13/1/08 03:08, "Bruce Rogers" <brogers@novell.com> wrote:

> I made a few tweaks to your patch as follows:
> As I see it we need to preserve the existing AD bits but additionally OR in
> the AD bits being passed in (see behavior of pte_modify(), where it preserves
> current AD in use of _PAGE_CHG_MASK along with ORing in newprot bits, which
> include AD bits (A at least).)
> See update_intpte() for this change.

This way is strictly more expressive (you can get my behaviour by clearing
A/D bits in the passed-in 'val'). So seems good to me.

> Also we need to remove cmd value from req.ptr in do_mmu_update..

Yes, missed that.

Looks good!

 -- Keir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-13  3:08   ` Bruce Rogers
  2008-01-13  9:21     ` Keir Fraser
@ 2008-01-21 15:41     ` Rik van Riel
  2008-01-22 18:55       ` Rik van Riel
  1 sibling, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2008-01-21 15:41 UTC (permalink / raw)
  To: Bruce Rogers; +Cc: dshaks, xen-devel

On Sat, 12 Jan 2008 20:08:06 -0700
"Bruce Rogers" <brogers@novell.com> wrote:

> I made a few tweaks to your patch as follows:
> As I see it we need to preserve the existing AD bits but additionally OR in the AD bits being passed in (see behavior of pte_modify(), where it preserves current AD in use of _PAGE_CHG_MASK along with ORing in newprot bits, which include AD bits (A at least).)
> See update_intpte() for this change.
> 
> Also we need to remove cmd value from req.ptr in do_mmu_update..
> 
> Updated Linux patch is also included.
> 
> Let me know if I'm missing the point some how.

I tested a quick backport of these patches to XEN 3.1.2 and
kernel 2.6.18.

Unfortunately, performance and scalability do not seem to
have improved with these patches, with the SAP mprotect 
test program.  I do not know why - maybe their program
calls mprotect with really small areas so the batching
does not do much?

This is on a quad CPU Intel x86-64 system.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-21 15:41     ` Rik van Riel
@ 2008-01-22 18:55       ` Rik van Riel
  2008-01-22 21:51         ` Bruce Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2008-01-22 18:55 UTC (permalink / raw)
  To: Rik van Riel; +Cc: dshaks, bmarson, xen-devel, Bruce Rogers

On Mon, 21 Jan 2008 10:41:59 -0500
Rik van Riel <riel@redhat.com> wrote:

> I tested a quick backport of these patches to XEN 3.1.2 and
> kernel 2.6.18.
> 
> Unfortunately, performance and scalability do not seem to
> have improved with these patches,

OK, it turned out the guys with the SAP mprotect benchmark program
made the mistake of only upgrading the dom0 kernel and not the guest.

With both dom0 and guest, mprotect performance has improved slightly
on our dual cpu, dual core (4 cores total) Intel x86-64 test system:

CPULOAD    WITH     
STREAMS    PATCH     STANDARD
1        13108.77    11188.68
2        13135.63    11146.06
3        10522.82     9910.10
4         8622.47     9399.66
6         6571.49     7078.12
8         5141.23     5389.15
10        4514.37     4372.87
12        4226.79     3802.98

So we have about a 10-15% performance improvement at some numbers of
streams, and a performance decrease at some other numbers of streams.
Over-all performance impact of the patch seems to be about even :(

I wonder if this is just an artifact of the system we tested this on,
or if it can be seen across multiple systems.

Bruce, what kinds of systems have you tested the patch on and what
kind of performance numbers are you seeing?

-- 
All Rights Reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-22 18:55       ` Rik van Riel
@ 2008-01-22 21:51         ` Bruce Rogers
  2008-01-28 17:53           ` Hannes Kuehnemund
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Rogers @ 2008-01-22 21:51 UTC (permalink / raw)
  To: Rik van Riel; +Cc: dshaks, bmarson, xen-devel

Your results are quite different from what I've seen. But then again I've been doing slightly different scaling tests, and of course the hardware is different.

In the original problem report we received from SAP, the # of cpus was varied while running their test which runs 3 cpuload streams on each cpu.  I've continued down that path since it shows a steady degradation as more cpus are added.  I've not played with varying the number of cpuload streams in my testing.
Other than that, it sounds like we are running the same SAP test (cpu_load.sh).

Here's what I've seen (numbers are the throughput per cpu, test assigns 3 streams per cpu, test was run in Dom0, as it was in the SAP problem report):
#cpus      native    with patch      without patch
    1         9434       8264                6095
    2         9884       8227                4530
    3         9790       8008                2962
    4         9392       7531                1801
<intervening cpu counts not test, trend seemed obvious>
    8         6507       5870                 669

So as you can see the performance degradation I am seeing is quite marked, and the benefit of the patch quite consistent and obviously beneficial.
The machine I used is a 8 processor  (4 dual core AMD Opteron 8218 CPUs, ea. w/ 1MB L2 Cache) HP Proliant Server with 16GB memory.
I do wish I had been able to do testing with more than 1 machine; perhaps that would have provided better insights into the performance problem.  Out lab is in a state of flux right now as we're moving floors and I hence probably won't be able to do a comparative run on other hardware very soon.

I believe SAP is doing some broad based testing of their own which may help identify how much the hardware has to do with the performance of this test.

What results do you get with native (non-smp) on that same hardware? Have you done runs on other hardware and seen consistent results?

- Bruce

>>> On 1/22/2008 at 11:55 AM, in message
<20080122135534.1225627e@cuia.boston.redhat.com>, Rik van Riel
<riel@redhat.com> wrote:
> On Mon, 21 Jan 2008 10:41:59 -0500
> Rik van Riel <riel@redhat.com> wrote:
> 
>> I tested a quick backport of these patches to XEN 3.1.2 and
>> kernel 2.6.18.
>> 
>> Unfortunately, performance and scalability do not seem to
>> have improved with these patches,
> 
> OK, it turned out the guys with the SAP mprotect benchmark program
> made the mistake of only upgrading the dom0 kernel and not the guest.
> 
> With both dom0 and guest, mprotect performance has improved slightly
> on our dual cpu, dual core (4 cores total) Intel x86-64 test system:
> 
> CPULOAD    WITH     
> STREAMS    PATCH     STANDARD
> 1        13108.77    11188.68
> 2        13135.63    11146.06
> 3        10522.82     9910.10
> 4         8622.47     9399.66
> 6         6571.49     7078.12
> 8         5141.23     5389.15
> 10        4514.37     4372.87
> 12        4226.79     3802.98
> 
> So we have about a 10-15% performance improvement at some numbers of
> streams, and a performance decrease at some other numbers of streams.
> Over-all performance impact of the patch seems to be about even :(
> 
> I wonder if this is just an artifact of the system we tested this on,
> or if it can be seen across multiple systems.
> 
> Bruce, what kinds of systems have you tested the patch on and what
> kind of performance numbers are you seeing?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-22 21:51         ` Bruce Rogers
@ 2008-01-28 17:53           ` Hannes Kuehnemund
  2008-01-28 20:00             ` Rik van Riel
  0 siblings, 1 reply; 12+ messages in thread
From: Hannes Kuehnemund @ 2008-01-28 17:53 UTC (permalink / raw)
  To: Bruce Rogers; +Cc: dshaks, bmarson, xen-devel

On 22.01.2008 22:51, Bruce Rogers wrote:
> [...]
> I believe SAP is doing some broad based testing of their own which may help identify how much the hardware has to do with the performance of this test.

We are currently testing the patched kernels we got from Red Hat and 
Novell with several Intel Tigerton (Quad Socket, Quad Core, shared L2 
Cache @ 2.93GHz) machines. I can't say for sure, if we'll have AMD too. 
It'll take a week until I can provide exact results.

Thanks,
   Hannes

> What results do you get with native (non-smp) on that same hardware? Have you done runs on other hardware and seen consistent results?
> 
> - Bruce
> 
>>>> On 1/22/2008 at 11:55 AM, in message
> <20080122135534.1225627e@cuia.boston.redhat.com>, Rik van Riel
> <riel@redhat.com> wrote:
>> On Mon, 21 Jan 2008 10:41:59 -0500
>> Rik van Riel <riel@redhat.com> wrote:
>>
>>> I tested a quick backport of these patches to XEN 3.1.2 and
>>> kernel 2.6.18.
>>>
>>> Unfortunately, performance and scalability do not seem to
>>> have improved with these patches,
>> OK, it turned out the guys with the SAP mprotect benchmark program
>> made the mistake of only upgrading the dom0 kernel and not the guest.
>>
>> With both dom0 and guest, mprotect performance has improved slightly
>> on our dual cpu, dual core (4 cores total) Intel x86-64 test system:
>>
>> CPULOAD    WITH     
>> STREAMS    PATCH     STANDARD
>> 1        13108.77    11188.68
>> 2        13135.63    11146.06
>> 3        10522.82     9910.10
>> 4         8622.47     9399.66
>> 6         6571.49     7078.12
>> 8         5141.23     5389.15
>> 10        4514.37     4372.87
>> 12        4226.79     3802.98
>>
>> So we have about a 10-15% performance improvement at some numbers of
>> streams, and a performance decrease at some other numbers of streams.
>> Over-all performance impact of the patch seems to be about even :(
>>
>> I wonder if this is just an artifact of the system we tested this on,
>> or if it can be seen across multiple systems.
>>
>> Bruce, what kinds of systems have you tested the patch on and what
>> kind of performance numbers are you seeing?
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

-- 

* Hannes Kuehnemund
* SAP LinuxLab
* SAP AG
* Dietmar Hopp Allee 16
* 69190 Walldorf, Germany
T  +49 6227 7-40615
F  +49 6227 78-34584
mailto: hannes.kuehnemund@sap.com
http://www.sap.com

Sitz der Gesellschaft/Registered Office: Walldorf, Germany
Vorstand/SAP Executive Board: Henning Kagermann (Sprecher/CEO),
Léo Apotheker (stellvertretender Sprecher/Deputy CEO),
Werner Brandt, Claus Heinrich, Gerhard Oswald, Peter Zencke
Vorsitzender des Aufsichtsrats/Chairperson of the SAP
Supervisory Board: Hasso Plattner
Registergericht/Commercial Register Mannheim No HRB 350269

Diese E-Mail kann Betriebs- oder Geschäftsgeheimnisse oder
sonstige vertrauliche Informationen enthalten. Sollten Sie
diese E-Mail irrtümlich erhalten haben, ist Ihnen eine
Kenntnisnahme des Inhalts, eine Vervielfältigung oder
Weitergabe der E-Mail ausdrücklich untersagt. Bitte
benachrichtigen Sie uns und vernichten Sie die empfangene
E-Mail. Vielen Dank.

This e-mail may contain trade secrets or privileged,
undisclosed, or otherwise confidential information. If you have
received this e-mail in error, you are hereby notified that any
review, copying, or distribution of it is strictly prohibited.
Please inform us immediately and destroy the original
transmittal. Thank you for your cooperation.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] respin of mprotect performance patch
  2008-01-28 17:53           ` Hannes Kuehnemund
@ 2008-01-28 20:00             ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2008-01-28 20:00 UTC (permalink / raw)
  To: Hannes Kuehnemund; +Cc: dshaks, bmarson, xen-devel, Bruce Rogers

On Mon, 28 Jan 2008 18:53:41 +0100
Hannes Kuehnemund <hannes.kuehnemund@sap.com> wrote:
> On 22.01.2008 22:51, Bruce Rogers wrote:
> > [...]
> > I believe SAP is doing some broad based testing of their own which
> > may help identify how much the hardware has to do with the
> > performance of this test.
> 
> We are currently testing the patched kernels we got from Red Hat and 
> Novell with several Intel Tigerton (Quad Socket, Quad Core, shared L2 
> Cache @ 2.93GHz) machines. I can't say for sure, if we'll have AMD
> too. It'll take a week until I can provide exact results.

I would like to know exactly which of our patched
kernels you are trying.

The older versions on my people page had bad performance
due to an unrelated bug.  Version 2.6.18-69.el5.bz412731.3
and newer should work right.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-01-28 20:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-08  1:05 [PATCH] respin of mprotect performance patch Bruce Rogers
2008-01-10 15:52 ` Keir Fraser
2008-01-10 17:13   ` Bruce Rogers
2008-01-10 17:42     ` Keir Fraser
2008-01-12 11:47 ` Keir Fraser
2008-01-13  3:08   ` Bruce Rogers
2008-01-13  9:21     ` Keir Fraser
2008-01-21 15:41     ` Rik van Riel
2008-01-22 18:55       ` Rik van Riel
2008-01-22 21:51         ` Bruce Rogers
2008-01-28 17:53           ` Hannes Kuehnemund
2008-01-28 20:00             ` Rik van Riel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.