All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Nested Paging Live Migration
@ 2007-06-01 15:05 Huang2, Wei
  2007-06-01 16:17 ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Huang2, Wei @ 2007-06-01 15:05 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2604 bytes --]

The attached file supports AMD-V nested paging live migration. Please
comment. I will create an updated version after collecting feedbacks.
 
 arch/x86/hvm/hvm.c            |    2 
 arch/x86/hvm/io.c             |    2 
 arch/x86/hvm/svm/svm.c        |    3 
 arch/x86/mm.c                 |   12 +-
 arch/x86/mm/hap/hap.c         |  220
+++++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/p2m.c             |   92 +++++++++++++++--
 arch/x86/mm/paging.c          |   12 ++
 include/asm-x86/domain.h      |    8 +
 include/asm-x86/grant_table.h |    2 
 include/asm-x86/hap.h         |    1 
 include/asm-x86/p2m.h         |    5 
 include/asm-x86/page.h        |    2 
 include/asm-x86/paging.h      |    2 
 include/asm-x86/shadow.h      |    7 -
 14 files changed, 341 insertions(+), 29 deletions(-)

Design:
1. We handle four live migration operators as follow:
* XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY
** Allocates log_dirty_bitmap
** Set log dirty bit in paging mode
** Goes through the P2M table and mark all physical memory as NOT
WRITABLE
** Continues to run the guest as usual
 
* XEN_DOMCTL_SHADOW_OP_PEEK
** There is nothing special here. It is pretty similar to shadow code.
Just copy dirty bitmap information to live migration handler.
 
* XEN_DOMCTL_SHADOW_OP_CLEAN
** Clean dirty bitmap to all 0's.
** Goes through the P2M table and marks all physical memory as NOT
WRITABLE
** Continues to run the guest as usual
 
* XEN_DOMCTL_SHADOW_OP_OFF
** Fix P2M table and mark all physical memory as WRITABLE
** De-allocate dirty bitmap resources
** Clear log dirty bit in paging mode
 
2. We handle nested page fault as follow:
* Nested Paging Fault
** If it is MMIO space, call handle_mmio()
** Otherwise, call p2m_fix_table() to mark a specific page as WRITABLE.
Additionally, we call paging_mark_dirty() to update dirty bitmap. By
doing this, we only receive one NPF for each dirty page (in each cycle).
 
The following areas require special attention:
1. paging_mark_dirty()
Currently, paging_mark_dirty() dispatches to sh_mark_dirty() or
hap_mark_dirty() based on paging support. I personally prefer a function
pointer. However, current paging interface only provides a function
pointer for vcpu-level functions, not for domain-level functions. This
is a bit annoying. 
 
2. locking in p2m_set_l1e_flags()
p2m_set_l1e_flags(), which is invoked by hap.c, calls
hap_write_p2m_entry(). hap_lock() is called twice. I currently remove
hap_lock() in hap_write_p2m_entry(). A better solution is needed here.
 
 
Thanks,
 
-Wei

[-- Attachment #1.2: Type: text/html, Size: 5858 bytes --]

[-- Attachment #2: npt_live_migrate_RFC.txt --]
[-- Type: text/plain, Size: 21464 bytes --]

diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Wed May 30 10:09:48 2007 -0500
@@ -559,7 +559,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Wed May 30 10:09:48 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Wed May 30 10:09:48 2007 -0500
@@ -1013,8 +1013,7 @@ static int svm_do_nested_pgfault(paddr_t
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    return p2m_fix_table(current->domain, gpa);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/mm.c	Wed May 30 10:09:48 2007 -0500
@@ -1552,7 +1552,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1598,7 +1598,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2053,7 +2053,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2085,7 +2085,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2420,7 +2420,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -2959,7 +2959,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Wed May 30 10:09:48 2007 -0500
@@ -385,6 +385,211 @@ void hap_destroy_monitor_table(struct vc
 }
 
 /************************************************/
+/*            HAP LOG DIRTY SUPPORT             */
+/************************************************/
+void hap_mark_dirty(struct domain *d, mfn_t gmfn)
+{
+    unsigned long pfn;
+    int do_locking;
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    /* Although this is an externally visible function, we do not know
+     * whether the lock will be held when it is called (since it
+     * can be called from __hvm_copy during emulation).
+     * If the lock isn't held, take it for the duration of the call. */
+    do_locking = !hap_locked_by_me(d);
+    if ( do_locking ) 
+    { 
+        hap_lock(d);
+        /* Check the mode again with the lock held */ 
+        if ( unlikely(!paging_mode_log_dirty(d)) )
+        {
+            hap_unlock(d);
+            return;
+        }
+    }
+
+    ASSERT(d->arch.paging.hap.dirty_bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    if ( likely(pfn < d->arch.paging.hap.dirty_bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.hap.dirty_bitmap) )
+        {
+            d->arch.paging.hap.dirty_count++;
+        }
+    }
+    else
+    {
+        HAP_PRINTK("hap_mark_dirty OOR! "
+                   "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                   "owner=%d c=%08x t=%" PRtype_info "\n",
+                   mfn_x(gmfn), 
+                   pfn, 
+                   d->arch.paging.hap.dirty_bitmap_size,
+                   d->domain_id,
+                   (page_get_owner(mfn_to_page(gmfn))
+                    ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                    : -1),
+                   mfn_to_page(gmfn)->count_info, 
+                   mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+
+    if ( do_locking ) hap_unlock(d);
+}
+
+int hap_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.hap.dirty_bitmap == NULL);
+    
+    d->arch.paging.hap.dirty_bitmap_size = 
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.hap.dirty_bitmap = 
+        xmalloc_array(unsigned long,
+                      d->arch.paging.hap.dirty_bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.hap.dirty_bitmap == NULL ) 
+    {
+	d->arch.paging.hap.dirty_bitmap_size = 0;
+        return -ENOMEM;
+    }
+    
+    memset(d->arch.paging.hap.dirty_bitmap, 0, 
+           d->arch.paging.hap.dirty_bitmap_size/8);
+    
+    return 0;
+}
+
+void hap_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.hap.dirty_bitmap_size = 0;
+    if ( d->arch.paging.hap.dirty_bitmap )
+    {
+	xfree(d->arch.paging.hap.dirty_bitmap);
+	d->arch.paging.hap.dirty_bitmap = NULL;
+    }
+}
+
+int hap_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    hap_lock(d);
+
+    ret = hap_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+	hap_free_log_dirty_bitmap(d);
+	goto out;
+    }
+
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+
+    /* mark physical memory as not writable */
+    p2m_set_l1e_flags(d, __PAGE_HYPERVISOR_NOT_WRITABLE|_PAGE_USER);
+    flush_tlb_all_pge();
+
+ out:
+    hap_unlock(d);
+    domain_unpause(d);
+    
+    return ret;
+}
+
+int hap_log_dirty_disable(struct domain *d)
+{
+    domain_pause(d);
+    hap_lock(d);
+    if ( paging_mode_log_dirty(d) )
+	hap_free_log_dirty_bitmap(d);
+
+    /* turn off PG_log_dirty bit in paging mode */
+    d->arch.paging.mode &= ~PG_log_dirty;
+
+    /* recover P2M table to normal mode */
+    p2m_set_l1e_flags(d, __PAGE_HYPERVISOR|_PAGE_USER);
+
+    hap_unlock(d);
+    domain_unpause(d);
+
+    return 1;
+}
+
+int hap_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, ret = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    hap_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+    sc->stats.fault_count = d->arch.paging.hap.fault_count;
+    sc->stats.dirty_count = d->arch.paging.hap.dirty_count;
+
+    if ( clean ) 
+    {
+	d->arch.paging.hap.fault_count = 0;
+	d->arch.paging.hap.dirty_count = 0;
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        peek = 0; /* caller just wants to clean the state or access stats */
+    
+    if ( (peek || clean) && (d->arch.paging.hap.dirty_bitmap == NULL) ) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    if ( sc->pages > d->arch.paging.hap.dirty_bitmap_size )
+        sc->pages = d->arch.paging.hap.dirty_bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clean in 1KB chunks for L1 cache */
+    for ( i = 0; i < sc->pages; i += CHUNK ) {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+        
+        if ( likely(peek) ) {
+            if ( copy_to_guest_offset(
+                 sc->dirty_bitmap, i/8,
+                 (uint8_t *)d->arch.paging.hap.dirty_bitmap + (i/8), bytes) )
+            {
+                ret = -EFAULT;
+                goto out;
+            }
+        }
+            
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.hap.dirty_bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+    /* mark physical memory as not writable */
+    if ( clean ) {
+        p2m_set_l1e_flags(d, __PAGE_HYPERVISOR_NOT_WRITABLE|_PAGE_USER);
+        flush_tlb_all_pge();
+    }
+    
+
+ out:
+    hap_unlock(d);
+    domain_unpause(d);
+    return ret;    
+}
+/************************************************/
 /*          HAP DOMAIN LEVEL FUNCTIONS          */
 /************************************************/
 void hap_domain_init(struct domain *d)
@@ -504,6 +709,19 @@ int hap_domctl(struct domain *d, xen_dom
     }
     
     switch ( sc->op ) {
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+	if ( paging_mode_log_dirty(d) )
+            if ( (rc = hap_log_dirty_disable(d)) != 0 )
+                return rc;
+	return 0;
+	
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+	return hap_log_dirty_op(d, sc);
+
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+	return hap_log_dirty_enable(d);
+
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
         rc = hap_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
@@ -669,7 +887,6 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 hap_write_p2m_entry(struct vcpu *v, unsigned long gfn, l1_pgentry_t *p,
                     l1_pgentry_t new, unsigned int level)
 {
-    hap_lock(v->domain);
     safe_write_pte(p, new);
 #if CONFIG_PAGING_LEVELS == 3
     /* install P2M in monitor table for PAE Xen */
@@ -680,7 +897,6 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 	
     }
 #endif
-    hap_unlock(v->domain);
 }
 
 /* Entry points into this mode of the hap code. */
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Wed May 30 10:09:48 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,87 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify the flags of l1e. Note that 
+ * physical base address of l1e is intact. This function can be used for 
+ * special purpose, such as marking physical memory as Not-Writable for
+ * tracking dirty pages during live migration. 
+ */
+int p2m_set_l1e_flags(struct domain *d, u32 l1e_flags)
+{
+    mfn_t mfn;
+    struct list_head *entry;
+    struct page_info *page;
+    unsigned long gfn;
+
+    p2m_lock(d);
+
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+    {
+	P2M_ERROR("p2m table has not been allocated for this domain yet!\n");
+	p2m_unlock(d);
+	return -EINVAL;
+    }
+
+    for ( entry = d->page_list.next;
+          entry != &d->page_list;
+          entry = entry->next )
+    {
+        page = list_entry(entry, struct page_info, list);
+        mfn = page_to_mfn(page);
+        gfn = get_gpfn_from_mfn(mfn_x(mfn));
+        if (
+#ifdef __x86_64__
+            (gfn != 0x5555555555555555L)
+#else
+            (gfn != 0x55555555L)
+#endif
+             && gfn != INVALID_M2P_ENTRY
+             && !set_p2m_entry(d, gfn, mfn, l1e_flags) )
+            goto error;
+    }
+
+    p2m_unlock(d);
+    return 0;
+
+ error:
+    P2M_PRINTK("failed to change l1e flags of p2m table, gfn=%05lx, mfn=%"
+               PRI_mfn "\n", gfn, mfn_x(mfn));
+    p2m_unlock(d);
+    return -ENOMEM;
+}
+
+/* This function handles P2M page faults by fixing l1e flags with correct 
+ * values.  It also calls paging_mark_dirty() function to record the dirty
+ * pages.
+ */
+int p2m_fix_table(struct domain *d, paddr_t gpa)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+      
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) ) 
+    {
+        set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER);
+    }
+
+    paging_mark_dirty(d, mfn_x(mfn));
+    
+    p2m_unlock(d);
+
+    return 1; /* successful */
+}
 
 /*
  * Local variables:
diff -r d07ecb861009 -r 41cfce9eeb10 xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Tue May 29 06:02:39 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Wed May 30 10:09:48 2007 -0500
@@ -98,6 +98,18 @@ int paging_enable(struct domain *d, u32 
         return hap_enable(d, mode | PG_HAP_enable);
     else
         return shadow_enable(d, mode | PG_SH_enable);
+}
+
+/* Mark a dirty page for log dirty bitmap during live migration */
+void paging_mark_dirty(struct domain *d, unsigned long gmfn)
+{
+    if ( likely(!paging_mode_log_dirty(d)) )
+        return;
+
+    if ( opt_hap_enabled && is_hvm_domain(d) )
+        hap_mark_dirty(d, _mfn(gmfn));
+    else
+        sh_mark_dirty(d, _mfn(gmfn));
 }
 
 /* Print paging-assistance info to the console */
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Wed May 30 10:09:48 2007 -0500
@@ -129,6 +129,14 @@ struct hap_domain {
     unsigned int      total_pages;  /* number of pages allocated */
     unsigned int      free_pages;   /* number of pages on freelists */
     unsigned int      p2m_pages;    /* number of pages allocates to p2m */
+
+    /* hap log-dirty bitmap */
+    unsigned long    *dirty_bitmap;
+    unsigned int      dirty_bitmap_size;  /* in pages, bit per page */
+    
+    /* hap log-dirty mode statistics */
+    unsigned int      fault_count;
+    unsigned int      dirty_count;
 };
 
 /************************************************/
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Wed May 30 10:09:48 2007 -0500
@@ -31,7 +31,7 @@ int destroy_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/hap.h
--- a/xen/include/asm-x86/hap.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/hap.h	Wed May 30 10:09:48 2007 -0500
@@ -104,6 +104,7 @@ int   hap_enable(struct domain *d, u32 m
 int   hap_enable(struct domain *d, u32 mode);
 void  hap_final_teardown(struct domain *d);
 void  hap_teardown(struct domain *d);
+void  hap_mark_dirty(struct domain *d, mfn_t gmfn);
 void  hap_vcpu_init(struct vcpu *v);
 
 extern struct paging_mode hap_paging_real_mode;
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Wed May 30 10:09:48 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* Configure l1e flags of P2M table */
+int p2m_set_l1e_flags(struct domain *d, u32 flags);
+
+/* Fix P2M table when page faults are related to P2M table entry */
+int p2m_fix_table(struct domain *d, paddr_t gpa);
 
 #endif /* _XEN_P2M_H */
 
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/page.h
--- a/xen/include/asm-x86/page.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/page.h	Wed May 30 10:09:48 2007 -0500
@@ -334,6 +334,8 @@ void setup_idle_pagetable(void);
     (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define __PAGE_HYPERVISOR_NOCACHE \
     (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_ACCESSED)
+#define __PAGE_HYPERVISOR_NOT_WRITABLE \
+    (_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED)
 
 #ifndef __ASSEMBLY__
 
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Wed May 30 10:09:48 2007 -0500
@@ -164,6 +164,8 @@ void paging_final_teardown(struct domain
  * creation. */
 int paging_enable(struct domain *d, u32 mode);
 
+/* Mark dirty pages during live migration */
+void paging_mark_dirty(struct domain *d, unsigned long gmfn);
 
 /* Page fault handler
  * Called from pagefault handler in Xen, and from the HVM trap handlers
diff -r d07ecb861009 -r 41cfce9eeb10 xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Tue May 29 06:02:39 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Wed May 30 10:09:48 2007 -0500
@@ -78,13 +78,6 @@ void shadow_final_teardown(struct domain
 /* Mark a page as dirty in the log-dirty bitmap: called when Xen 
  * makes changes to guest memory on its behalf. */
 void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Nested Paging Live Migration
  2007-06-01 15:05 [RFC] Nested Paging Live Migration Huang2, Wei
@ 2007-06-01 16:17 ` Tim Deegan
  2007-06-06  4:29   ` Wei Huang
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2007-06-01 16:17 UTC (permalink / raw)
  To: Huang2, Wei; +Cc: xen-devel

Hi, 

Thanks for this patch.

At 10:05 -0500 on 01 Jun (1180692316), Huang2, Wei wrote:
> The attached file supports AMD-V nested paging live migration. Please
> comment. I will create an updated version after collecting feedbacks.

Can a lot more log-dirty code (bitmap allocation, clearing, reporting)
be made common?  E.g.: hap_mark_dirty() is virtually identical to
sh_mark_dirty() -- including some recursive locking and associated
comments that are not true in HAP modes.  Maybe give it its own lock to
cover bit-setting?  Probably only the code for clearing the bitmap
(i.e., resetting the trap that will cause us to mark pages dirty) needs
to be split out.

> The following areas require special attention:
> 1. paging_mark_dirty()
> Currently, paging_mark_dirty() dispatches to sh_mark_dirty() or
> hap_mark_dirty() based on paging support. I personally prefer a function
> pointer. However, current paging interface only provides a function
> pointer for vcpu-level functions, not for domain-level functions. This
> is a bit annoying. 

Make it a common function and that should go away. 
  
> 2. locking in p2m_set_l1e_flags()
> p2m_set_l1e_flags(), which is invoked by hap.c, calls
> hap_write_p2m_entry(). hap_lock() is called twice. I currently remove
> hap_lock() in hap_write_p2m_entry(). A better solution is needed here.

Hmm.  Since you don't ever change the monitor table of a HAP domain, it
might be possible to make hap_write_p2m_entry (and
hap.c:p2m_install_entry_in_monitors()) safe without locking.

It is worth noting that this would be a different locking discipline
from the one used in shadow code -- code paths that take both the p2m
lock and the shadow lock always take the p2m lock first (there are some
convolutions in shadow init routines etc to make sure this is true).
If the hap lock is to be taken before the p2m lock that will need some
care and attention in the rest of the code.

> +int p2m_set_l1e_flags(struct domain *d, u32 l1e_flags)
> +{
[...]
> +    for ( entry = d->page_list.next;
> +          entry != &d->page_list;
> +          entry = entry->next )
> +    {

Why not just walk the p2m?  It shouldn't be very sparse.

> +/* This function handles P2M page faults by fixing l1e flags with correct 
> + * values.  It also calls paging_mark_dirty() function to record the dirty
> + * pages.
> + */
> +int p2m_fix_table(struct domain *d, paddr_t gpa)

Can this have a better name?  It's not really fixing anything. Maybe
have this be p2m_set_flags() and the previous function be
p2m_set_flags_global()?

Also maybe the call to mark_dirty could be made from the SVM code, which
is where we're really handling the write?

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xensource.com>, XenSource UK Limited
Registered office c/o EC2Y 5EB, UK; company number 05334508

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Nested Paging Live Migration
  2007-06-01 16:17 ` Tim Deegan
@ 2007-06-06  4:29   ` Wei Huang
  2007-06-06  9:54     ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Huang @ 2007-06-06  4:29 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3955 bytes --]

Retry.

1. Most common code are moved from shadow to paging:
* log dirty related fields (dirty_count ...) are moved to paging_domain
* log_dirty_bitmap allocation, free, peek, and clean
* mark_dirty_page becomes a common function too
* a new lock dirty lock is created to guard these code

2. shadow/hap_log_dirty_enable() and shadow/hap_log_dirty_disable()
These four functions were not changed. However, I really want to create 
two common functions (paging_log_dirty_disable() and 
paging_log_dirty_enable()) for them. To do this, it requires two 
function pointers (*log_dirty_enable() and *log_dirty_disable()), which 
point to shadow-specific code or hap-specific code. For example, 
*log_dirty_enable() points to shadow_log_dirty_enable().

Tim, let me know if you like this approach.

3. p2m_set_l1e_flags() is renamed to p2m_set_flags_global() as 
requested. It does NOT walk P2M. Instead, it still relies on 
set_p2m_entry() to walk P2M table.

The reason: I feel uncomfortable to duplicate the code of 
set_p2m_entry() in this method. Most of them will be same as 
set_p2m_entry() and p2m_next_level(). What is your opinion?


Any comments is welcome. I will create a new patch after collecting them.

Thanks,

-Wei


Tim Deegan wrote:
> Hi,
> 
> Thanks for this patch.
> 
> At 10:05 -0500 on 01 Jun (1180692316), Huang2, Wei wrote:
>  > The attached file supports AMD-V nested paging live migration. Please
>  > comment. I will create an updated version after collecting feedbacks.
> 
> Can a lot more log-dirty code (bitmap allocation, clearing, reporting)
> be made common?  E.g.: hap_mark_dirty() is virtually identical to
> sh_mark_dirty() -- including some recursive locking and associated
> comments that are not true in HAP modes.  Maybe give it its own lock to
> cover bit-setting?  Probably only the code for clearing the bitmap
> (i.e., resetting the trap that will cause us to mark pages dirty) needs
> to be split out.
> 
>  > The following areas require special attention:
>  > 1. paging_mark_dirty()
>  > Currently, paging_mark_dirty() dispatches to sh_mark_dirty() or
>  > hap_mark_dirty() based on paging support. I personally prefer a function
>  > pointer. However, current paging interface only provides a function
>  > pointer for vcpu-level functions, not for domain-level functions. This
>  > is a bit annoying.
> 
> Make it a common function and that should go away.
>  
>  > 2. locking in p2m_set_l1e_flags()
>  > p2m_set_l1e_flags(), which is invoked by hap.c, calls
>  > hap_write_p2m_entry(). hap_lock() is called twice. I currently remove
>  > hap_lock() in hap_write_p2m_entry(). A better solution is needed here.
> 
> Hmm.  Since you don't ever change the monitor table of a HAP domain, it
> might be possible to make hap_write_p2m_entry (and
> hap.c:p2m_install_entry_in_monitors()) safe without locking.
> 
> It is worth noting that this would be a different locking discipline
> from the one used in shadow code -- code paths that take both the p2m
> lock and the shadow lock always take the p2m lock first (there are some
> convolutions in shadow init routines etc to make sure this is true).
> If the hap lock is to be taken before the p2m lock that will need some
> care and attention in the rest of the code.
> 
> 
>  > +/* This function handles P2M page faults by fixing l1e flags with 
> correct
>  > + * values.  It also calls paging_mark_dirty() function to record the 
> dirty
>  > + * pages.
>  > + */
>  > +int p2m_fix_table(struct domain *d, paddr_t gpa)
> 
> Can this have a better name?  It's not really fixing anything. Maybe
> have this be p2m_set_flags() and the previous function be
> p2m_set_flags_global()?
> 
> Also maybe the call to mark_dirty could be made from the SVM code, which
> is where we're really handling the write?
> 
> Cheers,
> 
> Tim.
> 
> --
> Tim Deegan <Tim.Deegan@xensource.com>, XenSource UK Limited
> Registered office c/o EC2Y 5EB, UK; company number 05334508
> 
> 

[-- Attachment #2: npt_live_migration_RFC_2.txt --]
[-- Type: text/plain, Size: 36789 bytes --]

diff -r 7ab0527484c8 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Tue Jun 05 04:35:27 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 7ab0527484c8 xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Tue Jun 05 04:35:45 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 7ab0527484c8 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Tue Jun 05 11:50:28 2007 -0500
@@ -1028,13 +1028,16 @@ int start_svm(struct cpuinfo_x86 *c)
 
 static int svm_do_nested_pgfault(paddr_t gpa, struct cpu_user_regs *regs)
 {
+    struct domain *d;
+
     if (mmio_space(gpa)) {
         handle_mmio(gpa);
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    d = current->domain;
+    paging_mark_dirty(d, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(d, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 7ab0527484c8 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm.c	Tue Jun 05 04:34:56 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 7ab0527484c8 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Tue Jun 05 16:37:53 2007 -0500
@@ -385,6 +385,56 @@ void hap_destroy_monitor_table(struct vc
 }
 
 /************************************************/
+/*             HAP LOG DIRTY SUPPORT            */
+/************************************************/
+int hap_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    hap_lock(d);
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+	paging_free_log_dirty_bitmap(d);
+	goto out;
+    }
+
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+
+    /* mark physical memory as not writable */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+
+ out:
+    hap_unlock(d);
+    domain_unpause(d);
+    
+    return ret;
+}
+
+int hap_log_dirty_disable(struct domain *d)
+{
+    domain_pause(d);
+    hap_lock(d);
+    if ( paging_mode_log_dirty(d) )
+	paging_free_log_dirty_bitmap(d);
+
+    /* turn off PG_log_dirty bit in paging mode */
+    d->arch.paging.mode &= ~PG_log_dirty;
+
+    /* recover P2M table to normal mode */
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+
+    hap_unlock(d);
+    domain_unpause(d);
+
+    return 1;
+}
+
+/************************************************/
 /*          HAP DOMAIN LEVEL FUNCTIONS          */
 /************************************************/
 void hap_domain_init(struct domain *d)
@@ -498,12 +548,16 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+	if ( paging_mode_log_dirty(d) )
+            if ( (rc = hap_log_dirty_disable(d)) != 0 )
+                return rc;
+	return 0;
+
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+	return hap_log_dirty_enable(d);
+
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
         rc = hap_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
diff -r 7ab0527484c8 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Tue Jun 05 11:41:29 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,81 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that  physical base address of l1e is intact. This function can be used for 
+ * special purpose, such as marking physical memory as Not-Writable for
+ * tracking dirty pages during live migration. 
+ */
+int p2m_set_flags_global(struct domain *d, u32 l1e_flags) 
+{
+    mfn_t mfn;
+    struct list_head *entry;
+    struct page_info *page;
+    unsigned long gfn;
+
+    p2m_lock(d);
+
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+    {
+	P2M_ERROR("p2m table has not been allocated for this domain yet!\n");
+	p2m_unlock(d);
+	return -EINVAL;
+    }
+
+    for ( entry = d->page_list.next;
+          entry != &d->page_list;
+          entry = entry->next )
+    {
+        page	= list_entry(entry, struct page_info, list);
+        mfn = page_to_mfn(page);
+        gfn = get_gpfn_from_mfn(mfn_x(mfn));
+        if (
+#ifdef __x86_64__
+            (gfn != 0x5555555555555555L)
+#else
+            (gfn != 0x55555555L)
+#endif
+             && gfn != INVALID_M2P_ENTRY
+             && !set_p2m_entry(d, gfn, mfn, l1e_flags) )
+            goto error;
+    }
+
+    p2m_unlock(d);
+    return 0;
+
+ error:
+    P2M_PRINTK("failed to change l1e flags of p2m table, gfn=%05lx, mfn=%"
+               PRI_mfn "\n", gfn, mfn_x(mfn));
+    p2m_unlock(d);
+    return -ENOMEM;
+}
+
+/* This function goes through p2M table and modifies l1e flags of a specific 
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags) 
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 7ab0527484c8 xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Tue Jun 05 17:20:34 2007 -0500
@@ -25,6 +25,15 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -42,10 +51,200 @@ boolean_param("hap", opt_hap_enabled);
     } while (0)
 
 
+/* log dirty mode lock */
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty_lock);        \
+        (_d)->arch.paging.log_dirty_locker = -1;                  \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty_locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty_locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty_lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty_locker == -1);                    \
+        (_d)->arch.paging.log_dirty_locker = current->processor;             \
+        (_d)->arch.paging.log_dirty_locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty_locker == current->processor); \
+        (_d)->arch.paging.log_dirty_locker = -1;                          \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty_lock);                   \
+    } while (0)
+
+
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.dirty_bitmap == NULL);
+    d->arch.paging.dirty_bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.dirty_bitmap =
+        xmalloc_array(unsigned long,
+                      d->arch.paging.dirty_bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.dirty_bitmap == NULL )
+    {
+        d->arch.paging.dirty_bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.dirty_bitmap, 0,
+           d->arch.paging.dirty_bitmap_size/8);
+
+    return 0;
+}
+
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.dirty_bitmap_size = 0;
+    if ( d->arch.paging.dirty_bitmap )
+    {
+        xfree(d->arch.paging.dirty_bitmap);
+        d->arch.paging.dirty_bitmap = NULL;
+    }
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
+    if ( likely(pfn < d->arch.paging.dirty_bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.dirty_bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                          mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                       "owner=%d c=%08x t=%" PRtype_info "\n",
+                       mfn_x(gmfn), 
+                       pfn, 
+                       d->arch.paging.dirty_bitmap_size,
+                       d->domain_id,
+                       (page_get_owner(mfn_to_page(gmfn))
+                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                        : -1),
+                       mfn_to_page(gmfn)->count_info, 
+                       mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                  (clean) ? "clean" : "peek",
+                  d->domain_id,
+                  d->arch.paging.fault_count, 
+                  d->arch.paging.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.fault_count;
+    sc->stats.dirty_count = d->arch.paging.dirty_count;
+
+    if ( clean )
+    {
+	/* Further operations are required for XEN_DOMCTL_SHADOW_OP_CLEAN. We
+	 * dispatch to next-level log_dirty functions based on paging mode */
+	if ( !paging_mode_hap(d) )
+	    shadow_log_dirty_op_clean(d);
+
+        d->arch.paging.fault_count = 0;
+        d->arch.paging.dirty_count = 0;
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.dirty_bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.dirty_bitmap_size )
+        sc->pages = d->arch.paging.dirty_bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.dirty_bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.dirty_bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
     p2m_init(d);
+    log_dirty_lock_init(d);
     shadow_domain_init(d);
 
     if ( opt_hap_enabled && is_hvm_domain(d) )
@@ -65,11 +264,40 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
-    /* Here, dispatch domctl to the appropriate paging code */
-    if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a paging op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Paging op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+        return paging_log_dirty_op(d, sc);
+	
+    default:
+	/* Dispatch other domctl operations to the appropriate paging code */
+	if ( opt_hap_enabled && is_hvm_domain(d) )
+	    return hap_domctl(d, sc, u_domctl);
+	else
+	    return shadow_domctl(d, sc, u_domctl);
+    }
 }
 
 /* Call when destroying a domain */
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c	Tue Jun 05 17:20:34 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2565,7 +2563,7 @@ void shadow_teardown(struct domain *d)
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
         /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2724,37 +2722,6 @@ static int shadow_test_disable(struct do
     return ret;
 }
 
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
 static int shadow_log_dirty_enable(struct domain *d)
 {
     int ret;
@@ -2784,16 +2751,16 @@ static int shadow_log_dirty_enable(struc
         d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
 #endif
 
-    ret = sh_alloc_log_dirty_bitmap(d);
+    ret = paging_alloc_log_dirty_bitmap(d);
     if ( ret != 0 )
     {
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         goto out;
     }
 
     ret = shadow_one_bit_enable(d, PG_log_dirty);
     if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
 
  out:
     shadow_unlock(d);
@@ -2809,11 +2776,21 @@ static int shadow_log_dirty_disable(stru
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_log_dirty);
     if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
     return ret;
+}
+
+void shadow_log_dirty_op_clean(struct domain *d) 
+{
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_lock(d);
+    shadow_blow_tables(d);
+    shadow_unlock(d);
 }
 
 /**************************************************************************/
@@ -2892,150 +2869,6 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
-        shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3044,27 +2877,6 @@ int shadow_domctl(struct domain *d,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
     int rc, preempted = 0;
-
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
 
     switch ( sc->op )
     {
@@ -3085,10 +2897,6 @@ int shadow_domctl(struct domain *d,
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
-
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
         if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c	Tue Jun 05 04:38:26 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h	Mon Jun 04 17:56:23 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.dirty_bitmap_size) 
+         && test_bit(pfn, d->arch.paging.dirty_bitmap) )
         return 1;
 
     return 0;
diff -r 7ab0527484c8 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Tue Jun 05 04:21:38 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -164,6 +156,19 @@ struct paging_domain {
 
     /* Other paging assistance code will have structs here */
     struct hap_domain    hap;
+
+    /* log-dirty lock */
+    spinlock_t           log_dirty_lock;
+    int                  log_dirty_locker; /* processor which holds the lock */
+    const char          *log_dirty_locker_function; /* func that took it */
+
+    /* log-dirty bitmap */
+    unsigned long *dirty_bitmap;
+    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int fault_count;
+    unsigned int dirty_count;
 };
 
 struct paging_vcpu {
diff -r 7ab0527484c8 xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Tue Jun 05 04:33:38 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 7ab0527484c8 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Tue Jun 05 11:42:54 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* Configure l1e flags of P2M table */
+int p2m_set_flags_global(struct domain *d, u32 flags);
+
+/* Set P2M l1e flags of a specific page */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 7ab0527484c8 xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Tue Jun 05 04:55:23 2007 -0500
@@ -63,6 +63,8 @@
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
 
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
 
@@ -164,6 +166,14 @@ void paging_final_teardown(struct domain
  * creation. */
 int paging_enable(struct domain *d, u32 mode);
 
+/* allocate memory resource for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free memory resource for log dirty */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* mark a page as dirty page */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /* Page fault handler
  * Called from pagefault handler in Xen, and from the HVM trap handlers
diff -r 7ab0527484c8 xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Tue Jun 05 09:58:00 2007 -0500
@@ -75,22 +75,13 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
-
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode
  * has changed, and when bringing up a VCPU for the first time. */
 void shadow_update_paging_modes(struct vcpu *v);
 
+/* handle log_dirty CLEAN operation. */
+void shadow_log_dirty_op_clean(struct domain *d);
 
 /* Remove all mappings of the guest page from the shadows. 
  * This is called from common code.  It does not flush TLBs. */

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Nested Paging Live Migration
  2007-06-06  4:29   ` Wei Huang
@ 2007-06-06  9:54     ` Tim Deegan
  2007-06-07 21:58       ` Huang2, Wei
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2007-06-06  9:54 UTC (permalink / raw)
  To: Wei Huang; +Cc: xen-devel

Hi, 

At 23:29 -0500 on 05 Jun (1181086164), Wei Huang wrote:
> 2. shadow/hap_log_dirty_enable() and shadow/hap_log_dirty_disable()
> These four functions were not changed. However, I really want to create 
> two common functions (paging_log_dirty_disable() and 
> paging_log_dirty_enable()) for them. To do this, it requires two 
> function pointers (*log_dirty_enable() and *log_dirty_disable()), which 
> point to shadow-specific code or hap-specific code. For example, 
> *log_dirty_enable() points to shadow_log_dirty_enable().
> 
> Tim, let me know if you like this approach.

Yep, that seems fine.
 
> 3. p2m_set_l1e_flags() is renamed to p2m_set_flags_global() as 
> requested. It does NOT walk P2M. Instead, it still relies on 
> set_p2m_entry() to walk P2M table.
> 
> The reason: I feel uncomfortable to duplicate the code of 
> set_p2m_entry() in this method. Most of them will be same as 
> set_p2m_entry() and p2m_next_level(). What is your opinion?

I think it'd be fairly easy to do with a few nested loops since it
doesn't need to care about contents or changing the shape of the tree,
or have to handle different PT layouts at run-time.

I was worried about the cost of reading the struct page-info and the m2p
and doing _two_ traverses of the p2m for every frame in the domain; but
I don't suppose that enabling log-dirty mode is too time-critical an
operation. :)

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xensource.com>, XenSource UK Limited
Registered office c/o EC2Y 5EB, UK; company number 05334508

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [RFC] Nested Paging Live Migration
  2007-06-06  9:54     ` Tim Deegan
@ 2007-06-07 21:58       ` Huang2, Wei
  2007-06-08 10:52         ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Huang2, Wei @ 2007-06-07 21:58 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1860 bytes --]

Retry.

All common functions were extracted to paging level. Plus,
p2m_set_flags_global() does NOT rely on set_p2m_entry() anymore. 

Live_migrate_patch_all.txt is the complete patch. To make it clear, I
further splitted it into two small ones (1:
live_migrate_interface_patch.txt, 2: live_migrate_npt_patch.txt).

Please comment. Thanks.

-Wei

Tim Deegan wrote:
> Hi,
> 
> At 23:29 -0500 on 05 Jun (1181086164), Wei Huang wrote:
>> 2. shadow/hap_log_dirty_enable() and shadow/hap_log_dirty_disable()
>> These four functions were not changed. However, I really want to
>> create two common functions (paging_log_dirty_disable() and
>> paging_log_dirty_enable()) for them. To do this, it requires two
>> function pointers (*log_dirty_enable() and *log_dirty_disable()),
>> which point to shadow-specific code or hap-specific code. For
>> example, *log_dirty_enable() points to shadow_log_dirty_enable().
>> 
>> Tim, let me know if you like this approach.
> 
> Yep, that seems fine.
> 
>> 3. p2m_set_l1e_flags() is renamed to p2m_set_flags_global() as
>> requested. It does NOT walk P2M. Instead, it still relies on
>> set_p2m_entry() to walk P2M table.
>> 
>> The reason: I feel uncomfortable to duplicate the code of
>> set_p2m_entry() in this method. Most of them will be same as
>> set_p2m_entry() and p2m_next_level(). What is your opinion?
> 
> I think it'd be fairly easy to do with a few nested loops since it
> doesn't need to care about contents or changing the shape of the
> tree, or have to handle different PT layouts at run-time.  
> 
> I was worried about the cost of reading the struct page-info and the
> m2p and doing _two_ traverses of the p2m for every frame in the
> domain; but I don't suppose that enabling log-dirty mode is too
> time-critical an operation. :)   
> 
> Cheers,
> 
> Tim.

[-- Attachment #2: live_migrate_patch_all.txt --]
[-- Type: text/plain, Size: 47671 bytes --]

diff -r 45516ac94c9f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Wed Jun 06 12:05:42 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 45516ac94c9f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Wed Jun 06 12:05:56 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 45516ac94c9f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Thu Jun 07 06:36:04 2007 -0500
@@ -1033,8 +1033,8 @@ static int svm_do_nested_pgfault(paddr_t
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    paging_mark_dirty(current->domain, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(current->domain, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 45516ac94c9f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm.c	Wed Jun 06 12:05:10 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 45516ac94c9f xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Thu Jun 07 05:37:40 2007 -0500
@@ -49,6 +49,35 @@
 #undef page_to_mfn
 #define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
+/************************************************/
+/*            HAP LOG DIRTY SUPPORT             */
+/************************************************/
+/* hap code to call when log_dirty is enable. return 0 if no problem found. */
+int hap_enable_log_dirty(struct domain *d)
+{
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+
+    return 0;
+}
+
+int hap_disable_log_dirty(struct domain *d)
+{
+    /* log dirty already accquired lock to guard this code */
+    d->arch.paging.mode &= ~PG_log_dirty;
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+    
+    return 1;
+}
+
+void hap_clean_dirty_bitmap(struct domain *d)
+{
+    /* mark physical memory as not writetable and flush the TLB */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+}
 /************************************************/
 /*             HAP SUPPORT FUNCTIONS            */
 /************************************************/
@@ -421,6 +450,10 @@ int hap_enable(struct domain *d, u32 mod
         }
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, hap_enable_log_dirty, hap_disable_log_dirty,
+                          hap_clean_dirty_bitmap);
+
     /* allocate P2m table */
     if ( mode & PG_translate ) {
         rv = p2m_alloc_table(d, hap_alloc_p2m_page, hap_free_p2m_page);
@@ -478,6 +511,8 @@ void hap_teardown(struct domain *d)
                       d->arch.paging.hap.free_pages,
                       d->arch.paging.hap.p2m_pages);
         hap_set_allocation(d, 0, NULL);
+        /* release the log-dirty bitmap of dirty pages */
+        paging_free_log_dirty_bitmap(d);
         HAP_PRINTK("teardown done."
                       "  pages total = %u, free = %u, p2m=%u\n",
                       d->arch.paging.hap.total_pages,
@@ -498,11 +533,6 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
diff -r 45516ac94c9f xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Thu Jun 07 05:57:09 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -497,7 +497,7 @@ static void audit_p2m(struct domain *d)
             /* This m2p entry is stale: the domain has another frame in
              * this physical slot.  No great disaster, but for neatness,
              * blow away the m2p entry. */ 
-            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
+            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY, __PAGE_HYPERVISOR|_PAGE_USER);
         }
 
         if ( test_linear && (gfn <= d->arch.p2m.max_mapped_pfn) )
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,129 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that physical base address of l1e is intact. This function can be used for
+ * special purpose, such as marking physical memory as NOT WRITABLE for
+ * tracking dirty pages during live migration.
+ */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags)
+{
+    unsigned long mfn, gfn;
+    l1_pgentry_t l1e_content;
+    l1_pgentry_t *l1e;
+    l2_pgentry_t *l2e;
+    int i1, i2;
+#if CONFIG_PAGING_LEVELS >= 3
+    l3_pgentry_t *l3e;
+    int i3;
+#if CONFIG_PAGING_LEVELS == 4
+    l4_pgentry_t *l4e;
+    int i4;
+#endif /* CONFIG_PAGING_LEVELS == 4 */
+#endif /* CONFIG_PAGING_LEVELS >= 3 */
+    
+    if ( !paging_mode_translate(d) )
+        return;
+ 
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+        return;
+
+    p2m_lock(d);
+        
+#if CONFIG_PAGING_LEVELS == 4
+    l4e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#elif CONFIG_PAGING_LEVELS == 3
+    l3e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    l2e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#endif
+
+#if CONFIG_PAGING_LEVELS >= 3
+#if CONFIG_PAGING_LEVELS >= 4
+    for ( i4 = 0; i4 < L4_PAGETABLE_ENTRIES; i4++ ) 
+    {
+	if ( !(l4e_get_flags(l4e[i4]) & _PAGE_PRESENT) )
+	{
+	    continue;
+	}
+	l3e = map_domain_page(mfn_x(_mfn(l4e_get_pfn(l4e[i4]))));
+#endif /* now at levels 3 or 4... */
+	for ( i3 = 0; 
+	      i3 < ((CONFIG_PAGING_LEVELS==4) ? L3_PAGETABLE_ENTRIES : 8); 
+	      i3++ )
+	{
+	    if ( !(l3e_get_flags(l3e[i3]) & _PAGE_PRESENT) )
+	    {
+		continue;
+	    }
+	    l2e = map_domain_page(mfn_x(_mfn(l3e_get_pfn(l3e[i3]))));
+#endif /* all levels... */
+	    for ( i2 = 0; i2 < L2_PAGETABLE_ENTRIES; i2++ )
+	    {
+		if ( !(l2e_get_flags(l2e[i2]) & _PAGE_PRESENT) )
+		{
+		    continue;
+		}
+		l1e = map_domain_page(mfn_x(_mfn(l2e_get_pfn(l2e[i2]))));
+		
+		for ( i1 = 0; i1 < L1_PAGETABLE_ENTRIES; i1++, gfn++ )
+		{
+		    if ( !(l1e_get_flags(l1e[i1]) & _PAGE_PRESENT) )
+			continue;
+		    mfn = l1e_get_pfn(l1e[i1]);
+		    gfn = get_gpfn_from_mfn(mfn);
+		    /* create a new 1le entry using l1e_flags */
+		    l1e_content = l1e_from_pfn(mfn, l1e_flags);
+		    paging_write_p2m_entry(d, gfn, &l1e[i1], l1e_content, 1);
+		}
+		unmap_domain_page(l1e);
+	    }
+#if CONFIG_PAGING_LEVELS >= 3
+	    unmap_domain_page(l2e);
+	}
+#if CONFIG_PAGING_LEVELS >= 4
+	unmap_domain_page(l3e);
+    }
+#endif
+#endif
+
+#if CONFIG_PAGING_LEVELS == 4
+    unmap_domain_page(l4e);
+#elif CONFIG_PAGING_LEVELS == 3
+    unmap_domain_page(l3e);
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    unmap_domain_page(l2e);
+#endif
+
+    p2m_unlock(d);
+}
+
+/* This function traces through P2M table and modifies l1e flags of a specific
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 45516ac94c9f xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Thu Jun 07 03:48:49 2007 -0500
@@ -25,6 +25,7 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -41,7 +42,269 @@ boolean_param("hap", opt_hap_enabled);
             debugtrace_printk("pgdebug: %s(): " _f, __func__, ##_a); \
     } while (0)
 
-
+/************************************************/
+/*              LOG DIRTY SUPPORT               */
+/************************************************/
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
+
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty.lock);        \
+        (_d)->arch.paging.log_dirty.locker = -1;                  \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty.locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty.locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty.lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty.locker == -1);                    \
+        (_d)->arch.paging.log_dirty.locker = current->processor;             \
+        (_d)->arch.paging.log_dirty.locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty.locker == current->processor); \
+        (_d)->arch.paging.log_dirty.locker = -1;                          \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty.lock);                   \
+    } while (0)
+
+/* allocate bitmap resources for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.log_dirty.bitmap == NULL);
+    d->arch.paging.log_dirty.bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.log_dirty.bitmap = 
+        xmalloc_array(unsigned long,
+                      d->arch.paging.log_dirty.bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.log_dirty.bitmap == NULL )
+    {
+        d->arch.paging.log_dirty.bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.log_dirty.bitmap, 0,
+           d->arch.paging.log_dirty.bitmap_size/8);
+
+    return 0;
+}
+
+/* free bitmap resources */
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.log_dirty.bitmap_size = 0;
+    if ( d->arch.paging.log_dirty.bitmap )
+    {
+        xfree(d->arch.paging.log_dirty.bitmap);
+        d->arch.paging.log_dirty.bitmap = NULL;
+    }
+}
+
+int paging_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    if ( paging_mode_log_dirty(d) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+        paging_free_log_dirty_bitmap(d);
+        goto out;
+    }
+
+    ret = d->arch.paging.log_dirty.enable_log_dirty(d);
+    if ( ret != 0 )
+        paging_free_log_dirty_bitmap(d);
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return ret;
+}
+
+int paging_log_dirty_disable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+    ret = d->arch.paging.log_dirty.disable_log_dirty(d);
+    if ( !paging_mode_log_dirty(d) )
+        paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+    domain_unpause(d);
+
+    return ret;
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    if ( likely(pfn < d->arch.paging.log_dirty.bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.log_dirty.bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                         mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.log_dirty.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                      "owner=%d c=%08x t=%" PRtype_info "\n",
+                      mfn_x(gmfn), 
+                      pfn, 
+                      d->arch.paging.log_dirty.bitmap_size,
+                      d->domain_id,
+                      (page_get_owner(mfn_to_page(gmfn))
+                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                       : -1),
+                      mfn_to_page(gmfn)->count_info, 
+                      mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+    
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                 (clean) ? "clean" : "peek",
+                 d->domain_id,
+                 d->arch.paging.log_dirty.fault_count, 
+                 d->arch.paging.log_dirty.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.log_dirty.fault_count;
+    sc->stats.dirty_count = d->arch.paging.log_dirty.dirty_count;
+    
+    if ( clean )
+    {
+        d->arch.paging.log_dirty.fault_count = 0;
+        d->arch.paging.log_dirty.dirty_count = 0;
+        
+        d->arch.paging.log_dirty.clean_dirty_bitmap(d);
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.log_dirty.bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.log_dirty.bitmap_size )
+        sc->pages = d->arch.paging.log_dirty.bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
+
+/* Note that this function takes three function pointers. Callers must supply
+ * these functions for log dirty code to call. This function usually is 
+ * invoked when paging is enabled. Check shadow_enable() and hap_enable() for 
+ * reference.
+ */
+void paging_log_dirty_init(struct domain *d,
+                           int    (*enable_log_dirty)(struct domain *d),
+                           int    (*disable_log_dirty)(struct domain *d),
+                           void   (*clean_dirty_bitmap)(struct domain *d))
+{
+    /* We initialize log dirty lock first */
+    log_dirty_lock_init(d);
+    
+    d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
+    d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
+    d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
+}
+
+/************************************************/
+/*           CODE FOR PAGING SUPPORT            */
+/************************************************/
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
@@ -65,11 +328,60 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
+    int rc;
+
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+    
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Shadow op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+    
+    /* Code to handle log-dirty. Note that some log dirty operations
+     * piggy-back on shadow operations. For example, when 
+     * XEN_DOMCTL_SHADOW_OP_OFF is called, it first checks whether log dirty
+     * mode is enabled. If does, we disables log dirty and continues with 
+     * shadow code. For this reason, we need to further dispatch domctl 
+     * to next-level paging code (shadow or hap).
+     */
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        return paging_log_dirty_enable(d);	
+	
+    case XEN_DOMCTL_SHADOW_OP_ENABLE:	
+        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
+            return paging_log_dirty_enable(d);
+
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+        if ( paging_mode_log_dirty(d) )
+            if ( (rc = paging_log_dirty_disable(d)) != 0 ) 
+                return rc;
+
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+	return paging_log_dirty_op(d, sc);
+    }
+	
     /* Here, dispatch domctl to the appropriate paging code */
     if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+	return hap_domctl(d, sc, u_domctl);
+    else
+	return shadow_domctl(d, sc, u_domctl);
 }
 
 /* Call when destroying a domain */
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c	Wed Jun 06 12:58:27 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2455,6 +2453,10 @@ int shadow_enable(struct domain *d, u32 
         }        
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, shadow_enable_log_dirty, 
+                          shadow_disable_log_dirty, shadow_clean_dirty_bitmap);
+
     /* Init the P2M table.  Must be done before we take the shadow lock 
      * to avoid possible deadlock. */
     if ( mode & PG_translate )
@@ -2463,6 +2465,7 @@ int shadow_enable(struct domain *d, u32 
         if (rv != 0)
             goto out_unlocked;
     }
+
 
     shadow_lock(d);
 
@@ -2565,7 +2568,7 @@ void shadow_teardown(struct domain *d)
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
         /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2718,98 +2721,6 @@ static int shadow_test_disable(struct do
     domain_pause(d);
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_SH_enable);
-    shadow_unlock(d);
-    domain_unpause(d);
-
-    return ret;
-}
-
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
-static int shadow_log_dirty_enable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    if ( shadow_mode_log_dirty(d) )
-    {
-        ret = -EINVAL;
-        goto out;
-    }
-
-    if ( shadow_mode_enabled(d) )
-    {
-        /* This domain already has some shadows: need to clear them out 
-         * of the way to make sure that all references to guest memory are 
-         * properly write-protected */
-        shadow_blow_tables(d);
-    }
-
-#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
-    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
-     * change an l4e instead of cr3 to switch tables.  Give them the
-     * same optimization */
-    if ( is_pv_32on64_domain(d) )
-        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
-#endif
-
-    ret = sh_alloc_log_dirty_bitmap(d);
-    if ( ret != 0 )
-    {
-        sh_free_log_dirty_bitmap(d);
-        goto out;
-    }
-
-    ret = shadow_one_bit_enable(d, PG_log_dirty);
-    if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return ret;
-}
-
-static int shadow_log_dirty_disable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-    ret = shadow_one_bit_disable(d, PG_log_dirty);
-    if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
@@ -2892,150 +2803,62 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
+/* Shadow specific code which is called in paging_log_dirty_enable().
+ * Return 0 if no problem found.
+ */
+int shadow_enable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */
     shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
+    if ( shadow_mode_enabled(d) )
+    {
+        /* This domain already has some shadows: need to clear them out 
+         * of the way to make sure that all references to guest memory are 
+         * properly write-protected */
         shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
+    }
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
+    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
+     * change an l4e instead of cr3 to switch tables.  Give them the
+     * same optimization */
+    if ( is_pv_32on64_domain(d) )
+        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
+#endif
+    
+    ret = shadow_one_bit_enable(d, PG_log_dirty);
     shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
+
+    return ret;
+}
+
+/* shadow specfic code which is called in paging_log_dirty_disable() */
+int shadow_disable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */    
+    shadow_lock(d);
+    ret = shadow_one_bit_disable(d, PG_log_dirty);
+    shadow_unlock(d);
+    
+    return ret;
+}
+
+/* This function is called when we CLEAN log dirty bitmap. See 
+ * paging_log_dirty_op() for details. 
+ */
+void shadow_clean_dirty_bitmap(struct domain *d)
+{
+    shadow_lock(d);
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_blow_tables(d);
+    shadow_unlock(d);
+}
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3045,33 +2868,9 @@ int shadow_domctl(struct domain *d,
 {
     int rc, preempted = 0;
 
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
-
     switch ( sc->op )
     {
     case XEN_DOMCTL_SHADOW_OP_OFF:
-        if ( shadow_mode_log_dirty(d) )
-            if ( (rc = shadow_log_dirty_disable(d)) != 0 ) 
-                return rc;
         if ( d->arch.paging.mode == PG_SH_enable )
             if ( (rc = shadow_test_disable(d)) != 0 ) 
                 return rc;
@@ -3080,19 +2879,10 @@ int shadow_domctl(struct domain *d,
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TEST:
         return shadow_test_enable(d);
 
-    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
-        return shadow_log_dirty_enable(d);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
 
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
-        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
-            return shadow_log_dirty_enable(d);
         return shadow_enable(d, sc->mode << PG_mode_shift);
 
     case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 12:08:38 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.log_dirty.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 09:12:08 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.log_dirty.bitmap_size) 
+         && test_bit(pfn, d->arch.paging.log_dirty.bitmap) )
         return 1;
 
     return 0;
diff -r 45516ac94c9f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Wed Jun 06 12:34:24 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -134,7 +126,6 @@ struct hap_domain {
 /************************************************/
 /*       p2m handling                           */
 /************************************************/
-
 struct p2m_domain {
     /* Lock that protects updates to the p2m */
     spinlock_t         lock;
@@ -156,16 +147,36 @@ struct p2m_domain {
 /************************************************/
 /*       common paging data structure           */
 /************************************************/
+struct log_dirty_domain {
+    /* log-dirty lock */
+    spinlock_t     lock;
+    int            locker; /* processor that holds the lock */
+    const char    *locker_function; /* func that took it */
+
+    /* log-dirty bitmap to record dirty pages */
+    unsigned long *bitmap;
+    unsigned int   bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int   fault_count;
+    unsigned int   dirty_count;
+
+    /* functions which are paging mode specific */
+    int            (*enable_log_dirty   )(struct domain *d);
+    int            (*disable_log_dirty  )(struct domain *d);
+    void           (*clean_dirty_bitmap )(struct domain *d);
+};
+
 struct paging_domain {
-    u32               mode;  /* flags to control paging operation */
-
+    /* flags to control paging operation */
+    u32                     mode;
     /* extension for shadow paging support */
-    struct shadow_domain shadow;
-
-    /* Other paging assistance code will have structs here */
-    struct hap_domain    hap;
-};
-
+    struct shadow_domain    shadow;
+    /* extension for hardware-assited paging */
+    struct hap_domain       hap;
+    /* log dirty support */
+    struct log_dirty_domain log_dirty;
+};
 struct paging_vcpu {
     /* Pointers to mode-specific entry points. */
     struct paging_mode *mode;
diff -r 45516ac94c9f xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Wed Jun 06 12:03:21 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 45516ac94c9f xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Thu Jun 07 05:37:12 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* set P2M table l1e flags */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags);
+
+/* set P2M table l1e flags for a gpa */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 45516ac94c9f xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Wed Jun 06 12:36:54 2007 -0500
@@ -62,6 +62,9 @@
 #define paging_mode_log_dirty(_d) ((_d)->arch.paging.mode & PG_log_dirty)
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
+
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
@@ -136,6 +139,29 @@ struct paging_mode {
     struct shadow_paging_mode shadow;
 };
 
+/*****************************************************************************
+ * Log dirty code */
+
+/* allocate log dirty bitmap resource for recording dirty pages */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free log dirty bitmap resource */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* enable log dirty */
+int paging_log_dirty_enable(struct domain *d);
+
+/* disable log dirty */
+int paging_log_dirty_disable(struct domain *d);
+
+/* log dirty initialization */
+void paging_log_dirty_init(struct domain *d,
+                           int  (*enable_log_dirty)(struct domain *d),
+                           int  (*disable_log_dirty)(struct domain *d),
+                           void (*clean_dirty_bitmap)(struct domain *d));
+
+/* mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /*****************************************************************************
  * Entry points into the paging-assistance code */
diff -r 45516ac94c9f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Wed Jun 06 12:37:52 2007 -0500
@@ -75,16 +75,14 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
+/* shadow code to call when log dirty is enabled */
+int shadow_enable_log_dirty(struct domain *d);
+
+/* shadow code to call when log dirty is disabled */
+int shadow_disable_log_dirty(struct domain *d);
+
+/* shadow code to call when bitmap is being cleaned */
+void shadow_clean_dirty_bitmap(struct domain *d);
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode

[-- Attachment #3: live_migrate_interface_patch.txt --]
[-- Type: text/plain, Size: 37941 bytes --]

diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Thu Jun 07 03:53:59 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Thu Jun 07 03:53:59 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm.c	Thu Jun 07 03:53:59 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Thu Jun 07 03:53:59 2007 -0500
@@ -498,11 +498,6 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Thu Jun 07 03:53:59 2007 -0500
@@ -25,6 +25,7 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -41,7 +42,269 @@ boolean_param("hap", opt_hap_enabled);
             debugtrace_printk("pgdebug: %s(): " _f, __func__, ##_a); \
     } while (0)
 
-
+/************************************************/
+/*              LOG DIRTY SUPPORT               */
+/************************************************/
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
+
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty.lock);        \
+        (_d)->arch.paging.log_dirty.locker = -1;                  \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty.locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty.locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty.lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty.locker == -1);                    \
+        (_d)->arch.paging.log_dirty.locker = current->processor;             \
+        (_d)->arch.paging.log_dirty.locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty.locker == current->processor); \
+        (_d)->arch.paging.log_dirty.locker = -1;                          \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty.lock);                   \
+    } while (0)
+
+/* allocate bitmap resources for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.log_dirty.bitmap == NULL);
+    d->arch.paging.log_dirty.bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.log_dirty.bitmap = 
+        xmalloc_array(unsigned long,
+                      d->arch.paging.log_dirty.bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.log_dirty.bitmap == NULL )
+    {
+        d->arch.paging.log_dirty.bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.log_dirty.bitmap, 0,
+           d->arch.paging.log_dirty.bitmap_size/8);
+
+    return 0;
+}
+
+/* free bitmap resources */
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.log_dirty.bitmap_size = 0;
+    if ( d->arch.paging.log_dirty.bitmap )
+    {
+        xfree(d->arch.paging.log_dirty.bitmap);
+        d->arch.paging.log_dirty.bitmap = NULL;
+    }
+}
+
+int paging_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    if ( paging_mode_log_dirty(d) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+        paging_free_log_dirty_bitmap(d);
+        goto out;
+    }
+
+    ret = d->arch.paging.log_dirty.enable_log_dirty(d);
+    if ( ret != 0 )
+        paging_free_log_dirty_bitmap(d);
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return ret;
+}
+
+int paging_log_dirty_disable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+    ret = d->arch.paging.log_dirty.disable_log_dirty(d);
+    if ( !paging_mode_log_dirty(d) )
+        paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+    domain_unpause(d);
+
+    return ret;
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    if ( likely(pfn < d->arch.paging.log_dirty.bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.log_dirty.bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                         mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.log_dirty.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                      "owner=%d c=%08x t=%" PRtype_info "\n",
+                      mfn_x(gmfn), 
+                      pfn, 
+                      d->arch.paging.log_dirty.bitmap_size,
+                      d->domain_id,
+                      (page_get_owner(mfn_to_page(gmfn))
+                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                       : -1),
+                      mfn_to_page(gmfn)->count_info, 
+                      mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+    
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                 (clean) ? "clean" : "peek",
+                 d->domain_id,
+                 d->arch.paging.log_dirty.fault_count, 
+                 d->arch.paging.log_dirty.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.log_dirty.fault_count;
+    sc->stats.dirty_count = d->arch.paging.log_dirty.dirty_count;
+    
+    if ( clean )
+    {
+        d->arch.paging.log_dirty.fault_count = 0;
+        d->arch.paging.log_dirty.dirty_count = 0;
+        
+        d->arch.paging.log_dirty.clean_dirty_bitmap(d);
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.log_dirty.bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.log_dirty.bitmap_size )
+        sc->pages = d->arch.paging.log_dirty.bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
+
+/* Note that this function takes three function pointers. Callers must supply
+ * these functions for log dirty code to call. This function usually is 
+ * invoked when paging is enabled. Check shadow_enable() and hap_enable() for 
+ * reference.
+ */
+void paging_log_dirty_init(struct domain *d,
+                           int    (*enable_log_dirty)(struct domain *d),
+                           int    (*disable_log_dirty)(struct domain *d),
+                           void   (*clean_dirty_bitmap)(struct domain *d))
+{
+    /* We initialize log dirty lock first */
+    log_dirty_lock_init(d);
+    
+    d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
+    d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
+    d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
+}
+
+/************************************************/
+/*           CODE FOR PAGING SUPPORT            */
+/************************************************/
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
@@ -65,11 +328,60 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
+    int rc;
+
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+    
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Shadow op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+    
+    /* Code to handle log-dirty. Note that some log dirty operations
+     * piggy-back on shadow operations. For example, when 
+     * XEN_DOMCTL_SHADOW_OP_OFF is called, it first checks whether log dirty
+     * mode is enabled. If does, we disables log dirty and continues with 
+     * shadow code. For this reason, we need to further dispatch domctl 
+     * to next-level paging code (shadow or hap).
+     */
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        return paging_log_dirty_enable(d);	
+	
+    case XEN_DOMCTL_SHADOW_OP_ENABLE:	
+        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
+            return paging_log_dirty_enable(d);
+
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+        if ( paging_mode_log_dirty(d) )
+            if ( (rc = paging_log_dirty_disable(d)) != 0 ) 
+                return rc;
+
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+	return paging_log_dirty_op(d, sc);
+    }
+	
     /* Here, dispatch domctl to the appropriate paging code */
     if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+	return hap_domctl(d, sc, u_domctl);
+    else
+	return shadow_domctl(d, sc, u_domctl);
 }
 
 /* Call when destroying a domain */
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c	Thu Jun 07 03:53:59 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2455,6 +2453,10 @@ int shadow_enable(struct domain *d, u32 
         }        
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, shadow_enable_log_dirty, 
+                          shadow_disable_log_dirty, shadow_clean_dirty_bitmap);
+
     /* Init the P2M table.  Must be done before we take the shadow lock 
      * to avoid possible deadlock. */
     if ( mode & PG_translate )
@@ -2463,6 +2465,7 @@ int shadow_enable(struct domain *d, u32 
         if (rv != 0)
             goto out_unlocked;
     }
+
 
     shadow_lock(d);
 
@@ -2565,7 +2568,7 @@ void shadow_teardown(struct domain *d)
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
         /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2718,98 +2721,6 @@ static int shadow_test_disable(struct do
     domain_pause(d);
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_SH_enable);
-    shadow_unlock(d);
-    domain_unpause(d);
-
-    return ret;
-}
-
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
-static int shadow_log_dirty_enable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    if ( shadow_mode_log_dirty(d) )
-    {
-        ret = -EINVAL;
-        goto out;
-    }
-
-    if ( shadow_mode_enabled(d) )
-    {
-        /* This domain already has some shadows: need to clear them out 
-         * of the way to make sure that all references to guest memory are 
-         * properly write-protected */
-        shadow_blow_tables(d);
-    }
-
-#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
-    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
-     * change an l4e instead of cr3 to switch tables.  Give them the
-     * same optimization */
-    if ( is_pv_32on64_domain(d) )
-        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
-#endif
-
-    ret = sh_alloc_log_dirty_bitmap(d);
-    if ( ret != 0 )
-    {
-        sh_free_log_dirty_bitmap(d);
-        goto out;
-    }
-
-    ret = shadow_one_bit_enable(d, PG_log_dirty);
-    if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return ret;
-}
-
-static int shadow_log_dirty_disable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-    ret = shadow_one_bit_disable(d, PG_log_dirty);
-    if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
@@ -2892,150 +2803,62 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
+/* Shadow specific code which is called in paging_log_dirty_enable().
+ * Return 0 if no problem found.
+ */
+int shadow_enable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */
     shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
+    if ( shadow_mode_enabled(d) )
+    {
+        /* This domain already has some shadows: need to clear them out 
+         * of the way to make sure that all references to guest memory are 
+         * properly write-protected */
         shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
+    }
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
+    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
+     * change an l4e instead of cr3 to switch tables.  Give them the
+     * same optimization */
+    if ( is_pv_32on64_domain(d) )
+        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
+#endif
+    
+    ret = shadow_one_bit_enable(d, PG_log_dirty);
     shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
+
+    return ret;
+}
+
+/* shadow specfic code which is called in paging_log_dirty_disable() */
+int shadow_disable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */    
+    shadow_lock(d);
+    ret = shadow_one_bit_disable(d, PG_log_dirty);
+    shadow_unlock(d);
+    
+    return ret;
+}
+
+/* This function is called when we CLEAN log dirty bitmap. See 
+ * paging_log_dirty_op() for details. 
+ */
+void shadow_clean_dirty_bitmap(struct domain *d)
+{
+    shadow_lock(d);
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_blow_tables(d);
+    shadow_unlock(d);
+}
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3045,33 +2868,9 @@ int shadow_domctl(struct domain *d,
 {
     int rc, preempted = 0;
 
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
-
     switch ( sc->op )
     {
     case XEN_DOMCTL_SHADOW_OP_OFF:
-        if ( shadow_mode_log_dirty(d) )
-            if ( (rc = shadow_log_dirty_disable(d)) != 0 ) 
-                return rc;
         if ( d->arch.paging.mode == PG_SH_enable )
             if ( (rc = shadow_test_disable(d)) != 0 ) 
                 return rc;
@@ -3080,19 +2879,10 @@ int shadow_domctl(struct domain *d,
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TEST:
         return shadow_test_enable(d);
 
-    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
-        return shadow_log_dirty_enable(d);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
 
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
-        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
-            return shadow_log_dirty_enable(d);
         return shadow_enable(d, sc->mode << PG_mode_shift);
 
     case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c	Thu Jun 07 03:53:59 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.log_dirty.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h	Thu Jun 07 03:53:59 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.log_dirty.bitmap_size) 
+         && test_bit(pfn, d->arch.paging.log_dirty.bitmap) )
         return 1;
 
     return 0;
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Thu Jun 07 03:53:59 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -134,7 +126,6 @@ struct hap_domain {
 /************************************************/
 /*       p2m handling                           */
 /************************************************/
-
 struct p2m_domain {
     /* Lock that protects updates to the p2m */
     spinlock_t         lock;
@@ -156,16 +147,36 @@ struct p2m_domain {
 /************************************************/
 /*       common paging data structure           */
 /************************************************/
+struct log_dirty_domain {
+    /* log-dirty lock */
+    spinlock_t     lock;
+    int            locker; /* processor that holds the lock */
+    const char    *locker_function; /* func that took it */
+
+    /* log-dirty bitmap to record dirty pages */
+    unsigned long *bitmap;
+    unsigned int   bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int   fault_count;
+    unsigned int   dirty_count;
+
+    /* functions which are paging mode specific */
+    int            (*enable_log_dirty   )(struct domain *d);
+    int            (*disable_log_dirty  )(struct domain *d);
+    void           (*clean_dirty_bitmap )(struct domain *d);
+};
+
 struct paging_domain {
-    u32               mode;  /* flags to control paging operation */
-
+    /* flags to control paging operation */
+    u32                     mode;
     /* extension for shadow paging support */
-    struct shadow_domain shadow;
-
-    /* Other paging assistance code will have structs here */
-    struct hap_domain    hap;
-};
-
+    struct shadow_domain    shadow;
+    /* extension for hardware-assited paging */
+    struct hap_domain       hap;
+    /* log dirty support */
+    struct log_dirty_domain log_dirty;
+};
 struct paging_vcpu {
     /* Pointers to mode-specific entry points. */
     struct paging_mode *mode;
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Thu Jun 07 03:53:59 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Thu Jun 07 03:53:59 2007 -0500
@@ -62,6 +62,9 @@
 #define paging_mode_log_dirty(_d) ((_d)->arch.paging.mode & PG_log_dirty)
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
+
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
@@ -136,6 +139,29 @@ struct paging_mode {
     struct shadow_paging_mode shadow;
 };
 
+/*****************************************************************************
+ * Log dirty code */
+
+/* allocate log dirty bitmap resource for recording dirty pages */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free log dirty bitmap resource */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* enable log dirty */
+int paging_log_dirty_enable(struct domain *d);
+
+/* disable log dirty */
+int paging_log_dirty_disable(struct domain *d);
+
+/* log dirty initialization */
+void paging_log_dirty_init(struct domain *d,
+                           int  (*enable_log_dirty)(struct domain *d),
+                           int  (*disable_log_dirty)(struct domain *d),
+                           void (*clean_dirty_bitmap)(struct domain *d));
+
+/* mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /*****************************************************************************
  * Entry points into the paging-assistance code */
diff -r 45516ac94c9f -r 9bc6a196ad0e xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Thu Jun 07 03:53:59 2007 -0500
@@ -75,16 +75,14 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
+/* shadow code to call when log dirty is enabled */
+int shadow_enable_log_dirty(struct domain *d);
+
+/* shadow code to call when log dirty is disabled */
+int shadow_disable_log_dirty(struct domain *d);
+
+/* shadow code to call when bitmap is being cleaned */
+void shadow_clean_dirty_bitmap(struct domain *d);
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode

[-- Attachment #4: live_migrate_npt_patch.txt --]
[-- Type: text/plain, Size: 10162 bytes --]

diff -r 9bc6a196ad0e -r 8f3639d92b08 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Thu Jun 07 03:53:59 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Thu Jun 07 06:39:39 2007 -0500
@@ -1033,8 +1033,8 @@ static int svm_do_nested_pgfault(paddr_t
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    paging_mark_dirty(current->domain, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(current->domain, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 9bc6a196ad0e -r 8f3639d92b08 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Thu Jun 07 03:53:59 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Thu Jun 07 06:39:39 2007 -0500
@@ -49,6 +49,35 @@
 #undef page_to_mfn
 #define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
+/************************************************/
+/*            HAP LOG DIRTY SUPPORT             */
+/************************************************/
+/* hap code to call when log_dirty is enable. return 0 if no problem found. */
+int hap_enable_log_dirty(struct domain *d)
+{
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+
+    return 0;
+}
+
+int hap_disable_log_dirty(struct domain *d)
+{
+    /* log dirty already accquired lock to guard this code */
+    d->arch.paging.mode &= ~PG_log_dirty;
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+    
+    return 1;
+}
+
+void hap_clean_dirty_bitmap(struct domain *d)
+{
+    /* mark physical memory as not writetable and flush the TLB */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+}
 /************************************************/
 /*             HAP SUPPORT FUNCTIONS            */
 /************************************************/
@@ -421,6 +450,10 @@ int hap_enable(struct domain *d, u32 mod
         }
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, hap_enable_log_dirty, hap_disable_log_dirty,
+                          hap_clean_dirty_bitmap);
+
     /* allocate P2m table */
     if ( mode & PG_translate ) {
         rv = p2m_alloc_table(d, hap_alloc_p2m_page, hap_free_p2m_page);
@@ -478,6 +511,8 @@ void hap_teardown(struct domain *d)
                       d->arch.paging.hap.free_pages,
                       d->arch.paging.hap.p2m_pages);
         hap_set_allocation(d, 0, NULL);
+        /* release the log-dirty bitmap of dirty pages */
+        paging_free_log_dirty_bitmap(d);
         HAP_PRINTK("teardown done."
                       "  pages total = %u, free = %u, p2m=%u\n",
                       d->arch.paging.hap.total_pages,
diff -r 9bc6a196ad0e -r 8f3639d92b08 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Thu Jun 07 03:53:59 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Thu Jun 07 06:39:39 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -497,7 +497,7 @@ static void audit_p2m(struct domain *d)
             /* This m2p entry is stale: the domain has another frame in
              * this physical slot.  No great disaster, but for neatness,
              * blow away the m2p entry. */ 
-            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
+            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY, __PAGE_HYPERVISOR|_PAGE_USER);
         }
 
         if ( test_linear && (gfn <= d->arch.p2m.max_mapped_pfn) )
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,129 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that physical base address of l1e is intact. This function can be used for
+ * special purpose, such as marking physical memory as NOT WRITABLE for
+ * tracking dirty pages during live migration.
+ */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags)
+{
+    unsigned long mfn, gfn;
+    l1_pgentry_t l1e_content;
+    l1_pgentry_t *l1e;
+    l2_pgentry_t *l2e;
+    int i1, i2;
+#if CONFIG_PAGING_LEVELS >= 3
+    l3_pgentry_t *l3e;
+    int i3;
+#if CONFIG_PAGING_LEVELS == 4
+    l4_pgentry_t *l4e;
+    int i4;
+#endif /* CONFIG_PAGING_LEVELS == 4 */
+#endif /* CONFIG_PAGING_LEVELS >= 3 */
+    
+    if ( !paging_mode_translate(d) )
+        return;
+ 
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+        return;
+
+    p2m_lock(d);
+        
+#if CONFIG_PAGING_LEVELS == 4
+    l4e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#elif CONFIG_PAGING_LEVELS == 3
+    l3e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    l2e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#endif
+
+#if CONFIG_PAGING_LEVELS >= 3
+#if CONFIG_PAGING_LEVELS >= 4
+    for ( i4 = 0; i4 < L4_PAGETABLE_ENTRIES; i4++ ) 
+    {
+	if ( !(l4e_get_flags(l4e[i4]) & _PAGE_PRESENT) )
+	{
+	    continue;
+	}
+	l3e = map_domain_page(mfn_x(_mfn(l4e_get_pfn(l4e[i4]))));
+#endif /* now at levels 3 or 4... */
+	for ( i3 = 0; 
+	      i3 < ((CONFIG_PAGING_LEVELS==4) ? L3_PAGETABLE_ENTRIES : 8); 
+	      i3++ )
+	{
+	    if ( !(l3e_get_flags(l3e[i3]) & _PAGE_PRESENT) )
+	    {
+		continue;
+	    }
+	    l2e = map_domain_page(mfn_x(_mfn(l3e_get_pfn(l3e[i3]))));
+#endif /* all levels... */
+	    for ( i2 = 0; i2 < L2_PAGETABLE_ENTRIES; i2++ )
+	    {
+		if ( !(l2e_get_flags(l2e[i2]) & _PAGE_PRESENT) )
+		{
+		    continue;
+		}
+		l1e = map_domain_page(mfn_x(_mfn(l2e_get_pfn(l2e[i2]))));
+		
+		for ( i1 = 0; i1 < L1_PAGETABLE_ENTRIES; i1++, gfn++ )
+		{
+		    if ( !(l1e_get_flags(l1e[i1]) & _PAGE_PRESENT) )
+			continue;
+		    mfn = l1e_get_pfn(l1e[i1]);
+		    gfn = get_gpfn_from_mfn(mfn);
+		    /* create a new 1le entry using l1e_flags */
+		    l1e_content = l1e_from_pfn(mfn, l1e_flags);
+		    paging_write_p2m_entry(d, gfn, &l1e[i1], l1e_content, 1);
+		}
+		unmap_domain_page(l1e);
+	    }
+#if CONFIG_PAGING_LEVELS >= 3
+	    unmap_domain_page(l2e);
+	}
+#if CONFIG_PAGING_LEVELS >= 4
+	unmap_domain_page(l3e);
+    }
+#endif
+#endif
+
+#if CONFIG_PAGING_LEVELS == 4
+    unmap_domain_page(l4e);
+#elif CONFIG_PAGING_LEVELS == 3
+    unmap_domain_page(l3e);
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    unmap_domain_page(l2e);
+#endif
+
+    p2m_unlock(d);
+}
+
+/* This function traces through P2M table and modifies l1e flags of a specific
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 9bc6a196ad0e -r 8f3639d92b08 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Thu Jun 07 03:53:59 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Thu Jun 07 06:39:39 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* set P2M table l1e flags */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags);
+
+/* set P2M table l1e flags for a gpa */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags);
 
 #endif /* _XEN_P2M_H */
 

[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Nested Paging Live Migration
  2007-06-07 21:58       ` Huang2, Wei
@ 2007-06-08 10:52         ` Tim Deegan
  2007-06-08 16:09           ` Huang2, Wei
  2007-06-08 19:26           ` Huang2, Wei
  0 siblings, 2 replies; 8+ messages in thread
From: Tim Deegan @ 2007-06-08 10:52 UTC (permalink / raw)
  To: Huang2, Wei; +Cc: xen-devel

Hi, 

This patch is much nicer.  One or two more nits below, and a
Signed-off-by: line, please. :)

At 16:58 -0500 on 07 Jun (1181235517), Huang2, Wei wrote:
> +int hap_enable_log_dirty(struct domain *d)
> +{
> +    /* turn on PG_log_dirty bit in paging mode */
> +    d->arch.paging.mode |= PG_log_dirty;
> +    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
> +    flush_tlb_all_pge();
> +
> +    return 0;
> +}
> +
> +int hap_disable_log_dirty(struct domain *d)
> +{
> +    /* log dirty already accquired lock to guard this code */
> +    d->arch.paging.mode &= ~PG_log_dirty;
> +    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
> +    
> +    return 1;
> +}

The log-dirty lock doesn't guard against concurrent updates of
d->arch.paging.mode!  You need the HAP lock here.

>  int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>                    XEN_GUEST_HANDLE(void) u_domctl)
>  {
> +    int rc;
> +
> +    if ( unlikely(d == current->domain) )
> +    {
> +        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",

(and subsequently) s/shadow/paging/ here?

> @@ -2565,7 +2568,7 @@ void shadow_teardown(struct domain *d)
>          if (d->arch.paging.shadow.hash_table) 
>              shadow_hash_teardown(d);
>          /* Release the log-dirty bitmap of dirtied pages */
> -        sh_free_log_dirty_bitmap(d);
> +        paging_free_log_dirty_bitmap(d);

Shouldn't this be handled in paging.c?  Otherwise we'd need to acquire
the log-dirty lock with the shadow lock held.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xensource.com>, XenSource UK Limited
Registered office c/o EC2Y 5EB, UK; company number 05334508

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [RFC] Nested Paging Live Migration
  2007-06-08 10:52         ` Tim Deegan
@ 2007-06-08 16:09           ` Huang2, Wei
  2007-06-08 19:26           ` Huang2, Wei
  1 sibling, 0 replies; 8+ messages in thread
From: Huang2, Wei @ 2007-06-08 16:09 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 923 bytes --]

This patch creates a common interface for live migration. It also
supports nested paging live migration.

Signed-off-by: Wei Huang <wei.huang2@amd.com>


arch/x86/hvm/hvm.c            |    2 
arch/x86/hvm/io.c             |    2 
arch/x86/hvm/svm/svm.c        |    4 
arch/x86/mm.c                 |   12 -
arch/x86/mm/hap/hap.c         |   58 ++++++-
arch/x86/mm/p2m.c             |  136 ++++++++++++++++-
arch/x86/mm/paging.c          |  323
++++++++++++++++++++++++++++++++++++++++-
arch/x86/mm/shadow/common.c   |  330
+++++++-----------------------------------
arch/x86/mm/shadow/multi.c    |   12 -
arch/x86/mm/shadow/private.h  |    6 
include/asm-x86/domain.h      |   45 +++--
include/asm-x86/grant_table.h |    2 
include/asm-x86/p2m.h         |    5 
include/asm-x86/paging.h      |   26 +++
include/asm-x86/shadow.h      |   18 +-
15 files changed, 642 insertions(+), 339 deletions(-)


[-- Attachment #2: live_migration_patch.txt --]
[-- Type: text/plain, Size: 48335 bytes --]

diff -r 45516ac94c9f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Wed Jun 06 12:05:42 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 45516ac94c9f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Wed Jun 06 12:05:56 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 45516ac94c9f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Thu Jun 07 06:36:04 2007 -0500
@@ -1033,8 +1033,8 @@ static int svm_do_nested_pgfault(paddr_t
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    paging_mark_dirty(current->domain, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(current->domain, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 45516ac94c9f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm.c	Wed Jun 06 12:05:10 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 45516ac94c9f xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Fri Jun 08 04:48:25 2007 -0500
@@ -49,6 +49,40 @@
 #undef page_to_mfn
 #define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
+/************************************************/
+/*            HAP LOG DIRTY SUPPORT             */
+/************************************************/
+/* hap code to call when log_dirty is enable. return 0 if no problem found. */
+int hap_enable_log_dirty(struct domain *d)
+{
+    hap_lock(d);
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+    /* set l1e entries of P2M table to NOT_WRITABLE. */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+    hap_unlock(d);
+
+    return 0;
+}
+
+int hap_disable_log_dirty(struct domain *d)
+{
+    hap_lock(d);
+    d->arch.paging.mode &= ~PG_log_dirty;
+    /* set l1e entries of P2M table with normal mode */
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+    hap_unlock(d);
+    
+    return 1;
+}
+
+void hap_clean_dirty_bitmap(struct domain *d)
+{
+    /* mark physical memory as NOT_WRITEABLE and flush the TLB */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+}
 /************************************************/
 /*             HAP SUPPORT FUNCTIONS            */
 /************************************************/
@@ -421,6 +455,10 @@ int hap_enable(struct domain *d, u32 mod
         }
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, hap_enable_log_dirty, hap_disable_log_dirty,
+                          hap_clean_dirty_bitmap);
+
     /* allocate P2m table */
     if ( mode & PG_translate ) {
         rv = p2m_alloc_table(d, hap_alloc_p2m_page, hap_free_p2m_page);
@@ -498,11 +536,6 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
@@ -669,7 +702,16 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 hap_write_p2m_entry(struct vcpu *v, unsigned long gfn, l1_pgentry_t *p,
                     l1_pgentry_t new, unsigned int level)
 {
-    hap_lock(v->domain);
+    int do_locking;
+
+    /* This function can be called from two directions (P2M and log dirty). We
+     *  need to make sure this lock has been held or not.
+     */
+    do_locking = !hap_locked_by_me(v->domain);
+
+    if ( do_locking )
+        hap_lock(v->domain);
+
     safe_write_pte(p, new);
 #if CONFIG_PAGING_LEVELS == 3
     /* install P2M in monitor table for PAE Xen */
@@ -680,7 +722,9 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 	
     }
 #endif
-    hap_unlock(v->domain);
+    
+    if ( do_locking )
+        hap_unlock(v->domain);
 }
 
 /* Entry points into this mode of the hap code. */
diff -r 45516ac94c9f xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Thu Jun 07 05:57:09 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -497,7 +497,7 @@ static void audit_p2m(struct domain *d)
             /* This m2p entry is stale: the domain has another frame in
              * this physical slot.  No great disaster, but for neatness,
              * blow away the m2p entry. */ 
-            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
+            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY, __PAGE_HYPERVISOR|_PAGE_USER);
         }
 
         if ( test_linear && (gfn <= d->arch.p2m.max_mapped_pfn) )
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,129 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that physical base address of l1e is intact. This function can be used for
+ * special purpose, such as marking physical memory as NOT WRITABLE for
+ * tracking dirty pages during live migration.
+ */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags)
+{
+    unsigned long mfn, gfn;
+    l1_pgentry_t l1e_content;
+    l1_pgentry_t *l1e;
+    l2_pgentry_t *l2e;
+    int i1, i2;
+#if CONFIG_PAGING_LEVELS >= 3
+    l3_pgentry_t *l3e;
+    int i3;
+#if CONFIG_PAGING_LEVELS == 4
+    l4_pgentry_t *l4e;
+    int i4;
+#endif /* CONFIG_PAGING_LEVELS == 4 */
+#endif /* CONFIG_PAGING_LEVELS >= 3 */
+    
+    if ( !paging_mode_translate(d) )
+        return;
+ 
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+        return;
+
+    p2m_lock(d);
+        
+#if CONFIG_PAGING_LEVELS == 4
+    l4e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#elif CONFIG_PAGING_LEVELS == 3
+    l3e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    l2e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#endif
+
+#if CONFIG_PAGING_LEVELS >= 3
+#if CONFIG_PAGING_LEVELS >= 4
+    for ( i4 = 0; i4 < L4_PAGETABLE_ENTRIES; i4++ ) 
+    {
+	if ( !(l4e_get_flags(l4e[i4]) & _PAGE_PRESENT) )
+	{
+	    continue;
+	}
+	l3e = map_domain_page(mfn_x(_mfn(l4e_get_pfn(l4e[i4]))));
+#endif /* now at levels 3 or 4... */
+	for ( i3 = 0; 
+	      i3 < ((CONFIG_PAGING_LEVELS==4) ? L3_PAGETABLE_ENTRIES : 8); 
+	      i3++ )
+	{
+	    if ( !(l3e_get_flags(l3e[i3]) & _PAGE_PRESENT) )
+	    {
+		continue;
+	    }
+	    l2e = map_domain_page(mfn_x(_mfn(l3e_get_pfn(l3e[i3]))));
+#endif /* all levels... */
+	    for ( i2 = 0; i2 < L2_PAGETABLE_ENTRIES; i2++ )
+	    {
+		if ( !(l2e_get_flags(l2e[i2]) & _PAGE_PRESENT) )
+		{
+		    continue;
+		}
+		l1e = map_domain_page(mfn_x(_mfn(l2e_get_pfn(l2e[i2]))));
+		
+		for ( i1 = 0; i1 < L1_PAGETABLE_ENTRIES; i1++, gfn++ )
+		{
+		    if ( !(l1e_get_flags(l1e[i1]) & _PAGE_PRESENT) )
+			continue;
+		    mfn = l1e_get_pfn(l1e[i1]);
+		    gfn = get_gpfn_from_mfn(mfn);
+		    /* create a new 1le entry using l1e_flags */
+		    l1e_content = l1e_from_pfn(mfn, l1e_flags);
+		    paging_write_p2m_entry(d, gfn, &l1e[i1], l1e_content, 1);
+		}
+		unmap_domain_page(l1e);
+	    }
+#if CONFIG_PAGING_LEVELS >= 3
+	    unmap_domain_page(l2e);
+	}
+#if CONFIG_PAGING_LEVELS >= 4
+	unmap_domain_page(l3e);
+    }
+#endif
+#endif
+
+#if CONFIG_PAGING_LEVELS == 4
+    unmap_domain_page(l4e);
+#elif CONFIG_PAGING_LEVELS == 3
+    unmap_domain_page(l3e);
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    unmap_domain_page(l2e);
+#endif
+
+    p2m_unlock(d);
+}
+
+/* This function traces through P2M table and modifies l1e flags of a specific
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 45516ac94c9f xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Fri Jun 08 04:06:36 2007 -0500
@@ -25,6 +25,7 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -41,7 +42,272 @@ boolean_param("hap", opt_hap_enabled);
             debugtrace_printk("pgdebug: %s(): " _f, __func__, ##_a); \
     } while (0)
 
-
+/************************************************/
+/*              LOG DIRTY SUPPORT               */
+/************************************************/
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
+
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty.lock);        \
+        (_d)->arch.paging.log_dirty.locker = -1;                  \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty.locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty.locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty.lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty.locker == -1);                    \
+        (_d)->arch.paging.log_dirty.locker = current->processor;             \
+        (_d)->arch.paging.log_dirty.locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty.locker == current->processor); \
+        (_d)->arch.paging.log_dirty.locker = -1;                          \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty.lock);                   \
+    } while (0)
+
+/* allocate bitmap resources for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.log_dirty.bitmap == NULL);
+    d->arch.paging.log_dirty.bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.log_dirty.bitmap = 
+        xmalloc_array(unsigned long,
+                      d->arch.paging.log_dirty.bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.log_dirty.bitmap == NULL )
+    {
+        d->arch.paging.log_dirty.bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.log_dirty.bitmap, 0,
+           d->arch.paging.log_dirty.bitmap_size/8);
+
+    return 0;
+}
+
+/* free bitmap resources */
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.log_dirty.bitmap_size = 0;
+    if ( d->arch.paging.log_dirty.bitmap )
+    {
+        xfree(d->arch.paging.log_dirty.bitmap);
+        d->arch.paging.log_dirty.bitmap = NULL;
+    }
+}
+
+int paging_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    if ( paging_mode_log_dirty(d) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+        paging_free_log_dirty_bitmap(d);
+        goto out;
+    }
+
+    ret = d->arch.paging.log_dirty.enable_log_dirty(d);
+    if ( ret != 0 )
+        paging_free_log_dirty_bitmap(d);
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return ret;
+}
+
+int paging_log_dirty_disable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+    ret = d->arch.paging.log_dirty.disable_log_dirty(d);
+    if ( !paging_mode_log_dirty(d) )
+        paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+    domain_unpause(d);
+
+    return ret;
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    if ( likely(pfn < d->arch.paging.log_dirty.bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.log_dirty.bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                         mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.log_dirty.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                      "owner=%d c=%08x t=%" PRtype_info "\n",
+                      mfn_x(gmfn), 
+                      pfn, 
+                      d->arch.paging.log_dirty.bitmap_size,
+                      d->domain_id,
+                      (page_get_owner(mfn_to_page(gmfn))
+                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                       : -1),
+                      mfn_to_page(gmfn)->count_info, 
+                      mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+    
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                 (clean) ? "clean" : "peek",
+                 d->domain_id,
+                 d->arch.paging.log_dirty.fault_count, 
+                 d->arch.paging.log_dirty.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.log_dirty.fault_count;
+    sc->stats.dirty_count = d->arch.paging.log_dirty.dirty_count;
+    
+    if ( clean )
+    {
+        d->arch.paging.log_dirty.fault_count = 0;
+        d->arch.paging.log_dirty.dirty_count = 0;
+
+        /* We need to further call clean_dirty_bitmap() functions of specific
+         * paging modes (shadow or hap).
+         */
+        d->arch.paging.log_dirty.clean_dirty_bitmap(d);
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.log_dirty.bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.log_dirty.bitmap_size )
+        sc->pages = d->arch.paging.log_dirty.bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
+
+/* Note that this function takes three function pointers. Callers must supply
+ * these functions for log dirty code to call. This function usually is 
+ * invoked when paging is enabled. Check shadow_enable() and hap_enable() for 
+ * reference.
+ */
+void paging_log_dirty_init(struct domain *d,
+                           int    (*enable_log_dirty)(struct domain *d),
+                           int    (*disable_log_dirty)(struct domain *d),
+                           void   (*clean_dirty_bitmap)(struct domain *d))
+{
+    /* We initialize log dirty lock first */
+    log_dirty_lock_init(d);
+    
+    d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
+    d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
+    d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
+}
+
+/************************************************/
+/*           CODE FOR PAGING SUPPORT            */
+/************************************************/
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
@@ -65,11 +331,60 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
+    int rc;
+
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a paging op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+    
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Paging op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+    
+    /* Code to handle log-dirty. Note that some log dirty operations
+     * piggy-back on shadow operations. For example, when 
+     * XEN_DOMCTL_SHADOW_OP_OFF is called, it first checks whether log dirty
+     * mode is enabled. If does, we disables log dirty and continues with 
+     * shadow code. For this reason, we need to further dispatch domctl 
+     * to next-level paging code (shadow or hap).
+     */
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        return paging_log_dirty_enable(d);	
+	
+    case XEN_DOMCTL_SHADOW_OP_ENABLE:	
+        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
+            return paging_log_dirty_enable(d);
+
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+        if ( paging_mode_log_dirty(d) )
+            if ( (rc = paging_log_dirty_disable(d)) != 0 ) 
+                return rc;
+
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+	return paging_log_dirty_op(d, sc);
+    }
+	
     /* Here, dispatch domctl to the appropriate paging code */
     if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+	return hap_domctl(d, sc, u_domctl);
+    else
+	return shadow_domctl(d, sc, u_domctl);
 }
 
 /* Call when destroying a domain */
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c	Fri Jun 08 04:30:11 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2455,6 +2453,10 @@ int shadow_enable(struct domain *d, u32 
         }        
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, shadow_enable_log_dirty, 
+                          shadow_disable_log_dirty, shadow_clean_dirty_bitmap);
+
     /* Init the P2M table.  Must be done before we take the shadow lock 
      * to avoid possible deadlock. */
     if ( mode & PG_translate )
@@ -2463,6 +2465,7 @@ int shadow_enable(struct domain *d, u32 
         if (rv != 0)
             goto out_unlocked;
     }
+
 
     shadow_lock(d);
 
@@ -2564,8 +2567,6 @@ void shadow_teardown(struct domain *d)
         /* Release the hash table back to xenheap */
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
-        /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2718,98 +2719,6 @@ static int shadow_test_disable(struct do
     domain_pause(d);
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_SH_enable);
-    shadow_unlock(d);
-    domain_unpause(d);
-
-    return ret;
-}
-
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
-static int shadow_log_dirty_enable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    if ( shadow_mode_log_dirty(d) )
-    {
-        ret = -EINVAL;
-        goto out;
-    }
-
-    if ( shadow_mode_enabled(d) )
-    {
-        /* This domain already has some shadows: need to clear them out 
-         * of the way to make sure that all references to guest memory are 
-         * properly write-protected */
-        shadow_blow_tables(d);
-    }
-
-#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
-    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
-     * change an l4e instead of cr3 to switch tables.  Give them the
-     * same optimization */
-    if ( is_pv_32on64_domain(d) )
-        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
-#endif
-
-    ret = sh_alloc_log_dirty_bitmap(d);
-    if ( ret != 0 )
-    {
-        sh_free_log_dirty_bitmap(d);
-        goto out;
-    }
-
-    ret = shadow_one_bit_enable(d, PG_log_dirty);
-    if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return ret;
-}
-
-static int shadow_log_dirty_disable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-    ret = shadow_one_bit_disable(d, PG_log_dirty);
-    if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
@@ -2892,150 +2801,62 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
+/* Shadow specific code which is called in paging_log_dirty_enable().
+ * Return 0 if no problem found.
+ */
+int shadow_enable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */
     shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
+    if ( shadow_mode_enabled(d) )
+    {
+        /* This domain already has some shadows: need to clear them out 
+         * of the way to make sure that all references to guest memory are 
+         * properly write-protected */
         shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
+    }
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
+    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
+     * change an l4e instead of cr3 to switch tables.  Give them the
+     * same optimization */
+    if ( is_pv_32on64_domain(d) )
+        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
+#endif
+    
+    ret = shadow_one_bit_enable(d, PG_log_dirty);
     shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
+
+    return ret;
+}
+
+/* shadow specfic code which is called in paging_log_dirty_disable() */
+int shadow_disable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */    
+    shadow_lock(d);
+    ret = shadow_one_bit_disable(d, PG_log_dirty);
+    shadow_unlock(d);
+    
+    return ret;
+}
+
+/* This function is called when we CLEAN log dirty bitmap. See 
+ * paging_log_dirty_op() for details. 
+ */
+void shadow_clean_dirty_bitmap(struct domain *d)
+{
+    shadow_lock(d);
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_blow_tables(d);
+    shadow_unlock(d);
+}
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3045,33 +2866,9 @@ int shadow_domctl(struct domain *d,
 {
     int rc, preempted = 0;
 
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
-
     switch ( sc->op )
     {
     case XEN_DOMCTL_SHADOW_OP_OFF:
-        if ( shadow_mode_log_dirty(d) )
-            if ( (rc = shadow_log_dirty_disable(d)) != 0 ) 
-                return rc;
         if ( d->arch.paging.mode == PG_SH_enable )
             if ( (rc = shadow_test_disable(d)) != 0 ) 
                 return rc;
@@ -3080,19 +2877,10 @@ int shadow_domctl(struct domain *d,
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TEST:
         return shadow_test_enable(d);
 
-    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
-        return shadow_log_dirty_enable(d);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
 
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
-        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
-            return shadow_log_dirty_enable(d);
         return shadow_enable(d, sc->mode << PG_mode_shift);
 
     case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 12:08:38 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.log_dirty.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 09:12:08 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.log_dirty.bitmap_size) 
+         && test_bit(pfn, d->arch.paging.log_dirty.bitmap) )
         return 1;
 
     return 0;
diff -r 45516ac94c9f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Wed Jun 06 12:34:24 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -134,7 +126,6 @@ struct hap_domain {
 /************************************************/
 /*       p2m handling                           */
 /************************************************/
-
 struct p2m_domain {
     /* Lock that protects updates to the p2m */
     spinlock_t         lock;
@@ -156,16 +147,36 @@ struct p2m_domain {
 /************************************************/
 /*       common paging data structure           */
 /************************************************/
+struct log_dirty_domain {
+    /* log-dirty lock */
+    spinlock_t     lock;
+    int            locker; /* processor that holds the lock */
+    const char    *locker_function; /* func that took it */
+
+    /* log-dirty bitmap to record dirty pages */
+    unsigned long *bitmap;
+    unsigned int   bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int   fault_count;
+    unsigned int   dirty_count;
+
+    /* functions which are paging mode specific */
+    int            (*enable_log_dirty   )(struct domain *d);
+    int            (*disable_log_dirty  )(struct domain *d);
+    void           (*clean_dirty_bitmap )(struct domain *d);
+};
+
 struct paging_domain {
-    u32               mode;  /* flags to control paging operation */
-
+    /* flags to control paging operation */
+    u32                     mode;
     /* extension for shadow paging support */
-    struct shadow_domain shadow;
-
-    /* Other paging assistance code will have structs here */
-    struct hap_domain    hap;
-};
-
+    struct shadow_domain    shadow;
+    /* extension for hardware-assited paging */
+    struct hap_domain       hap;
+    /* log dirty support */
+    struct log_dirty_domain log_dirty;
+};
 struct paging_vcpu {
     /* Pointers to mode-specific entry points. */
     struct paging_mode *mode;
diff -r 45516ac94c9f xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Wed Jun 06 12:03:21 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 45516ac94c9f xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Thu Jun 07 05:37:12 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* set P2M table l1e flags */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags);
+
+/* set P2M table l1e flags for a gpa */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 45516ac94c9f xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Wed Jun 06 12:36:54 2007 -0500
@@ -62,6 +62,9 @@
 #define paging_mode_log_dirty(_d) ((_d)->arch.paging.mode & PG_log_dirty)
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
+
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
@@ -136,6 +139,29 @@ struct paging_mode {
     struct shadow_paging_mode shadow;
 };
 
+/*****************************************************************************
+ * Log dirty code */
+
+/* allocate log dirty bitmap resource for recording dirty pages */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free log dirty bitmap resource */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* enable log dirty */
+int paging_log_dirty_enable(struct domain *d);
+
+/* disable log dirty */
+int paging_log_dirty_disable(struct domain *d);
+
+/* log dirty initialization */
+void paging_log_dirty_init(struct domain *d,
+                           int  (*enable_log_dirty)(struct domain *d),
+                           int  (*disable_log_dirty)(struct domain *d),
+                           void (*clean_dirty_bitmap)(struct domain *d));
+
+/* mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /*****************************************************************************
  * Entry points into the paging-assistance code */
diff -r 45516ac94c9f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Wed Jun 06 12:37:52 2007 -0500
@@ -75,16 +75,14 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
+/* shadow code to call when log dirty is enabled */
+int shadow_enable_log_dirty(struct domain *d);
+
+/* shadow code to call when log dirty is disabled */
+int shadow_disable_log_dirty(struct domain *d);
+
+/* shadow code to call when bitmap is being cleaned */
+void shadow_clean_dirty_bitmap(struct domain *d);
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [RFC] Nested Paging Live Migration
  2007-06-08 10:52         ` Tim Deegan
  2007-06-08 16:09           ` Huang2, Wei
@ 2007-06-08 19:26           ` Huang2, Wei
  1 sibling, 0 replies; 8+ messages in thread
From: Huang2, Wei @ 2007-06-08 19:26 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

This patch creates a common interface for live migration. It also
supports nested paging live migration.

Signed-off-by: Wei Huang <wei.huang2@amd.com>


>> @@ -2565,7 +2568,7 @@ void shadow_teardown(struct domain *d)
>>          if (d->arch.paging.shadow.hash_table)
>>              shadow_hash_teardown(d);
>>          /* Release the log-dirty bitmap of dirtied pages */
>> -        sh_free_log_dirty_bitmap(d);
>> +        paging_free_log_dirty_bitmap(d);
> 
> Shouldn't this be handled in paging.c?  Otherwise we'd need to
> acquire the log-dirty lock with the shadow lock held. 
> 

Please ignore my previous patch. Most of time, log dirty will be turned
on and off together. Under this assumption, my previous patch only
removes paging_free_log_dirty_bitmap() from shadow.c and hap.c. But it
does not call paging_free_log_dirty_bitmap() in paging_teardown().

The attached file is same as previous one, expect that it adds a
paging_log_dirty_teardown() function to paging.c (see below).

Thanks,

-Wei

===============
diff -r f270fef2fb60 -r 6323c8beb60c xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c  Fri Jun 08 05:10:28 2007 -0500
+++ b/xen/arch/x86/mm/paging.c  Fri Jun 08 08:05:56 2007 -0500
@@ -305,6 +305,13 @@ void paging_log_dirty_init(struct domain
     d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
 }
 
+/* This function fress log dirty bitmap resources. */
+void paging_log_dirty_teardown(struct domain*d)
+{
+    log_dirty_lock(d);
+    paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+}
 /************************************************/
 /*           CODE FOR PAGING SUPPORT            */
 /************************************************/
@@ -390,6 +397,9 @@ int paging_domctl(struct domain *d, xen_
 /* Call when destroying a domain */
 void paging_teardown(struct domain *d)
 {
+    /* clean up log dirty resources. */
+    paging_log_dirty_teardown(d);
+    
     if ( opt_hap_enabled && is_hvm_domain(d) )
         hap_teardown(d);
     else

[-- Attachment #2: live_migration_patch_with_bitmap_free.txt --]
[-- Type: text/plain, Size: 48757 bytes --]

diff -r 45516ac94c9f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Wed Jun 06 12:05:42 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 45516ac94c9f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/io.c	Wed Jun 06 12:05:56 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 45516ac94c9f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Thu Jun 07 06:36:04 2007 -0500
@@ -1033,8 +1033,8 @@ static int svm_do_nested_pgfault(paddr_t
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    paging_mark_dirty(current->domain, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(current->domain, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 45516ac94c9f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm.c	Wed Jun 06 12:05:10 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 45516ac94c9f xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c	Fri Jun 08 04:48:25 2007 -0500
@@ -49,6 +49,40 @@
 #undef page_to_mfn
 #define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
+/************************************************/
+/*            HAP LOG DIRTY SUPPORT             */
+/************************************************/
+/* hap code to call when log_dirty is enable. return 0 if no problem found. */
+int hap_enable_log_dirty(struct domain *d)
+{
+    hap_lock(d);
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+    /* set l1e entries of P2M table to NOT_WRITABLE. */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+    hap_unlock(d);
+
+    return 0;
+}
+
+int hap_disable_log_dirty(struct domain *d)
+{
+    hap_lock(d);
+    d->arch.paging.mode &= ~PG_log_dirty;
+    /* set l1e entries of P2M table with normal mode */
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+    hap_unlock(d);
+    
+    return 1;
+}
+
+void hap_clean_dirty_bitmap(struct domain *d)
+{
+    /* mark physical memory as NOT_WRITEABLE and flush the TLB */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+}
 /************************************************/
 /*             HAP SUPPORT FUNCTIONS            */
 /************************************************/
@@ -421,6 +455,10 @@ int hap_enable(struct domain *d, u32 mod
         }
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, hap_enable_log_dirty, hap_disable_log_dirty,
+                          hap_clean_dirty_bitmap);
+
     /* allocate P2m table */
     if ( mode & PG_translate ) {
         rv = p2m_alloc_table(d, hap_alloc_p2m_page, hap_free_p2m_page);
@@ -498,11 +536,6 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
@@ -669,7 +702,16 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 hap_write_p2m_entry(struct vcpu *v, unsigned long gfn, l1_pgentry_t *p,
                     l1_pgentry_t new, unsigned int level)
 {
-    hap_lock(v->domain);
+    int do_locking;
+
+    /* This function can be called from two directions (P2M and log dirty). We
+     *  need to make sure this lock has been held or not.
+     */
+    do_locking = !hap_locked_by_me(v->domain);
+
+    if ( do_locking )
+        hap_lock(v->domain);
+
     safe_write_pte(p, new);
 #if CONFIG_PAGING_LEVELS == 3
     /* install P2M in monitor table for PAE Xen */
@@ -680,7 +722,9 @@ hap_write_p2m_entry(struct vcpu *v, unsi
 	
     }
 #endif
-    hap_unlock(v->domain);
+    
+    if ( do_locking )
+        hap_unlock(v->domain);
 }
 
 /* Entry points into this mode of the hap code. */
diff -r 45516ac94c9f xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c	Thu Jun 07 05:57:09 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -497,7 +497,7 @@ static void audit_p2m(struct domain *d)
             /* This m2p entry is stale: the domain has another frame in
              * this physical slot.  No great disaster, but for neatness,
              * blow away the m2p entry. */ 
-            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
+            set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY, __PAGE_HYPERVISOR|_PAGE_USER);
         }
 
         if ( test_linear && (gfn <= d->arch.p2m.max_mapped_pfn) )
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,129 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that physical base address of l1e is intact. This function can be used for
+ * special purpose, such as marking physical memory as NOT WRITABLE for
+ * tracking dirty pages during live migration.
+ */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags)
+{
+    unsigned long mfn, gfn;
+    l1_pgentry_t l1e_content;
+    l1_pgentry_t *l1e;
+    l2_pgentry_t *l2e;
+    int i1, i2;
+#if CONFIG_PAGING_LEVELS >= 3
+    l3_pgentry_t *l3e;
+    int i3;
+#if CONFIG_PAGING_LEVELS == 4
+    l4_pgentry_t *l4e;
+    int i4;
+#endif /* CONFIG_PAGING_LEVELS == 4 */
+#endif /* CONFIG_PAGING_LEVELS >= 3 */
+    
+    if ( !paging_mode_translate(d) )
+        return;
+ 
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+        return;
+
+    p2m_lock(d);
+        
+#if CONFIG_PAGING_LEVELS == 4
+    l4e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#elif CONFIG_PAGING_LEVELS == 3
+    l3e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    l2e = map_domain_page(mfn_x(pagetable_get_mfn(d->arch.phys_table)));
+#endif
+
+#if CONFIG_PAGING_LEVELS >= 3
+#if CONFIG_PAGING_LEVELS >= 4
+    for ( i4 = 0; i4 < L4_PAGETABLE_ENTRIES; i4++ ) 
+    {
+	if ( !(l4e_get_flags(l4e[i4]) & _PAGE_PRESENT) )
+	{
+	    continue;
+	}
+	l3e = map_domain_page(mfn_x(_mfn(l4e_get_pfn(l4e[i4]))));
+#endif /* now at levels 3 or 4... */
+	for ( i3 = 0; 
+	      i3 < ((CONFIG_PAGING_LEVELS==4) ? L3_PAGETABLE_ENTRIES : 8); 
+	      i3++ )
+	{
+	    if ( !(l3e_get_flags(l3e[i3]) & _PAGE_PRESENT) )
+	    {
+		continue;
+	    }
+	    l2e = map_domain_page(mfn_x(_mfn(l3e_get_pfn(l3e[i3]))));
+#endif /* all levels... */
+	    for ( i2 = 0; i2 < L2_PAGETABLE_ENTRIES; i2++ )
+	    {
+		if ( !(l2e_get_flags(l2e[i2]) & _PAGE_PRESENT) )
+		{
+		    continue;
+		}
+		l1e = map_domain_page(mfn_x(_mfn(l2e_get_pfn(l2e[i2]))));
+		
+		for ( i1 = 0; i1 < L1_PAGETABLE_ENTRIES; i1++, gfn++ )
+		{
+		    if ( !(l1e_get_flags(l1e[i1]) & _PAGE_PRESENT) )
+			continue;
+		    mfn = l1e_get_pfn(l1e[i1]);
+		    gfn = get_gpfn_from_mfn(mfn);
+		    /* create a new 1le entry using l1e_flags */
+		    l1e_content = l1e_from_pfn(mfn, l1e_flags);
+		    paging_write_p2m_entry(d, gfn, &l1e[i1], l1e_content, 1);
+		}
+		unmap_domain_page(l1e);
+	    }
+#if CONFIG_PAGING_LEVELS >= 3
+	    unmap_domain_page(l2e);
+	}
+#if CONFIG_PAGING_LEVELS >= 4
+	unmap_domain_page(l3e);
+    }
+#endif
+#endif
+
+#if CONFIG_PAGING_LEVELS == 4
+    unmap_domain_page(l4e);
+#elif CONFIG_PAGING_LEVELS == 3
+    unmap_domain_page(l3e);
+#else /* CONFIG_PAGING_LEVELS == 2 */
+    unmap_domain_page(l2e);
+#endif
+
+    p2m_unlock(d);
+}
+
+/* This function traces through P2M table and modifies l1e flags of a specific
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 45516ac94c9f xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/paging.c	Fri Jun 08 05:57:11 2007 -0500
@@ -25,6 +25,7 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -41,7 +42,279 @@ boolean_param("hap", opt_hap_enabled);
             debugtrace_printk("pgdebug: %s(): " _f, __func__, ##_a); \
     } while (0)
 
-
+/************************************************/
+/*              LOG DIRTY SUPPORT               */
+/************************************************/
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
+
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty.lock);        \
+        (_d)->arch.paging.log_dirty.locker = -1;                  \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty.locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty.locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty.lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty.locker == -1);                    \
+        (_d)->arch.paging.log_dirty.locker = current->processor;             \
+        (_d)->arch.paging.log_dirty.locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty.locker == current->processor); \
+        (_d)->arch.paging.log_dirty.locker = -1;                          \
+        (_d)->arch.paging.log_dirty.locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty.lock);                   \
+    } while (0)
+
+/* allocate bitmap resources for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.log_dirty.bitmap == NULL);
+    d->arch.paging.log_dirty.bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.log_dirty.bitmap = 
+        xmalloc_array(unsigned long,
+                      d->arch.paging.log_dirty.bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.log_dirty.bitmap == NULL )
+    {
+        d->arch.paging.log_dirty.bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.log_dirty.bitmap, 0,
+           d->arch.paging.log_dirty.bitmap_size/8);
+
+    return 0;
+}
+
+/* free bitmap resources */
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.log_dirty.bitmap_size = 0;
+    if ( d->arch.paging.log_dirty.bitmap )
+    {
+        xfree(d->arch.paging.log_dirty.bitmap);
+        d->arch.paging.log_dirty.bitmap = NULL;
+    }
+}
+
+int paging_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    if ( paging_mode_log_dirty(d) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+        paging_free_log_dirty_bitmap(d);
+        goto out;
+    }
+
+    ret = d->arch.paging.log_dirty.enable_log_dirty(d);
+    if ( ret != 0 )
+        paging_free_log_dirty_bitmap(d);
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return ret;
+}
+
+int paging_log_dirty_disable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+    ret = d->arch.paging.log_dirty.disable_log_dirty(d);
+    if ( !paging_mode_log_dirty(d) )
+        paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+    domain_unpause(d);
+
+    return ret;
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    if ( likely(pfn < d->arch.paging.log_dirty.bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.log_dirty.bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                         mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.log_dirty.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                      "owner=%d c=%08x t=%" PRtype_info "\n",
+                      mfn_x(gmfn), 
+                      pfn, 
+                      d->arch.paging.log_dirty.bitmap_size,
+                      d->domain_id,
+                      (page_get_owner(mfn_to_page(gmfn))
+                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                       : -1),
+                      mfn_to_page(gmfn)->count_info, 
+                      mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+    
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                 (clean) ? "clean" : "peek",
+                 d->domain_id,
+                 d->arch.paging.log_dirty.fault_count, 
+                 d->arch.paging.log_dirty.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.log_dirty.fault_count;
+    sc->stats.dirty_count = d->arch.paging.log_dirty.dirty_count;
+    
+    if ( clean )
+    {
+        d->arch.paging.log_dirty.fault_count = 0;
+        d->arch.paging.log_dirty.dirty_count = 0;
+
+        /* We need to further call clean_dirty_bitmap() functions of specific
+         * paging modes (shadow or hap).
+         */
+        d->arch.paging.log_dirty.clean_dirty_bitmap(d);
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.log_dirty.bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.log_dirty.bitmap_size )
+        sc->pages = d->arch.paging.log_dirty.bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.log_dirty.bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
+
+/* Note that this function takes three function pointers. Callers must supply
+ * these functions for log dirty code to call. This function usually is 
+ * invoked when paging is enabled. Check shadow_enable() and hap_enable() for 
+ * reference.
+ */
+void paging_log_dirty_init(struct domain *d,
+                           int    (*enable_log_dirty)(struct domain *d),
+                           int    (*disable_log_dirty)(struct domain *d),
+                           void   (*clean_dirty_bitmap)(struct domain *d))
+{
+    /* We initialize log dirty lock first */
+    log_dirty_lock_init(d);
+    
+    d->arch.paging.log_dirty.enable_log_dirty = enable_log_dirty;
+    d->arch.paging.log_dirty.disable_log_dirty = disable_log_dirty;
+    d->arch.paging.log_dirty.clean_dirty_bitmap = clean_dirty_bitmap;
+}
+
+/* This function fress log dirty bitmap resources. */
+void paging_log_dirty_teardown(struct domain*d)
+{
+    log_dirty_lock(d);
+    paging_free_log_dirty_bitmap(d);
+    log_dirty_unlock(d);
+}
+/************************************************/
+/*           CODE FOR PAGING SUPPORT            */
+/************************************************/
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
@@ -65,16 +338,68 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
+    int rc;
+
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a paging op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+    
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Paging op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+    
+    /* Code to handle log-dirty. Note that some log dirty operations
+     * piggy-back on shadow operations. For example, when 
+     * XEN_DOMCTL_SHADOW_OP_OFF is called, it first checks whether log dirty
+     * mode is enabled. If does, we disables log dirty and continues with 
+     * shadow code. For this reason, we need to further dispatch domctl 
+     * to next-level paging code (shadow or hap).
+     */
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        return paging_log_dirty_enable(d);	
+	
+    case XEN_DOMCTL_SHADOW_OP_ENABLE:	
+        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
+            return paging_log_dirty_enable(d);
+
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+        if ( paging_mode_log_dirty(d) )
+            if ( (rc = paging_log_dirty_disable(d)) != 0 ) 
+                return rc;
+
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+	return paging_log_dirty_op(d, sc);
+    }
+	
     /* Here, dispatch domctl to the appropriate paging code */
     if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+	return hap_domctl(d, sc, u_domctl);
+    else
+	return shadow_domctl(d, sc, u_domctl);
 }
 
 /* Call when destroying a domain */
 void paging_teardown(struct domain *d)
 {
+    /* clean up log dirty resources. */
+    paging_log_dirty_teardown(d);
+    
     if ( opt_hap_enabled && is_hvm_domain(d) )
         hap_teardown(d);
     else
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c	Fri Jun 08 04:30:11 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2455,6 +2453,10 @@ int shadow_enable(struct domain *d, u32 
         }        
     }
 
+    /* initialize log dirty here */
+    paging_log_dirty_init(d, shadow_enable_log_dirty, 
+                          shadow_disable_log_dirty, shadow_clean_dirty_bitmap);
+
     /* Init the P2M table.  Must be done before we take the shadow lock 
      * to avoid possible deadlock. */
     if ( mode & PG_translate )
@@ -2463,6 +2465,7 @@ int shadow_enable(struct domain *d, u32 
         if (rv != 0)
             goto out_unlocked;
     }
+
 
     shadow_lock(d);
 
@@ -2564,8 +2567,6 @@ void shadow_teardown(struct domain *d)
         /* Release the hash table back to xenheap */
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
-        /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2718,98 +2719,6 @@ static int shadow_test_disable(struct do
     domain_pause(d);
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_SH_enable);
-    shadow_unlock(d);
-    domain_unpause(d);
-
-    return ret;
-}
-
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
-static int shadow_log_dirty_enable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    if ( shadow_mode_log_dirty(d) )
-    {
-        ret = -EINVAL;
-        goto out;
-    }
-
-    if ( shadow_mode_enabled(d) )
-    {
-        /* This domain already has some shadows: need to clear them out 
-         * of the way to make sure that all references to guest memory are 
-         * properly write-protected */
-        shadow_blow_tables(d);
-    }
-
-#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
-    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
-     * change an l4e instead of cr3 to switch tables.  Give them the
-     * same optimization */
-    if ( is_pv_32on64_domain(d) )
-        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
-#endif
-
-    ret = sh_alloc_log_dirty_bitmap(d);
-    if ( ret != 0 )
-    {
-        sh_free_log_dirty_bitmap(d);
-        goto out;
-    }
-
-    ret = shadow_one_bit_enable(d, PG_log_dirty);
-    if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return ret;
-}
-
-static int shadow_log_dirty_disable(struct domain *d)
-{
-    int ret;
-
-    domain_pause(d);
-    shadow_lock(d);
-    ret = shadow_one_bit_disable(d, PG_log_dirty);
-    if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
@@ -2892,150 +2801,62 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
+/* Shadow specific code which is called in paging_log_dirty_enable().
+ * Return 0 if no problem found.
+ */
+int shadow_enable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */
     shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
+    if ( shadow_mode_enabled(d) )
+    {
+        /* This domain already has some shadows: need to clear them out 
+         * of the way to make sure that all references to guest memory are 
+         * properly write-protected */
         shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
+    }
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_LINUX_L3_TOPLEVEL)
+    /* 32bit PV guests on 64bit xen behave like older 64bit linux: they
+     * change an l4e instead of cr3 to switch tables.  Give them the
+     * same optimization */
+    if ( is_pv_32on64_domain(d) )
+        d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
+#endif
+    
+    ret = shadow_one_bit_enable(d, PG_log_dirty);
     shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
+
+    return ret;
+}
+
+/* shadow specfic code which is called in paging_log_dirty_disable() */
+int shadow_disable_log_dirty(struct domain *d)
+{
+    int ret;
+
+    /* shadow lock is required here */    
+    shadow_lock(d);
+    ret = shadow_one_bit_disable(d, PG_log_dirty);
+    shadow_unlock(d);
+    
+    return ret;
+}
+
+/* This function is called when we CLEAN log dirty bitmap. See 
+ * paging_log_dirty_op() for details. 
+ */
+void shadow_clean_dirty_bitmap(struct domain *d)
+{
+    shadow_lock(d);
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_blow_tables(d);
+    shadow_unlock(d);
+}
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3045,33 +2866,9 @@ int shadow_domctl(struct domain *d,
 {
     int rc, preempted = 0;
 
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
-
     switch ( sc->op )
     {
     case XEN_DOMCTL_SHADOW_OP_OFF:
-        if ( shadow_mode_log_dirty(d) )
-            if ( (rc = shadow_log_dirty_disable(d)) != 0 ) 
-                return rc;
         if ( d->arch.paging.mode == PG_SH_enable )
             if ( (rc = shadow_test_disable(d)) != 0 ) 
                 return rc;
@@ -3080,19 +2877,10 @@ int shadow_domctl(struct domain *d,
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TEST:
         return shadow_test_enable(d);
 
-    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
-        return shadow_log_dirty_enable(d);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
 
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
-
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
-        if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
-            return shadow_log_dirty_enable(d);
         return shadow_enable(d, sc->mode << PG_mode_shift);
 
     case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c	Wed Jun 06 12:08:38 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.log_dirty.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 45516ac94c9f xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h	Wed Jun 06 09:12:08 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.log_dirty.bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.log_dirty.bitmap_size) 
+         && test_bit(pfn, d->arch.paging.log_dirty.bitmap) )
         return 1;
 
     return 0;
diff -r 45516ac94c9f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/domain.h	Wed Jun 06 12:34:24 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -134,7 +126,6 @@ struct hap_domain {
 /************************************************/
 /*       p2m handling                           */
 /************************************************/
-
 struct p2m_domain {
     /* Lock that protects updates to the p2m */
     spinlock_t         lock;
@@ -156,16 +147,36 @@ struct p2m_domain {
 /************************************************/
 /*       common paging data structure           */
 /************************************************/
+struct log_dirty_domain {
+    /* log-dirty lock */
+    spinlock_t     lock;
+    int            locker; /* processor that holds the lock */
+    const char    *locker_function; /* func that took it */
+
+    /* log-dirty bitmap to record dirty pages */
+    unsigned long *bitmap;
+    unsigned int   bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int   fault_count;
+    unsigned int   dirty_count;
+
+    /* functions which are paging mode specific */
+    int            (*enable_log_dirty   )(struct domain *d);
+    int            (*disable_log_dirty  )(struct domain *d);
+    void           (*clean_dirty_bitmap )(struct domain *d);
+};
+
 struct paging_domain {
-    u32               mode;  /* flags to control paging operation */
-
+    /* flags to control paging operation */
+    u32                     mode;
     /* extension for shadow paging support */
-    struct shadow_domain shadow;
-
-    /* Other paging assistance code will have structs here */
-    struct hap_domain    hap;
-};
-
+    struct shadow_domain    shadow;
+    /* extension for hardware-assited paging */
+    struct hap_domain       hap;
+    /* log dirty support */
+    struct log_dirty_domain log_dirty;
+};
 struct paging_vcpu {
     /* Pointers to mode-specific entry points. */
     struct paging_mode *mode;
diff -r 45516ac94c9f xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h	Wed Jun 06 12:03:21 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 45516ac94c9f xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/p2m.h	Thu Jun 07 05:37:12 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* set P2M table l1e flags */
+void p2m_set_flags_global(struct domain *d, u32 l1e_flags);
+
+/* set P2M table l1e flags for a gpa */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 45516ac94c9f xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/paging.h	Wed Jun 06 12:36:54 2007 -0500
@@ -62,6 +62,9 @@
 #define paging_mode_log_dirty(_d) ((_d)->arch.paging.mode & PG_log_dirty)
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
+
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
@@ -136,6 +139,29 @@ struct paging_mode {
     struct shadow_paging_mode shadow;
 };
 
+/*****************************************************************************
+ * Log dirty code */
+
+/* allocate log dirty bitmap resource for recording dirty pages */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free log dirty bitmap resource */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* enable log dirty */
+int paging_log_dirty_enable(struct domain *d);
+
+/* disable log dirty */
+int paging_log_dirty_disable(struct domain *d);
+
+/* log dirty initialization */
+void paging_log_dirty_init(struct domain *d,
+                           int  (*enable_log_dirty)(struct domain *d),
+                           int  (*disable_log_dirty)(struct domain *d),
+                           void (*clean_dirty_bitmap)(struct domain *d));
+
+/* mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /*****************************************************************************
  * Entry points into the paging-assistance code */
diff -r 45516ac94c9f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Wed Jun 06 08:32:32 2007 -0500
+++ b/xen/include/asm-x86/shadow.h	Wed Jun 06 12:37:52 2007 -0500
@@ -75,16 +75,14 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
+/* shadow code to call when log dirty is enabled */
+int shadow_enable_log_dirty(struct domain *d);
+
+/* shadow code to call when log dirty is disabled */
+int shadow_disable_log_dirty(struct domain *d);
+
+/* shadow code to call when bitmap is being cleaned */
+void shadow_clean_dirty_bitmap(struct domain *d);
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-06-08 19:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-01 15:05 [RFC] Nested Paging Live Migration Huang2, Wei
2007-06-01 16:17 ` Tim Deegan
2007-06-06  4:29   ` Wei Huang
2007-06-06  9:54     ` Tim Deegan
2007-06-07 21:58       ` Huang2, Wei
2007-06-08 10:52         ` Tim Deegan
2007-06-08 16:09           ` Huang2, Wei
2007-06-08 19:26           ` Huang2, Wei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.