From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wei Huang"
Subject: Re: [RFC] Nested Paging Live Migration
Date: Tue, 05 Jun 2007 23:29:24 -0500
Message-ID: <46663824.6030200@amd.com>
References: <7D748C767B7FA541A8AC5504A4C89A23030EF9B2@SAUSEXMB2.amd.com>
 <20070601161726.GB16995@york.uk.xensource.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=------------000604050005070008040902
Return-path:
In-Reply-To: <20070601161726.GB16995@york.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Tim Deegan
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------000604050005070008040902
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Retry.

1. Most of the common code has been moved from shadow to paging:
    * log-dirty related fields (dirty_count ...) are moved to paging_domain
    * log_dirty_bitmap allocation, free, peek, and clean
    * mark_dirty becomes a common function too (paging_mark_dirty())
    * a new log-dirty lock (log_dirty_lock) is created to guard this code

2. shadow/hap_log_dirty_enable() and shadow/hap_log_dirty_disable()
These four functions were not changed. However, I would really like to
create two common functions (paging_log_dirty_disable() and
paging_log_dirty_enable()) for them. This requires two function pointers
(*log_dirty_enable() and *log_dirty_disable()), which point to either
shadow-specific or hap-specific code. For example, in shadow mode
*log_dirty_enable() points to shadow_log_dirty_enable(). Tim, let me know
if you like this approach.

3. p2m_set_l1e_flags() is renamed to p2m_set_flags_global() as requested.
It does NOT walk the P2M table itself. Instead, it still relies on
set_p2m_entry() to walk the P2M table. The reason: I am not comfortable
duplicating the code of set_p2m_entry() in this function. Most of it would
be the same as set_p2m_entry() and p2m_next_level().
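As a rough illustration of the function-pointer idea in item 2, here is a
standalone sketch. It is not part of the attached patch: the cut-down
struct domain, the *_stub functions, the BACKEND_* values and
paging_log_dirty_init() are all hypothetical names made up for the example.
Only paging_log_dirty_enable()/paging_log_dirty_disable() and the two
pointer names come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

enum { BACKEND_NONE, BACKEND_SHADOW, BACKEND_HAP };

/* Hypothetical, cut-down stand-in for Xen's struct domain. */
struct domain {
    int log_dirty;     /* stand-in for the PG_log_dirty mode bit */
    int last_backend;  /* records which backend ran (illustration only) */

    /* The two function pointers proposed in item 2. */
    int (*log_dirty_enable)(struct domain *d);
    int (*log_dirty_disable)(struct domain *d);
};

/* Stubs standing in for the real shadow-specific routines. */
static int shadow_log_dirty_enable_stub(struct domain *d)
{
    d->log_dirty = 1;
    d->last_backend = BACKEND_SHADOW;
    return 0;
}

static int shadow_log_dirty_disable_stub(struct domain *d)
{
    d->log_dirty = 0;
    d->last_backend = BACKEND_SHADOW;
    return 0;
}

/* Stubs standing in for the real hap-specific routines. */
static int hap_log_dirty_enable_stub(struct domain *d)
{
    d->log_dirty = 1;
    d->last_backend = BACKEND_HAP;
    return 0;
}

static int hap_log_dirty_disable_stub(struct domain *d)
{
    d->log_dirty = 0;
    d->last_backend = BACKEND_HAP;
    return 0;
}

/* Hypothetical init hook: choose the backend once, at domain setup. */
void paging_log_dirty_init(struct domain *d, int hap_enabled)
{
    d->log_dirty = 0;
    d->last_backend = BACKEND_NONE;
    if ( hap_enabled )
    {
        d->log_dirty_enable = hap_log_dirty_enable_stub;
        d->log_dirty_disable = hap_log_dirty_disable_stub;
    }
    else
    {
        d->log_dirty_enable = shadow_log_dirty_enable_stub;
        d->log_dirty_disable = shadow_log_dirty_disable_stub;
    }
}

/* Common entry points: callers no longer test for shadow vs. hap. */
int paging_log_dirty_enable(struct domain *d)
{
    assert(d != NULL && d->log_dirty_enable != NULL);
    return d->log_dirty_enable(d);
}

int paging_log_dirty_disable(struct domain *d)
{
    assert(d != NULL && d->log_dirty_disable != NULL);
    return d->log_dirty_disable(d);
}
```

With this layout the domctl handlers would only call the two common
functions, and the shadow/hap decision would be made in one place when the
pointers are installed.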

What is your opinion? Any comments are welcome. I will create a new patch
after collecting them.

Thanks,

-Wei

Tim Deegan wrote:
> Hi,
>
> Thanks for this patch.
>
> At 10:05 -0500 on 01 Jun (1180692316), Huang2, Wei wrote:
> > The attached file supports AMD-V nested paging live migration. Please
> > comment. I will create an updated version after collecting feedbacks.
>
> Can a lot more log-dirty code (bitmap allocation, clearing, reporting)
> be made common? E.g.: hap_mark_dirty() is virtually identical to
> sh_mark_dirty() -- including some recursive locking and associated
> comments that are not true in HAP modes. Maybe give it its own lock to
> cover bit-setting? Probably only the code for clearing the bitmap
> (i.e., resetting the trap that will cause us to mark pages dirty) needs
> to be split out.
>
> > The following areas require special attention:
> > 1. paging_mark_dirty()
> > Currently, paging_mark_dirty() dispatches to sh_mark_dirty() or
> > hap_mark_dirty() based on paging support. I personally prefer a function
> > pointer. However, current paging interface only provides a function
> > pointer for vcpu-level functions, not for domain-level functions. This
> > is a bit annoying.
>
> Make it a common function and that should go away.
>
> > 2. locking in p2m_set_l1e_flags()
> > p2m_set_l1e_flags(), which is invoked by hap.c, calls
> > hap_write_p2m_entry(). hap_lock() is called twice. I currently remove
> > hap_lock() in hap_write_p2m_entry(). A better solution is needed here.
>
> Hmm. Since you don't ever change the monitor table of a HAP domain, it
> might be possible to make hap_write_p2m_entry (and
> hap.c:p2m_install_entry_in_monitors()) safe without locking.
>
> It is worth noting that this would be a different locking discipline
> from the one used in shadow code -- code paths that take both the p2m
> lock and the shadow lock always take the p2m lock first (there are some
> convolutions in shadow init routines etc to make sure this is true).
> If the hap lock is to be taken before the p2m lock that will need some
> care and attention in the rest of the code.
>
> > +/* This function handles P2M page faults by fixing l1e flags with correct
> > + * values. It also calls paging_mark_dirty() function to record the dirty
> > + * pages.
> > + */
> > +int p2m_fix_table(struct domain *d, paddr_t gpa)
>
> Can this have a better name? It's not really fixing anything. Maybe
> have this be p2m_set_flags() and the previous function be
> p2m_set_flags_global()?
>
> Also maybe the call to mark_dirty could be made from the SVM code, which
> is where we're really handling the write?
>
> Cheers,
>
> Tim.
>
> --
> Tim Deegan, XenSource UK Limited
> Registered office c/o EC2Y 5EB, UK; company number 05334508
>

--------------000604050005070008040902
Content-Type: text/plain; name=npt_live_migration_RFC_2.txt
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename=npt_live_migration_RFC_2.txt

diff -r 7ab0527484c8 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c Tue Jun 05 04:35:27 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE: *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 7ab0527484c8 xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/io.c Tue Jun 05 04:35:45 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 7ab0527484c8 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c Tue Jun 05 11:50:28 2007 -0500
@@ -1028,13 +1028,16 @@ int start_svm(struct
cpuinfo_x86 *c)
 
 static int svm_do_nested_pgfault(paddr_t gpa, struct cpu_user_regs *regs)
 {
+    struct domain *d;
+
     if (mmio_space(gpa)) {
         handle_mmio(gpa);
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    d = current->domain;
+    paging_mark_dirty(d, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(d, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 7ab0527484c8 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm.c Tue Jun 05 04:34:56 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
 
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 7ab0527484c8 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c Tue Jun 05 16:37:53 2007 -0500
@@ -385,6 +385,56 @@ void hap_destroy_monitor_table(struct vc
 }
 
 /************************************************/
+/* HAP LOG DIRTY SUPPORT */
+/************************************************/
+int hap_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    hap_lock(d);
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+        paging_free_log_dirty_bitmap(d);
+        goto out;
+    }
+
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+
+    /* mark physical memory as not writable */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+
+ out:
+    hap_unlock(d);
+    domain_unpause(d);
+
+    return ret;
+}
+
+int hap_log_dirty_disable(struct domain *d)
+{
+    domain_pause(d);
+    hap_lock(d);
+    if ( paging_mode_log_dirty(d) )
+        paging_free_log_dirty_bitmap(d);
+
+    /* turn off PG_log_dirty bit in paging mode */
+    d->arch.paging.mode &= ~PG_log_dirty;
+
+    /* recover P2M table to normal mode */
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+
+    hap_unlock(d);
+    domain_unpause(d);
+
+    return 1;
+}
+
+/************************************************/
 /* HAP DOMAIN LEVEL FUNCTIONS */
 /************************************************/
 void hap_domain_init(struct domain *d)
@@ -498,12 +548,16 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-
     switch ( sc->op ) {
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+        if ( paging_mode_log_dirty(d) )
+            if ( (rc = hap_log_dirty_disable(d)) != 0 )
+                return rc;
+        return 0;
+
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+        return hap_log_dirty_enable(d);
+
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
         rc = hap_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
diff -r 7ab0527484c8 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c Tue Jun 05 11:41:29 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info =
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
 
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,81 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that physical base address of l1e is intact. This function can be used for
+ * special purpose, such as marking physical memory as Not-Writable for
+ * tracking dirty pages during live migration.
+ */
+int p2m_set_flags_global(struct domain *d, u32 l1e_flags)
+{
+    mfn_t mfn;
+    struct list_head *entry;
+    struct page_info *page;
+    unsigned long gfn;
+
+    p2m_lock(d);
+
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+    {
+        P2M_ERROR("p2m table has not been allocated for this domain yet!\n");
+        p2m_unlock(d);
+        return -EINVAL;
+    }
+
+    for ( entry = d->page_list.next;
+          entry != &d->page_list;
+          entry = entry->next )
+    {
+        page = list_entry(entry, struct page_info, list);
+        mfn = page_to_mfn(page);
+        gfn = get_gpfn_from_mfn(mfn_x(mfn));
+        if (
+#ifdef __x86_64__
+            (gfn != 0x5555555555555555L)
+#else
+            (gfn != 0x55555555L)
+#endif
+             && gfn != INVALID_M2P_ENTRY
+             && !set_p2m_entry(d, gfn, mfn, l1e_flags) )
+            goto error;
+    }
+
+    p2m_unlock(d);
+    return 0;
+
+ error:
+    P2M_PRINTK("failed to change l1e flags of p2m table, gfn=%05lx, mfn=%"
+               PRI_mfn "\n", gfn, mfn_x(mfn));
+    p2m_unlock(d);
+    return -ENOMEM;
+}
+
+/* This function goes through p2M table and modifies l1e flags of a specific
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags)
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 7ab0527484c8 xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/paging.c Tue Jun 05 17:20:34 2007 -0500
@@ -25,6 +25,15 @@
 #include
 #include
 #include
+#include
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -42,10 +51,200 @@ boolean_param("hap", opt_hap_enabled);
     } while (0)
 
 
+/* log dirty mode lock */
+#define log_dirty_lock_init(_d) \
+    do { \
+        spin_lock_init(&(_d)->arch.paging.log_dirty_lock); \
+        (_d)->arch.paging.log_dirty_locker = -1; \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody"; \
+    } while (0)
+
+#define log_dirty_lock(_d) \
+    do { \
+        if (unlikely((_d)->arch.paging.log_dirty_locker==current->processor))\
+        { \
+            printk("Error: paging log dirty lock held by %s\n", \
+                   (_d)->arch.paging.log_dirty_locker_function); \
+            BUG(); \
+        } \
+        spin_lock(&(_d)->arch.paging.log_dirty_lock); \
+        ASSERT((_d)->arch.paging.log_dirty_locker == -1); \
+        (_d)->arch.paging.log_dirty_locker = current->processor; \
+        (_d)->arch.paging.log_dirty_locker_function = __func__; \
+    } while (0)
+
+#define log_dirty_unlock(_d) \
+    do { \
+        ASSERT((_d)->arch.paging.log_dirty_locker == current->processor); \
+        (_d)->arch.paging.log_dirty_locker = -1; \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody"; \
+        spin_unlock(&(_d)->arch.paging.log_dirty_lock); \
+    } while (0)
+
+
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.dirty_bitmap == NULL);
+    d->arch.paging.dirty_bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.dirty_bitmap =
+        xmalloc_array(unsigned long,
+                      d->arch.paging.dirty_bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.dirty_bitmap == NULL )
+    {
+        d->arch.paging.dirty_bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.dirty_bitmap, 0,
+           d->arch.paging.dirty_bitmap_size/8);
+
+    return 0;
+}
+
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.dirty_bitmap_size = 0;
+    if ( d->arch.paging.dirty_bitmap )
+    {
+        xfree(d->arch.paging.dirty_bitmap);
+        d->arch.paging.dirty_bitmap = NULL;
+    }
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        goto out;
+
+    /* N.B. Can use non-atomic TAS because protected by log_dirty_lock. */
+    if ( likely(pfn < d->arch.paging.dirty_bitmap_size) )
+    {
+        if ( !__test_and_set_bit(pfn, d->arch.paging.dirty_bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY,
+                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                         mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                      "owner=%d c=%08x t=%" PRtype_info "\n",
+                      mfn_x(gmfn),
+                      pfn,
+                      d->arch.paging.dirty_bitmap_size,
+                      d->domain_id,
+                      (page_get_owner(mfn_to_page(gmfn))
+                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                       : -1),
+                      mfn_to_page(gmfn)->count_info,
+                      mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+ out:
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats. If the operation is a CLEAN,
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n",
+                 (clean) ? "clean" : "peek",
+                 d->domain_id,
+                 d->arch.paging.fault_count,
+                 d->arch.paging.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.fault_count;
+    sc->stats.dirty_count = d->arch.paging.dirty_count;
+
+    if ( clean )
+    {
+        /* Further operations are required for XEN_DOMCTL_SHADOW_OP_CLEAN. We
+         * dispatch to next-level log_dirty functions based on paging mode */
+        if ( !paging_mode_hap(d) )
+            shadow_log_dirty_op_clean(d);
+
+        d->arch.paging.fault_count = 0;
+        d->arch.paging.dirty_count = 0;
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.dirty_bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+
+    if ( sc->pages > d->arch.paging.dirty_bitmap_size )
+        sc->pages = d->arch.paging.dirty_bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                     sc->dirty_bitmap, i/8,
+                     (uint8_t *)d->arch.paging.dirty_bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.dirty_bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
     p2m_init(d);
+    log_dirty_lock_init(d);
     shadow_domain_init(d);
 
     if ( opt_hap_enabled && is_hvm_domain(d) )
@@ -65,11 +264,40 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
-    /* Here, dispatch domctl to the appropriate paging code */
-    if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a paging op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Paging op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+        return paging_log_dirty_op(d, sc);
+
+    default:
+        /* Dispatch other domctl operations to the appropriate paging code */
+        if ( opt_hap_enabled && is_hvm_domain(d) )
+            return hap_domctl(d, sc, u_domctl);
+        else
+            return shadow_domctl(d, sc, u_domctl);
+    }
 }
 
 /* Call when destroying a domain */
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c Tue Jun 05 17:20:34 2007 -0500
@@ -87,8 +87,6 @@
__initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v,
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
 
     // Determine which types of shadows are affected, and update each.
     //
@@ -2565,7 +2563,7 @@ void shadow_teardown(struct domain *d)
     if (d->arch.paging.shadow.hash_table)
         shadow_hash_teardown(d);
     /* Release the log-dirty bitmap of dirtied pages */
-    sh_free_log_dirty_bitmap(d);
+    paging_free_log_dirty_bitmap(d);
     /* Should not have any more memory held */
     SHADOW_PRINTK("teardown done."
                   " Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2724,37 +2722,6 @@ static int shadow_test_disable(struct do
     return ret;
 }
 
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
 static int shadow_log_dirty_enable(struct domain *d)
 {
     int ret;
@@ -2784,16 +2751,16 @@ static int shadow_log_dirty_enable(struc
     d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
 #endif
 
-    ret = sh_alloc_log_dirty_bitmap(d);
+    ret = paging_alloc_log_dirty_bitmap(d);
     if ( ret != 0 )
     {
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         goto out;
     }
 
     ret = shadow_one_bit_enable(d, PG_log_dirty);
     if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
 
  out:
     shadow_unlock(d);
@@ -2809,11 +2776,21 @@ static int shadow_log_dirty_disable(stru
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_log_dirty);
     if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
     return ret;
+}
+
+void shadow_log_dirty_op_clean(struct domain *d)
+{
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_lock(d);
+    shadow_blow_tables(d);
+    shadow_unlock(d);
 }
 
 /**************************************************************************/
@@ -2892,150 +2869,6 @@ void shadow_convert_to_log_dirty(struct
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n",
-                 (clean) ? "clean" : "peek",
-                 d->domain_id,
-                 d->arch.paging.shadow.fault_count,
-                 d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
-        shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
-
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                     sc->dirty_bitmap, i/8,
-                     (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking )
-    {
-        shadow_lock(d);
-        /* Check the mode again with the lock held */
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) )
-    {
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY,
-                         "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                         mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                      "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                      "owner=%d c=%08x t=%" PRtype_info "\n",
-                      mfn_x(gmfn),
-                      pfn,
-                      d->arch.paging.shadow.dirty_bitmap_size,
-                      d->domain_id,
-                      (page_get_owner(mfn_to_page(gmfn))
-                       ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                       : -1),
-                      mfn_to_page(gmfn)->count_info,
-                      mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3044,27 +2877,6 @@ int shadow_domctl(struct domain *d,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
     int rc, preempted = 0;
-
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
 
     switch ( sc->op )
     {
@@ -3085,10 +2897,6 @@ int shadow_domctl(struct domain *d,
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
-
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
         if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c Tue Jun 05 04:38:26 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n",
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep),
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE )
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h Mon Jun 04 17:56:23 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct
domain *d, mfn_t
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size)
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.dirty_bitmap_size)
+         && test_bit(pfn, d->arch.paging.dirty_bitmap) )
         return 1;
 
     return 0;
diff -r 7ab0527484c8 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/domain.h Tue Jun 05 04:21:38 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size; /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -164,6 +156,19 @@ struct paging_domain {
 
     /* Other paging assistance code will have structs here */
     struct hap_domain hap;
+
+    /* log-dirty lock */
+    spinlock_t log_dirty_lock;
+    int log_dirty_locker; /* processor which holds the lock */
+    const char *log_dirty_locker_function; /* func that took it */
+
+    /* log-dirty bitmap */
+    unsigned long *dirty_bitmap;
+    unsigned int dirty_bitmap_size; /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int fault_count;
+    unsigned int dirty_count;
 };
 
 struct paging_vcpu {
diff -r 7ab0527484c8 xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h Tue Jun 05 04:33:38 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i) \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 7ab0527484c8 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/p2m.h Tue Jun 05 11:42:54 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* Configure l1e flags of P2M table */
+int p2m_set_flags_global(struct domain *d, u32 flags);
+
+/* Set P2M l1e flags of a specific page */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 7ab0527484c8 xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/paging.h Tue Jun 05 04:55:23 2007 -0500
@@ -63,6 +63,8 @@
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d) ((_d)->arch.paging.mode & PG_external)
 
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
@@ -164,6 +166,14 @@ void paging_final_teardown(struct domain
  * creation. */
 int paging_enable(struct domain *d, u32 mode);
 
+/* allocate memory resource for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free memory resource for log dirty */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* mark a page as dirty page */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /* Page fault handler
  * Called from pagefault handler in Xen, and from the HVM trap handlers
diff -r 7ab0527484c8 xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/shadow.h Tue Jun 05 09:58:00 2007 -0500
@@ -75,22 +75,13 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
-
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode
  * has changed, and when bringing up a VCPU for the first time. */
 void shadow_update_paging_modes(struct vcpu *v);
+/* handle log_dirty CLEAN operation. */
+void shadow_log_dirty_op_clean(struct domain *d);
 
 /* Remove all mappings of the guest page from the shadows.
  * This is called from common code. It does not flush TLBs.
*/ --------------000604050005070008040902 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --------------000604050005070008040902--