From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tim Deegan
Subject: Re: lock in vhpet
Date: Fri, 27 Apr 2012 10:26:45 +0100
Message-ID: <20120427092645.GC86045@ocelot.phlegethon.org>
References: <2a7b92d7a952c53c0fb81bdebdd45d24.squirrel@webmail.lagarcavilla.org> <20120424091646.GB34721@ocelot.phlegethon.org> <958f5bfcc3ae6a631cf7208086553dce.squirrel@webmail.lagarcavilla.org> <20120426212531.GH67043@ocelot.phlegethon.org> <23c9d6057801250ebf2a9713a1bc5af3.squirrel@webmail.lagarcavilla.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="zYM0uCDKw75PZbzx"
Return-path:
Content-Disposition: inline
In-Reply-To: <23c9d6057801250ebf2a9713a1bc5af3.squirrel@webmail.lagarcavilla.org>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andres Lagar-Cavilla
Cc: "Zhang, Yang Z" , "xen-devel@lists.xensource.com" , Keir Fraser
List-Id: xen-devel@lists.xenproject.org

--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline

At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:
> > Can you please try the attached patch?  I think you'll need this one
> > plus the ones that take the locks out of the hpet code.
>
> Right off the bat I'm getting a multitude of
> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7
> And a hung dom0 during initramfs. I'm a little baffled as to why, but it's
> there (32 bit dom0, XenServer6).

Curses, I knew there'd be one somewhere.  I've been replacing
get_page_and_type_from_pagenr()s (which return 0 for success) with
old-school get_page_type()s (which return 1 for success) and not always
getting the right number of inversions.  That's a horrible, horrible
beartrap of an API, BTW, which had me cursing at the screen, but I had
enough on my plate yesterday without touching _that_ code too!

> > Andres, this is basically the big-hammer version of your "take a
> > pagecount" changes, plus the change you made to hvmemul_rep_movs().
> > If this works I intend to follow it up with a patch to make some of the
> > read-modify-write paths avoid taking the lock (by using a
> > compare-exchange operation so they only take the lock on a write).  If
> > that succeeds I might drop put_gfn() altogether.
>
> You mean cmpxchg the whole p2m entry? I don't think I parse the plan.
> There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn.
> But I guess those could lock the p2m up-front if they become the only
> consumers of put_gfn left.

Well, that's more or less what happens now.  I was thinking of replacing
any remaining (implicit) lock ; read ; think a bit ; maybe write ; unlock
code with the fast-path-friendlier:

  read ; think ; maybe-cmpxchg (and on failure undo or retry)

which avoids taking the write lock altogether if there's no work to do.
But maybe there aren't many of those left now.  Obviously any path which
will always write should just take the write-lock first.

> > - grant-table operations still use the lock, because frankly I
> >   could not follow the current code, and it's quite late in the evening.
>
> It's pretty complex with serious nesting, and ifdef's for arm and 32 bit.
> gfn_to_mfn_private callers will suffer from altering the current meaning,
> as put_gfn resolves to the right thing for the ifdef'ed arch. The other
> user is grant_transfer which also relies on the page *not* having an extra
> ref in steal_page. So it's a prime candidate to be left alone.

Sadly, I think it's not.  The PV backends will be doing lots of grant
ops, which shouldn't get serialized against all other P2M lookups.

> > I also have a long list of uglinesses in the mm code that I found
>
> Uh, ugly stuff, how could that have happened?

I can't imagine. :)  Certainly nothing to do with me thinking "I'll
clean that up when I get some time."

> I have a few preliminary observations on the patch. Pasting relevant bits
> here, since the body of the patch seems to have been lost by the email
> thread:
>
> @@ -977,23 +976,25 @@ int arch_set_info_guest(
> ...
> +
> +    if (!paging_mode_refcounts(d)
> +        && !get_page_and_type(cr3_page, d, PGT_l3_page_table) )
> replace with && !get_page_type() )

Yep.

> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy(
>             gfn = addr >> PAGE_SHIFT;
>         }
>
> -        mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt));
> +        page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and
> same logic when checking p2m_is_shared). Not truly related to your patch,
> but since we're at it.

OK, but not in this patch.

> Same, further down
> -        if ( !p2m_is_ram(p2mt) )
> +        if ( !page )
>         {
> -            put_gfn(curr->domain, gfn);
> +            if ( page )
> +                put_page(page);
> Last two lines are redundant

Yep.

> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE
> case HVMOP_modified_memory: a lot of error checking has been removed.

Yes, but it was bogus - there's a race between the actual modification
and the call, during which anything might have happened.  The best we
can do is throw log-dirty bits at everything, and the caller can't do
anything with the error anyway.  When I come to tidy up I'll just add
a new mark_gfn_dirty function and skip the pointless gfn->mfn->gfn
translation on this path.

> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing checking
> for target gfns of mmu updates of l2/3/4 entries. It seems that this
> wouldn't work anyways, that's why you killed it?

Yeah - since only L1es can point at foreign mappings it was all just
noise, and even if there had been real p2m lookups on those paths there
was no equivalent to the translate-in-place that happens in mod_l1_entry,
so it would have been broken in a much worse way.

> +++ b/xen/arch/x86/mm/hap/guest_walk.c
> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
> ...
> +    if ( !top_page )
>      {
>          pfec[0] &= ~PFEC_page_present;
> -        __put_gfn(p2m, top_gfn);
> +        put_page(top_page);
> top_page is NULL here, remove put_page

Yep.

> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since
> locking is already done by get_gfn_type_access/__put_gfn

Yeah, but I was writing that with half an eye on killing that lock. :)
I'll drop them for now.

> (hope those observations made sense without inlining them in the actual
> patch)

Yes, absolutely - thanks for the review!  If we can get this to work
well enough I'll tidy it up into a sensible series next week.  In the
meantime, an updated version of the monster patch is attached.

Cheers,

Tim.
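
P.S. In case the "read ; think ; maybe-cmpxchg" plan above still reads as
hand-waving, here is a minimal standalone sketch of the shape I mean.  It
uses plain C11 atomics and invented names (p2m_entry, try_change_type)
rather than the real Xen types and locks, so take it purely as an
illustration, not as what the eventual patch will look like:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model: a p2m entry is one 64-bit word with the type in the low byte. */
typedef uint64_t p2m_entry;
#define TYPE_MASK 0xffull

static _Atomic p2m_entry entry = 4;   /* pretend 4 means "ram_rw" */

/* Flip the type bits iff the entry still holds old_type.  Returns true on
 * success; on failure the caller re-reads and either retries or falls back
 * to taking the write lock. */
static bool try_change_type(unsigned int old_type, unsigned int new_type)
{
    p2m_entry old = atomic_load(&entry);            /* read */
    if ( (old & TYPE_MASK) != old_type )            /* think: nothing to do? */
        return false;
    p2m_entry new_e = (old & ~TYPE_MASK) | new_type;
    /* maybe-cmpxchg: the only "write", and no lock is taken on this path */
    return atomic_compare_exchange_strong(&entry, &old, new_e);
}

int main(void)
{
    if ( try_change_type(4, 5) )
        puts("type changed without taking the write lock");
    else
        puts("entry changed under our feet: retry or take the write lock");
    return 0;
}

The point is just that the read side never serializes against other
lookups; only a caller that actually has a change to make pays for any
synchronization, and even then only on the cmpxchg itself.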
--zYM0uCDKw75PZbzx Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: attachment; filename=get-page-from-gfn # HG changeset patch # Parent 107285938c50f82667bd4d014820b439a077c22c diff -r 107285938c50 xen/arch/x86/domain.c --- a/xen/arch/x86/domain.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/domain.c Fri Apr 27 10:23:28 2012 +0100 @@ -716,7 +716,7 @@ int arch_set_info_guest( { struct domain *d = v->domain; unsigned long cr3_gfn; - unsigned long cr3_pfn = INVALID_MFN; + struct page_info *cr3_page; unsigned long flags, cr4; unsigned int i; int rc = 0, compat; @@ -925,46 +925,45 @@ int arch_set_info_guest( if ( !compat ) { cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); - cr3_pfn = get_gfn_untyped(d, cr3_gfn); + cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); - if ( !mfn_valid(cr3_pfn) || - (paging_mode_refcounts(d) - ? !get_page(mfn_to_page(cr3_pfn), d) - : !get_page_and_type(mfn_to_page(cr3_pfn), d, - PGT_base_page_table)) ) + if ( !cr3_page ) { - put_gfn(d, cr3_gfn); + destroy_gdt(v); + return -EINVAL; + } + if ( !paging_mode_refcounts(d) + && !get_page_type(cr3_page, PGT_base_page_table) ) + { + put_page(cr3_page); destroy_gdt(v); return -EINVAL; } - v->arch.guest_table = pagetable_from_pfn(cr3_pfn); - put_gfn(d, cr3_gfn); + v->arch.guest_table = pagetable_from_page(cr3_page); #ifdef __x86_64__ if ( c.nat->ctrlreg[1] ) { cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); - cr3_pfn = get_gfn_untyped(d, cr3_gfn); + cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); - if ( !mfn_valid(cr3_pfn) || - (paging_mode_refcounts(d) - ? !get_page(mfn_to_page(cr3_pfn), d) - : !get_page_and_type(mfn_to_page(cr3_pfn), d, - PGT_base_page_table)) ) + if ( !cr3_page || + (!paging_mode_refcounts(d) + && !get_page_type(cr3_page, PGT_base_page_table)) ) { - cr3_pfn = pagetable_get_pfn(v->arch.guest_table); + if (cr3_page) + put_page(cr3_page); + cr3_page = pagetable_get_page(v->arch.guest_table); v->arch.guest_table = pagetable_null(); if ( paging_mode_refcounts(d) ) - put_page(mfn_to_page(cr3_pfn)); + put_page(cr3_page); else - put_page_and_type(mfn_to_page(cr3_pfn)); - put_gfn(d, cr3_gfn); + put_page_and_type(cr3_page); destroy_gdt(v); return -EINVAL; } - v->arch.guest_table_user = pagetable_from_pfn(cr3_pfn); - put_gfn(d, cr3_gfn); + v->arch.guest_table_user = pagetable_from_page(cr3_page); } else if ( !(flags & VGCF_in_kernel) ) { @@ -977,23 +976,25 @@ int arch_set_info_guest( l4_pgentry_t *l4tab; cr3_gfn = compat_cr3_to_pfn(c.cmp->ctrlreg[3]); - cr3_pfn = get_gfn_untyped(d, cr3_gfn); + cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); - if ( !mfn_valid(cr3_pfn) || - (paging_mode_refcounts(d) - ? 
!get_page(mfn_to_page(cr3_pfn), d) - : !get_page_and_type(mfn_to_page(cr3_pfn), d, - PGT_l3_page_table)) ) + if ( !cr3_page) { - put_gfn(d, cr3_gfn); + destroy_gdt(v); + return -EINVAL; + } + + if (!paging_mode_refcounts(d) + && !get_page_type(cr3_page, PGT_l3_page_table) ) + { + put_page(cr3_page); destroy_gdt(v); return -EINVAL; } l4tab = __va(pagetable_get_paddr(v->arch.guest_table)); - *l4tab = l4e_from_pfn( - cr3_pfn, _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED); - put_gfn(d, cr3_gfn); + *l4tab = l4e_from_pfn(page_to_mfn(cr3_page), + _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED); #endif } @@ -1064,7 +1065,7 @@ map_vcpu_info(struct vcpu *v, unsigned l struct domain *d = v->domain; void *mapping; vcpu_info_t *new_info; - unsigned long mfn; + struct page_info *page; int i; if ( offset > (PAGE_SIZE - sizeof(vcpu_info_t)) ) @@ -1077,19 +1078,20 @@ map_vcpu_info(struct vcpu *v, unsigned l if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) return -EINVAL; - mfn = get_gfn_untyped(d, gfn); - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); + if ( !page ) + return -EINVAL; + + if ( !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gfn); + put_page(page); return -EINVAL; } - mapping = map_domain_page_global(mfn); + mapping = __map_domain_page_global(page); if ( mapping == NULL ) { - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, gfn); + put_page_and_type(page); return -ENOMEM; } @@ -1106,7 +1108,7 @@ map_vcpu_info(struct vcpu *v, unsigned l } v->vcpu_info = new_info; - v->arch.pv_vcpu.vcpu_info_mfn = mfn; + v->arch.pv_vcpu.vcpu_info_mfn = page_to_mfn(page); /* Set new vcpu_info pointer /before/ setting pending flags. */ wmb(); @@ -1119,7 +1121,6 @@ map_vcpu_info(struct vcpu *v, unsigned l for ( i = 0; i < BITS_PER_EVTCHN_WORD(d); i++ ) set_bit(i, &vcpu_info(v, evtchn_pending_sel)); - put_gfn(d, gfn); return 0; } diff -r 107285938c50 xen/arch/x86/domctl.c --- a/xen/arch/x86/domctl.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/domctl.c Fri Apr 27 10:23:28 2012 +0100 @@ -202,16 +202,16 @@ long arch_do_domctl( for ( j = 0; j < k; j++ ) { - unsigned long type = 0, mfn = get_gfn_untyped(d, arr[j]); + unsigned long type = 0; - page = mfn_to_page(mfn); + page = get_page_from_gfn(d, arr[j], NULL, P2M_ALLOC); - if ( unlikely(!mfn_valid(mfn)) || - unlikely(is_xen_heap_mfn(mfn)) ) + if ( unlikely(!page) || + unlikely(is_xen_heap_page(page)) ) type = XEN_DOMCTL_PFINFO_XTAB; else if ( xsm_getpageframeinfo(page) != 0 ) ; - else if ( likely(get_page(page, d)) ) + else { switch( page->u.inuse.type_info & PGT_type_mask ) { @@ -231,13 +231,10 @@ long arch_do_domctl( if ( page->u.inuse.type_info & PGT_pinned ) type |= XEN_DOMCTL_PFINFO_LPINTAB; + } + if ( page ) put_page(page); - } - else - type = XEN_DOMCTL_PFINFO_XTAB; - - put_gfn(d, arr[j]); arr[j] = type; } @@ -304,21 +301,21 @@ long arch_do_domctl( { struct page_info *page; unsigned long gfn = arr32[j]; - unsigned long mfn = get_gfn_untyped(d, gfn); - page = mfn_to_page(mfn); + page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); if ( domctl->cmd == XEN_DOMCTL_getpageframeinfo3) arr32[j] = 0; - if ( unlikely(!mfn_valid(mfn)) || - unlikely(is_xen_heap_mfn(mfn)) ) + if ( unlikely(!page) || + unlikely(is_xen_heap_page(page)) ) arr32[j] |= XEN_DOMCTL_PFINFO_XTAB; else if ( xsm_getpageframeinfo(page) != 0 ) { - put_gfn(d, gfn); + put_page(page); continue; - } else if ( likely(get_page(page, d)) ) + } + else { unsigned long type = 0; @@ 
-341,12 +338,10 @@ long arch_do_domctl( if ( page->u.inuse.type_info & PGT_pinned ) type |= XEN_DOMCTL_PFINFO_LPINTAB; arr32[j] |= type; + } + + if ( page ) put_page(page); - } - else - arr32[j] |= XEN_DOMCTL_PFINFO_XTAB; - - put_gfn(d, gfn); } if ( copy_to_guest_offset(domctl->u.getpageframeinfo2.array, @@ -419,7 +414,7 @@ long arch_do_domctl( { struct domain *d = rcu_lock_domain_by_id(domctl->domain); unsigned long gmfn = domctl->u.hypercall_init.gmfn; - unsigned long mfn; + struct page_info *page; void *hypercall_page; ret = -ESRCH; @@ -433,26 +428,25 @@ long arch_do_domctl( break; } - mfn = get_gfn_untyped(d, gmfn); + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); ret = -EACCES; - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + if ( !page || !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gmfn); + if ( page ) + put_page(page); rcu_unlock_domain(d); break; } ret = 0; - hypercall_page = map_domain_page(mfn); + hypercall_page = __map_domain_page(page); hypercall_page_initialise(d, hypercall_page); unmap_domain_page(hypercall_page); - put_page_and_type(mfn_to_page(mfn)); + put_page_and_type(page); - put_gfn(d, gmfn); rcu_unlock_domain(d); } break; diff -r 107285938c50 xen/arch/x86/hvm/emulate.c --- a/xen/arch/x86/hvm/emulate.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/hvm/emulate.c Fri Apr 27 10:23:28 2012 +0100 @@ -60,34 +60,25 @@ static int hvmemul_do_io( ioreq_t *p = get_ioreq(curr); unsigned long ram_gfn = paddr_to_pfn(ram_gpa); p2m_type_t p2mt; - mfn_t ram_mfn; + struct page_info *ram_page; int rc; /* Check for paged out page */ - ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt); + ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE); if ( p2m_is_paging(p2mt) ) { - put_gfn(curr->domain, ram_gfn); + if ( ram_page ) + put_page(ram_page); p2m_mem_paging_populate(curr->domain, ram_gfn); return X86EMUL_RETRY; } if ( p2m_is_shared(p2mt) ) { - put_gfn(curr->domain, ram_gfn); + if ( ram_page ) + put_page(ram_page); return X86EMUL_RETRY; } - /* Maintain a ref on the mfn to ensure liveness. Put the gfn - * to avoid potential deadlock wrt event channel lock, later. */ - if ( mfn_valid(mfn_x(ram_mfn)) ) - if ( !get_page(mfn_to_page(mfn_x(ram_mfn)), - curr->domain) ) - { - put_gfn(curr->domain, ram_gfn); - return X86EMUL_RETRY; - } - put_gfn(curr->domain, ram_gfn); - /* * Weird-sized accesses have undefined behaviour: we discard writes * and read all-ones. 
@@ -98,8 +89,8 @@ static int hvmemul_do_io( ASSERT(p_data != NULL); /* cannot happen with a REP prefix */ if ( dir == IOREQ_READ ) memset(p_data, ~0, size); - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_UNHANDLEABLE; } @@ -120,8 +111,8 @@ static int hvmemul_do_io( unsigned int bytes = vio->mmio_large_write_bytes; if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) ) { - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_OKAY; } } @@ -133,8 +124,8 @@ static int hvmemul_do_io( { memcpy(p_data, &vio->mmio_large_read[addr - pa], size); - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_OKAY; } } @@ -148,8 +139,8 @@ static int hvmemul_do_io( vio->io_state = HVMIO_none; if ( p_data == NULL ) { - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_UNHANDLEABLE; } goto finish_access; @@ -159,13 +150,13 @@ static int hvmemul_do_io( (addr == (vio->mmio_large_write_pa + vio->mmio_large_write_bytes)) ) { - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_RETRY; } default: - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_UNHANDLEABLE; } @@ -173,8 +164,8 @@ static int hvmemul_do_io( { gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n", p->state); - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_UNHANDLEABLE; } @@ -226,8 +217,8 @@ static int hvmemul_do_io( if ( rc != X86EMUL_OKAY ) { - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return rc; } @@ -263,8 +254,8 @@ static int hvmemul_do_io( } } - if ( mfn_valid(mfn_x(ram_mfn)) ) - put_page(mfn_to_page(mfn_x(ram_mfn))); + if ( ram_page ) + put_page(ram_page); return X86EMUL_OKAY; } @@ -686,7 +677,6 @@ static int hvmemul_rep_movs( p2m_type_t sp2mt, dp2mt; int rc, df = !!(ctxt->regs->eflags & X86_EFLAGS_DF); char *buf; - struct two_gfns tg; rc = hvmemul_virtual_to_linear( src_seg, src_offset, bytes_per_rep, reps, hvm_access_read, @@ -714,25 +704,17 @@ static int hvmemul_rep_movs( if ( rc != X86EMUL_OKAY ) return rc; - get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL, - current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL, - P2M_ALLOC, &tg); + /* Check for MMIO ops */ + (void) get_gfn_query_unlocked(current->domain, sgpa >> PAGE_SHIFT, &sp2mt); + (void) get_gfn_query_unlocked(current->domain, dgpa >> PAGE_SHIFT, &dp2mt); - if ( !p2m_is_ram(sp2mt) && !p2m_is_grant(sp2mt) ) - { - rc = hvmemul_do_mmio( + if ( sp2mt == p2m_mmio_dm ) + return hvmemul_do_mmio( sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL); - put_two_gfns(&tg); - return rc; - } - if ( !p2m_is_ram(dp2mt) && !p2m_is_grant(dp2mt) ) - { - rc = hvmemul_do_mmio( + if ( dp2mt == p2m_mmio_dm ) + return hvmemul_do_mmio( dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL); - put_two_gfns(&tg); - return rc; - } /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */ bytes = *reps * bytes_per_rep; @@ -747,10 +729,7 @@ static int hvmemul_rep_movs( * can be emulated by a source-to-buffer-to-destination block copy. 
*/ if ( ((dgpa + bytes_per_rep) > sgpa) && (dgpa < (sgpa + bytes)) ) - { - put_two_gfns(&tg); return X86EMUL_UNHANDLEABLE; - } /* Adjust destination address for reverse copy. */ if ( df ) @@ -759,10 +738,7 @@ static int hvmemul_rep_movs( /* Allocate temporary buffer. Fall back to slow emulation if this fails. */ buf = xmalloc_bytes(bytes); if ( buf == NULL ) - { - put_two_gfns(&tg); return X86EMUL_UNHANDLEABLE; - } /* * We do a modicum of checking here, just for paranoia's sake and to @@ -773,7 +749,6 @@ static int hvmemul_rep_movs( rc = hvm_copy_to_guest_phys(dgpa, buf, bytes); xfree(buf); - put_two_gfns(&tg); if ( rc == HVMCOPY_gfn_paged_out ) return X86EMUL_RETRY; diff -r 107285938c50 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/hvm/hvm.c Fri Apr 27 10:23:28 2012 +0100 @@ -395,48 +395,41 @@ int prepare_ring_for_helper( { struct page_info *page; p2m_type_t p2mt; - unsigned long mfn; void *va; - mfn = mfn_x(get_gfn_unshare(d, gmfn, &p2mt)); - if ( !p2m_is_ram(p2mt) ) - { - put_gfn(d, gmfn); - return -EINVAL; - } + page = get_page_from_gfn(d, gmfn, &p2mt, P2M_UNSHARE); if ( p2m_is_paging(p2mt) ) { - put_gfn(d, gmfn); + if ( page ) + put_page(page); p2m_mem_paging_populate(d, gmfn); return -ENOENT; } if ( p2m_is_shared(p2mt) ) { - put_gfn(d, gmfn); + if ( page ) + put_page(page); return -ENOENT; } - ASSERT(mfn_valid(mfn)); - - page = mfn_to_page(mfn); - if ( !get_page_and_type(page, d, PGT_writable_page) ) + if ( !page ) + return -EINVAL; + + if ( !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gmfn); + put_page(page); return -EINVAL; } - va = map_domain_page_global(mfn); + va = __map_domain_page_global(page); if ( va == NULL ) { put_page_and_type(page); - put_gfn(d, gmfn); return -ENOMEM; } *_va = va; *_page = page; - put_gfn(d, gmfn); - return 0; } @@ -1607,8 +1600,8 @@ int hvm_mov_from_cr(unsigned int cr, uns int hvm_set_cr0(unsigned long value) { struct vcpu *v = current; - p2m_type_t p2mt; - unsigned long gfn, mfn, old_value = v->arch.hvm_vcpu.guest_cr[0]; + unsigned long gfn, old_value = v->arch.hvm_vcpu.guest_cr[0]; + struct page_info *page; HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR0 value = %lx", value); @@ -1647,23 +1640,20 @@ int hvm_set_cr0(unsigned long value) { /* The guest CR3 must be pointing to the guest physical. */ gfn = v->arch.hvm_vcpu.guest_cr[3]>>PAGE_SHIFT; - mfn = mfn_x(get_gfn(v->domain, gfn, &p2mt)); - if ( !p2m_is_ram(p2mt) || !mfn_valid(mfn) || - !get_page(mfn_to_page(mfn), v->domain)) + page = get_page_from_gfn(v->domain, gfn, NULL, P2M_ALLOC); + if ( !page ) { - put_gfn(v->domain, gfn); - gdprintk(XENLOG_ERR, "Invalid CR3 value = %lx (mfn=%lx)\n", - v->arch.hvm_vcpu.guest_cr[3], mfn); + gdprintk(XENLOG_ERR, "Invalid CR3 value = %lx\n", + v->arch.hvm_vcpu.guest_cr[3]); domain_crash(v->domain); return X86EMUL_UNHANDLEABLE; } /* Now arch.guest_table points to machine physical. 
*/ - v->arch.guest_table = pagetable_from_pfn(mfn); + v->arch.guest_table = pagetable_from_page(page); HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx, mfn = %lx", - v->arch.hvm_vcpu.guest_cr[3], mfn); - put_gfn(v->domain, gfn); + v->arch.hvm_vcpu.guest_cr[3], page_to_mfn(page)); } } else if ( !(value & X86_CR0_PG) && (old_value & X86_CR0_PG) ) @@ -1738,26 +1728,21 @@ int hvm_set_cr0(unsigned long value) int hvm_set_cr3(unsigned long value) { - unsigned long mfn; - p2m_type_t p2mt; struct vcpu *v = current; + struct page_info *page; if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) && (value != v->arch.hvm_vcpu.guest_cr[3]) ) { /* Shadow-mode CR3 change. Check PDBR and update refcounts. */ HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value); - mfn = mfn_x(get_gfn(v->domain, value >> PAGE_SHIFT, &p2mt)); - if ( !p2m_is_ram(p2mt) || !mfn_valid(mfn) || - !get_page(mfn_to_page(mfn), v->domain) ) - { - put_gfn(v->domain, value >> PAGE_SHIFT); - goto bad_cr3; - } + page = get_page_from_gfn(v->domain, value >> PAGE_SHIFT, + NULL, P2M_ALLOC); + if ( !page ) + goto bad_cr3; put_page(pagetable_get_page(v->arch.guest_table)); - v->arch.guest_table = pagetable_from_pfn(mfn); - put_gfn(v->domain, value >> PAGE_SHIFT); + v->arch.guest_table = pagetable_from_page(page); HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx", value); } @@ -1914,46 +1899,29 @@ int hvm_virtual_to_linear_addr( static void *__hvm_map_guest_frame(unsigned long gfn, bool_t writable) { void *map; - unsigned long mfn; p2m_type_t p2mt; - struct page_info *pg; + struct page_info *page; struct domain *d = current->domain; - int rc; - - mfn = mfn_x(writable - ? get_gfn_unshare(d, gfn, &p2mt) - : get_gfn(d, gfn, &p2mt)); - if ( (p2m_is_shared(p2mt) && writable) || !p2m_is_ram(p2mt) ) + + page = get_page_from_gfn(d, gfn, &p2mt, + writable ? 
P2M_UNSHARE : P2M_ALLOC); + if ( (p2m_is_shared(p2mt) && writable) || !page ) { - put_gfn(d, gfn); + if ( page ) + put_page(page); return NULL; } if ( p2m_is_paging(p2mt) ) { - put_gfn(d, gfn); + put_page(page); p2m_mem_paging_populate(d, gfn); return NULL; } - ASSERT(mfn_valid(mfn)); - if ( writable ) - paging_mark_dirty(d, mfn); - - /* Get a ref on the page, considering that it could be shared */ - pg = mfn_to_page(mfn); - rc = get_page(pg, d); - if ( !rc && !writable ) - /* Page could be shared */ - rc = get_page(pg, dom_cow); - if ( !rc ) - { - put_gfn(d, gfn); - return NULL; - } - - map = map_domain_page(mfn); - put_gfn(d, gfn); + paging_mark_dirty(d, page_to_mfn(page)); + + map = __map_domain_page(page); return map; } @@ -2358,7 +2326,8 @@ static enum hvm_copy_result __hvm_copy( void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec) { struct vcpu *curr = current; - unsigned long gfn, mfn; + unsigned long gfn; + struct page_info *page; p2m_type_t p2mt; char *p; int count, todo = size; @@ -2402,32 +2371,33 @@ static enum hvm_copy_result __hvm_copy( gfn = addr >> PAGE_SHIFT; } - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); if ( p2m_is_paging(p2mt) ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); p2m_mem_paging_populate(curr->domain, gfn); return HVMCOPY_gfn_paged_out; } if ( p2m_is_shared(p2mt) ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); return HVMCOPY_gfn_shared; } if ( p2m_is_grant(p2mt) ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); return HVMCOPY_unhandleable; } - if ( !p2m_is_ram(p2mt) ) + if ( !page ) { - put_gfn(curr->domain, gfn); return HVMCOPY_bad_gfn_to_mfn; } - ASSERT(mfn_valid(mfn)); - - p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK); + + p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK); if ( flags & HVMCOPY_to_guest ) { @@ -2437,12 +2407,12 @@ static enum hvm_copy_result __hvm_copy( if ( xchg(&lastpage, gfn) != gfn ) gdprintk(XENLOG_DEBUG, "guest attempted write to read-only" " memory page. 
gfn=%#lx, mfn=%#lx\n", - gfn, mfn); + gfn, page_to_mfn(page)); } else { memcpy(p, buf, count); - paging_mark_dirty(curr->domain, mfn); + paging_mark_dirty(curr->domain, page_to_mfn(page)); } } else @@ -2455,7 +2425,7 @@ static enum hvm_copy_result __hvm_copy( addr += count; buf += count; todo -= count; - put_gfn(curr->domain, gfn); + put_page(page); } return HVMCOPY_okay; @@ -2464,7 +2434,8 @@ static enum hvm_copy_result __hvm_copy( static enum hvm_copy_result __hvm_clear(paddr_t addr, int size) { struct vcpu *curr = current; - unsigned long gfn, mfn; + unsigned long gfn; + struct page_info *page; p2m_type_t p2mt; char *p; int count, todo = size; @@ -2500,32 +2471,35 @@ static enum hvm_copy_result __hvm_clear( return HVMCOPY_bad_gva_to_gfn; } - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); if ( p2m_is_paging(p2mt) ) { + if ( page ) + put_page(page); p2m_mem_paging_populate(curr->domain, gfn); - put_gfn(curr->domain, gfn); return HVMCOPY_gfn_paged_out; } if ( p2m_is_shared(p2mt) ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); return HVMCOPY_gfn_shared; } if ( p2m_is_grant(p2mt) ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); return HVMCOPY_unhandleable; } - if ( !p2m_is_ram(p2mt) ) + if ( !page ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); return HVMCOPY_bad_gfn_to_mfn; } - ASSERT(mfn_valid(mfn)); - - p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK); + + p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK); if ( p2mt == p2m_ram_ro ) { @@ -2533,19 +2507,19 @@ static enum hvm_copy_result __hvm_clear( if ( xchg(&lastpage, gfn) != gfn ) gdprintk(XENLOG_DEBUG, "guest attempted write to read-only" " memory page. gfn=%#lx, mfn=%#lx\n", - gfn, mfn); + gfn, page_to_mfn(page)); } else { memset(p, 0x00, count); - paging_mark_dirty(curr->domain, mfn); + paging_mark_dirty(curr->domain, page_to_mfn(page)); } unmap_domain_page(p); addr += count; todo -= count; - put_gfn(curr->domain, gfn); + put_page(page); } return HVMCOPY_okay; @@ -4000,35 +3974,16 @@ long do_hvm_op(unsigned long op, XEN_GUE for ( pfn = a.first_pfn; pfn < a.first_pfn + a.nr; pfn++ ) { - p2m_type_t t; - mfn_t mfn = get_gfn_unshare(d, pfn, &t); - if ( p2m_is_paging(t) ) + struct page_info *page; + page = get_page_from_gfn(d, pfn, NULL, P2M_UNSHARE); + if ( page ) { - put_gfn(d, pfn); - p2m_mem_paging_populate(d, pfn); - rc = -EINVAL; - goto param_fail3; - } - if( p2m_is_shared(t) ) - { - /* If it insists on not unsharing itself, crash the domain - * rather than crashing the host down in mark dirty */ - gdprintk(XENLOG_WARNING, - "shared pfn 0x%lx modified?\n", pfn); - domain_crash(d); - put_gfn(d, pfn); - rc = -EINVAL; - goto param_fail3; - } - - if ( mfn_x(mfn) != INVALID_MFN ) - { - paging_mark_dirty(d, mfn_x(mfn)); + paging_mark_dirty(d, page_to_mfn(page)); /* These are most probably not page tables any more */ /* don't take a long time and don't die either */ - sh_remove_shadows(d->vcpu[0], mfn, 1, 0); + sh_remove_shadows(d->vcpu[0], _mfn(page_to_mfn(page)), 1, 0); + put_page(page); } - put_gfn(d, pfn); } param_fail3: diff -r 107285938c50 xen/arch/x86/hvm/stdvga.c --- a/xen/arch/x86/hvm/stdvga.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/hvm/stdvga.c Fri Apr 27 10:23:28 2012 +0100 @@ -482,7 +482,8 @@ static int mmio_move(struct hvm_hw_stdvg if ( hvm_copy_to_guest_phys(data, &tmp, p->size) != HVMCOPY_okay ) { - (void)get_gfn(d, data >> PAGE_SHIFT, &p2mt); + struct page_info *dp = 
get_page_from_gfn( + d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC); /* * The only case we handle is vga_mem <-> vga_mem. * Anything else disables caching and leaves it to qemu-dm. @@ -490,11 +491,12 @@ static int mmio_move(struct hvm_hw_stdvg if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) || ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) ) { - put_gfn(d, data >> PAGE_SHIFT); + if ( dp ) + put_page(dp); return 0; } + ASSERT(!dp); stdvga_mem_write(data, tmp, p->size); - put_gfn(d, data >> PAGE_SHIFT); } data += sign * p->size; addr += sign * p->size; @@ -508,15 +510,16 @@ static int mmio_move(struct hvm_hw_stdvg if ( hvm_copy_from_guest_phys(&tmp, data, p->size) != HVMCOPY_okay ) { - (void)get_gfn(d, data >> PAGE_SHIFT, &p2mt); + struct page_info *dp = get_page_from_gfn( + d, data >> PAGE_SHIFT, &p2mt, P2M_ALLOC); if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) || ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) ) { - put_gfn(d, data >> PAGE_SHIFT); + if ( dp ) + put_page(dp); return 0; } tmp = stdvga_mem_read(data, p->size); - put_gfn(d, data >> PAGE_SHIFT); } stdvga_mem_write(addr, tmp, p->size); data += sign * p->size; diff -r 107285938c50 xen/arch/x86/hvm/viridian.c --- a/xen/arch/x86/hvm/viridian.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/hvm/viridian.c Fri Apr 27 10:23:28 2012 +0100 @@ -134,18 +134,19 @@ void dump_apic_assist(struct vcpu *v) static void enable_hypercall_page(struct domain *d) { unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn; - unsigned long mfn = get_gfn_untyped(d, gmfn); + struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); uint8_t *p; - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + if ( !page || !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gmfn); - gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, mfn); + if ( page ) + put_page(page); + gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, + page_to_mfn(page)); return; } - p = map_domain_page(mfn); + p = __map_domain_page(page); /* * We set the bit 31 in %eax (reserved field in the Viridian hypercall @@ -162,15 +163,14 @@ static void enable_hypercall_page(struct unmap_domain_page(p); - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, gmfn); + put_page_and_type(page); } void initialize_apic_assist(struct vcpu *v) { struct domain *d = v->domain; unsigned long gmfn = v->arch.hvm_vcpu.viridian.apic_assist.fields.pfn; - unsigned long mfn = get_gfn_untyped(d, gmfn); + struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); uint8_t *p; /* @@ -183,22 +183,22 @@ void initialize_apic_assist(struct vcpu * details of how Windows uses the page. 
*/ - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + if ( !page || !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gmfn); - gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, mfn); + if ( page ) + put_page(page); + gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, + page_to_mfn(page)); return; } - p = map_domain_page(mfn); + p = __map_domain_page(page); *(u32 *)p = 0; unmap_domain_page(p); - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, gmfn); + put_page_and_type(page); } int wrmsr_viridian_regs(uint32_t idx, uint64_t val) diff -r 107285938c50 xen/arch/x86/hvm/vmx/vmx.c --- a/xen/arch/x86/hvm/vmx/vmx.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/hvm/vmx/vmx.c Fri Apr 27 10:23:28 2012 +0100 @@ -480,17 +480,16 @@ static void vmx_vmcs_save(struct vcpu *v static int vmx_restore_cr0_cr3( struct vcpu *v, unsigned long cr0, unsigned long cr3) { - unsigned long mfn = 0; - p2m_type_t p2mt; + struct page_info *page = NULL; if ( paging_mode_shadow(v->domain) ) { if ( cr0 & X86_CR0_PG ) { - mfn = mfn_x(get_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); - if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), v->domain) ) + page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, + NULL, P2M_ALLOC); + if ( !page ) { - put_gfn(v->domain, cr3 >> PAGE_SHIFT); gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%lx\n", cr3); return -EINVAL; } @@ -499,9 +498,8 @@ static int vmx_restore_cr0_cr3( if ( hvm_paging_enabled(v) ) put_page(pagetable_get_page(v->arch.guest_table)); - v->arch.guest_table = pagetable_from_pfn(mfn); - if ( cr0 & X86_CR0_PG ) - put_gfn(v->domain, cr3 >> PAGE_SHIFT); + v->arch.guest_table = + page ? pagetable_from_page(page) : pagetable_null(); } v->arch.hvm_vcpu.guest_cr[0] = cr0 | X86_CR0_ET; @@ -1026,8 +1024,9 @@ static void vmx_set_interrupt_shadow(str static void vmx_load_pdptrs(struct vcpu *v) { - unsigned long cr3 = v->arch.hvm_vcpu.guest_cr[3], mfn; + unsigned long cr3 = v->arch.hvm_vcpu.guest_cr[3]; uint64_t *guest_pdptrs; + struct page_info *page; p2m_type_t p2mt; char *p; @@ -1038,24 +1037,19 @@ static void vmx_load_pdptrs(struct vcpu if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) ) goto crash; - mfn = mfn_x(get_gfn_unshare(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); - if ( !p2m_is_ram(p2mt) || !mfn_valid(mfn) || - /* If we didn't succeed in unsharing, get_page will fail - * (page still belongs to dom_cow) */ - !get_page(mfn_to_page(mfn), v->domain) ) + page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt, P2M_UNSHARE); + if ( !page ) { /* Ideally you don't want to crash but rather go into a wait * queue, but this is the wrong place. 
We're holding at least * the paging lock */ gdprintk(XENLOG_ERR, - "Bad cr3 on load pdptrs gfn %lx mfn %lx type %d\n", - cr3 >> PAGE_SHIFT, mfn, (int) p2mt); - put_gfn(v->domain, cr3 >> PAGE_SHIFT); + "Bad cr3 on load pdptrs gfn %lx type %d\n", + cr3 >> PAGE_SHIFT, (int) p2mt); goto crash; } - put_gfn(v->domain, cr3 >> PAGE_SHIFT); - - p = map_domain_page(mfn); + + p = __map_domain_page(page); guest_pdptrs = (uint64_t *)(p + (cr3 & ~PAGE_MASK)); @@ -1081,7 +1075,7 @@ static void vmx_load_pdptrs(struct vcpu vmx_vmcs_exit(v); unmap_domain_page(p); - put_page(mfn_to_page(mfn)); + put_page(page); return; crash: diff -r 107285938c50 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/mm.c Fri Apr 27 10:23:28 2012 +0100 @@ -651,7 +651,8 @@ int map_ldt_shadow_page(unsigned int off { struct vcpu *v = current; struct domain *d = v->domain; - unsigned long gmfn, mfn; + unsigned long gmfn; + struct page_info *page; l1_pgentry_t l1e, nl1e; unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT); int okay; @@ -663,28 +664,24 @@ int map_ldt_shadow_page(unsigned int off return 0; gmfn = l1e_get_pfn(l1e); - mfn = get_gfn_untyped(d, gmfn); - if ( unlikely(!mfn_valid(mfn)) ) + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + return 0; + + okay = get_page_type(page, PGT_seg_desc_page); + if ( unlikely(!okay) ) { - put_gfn(d, gmfn); + put_page(page); return 0; } - okay = get_page_and_type(mfn_to_page(mfn), d, PGT_seg_desc_page); - if ( unlikely(!okay) ) - { - put_gfn(d, gmfn); - return 0; - } - - nl1e = l1e_from_pfn(mfn, l1e_get_flags(l1e) | _PAGE_RW); + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW); spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); l1e_write(&v->arch.perdomain_ptes[off + 16], nl1e); v->arch.pv_vcpu.shadow_ldt_mapcnt++; spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); - put_gfn(d, gmfn); return 1; } @@ -1819,7 +1816,6 @@ static int mod_l1_entry(l1_pgentry_t *pl { l1_pgentry_t ol1e; struct domain *pt_dom = pt_vcpu->domain; - p2m_type_t p2mt; int rc = 0; if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) ) @@ -1835,22 +1831,21 @@ static int mod_l1_entry(l1_pgentry_t *pl if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) { /* Translate foreign guest addresses. 
*/ - unsigned long mfn, gfn; - gfn = l1e_get_pfn(nl1e); - mfn = mfn_x(get_gfn(pg_dom, gfn, &p2mt)); - if ( !p2m_is_ram(p2mt) || unlikely(mfn == INVALID_MFN) ) + struct page_info *page = NULL; + if ( paging_mode_translate(pg_dom) ) { - put_gfn(pg_dom, gfn); - return -EINVAL; + page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC); + if ( !page ) + return -EINVAL; + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e)); } - ASSERT((mfn & ~(PADDR_MASK >> PAGE_SHIFT)) == 0); - nl1e = l1e_from_pfn(mfn, l1e_get_flags(nl1e)); if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) ) { MEM_LOG("Bad L1 flags %x", l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); - put_gfn(pg_dom, gfn); + if ( page ) + put_page(page); return -EINVAL; } @@ -1860,15 +1855,21 @@ static int mod_l1_entry(l1_pgentry_t *pl adjust_guest_l1e(nl1e, pt_dom); if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) + { + if ( page ) + put_page(page); return 0; - put_gfn(pg_dom, gfn); + } + if ( page ) + put_page(page); return -EBUSY; } switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) { default: - put_gfn(pg_dom, gfn); + if ( page ) + put_page(page); return rc; case 0: break; @@ -1876,7 +1877,9 @@ static int mod_l1_entry(l1_pgentry_t *pl l1e_remove_flags(nl1e, _PAGE_RW); break; } - + if ( page ) + put_page(page); + adjust_guest_l1e(nl1e, pt_dom); if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad)) ) @@ -1884,7 +1887,6 @@ static int mod_l1_entry(l1_pgentry_t *pl ol1e = nl1e; rc = -EBUSY; } - put_gfn(pg_dom, gfn); } else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad)) ) @@ -3042,7 +3044,6 @@ int do_mmuext_op( type = PGT_l4_page_table; pin_page: { - unsigned long mfn; struct page_info *page; /* Ignore pinning of invalid paging levels. */ @@ -3052,25 +3053,28 @@ int do_mmuext_op( if ( paging_mode_refcounts(pg_owner) ) break; - mfn = get_gfn_untyped(pg_owner, op.arg1.mfn); - rc = get_page_and_type_from_pagenr(mfn, type, pg_owner, 0, 1); + page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + { + rc = -EINVAL; + break; + } + + rc = get_page_type_preemptible(page, type); okay = !rc; if ( unlikely(!okay) ) { if ( rc == -EINTR ) rc = -EAGAIN; else if ( rc != -EAGAIN ) - MEM_LOG("Error while pinning mfn %lx", mfn); - put_gfn(pg_owner, op.arg1.mfn); + MEM_LOG("Error while pinning mfn %lx", page_to_mfn(page)); + put_page(page); break; } - page = mfn_to_page(mfn); - if ( (rc = xsm_memory_pin_page(d, page)) != 0 ) { put_page_and_type(page); - put_gfn(pg_owner, op.arg1.mfn); okay = 0; break; } @@ -3078,16 +3082,15 @@ int do_mmuext_op( if ( unlikely(test_and_set_bit(_PGT_pinned, &page->u.inuse.type_info)) ) { - MEM_LOG("Mfn %lx already pinned", mfn); + MEM_LOG("Mfn %lx already pinned", page_to_mfn(page)); put_page_and_type(page); - put_gfn(pg_owner, op.arg1.mfn); okay = 0; break; } /* A page is dirtied when its pin status is set. */ - paging_mark_dirty(pg_owner, mfn); - + paging_mark_dirty(pg_owner, page_to_mfn(page)); + /* We can race domain destruction (domain_relinquish_resources). 
*/ if ( unlikely(pg_owner != d) ) { @@ -3099,35 +3102,29 @@ int do_mmuext_op( spin_unlock(&pg_owner->page_alloc_lock); if ( drop_ref ) put_page_and_type(page); - put_gfn(pg_owner, op.arg1.mfn); } break; } case MMUEXT_UNPIN_TABLE: { - unsigned long mfn; struct page_info *page; if ( paging_mode_refcounts(pg_owner) ) break; - mfn = get_gfn_untyped(pg_owner, op.arg1.mfn); - if ( unlikely(!(okay = get_page_from_pagenr(mfn, pg_owner))) ) + page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) { - put_gfn(pg_owner, op.arg1.mfn); - MEM_LOG("Mfn %lx bad domain", mfn); + MEM_LOG("Mfn %lx bad domain", op.arg1.mfn); break; } - page = mfn_to_page(mfn); - if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) { okay = 0; put_page(page); - put_gfn(pg_owner, op.arg1.mfn); - MEM_LOG("Mfn %lx not pinned", mfn); + MEM_LOG("Mfn %lx not pinned", op.arg1.mfn); break; } @@ -3135,40 +3132,43 @@ int do_mmuext_op( put_page(page); /* A page is dirtied when its pin status is cleared. */ - paging_mark_dirty(pg_owner, mfn); - - put_gfn(pg_owner, op.arg1.mfn); + paging_mark_dirty(pg_owner, page_to_mfn(page)); + break; } case MMUEXT_NEW_BASEPTR: - okay = new_guest_cr3(get_gfn_untyped(d, op.arg1.mfn)); - put_gfn(d, op.arg1.mfn); + okay = (!paging_mode_translate(d) + && new_guest_cr3(op.arg1.mfn)); break; + #ifdef __x86_64__ case MMUEXT_NEW_USER_BASEPTR: { - unsigned long old_mfn, mfn; - - mfn = get_gfn_untyped(d, op.arg1.mfn); - if ( mfn != 0 ) + unsigned long old_mfn; + + if ( paging_mode_translate(current->domain) ) + { + okay = 0; + break; + } + + if ( op.arg1.mfn != 0 ) { if ( paging_mode_refcounts(d) ) - okay = get_page_from_pagenr(mfn, d); + okay = get_page_from_pagenr(op.arg1.mfn, d); else okay = !get_page_and_type_from_pagenr( - mfn, PGT_root_page_table, d, 0, 0); + op.arg1.mfn, PGT_root_page_table, d, 0, 0); if ( unlikely(!okay) ) { - put_gfn(d, op.arg1.mfn); - MEM_LOG("Error while installing new mfn %lx", mfn); + MEM_LOG("Error while installing new mfn %lx", op.arg1.mfn); break; } } old_mfn = pagetable_get_pfn(curr->arch.guest_table_user); - curr->arch.guest_table_user = pagetable_from_pfn(mfn); - put_gfn(d, op.arg1.mfn); + curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn); if ( old_mfn != 0 ) { @@ -3283,28 +3283,26 @@ int do_mmuext_op( } case MMUEXT_CLEAR_PAGE: { - unsigned long mfn; + struct page_info *page; unsigned char *ptr; - mfn = get_gfn_untyped(d, op.arg1.mfn); - okay = !get_page_and_type_from_pagenr( - mfn, PGT_writable_page, d, 0, 0); - if ( unlikely(!okay) ) + page = get_page_from_gfn(d, op.arg1.mfn, NULL, P2M_ALLOC); + if ( !page || !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, op.arg1.mfn); - MEM_LOG("Error while clearing mfn %lx", mfn); + if ( page ) + put_page(page); + MEM_LOG("Error while clearing mfn %lx", op.arg1.mfn); break; } /* A page is dirtied when it's being cleared. 
*/ - paging_mark_dirty(d, mfn); - - ptr = fixmap_domain_page(mfn); + paging_mark_dirty(d, page_to_mfn(page)); + + ptr = fixmap_domain_page(page_to_mfn(page)); clear_page(ptr); fixunmap_domain_page(ptr); - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, op.arg1.mfn); + put_page_and_type(page); break; } @@ -3312,42 +3310,38 @@ int do_mmuext_op( { const unsigned char *src; unsigned char *dst; - unsigned long src_mfn, mfn; - - src_mfn = get_gfn_untyped(d, op.arg2.src_mfn); - okay = get_page_from_pagenr(src_mfn, d); + struct page_info *src_page, *dst_page; + + src_page = get_page_from_gfn(d, op.arg2.src_mfn, NULL, P2M_ALLOC); + if ( unlikely(!src_page) ) + { + okay = 0; + MEM_LOG("Error while copying from mfn %lx", op.arg2.src_mfn); + break; + } + + dst_page = get_page_from_gfn(d, op.arg1.mfn, NULL, P2M_ALLOC); + okay = (dst_page && get_page_type(dst_page, PGT_writable_page)); if ( unlikely(!okay) ) { - put_gfn(d, op.arg2.src_mfn); - MEM_LOG("Error while copying from mfn %lx", src_mfn); + put_page(src_page); + if ( dst_page ) + put_page(dst_page); + MEM_LOG("Error while copying to mfn %lx", op.arg1.mfn); break; } - mfn = get_gfn_untyped(d, op.arg1.mfn); - okay = !get_page_and_type_from_pagenr( - mfn, PGT_writable_page, d, 0, 0); - if ( unlikely(!okay) ) - { - put_gfn(d, op.arg1.mfn); - put_page(mfn_to_page(src_mfn)); - put_gfn(d, op.arg2.src_mfn); - MEM_LOG("Error while copying to mfn %lx", mfn); - break; - } - /* A page is dirtied when it's being copied to. */ - paging_mark_dirty(d, mfn); - - src = map_domain_page(src_mfn); - dst = fixmap_domain_page(mfn); + paging_mark_dirty(d, page_to_mfn(dst_page)); + + src = __map_domain_page(src_page); + dst = fixmap_domain_page(page_to_mfn(dst_page)); copy_page(dst, src); fixunmap_domain_page(dst); unmap_domain_page(src); - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, op.arg1.mfn); - put_page(mfn_to_page(src_mfn)); - put_gfn(d, op.arg2.src_mfn); + put_page_and_type(dst_page); + put_page(src_page); break; } @@ -3538,29 +3532,26 @@ int do_mmu_update( req.ptr -= cmd; gmfn = req.ptr >> PAGE_SHIFT; - mfn = mfn_x(get_gfn(pt_owner, gmfn, &p2mt)); - if ( !p2m_is_valid(p2mt) ) - mfn = INVALID_MFN; + page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC); if ( p2m_is_paged(p2mt) ) { - put_gfn(pt_owner, gmfn); + ASSERT(!page); p2m_mem_paging_populate(pg_owner, gmfn); rc = -ENOENT; break; } - if ( unlikely(!get_page_from_pagenr(mfn, pt_owner)) ) + if ( unlikely(!page) ) { MEM_LOG("Could not get page for normal update"); - put_gfn(pt_owner, gmfn); break; } + mfn = page_to_mfn(page); va = map_domain_page_with_cache(mfn, &mapcache); va = (void *)((unsigned long)va + (unsigned long)(req.ptr & ~PAGE_MASK)); - page = mfn_to_page(mfn); if ( page_lock(page) ) { @@ -3569,22 +3560,23 @@ int do_mmu_update( case PGT_l1_page_table: { l1_pgentry_t l1e = l1e_from_intpte(req.val); - p2m_type_t l1e_p2mt; - unsigned long l1egfn = l1e_get_pfn(l1e), l1emfn; - - l1emfn = mfn_x(get_gfn(pg_owner, l1egfn, &l1e_p2mt)); + p2m_type_t l1e_p2mt = p2m_ram_rw; + struct page_info *target = NULL; + + if ( paging_mode_translate(pg_owner) ) + target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e), + &l1e_p2mt, P2M_ALLOC); if ( p2m_is_paged(l1e_p2mt) ) { - put_gfn(pg_owner, l1egfn); + if ( target ) + put_page(target); p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); rc = -ENOENT; break; } - else if ( p2m_ram_paging_in == l1e_p2mt && - !mfn_valid(l1emfn) ) + else if ( p2m_ram_paging_in == l1e_p2mt && !target ) { - put_gfn(pg_owner, l1egfn); rc = -ENOENT; break; } @@ -3601,7 +3593,8 @@ int 
do_mmu_update( rc = mem_sharing_unshare_page(pg_owner, gfn, 0); if ( rc ) { - put_gfn(pg_owner, l1egfn); + if ( target ) + put_page(target); /* Notify helper, don't care about errors, will not * sleep on wq, since we're a foreign domain. */ (void)mem_sharing_notify_enomem(pg_owner, gfn, 0); @@ -3614,112 +3607,22 @@ int do_mmu_update( rc = mod_l1_entry(va, l1e, mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v, pg_owner); - put_gfn(pg_owner, l1egfn); + if ( target ) + put_page(target); } break; case PGT_l2_page_table: - { - l2_pgentry_t l2e = l2e_from_intpte(req.val); - p2m_type_t l2e_p2mt; - unsigned long l2egfn = l2e_get_pfn(l2e), l2emfn; - - l2emfn = mfn_x(get_gfn(pg_owner, l2egfn, &l2e_p2mt)); - - if ( p2m_is_paged(l2e_p2mt) ) - { - put_gfn(pg_owner, l2egfn); - p2m_mem_paging_populate(pg_owner, l2egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in == l2e_p2mt && - !mfn_valid(l2emfn) ) - { - put_gfn(pg_owner, l2egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_shared == l2e_p2mt ) - { - put_gfn(pg_owner, l2egfn); - MEM_LOG("Unexpected attempt to map shared page.\n"); - break; - } - - - rc = mod_l2_entry(va, l2e, mfn, + rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - put_gfn(pg_owner, l2egfn); - } - break; + break; case PGT_l3_page_table: - { - l3_pgentry_t l3e = l3e_from_intpte(req.val); - p2m_type_t l3e_p2mt; - unsigned long l3egfn = l3e_get_pfn(l3e), l3emfn; - - l3emfn = mfn_x(get_gfn(pg_owner, l3egfn, &l3e_p2mt)); - - if ( p2m_is_paged(l3e_p2mt) ) - { - put_gfn(pg_owner, l3egfn); - p2m_mem_paging_populate(pg_owner, l3egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in == l3e_p2mt && - !mfn_valid(l3emfn) ) - { - put_gfn(pg_owner, l3egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_shared == l3e_p2mt ) - { - put_gfn(pg_owner, l3egfn); - MEM_LOG("Unexpected attempt to map shared page.\n"); - break; - } - - rc = mod_l3_entry(va, l3e, mfn, + rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, 1, v); - put_gfn(pg_owner, l3egfn); - } - break; + break; #if CONFIG_PAGING_LEVELS >= 4 case PGT_l4_page_table: - { - l4_pgentry_t l4e = l4e_from_intpte(req.val); - p2m_type_t l4e_p2mt; - unsigned long l4egfn = l4e_get_pfn(l4e), l4emfn; - - l4emfn = mfn_x(get_gfn(pg_owner, l4egfn, &l4e_p2mt)); - - if ( p2m_is_paged(l4e_p2mt) ) - { - put_gfn(pg_owner, l4egfn); - p2m_mem_paging_populate(pg_owner, l4egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in == l4e_p2mt && - !mfn_valid(l4emfn) ) - { - put_gfn(pg_owner, l4egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_shared == l4e_p2mt ) - { - put_gfn(pg_owner, l4egfn); - MEM_LOG("Unexpected attempt to map shared page.\n"); - break; - } - - rc = mod_l4_entry(va, l4e, mfn, + rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, 1, v); - put_gfn(pg_owner, l4egfn); - } break; #endif case PGT_writable_page: @@ -3742,7 +3645,6 @@ int do_mmu_update( unmap_domain_page_with_cache(va, &mapcache); put_page(page); - put_gfn(pt_owner, gmfn); } break; diff -r 107285938c50 xen/arch/x86/mm/guest_walk.c --- a/xen/arch/x86/mm/guest_walk.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/mm/guest_walk.c Fri Apr 27 10:23:28 2012 +0100 @@ -94,39 +94,37 @@ static inline void *map_domain_gfn(struc p2m_type_t *p2mt, uint32_t *rc) { - p2m_access_t p2ma; + struct page_info *page; void *map; /* Translate the gfn, unsharing if shared */ - *mfn = get_gfn_type_access(p2m, gfn_x(gfn), p2mt, &p2ma, - P2M_ALLOC | P2M_UNSHARE, 
NULL); + page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), p2mt, NULL, + P2M_ALLOC | P2M_UNSHARE); if ( p2m_is_paging(*p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - __put_gfn(p2m, gfn_x(gfn)); + if ( page ) + put_page(page); p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); *rc = _PAGE_PAGED; return NULL; } if ( p2m_is_shared(*p2mt) ) { - __put_gfn(p2m, gfn_x(gfn)); + if ( page ) + put_page(page); *rc = _PAGE_SHARED; return NULL; } - if ( !p2m_is_ram(*p2mt) ) + if ( !page ) { - __put_gfn(p2m, gfn_x(gfn)); *rc |= _PAGE_PRESENT; return NULL; } + *mfn = _mfn(page_to_mfn(page)); ASSERT(mfn_valid(mfn_x(*mfn))); - - /* Get an extra ref to the page to ensure liveness of the map. - * Then we can safely put gfn */ - page_get_owner_and_reference(mfn_to_page(mfn_x(*mfn))); + map = map_domain_page(mfn_x(*mfn)); - __put_gfn(p2m, gfn_x(gfn)); return map; } diff -r 107285938c50 xen/arch/x86/mm/hap/guest_walk.c --- a/xen/arch/x86/mm/hap/guest_walk.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/mm/hap/guest_walk.c Fri Apr 27 10:23:28 2012 +0100 @@ -54,34 +54,36 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA mfn_t top_mfn; void *top_map; p2m_type_t p2mt; - p2m_access_t p2ma; walk_t gw; unsigned long top_gfn; + struct page_info *top_page; /* Get the top-level table's MFN */ top_gfn = cr3 >> PAGE_SHIFT; - top_mfn = get_gfn_type_access(p2m, top_gfn, &p2mt, &p2ma, - P2M_ALLOC | P2M_UNSHARE, NULL); + top_page = get_page_from_gfn_p2m(p2m->domain, p2m, top_gfn, + &p2mt, NULL, P2M_ALLOC | P2M_UNSHARE); if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); pfec[0] = PFEC_page_paged; - __put_gfn(p2m, top_gfn); + if ( top_page ) + put_page(top_page); p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); return INVALID_GFN; } if ( p2m_is_shared(p2mt) ) { pfec[0] = PFEC_page_shared; - __put_gfn(p2m, top_gfn); + if ( top_page ) + put_page(top_page); return INVALID_GFN; } - if ( !p2m_is_ram(p2mt) ) + if ( !top_page ) { pfec[0] &= ~PFEC_page_present; - __put_gfn(p2m, top_gfn); return INVALID_GFN; } + top_mfn = _mfn(page_to_mfn(top_page)); /* Map the top-level table and call the tree-walker */ ASSERT(mfn_valid(mfn_x(top_mfn))); @@ -91,31 +93,30 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA #endif missing = guest_walk_tables(v, p2m, ga, &gw, pfec[0], top_mfn, top_map); unmap_domain_page(top_map); - __put_gfn(p2m, top_gfn); + put_page(top_page); /* Interpret the answer */ if ( missing == 0 ) { gfn_t gfn = guest_l1e_get_gfn(gw.l1e); - (void)get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, - P2M_ALLOC | P2M_UNSHARE, NULL); + struct page_info *page; + page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), &p2mt, + NULL, P2M_ALLOC | P2M_UNSHARE); + if ( page ) + put_page(page); if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); pfec[0] = PFEC_page_paged; - __put_gfn(p2m, gfn_x(gfn)); p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); return INVALID_GFN; } if ( p2m_is_shared(p2mt) ) { pfec[0] = PFEC_page_shared; - __put_gfn(p2m, gfn_x(gfn)); return INVALID_GFN; } - __put_gfn(p2m, gfn_x(gfn)); - if ( page_order ) *page_order = guest_walk_to_page_order(&gw); diff -r 107285938c50 xen/arch/x86/mm/mm-locks.h --- a/xen/arch/x86/mm/mm-locks.h Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/mm/mm-locks.h Fri Apr 27 10:23:28 2012 +0100 @@ -166,13 +166,39 @@ declare_mm_lock(nestedp2m) * and later mutate it. 
*/ -declare_mm_lock(p2m) -#define p2m_lock(p) mm_lock_recursive(p2m, &(p)->lock) -#define gfn_lock(p,g,o) mm_lock_recursive(p2m, &(p)->lock) -#define p2m_unlock(p) mm_unlock(&(p)->lock) -#define gfn_unlock(p,g,o) mm_unlock(&(p)->lock) -#define p2m_locked_by_me(p) mm_locked_by_me(&(p)->lock) -#define gfn_locked_by_me(p,g) mm_locked_by_me(&(p)->lock) +/* P2M lock is become an rwlock, purely so we can implement + * get_page_from_gfn. The mess below is a ghastly hack to make a + * recursive rwlock. If it works I'll come back and fix up the + * order-contraints magic. */ + +static inline void p2m_lock(struct p2m_domain *p) +{ + if ( p->wcpu != current->processor ) + { + write_lock(&p->lock); + p->wcpu = current->processor; + ASSERT(p->wcount == 0); + } + p->wcount++; +} + +static inline void p2m_unlock(struct p2m_domain *p) +{ + ASSERT(p->wcpu == current->processor); + if (--(p->wcount) == 0) + { + p->wcpu = -1; + write_unlock(&p->lock); + } +} + +#define gfn_lock(p,g,o) p2m_lock(p) +#define gfn_unlock(p,g,o) p2m_unlock(p) +#define p2m_read_lock(p) read_lock(&(p)->lock) +#define p2m_read_unlock(p) read_unlock(&(p)->lock) +#define p2m_locked_by_me(p) ((p)->wcpu == current->processor) +#define gfn_locked_by_me(p,g) p2m_locked_by_me(p) + /* Sharing per page lock * @@ -203,8 +229,8 @@ declare_mm_order_constraint(per_page_sha * counts, page lists, sweep parameters. */ declare_mm_lock(pod) -#define pod_lock(p) mm_lock(pod, &(p)->pod.lock) -#define pod_unlock(p) mm_unlock(&(p)->pod.lock) +#define pod_lock(p) do { p2m_lock(p); mm_lock(pod, &(p)->pod.lock); } while (0) +#define pod_unlock(p) do { mm_unlock(&(p)->pod.lock); p2m_unlock(p);} while (0) #define pod_locked_by_me(p) mm_locked_by_me(&(p)->pod.lock) /* Page alloc lock (per-domain) diff -r 107285938c50 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/mm/p2m.c Fri Apr 27 10:23:28 2012 +0100 @@ -71,7 +71,9 @@ boolean_param("hap_2mb", opt_hap_2mb); /* Init the datastructures for later use by the p2m code */ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m) { - mm_lock_init(&p2m->lock); + rwlock_init(&p2m->lock); + p2m->wcount = 0; + p2m->wcpu = -1; mm_lock_init(&p2m->pod.lock); INIT_LIST_HEAD(&p2m->np2m_list); INIT_PAGE_LIST_HEAD(&p2m->pages); @@ -207,6 +209,59 @@ void __put_gfn(struct p2m_domain *p2m, u gfn_unlock(p2m, gfn, 0); } +/* Atomically look up a GFN and take a reference count on the backing page. 
*/ +struct page_info *get_page_from_gfn_p2m( + struct domain *d, struct p2m_domain *p2m, unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, p2m_query_t q) +{ + struct page_info *page = NULL; + p2m_access_t _a; + p2m_type_t _t; + mfn_t mfn; + + /* Allow t or a to be NULL */ + t = t ?: &_t; + a = a ?: &_a; + + if ( likely(!p2m_locked_by_me(p2m)) ) + { + /* Fast path: look up and get out */ + p2m_read_lock(p2m); + mfn = __get_gfn_type_access(p2m, gfn, t, a, 0, NULL, 0); + if ( (p2m_is_ram(*t) || p2m_is_grant(*t)) + && mfn_valid(mfn) + && !((q & P2M_UNSHARE) && p2m_is_shared(*t)) ) + { + page = mfn_to_page(mfn); + if ( !get_page(page, d) + /* Page could be shared */ + && !get_page(page, dom_cow) ) + page = NULL; + } + p2m_read_unlock(p2m); + + if ( page ) + return page; + + /* Error path: not a suitable GFN at all */ + if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_magic(*t) ) + return NULL; + } + + /* Slow path: take the write lock and do fixups */ + mfn = get_gfn_type_access(p2m, gfn, t, a, q, NULL); + if ( p2m_is_ram(*t) && mfn_valid(mfn) ) + { + page = mfn_to_page(mfn); + if ( !get_page(page, d) ) + page = NULL; + } + put_gfn(d, gfn); + + return page; +} + + int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma) { diff -r 107285938c50 xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/physdev.c Fri Apr 27 10:23:28 2012 +0100 @@ -306,26 +306,27 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H case PHYSDEVOP_pirq_eoi_gmfn_v1: { struct physdev_pirq_eoi_gmfn info; unsigned long mfn; + struct page_info *page; ret = -EFAULT; if ( copy_from_guest(&info, arg, 1) != 0 ) break; ret = -EINVAL; - mfn = get_gfn_untyped(current->domain, info.gmfn); - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), v->domain, - PGT_writable_page) ) + page = get_page_from_gfn(current->domain, info.gmfn, NULL, P2M_ALLOC); + if ( !page ) + break; + if ( !get_page_type(page, PGT_writable_page) ) { - put_gfn(current->domain, info.gmfn); + put_page(page); break; } + mfn = page_to_mfn(page); if ( cmpxchg(&v->domain->arch.pv_domain.pirq_eoi_map_mfn, 0, mfn) != 0 ) { put_page_and_type(mfn_to_page(mfn)); - put_gfn(current->domain, info.gmfn); ret = -EBUSY; break; } @@ -335,14 +336,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H { v->domain->arch.pv_domain.pirq_eoi_map_mfn = 0; put_page_and_type(mfn_to_page(mfn)); - put_gfn(current->domain, info.gmfn); ret = -ENOSPC; break; } if ( cmd == PHYSDEVOP_pirq_eoi_gmfn_v1 ) v->domain->arch.pv_domain.auto_unmask = 1; - put_gfn(current->domain, info.gmfn); ret = 0; break; } diff -r 107285938c50 xen/arch/x86/traps.c --- a/xen/arch/x86/traps.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/arch/x86/traps.c Fri Apr 27 10:23:28 2012 +0100 @@ -662,9 +662,9 @@ int wrmsr_hypervisor_regs(uint32_t idx, case 0: { void *hypercall_page; - unsigned long mfn; unsigned long gmfn = val >> 12; unsigned int idx = val & 0xfff; + struct page_info *page; if ( idx > 0 ) { @@ -674,24 +674,23 @@ int wrmsr_hypervisor_regs(uint32_t idx, return 0; } - mfn = get_gfn_untyped(d, gmfn); - - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + + if ( !page || !get_page_type(page, PGT_writable_page) ) { - put_gfn(d, gmfn); + if ( page ) + put_page(page); gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx) to MSR %08x\n", - gmfn, mfn, base + idx); + gmfn, page_to_mfn(page), base + idx); return 0; } - 
hypercall_page = map_domain_page(mfn); + hypercall_page = __map_domain_page(page); hypercall_page_initialise(d, hypercall_page); unmap_domain_page(hypercall_page); - put_page_and_type(mfn_to_page(mfn)); - put_gfn(d, gmfn); + put_page_and_type(page); break; } @@ -2374,7 +2373,8 @@ static int emulate_privileged_op(struct break; case 3: {/* Write CR3 */ - unsigned long mfn, gfn; + unsigned long gfn; + struct page_info *page; domain_lock(v->domain); if ( !is_pv_32on64_vcpu(v) ) { @@ -2384,9 +2384,10 @@ static int emulate_privileged_op(struct gfn = compat_cr3_to_pfn(*reg); #endif } - mfn = get_gfn_untyped(v->domain, gfn); - rc = new_guest_cr3(mfn); - put_gfn(v->domain, gfn); + page = get_page_from_gfn(v->domain, gfn, NULL, P2M_ALLOC); + rc = page ? new_guest_cr3(page_to_mfn(page)) : 0; + if ( page ) + put_page(page); domain_unlock(v->domain); if ( rc == 0 ) /* not okay */ goto fail; diff -r 107285938c50 xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/include/asm-x86/p2m.h Fri Apr 27 10:23:28 2012 +0100 @@ -192,7 +192,10 @@ typedef unsigned int p2m_query_t; /* Per-p2m-table state */ struct p2m_domain { /* Lock that protects updates to the p2m */ - mm_lock_t lock; + rwlock_t lock; + int wcpu; + int wcount; + const char *wfunc; /* Shadow translated domain: p2m mapping */ pagetable_t phys_table; @@ -377,6 +380,33 @@ static inline mfn_t get_gfn_query_unlock return __get_gfn_type_access(p2m_get_hostp2m(d), gfn, t, &a, 0, NULL, 0); } +/* Atomically look up a GFN and take a reference count on the backing page. + * This makes sure the page doesn't get freed (or shared) underfoot, + * and should be used by any path that intends to write to the backing page. + * Returns NULL if the page is not backed by RAM. + * The caller is responsible for calling put_page() afterwards. */ +struct page_info *get_page_from_gfn_p2m(struct domain *d, + struct p2m_domain *p2m, + unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, + p2m_query_t q); + +static inline struct page_info *get_page_from_gfn( + struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) +{ + struct page_info *page; + + if ( paging_mode_translate(d) ) + return get_page_from_gfn_p2m(d, p2m_get_hostp2m(d), gfn, t, NULL, q); + + /* Non-translated guests see 1-1 RAM mappings everywhere */ + if (t) + *t = p2m_ram_rw; + page = __mfn_to_page(gfn); + return get_page(page, d) ? page : NULL; +} + + /* General conversion function from mfn to gfn */ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) { diff -r 107285938c50 xen/xsm/flask/hooks.c --- a/xen/xsm/flask/hooks.c Thu Apr 26 10:03:08 2012 +0100 +++ b/xen/xsm/flask/hooks.c Fri Apr 27 10:23:28 2012 +0100 @@ -1318,6 +1318,7 @@ static int flask_mmu_normal_update(struc struct domain_security_struct *dsec; u32 fsid; struct avc_audit_data ad; + struct page_info *page; if (d != t) rc = domain_has_perm(d, t, SECCLASS_MMU, MMU__REMOTE_REMAP); @@ -1333,7 +1334,8 @@ static int flask_mmu_normal_update(struc map_perms |= MMU__MAP_WRITE; AVC_AUDIT_DATA_INIT(&ad, MEMORY); - fmfn = get_gfn_untyped(f, l1e_get_pfn(l1e_from_intpte(fpte))); + page = get_page_from_gfn(f, l1e_get_pfn(l1e_from_intpte(fpte)), P2M_ALLOC); + mfn = page ? 
page_to_mfn(page) : INVALID_MFN; ad.sdom = d; ad.tdom = f; @@ -1342,7 +1344,8 @@ static int flask_mmu_normal_update(struc rc = get_mfn_sid(fmfn, &fsid); - put_gfn(f, fmfn); + if ( page ) + put_page(page); if ( rc ) return rc; --zYM0uCDKw75PZbzx Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --zYM0uCDKw75PZbzx--