From: "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
To: Tim Deegan <tim@xen.org>
Cc: "Zhang, Yang Z" <yang.z.zhang@intel.com>,
oalf@aepfle.de,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Keir Fraser <keir@xen.org>,
George.Dunlap@eu.citrix.com
Subject: Re: lock in vhpet
Date: Fri, 27 Apr 2012 14:08:20 -0700 [thread overview]
Message-ID: <ed59d3a342e079902f20b76e360644a1.squirrel@webmail.lagarcavilla.org> (raw)
In-Reply-To: <20120427092645.GC86045@ocelot.phlegethon.org>
> At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:
>> > Can you please try the attached patch? I think you'll need this one
>> > plus the ones that take the locks out of the hpet code.
>>
>> Right off the bat I'm getting a multitude of
>> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7
>> And a hung dom0 during initramfs. I'm a little baffled as to why, but
>> it's
>> there (32 bit dom0, XenServer6).
>
> Curses, I knew there'd be one somewhere. I've been replacing
> get_page_and_type_from_pagenr()s (which return 0 for success) with
> old-school get_page_type()s (which return 1 for success) and not always
> getting the right number of inversions. That's a horrible horrible
> beartrap of an API, BTW, which had me cursing at the screen, but I had
> enough on my plate yesterday without touching _that_ code too!
>
I now am quite pleased with the testing results on our end. I have a
four-patch series to top up your monster patch, which I'll submit shortly.
I encourage everyone interested to test this (obviously a lot of code is
touched). Including AMD, as I've expanded the code to touch svm too.
>> > Andres, this is basically the big-hammer version of your "take a
>> > pagecount" changes, plus the change you made to hvmemul_rep_movs().
>> > If this works I intend to follow it up with a patch to make some of
>> the
>> > read-modify-write paths avoid taking the lock (by using a
>> > compare-exchange operation so they only take the lock on a write). If
>> > that succeeds I might drop put_gfn() altogether.
>>
>> You mean cmpxchg the whole p2m entry? I don't think I parse the plan.
>> There are code paths that do get_gfn_query -> p2m_change_type ->
>> put_gfn.
>> But I guess those could lock the p2m up-front if they become the only
>> consumers of put_gfn left.
>
> Well, that's more or less what happens now. I was thinking of replacing
> any remaining
>
> (implicit) lock ; read ; think a bit ; maybe write ; unlock
>
> code with the fast-path-friendlier:
>
> read ; think ; maybe-cmpxchg (and on failure undo or retry
>
> which avoids taking the write lock altogether if there's no work to do.
> But maybe there aren't many of those left now. Obviously any path
> which will always write should just take the write-lock first.
After my four patches there aren't really any paths like the above left
(exhaustion disclaimer). I believe one or two iterative paths (like
HVMOP_set_mem_type) could be optimized to take the p2m lock up front,
instead of many get_gfn's.
The nice thing about get_gfn/put_gfn is that it will allow for seamless
(har har) transition to a fine-grained p2m. But then maybe we won't ever
need that with a p2m rwlock.
>
>> > - grant-table operations still use the lock, because frankly I
>> > could not follow the current code, and it's quite late in the
>> evening.
>>
>> It's pretty complex with serious nesting, and ifdef's for arm and 32
>> bit.
>> gfn_to_mfn_private callers will suffer from altering the current
>> meaning,
>> as put_gfn resolves to the right thing for the ifdef'ed arch. The other
>> user is grant_transfer which also relies on the page *not* having an
>> extra
>> ref in steal_page. So it's a prime candidate to be left alone.
>
> Sadly, I think it's not. The PV backends will be doing lots of grant
> ops, which shouldn't get serialized against all other P2M lookups.
>
Those are addressed in my patch series now, which should case waves of panic.
Andres
>> > I also have a long list of uglinesses in the mm code that I found
>>
>> Uh, ugly stuff, how could that have happened?
>
> I can't imagine. :) Certainly nothing to do with me thinking "I'll
> clean that up when I get some time."
>
>> I have a few preliminary observations on the patch. Pasting relevant
>> bits
>> here, since the body of the patch seems to have been lost by the email
>> thread:
>>
>> @@ -977,23 +976,25 @@ int arch_set_info_guest(
>> ...
>> +
>> + if (!paging_mode_refcounts(d)
>> + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) )
>> replace with && !get_page_type() )
>
> Yep.
>
>> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy(
>> gfn = addr >> PAGE_SHIFT;
>> }
>>
>> - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt));
>> + page = get_page_from_gfn(curr->domain, gfn, &p2mt,
>> P2M_UNSHARE);
>> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and
>> same logic when checking p2m_is_shared). Not truly related to your patch
>> bit since we're at it.
>
> OK, but not in this patch.
>
>> Same, further down
>> - if ( !p2m_is_ram(p2mt) )
>> + if ( !page )
>> {
>> - put_gfn(curr->domain, gfn);
>> + if ( page )
>> + put_page(page);
>> Last two lines are redundant
>
> Yep.
>
>> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE
>> case HVMOP_modified_memory: a lot of error checking has been
>> removed.
>
> Yes, but it was bogus - there's a race between the actual modification
> and the call, during which anything might have happened. The best we
> can do is throw log-dirty bits at everything, and the caller can't do
> anything with the error anyway.
>
> When I come to tidy up I'll just add a new mark_gfn_dirty function
> and skip the pointless gfn->mfn->gfn translation on this path.
>
>> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing
>> checking
>> for target gfns of mmu updates of l2/3/4 entries. It seems that this
>> wouldn't work anyways, that's why you killed it?
>
> Yeah - since only L1es can point at foreign mappings it was all just
> noise, and even if there had been real p2m lookups on those paths there
> was no equivalent to the translate-in-place that happens in
> mod_l1_entry so it would have been broken in a much worse way.
>
>> +++ b/xen/arch/x86/mm/hap/guest_walk.c
>> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
>> ...
>> + if ( !top_page )
>> {
>> pfec[0] &= ~PFEC_page_present;
>> - __put_gfn(p2m, top_gfn);
>> + put_page(top_page);
>> top_page is NULL here, remove put_page
>
> Yep.
>
>> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since
>> locking is already done by get_gfn_type_access/__put_gfn
>
> Yeah, but I was writing that with half an eye on killing that lock. :)
> I'll drop them for now.
>
>> (hope those observations made sense without inlining them in the actual
>> patch)
>
> Yes, absolutely - thanks for the review!
>
> If we can get this to work well enough I'll tidy it up into a sensible
> series next week. In the meantime, an updated verison of the
> monster patch is attached.
>
> Cheers,
>
> Tim.
>
next prev parent reply other threads:[~2012-04-27 21:08 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-17 3:26 lock in vhpet Zhang, Yang Z
2012-04-17 7:27 ` Keir Fraser
2012-04-18 0:52 ` Zhang, Yang Z
2012-04-18 7:13 ` Keir Fraser
2012-04-18 7:55 ` Zhang, Yang Z
2012-04-18 8:29 ` Keir Fraser
2012-04-18 9:14 ` Keir Fraser
2012-04-18 9:30 ` Keir Fraser
2012-04-19 5:19 ` Zhang, Yang Z
2012-04-19 8:27 ` Tim Deegan
2012-04-19 8:47 ` Keir Fraser
2012-04-23 7:36 ` Zhang, Yang Z
2012-04-23 7:43 ` Jan Beulich
2012-04-23 8:15 ` Zhang, Yang Z
2012-04-23 8:22 ` Keir Fraser
2012-04-23 9:14 ` Tim Deegan
2012-04-23 15:26 ` Andres Lagar-Cavilla
2012-04-24 9:15 ` Tim Deegan
2012-04-24 13:28 ` Andres Lagar-Cavilla
2012-04-23 17:18 ` Andres Lagar-Cavilla
2012-04-24 8:58 ` Zhang, Yang Z
2012-04-24 9:16 ` Tim Deegan
2012-04-25 0:27 ` Zhang, Yang Z
2012-04-25 1:40 ` Andres Lagar-Cavilla
2012-04-25 1:48 ` Zhang, Yang Z
2012-04-25 2:31 ` Andres Lagar-Cavilla
2012-04-25 2:36 ` Zhang, Yang Z
2012-04-25 2:42 ` Andres Lagar-Cavilla
2012-04-25 3:12 ` Zhang, Yang Z
2012-04-25 3:34 ` Andres Lagar-Cavilla
2012-04-25 5:18 ` Zhang, Yang Z
2012-04-25 8:07 ` Jan Beulich
2012-04-26 21:25 ` Tim Deegan
2012-04-27 0:46 ` Zhang, Yang Z
2012-04-27 0:51 ` Andres Lagar-Cavilla
2012-04-27 1:24 ` Zhang, Yang Z
2012-04-27 8:36 ` Zhang, Yang Z
2012-04-27 3:02 ` Andres Lagar-Cavilla
2012-04-27 9:26 ` Tim Deegan
2012-04-27 14:17 ` Andres Lagar-Cavilla
2012-04-27 21:08 ` Andres Lagar-Cavilla [this message]
2012-05-16 11:36 ` Zhang, Yang Z
2012-05-16 12:36 ` Tim Deegan
2012-05-17 10:57 ` Tim Deegan
2012-05-28 6:54 ` Zhang, Yang Z
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ed59d3a342e079902f20b76e360644a1.squirrel@webmail.lagarcavilla.org \
--to=andres@lagarcavilla.org \
--cc=George.Dunlap@eu.citrix.com \
--cc=keir@xen.org \
--cc=oalf@aepfle.de \
--cc=tim@xen.org \
--cc=xen-devel@lists.xensource.com \
--cc=yang.z.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).