xen-devel.lists.xenproject.org archive mirror
From: "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
To: Tim Deegan <tim@xen.org>
Cc: andres@gridcentric.ca, xen-devel@lists.xen.org, adin@gridcentric.ca
Subject: Re: [PATCH 0 of 2] x86/mm: Unsharing ENOMEM handling
Date: Thu, 15 Mar 2012 09:44:04 -0700	[thread overview]
Message-ID: <84a9413ca95cb88bfe0eeda79f9cd4ed.squirrel@webmail.lagarcavilla.org> (raw)
In-Reply-To: <20120315153601.GB11329@ocelot.phlegethon.org>

> At 07:35 -0700 on 15 Mar (1331796917), Andres Lagar-Cavilla wrote:
>> > At 11:29 -0400 on 12 Mar (1331551776), Andres Lagar-Cavilla wrote:
>> >> These two patches were originally posted on Feb 15th as part of a
>> >> larger series.
>> >>
>> >> They were left to simmer as a discussion on wait queues took
>> >> precedence.
>> >>
>> >> Regardless of the ultimate fate of wait queues, these two patches
>> >> are necessary as they solve some bugs on the memory sharing side.
>> >> When unsharing fails, domains would spin forever, hosts would
>> >> crash, etc.
>> >>
>> >> The patches also clarify the semantics of unsharing, and comment
>> >> how it's handled.
>> >>
>> >> Two comments against the Feb 15th series taken care of here:
>> >>  - We assert that the unsharing code can only return success or
>> >>    ENOMEM.
>> >>  - Acked-by Tim Deegan added to patch #1
>> >
>> > Applied, thanks.
>> >
>> > I'm a bit uneasy about the way this increases the amount of
>> > boilerplate and p2m-related knowledge that's needed at call sites,
>> > but it fixes real problems and I can't see an easy way to avoid it.
>> >
>> Agreed, completely. Luckily it's all internal to the hypervisor.
>>
>> I'm gonna float an idea right now, risking egg-in-the-face again. Our
>> main issue is that going to sleep on a wait queue is disallowed in an
>> atomic context. For good reason: the vcpu goes to sleep holding locks.
>> Therefore, we can't magically hide all the complexity behind get_gfn,
>> and callers need to know things they shouldn't.
>>
>> However, sleeping only deadlocks if the "waker upper" would need to
>> grab any of those locks.
>
> Tempting.  But I don't think it will fly -- in general dom0 tools should
> be able to crash and restart without locking up Xen.   And anything that
> causes a VCPU to sleep forever with a lock held is likely to do that.

If a mem event tool crashes, the domain is rather hosed. The restart would
have a hard time figuring out which events were pulled from the ring but
not yet acted upon.

So, assume some toolstack element is alerted of the helper crash. The only
way to go is to crash the domain. As long as the sleepers are not holding
global locks, we should be good. A sleeper holding a global lock is a
definitive no-no. We can strengthen this from a rule to actual runtime
enforcement (if we go down this crazy path). There is still the issue of
the domain cleanup process being blocked by the still-sleeping lock holders.

>
> Also we have to worry about anything that has to happen before the
> waker-upper gets to run -- for example, on a single-CPU Xen, any attempt
> by any code to get the lock that's held by the sleeper will hang forever
> because the waker-upper can't be scheduled.
>
> We could have some sort of time-out-and-crash-the-domain safety net, I
> guess, but part of the reason for wanting wait queues was avoiding
> plumbing all those error paths.
>
> Maybe we could just extend the idea and have the slow path of the
> spinlock code dump the caller on a wait queue in the hope that someone
> else will sort it out. :)

What about the second (or first nested) spinlock?
Andres
>
> Cheers,
>
> Tim.
>

Thread overview: 7+ messages
2012-03-12 15:29 [PATCH 0 of 2] x86/mm: Unsharing ENOMEM handling Andres Lagar-Cavilla
2012-03-12 15:29 ` [PATCH 1 of 2] x86/mm: Allow to not sleep on mem event ring Andres Lagar-Cavilla
2012-03-12 15:29 ` [PATCH 2 of 2] Memory sharing: better handling of ENOMEM while unsharing Andres Lagar-Cavilla
2012-03-15 11:34 ` [PATCH 0 of 2] x86/mm: Unsharing ENOMEM handling Tim Deegan
2012-03-15 14:35   ` Andres Lagar-Cavilla
2012-03-15 15:36     ` Tim Deegan
2012-03-15 16:44       ` Andres Lagar-Cavilla [this message]
