From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keir Fraser <keir.xen@gmail.com>
Subject: Re: Need help with fixing the Xen waitqueue feature
Date: Tue, 08 Nov 2011 22:05:41 +0000
Message-ID: <CADF5835.245E1%keir.xen@gmail.com>
References: <20111108212024.GA5276@aepfle.de>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <20111108212024.GA5276@aepfle.de>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olaf Hering <olaf@aepfle.de>, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldnt that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

>>From how many call sites can we end up on a wait queue? I know we were going
to end up with a small and explicit number (e.g., in __hvm_copy()) but does
this patch make it a more generally-used mechanism? There will unavoidably
be many constraints on callers who want to be able to yield the cpu. We can
add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
I don't think it's *that* common that hypercall contexts cache things like
per-cpu pointers. But every caller will need auditing, I expect.

A sudden reboot is very extreme. No message even on a serial line? That most
commonly indicates bad page tables. Most other bugs you'd at least get a
double fault message.

 -- Keir