From: Keir Fraser
Subject: Re: Need help with fixing the Xen waitqueue feature
Date: Tue, 08 Nov 2011 22:05:41 +0000
References: <20111108212024.GA5276@aepfle.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20111108212024.GA5276@aepfle.de>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olaf Hering, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 08/11/2011 21:20, "Olaf Hering" wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

From how many call sites can we end up on a wait queue? I know we were going
to end up with a small and explicit number (e.g., in __hvm_copy()) but does
this patch make it a more generally-used mechanism? There will unavoidably
be many constraints on callers who want to be able to yield the cpu. We can
add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
I don't think it's *that* common that hypercall contexts cache things like
per-cpu pointers. But every caller will need auditing, I expect.

A sudden reboot is very extreme. No message even on a serial line? That most
commonly indicates bad page tables. With most other bugs you'd at least get a
double fault message.

 -- Keir
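
A rough illustration of how the Linux-style get_cpu/put_cpu idea mentioned above might catch a caller that sleeps while holding a per-cpu reference. This is a user-space model only, not Xen code: the struct and every function name here (vcpu_ctx, get_cpu_sketch, put_cpu_sketch, wait_and_migrate_sketch) are hypothetical, and a real hypervisor would more likely hit an assertion than return false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vcpu's execution context (not a Xen type). */
struct vcpu_ctx {
    unsigned int cpu;        /* physical cpu currently running this context */
    unsigned int pin_depth;  /* nesting depth of get_cpu_sketch() sections  */
};

/* Enter a section that may cache per-cpu state; returns the current cpu. */
static unsigned int get_cpu_sketch(struct vcpu_ctx *v)
{
    v->pin_depth++;
    return v->cpu;
}

/* Leave the section; cached per-cpu pointers must not be used afterwards. */
static void put_cpu_sketch(struct vcpu_ctx *v)
{
    assert(v->pin_depth > 0);
    v->pin_depth--;
}

/*
 * Model of sleeping on a wait queue and being resumed on new_cpu.
 * Refuses (returns false) if the caller still holds a per-cpu reference,
 * which is the class of bug the audit is meant to find.
 */
static bool wait_and_migrate_sketch(struct vcpu_ctx *v, unsigned int new_cpu)
{
    if (v->pin_depth != 0)
        return false;   /* caller would resume with stale per-cpu state */
    v->cpu = new_cpu;
    return true;
}
```

With this shape, a call path that reaches the wait queue between get and put is flagged at the point of the sleep, rather than failing later on whichever cpu it happens to resume on.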