From mboxrd@z Thu Jan 1 00:00:00 1970
From: Olaf Hering
Subject: Re: Need help with fixing the Xen waitqueue feature
Date: Thu, 10 Nov 2011 11:18:28 +0100
Message-ID: <20111110101828.GA31293@aepfle.de>
References: <20111108224414.83985CF73A@homiemail-mx7.g.dreamhost.com>
 <3c097da8e49a42af1210e4ffcd39fd48.squirrel@webmail.lagarcavilla.org>
 <20111109070927.GB26154@aepfle.de>
 <0bb01a4d216a68c4ae8441b037927f61.squirrel@webmail.lagarcavilla.org>
 <20111109221148.GA17166@aepfle.de>
 <5d7d38b18271fcc7aa750604eeb52bbd.squirrel@webmail.lagarcavilla.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <5d7d38b18271fcc7aa750604eeb52bbd.squirrel@webmail.lagarcavilla.org>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andres Lagar-Cavilla
Cc: keir.xen@gmail.com, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> Olaf,
> > On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> >
> >> After a bit of thinking, things are far more complicated. I don't think
> >> this is a "race." If the pager removed a page that later gets scheduled
> >> by the guest OS for IO, qemu will want to foreign-map that. With the
> >> hypervisor returning ENOENT, the foreign map will fail, and there goes
> >> qemu.
> >
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path uses qemu that leads to a
> > crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. Now that's really odd that there is no ENOENT
handling in mainline; I will go and check the code.

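As a rough illustration of the retry-on-ENOENT behaviour discussed above,
here is a minimal sketch. It is not the actual libxc code: map_with_retry,
map_fn and fake_map are hypothetical names, and the mapping call is
simulated so the example stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a foreign-map call: returns 0 on success,
 * or -ENOENT while the target page is still paged out. */
typedef int (*map_fn)(unsigned long gfn, void *ctx);

/* Call map() until it stops failing with -ENOENT, at most max_tries times.
 * Any other result (success or a different error) is returned at once. */
static int map_with_retry(map_fn map, unsigned long gfn, void *ctx,
                          int max_tries)
{
    int rc = -ENOENT;

    assert(map != NULL);
    for (int i = 0; i < max_tries; i++) {
        rc = map(gfn, ctx);
        if (rc != -ENOENT)
            return rc;
        /* A real caller would sleep or yield here so the pager can
         * bring the page back in before the next attempt. */
    }
    return rc;
}

/* Simulated mapper (assumption for this sketch): fails with -ENOENT
 * until the counter behind ctx reaches zero, then succeeds. */
static int fake_map(unsigned long gfn, void *ctx)
{
    int *remaining = ctx;

    (void)gfn;
    if (*remaining > 0) {
        (*remaining)--;
        return -ENOENT;
    }
    return 0;
}
```

A caller without such a loop sees the first -ENOENT as a hard failure,
which is the situation described for kernels lacking the V2 ioctl.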
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling; perhaps that change was
never forwarded to pvops. At least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
> evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops, it may need an audit of its ENOENT handling.

Olaf