From: Paul Brook
Subject: Re: [Qemu-devel] Re: LSI: avoid infinite loops
Date: Thu, 19 Jun 2008 23:33:04 +0100
Message-Id: <200806192333.05194.paul@codesourcery.com>
In-Reply-To: <20080619215340.GA20454@dmt.cnet>
References: <20080507230206.GB28197@dmt> <20080508031315.GA29572@dmt> <20080619215340.GA20454@dmt.cnet>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Alberto Treviño, Marcelo Tosatti

> > > > > > > The Windows driver has SCRIPTS code which busy loops on main
> > > > > > > memory. So give the CPUs a chance to run if that happens.
> > > > > >
> > > > > > I'm kinda surprised this works. What causes the scripts engine
> > > > > > to be restarted?
> > > > >
> > > > > LSI_ISTAT0_SIGP.
> > > >
> > > > In that case my surprise continues, and this is looking like an
> > > > unbelievably horrid hack.
> > > >
> > > > By my reading you're making LSI_ISTAT0_SIGP affect whatever
> > > > instruction happens to be executing when we stall. You get doubly
> > > > lucky because (a) the guest OS decides to bang on SIGP, even though
> > > > it doesn't need to. And (b) the last instruction executed happens to
> > > > have set dnad to a value that "works". I'm guessing you always happen
> > > > to stop execution on the conditional jump instruction and taking that
> > > > jump doesn't cause any bad effects, right?
> > >
> > > Oh, I'd also be worried what happens if an async IO operation completes
> > > at this point. lsi_command_complete is liable to trample all over your
> > > state.
> >
> > So what do you suggest as a proper fix?
>
> What do you suggest as a proper fix to this problem?

At minimum you need to address the issues I've raised with your current patch.

Stalling execution temporarily every few hundred instructions and waiting for
SIGP (or some other trigger) before resuming may be acceptable. Aborting
execution and relying on very specific guest OS behavior to give correct
results is not.

The current code is written with the assumption that execution will only stop
at very specific points. Your patch breaks this assumption.

Ideally you'd also do proper loop detection rather than setting an arbitrary
limit. I wouldn't be surprised if a good OS can create very long queues.

Paul
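
To make the "stall, then resume on a trigger" idea above concrete, here is a minimal sketch in C. It is a toy model under stated assumptions, not code from the QEMU LSI emulation: every name in it (toy_engine, toy_run, toy_signal, TOY_INSN_BUDGET) is hypothetical, the instruction set is invented for illustration, and it uses a fixed instruction budget rather than the proper loop detection suggested above. The point it tries to show is that the engine parks only between instructions and later resumes from its own saved position, instead of aborting and depending on whatever state the guest happens to leave behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: none of these names come from QEMU. */
enum toy_op { TOY_NOP, TOY_JUMP_IF_ZERO, TOY_END };

struct toy_insn {
    enum toy_op op;
    size_t target;  /* jump destination, used by TOY_JUMP_IF_ZERO only */
    size_t flag;    /* index of the guest memory byte the jump polls */
};

enum toy_status { TOY_DONE, TOY_STALLED, TOY_IGNORED };

struct toy_engine {
    const struct toy_insn *prog;
    size_t pc;           /* next instruction to execute */
    bool stalled;        /* true while parked, waiting for a wake-up */
    const uint8_t *mem;  /* guest memory that the script polls */
};

/* Arbitrary budget (an assumption); real loop detection would replace it. */
#define TOY_INSN_BUDGET 256

/*
 * Run until the script ends or the budget is used up. The budget check
 * sits between instructions, so when we park, pc always names an
 * instruction that has not started executing yet.
 */
enum toy_status toy_run(struct toy_engine *e)
{
    unsigned executed = 0;

    assert(e->prog != NULL && e->mem != NULL);
    e->stalled = false;
    for (;;) {
        if (executed++ >= TOY_INSN_BUDGET) {
            e->stalled = true;  /* park; let the CPUs run */
            return TOY_STALLED;
        }
        const struct toy_insn *in = &e->prog[e->pc];
        switch (in->op) {
        case TOY_NOP:
            e->pc++;
            break;
        case TOY_JUMP_IF_ZERO:
            /* Busy-wait pattern: loop back while the flag is still 0. */
            e->pc = (e->mem[in->flag] == 0) ? in->target : e->pc + 1;
            break;
        case TOY_END:
            return TOY_DONE;
        }
    }
}

/*
 * Wake-up trigger, standing in for a SIGP write. It resumes a parked
 * engine from its saved pc and is ignored if the engine is not parked.
 */
enum toy_status toy_signal(struct toy_engine *e)
{
    if (!e->stalled) {
        return TOY_IGNORED;
    }
    return toy_run(e);
}
```

With a one-instruction busy loop polling a memory byte, toy_run() returns TOY_STALLED while the byte is zero. Once the guest writes the byte, a later toy_signal() finishes the script. A trigger that arrives while the engine is not parked changes nothing, which is the property the abort-based approach lacks.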