From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] Re: LSI: avoid infinite loops
Date: Thu, 19 Jun 2008 23:33:04 +0100
References: <20080507230206.GB28197@dmt> <20080508031315.GA29572@dmt>
	<20080619215340.GA20454@dmt.cnet>
In-Reply-To: <20080619215340.GA20454@dmt.cnet>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200806192333.05194.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Alberto =?iso-8859-1?q?Trevi=F1o?= <alberto@byu.edu>, Marcelo Tosatti <mtosatti@redhat.com>

> > > > > > > The Windows driver has SCRIPTS code which busy loops on main
> > > > > > > memory. So give the CPUs a chance to run if that happens.
> > > > > >
> > > > > > I'm kinda surprised this works.  What causes the scripts engine
> > > > > > to be restarted?
> > > > >
> > > > > LSI_ISTAT0_SIGP.
> > > >
> > > > In that case my surprise continues, and this is looking like an
> > > > unbelievably horrid hack.
> > > >
> > > > By my reading you're making LSI_ISTAT0_SIGP affect whatever
> > > > instruction happens to be executing when we stall. You get doubly
> > > > lucky because (a) the guest OS decides to bang on SIGP, even though
> > > > it doesn't need to. And (b) the last instruction executed happens to
> > > > have set dnad to a value that "works". I'm guessing you always happen
> > > > to stop execution on the conditional jump instruction and taking that
> > > > jump doesn't cause any bad effects, right?
> > >
> > > Oh, I'd also be worried what happens if an async IO operation completes
> > > at this point. lsi_command_complete is liable to trample all over your
> > > state.
> >
> > So what do you suggest as a proper fix?
>
> What do you suggest as a proper fix to this problem?

At minimum you need to address the issues I've raised with your current patch.  
Stalling execution temporarily every few hundred instructions and waiting for 
SIGP (or some other trigger) before resuming may be acceptable.  Aborting 
execution and relying on very specific guest OS behavior to give correct 
results is not.  The current code is written with the assumption that 
execution will only stop at very specific points. Your patch breaks this 
assumption.
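
To make the stall-and-resume idea concrete, here is a minimal sketch in C. It
is not the lsi53c895a code: ScriptEngine, engine_run, engine_signal,
STALL_INTERVAL and the next_pc table are all hypothetical names, and the
"program" is reduced to a table of jump targets. The only point it shows is
that the engine stops at an instruction boundary with its state intact, and a
SIGP-like trigger resumes from exactly that point rather than aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; none of these names come from QEMU. */
enum { STALL_INTERVAL = 256 };      /* instructions allowed between stalls */
#define PC_HALT UINT32_MAX          /* sentinel: script has finished */

typedef struct {
    uint32_t pc;             /* index of the next instruction to execute */
    unsigned budget;         /* instructions left before a forced stall */
    bool stalled;            /* true while waiting for a wakeup trigger */
    unsigned long executed;  /* total instructions executed so far */
} ScriptEngine;

/* Execute until the script halts or the budget runs out.  next_pc[i] is the
 * instruction that follows instruction i (an assumption of this sketch). */
static void engine_run(ScriptEngine *e, const uint32_t *next_pc)
{
    assert(e != NULL && next_pc != NULL);
    if (e->stalled) {
        return;                      /* only a trigger may resume us */
    }
    while (e->pc != PC_HALT) {
        if (e->budget == 0) {
            /* Stall between instructions: pc still names the next
             * instruction, so nothing is lost or half-executed. */
            e->stalled = true;
            return;
        }
        e->pc = next_pc[e->pc];
        e->budget--;
        e->executed++;
    }
}

/* SIGP-like trigger: refill the budget and continue where we stopped. */
static void engine_signal(ScriptEngine *e, const uint32_t *next_pc)
{
    assert(e != NULL);
    if (!e->stalled) {
        return;
    }
    e->stalled = false;
    e->budget = STALL_INTERVAL;
    engine_run(e, next_pc);
}
```

With a two-instruction busy loop (next_pc = {1, 0}) the engine stalls after
STALL_INTERVAL instructions and does the same again after each
engine_signal() call. A real device model would also have to decide what an
async completion may do while the engine is stalled, which this sketch does
not cover.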

Ideally you'd also do proper loop detection rather than setting an arbitrary 
limit. I wouldn't be surprised if a good OS can create very long queues.
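
One possible shape for such loop detection, as a sketch only: treat the
engine as busy-waiting when it takes a jump to the same target twice with no
side effect of its own in between. LoopDetector, loop_detector_on_jump and
the "progress" counter are hypothetical; the counter is assumed to be bumped
by the caller whenever the engine writes memory, moves data or changes device
state. A long but productive queue keeps advancing the counter and is never
flagged, however many instructions it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; not part of QEMU. */
typedef struct {
    uint32_t last_target;     /* target of the most recent taken jump */
    uint64_t last_progress;   /* progress counter value at that jump */
    bool have_last;           /* false until the first jump is seen */
} LoopDetector;

/* Call on every taken jump.  'progress' is a counter the caller increments
 * on each side effect (an assumption of this sketch).  Returns true when the
 * same target is reached again with no progress since the previous jump,
 * i.e. the script looks like a busy wait. */
static bool loop_detector_on_jump(LoopDetector *d, uint32_t target,
                                  uint64_t progress)
{
    assert(d != NULL);
    bool spinning = d->have_last
                 && d->last_target == target
                 && d->last_progress == progress;

    d->last_target = target;
    d->last_progress = progress;
    d->have_last = true;
    return spinning;
}
```

This only catches the simplest case, a single jump spinning on itself; loops
spanning several jumps would need a short history of targets rather than one
remembered value.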

Paul