From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:54748)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1QSCh6-0003BE-8y for qemu-devel@nongnu.org;
 Thu, 02 Jun 2011 14:32:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1QSCh3-00058Z-2x for qemu-devel@nongnu.org;
 Thu, 02 Jun 2011 14:32:31 -0400
Received: from mail-yw0-f45.google.com ([209.85.213.45]:62674)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1QSCh2-00058F-A0 for qemu-devel@nongnu.org;
 Thu, 02 Jun 2011 14:32:28 -0400
Received: by ywl41 with SMTP id 41so538144ywl.4 for ;
 Thu, 02 Jun 2011 11:32:27 -0700 (PDT)
Message-ID: <4DE7D739.6080607@codemonkey.ws>
Date: Thu, 02 Jun 2011 13:32:25 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <20110601181255.077fb5fd@doriath>
 <4DE6B087.6010708@codemonkey.ws>
 <20110602090632.GB14571@redhat.com>
 <4DE78B53.1010201@codemonkey.ws>
 <20110602132405.GJ514380@orkuz.home>
 <4DE797F6.2060004@codemonkey.ws>
 <20110602150124.0b3c187f@doriath>
In-Reply-To: <20110602150124.0b3c187f@doriath>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
To: Luiz Capitulino
Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel@nongnu.org,
 Markus Armbruster

On 06/02/2011 01:01 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 09:02:30 -0500
> Anthony Liguori wrote:
>
>> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>>> B. query-stop-reason
>>>>>>> --------------------
>>>>>>>
>>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>>
>>>>>>> -> { "execute": "query-stop-reason" }
>>>>>>> <- { "return": { "reason": "user" } }
>>>>>>>
>>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>>> "migrate".
>>>>>>>
>>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>>> stop reason too, for completeness.
>>>>>>
>>>>>>
>>>>>> Can we just extend query-block?
>>>>>
>>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>>> be used to find out which particular block device(s) caused the IO
>>>>> error to occur & get the "reason" that was in the BLOCK_IO_ERROR
>>>>> event.
>>>>
>>>> My concern is that we're over abstracting here. We're not going to add
>>>> additional stop reasons in the future.
>>>>
>>>> Maybe just add an 'io-error': True to query-state.
>>>
>>> Sure, adding a new field to query-state response would work as well. And it
>>> seems like a good idea to me since one already needs to call query-status to
>>> check if CPUs are stopped or not so it makes sense to incorporate the
>>> additional information there as well. And if you want to be safe for the
>>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>>> string 'reason' which Luiz suggested above.
>>
>>
>> String enumerations are a Bad Thing. It's impossible to figure out what
>> strings are valid and it lacks type safety.
>>
>> Adding more booleans provides better type safety, and when we move to
>> QAPI with a queryable schema, provides a way to figure out exactly what
>> combinations are supported by QEMU.
>
> To summarize:
>
> 1. Add a 'io-error' field to query-status (which is only present if
> field 'running' is false)

It may or may not be present.
Lack of presence does not tell you anything. It is only true when running is false AND the guest was stopped because of an io error.

>
> 2. Extend query-block to contain error information associated with the
> device. This is interesting, because this information will be available
> even if the error didn't cause the VM to stop

Well, we need at least some way to indicate that a block device is in a failed state. For instance, if you have two block devices, but you miss the IO_ERROR event, you need to figure out which of the two devices is giving errors.

But I was thinking of something that had the semantics of last_iop_failed.

Regards,

Anthony Liguori

> Seems good enough to me, comments?
>
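
For illustration, the semantics discussed in the reply above can be sketched from the point of view of a management client. This is only a sketch under assumptions, not QEMU code: the 'running' and 'io-error' field names come from the proposal in this thread, while the per-device 'last_iop_failed' key and both function names are hypothetical, chosen here to mirror the "last_iop_failed" semantics mentioned above.

```python
def stopped_on_io_error(status):
    """Interpret a query-status reply under the proposed semantics.

    'io-error' may or may not be present, and its absence tells the
    caller nothing. It only counts when 'running' is false AND the
    field is explicitly true.
    """
    if status.get("running", True):
        # A running guest is never "stopped on I/O error".
        return False
    return status.get("io-error", False) is True


def failed_devices(blocks):
    """List devices whose last I/O operation failed.

    'last_iop_failed' is a hypothetical per-device key for a
    query-block style reply; it is not an existing QMP field.
    """
    return [b["device"] for b in blocks if b.get("last_iop_failed", False)]
```

A client that missed the I/O error event could call the first helper on the query-status reply and, if it returns true, use the second on the query-block reply to find out which of its devices is giving errors.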