From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:45414)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1QSDgT-0001Im-F2 for qemu-devel@nongnu.org;
    Thu, 02 Jun 2011 15:35:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1QSDgR-0007PG-RT for qemu-devel@nongnu.org;
    Thu, 02 Jun 2011 15:35:57 -0400
Received: from mail-gx0-f173.google.com ([209.85.161.173]:35132)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1QSDgR-0007PA-Jg for qemu-devel@nongnu.org;
    Thu, 02 Jun 2011 15:35:55 -0400
Received: by gxk26 with SMTP id 26so561650gxk.4 for ;
    Thu, 02 Jun 2011 12:35:54 -0700 (PDT)
Message-ID: <4DE7E619.5050203@codemonkey.ws>
Date: Thu, 02 Jun 2011 14:35:53 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <20110601181255.077fb5fd@doriath>
    <4DE6B087.6010708@codemonkey.ws>
    <20110602090632.GB14571@redhat.com>
    <4DE78B53.1010201@codemonkey.ws>
    <20110602132405.GJ514380@orkuz.home>
    <4DE797F6.2060004@codemonkey.ws>
    <20110602150124.0b3c187f@doriath>
    <4DE7D739.6080607@codemonkey.ws>
    <20110602155737.13f48a46@doriath>
In-Reply-To: <20110602155737.13f48a46@doriath>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
To: Luiz Capitulino
Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel@nongnu.org,
    Markus Armbruster

On 06/02/2011 01:57 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 13:32:25 -0500
> Anthony Liguori wrote:
>
>> On 06/02/2011 01:01 PM, Luiz Capitulino wrote:
>>> On Thu, 02 Jun 2011 09:02:30 -0500
>>> Anthony Liguori wrote:
>>>
>>>> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>>>>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>>>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>>>>> B. query-stop-reason
>>>>>>>>> --------------------
>>>>>>>>>
>>>>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>>>>
>>>>>>>>> -> { "execute": "query-stop-reason" }
>>>>>>>>> <- { "return": { "reason": "user" } }
>>>>>>>>>
>>>>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>>>>> "migrate".
>>>>>>>>>
>>>>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>>>>> stop reason too, for completeness.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can we just extend query-block?
>>>>>>>
>>>>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>>>>> be used to find out which particular block device(s) caused the IO
>>>>>>> error to occur & get the "reason" that was in the BLOCK_IO_ERROR
>>>>>>> event.
>>>>>>
>>>>>> My concern is that we're over-abstracting here. We're not going to add
>>>>>> additional stop reasons in the future.
>>>>>>
>>>>>> Maybe just add an 'io-error': True to query-status.
>>>>>
>>>>> Sure, adding a new field to query-status response would work as well. And it
>>>>> seems like a good idea to me since one already needs to call query-status to
>>>>> check if CPUs are stopped or not so it makes sense to incorporate the
>>>>> additional information there as well. And if you want to be safe for the
>>>>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>>>>> string 'reason' which Luiz suggested above.
>>>>
>>>>
>>>> String enumerations are a Bad Thing. It's impossible to figure out what
>>>> strings are valid and it lacks type safety.
>>>>
>>>> Adding more booleans provides better type safety, and when we move to
>>>> QAPI with a queryable schema, provides a way to figure out exactly what
>>>> combinations are supported by QEMU.
>>>
>>> To summarize:
>>>
>>> 1. Add an 'io-error' field to query-status (which is only present if
>>> field 'running' is false)
>>
>> It may or may not be present. Lack of presence does not tell you anything.
>>
>> It is only true when running is false AND the guest was stopped because
>> of an io error.
>
> Right.
>
>>>
>>> 2. Extend query-block to contain error information associated with the
>>> device. This is interesting, because this information will be available
>>> even if the error didn't cause the VM to stop
>>
>> Well we need at least some way to indicate that a block device is in a
>> failed state. For instance, if you have two block devices, but you miss
>> the IO_ERROR event, you need to figure out which of the two devices is
>> giving errors.
>
> Can't query-block be used for that? The 'io-error' key will only be present
> for the failing device(s).

Yes, I think we're in violent agreement.

Regards,

Anthony Liguori

>
>>
>> But I was thinking of something that had the semantics of, last_iop_failed.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> Seems good enough to me, comments?
>>>
>>
>
>
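
For illustration only, here is a minimal sketch of how a management client might interpret the two responses under the semantics agreed above: 'io-error' in query-status counts only when it is true and 'running' is false, its absence says nothing, and in query-block the 'io-error' key appears only on failing devices. This assumes the proposal lands as described. The helper names, the sample dictionaries, the device names and the value stored under the per-device 'io-error' key are all hypothetical, since the thread does not fix them.

```python
def stopped_on_io_error(status):
    """Return True only if the guest is stopped and 'io-error' is explicitly true.

    'status' is the (assumed) "return" payload of query-status. A missing
    'io-error' key carries no information, so it never yields True here.
    """
    running = status.get("running", True)
    return (not running) and status.get("io-error") is True


def failing_devices(block_info):
    """Return the names of block devices whose entry carries an 'io-error' key.

    'block_info' is the (assumed) "return" list of query-block; per the thread,
    the key is present only for the failing device(s).
    """
    return [dev["device"] for dev in block_info if "io-error" in dev]


# Hypothetical sample payloads, shaped after the discussion above.
status = {"running": False, "io-error": True}
blocks = [
    {"device": "ide0-hd0", "io-error": True},  # value type is an assumption
    {"device": "ide1-cd0"},
]

if stopped_on_io_error(status):
    print("guest stopped on I/O error; failing devices:", failing_devices(blocks))
```

Because query-block is checked independently of query-status, a client that missed the BLOCK_IO_ERROR event could still identify the failing device, even when the error did not stop the VM.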