From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:39009)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1QSE7b-0008Mm-Ko for qemu-devel@nongnu.org;
	Thu, 02 Jun 2011 16:04:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1QSE7Z-0004m2-CP for qemu-devel@nongnu.org;
	Thu, 02 Jun 2011 16:03:59 -0400
Received: from mail-pz0-f45.google.com ([209.85.210.45]:38878)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1QSE7Y-0004ll-Tp for qemu-devel@nongnu.org;
	Thu, 02 Jun 2011 16:03:57 -0400
Received: by pzk30 with SMTP id 30so601365pzk.4 for ;
	Thu, 02 Jun 2011 13:03:55 -0700 (PDT)
Message-ID: <4DE7ECA8.1050202@codemonkey.ws>
Date: Thu, 02 Jun 2011 15:03:52 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <20110601181255.077fb5fd@doriath>
	<4DE6B087.6010708@codemonkey.ws>
	<20110602145730.4c80d668@doriath>
	<4DE7CFA4.9040300@codemonkey.ws>
	<20110602150900.7d2657fb@doriath>
	<4DE7D790.70807@codemonkey.ws>
	<20110602161318.0d9a2194@doriath>
In-Reply-To: <20110602161318.0d9a2194@doriath>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Luiz Capitulino
Cc: Kevin Wolf, "libvir-list@redhat.com", Stefan Hajnoczi,
	qemu-devel@nongnu.org, Markus Armbruster, jdenemar@redhat.com

On 06/02/2011 02:13 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 13:33:52 -0500
> Anthony Liguori wrote:
>
>> On 06/02/2011 01:09 PM, Luiz Capitulino wrote:
>>> On Thu, 02 Jun 2011 13:00:04 -0500
>>> Anthony Liguori wrote:
>>>
>>>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>>>> Anthony Liguori wrote:
>>>>>
>>>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> There are people who want to use QMP for thin provisioning.
>>>>>>> That's, the VM is
>>>>>>> started with a small storage and when a no space error is triggered, more space
>>>>>>> is allocated and the VM is put to run again.
>>>>>>>
>>>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>>>
>>>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>>>
>>>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>>>    to query why a VM stopped. This is needed because clients may miss the
>>>>>>>    BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>>>
>>>>>>> A proposal to solve both problems follow.
>>>>>>>
>>>>>>> A. BLOCK_IO_ERROR information
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>>>> for example (see the "reason" key):
>>>>>>>
>>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>>   "data": { "device": "ide0-hd1",
>>>>>>>             "operation": "write",
>>>>>>>             "action": "stop",
>>>>>>>             "reason": "enospc", }
>>>>>>
>>>>>> you can call the reason whatever you want, but don't call it stringfied
>>>>>> errno name :-)
>>>>>>
>>>>>> In fact, just make reason "no space".
>>>>>
>>>>> You mean, we should do:
>>>>>
>>>>>   "reason": "no space"
>>>>>
>>>>> Or that we should make it a boolean, like:
>>>>>
>>>>>   "no space": true
>>>>
>>>>
>>>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
>>>
>>> True, no.
>>>
>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>> we should make "reason" a dictionary so that we can group related
>>>>> information when we extend the field, for example:
>>>>>
>>>>>   "reason": { "no space": false, "no permission": true }
>>>>
>>>> Why would we ever have "no permission"?
>>
>> Why did it happen? It's not clear to me when read/write would return
>> EPERM. open() should fail.
>> In fact, EPERM is not mentioned in man 2 read.
>
> Actually, the error was an EACCESS which might sound more bizarre :)
>
> What happened was that the device file in question had its permission
> changed during VM execution due to a bug somewhere else. I'm not sure if
> the error was returned in a read() or write() (Kevin might have more details).

Strange, EACCES should only happen on open(). Is it possible that
somehow a reopen was happening?

> This is a bit extreme and I'd agree it's arguable whether or not we should
> report EACCESS, but I had this in mind and ended up mentioning it...

If we can't explain why an error would occur, we shouldn't make it part
of the protocol :-)

> Maybe libvirt guys could provide more input wrt the error reason usage.
> If we don't have valid use cases for other errors, then I'll agree that
> providing only "no space" is enough.

Definitely! Adding libvirt to the CC to help encourage their input.

Regards,

Anthony Liguori