From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:51498)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1RYNYB-0003Kx-MR for qemu-devel@nongnu.org;
 Wed, 07 Dec 2011 14:53:08 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1RYNYA-000713-7t for qemu-devel@nongnu.org;
 Wed, 07 Dec 2011 14:53:07 -0500
Received: from mail-qy0-f173.google.com ([209.85.216.173]:35736)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1RYNYA-00070x-18 for qemu-devel@nongnu.org;
 Wed, 07 Dec 2011 14:53:06 -0500
Received: by qcsd15 with SMTP id d15so856517qcs.4 for ;
 Wed, 07 Dec 2011 11:53:05 -0800 (PST)
Message-ID: <4EDFC41C.9030003@codemonkey.ws>
Date: Wed, 07 Dec 2011 13:53:00 -0600
From: Anthony Liguori
MIME-Version: 1.0
References: <20111205222208.31271.65662.stgit@ginnungagap.bsc.es>
 <20111205222312.31271.66303.stgit@ginnungagap.bsc.es>
 <4EDE7343.3010305@codemonkey.ws> <87ty5d1gfw.fsf@ginnungagap.bsc.es>
 <4EDE98BA.9070902@codemonkey.ws> <4EDF6EFE.3040303@codemonkey.ws>
 <4EDFC20B.8010604@linux.vnet.ibm.com>
In-Reply-To: <4EDFC20B.8010604@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] Insane virtio-serial semantics
To: Michael Roth
Cc: Zhi Yong Wu, Markus Armbruster, qemu-devel@nongnu.org, Blue Swirl,
 amit.shah@redhat.com, Cam Macdonell, Lluís Vilanova

On 12/07/2011 01:44 PM, Michael Roth wrote:
> On 12/07/2011 07:49 AM, Anthony Liguori wrote:
>> On 12/07/2011 02:21 AM, Markus Armbruster wrote:
>>> Anthony Liguori writes:
>>>
>>>> On 12/06/2011 04:30 PM, Lluís Vilanova wrote:
>>>>> Anthony Liguori writes:
>>>>>
>>>>>> I really worry about us introducing so many of these one-off
>>>>>> paravirtual devices.
>>>>>> I would much prefer that you look at doing this as an extension to
>>>>>> the ivshmem device as it already has this sort of scope. You should
>>>>>> be able to do this by just extending the size of BAR 1 and using a
>>>>>> well-known guest id.
>>>>>
>>>>> I did in fact look at ivshmem some time ago, and it's true that both
>>>>> use the same mechanisms; but each device has a completely different
>>>>> purpose. To me it just seems that extending the control BAR in
>>>>> ivshmem to call the user-provided backdoor callbacks is just
>>>>> conflating two completely separate devices into a single one.
>>>>> Besides, I think that the qemu-side of the backdoor is simple enough
>>>>> to avoid being a maintenance burden.
>>>>
>>>> They have the same purpose (which are both vague TBH). The only
>>>> reason I'm sympathetic to this device is that virtio-serial has such
>>>> insane semantics.
>>>
>>> Could you summarize what's wrong? Is it fixable?
>>
>> I don't think so as it's part of the userspace ABI now.
>>
>> Mike, please help me make sure I get this all right. A normal
>> file/socket has the following guest semantics:
>>
>> 1) When a disconnect occurs, you will receive a return of '0' or -EPIPE
>> depending on the platform. The fd is now unusable and you must
>> close/reopen.
>>
>> 2) You can set up SIGIO/SIGPIPE to fire off whenever a file descriptor
>> becomes readable/writable.
>>
>> virtio serial has the following semantics:
>>
>> 1) When a disconnect occurs, if you read() you will receive an -EPIPE.
>>
>> 2) However, if a reconnect occurs before you issue your read(), the read
>> will complete with no indication that a disconnect occurred.
>>
>> 3) This makes it impossible to determine whether a disconnect has
>> occurred which makes it very hard to reset your protocol stream. To deal
>> with this, virtio-serial can issue a SIGIO signal upon disconnect.
>>
>> 4) Signals are asynchronous, so a reconnect may have occurred by the
>> time you get the SIGIO signal. It's unclear that you can do anything
>> useful with this.
>
> That about sums it up. There was a thread about this a while back where there
> was some tentative agreement on a way to fix this by introducing QEMU flags that
> invoke similar semantics to unix sockets:
>
> http://thread.gmane.org/gmane.comp.emulators.qemu/94721/focus=95496
>
> But at this point we'd need to revisit it, since there's a fair number of
> virtio-serial users now. It'd probably need to be something you could switch on
> from the guest via an fcntl() or something.
>
>>
>> So besides overloading the meaning of SIGIO, there's really no way to
>> figure out in the guest when a reconnect has occurred. To deal with this
>> in qemu-ga, we actually only allow 7-bit data transfers and use the 8th
>> bit as an in-band message to tell the guest that a reset has occurred.
>
> Yup, it's not perfect though, due to a delayed/spurious response from an agent
> that sent data before it read/handled the reset sequence. We don't get that
> problem with unix sockets since they'd get an -EPIPE and would be blocked from
> sending to a newly opened session.
>
> We try to account for this on the host by following up a reset sequence with
> the guest-sync RPC, which contains a unique ID that the guest echoes back to us.
> That way we can throw away stale data on the host until we get the intended
> response. In our case, it's not quite perfect since if the agent sent a "{"
> before getting reset, subsequent transmission of the guest-sync response can be
> lost. We'd need to precede responses to guest-sync with a 0xFF as well, so that
> the host flushes its rcv buffer/parser state...
>
> And, somewhat off-topic, but none of this addresses the case where an agent
> hangs on an RPC. This would require some additional handling by the agent side
> where we might have to tie some additional action to the 0xFF sequence.
>
> Previously this scenario was handled by a hard-coded timeout mechanism in the
> agent, with a separate thread handling the RPCs, but we've since dropped the
> thread due to potential for memory leaks (with plans to re-introduce it using a
> child process).
>
> Client-induced resets would be much nicer though, and a reserved byte is the
> best solution we've been able to come up with given the current virtio-serial
> semantics.

Yeah, we really need a "sane reset semantics" flag for virtio-serial that
provides a guest- and host-initiated channel close mechanism. I think you need
to do this by using a single ring and using a simple session id with an
explicit open/close message. That way there is never ambiguity.

And yes, I can't help but think of Dave Miller's comments long ago that any PV
transport is eventually going to reinvent TCP, poorly.

Regards,

Anthony Liguori

>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>
>>> [...]
>>>
>>
>
>
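
For readers following the qemu-ga workaround quoted above, the host-side
resynchronization could be sketched roughly as follows. This is only an
illustration, not qemu-ga code. The function name resync(), the
newline-delimited framing, and the {"return": <id>} reply shape are assumptions
made for the example. It also assumes the guest prefixes its guest-sync reply
with 0xFF, which the thread proposes as a fix rather than describes as
implemented.

```python
import json

# Reserved byte: payload is limited to 7-bit data, so 0xFF never appears in it.
RESET = b"\xff"


def resync(received: bytes, expected_id: int):
    """Return the bytes that follow the matching sync reply, or None.

    Hypothetical helper: everything up to and including the sync reply is
    treated as stale and dropped. Each RESET byte restarts parsing, so a
    half-sent message from before the reset cannot swallow the reply.
    """
    pos = 0
    while True:
        start = received.find(RESET, pos)
        if start < 0:
            return None          # no (further) reset marker seen yet
        pos = start + 1
        end = received.find(b"\n", pos)
        if end < 0:
            return None          # reply line not complete yet
        try:
            msg = json.loads(received[pos:end])
        except ValueError:
            continue             # garbage after this marker; try the next one
        if isinstance(msg, dict) and msg.get("return") == expected_id:
            return received[end + 1:]


# A partial reply sent before the reset, then the marker and the sync reply.
stale = b'{"return": {"half-sent'
stream = stale + RESET + b'{"return": 1234}\n' + b'{"return": {}}\n'
fresh = resync(stream, 1234)     # only the bytes after the sync reply remain
```

Without the leading 0xFF, the stale "{" and the sync reply would run together
into one unparseable message, which is the lost-response case described in the
quoted text. An explicit session id with open/close messages, as suggested in
the reply, would avoid the need for this kind of in-band scanning.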