From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vincent Hanquez
Subject: Re: libxl: error handling before xenstored runs
Date: Thu, 10 Feb 2011 09:26:15 +0000
Message-ID: <4D53AF37.9010204@eu.citrix.com>
References: <201102091213.06591.Christoph.Egger@amd.com>
 <4D52B5DD.2060900@gmail.com>
 <201102091652.09218.Christoph.Egger@amd.com>
 <201102091654.29318.Christoph.Egger@amd.com>
 <1297273192.29419.7.camel@qabil.uk.xensource.com>
 <1297328120.1047.21.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <1297328120.1047.21.camel@zakaz.uk.xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: Kamala Narasimhan, Christoph Egger,
 "xen-devel@lists.xensource.com", Gianni Tedesco
List-Id: xen-devel@lists.xenproject.org

On 10/02/11 08:55, Ian Campbell wrote:
> That's the underlying bug which the heuristic is trying to avoid...
>
> Fundamentally the xs ring protocol is missing any way to tell if someone
> is listening on the other end so you have no choice but to try
> communicating and see if anyone responds.
>
> It's a pretty straightforward bug that the kernel does the waiting to
> see if anyone responds bit with an uninterruptible sleep. I took a quick
> look a little while ago but unfortunately it didn't look straightforward
> to fix on the kernel side :-( I can't remember why though.

For starters, the protocol requires the messages to sit on the ring for
an undetermined amount of time (boot watches).

> It might be simpler to support allowing the userspace client to
> explicitly specify a timeout. I'm not sure what the impact on the ring
> is of leaving unconsumed requests on the ring when the other end does
> show up. Presumably the kernel driver just needs to be prepared to
> swallow responses whose target has given up and gone home.

No, the simplest thing to do is to use the socket connection
exclusively, just as we're doing in XCP and XCI.

The protocol is not designed to do async either, so leaving unconsumed
requests on the ring could be pretty disastrous if the other end shows
up. Provided the kernel doesn't detect it (I don't think it does [1]),
it would imply spurious replies: for example, the reply to the previous
waiting read on "/abc/def" could be handed back as the answer to the
next read on "/xyz/123".

> Maybe we should add an explicit ping/pong ring message to the xs ring
> protocol?

And who's going to reply to this if xenstored is missing? You would
require the kernel to introspect the messages and reply by itself.

[1] The kernel would be happy to read the previous reply on the ring
after xenstored has put the actual reply after it and triggered the
eventchn. (The kernel could actually check the request ID and see
whether they match, but it doesn't.)

-- 
Vincent
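
To illustrate the request ID check mentioned in footnote [1], here is a
minimal sketch in C. It is not the kernel's code. The header layout is a
local copy modelled on struct xsd_sockmsg from Xen's xs_wire.h, and the
names xs_msg_hdr, pending_request and classify_reply are hypothetical,
made up for this example. It also ignores watch events, which arrive
unsolicited and would need separate handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the xenstore wire header layout (modelled on
 * struct xsd_sockmsg), redefined so the sketch is self-contained. */
struct xs_msg_hdr {
    uint32_t type;    /* message type */
    uint32_t req_id;  /* request ID, echoed back in the reply */
    uint32_t tx_id;   /* transaction ID */
    uint32_t len;     /* length of the payload that follows */
};

/* Hypothetical bookkeeping for the one request currently in flight. */
struct pending_request {
    uint32_t req_id;  /* ID written into the request header */
    bool abandoned;   /* caller timed out and went away */
};

enum reply_action { REPLY_DELIVER, REPLY_DROP };

/* Decide what to do with a reply taken off the ring.
 * A reply is delivered only if a request is in flight, its ID matches
 * the reply's ID, and the caller is still waiting for it. Anything else
 * is treated as stale and swallowed. */
enum reply_action classify_reply(const struct pending_request *cur,
                                 const struct xs_msg_hdr *reply)
{
    if (cur == NULL)
        return REPLY_DROP;          /* nobody is waiting at all */
    if (reply->req_id != cur->req_id)
        return REPLY_DROP;          /* answer to an earlier request */
    if (cur->abandoned)
        return REPLY_DROP;          /* target has given up */
    return REPLY_DELIVER;
}
```

With this check, a late reply to the read on "/abc/def" (say request ID
1) would be dropped while the read on "/xyz/123" (request ID 2) is in
flight, rather than being returned as its result. It does nothing about
the request that is still sitting on the ring, so it only addresses the
mismatched reply, not the missing xenstored.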