From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Egger
Subject: Re: libxl: error handling before xenstored runs
Date: Thu, 10 Feb 2011 12:32:34 +0100
Message-ID: <201102101232.34933.Christoph.Egger@amd.com>
References: <201102091213.06591.Christoph.Egger@amd.com>
 <4D53AF37.9010204@eu.citrix.com>
 <1297337081.20491.132.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1297337081.20491.132.camel@zakaz.uk.xensource.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: Kamala Narasimhan, "xen-devel@lists.xensource.com", Vincent Hanquez, Gianni Tedesco
List-Id: xen-devel@lists.xenproject.org

On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > On 10/02/11 08:55, Ian Campbell wrote:
> > > That's the underlying bug which the heuristic is trying to avoid...
> > >
> > > Fundamentally the xs ring protocol is missing any way to tell if
> > > someone is listening on the other end so you have no choice but to try
> > > communicating and see if anyone responds.
> > >
> > > It's a pretty straightforward bug that the kernel does the waiting to
> > > see if anyone responds bit with an uninterruptible sleep. I took a
> > > quick look a little while ago but unfortunately it didn't look
> > > straightforward to fix on the kernel side :-( I can't remember why
> > > though.
> >
> > For starters, the protocol requires the messages to sit on the ring for an
> > undetermined amount of time (boot watches).
> >
> > > It might be simpler to support allowing the userspace client to
> > > explicitly specify a timeout. I'm not sure what the impact on the ring
> > > is of leaving unconsumed requests on the ring when the other end does
> > > show up. Presumably the kernel driver just needs to be prepared to
> > > swallow responses whose target has given up and gone home.
> >
> > No, the simplest thing to do is to use the socket connection
> > exclusively. Just how we're doing it in XCP and XCI.
>
> Right but this approach doesn't work with xenstored in a stubdomain.
> Part of the point of using the ring protocol even when this isn't the
> case is to help ensure that it is possible and help avoid regressions
> etc.
>
> > The protocol is not designed to do async either, so leaving unconsumed
> > requests could be pretty disastrous if the other end shows up. Providing
> > the kernel doesn't detect it (I don't think it does [1]), it would imply
> > spurious replies, for example the previous waiting read on "/abc/def"
> > could reply to a next read on "/xyz/123".
>
> The wire protocol includes a req_id which is echoed in the response
> which sh/could facilitate multiplexing this sort of thing. The pvops
> kernel currently always sets it to zero but that's just an
> implementation detail ;-) Currently the kernel does (roughly):
>         take_lock
>         write_request
>         wait_for_reply
>         release_lock
> instead it should/could be:
>         take_lock(timeout)
>         write_request (++req_id)
>         while read_reply.req_id != req_id && not (timeout)
>                 wait some more
>         release_lock

I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.

Christoph

> OK, so maybe this is not in the "might be simpler" bucket any more, but
> it sounds like plausibly the right direction to take.
>
> Properly handling multiple userspace clients asynchronously and demuxing
> the responses etc would be even better but I don't think necessary to
> solve this particular issue.
>
> > > Maybe we should add an explicit ping/pong ring message to the xs ring
> > > protocol?
> >
> > And who's going to reply to this if xenstored is missing? You would
> > require the kernel to introspect the messages and reply by itself.
>
> The reason I suggested new messages was that I would solve that by
> declaring that these new messages have whatever magic semantics I need
> to make this work ;-)
>
> Ian.

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
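
For illustration only, the take_lock(timeout) / write_request(++req_id) loop from the quoted pseudocode could be sketched in userland roughly as below. This is a toy model, not the pvops kernel driver, libxl, or the real xenstore ring: the two queue.Queue objects stand in for the shared ring, and RingClient, Msg, echo_server and demo are hypothetical names invented for this sketch. It only shows how matching on req_id lets a client swallow a late reply to a request it has already given up on, so that the stale "/abc/def" answer is not mistaken for the reply to "/xyz/123".

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class Msg:
    req_id: int   # echoed back by the responder, as in the wire protocol
    payload: str


class RingClient:
    """Hypothetical client; the two queues are a stand-in for the xs ring."""

    def __init__(self, requests, replies):
        self.requests = requests
        self.replies = replies
        self.lock = threading.Lock()
        self.next_req_id = 0

    def request(self, payload, timeout):
        deadline = time.monotonic() + timeout
        # take_lock(timeout)
        if not self.lock.acquire(timeout=timeout):
            raise TimeoutError("could not take the ring lock in time")
        try:
            # write_request(++req_id)
            self.next_req_id += 1
            req_id = self.next_req_id
            self.requests.put(Msg(req_id, payload))
            # while read_reply.req_id != req_id && not (timeout): wait some more
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("nobody answered on the ring")
                try:
                    reply = self.replies.get(timeout=remaining)
                except queue.Empty:
                    raise TimeoutError("nobody answered on the ring") from None
                if reply.req_id == req_id:
                    return reply.payload
                # Otherwise: a reply whose requester gave up earlier; drop it.
        finally:
            # release_lock
            self.lock.release()


def echo_server(requests, replies, count):
    """Toy responder that shows up late and answers `count` requests in order."""
    for _ in range(count):
        req = requests.get()
        replies.put(Msg(req.req_id, "reply to " + req.payload))


def demo():
    requests, replies = queue.Queue(), queue.Queue()
    client = RingClient(requests, replies)
    try:
        # No responder yet: the request stays queued and the client times out.
        client.request("read /abc/def", timeout=0.05)
        first = "answered"
    except TimeoutError:
        first = "timeout"
    # The responder now appears and first answers the abandoned request.
    server = threading.Thread(
        target=echo_server, args=(requests, replies, 2), daemon=True
    )
    server.start()
    second = client.request("read /xyz/123", timeout=1.0)
    server.join()
    return first, second


if __name__ == "__main__":
    print(demo())
```

In demo() the first read times out because nothing is listening; when the responder appears it answers the abandoned request first, and the client discards that reply because its req_id does not match the one it is waiting for. Whether the real ring tolerates requests left behind like this is exactly the open question in the thread, so the sketch says nothing about that.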