From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: blktap race against xenstore startup
Date: Thu, 28 Sep 2006 17:45:33 -0500
Message-ID: <451C508D.1070103@us.ibm.com>
References: <1159481874.8884.30.camel@sisko.scot.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <1159481874.8884.30.camel@sisko.scot.redhat.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: Andrew Warfield, Julian Chesterfield, Steven Rostedt
List-Id: xen-devel@lists.xenproject.org

Stephen C. Tweedie wrote:
> Hi all,
>
> With the various blktap fixes I've recently posted, blktap runs
> reliably... the *second* time we start xend.  First time, blktapctrl
> just dies on init.
>
> It turns out that get_dom_domid() is SEGVing.  It calls
>
>     e = xs_directory(h, xth, "/local/domain", &num);
>
> and then iterates over the results to find the domain with the right
> name (in this case, "Domain-0", which should be easy to find!)  Trouble
> is, it's racing with xenstore startup, and when it calls this the first
> time, it gets back an ENOENT (easily seen on an strace.)  That returns
> e=NULL, and everything falls apart.
>
> I have "fixed" it locally with the following terrible hack:
>
> +    for (i = 0; i < 10; i++) {
> +        e = xs_directory(h, xth, "/local/domain", &num);
> +        if (e)
> +            break;
> +        sleep(1);
> +    }
>
> -    e = xs_directory(h, xth, "/local/domain", &num);
> -
> -    for (i = 0; (i < num) && (domid == NULL); i++) {
> +    for (i = 0; e && (i < num) && (domid == NULL); i++) {
>
> which just loops calling xs_directory() with a 1-second pause in between
> until it returns something sensible.
>
> Ugh.  There has got to be a better way to synchronise with the initial
> population of the dom0 information into xenstore, surely?  Has no other
> component of the Xen stack ever seen this before?

I don't know how blktap is launched right now, but the same problem has
occurred in the past for other daemons (like xenconsoled).

xenstored won't close standard output until it's ready to receive
connections.  xend start will wait to start the other daemons until
xenstored is ready.

How does blktap get spawned?

Regards,

Anthony Liguori

> --Stephen
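
For illustration, here is a minimal sketch in C of the "daemon closes stdout
once it is ready" handshake described above. It is not the actual xend or
xenstored code: the names wait_for_eof() and spawn_and_wait() are
hypothetical, the child process is only a stand-in for a daemon, and the
timeout handling is an assumption made for the example.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until every writer has closed the pipe whose
 * read end is fd.  Returns 0 on EOF (the readiness signal), -1 on error
 * or if nothing happens for timeout_ms.  The timeout applies to each
 * individual wait, not to the total elapsed time.
 */
int wait_for_eof(int fd, int timeout_ms)
{
    char buf[256];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0)
            return -1;              /* timed out: daemon never became ready */

        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == 0)
            return 0;               /* EOF: the daemon closed its stdout */
        if (n < 0 && errno != EINTR)
            return -1;
        /* n > 0: startup chatter, discard it and keep waiting */
    }
}

/*
 * Hypothetical demo: fork a stand-in daemon that needs init_delay_s
 * seconds to initialise, closes stdout when ready, and then keeps
 * running.  Returns 0 if the parent saw the readiness signal in time.
 */
int spawn_and_wait(unsigned init_delay_s, int timeout_ms)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the stand-in daemon */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        sleep(init_delay_s);        /* stand-in for real initialisation */
        close(STDOUT_FILENO);       /* signal: ready for connections */
        for (;;)
            pause();                /* stand-in for serving requests */
    }

    /* Parent: drop our copy of the write end, or EOF would never arrive. */
    close(fds[1]);
    int rc = wait_for_eof(fds[0], timeout_ms);
    close(fds[0]);

    kill(pid, SIGTERM);             /* demo only: stop the stand-in daemon */
    waitpid(pid, NULL, 0);
    return rc;
}
```

A launcher built this way would only start the dependent daemons once
spawn_and_wait() (or its real equivalent) had reported readiness, rather
than having each client poll xs_directory() in a sleep loop.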