* blktap race against xenstore startup
From: Stephen C. Tweedie @ 2006-09-28 22:17 UTC
To: xen-devel@lists.xensource.com
Cc: Julian Chesterfield, Steven Rostedt, Andrew Warfield

Hi all,

With the various blktap fixes I've recently posted, blktap runs
reliably... the *second* time we start xend. The first time, blktapctrl
just dies on init.

It turns out that get_dom_domid() is SEGVing. It calls

    e = xs_directory(h, xth, "/local/domain", &num);

and then iterates over the results to find the domain with the right
name (in this case "Domain-0", which should be easy to find!). Trouble
is, it's racing with xenstore startup: the first time it makes this
call, it gets back ENOENT (easily seen in an strace), so e is NULL and
everything falls apart.

I have "fixed" it locally with the following terrible hack:

+    for (i = 0; i < 10; i++) {
+        e = xs_directory(h, xth, "/local/domain", &num);
+        if (e)
+            break;
+        sleep(1);
+    }

-    e = xs_directory(h, xth, "/local/domain", &num);
-
-    for (i = 0; (i < num) && (domid == NULL); i++) {
+    for (i = 0; e && (i < num) && (domid == NULL); i++) {

which just loops calling xs_directory() with a 1-second pause in between,
for up to ten attempts, until it returns something sensible.

Ugh. There has got to be a better way to synchronise with the initial
population of the dom0 information into xenstore, surely? Has no other
component of the Xen stack ever seen this before?

--Stephen

* Re: blktap race against xenstore startup
From: Anthony Liguori @ 2006-09-28 22:45 UTC
To: xen-devel
Cc: Andrew Warfield, Julian Chesterfield, Steven Rostedt

Stephen C. Tweedie wrote:
> Hi all,
>
> With the various blktap fixes I've recently posted, blktap runs
> reliably... the *second* time we start xend. First time, blktapctrl
> just dies on init.
>
> It turns out that get_dom_domid() is SEGVing. It calls
>
> e = xs_directory(h, xth, "/local/domain", &num);
>
> and then iterates over the results to find the domain with the right
> name (in this case, "Domain-0", which should be easy to find!) Trouble
> is, it's racing with xenstore startup, and when it calls this the first
> time, it gets back an ENOENT (easily seen on an strace.) That returns
> e=NULL, and everything falls apart.
>
> I have "fixed" it locally with the following terrible hack:
>
> +    for (i = 0; i < 10; i++) {
> +        e = xs_directory(h, xth, "/local/domain", &num);
> +        if (e)
> +            break;
> +        sleep(1);
> +    }
>
> -    e = xs_directory(h, xth, "/local/domain", &num);
> -
> -    for (i = 0; (i < num) && (domid == NULL); i++) {
> +    for (i = 0; e && (i < num) && (domid == NULL); i++) {
>
> which just loops calling xs_directory() with a 1-second pause in between
> until it returns something sensible.
>
> Ugh. There has got to be a better way to synchronise with the initial
> population of the dom0 information into xenstore, surely? Has no other
> component of the Xen stack ever seen this before?

I don't know how blktap is launched right now, but the same problem has
occurred in the past for other daemons (like xenconsoled).

xenstored won't close its standard output until it's ready to receive
connections, and "xend start" waits for that before starting the other
daemons. How does blktap get spawned?
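The startup handshake described above can be sketched in Python (the language xend is written in). This is a minimal sketch assuming the only readiness signal is the daemon closing its standard output; `start_and_wait_ready` is a hypothetical helper, not xend's actual start_xenstored(). Note that it only proves the daemon accepts connections, not that the store has been populated.

```python
import subprocess


def start_and_wait_ready(argv):
    """Start a daemon and block until it signals readiness.

    Assumed protocol (hypothetical helper, not xend's code): the daemon keeps
    its stdout open while initialising and closes it once it is ready to
    accept connections. read() on the pipe returns only at EOF, which happens
    when the daemon (and anything that inherited the write end) closes it.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    proc.stdout.read()  # blocks until EOF, i.e. the "ready" signal
    proc.stdout.close()
    return proc  # daemon keeps running; dependent daemons may start now
```

If the daemon forks and a child inherits stdout, EOF arrives only when every copy is closed, so a daemonising process has to close it in all of them.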
Regards,
Anthony Liguori
> --Stephen

* Re: Re: blktap race against xenstore startup
From: Stephen C. Tweedie @ 2006-09-28 23:23 UTC
To: Anthony Liguori
Cc: Andrew Warfield, xen-devel@lists.xensource.com, Steven Rostedt,
Julian Chesterfield

Hi,

On Thu, 2006-09-28 at 17:45 -0500, Anthony Liguori wrote:
> > Ugh. There has got to be a better way to synchronise with the initial
> > population of the dom0 information into xenstore, surely? Has no other
> > component of the Xen stack ever seen this before?
>
> I don't know how blktap is launched right now, but the same problem has
> occurred in the past for other daemons (like xenconsoled).
>
> xenstored won't close standard output until it's ready to receive
> connections. xend start will wait to start the other daemons until
> xenstored is ready. How does blktap get spawned?

It (the blktapctrl userland daemon) gets execve'd by xend:

    elif sys.argv[1] == 'start':
        start_xenstored()
        start_consoled()
        start_blktapctrl()
        return daemon.start()

The problem is not that xenstored is dead: it's alive and running, it
just hasn't had the /local/domain tree filled in, so it returns ENOENT.
xenstored *is* ready, but that's not enough.

--Stephen

* Re: Re: blktap race against xenstore startup
From: Anthony Liguori @ 2006-09-29 1:15 UTC
To: Stephen C. Tweedie
Cc: Andrew Warfield, xen-devel@lists.xensource.com, Steven Rostedt,
Julian Chesterfield

Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2006-09-28 at 17:45 -0500, Anthony Liguori wrote:
>
>
>>> Ugh. There has got to be a better way to synchronise with the initial
>>> population of the dom0 information into xenstore, surely? Has no other
>>> component of the Xen stack ever seen this before?
>>>
>> I don't know how blktap is launched right now, but the same problem has
>> occurred in the past for other daemons (like xenconsoled).
>>
>> xenstored won't close standard output until it's ready to receive
>> connections. xend start will wait to start the other daemons until
>> xenstored is ready. How does blktap get spawned?
>>
>
> It (the blktapctrl userland daemon) gets execve'd by xend:
>
>     elif sys.argv[1] == 'start':
>         start_xenstored()
>         start_consoled()
>         start_blktapctrl()
>         return daemon.start()
>
> The problem is not that xenstored is dead: it's alive and running, it
> just hasn't had the /local/domain tree filled in, so it returns ENOENT.
> xenstored *is* ready, but that's not enough.
>
Ah, I see. So it sounds like blktapctrl ought to be setting a watch for
/local/domain.
Regards,
Anthony Liguori
> --Stephen

* Re: Re: blktap race against xenstore startup
From: Keir Fraser @ 2006-09-29 6:54 UTC
To: Stephen C. Tweedie, Anthony Liguori
Cc: Andrew Warfield, xen-devel@lists.xensource.com, Steven Rostedt,
Julian Chesterfield
On 29/9/06 12:23 am, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> It (the blktapctrl userland daemon) gets execve'd by xend:
>
>     elif sys.argv[1] == 'start':
>         start_xenstored()
>         start_consoled()
>         start_blktapctrl()
>         return daemon.start()
>
> The problem is not that xenstored is dead: it's alive and running, it
> just hasn't had the /local/domain tree filled in, so it returns ENOENT.
> xenstored *is* ready, but that's not enough.
Set a watch on /local/domain and wait for the directory to appear? Not a
beautiful approach, but better than spinning a few times? :-)
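The watch-and-wait idea can be sketched in Python against a hypothetical store object. The method names `exists`, `watch`, `read_watch` and `unwatch` are assumptions loosely modelled on libxenstore's watch calls, not its real bindings, and the sketch assumes the store accepts a watch on a path that does not exist yet. The key detail is ordering: register the watch first, then check, so a directory created in between cannot be missed.

```python
def wait_for_path(store, path, token="wait-for-path"):
    """Block until `path` exists in the store.

    `store` is a hypothetical object providing exists(path),
    watch(path, token), read_watch() -> (event_path, token) and
    unwatch(path, token); these names are illustrative assumptions.
    """
    # Register the watch *before* checking, so a node created between
    # the check and the registration still produces an event.
    store.watch(path, token)
    try:
        while not store.exists(path):
            store.read_watch()  # block until something under the watch changes
    finally:
        store.unwatch(path, token)
```

Because every event just triggers a re-check, it does not matter whether the watch also fires once on registration or for unrelated changes.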
-- Keir

* Re: Re: blktap race against xenstore startup
From: Stephen C. Tweedie @ 2006-10-02 9:36 UTC
To: Keir Fraser
Cc: Anthony Liguori, xen-devel@lists.xensource.com, Steven Rostedt,
Andrew Warfield, Julian Chesterfield
Hi,
On Fri, 2006-09-29 at 07:54 +0100, Keir Fraser wrote:
> > The problem is not that xenstored is dead: it's alive and running, it
> > just hasn't had the /local/domain tree filled in, so it returns ENOENT.
> > xenstored *is* ready, but that's not enough.
>
> Set a watch on /local/domain and wait for the directory to appear? Not a
> beautiful approach, but better than spinning a few times? :-)
OK, I didn't realise we could set watches on non-existent paths in the
store, but it seems like that should work.
I was wondering if there was a way to synchronise against xend itself,
though: doing it through the store is a little ugly. But yes, it's
probably better than looping.
Cheers,
Stephen

* Re: Re: blktap race against xenstore startup
From: Keir Fraser @ 2006-10-02 10:14 UTC
To: Stephen C. Tweedie
Cc: Andrew Warfield, Anthony Liguori, xen-devel@lists.xensource.com,
Steven Rostedt, Julian Chesterfield
On 2/10/06 10:36, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>>> The problem is not that xenstored is dead: it's alive and running, it
>>> just hasn't had the /local/domain tree filled in, so it returns ENOENT.
>>> xenstored *is* ready, but that's not enough.
>>
>> Set a watch on /local/domain and wait for the directory to appear? Not a
>> beautiful approach, but better than spinning a few times? :-)
>
> OK, I didn't realise we could set watches on non-existent paths in the
> store, but it seems like that should work.

Actually, now you mention it, I'm not 100% certain that you can; I'd need
to double-check. I think it's something we should allow even if it isn't
allowed today, though. Or you could set the watch on / and filter. Not much
happens until /local/domain is set up, so you won't get many (any?) false
watch firings.
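For the "watch / and filter" fallback, the only new piece is a predicate deciding which events deserve a re-check. A minimal sketch; `is_relevant` is a hypothetical helper, and treating events on ancestors such as /local as relevant is a defensive assumption about which path a node creation is reported against.

```python
def _at_or_under(path, prefix):
    """True if `path` equals `prefix` or lies below it in the tree."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def is_relevant(event_path, target="/local/domain"):
    """Decide whether an event from a watch on "/" is worth acting on.

    Relevant events are on the target or below it, or on one of its
    ancestors (e.g. "/local"). A plain startswith() check would wrongly
    accept a sibling such as "/local/domainX".
    """
    return _at_or_under(event_path, target) or _at_or_under(target, event_path)
```

Irrelevant events are simply skipped and the caller goes back to waiting on the watch.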
> I was wondering if there was a way to synchronise against xend itself,
> though: doing it through the store is a little ugly. But yes, it's
> probably better than looping.
It's the obvious way of doing it imo. Xenstore is an always-available
service, even if we decide to disaggregate domain0 in future (e.g., move
blktap daemon to a different VM). I guess it depends on your p.o.v. :-)
-- Keir