* XenStore Watch Behavior
From: John McCullough @ 2006-08-26 20:32 UTC
To: xen-devel
Hello,
I have noticed some issues with watches on XenStore, mainly that
multiple watches on the same node in a hierarchy, or a watch on a node
and a child of that node, do not fire as one might expect.
I have been working on hvm domain forking and I am using the
XenStore to communicate between xend and the qemu-dm. The first issue
that I noticed was that you cannot use a single node to communicate
state. Only one of the watches on the node would fire and no
communication could occur.
Using two nodes for bidirectional communication worked fine in
normal operation. However, I discovered that during shutdown some other
watch existed on the domain's path in the store, and it blocked the
watches on the xend side. Initially I was using a combination of
xswatch and a semaphore to perform blocking reads, and the xswatch
function was never getting triggered. I switched to using the interface
more directly, via xs.watch and xs.read_watch. I could block and read
data, but after my own function terminated, the xswatch interface would
try to execute my token as an xswatch token. Adding a no-op .fn and
empty .args and .kwargs to my token let this pass through.
Unfortunately, during normal operation before guest destruction, the
changes that I wanted xs.read_watch to catch were being consumed by an
unrelated xs.watch.
What is the intended behavior of watches on the XenStore? Should
only one watch be allowed on a given sub-hierarchy? Should the most
specific watch be triggered alone? Should all watches be triggered?
Regards,
John McCullough
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: XenStore Watch Behavior
From: Keir Fraser @ 2006-08-27 14:57 UTC
To: John McCullough, xen-devel
On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
> What is the intended behavior of watches on the XenStore? Should
> only one watch be allowed on a given sub-hierarchy? Should the most
> specific watch be triggered alone? Should all watches be triggered?
I believe it's all supposed to work in a very obvious and simple way: All
watches registered on a prefix of the updated node's path should be fired. A
single transaction can fire the same watch multiple times if that watch is
on a common prefix of a number of nodes updated by that transaction (since
each firing event specifies the full path of the modified node, so events
can't really be merged).
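A toy Python model of the firing rule described above may make it concrete. It is an illustration, not xenstored's implementation: the class and token names are hypothetical, and treating "prefix" as a whole path-component prefix (so /a/b covers /a/b/c but not /a/bc) is an assumption.

```python
class WatchTable:
    """Toy model of XenStore watch firing (illustration, not xenstored's code)."""

    def __init__(self):
        self.watches = []  # (watched_path, token) pairs, in registration order

    def watch(self, path, token):
        self.watches.append((path, token))

    @staticmethod
    def covers(watched, node):
        # Assumed whole-component prefix match: /a/b covers /a/b and /a/b/c,
        # but not /a/bc.
        return node == watched or node.startswith(watched.rstrip("/") + "/")

    def fire(self, modified_nodes):
        """Return one (node, token) event per (modified node, covering watch)."""
        return [(node, token)
                for node in modified_nodes
                for watched, token in self.watches
                if self.covers(watched, node)]


table = WatchTable()
table.watch("/local/domain/6", "dom-watch")
table.watch("/local/domain/6/device", "dev-watch")
table.watch("/tool", "tool-watch")

# One transaction that modified two nodes:
events = table.fire(["/local/domain/6/device/vif/0", "/local/domain/6/name"])
```

Here "dom-watch" fires twice for a single transaction, once per modified node under its path, and "tool-watch" does not fire at all. Because each event carries the full path of its node, the two "dom-watch" events cannot simply be merged.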
If you observe different behaviour from this then it is most likely a bug
and we would love to receive patches!
-- Keir
* Re: XenStore Watch Behavior
From: John McCullough @ 2006-08-29 0:48 UTC
To: Keir Fraser; +Cc: xen-devel, John McCullough
On Sun, Aug 27, 2006 at 03:57:06PM +0100, Keir Fraser wrote:
> On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
>
> > What is the intended behavior of watches on the XenStore? Should
> > only one watch be allowed on a given sub-hierarchy? Should the most
> > specific watch be triggered alone? Should all watches be triggered?
>
> I believe it's all supposed to work in a very obvious and simple way: All
> watches registered on a prefix of the updated node's path should be fired. A
> single transaction can fire the same watch multiple times if that watch is
> on a common prefix of a number of nodes updated by that transaction (since
> each firing event specifies the full path of the modified node, so events
> can't really be merged).
>
> If you observe different behaviour from this then it is most likely a bug
> and we would love to receive patches!
>
I am attaching a band-aid style patch for xswatch. I haven't dug very
far into the xenstore code yet, and I'm not sure how much time I have to
dedicate to this quite yet.
What this patch addresses is xswatch's tendency to receive watches for
non-xswatch created watches with those tokens. Is the intended behavior
of read_watch to pick up on all available watches and leave you to
discriminate which to service based on token?
Something that has recently perplexed me: when using the watch during
the save/restore process, my handler doesn't receive watch events where
the value written in the store contains an underscore. In the shutdown
situation, the underscore value is passed. I am at a loss to guess why
this is happening.
-John
[-- Attachment #2: xswatch-fix.diff --]
Only respond to watches that originated from xswatch.
diff -r ec03b24a2d83 tools/python/xen/xend/xenstore/xswatch.py
--- a/tools/python/xen/xend/xenstore/xswatch.py Tue Aug 15 19:53:55 2006 +0100
+++ b/tools/python/xen/xend/xenstore/xswatch.py Mon Aug 28 11:04:43 2006 -0700
@@ -63,9 +63,10 @@ def watchMain():
         try:
             we = xs.read_watch()
             watch = we[1]
-            res = watch.fn(we[0], *watch.args, **watch.kwargs)
-            if not res:
-                watch.unwatch()
+            if watch.__class__ == xswatch:
+                res = watch.fn(we[0], *watch.args, **watch.kwargs)
+                if not res:
+                    watch.unwatch()
         except:
             log.exception("read_watch failed")
             # Ignore this exception -- there's no point throwing it
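The effect of this patch can be sketched outside xend. In this hypothetical model a queue stands in for one handle's read_watch() stream and Watch stands in for an xswatch token; the sketch uses isinstance() where the patch compares __class__, which would also accept subclasses.

```python
import queue


class Watch:
    """Hypothetical xswatch-like token: a callback plus its arguments."""

    def __init__(self, fn, *args, **kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs


def drain(events):
    """Service events whose token is a Watch; return the ones that are not.

    `events` stands in for a handle's read_watch() stream of (path, token).
    """
    ignored = []
    while True:
        try:
            path, token = events.get_nowait()
        except queue.Empty:
            return ignored
        if isinstance(token, Watch):
            token.fn(path, *token.args, **token.kwargs)
        else:
            # The type check stops the crash, but this event has still been
            # consumed: whoever registered this token will never see it.
            ignored.append((path, token))


seen = []
mine = Watch(lambda path, tag: seen.append((tag, path)), "xswatch")
events = queue.Queue()
events.put(("/local/domain/6/backend", mine))
events.put(("/local/domain/6/chan", "token-from-another-reader"))
leftover = drain(events)
```

The leftover list is the catch: filtering stops xswatch from calling into a foreign token, but the foreign event has already been read off the shared handle and its real owner never sees it.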
* Re: XenStore Watch Behavior
From: John McCullough @ 2006-08-29 0:52 UTC
To: Keir Fraser, John McCullough, xen-devel
On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
> Something that has recently perplexed me: when using the watch during
> the save/restore process, my handler doesn't receive watch events where
> the value written in the store contains an underscore. In the shutdown
> situation, the underscore value is passed. I am at a loss to guess why
> this is happening.
>
This may be imagined.
-John
* Re: XenStore Watch Behavior
From: John McCullough @ 2006-08-29 2:22 UTC
To: xen-devel
On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
> On Sun, Aug 27, 2006 at 03:57:06PM +0100, Keir Fraser wrote:
> > On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
> >
> > > What is the intended behavior of watches on the XenStore? Should
> > > only one watch be allowed on a given sub-hierarchy? Should the most
> > > specific watch be triggered alone? Should all watches be triggered?
> >
> > I believe it's all supposed to work in a very obvious and simple way: All
> > watches registered on a prefix of the updated node's path should be fired. A
> > single transaction can fire the same watch multiple times if that watch is
> > on a common prefix of a number of nodes updated by that transaction (since
> > each firing event specifies the full path of the modified node, so events
> > can't really be merged).
> >
> > If you observe different behaviour from this then it is most likely a bug
> > and we would love to receive patches!
> >
>
> I am attaching a band-aid style patch for xswatch. I haven't dug very
> far into the xenstore code yet, and I'm not sure how much time I have to
> dedicate to this quite yet.
>
> What this patch addresses is xswatch's tendency to receive watches for
> non-xswatch created watches with those tokens. Is the intended behavior
> of read_watch to pick up on all available watches and leave you to
> discriminate which to service based on token?
>
Recently I discovered that my watch and the xswatch were receiving
alternating watches (both in python). Looking at xs_read_watch in
tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch
calls to take turns. Given that the python interface shares a single
xshandle, this prevents multiple watches.
Creating an entirely new xshandle for each use of read_watch works.
Moving to a model where the xsutil.xshandle() call creates a new
xshandle seems easily supportable, given that xswatch is primarily used,
and it keeps a reference to its own handle.
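A minimal model of the consequence described above, assuming each handle has exactly one queue of fired events that any reader on that handle may drain. It does not reproduce the mutex in tools/xenstore/xs.c; Connection and write() are hypothetical names.

```python
from collections import deque


class Connection:
    """Toy model of one xenstore connection, i.e. one xshandle (hypothetical)."""

    def __init__(self):
        self.watches = []       # (path, token) registered over this connection
        self.pending = deque()  # fired events not yet read

    def watch(self, path, token):
        self.watches.append((path, token))

    def read_watch(self):
        # Events are handed out first come, first served; nothing routes an
        # event to the caller that registered its token.
        return self.pending.popleft()


def write(connections, node):
    """Queue a (node, token) event on every connection with a covering watch."""
    for conn in connections:
        for path, token in conn.watches:
            if node == path or node.startswith(path.rstrip("/") + "/"):
                conn.pending.append((node, token))


# One shared handle used by both xswatch and a second reader.
shared = Connection()
shared.watch("/local/domain/6/backend", "xswatch")
shared.watch("/local/domain/6/chan", "mine")
write([shared], "/local/domain/6/chan/out")
write([shared], "/local/domain/6/backend/vbd")
first = shared.read_watch()   # the xswatch thread happens to read first...
second = shared.read_watch()  # ...so each reader gets the other's event

# Separate handles: each reader only ever sees its own events.
xs_conn, my_conn = Connection(), Connection()
xs_conn.watch("/local/domain/6/backend", "xswatch")
my_conn.watch("/local/domain/6/chan", "mine")
write([xs_conn, my_conn], "/local/domain/6/chan/out")
write([xs_conn, my_conn], "/local/domain/6/backend/vbd")
```

With the shared handle, which reader receives an event depends only on who calls read_watch() first; with one handle per reader, my_conn.read_watch() returns only the "mine" event and xs_conn.read_watch() only the "xswatch" one.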
Does anyone know of other xshandle() uses that warrant the current
behavior?
Regards,
John
* Re: XenStore Watch Behavior
From: Keir Fraser @ 2006-08-29 6:27 UTC
To: John McCullough, xen-devel
On 29/8/06 3:22 am, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
> Recently I discovered that my watch and the xswatch were receiving
> alternating watches (both in python). Looking at xs_read_watch in
> tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch
> calls to take turns. Given that the python interface shares a single
> xshandle, this prevents multiple watches.
>
> Creating an entirely new xshandle for each use of read_watch works.
> Moving to a model where the xsutil.xshandle() call creates a new
> xshandle seems easily supportable, given that xswatch is primarily used,
> and it keeps a reference to its own handle.
>
> Does anyone know of other xshandle() uses that warrant the current
> behavior?
The current behaviour is broken (or, at least, the semantics really make no
sense at all) if multiple people create 'xs' objects in the same python
program. A good fix would be to move the handle allocation from
xshandle_init to xshandle_new. The latter function will have to create a new
container object to hold the handle value, rather than returning self.
Watches will then be registered and read in the isolated context of a
particular caller's object handle, rather than a bogus shared global context
of all users of the xs library.
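The proposal concerns the C extension's xshandle_new/xshandle_init pair. A loose Python analogue (these classes are hypothetical and are not the extension's code) contrasts a constructor that always hands back one shared object with one that allocates fresh state for every new object:

```python
class SharedHandle:
    """Analogue of the broken behaviour: every construction returns the
    same object, so all callers share one watch context."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.watches = []  # allocated once, shared by everyone
        return cls._instance


class PerCallerHandle:
    """Analogue of the proposed fix: allocation happens in __new__ for each
    new object, so each caller gets an isolated watch context."""

    def __new__(cls):
        self = super().__new__(cls)
        self.watches = []  # fresh state per object
        return self


a, b = SharedHandle(), SharedHandle()
a.watches.append(("/local/domain/6/chan", "mine"))
# b is the same object as a, so b.watches also holds the "mine" watch

c, d = PerCallerHandle(), PerCallerHandle()
c.watches.append(("/local/domain/6/chan", "mine"))
# d.watches is still empty
```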
This fix should then get things working for your code if you create yourself
an xs object separate from xswatch's. It only raises the question how you
then implement a central select loop in your python program that waits on
the various file handles or sockets created by the various xs objects.
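One way to approach the select-loop question is a single loop that waits on every handle's descriptor and dispatches whichever is readable. This sketch assumes each xs object could expose a file descriptor; that is an assumption, so socket pairs stand in for the handles, and select_loop is a hypothetical helper.

```python
import select
import socket


def select_loop(handlers, timeout=0.1, max_rounds=10):
    """Wait on several descriptors at once and dispatch each readable one.

    `handlers` maps a socket (a stand-in for one xs object's connection) to a
    callback that consumes pending data from it. The loop stops after a round
    in which nothing became readable within `timeout` seconds.
    """
    for _ in range(max_rounds):
        readable, _, _ = select.select(list(handlers), [], [], timeout)
        if not readable:
            return
        for sock in readable:
            handlers[sock](sock)


received = []
pairs = [socket.socketpair() for _ in range(2)]
handlers = {reader: (lambda s: received.append(s.recv(64)))
            for reader, _ in pairs}
pairs[0][1].sendall(b"xswatch-event")
pairs[1][1].sendall(b"my-event")
select_loop(handlers)
for reader, writer in pairs:
    reader.close()
    writer.close()
```

Both "handles" are serviced by the one loop; the order in which select() reports them is not guaranteed, so any real dispatcher should not depend on it.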
-- Keir
* Re: XenStore Watch Behavior
From: Ewan Mellor @ 2006-08-29 9:15 UTC
To: xen-devel
On Mon, Aug 28, 2006 at 07:22:52PM -0700, John McCullough wrote:
> On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
> > On Sun, Aug 27, 2006 at 03:57:06PM +0100, Keir Fraser wrote:
> > > On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
> > >
> > > > What is the intended behavior of watches on the XenStore? Should
> > > > only one watch be allowed on a given sub-hierarchy? Should the most
> > > > specific watch be triggered alone? Should all watches be triggered?
> > >
> > > I believe it's all supposed to work in a very obvious and simple way: All
> > > watches registered on a prefix of the updated node's path should be fired. A
> > > single transaction can fire the same watch multiple times if that watch is
> > > on a common prefix of a number of nodes updated by that transaction (since
> > > each firing event specifies the full path of the modified node, so events
> > > can't really be merged).
> > >
> > > If you observe different behaviour from this then it is most likely a bug
> > > and we would love to receive patches!
> > >
> >
> > I am attaching a band-aid style patch for xswatch. I haven't dug very
> > far into the xenstore code yet, and I'm not sure how much time I have to
> > dedicate to this quite yet.
> >
> > What this patch addresses is xswatch's tendency to receive watches for
> > non-xswatch created watches with those tokens. Is the indended behavior
> > of read_watch to pick up on all available watches and leave you to
> > discriminate which to service based on token?
> >
>
> Recently I discovered that my watch and the xswatch were receiving
> alternating watches (both in python). Looking at xs_read_watch in
> tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch
> calls to take turns. Given that the python interface shares a single
> xshandle, this prevents multiple watches.
>
> Creating an entirely new xshandle for each use of read_watch works.
> Moving to a model where the xsutil.xshandle() call creates a new
> xshandle seems easily supportable, given that xswatch is primarily used,
> and it keeps a reference to its own handle.
I'm confused as to what you're trying to do, so perhaps you could start again
at the top.
xswatch starts a thread, and that thread handles all calls to xs.read_watch,
and dispatches appropriate callbacks when the watch fires. I expect that you
would simply create a new instance of xswatch, and then everything else would
be handled for you. What's giving you problems?
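The model described here (one thread owns every read and dispatches callbacks) can be sketched with a queue standing in for xs.read_watch(). Watch and watch_main are hypothetical stand-ins, the None sentinel exists only so the example terminates, and the unregister-on-false-return rule mirrors `if not res: watch.unwatch()` in xswatch's watchMain.

```python
import queue
import threading


class Watch:
    """Hypothetical xswatch-like registration: a path, a callback and args."""

    def __init__(self, registry, path, fn, *args, **kwargs):
        self.registry = registry
        self.path, self.fn, self.args, self.kwargs = path, fn, args, kwargs
        registry.add(self)

    def unwatch(self):
        self.registry.discard(self)


def watch_main(events, registry):
    """The one thread that reads every event and runs the matching callback."""
    while True:
        item = events.get()
        if item is None:  # sentinel used here only to stop the thread
            return
        path, watch = item
        if watch in registry:
            # A false return value means "stop watching".
            if not watch.fn(path, *watch.args, **watch.kwargs):
                watch.unwatch()


registry, events, hits = set(), queue.Queue(), []


def on_change(path):
    hits.append(path)
    return len(hits) < 2  # keep watching until the second event


w = Watch(registry, "/local/domain/6/chan", on_change)
t = threading.Thread(target=watch_main, args=(events, registry))
t.start()
for n in range(3):
    events.put(("/local/domain/6/chan/%d" % n, w))
events.put(None)
t.join()
```

The third event is dropped because the callback asked to be unregistered after the second. All callbacks run on the one dispatcher thread, so a callback that blocks stalls every other watch.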
Ewan.
* Re: XenStore Watch Behavior
From: John McCullough @ 2006-08-29 19:12 UTC
To: xen-devel
Ewan Mellor wrote:
> On Mon, Aug 28, 2006 at 07:22:52PM -0700, John McCullough wrote:
>
>> On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
>>> On Sun, Aug 27, 2006 at 03:57:06PM +0100, Keir Fraser wrote:
>>>> On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:
>>>>
>>>>> What is the intended behavior of watches on the XenStore? Should
>>>>> only one watch be allowed on a given sub-hierarchy? Should the most
>>>>> specific watch be triggered alone? Should all watches be triggered?
>>>> I believe it's all supposed to work in a very obvious and simple way: All
>>>> watches registered on a prefix of the updated node's path should be fired. A
>>>> single transaction can fire the same watch multiple times if that watch is
>>>> on a common prefix of a number of nodes updated by that transaction (since
>>>> each firing event specifies the full path of the modified node, so events
>>>> can't really be merged).
>>>>
>>>> If you observe different behaviour from this then it is most likely a bug
>>>> and we would love to receive patches!
>>>>
>>> I am attaching a band-aid style patch for xswatch. I haven't dug very
>>> far into the xenstore code yet, and I'm not sure how much time I have to
>>> dedicate to this quite yet.
>>>
>>> What this patch addresses is xswatch's tendency to receive watches for
>>> non-xswatch created watches with those tokens. Is the indended behavior
>>> of read_watch to pick up on all available watches and leave you to
>>> discriminate which to service based on token?
>>>
>> Recently I discovered that my watch and the xswatch were receiving
>> alternating watches (both in python). Looking at xs_read_watch in
>> tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch
>> calls to take turns. Given that the python interface shares a single
>> xshandle, this prevents multiple watches.
>>
>> Creating an entirely new xshandle for each use of read_watch works.
>> Moving to a model where the xsutil.xshandle() call creates a new
>> xshandle seems easily supportable, given that xswatch is primarily used,
>> and it keeps a reference to its own handle.
>
> I'm confused as to what you're trying to do, so perhaps you could start again
> at the top.
>
> xswatch starts a thread, and that thread handles all calls to xs.read_watch,
> and dispatches appropriate callbacks when the watch fires. I expect that you
> would simply create a new instance of xswatch, and then everything else would
> be handled for you. What's giving you problems?
From the top:
I am working on forking hvm domains. Part of this involves
communicating with the qemu-dm via the xenstore, because it is the most
readily available channel more complicated than the process signals used
for shutdown and save/restore (via Edwin Zhai's patch).
After getting an initial prototype working for the forking, I decided I
would try to create a general purpose communications channel that could
be used to communicate with qemu-dm. The general use case is sending a
command ("shutdown") and waiting for a completion notification
("shutdown_done"). I am currently using a pair of nodes, one for each
communication direction. I had initial difficulty in getting watches to
trigger, but I am not trying to solve that right now.
I initially used xswatch in conjunction with a semaphore so that I could
set a watch and block on the semaphore until the watch had triggered.
This worked in the general case. I decided that I would try to replace
the current domain destruction signal with the "shutdown" command over
the channel. I found that during the destruction sequence, my xswatch
watch was never getting triggered and the semaphore would never get
incremented and waiting for the completion notification would block
indefinitely.
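The semaphore pattern described above can be sketched as follows. register() is a hypothetical hook that runs the callback on another thread when the node changes. The timeout is an addition not described in the message; without it, a watch that never fires blocks the waiter forever, which is what happens in the destruction case.

```python
import threading


def wait_for_watch(register, timeout):
    """Block until a watch callback fires, or give up after `timeout` seconds.

    `register(callback)` is a hypothetical hook that arranges for `callback`
    to run (on some other thread) when the watched node changes.
    Returns True if the callback ran, False on timeout.
    """
    fired = threading.Semaphore(0)

    def callback(path):
        fired.release()
        return False  # an xswatch-style dispatcher would drop the watch now

    register(callback)
    return fired.acquire(timeout=timeout)


def fires_soon(cb):
    # Simulates the reply node being written shortly after registration.
    threading.Timer(0.05, cb, args=("/local/domain/6/chan/out",)).start()


def never_fires(cb):
    # Simulates the destruction case: the watch is never delivered.
    pass


ok = wait_for_watch(fires_soon, timeout=2.0)
stuck = wait_for_watch(never_fires, timeout=0.1)
```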
At this point I started looking at xswatch and I thought, unaware of the
xshandle behavior, that I could just use xs.read_watch and achieve
blocking without the use of a semaphore. So I followed that path and
arrived at the problem with a single xshandle and multiple read_watch
behavior.
Keir Fraser wrote:
> The current behaviour is broken (or, at least, the semantics really make no
> sense at all) if multiple people create 'xs' objects in the same python
> program. A good fix would be to move the handle allocation from
> xshandle_init to xshandle_new. The latter function will have to create a new
> container object to hold the handle value, rather than returning self.
> Watches will then be registered and read in the isolated context of a
> particular caller's object handle, rather than a bogus shared global context
> of all users of the xs library.
>
> This fix should then get things working for your code if you create yourself
> an xs object separate from xswatch's. It only raises the question how you
> then implement a central select loop in your python program that waits on
> the various file handles or sockets created by the various xs objects.
When I began I had to try to extract the semantics from the code. I
wrote the API section in
http://wiki.xensource.com/xenwiki/XenStoreReference which needs to be
fixed and better explained. Once we establish what the correct usage
pattern is I will try to reproduce it on the wiki page.
If I use an independently created xshandle in my blocking communication
channel code, it works in all cases. If I use the xswatch method, it is
failing in the destruction case.
If the usage model that is desired is to use a single xshandle in a
given process, then we should change the semantics and/or document the
relevant functions. Also, I would like to find out why my watch is not
executing in the destruction case.
A distilled version of the debugging log that I have is:
(XendDomainInfo:1424) XendDomainInfo.destroyDomain(6)
(xswatch:65) xswatch triggered on @releaseDomain
(image:397) hvm shutdown watch unregistered
(xsblockingchannel:79) waitFor executes and blocks
I haven't been able to get xswatch to trigger on any further writes to
my node in the xenstore via xenstore-write. My only guess is that
during the domain destruction that all watches within a domain's path
are unwatched. The surface-level solution that I can think of is to
move the qemu-dm/image destruction earlier in the domain destruction
process. Are there other solutions?
Regards,
John McCullough
* Re: XenStore Watch Behavior
From: Ewan Mellor @ 2006-08-29 19:42 UTC
To: John McCullough; +Cc: xen-devel
On Tue, Aug 29, 2006 at 12:12:11PM -0700, John McCullough wrote:
> When I began I had to try to extract the semantics from the code.
Yes, that's quite a common thing to need to do at the moment! Thanks for all
your efforts in documentation -- it's appreciated.
> I wrote the API section in
> http://wiki.xensource.com/xenwiki/XenStoreReference which needs to be fixed
> and better explained. Once we establish what the correct usage pattern is I
> will try to reproduce it on the wiki page.
>
> If I use an independently created xshandle in my blocking communication
> channel code, it works in all cases. If I use the xswatch method, it is
> failing in the destruction case.
>
> If the usage model that is desired is to use a single xshandle in a
> given process, then we should change the semantics and/or document the
> relevant functions. Also, I would like to find out why my watch is not
> executing in the destruction case.
>
> A distilled version of the debugging log that I have is:
> (XendDomainInfo:1424) XendDomainInfo.destroyDomain(6)
> (xswatch:65) xswatch triggered on @releaseDomain
> (image:397) hvm shutdown watch unregistered
> (xsblockingchannel:79) waitFor executes and blocks
Can I see the code? This doesn't mean an awful lot without seeing what you've
changed.
> I haven't been able to get xswatch to trigger on any further writes to
> my node in the xenstore via xenstore-write. My only guess is that
> during the domain destruction that all watches within a domain's path
> are unwatched.
You will certainly lose a watch on anything in the domain's path eventually,
because Xend and the hotplug scripts will be cleaning up behind the domain.
You should get one final watch fired when the path disappears.
> The surface-level solution that I can think of is to
> move the qemu-dm/image destruction earlier in the domain destruction
> process. Are there other solutions?
If you want to have data that outlive the domain (I presume in your case for
just a short while) then you should put them somewhere other than
/local/domain. There is a /tool/<yournamehere> hierarchy reserved for
third-party tools, if that suits you better. You would then have to handle
all the sweep-up yourself of course.
In your case, couldn't you just release the semaphore off the @releaseDomain
watch? Don't forget, domains can spontaneously self-destruct, maybe even
half-way between your "shutdown" and "shutdown_done", so you need to be able
to unconditionally abort and release locks.
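The suggestion to let the @releaseDomain watch also wake the waiter can be sketched with a condition variable, where whichever event arrives first decides the outcome. CommandWait and its callbacks are hypothetical, and how they would be registered with xswatch is left out.

```python
import threading


class CommandWait:
    """Wait for a completion reply, but also wake up if the domain goes away.

    on_reply() would be driven by the watch on the reply node and
    on_release() by a watch on @releaseDomain (both hypothetical hooks).
    """

    def __init__(self):
        self._cond = threading.Condition()
        self.outcome = None  # "done" or "released"

    def _finish(self, outcome):
        with self._cond:
            if self.outcome is None:  # first event wins
                self.outcome = outcome
                self._cond.notify_all()

    def on_reply(self, path, value):
        if value == "shutdown_done":
            self._finish("done")
        return self.outcome is None  # keep watching until resolved

    def on_release(self, path):
        self._finish("released")
        return self.outcome is None

    def wait(self, timeout):
        with self._cond:
            self._cond.wait_for(lambda: self.outcome is not None, timeout)
            return self.outcome  # None means timed out


# Domain disappears before the reply arrives: the waiter is released anyway.
w = CommandWait()
threading.Timer(0.05, w.on_release, args=("@releaseDomain",)).start()
result_released = w.wait(timeout=2.0)

# Reply arrives first; a later @releaseDomain does not change the outcome.
w2 = CommandWait()
w2.on_reply("/local/domain/6/chan/out", "shutdown_done")
w2.on_release("@releaseDomain")
result_done = w2.wait(timeout=0.1)
```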
Ewan.
* Re: XenStore Watch Behavior
From: John McCullough @ 2006-08-29 23:35 UTC
To: xen-devel
Ewan Mellor wrote:
> If you want to have data that outlive the domain (I presume in your case for
> just a short while) then you should put them somewhere other than
> /local/domain. There is a /tool/<yournamehere> hierarchy reserved for
> third-party tools, if that suits you better. You would then have to handle
> all the sweep-up yourself of course.
>
> In your case, couldn't you just release the semaphore off the @releaseDomain
> watch? Don't forget, domains can spontaneously self-destruct, maybe even
> half-way between your "shutdown" and "shutdown_done", so you need to be able
> to unconditionally abort and release locks.
I am getting the same behavior with xswatch when watching on /tool/blah
as with the /local/domain/%u/blah. The watch I added to @releaseDomain
is also not getting triggered.
Removing the wait for the shutdown_done allows it to come to completion.
I think it may be the case that the initial @releaseDomain watch
triggers destroyDomain in XendDomain.py via refresh(), which in turn
calls destroy() in image.py. By blocking there, we prevent the original
xswatch callback from completing, and so our own watch can never be
triggered.
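The hypothesis above can be reproduced in miniature. If one dispatcher thread runs every callback in turn, a callback that blocks waiting for another event stops that event from being dispatched until it gives up. This models the suspected deadlock rather than tracing xend itself; the handler names are hypothetical, and the timeouts exist only so the example finishes.

```python
import queue
import threading


def run_dispatcher(events, handlers, idle=0.5):
    """Single dispatcher thread: runs handlers one at a time, in order."""
    while True:
        try:
            path = events.get(timeout=idle)
        except queue.Empty:
            return  # nothing left to dispatch
        handlers[path]()


events = queue.Queue()
reply_seen = threading.Event()
result = {}


def on_release_domain():
    # Models destroy() waiting for "shutdown_done" inside the watch thread.
    # The reply event is queued, but it cannot run until this handler returns.
    events.put("reply")
    result["got_reply"] = reply_seen.wait(timeout=0.2)


def on_reply():
    reply_seen.set()


t = threading.Thread(target=run_dispatcher,
                     args=(events, {"@releaseDomain": on_release_domain,
                                    "reply": on_reply}))
t.start()
events.put("@releaseDomain")
t.join()
```

got_reply comes back False even though the reply was queued, and the reply is only handled after the waiter has given up. Reading the reply on a separate handle, serviced by a different thread, removes that ordering dependency.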
My initial thought is to return to using a separately created xshandle
for my blocking channel.
If this is the case, how do we want to develop the semantics?
-John