* xenfs: race condition on xenstore watch
@ 2013-05-15 16:51 Jonathan Davies
2013-05-16 8:26 ` Jan Beulich
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Davies @ 2013-05-15 16:51 UTC (permalink / raw)
To: xen-devel
Dear xen-devel,
There's a race condition in xenfs (xenstore driver) that causes
userspace utility xenstore-watch to crash.
Normally, the userspace process gets an "OK" from xenfs and then the
watch fires immediately after. Occasionally, this happens the other way
around: the watch fires before the driver sends "OK", which confuses
the xenstore-watch client. It seems to me that the client is within its
rights to expect the "OK" first.
Here's what is happening:
The userspace process xenstore-watch writes to /proc/xen/xenbus with
msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
register_xenbus_watch with watch_fired as a callback *before* acquiring
the reply_mutex and sending the synthesised "OK" reply.
This gives a fast xenstore the opportunity to cause the watch_fired to
run (and briefly grab the reply_mutex for itself) before the fake "OK"
message is sent.
Below, I've included a putative patch for pre-3.3 xenfs that fixes this
problem. (It looks like the patch would also apply cleanly to
3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
about whether this is a reasonable approach?
A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
suffers from the same problem. Although I haven't haven't tested this,
I'd expect that it requires a very similar solution.
Jonathan
Take the (non-reentrant) reply_mutex before calling
register_xenbus_watch to prevent the watch_fired callback from writing
anything until the "OK" has been sent.
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
--- a/drivers/xen/xenfs/xenbus.c Thu Dec 03 06:00:06 2009 +0000
+++ b/drivers/xen/xenfs/xenbus.c Wed May 15 17:24:47 2013 +0100
@@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
}
token++;
+ mutex_lock(&u->reply_mutex);
+
if (msg_type == XS_WATCH) {
watch = alloc_watch_adapter(path, token);
if (watch == NULL) {
@@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
"OK"
};
- mutex_lock(&u->reply_mutex);
rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
- mutex_unlock(&u->reply_mutex);
}
out:
+ mutex_unlock(&u->reply_mutex);
return rc;
}
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: xenfs: race condition on xenstore watch
2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
@ 2013-05-16 8:26 ` Jan Beulich
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
1 sibling, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-05-16 8:26 UTC (permalink / raw)
To: Jonathan Davies; +Cc: xen-devel
>>> On 15.05.13 at 18:51, Jonathan Davies <jonathan.davies@citrix.com> wrote:
> The userspace process xenstore-watch writes to /proc/xen/xenbus with
> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
> register_xenbus_watch with watch_fired as a callback *before* acquiring
> the reply_mutex and sending the synthesised "OK" reply.
>
> This gives a fast xenstore the opportunity to cause the watch_fired to
> run (and briefly grab the reply_mutex for itself) before the fake "OK"
> message is sent.
>
> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
> problem. (It looks like the patch would also apply cleanly to
> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
> about whether this is a reasonable approach?
Yes, this looks reasonable at a first glance, pending confirmation
by someone involved in the original xenbus/xenstore design that
the expectation to see the watch firing only after the OK response
is a valid one.
There's an implementation problem though:
> --- a/drivers/xen/xenfs/xenbus.c Thu Dec 03 06:00:06 2009 +0000
> +++ b/drivers/xen/xenfs/xenbus.c Wed May 15 17:24:47 2013 +0100
> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
> }
> token++;
>
> + mutex_lock(&u->reply_mutex);
> +
Right above the initial patch context there is another "goto out",
so ...
> if (msg_type == XS_WATCH) {
> watch = alloc_watch_adapter(path, token);
> if (watch == NULL) {
> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
> "OK"
> };
>
> - mutex_lock(&u->reply_mutex);
> rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
> - mutex_unlock(&u->reply_mutex);
> }
>
> out:
> + mutex_unlock(&u->reply_mutex);
... this is not valid.
> return rc;
> }
>
Thus the unlock needs to become conditional here (either via
tracking the state in a local variable, or via adding a second label,
or via replacing the first goto with a straight return).
I'd also recommend to shrink the protected region to the minimum
possible (i.e. acquire the mutex right before the call to
register_xenbus_watch(), and at the end of the "else" body). Since
you then would end up with only a single error path needing to
drop the lock, dropping it in that error path rather than moving it
past the "out" label might be preferable.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: xenfs: race condition on xenstore watch
2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
2013-05-16 8:26 ` Jan Beulich
@ 2013-05-28 12:55 ` Konrad Rzeszutek Wilk
2013-05-31 13:45 ` Jan Beulich
2013-05-31 17:07 ` Jonathan Davies
1 sibling, 2 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-28 12:55 UTC (permalink / raw)
To: Jonathan Davies; +Cc: xen-devel
On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
> Dear xen-devel,
>
> There's a race condition in xenfs (xenstore driver) that causes
> userspace utility xenstore-watch to crash.
>
> Normally, the userspace process gets an "OK" from xenfs and then the
> watch fires immediately after. Occasionally, this happens the other way
> around: the watch fires before the driver sends "OK", which confuses
> the xenstore-watch client. It seems to me that the client is within its
> rights to expect the "OK" first.
>
> Here's what is happening:
>
> The userspace process xenstore-watch writes to /proc/xen/xenbus with
> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
> register_xenbus_watch with watch_fired as a callback *before* acquiring
> the reply_mutex and sending the synthesised "OK" reply.
>
> This gives a fast xenstore the opportunity to cause the watch_fired to
> run (and briefly grab the reply_mutex for itself) before the fake "OK"
> message is sent.
>
> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
> problem. (It looks like the patch would also apply cleanly to
> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
> about whether this is a reasonable approach?
It can't apply cleanly as the file moved :-(
>
> A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
> suffers from the same problem. Although I haven't haven't tested this,
> I'd expect that it requires a very similar solution.
>
> Jonathan
>
>
> Take the (non-reentrant) reply_mutex before calling
> register_xenbus_watch to prevent the watch_fired callback from writing
> anything until the "OK" has been sent.
Should that perhaps be done inside the msg_type == XS_WATCH code (with a
bool that would determine whether an reply_mutex has been taken?)
As in, is there no need to take this mutex if msg_type != XS_WATCH?
>
> Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
>
> diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
> --- a/drivers/xen/xenfs/xenbus.c Thu Dec 03 06:00:06 2009 +0000
> +++ b/drivers/xen/xenfs/xenbus.c Wed May 15 17:24:47 2013 +0100
> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
> }
> token++;
>
> + mutex_lock(&u->reply_mutex);
> +
Have you tested this with the kernel compiled with DEBUG_MUTEX and DEBUG_PROVE_LOCKING
to make sure there are no mutex/spinlock issues?
> if (msg_type == XS_WATCH) {
> watch = alloc_watch_adapter(path, token);
> if (watch == NULL) {
> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
> "OK"
> };
>
> - mutex_lock(&u->reply_mutex);
> rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
> - mutex_unlock(&u->reply_mutex);
> }
>
> out:
> + mutex_unlock(&u->reply_mutex);
> return rc;
> }
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: xenfs: race condition on xenstore watch
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
@ 2013-05-31 13:45 ` Jan Beulich
2013-05-31 17:07 ` Jonathan Davies
2013-05-31 17:07 ` Jonathan Davies
1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-05-31 13:45 UTC (permalink / raw)
To: Jonathan Davies, Konrad Rzeszutek Wilk; +Cc: xen-devel
>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>> Normally, the userspace process gets an "OK" from xenfs and then the
>> watch fires immediately after. Occasionally, this happens the other way
>> around: the watch fires before the driver sends "OK", which confuses
>> the xenstore-watch client. It seems to me that the client is within its
>> rights to expect the "OK" first.
Now that I look at this another time, I wonder how this behavior
can be guaranteed even with your patch: xenbus_write_watch()
sends the reply by calling queue_reply() followed by wake_up(),
so the reply getting delivered and the watch firing appear to be
inherently asynchronous anyway, and your patch just reduces
the race window. Am I overlooking something here?
>> Take the (non-reentrant) reply_mutex before calling
>> register_xenbus_watch to prevent the watch_fired callback from writing
>> anything until the "OK" has been sent.
>
> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
> bool that would determine whether an reply_mutex has been taken?)
>
> As in, is there no need to take this mutex if msg_type != XS_WATCH?
As said in an earlier reply, I also think that it would be preferable
to shrink the locked region as much as possible. But there clearly
is a need to also acquire the mutex for msg_type != XS_WATCH,
just not before the loop in the else branch.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xenfs: race condition on xenstore watch
2013-05-31 13:45 ` Jan Beulich
@ 2013-05-31 17:07 ` Jonathan Davies
2013-06-03 7:57 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Davies @ 2013-05-31 17:07 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Konrad Rzeszutek Wilk
On 31/05/13 14:45, Jan Beulich wrote:
>>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>>> Normally, the userspace process gets an "OK" from xenfs and then the
>>> watch fires immediately after. Occasionally, this happens the other way
>>> around: the watch fires before the driver sends "OK", which confuses
>>> the xenstore-watch client. It seems to me that the client is within its
>>> rights to expect the "OK" first.
>
> Now that I look at this another time, I wonder how this behavior
> can be guaranteed even with your patch: xenbus_write_watch()
> sends the reply by calling queue_reply() followed by wake_up(),
> so the reply getting delivered and the watch firing appear to be
> inherently asynchronous anyway, and your patch just reduces
> the race window. Am I overlooking something here?
There's no call to wake_up() in xenbus_write_watch(), as far as I can see...
The key thing is to guarantee that xenbus_write_watch()'s call to
queue_reply() (for the "OK" message) executes before the watch_fired()
callback -- registered by the call to register_xenbus_watch() -- could be
called-back. Given that watch_fired() also takes the reply_mutex before its
own calls to queue_reply() and wake_up(), this behaviour can be guaranteed
by taking the reply_mutex before registering the callback.
>>> Take the (non-reentrant) reply_mutex before calling
>>> register_xenbus_watch to prevent the watch_fired callback from writing
>>> anything until the "OK" has been sent.
>>
>> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
>> bool that would determine whether an reply_mutex has been taken?)
>>
>> As in, is there no need to take this mutex if msg_type != XS_WATCH?
>
> As said in an earlier reply, I also think that it would be preferable
> to shrink the locked region as much as possible.
v2 of the patch will have a much tighter locked region. I'll post that once
I've tested it.
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xenfs: race condition on xenstore watch
2013-05-31 17:07 ` Jonathan Davies
@ 2013-06-03 7:57 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-06-03 7:57 UTC (permalink / raw)
To: Jonathan Davies; +Cc: Konrad Rzeszutek Wilk, xen-devel
>>> On 31.05.13 at 19:07, Jonathan Davies <jonathan.davies@citrix.com> wrote:
> On 31/05/13 14:45, Jan Beulich wrote:
>>>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>>>> Normally, the userspace process gets an "OK" from xenfs and then the
>>>> watch fires immediately after. Occasionally, this happens the other way
>>>> around: the watch fires before the driver sends "OK", which confuses
>>>> the xenstore-watch client. It seems to me that the client is within its
>>>> rights to expect the "OK" first.
>>
>> Now that I look at this another time, I wonder how this behavior
>> can be guaranteed even with your patch: xenbus_write_watch()
>> sends the reply by calling queue_reply() followed by wake_up(),
>> so the reply getting delivered and the watch firing appear to be
>> inherently asynchronous anyway, and your patch just reduces
>> the race window. Am I overlooking something here?
>
> There's no call to wake_up() in xenbus_write_watch(), as far as I can see...
I'm not sure where you're looking, but both xenfs/xenbus.c and
the more recent xenbus/xenbus_dev_frontend.c have this
mutex_lock(&u->reply_mutex);
rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
wake_up(&u->read_waitq);
mutex_unlock(&u->reply_mutex);
near the end of xenbus_write_watch().
> The key thing is to guarantee that xenbus_write_watch()'s call to
> queue_reply() (for the "OK" message) executes before the watch_fired()
> callback -- registered by the call to register_xenbus_watch() -- could be
> called-back. Given that watch_fired() also takes the reply_mutex before its
> own calls to queue_reply() and wake_up(), this behaviour can be guaranteed
> by taking the reply_mutex before registering the callback.
Okay, I agree after taking another closer look - the very different
variable naming obfuscate this a little. And inn fact the mutex
appears to get acquired too early in watch_fired - it could be
confined to just the list_splice_tail() (and maybe the wake_up(), if
the really requires external synchronization) invocation.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xenfs: race condition on xenstore watch
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
2013-05-31 13:45 ` Jan Beulich
@ 2013-05-31 17:07 ` Jonathan Davies
1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Davies @ 2013-05-31 17:07 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: Jan Beulich, xen-devel
On 28/05/13 13:55, Konrad Rzeszutek Wilk wrote:
> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>> Dear xen-devel,
>>
>> There's a race condition in xenfs (xenstore driver) that causes
>> userspace utility xenstore-watch to crash.
>>
>> Normally, the userspace process gets an "OK" from xenfs and then the
>> watch fires immediately after. Occasionally, this happens the other way
>> around: the watch fires before the driver sends "OK", which confuses
>> the xenstore-watch client. It seems to me that the client is within its
>> rights to expect the "OK" first.
>>
>> Here's what is happening:
>>
>> The userspace process xenstore-watch writes to /proc/xen/xenbus with
>> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
>> register_xenbus_watch with watch_fired as a callback *before* acquiring
>> the reply_mutex and sending the synthesised "OK" reply.
>>
>> This gives a fast xenstore the opportunity to cause the watch_fired to
>> run (and briefly grab the reply_mutex for itself) before the fake "OK"
>> message is sent.
>>
>> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
>> problem. (It looks like the patch would also apply cleanly to
>> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
>> about whether this is a reasonable approach?
>
> It can't apply cleanly as the file moved :-(
Well, it applies cleanly if you rename the affected file to
drivers/xen/xenbus/xenbus_dev_frontend.c and can tolerate a 15-line
offset... :-)
>> A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
>> suffers from the same problem. Although I haven't haven't tested this,
>> I'd expect that it requires a very similar solution.
>>
>> Jonathan
>>
>>
>> Take the (non-reentrant) reply_mutex before calling
>> register_xenbus_watch to prevent the watch_fired callback from writing
>> anything until the "OK" has been sent.
>
> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
> bool that would determine whether an reply_mutex has been taken?)
>
> As in, is there no need to take this mutex if msg_type != XS_WATCH?
When msg_type != XS_WATCH, we only need to take the mutex briefly when
sending the "OK". There's no issue with XS_UNWATCH because there's never
anything further to be written. v2 of the patch won't hold the lock while
doing the work for XS_UNWATCH.
>> Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
>>
>> diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
>> --- a/drivers/xen/xenfs/xenbus.c Thu Dec 03 06:00:06 2009 +0000
>> +++ b/drivers/xen/xenfs/xenbus.c Wed May 15 17:24:47 2013 +0100
>> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
>> }
>> token++;
>>
>> + mutex_lock(&u->reply_mutex);
>> +
>
> Have you tested this with the kernel compiled with DEBUG_MUTEX and DEBUG_PROVE_LOCKING
> to make sure there are no mutex/spinlock issues?
No. I will give that a try.
>> if (msg_type == XS_WATCH) {
>> watch = alloc_watch_adapter(path, token);
>> if (watch == NULL) {
>> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
>> "OK"
>> };
>>
>> - mutex_lock(&u->reply_mutex);
>> rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
>> - mutex_unlock(&u->reply_mutex);
>> }
>>
>> out:
>> + mutex_unlock(&u->reply_mutex);
>> return rc;
>> }
Thanks for the feedback,
Jonathan
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-06-03 7:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
2013-05-16 8:26 ` Jan Beulich
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
2013-05-31 13:45 ` Jan Beulich
2013-05-31 17:07 ` Jonathan Davies
2013-06-03 7:57 ` Jan Beulich
2013-05-31 17:07 ` Jonathan Davies
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).