xenfs: race condition on xenstore watch

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* xenfs: race condition on xenstore watch
@ 2013-05-15 16:51 Jonathan Davies
  2013-05-16  8:26 ` Jan Beulich
  2013-05-28 12:55 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Davies @ 2013-05-15 16:51 UTC (permalink / raw)
  To: xen-devel

Dear xen-devel,

There's a race condition in xenfs (xenstore driver) that causes
userspace utility xenstore-watch to crash.

Normally, the userspace process gets an "OK" from xenfs and then the
watch fires immediately after. Occasionally, this happens the other way
around: the watch fires before the driver sends "OK", which confuses
the xenstore-watch client. It seems to me that the client is within its
rights to expect the "OK" first.

Here's what is happening:

The userspace process xenstore-watch writes to /proc/xen/xenbus with
msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
register_xenbus_watch with watch_fired as a callback *before* acquiring
the reply_mutex and sending the synthesised "OK" reply.

This gives a fast xenstore the opportunity to cause the watch_fired to
run (and briefly grab the reply_mutex for itself) before the fake "OK"
message is sent.

Below, I've included a putative patch for pre-3.3 xenfs that fixes this
problem. (It looks like the patch would also apply cleanly to
3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
about whether this is a reasonable approach?

A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
suffers from the same problem. Although I haven't haven't tested this,
I'd expect that it requires a very similar solution.

Jonathan

Take the (non-reentrant) reply_mutex before calling
register_xenbus_watch to prevent the watch_fired callback from writing
anything until the "OK" has been sent.

Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>

diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
--- a/drivers/xen/xenfs/xenbus.c	Thu Dec 03 06:00:06 2009 +0000
+++ b/drivers/xen/xenfs/xenbus.c	Wed May 15 17:24:47 2013 +0100
@@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
  	}
  	token++;

+	mutex_lock(&u->reply_mutex);
+
  	if (msg_type == XS_WATCH) {
  		watch = alloc_watch_adapter(path, token);
  		if (watch == NULL) {
@@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
  			"OK"
  		};

-		mutex_lock(&u->reply_mutex);
  		rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
-		mutex_unlock(&u->reply_mutex);
  	}

  out:
+	mutex_unlock(&u->reply_mutex);
  	return rc;
  }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
@ 2013-05-16  8:26 ` Jan Beulich
  2013-05-28 12:55 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-05-16  8:26 UTC (permalink / raw)
  To: Jonathan Davies; +Cc: xen-devel

>>> On 15.05.13 at 18:51, Jonathan Davies <jonathan.davies@citrix.com> wrote:
> The userspace process xenstore-watch writes to /proc/xen/xenbus with
> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
> register_xenbus_watch with watch_fired as a callback *before* acquiring
> the reply_mutex and sending the synthesised "OK" reply.
> 
> This gives a fast xenstore the opportunity to cause the watch_fired to
> run (and briefly grab the reply_mutex for itself) before the fake "OK"
> message is sent.
> 
> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
> problem. (It looks like the patch would also apply cleanly to
> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
> about whether this is a reasonable approach?

Yes, this looks reasonable at a first glance, pending confirmation
by someone involved in the original xenbus/xenstore design that
the expectation to see the watch firing only after the OK response
is a valid one.

There's an implementation problem though:

> --- a/drivers/xen/xenfs/xenbus.c	Thu Dec 03 06:00:06 2009 +0000
> +++ b/drivers/xen/xenfs/xenbus.c	Wed May 15 17:24:47 2013 +0100
> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
>   	}
>   	token++;
> 
> +	mutex_lock(&u->reply_mutex);
> +

Right above the initial patch context there is another "goto out",
so ...

>   	if (msg_type == XS_WATCH) {
>   		watch = alloc_watch_adapter(path, token);
>   		if (watch == NULL) {
> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
>   			"OK"
>   		};
> 
> -		mutex_lock(&u->reply_mutex);
>   		rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
> -		mutex_unlock(&u->reply_mutex);
>   	}
> 
>   out:
> +	mutex_unlock(&u->reply_mutex);

... this is not valid.

>   	return rc;
>   }
> 

Thus the unlock needs to become conditional here (either via
tracking the state in a local variable, or via adding a second label,
or via replacing the first goto with a straight return).

I'd also recommend to shrink the protected region to the minimum
possible (i.e. acquire the mutex right before the call to
register_xenbus_watch(), and at the end of the "else" body). Since
you then would end up with only a single error path needing to
drop the lock, dropping it in that error path rather than moving it
past the "out" label might be preferable.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
  2013-05-16  8:26 ` Jan Beulich
@ 2013-05-28 12:55 ` Konrad Rzeszutek Wilk
  2013-05-31 13:45   ` Jan Beulich
  2013-05-31 17:07   ` Jonathan Davies
  1 sibling, 2 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-28 12:55 UTC (permalink / raw)
  To: Jonathan Davies; +Cc: xen-devel

On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
> Dear xen-devel,
> 
> There's a race condition in xenfs (xenstore driver) that causes
> userspace utility xenstore-watch to crash.
> 
> Normally, the userspace process gets an "OK" from xenfs and then the
> watch fires immediately after. Occasionally, this happens the other way
> around: the watch fires before the driver sends "OK", which confuses
> the xenstore-watch client. It seems to me that the client is within its
> rights to expect the "OK" first.
> 
> Here's what is happening:
> 
> The userspace process xenstore-watch writes to /proc/xen/xenbus with
> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
> register_xenbus_watch with watch_fired as a callback *before* acquiring
> the reply_mutex and sending the synthesised "OK" reply.
> 
> This gives a fast xenstore the opportunity to cause the watch_fired to
> run (and briefly grab the reply_mutex for itself) before the fake "OK"
> message is sent.
> 
> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
> problem. (It looks like the patch would also apply cleanly to
> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
> about whether this is a reasonable approach?

It can't apply cleanly as the file moved :-(

> 
> A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
> suffers from the same problem. Although I haven't haven't tested this,
> I'd expect that it requires a very similar solution.
> 
> Jonathan
> 
> 
> Take the (non-reentrant) reply_mutex before calling
> register_xenbus_watch to prevent the watch_fired callback from writing
> anything until the "OK" has been sent.

Should that perhaps be done inside the msg_type == XS_WATCH code (with a
bool that would determine whether an reply_mutex has been taken?)

As in, is there no need to take this mutex if msg_type != XS_WATCH?

> 
> Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
> 
> diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
> --- a/drivers/xen/xenfs/xenbus.c	Thu Dec 03 06:00:06 2009 +0000
> +++ b/drivers/xen/xenfs/xenbus.c	Wed May 15 17:24:47 2013 +0100
> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
>  	}
>  	token++;
> 
> +	mutex_lock(&u->reply_mutex);
> +

Have you tested this with the kernel compiled with DEBUG_MUTEX and DEBUG_PROVE_LOCKING
to make sure there are no mutex/spinlock issues?

>  	if (msg_type == XS_WATCH) {
>  		watch = alloc_watch_adapter(path, token);
>  		if (watch == NULL) {
> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
>  			"OK"
>  		};
> 
> -		mutex_lock(&u->reply_mutex);
>  		rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
> -		mutex_unlock(&u->reply_mutex);
>  	}
> 
>  out:
> +	mutex_unlock(&u->reply_mutex);
>  	return rc;
>  }
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-28 12:55 ` Konrad Rzeszutek Wilk
@ 2013-05-31 13:45   ` Jan Beulich
  2013-05-31 17:07     ` Jonathan Davies
  2013-05-31 17:07   ` Jonathan Davies
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-05-31 13:45 UTC (permalink / raw)
  To: Jonathan Davies, Konrad Rzeszutek Wilk; +Cc: xen-devel

>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>> Normally, the userspace process gets an "OK" from xenfs and then the
>> watch fires immediately after. Occasionally, this happens the other way
>> around: the watch fires before the driver sends "OK", which confuses
>> the xenstore-watch client. It seems to me that the client is within its
>> rights to expect the "OK" first.

Now that I look at this another time, I wonder how this behavior
can be guaranteed even with your patch: xenbus_write_watch()
sends the reply by calling queue_reply() followed by wake_up(),
so the reply getting delivered and the watch firing appear to be
inherently asynchronous anyway, and your patch just reduces
the race window. Am I overlooking something here?

>> Take the (non-reentrant) reply_mutex before calling
>> register_xenbus_watch to prevent the watch_fired callback from writing
>> anything until the "OK" has been sent.
> 
> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
> bool that would determine whether an reply_mutex has been taken?)
> 
> As in, is there no need to take this mutex if msg_type != XS_WATCH?

As said in an earlier reply, I also think that it would be preferable
to shrink the locked region as much as possible. But there clearly
is a need to also acquire the mutex for msg_type != XS_WATCH,
just not before the loop in the else branch.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-31 13:45   ` Jan Beulich
@ 2013-05-31 17:07     ` Jonathan Davies
  2013-06-03  7:57       ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Davies @ 2013-05-31 17:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 31/05/13 14:45, Jan Beulich wrote:
>>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>>> Normally, the userspace process gets an "OK" from xenfs and then the
>>> watch fires immediately after. Occasionally, this happens the other way
>>> around: the watch fires before the driver sends "OK", which confuses
>>> the xenstore-watch client. It seems to me that the client is within its
>>> rights to expect the "OK" first.
>
> Now that I look at this another time, I wonder how this behavior
> can be guaranteed even with your patch: xenbus_write_watch()
> sends the reply by calling queue_reply() followed by wake_up(),
> so the reply getting delivered and the watch firing appear to be
> inherently asynchronous anyway, and your patch just reduces
> the race window. Am I overlooking something here?

There's no call to wake_up() in xenbus_write_watch(), as far as I can see...

The key thing is to guarantee that xenbus_write_watch()'s call to 
queue_reply() (for the "OK" message) executes before the watch_fired() 
callback -- registered by the call to register_xenbus_watch() -- could be 
called-back. Given that watch_fired() also takes the reply_mutex before its 
own calls to queue_reply() and wake_up(), this behaviour can be guaranteed 
by taking the reply_mutex before registering the callback.

>>> Take the (non-reentrant) reply_mutex before calling
>>> register_xenbus_watch to prevent the watch_fired callback from writing
>>> anything until the "OK" has been sent.
>>
>> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
>> bool that would determine whether an reply_mutex has been taken?)
>>
>> As in, is there no need to take this mutex if msg_type != XS_WATCH?
>
> As said in an earlier reply, I also think that it would be preferable
> to shrink the locked region as much as possible.

v2 of the patch will have a much tighter locked region. I'll post that once 
I've tested it.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-31 17:07     ` Jonathan Davies
@ 2013-06-03  7:57       ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-06-03  7:57 UTC (permalink / raw)
  To: Jonathan Davies; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 31.05.13 at 19:07, Jonathan Davies <jonathan.davies@citrix.com> wrote:
> On 31/05/13 14:45, Jan Beulich wrote:
>>>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>>>> Normally, the userspace process gets an "OK" from xenfs and then the
>>>> watch fires immediately after. Occasionally, this happens the other way
>>>> around: the watch fires before the driver sends "OK", which confuses
>>>> the xenstore-watch client. It seems to me that the client is within its
>>>> rights to expect the "OK" first.
>>
>> Now that I look at this another time, I wonder how this behavior
>> can be guaranteed even with your patch: xenbus_write_watch()
>> sends the reply by calling queue_reply() followed by wake_up(),
>> so the reply getting delivered and the watch firing appear to be
>> inherently asynchronous anyway, and your patch just reduces
>> the race window. Am I overlooking something here?
> 
> There's no call to wake_up() in xenbus_write_watch(), as far as I can see...

I'm not sure where you're looking, but both xenfs/xenbus.c and
the more recent xenbus/xenbus_dev_frontend.c have this

		mutex_lock(&u->reply_mutex);
		rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
		wake_up(&u->read_waitq);
		mutex_unlock(&u->reply_mutex);

near the end of xenbus_write_watch().

> The key thing is to guarantee that xenbus_write_watch()'s call to 
> queue_reply() (for the "OK" message) executes before the watch_fired() 
> callback -- registered by the call to register_xenbus_watch() -- could be 
> called-back. Given that watch_fired() also takes the reply_mutex before its 
> own calls to queue_reply() and wake_up(), this behaviour can be guaranteed 
> by taking the reply_mutex before registering the callback.

Okay, I agree after taking another closer look - the very different
variable naming obfuscate this a little. And inn fact the mutex
appears to get acquired too early in watch_fired - it could be
confined to just the list_splice_tail() (and maybe the wake_up(), if
the really requires external synchronization) invocation.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xenfs: race condition on xenstore watch
  2013-05-28 12:55 ` Konrad Rzeszutek Wilk
  2013-05-31 13:45   ` Jan Beulich
@ 2013-05-31 17:07   ` Jonathan Davies
  1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Davies @ 2013-05-31 17:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jan Beulich, xen-devel

On 28/05/13 13:55, Konrad Rzeszutek Wilk wrote:
> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>> Dear xen-devel,
>>
>> There's a race condition in xenfs (xenstore driver) that causes
>> userspace utility xenstore-watch to crash.
>>
>> Normally, the userspace process gets an "OK" from xenfs and then the
>> watch fires immediately after. Occasionally, this happens the other way
>> around: the watch fires before the driver sends "OK", which confuses
>> the xenstore-watch client. It seems to me that the client is within its
>> rights to expect the "OK" first.
>>
>> Here's what is happening:
>>
>> The userspace process xenstore-watch writes to /proc/xen/xenbus with
>> msg_type==XS_WATCH. This is handled by xenbus_write_watch which calls
>> register_xenbus_watch with watch_fired as a callback *before* acquiring
>> the reply_mutex and sending the synthesised "OK" reply.
>>
>> This gives a fast xenstore the opportunity to cause the watch_fired to
>> run (and briefly grab the reply_mutex for itself) before the fake "OK"
>> message is sent.
>>
>> Below, I've included a putative patch for pre-3.3 xenfs that fixes this
>> problem. (It looks like the patch would also apply cleanly to
>> 3.3-onwards xenbus_dev_frontend.c, but I haven't tried.) Any comments
>> about whether this is a reasonable approach?
>
> It can't apply cleanly as the file moved :-(

Well, it applies cleanly if you rename the affected file to 
drivers/xen/xenbus/xenbus_dev_frontend.c and can tolerate a 15-line 
offset... :-)

>> A cursory glance at drivers/xen/xenbus/xenbus_dev.c suggests that it
>> suffers from the same problem. Although I haven't haven't tested this,
>> I'd expect that it requires a very similar solution.
>>
>> Jonathan
>>
>>
>> Take the (non-reentrant) reply_mutex before calling
>> register_xenbus_watch to prevent the watch_fired callback from writing
>> anything until the "OK" has been sent.
>
> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
> bool that would determine whether an reply_mutex has been taken?)
>
> As in, is there no need to take this mutex if msg_type != XS_WATCH?

When msg_type != XS_WATCH, we only need to take the mutex briefly when 
sending the "OK". There's no issue with XS_UNWATCH because there's never 
anything further to be written. v2 of the patch won't hold the lock while 
doing the work for XS_UNWATCH.

>> Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
>>
>> diff -r 5ab1b4af1faf drivers/xen/xenfs/xenbus.c
>> --- a/drivers/xen/xenfs/xenbus.c	Thu Dec 03 06:00:06 2009 +0000
>> +++ b/drivers/xen/xenfs/xenbus.c	Wed May 15 17:24:47 2013 +0100
>> @@ -359,6 +359,8 @@ static int xenbus_write_watch(unsigned m
>>   	}
>>   	token++;
>>
>> +	mutex_lock(&u->reply_mutex);
>> +
>
> Have you tested this with the kernel compiled with DEBUG_MUTEX and DEBUG_PROVE_LOCKING
> to make sure there are no mutex/spinlock issues?

No. I will give that a try.

>>   	if (msg_type == XS_WATCH) {
>>   		watch = alloc_watch_adapter(path, token);
>>   		if (watch == NULL) {
>> @@ -401,12 +403,11 @@ static int xenbus_write_watch(unsigned m
>>   			"OK"
>>   		};
>>
>> -		mutex_lock(&u->reply_mutex);
>>   		rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
>> -		mutex_unlock(&u->reply_mutex);
>>   	}
>>
>>   out:
>> +	mutex_unlock(&u->reply_mutex);
>>   	return rc;
>>   }

Thanks for the feedback,
Jonathan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-06-03  7:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-15 16:51 xenfs: race condition on xenstore watch Jonathan Davies
2013-05-16  8:26 ` Jan Beulich
2013-05-28 12:55 ` Konrad Rzeszutek Wilk
2013-05-31 13:45   ` Jan Beulich
2013-05-31 17:07     ` Jonathan Davies
2013-06-03  7:57       ` Jan Beulich
2013-05-31 17:07   ` Jonathan Davies

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).