From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jonathan Davies <jonathan.davies@citrix.com>
Subject: Re: xenfs: race condition on xenstore watch
Date: Fri, 31 May 2013 18:07:37 +0100
Message-ID: <51A8D8D9.7020106@citrix.com>
References: <5193BCFA.3030604@citrix.com>
	<20130528125504.GA28949@phenom.dumpdata.com>
	<51A8C59F02000078000DA131@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <51A8C59F02000078000DA131@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 31/05/13 14:45, Jan Beulich wrote:
>>>> On 28.05.13 at 14:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Wed, May 15, 2013 at 05:51:06PM +0100, Jonathan Davies wrote:
>>> Normally, the userspace process gets an "OK" from xenfs and then the
>>> watch fires immediately after. Occasionally, this happens the other way
>>> around: the watch fires before the driver sends "OK", which confuses
>>> the xenstore-watch client. It seems to me that the client is within its
>>> rights to expect the "OK" first.
>
> Now that I look at this another time, I wonder how this behavior
> can be guaranteed even with your patch: xenbus_write_watch()
> sends the reply by calling queue_reply() followed by wake_up(),
> so the reply getting delivered and the watch firing appear to be
> inherently asynchronous anyway, and your patch just reduces
> the race window. Am I overlooking something here?

There's no call to wake_up() in xenbus_write_watch(), as far as I can see...

The key thing is to guarantee that xenbus_write_watch()'s call to 
queue_reply() (for the "OK" message) executes before the watch_fired() 
callback -- registered by the call to register_xenbus_watch() -- could be 
called-back. Given that watch_fired() also takes the reply_mutex before its 
own calls to queue_reply() and wake_up(), this behaviour can be guaranteed 
by taking the reply_mutex before registering the callback.

>>> Take the (non-reentrant) reply_mutex before calling
>>> register_xenbus_watch to prevent the watch_fired callback from writing
>>> anything until the "OK" has been sent.
>>
>> Should that perhaps be done inside the msg_type == XS_WATCH code (with a
>> bool that would determine whether an reply_mutex has been taken?)
>>
>> As in, is there no need to take this mutex if msg_type != XS_WATCH?
>
> As said in an earlier reply, I also think that it would be preferable
> to shrink the locked region as much as possible.

v2 of the patch will have a much tighter locked region. I'll post that once 
I've tested it.

Thanks,
Jonathan