From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeremy Fitzhardinge Subject: Re: A bug in Xenbus driver Date: Wed, 25 Aug 2010 12:11:27 -0700 Message-ID: <4C756ADF.1000207@goop.org> References: <433DDF91DFB08148BAD3FDB6FDDA314C9F35F3BB47@LONPMAILBOX01.citrite.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <433DDF91DFB08148BAD3FDB6FDDA314C9F35F3BB47@LONPMAILBOX01.citrite.net> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: "Jun Zhu (Intern)" Cc: "xen-devel@lists.xensource.com" List-Id: xen-devel@lists.xenproject.org On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote: > Hi all, > > I think this is a serious bug, existing in the pvops 2.6.31.13 (also existing in linux 2.6.35); > > In the xenbus driver (drivers/xen/xenfs/xenbus.c), the function of xenbus_file_read has a section of source code like this: > if (ret != sz) { > if (i == 0) > i = -EFAULT; > goto out; > } > /* Clear out buffer if it has been consumed */ > if (rb->cons == rb->len) { > list_del(&rb->list); > kfree(rb); > if (list_empty(&u->read_buffers)) > break; > rb = list_entry(u->read_buffers.next, > struct read_buffer, list); > } > It should be like this: > // if (ret != sz) { > if (ret != 0) { > if (i == 0) > i = -EFAULT; > goto out; > } > This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty. > > I found this problem from the xenstore client xs_watch function. xs_watch creates the new thread on demand. So I recommend that in the function of read_message(xen/tools/xenstore/xs.c), if using thread to receive message, in the case of read fault, it should signal to the listener and print out the error. I don't follow your description of the problem. The full context of the code in question is this: ret = copy_to_user(ubuf + i, &rb->msg[rb->cons], sz); i += sz - ret; rb->cons += sz - ret; if (ret != sz) { if (i == 0) i = -EFAULT; goto out; } copy_to_user returns the number of uncopied bytes, so if ret != sz, then some part of the data was not copied due to a memory fault problem. Changing the test to "ret != 0" means that it will return a partial result (at the end of a page, for example) without reporting an error, and discarding the uncopied data. If you're hitting the "ret != sz" case often, then I think there's some problem with the memory you're passing into read() (either the buffer itself, or the size argument). Thanks, J