* A bug in Xenbus driver
@ 2010-08-25 18:57 Jun Zhu (Intern)
2010-08-25 19:11 ` Jeremy Fitzhardinge
2010-08-25 19:19 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 8+ messages in thread
From: Jun Zhu (Intern) @ 2010-08-25 18:57 UTC (permalink / raw)
To: xen-devel@lists.xensource.com
Hi all,
I think this is a serious bug, existing in the pvops 2.6.31.13 (also existing in linux 2.6.35);
In the xenbus driver (drivers/xen/xenfs/xenbus.c), the function of xenbus_file_read has a section of source code like this:
if (ret != sz) {
if (i == 0)
i = -EFAULT;
goto out;
}
/* Clear out buffer if it has been consumed */
if (rb->cons == rb->len) {
list_del(&rb->list);
kfree(rb);
if (list_empty(&u->read_buffers))
break;
rb = list_entry(u->read_buffers.next,
struct read_buffer, list);
}
It should be like this:
// if (ret != sz) {
if (ret != 0) {
if (i == 0)
i = -EFAULT;
goto out;
}
This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.
I found this problem from the xenstore client xs_watch function. xs_watch creates the new thread on demand. So I recommend that in the function of read_message(xen/tools/xenstore/xs.c), if using thread to receive message, in the case of read fault, it should signal to the listener and print out the error.
Jun Zhu
Citrix Systems UK
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: A bug in Xenbus driver
2010-08-25 18:57 A bug in Xenbus driver Jun Zhu (Intern)
@ 2010-08-25 19:11 ` Jeremy Fitzhardinge
2010-08-25 19:15 ` Jeremy Fitzhardinge
2010-08-25 19:19 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-25 19:11 UTC (permalink / raw)
To: Jun Zhu (Intern); +Cc: xen-devel@lists.xensource.com
On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:
> Hi all,
>
> I think this is a serious bug, existing in the pvops 2.6.31.13 (also existing in linux 2.6.35);
>
> In the xenbus driver (drivers/xen/xenfs/xenbus.c), the function of xenbus_file_read has a section of source code like this:
> if (ret != sz) {
> if (i == 0)
> i = -EFAULT;
> goto out;
> }
> /* Clear out buffer if it has been consumed */
> if (rb->cons == rb->len) {
> list_del(&rb->list);
> kfree(rb);
> if (list_empty(&u->read_buffers))
> break;
> rb = list_entry(u->read_buffers.next,
> struct read_buffer, list);
> }
> It should be like this:
> // if (ret != sz) {
> if (ret != 0) {
> if (i == 0)
> i = -EFAULT;
> goto out;
> }
> This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.
>
> I found this problem from the xenstore client xs_watch function. xs_watch creates the new thread on demand. So I recommend that in the function of read_message(xen/tools/xenstore/xs.c), if using thread to receive message, in the case of read fault, it should signal to the listener and print out the error.
I don't follow your description of the problem. The full context of the
code in question is this:
ret = copy_to_user(ubuf + i, &rb->msg[rb->cons], sz);
i += sz - ret;
rb->cons += sz - ret;
if (ret != sz) {
if (i == 0)
i = -EFAULT;
goto out;
}
copy_to_user returns the number of uncopied bytes, so if ret != sz, then
some part of the data was not copied due to a memory fault problem.
Changing the test to "ret != 0" means that it will return a partial
result (at the end of a page, for example) without reporting an error,
and discarding the uncopied data.
If you're hitting the "ret != sz" case often, then I think there's some
problem with the memory you're passing into read() (either the buffer
itself, or the size argument).
Thanks,
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: A bug in Xenbus driver
2010-08-25 19:11 ` Jeremy Fitzhardinge
@ 2010-08-25 19:15 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-25 19:15 UTC (permalink / raw)
To: Jun Zhu (Intern); +Cc: xen-devel@lists.xensource.com
On 08/25/2010 12:11 PM, Jeremy Fitzhardinge wrote:
> copy_to_user returns the number of uncopied bytes, so if ret != sz, then
> some part of the data was not copied due to a memory fault problem.
No, wait, I've contradicted myself. copy_to_user does return the number
of uncopied bytes, so the result should be 0 on success, and you're
right, the test should be "ret != 0".
Sorry about that.
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: A bug in Xenbus driver
2010-08-25 18:57 A bug in Xenbus driver Jun Zhu (Intern)
2010-08-25 19:11 ` Jeremy Fitzhardinge
@ 2010-08-25 19:19 ` Jeremy Fitzhardinge
2010-08-26 8:25 ` Jun Zhu (Intern)
1 sibling, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-25 19:19 UTC (permalink / raw)
To: Jun Zhu (Intern); +Cc: xen-devel@lists.xensource.com
On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:
> This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.
This still puzzles me. The bug you found means that
partially-successful reads will not be reported correctly, but
completely successful and completely unsuccessful reads should be OK.
So if you're seeing a problem as a result, it still indicates there's
something wrong with what you're passing to read().
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: A bug in Xenbus driver
2010-08-25 19:19 ` Jeremy Fitzhardinge
@ 2010-08-26 8:25 ` Jun Zhu (Intern)
2010-08-26 16:34 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 8+ messages in thread
From: Jun Zhu (Intern) @ 2010-08-26 8:25 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com
Hi,
Sorry for my description. "Does not report correctly" is not the problem of xenbus read.
I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.
Jun Zhu
Citrix Systems UK
________________________________________
From: Jeremy Fitzhardinge [jeremy@goop.org]
Sent: Wednesday, August 25, 2010 3:19 PM
To: Jun Zhu (Intern)
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] A bug in Xenbus driver
On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:
> This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.
This still puzzles me. The bug you found means that
partially-successful reads will not be reported correctly, but
completely successful and completely unsuccessful reads should be OK.
So if you're seeing a problem as a result, it still indicates there's
something wrong with what you're passing to read().
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: A bug in Xenbus driver
2010-08-26 8:25 ` Jun Zhu (Intern)
@ 2010-08-26 16:34 ` Jeremy Fitzhardinge
2010-08-27 11:03 ` Jun Zhu (Intern)
2010-08-27 12:38 ` Jun Zhu (Intern)
0 siblings, 2 replies; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-26 16:34 UTC (permalink / raw)
To: Jun Zhu (Intern); +Cc: xen-devel@lists.xensource.com
On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:
> Sorry for my description. "Does not report correctly" is not the problem of xenbus read.
> I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.
How does that relate to your proposed fix?
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: A bug in Xenbus driver
2010-08-26 16:34 ` Jeremy Fitzhardinge
@ 2010-08-27 11:03 ` Jun Zhu (Intern)
2010-08-27 12:38 ` Jun Zhu (Intern)
1 sibling, 0 replies; 8+ messages in thread
From: Jun Zhu (Intern) @ 2010-08-27 11:03 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com
Hi
Besides xenbus driver bug, the other problem is that there will be a deadlock if the read_thread exits!
Now I have recognized where the deadlock comes if the child thread (read_thread in xs.c) exists. After the child thread dies, the main thread (read_reply in xs.c) will go on. So in the following invocation of read_reply (no reading thread exists now), read_reply will use read_message(h) == -1 to read the reply. If the reply is XS_WATCH_EVENT type, the read_message function returns zero but leaves the reply_list empty, so read_reply goes to condvar_wait(&h->reply_condvar, &h->reply_mutex, h) since the reply_list is empty. The problem incurs here. Since condvar_wait is a different definition when using PTHREAD to read reply message, it will just wait on this signal, do nothing.
This is one possibility of deadlocks. You can replay this bug by commenting read_message function in the read_thread, just letting the read_thread die directly. In this case, the reading thread does nothing.
I am not sure I have a good solution to fix this problem now. Maybe we can let the read_message function returns a different value when the reply is XS_WATCH_EVENT, and the read_reply function checks. If it is XS_WATCH_EVENT type, call the read_message again.
Jun Zhu
Citrix Systems UK
________________________________________
From: Jeremy Fitzhardinge [jeremy@goop.org]
Sent: Thursday, August 26, 2010 12:34 PM
To: Jun Zhu (Intern)
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] A bug in Xenbus driver
On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:
> Sorry for my description. "Does not report correctly" is not the problem of xenbus read.
> I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.
How does that relate to your proposed fix?
J
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: A bug in Xenbus driver
2010-08-26 16:34 ` Jeremy Fitzhardinge
2010-08-27 11:03 ` Jun Zhu (Intern)
@ 2010-08-27 12:38 ` Jun Zhu (Intern)
1 sibling, 0 replies; 8+ messages in thread
From: Jun Zhu (Intern) @ 2010-08-27 12:38 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com
Hi,
Besides xenbus driver bug, the problem is that there will be a deadlock if the read_thread exits!
Now I have recognized where the deadlock comes if the child thread (read_thread in xs.c) exists. After the child thread dies, the main thread (read_reply in xs.c) will go on. So in the following invocation of read_reply (no reading thread exists now), read_reply will use read_message(h) == -1 to read the reply. If the reply is XS_WATCH_EVENT type, the read_message function returns zero but leaves the reply_list empty, so read_reply goes to condvar_wait(&h->reply_condvar, &h->reply_mutex, h) since the reply_list is empty. The problem incurs here. Since condvar_wait is a different definition when using PTHREAD to read reply message, it will just wait on this signal, do nothing.
This is one possibility of deadlocks. You can replay this bug by commenting read_message function in the read_thread, just letting the read_thread die directly. In this case, the reading thread does nothing.
I am not sure I have a good solution to fix this problem now. Maybe we can let the read_message function returns a different value when the reply is XS_WATCH_EVENT, and the read_reply function checks. If it is XS_WATCH_EVENT type, call the read_message again.
Jun Zhu
Citrix Systems UK
________________________________________
From: Jeremy Fitzhardinge [jeremy@goop.org]
Sent: Thursday, August 26, 2010 12:34 PM
To: Jun Zhu (Intern)
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] A bug in Xenbus driver
On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:
> Sorry for my description. "Does not report correctly" is not the problem of xenbus read.
> I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.
How does that relate to your proposed fix?
J
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-08-27 12:38 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-25 18:57 A bug in Xenbus driver Jun Zhu (Intern)
2010-08-25 19:11 ` Jeremy Fitzhardinge
2010-08-25 19:15 ` Jeremy Fitzhardinge
2010-08-25 19:19 ` Jeremy Fitzhardinge
2010-08-26 8:25 ` Jun Zhu (Intern)
2010-08-26 16:34 ` Jeremy Fitzhardinge
2010-08-27 11:03 ` Jun Zhu (Intern)
2010-08-27 12:38 ` Jun Zhu (Intern)
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.