* [Qemu-devel] Possible bug in monitor code
@ 2014-01-22 15:53 Stratos Psomadakis
2014-01-23 3:07 ` Fam Zheng
2014-01-24 23:48 ` Luiz Capitulino
0 siblings, 2 replies; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-22 15:53 UTC (permalink / raw)
To: qemu-devel
Cc: Synnefo Development, Stratos Psomadakis, Ganeti Development,
Dimitris Aragiorgis
[-- Attachment #1.1: Type: text/plain, Size: 3055 bytes --]
Hi,
we've encountered a weird issue regarding monitor (qmp and hmp) behavior
with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
1) Client A connects to qmp socket with socat
2) Client A gets greeting message {"QMP": {"version": ..}
3) Client A waits (select on the socket's fd)
4) Client B tries to connect to the *same* qmp socket with socat
5) Client B does *NOT* get any greating message
6) Client B waits (select on the socket's fd)
7) Client B closes connection (kill socat)
8) Client A quits too
9) Client C connects to qmp socket
10) Client C gets *two* greeting messages!!!
After some investigation, we traced it down to the monitor_flush()
function in monitor.c. Specifically, when a second client connects to
the qmp (client B), while another one is already using it (client A), we
get the following from stracing the second client (client B):
connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
select(4, [0 3], [], [], NULL
So, the connect() syscall from client B succeeds, although client B
connection has not yet been accepted by the qmp server (it's still in
the backlog of the qmp listening socket).
After killing client B and then client A, we see the following when
stracing the qemu proc:
22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
22363 fcntl(9, F_GETFL) = 0x802 (flags
O_RDWR|O_NONBLOCK)
22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
-1 EPIPE (Broken pipe)
22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
The qmp server / qemu accepts the connection from client B (who has now
closed the connection) and tries to write the greeting message to the
socket fd. This results in write returning an error (EPIPE).
The monitor_flush() function doesn't seem to handle this case (write
error). Instead, it adds a watch / handler to retry the write operation.
Thus, mon->outbuf is not cleaned up properly, which results in duplicate
greeting messages for the next client to connect.
The following seems to do the trick.
diff --git a/monitor.c b/monitor.c
index 845f608..5622f20 100644
--- a/monitor.c
+++ b/monitor.c
@@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
if (len && !mon->mux_out) {
rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
- if (rc == len) {
- /* all flushed */
+ if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
+ /* all flushed or error */
QDECREF(mon->outbuf);
mon->outbuf = qstring_new();
return;
Comments?
Thanks,
Stratos
--
Stratos Psomadakis
<psomas@grnet.gr>
[-- Attachment #1.2: Type: text/html, Size: 4264 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-22 15:53 [Qemu-devel] Possible bug in monitor code Stratos Psomadakis
@ 2014-01-23 3:07 ` Fam Zheng
2014-01-23 10:17 ` Stratos Psomadakis
2014-01-24 23:48 ` Luiz Capitulino
1 sibling, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2014-01-23 3:07 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Synnefo Development, Ganeti Development, qemu-devel,
Dimitris Aragiorgis
On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> Hi,
>
> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>
> 1) Client A connects to qmp socket with socat
> 2) Client A gets greeting message {"QMP": {"version": ..}
> 3) Client A waits (select on the socket's fd)
> 4) Client B tries to connect to the *same* qmp socket with socat
> 5) Client B does *NOT* get any greating message
> 6) Client B waits (select on the socket's fd)
> 7) Client B closes connection (kill socat)
> 8) Client A quits too
> 9) Client C connects to qmp socket
> 10) Client C gets *two* greeting messages!!!
Hi Stratos, thank you for debugging and reporting this.
I tested this sequence but can't fully reproduce this. What I see is 5) but no
10). Client C acts normally. And your patch below doesn't solve it for me.
To submit a patch, please follow instructions as described in
http://wiki.qemu.org/Contribute/SubmitAPatch
so it could be picked up by maintainers. Specifically, you need to format your
patch email with "git format-patch" and add a "Signed-off-by:" line in your
patch email.
Thanks,
Fam
>
> After some investigation, we traced it down to the monitor_flush()
> function in monitor.c. Specifically, when a second client connects to
> the qmp (client B), while another one is already using it (client A), we
> get the following from stracing the second client (client B):
>
> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> select(4, [0 3], [], [], NULL
>
> So, the connect() syscall from client B succeeds, although client B
> connection has not yet been accepted by the qmp server (it's still in
> the backlog of the qmp listening socket).
>
> After killing client B and then client A, we see the following when
> stracing the qemu proc:
>
> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> O_RDWR|O_NONBLOCK)
> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> -1 EPIPE (Broken pipe)
> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>
> The qmp server / qemu accepts the connection from client B (who has now
> closed the connection) and tries to write the greeting message to the
> socket fd. This results in write returning an error (EPIPE).
>
> The monitor_flush() function doesn't seem to handle this case (write
> error). Instead, it adds a watch / handler to retry the write operation.
> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> greeting messages for the next client to connect.
>
> The following seems to do the trick.
>
> diff --git a/monitor.c b/monitor.c
> index 845f608..5622f20 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>
> if (len && !mon->mux_out) {
> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> - if (rc == len) {
> - /* all flushed */
> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> + /* all flushed or error */
> QDECREF(mon->outbuf);
> mon->outbuf = qstring_new();
> return;
>
> Comments?
>
> Thanks,
> Stratos
>
> --
> Stratos Psomadakis
> <psomas@grnet.gr>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 3:07 ` Fam Zheng
@ 2014-01-23 10:17 ` Stratos Psomadakis
0 siblings, 0 replies; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-23 10:17 UTC (permalink / raw)
To: Fam Zheng
Cc: Synnefo Development, Ganeti Development, qemu-devel,
Dimitris Aragiorgis
[-- Attachment #1: Type: text/plain, Size: 4288 bytes --]
On 01/23/2014 05:07 AM, Fam Zheng wrote:
> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
>> Hi,
>>
>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>>
>> 1) Client A connects to qmp socket with socat
>> 2) Client A gets greeting message {"QMP": {"version": ..}
>> 3) Client A waits (select on the socket's fd)
>> 4) Client B tries to connect to the *same* qmp socket with socat
>> 5) Client B does *NOT* get any greating message
>> 6) Client B waits (select on the socket's fd)
>> 7) Client B closes connection (kill socat)
>> 8) Client A quits too
>> 9) Client C connects to qmp socket
>> 10) Client C gets *two* greeting messages!!!
> Hi Stratos, thank you for debugging and reporting this.
>
> I tested this sequence but can't fully reproduce this. What I see is 5) but no
> 10). Client C acts normally. And your patch below doesn't solve it for me.
Hm, which qemu version (or repo branch / tag) did you use? We did a
quick scan of the master branch code / commits, but we didn't find
anything that might fix the issue.
> To submit a patch, please follow instructions as described in
> http://wiki.qemu.org/Contribute/SubmitAPatch
> so it could be picked up by maintainers. Specifically, you need to format your
> patch email with "git format-patch" and add a "Signed-off-by:" line in your
> patch email.
Ok. If any dev can confirm that this is a bug (and that the patch below
is the correct way to fix it) I'll resubmit it properly.
Thanks,
Stratos
> Thanks,
>
> Fam
>
>> After some investigation, we traced it down to the monitor_flush()
>> function in monitor.c. Specifically, when a second client connects to
>> the qmp (client B), while another one is already using it (client A), we
>> get the following from stracing the second client (client B):
>>
>> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
>> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
>> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
>> select(4, [0 3], [], [], NULL
>>
>> So, the connect() syscall from client B succeeds, although client B
>> connection has not yet been accepted by the qmp server (it's still in
>> the backlog of the qmp listening socket).
>>
>> After killing client B and then client A, we see the following when
>> stracing the qemu proc:
>>
>> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
>> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>> 22363 fcntl(9, F_GETFL) = 0x802 (flags
>> O_RDWR|O_NONBLOCK)
>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
>> -1 EPIPE (Broken pipe)
>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>>
>> The qmp server / qemu accepts the connection from client B (who has now
>> closed the connection) and tries to write the greeting message to the
>> socket fd. This results in write returning an error (EPIPE).
>>
>> The monitor_flush() function doesn't seem to handle this case (write
>> error). Instead, it adds a watch / handler to retry the write operation.
>> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
>> greeting messages for the next client to connect.
>>
>> The following seems to do the trick.
>>
>> diff --git a/monitor.c b/monitor.c
>> index 845f608..5622f20 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>>
>> if (len && !mon->mux_out) {
>> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
>> - if (rc == len) {
>> - /* all flushed */
>> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
>> + /* all flushed or error */
>> QDECREF(mon->outbuf);
>> mon->outbuf = qstring_new();
>> return;
>>
>> Comments?
>>
>> Thanks,
>> Stratos
>>
>> --
>> Stratos Psomadakis
>> <psomas@grnet.gr>
>>
>
--
Stratos Psomadakis
<psomas@grnet.gr>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-22 15:53 [Qemu-devel] Possible bug in monitor code Stratos Psomadakis
2014-01-23 3:07 ` Fam Zheng
@ 2014-01-24 23:48 ` Luiz Capitulino
2014-01-27 10:30 ` [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error Stratos Psomadakis
1 sibling, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-24 23:48 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Ganeti Development, qemu-devel, Synnefo Development, kraxel,
Dimitris Aragiorgis
On Wed, 22 Jan 2014 17:53:52 +0200
Stratos Psomadakis <psomas@grnet.gr> wrote:
> Hi,
>
> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>
> 1) Client A connects to qmp socket with socat
> 2) Client A gets greeting message {"QMP": {"version": ..}
> 3) Client A waits (select on the socket's fd)
> 4) Client B tries to connect to the *same* qmp socket with socat
> 5) Client B does *NOT* get any greating message
> 6) Client B waits (select on the socket's fd)
> 7) Client B closes connection (kill socat)
> 8) Client A quits too
> 9) Client C connects to qmp socket
> 10) Client C gets *two* greeting messages!!!
>
> After some investigation, we traced it down to the monitor_flush()
> function in monitor.c. Specifically, when a second client connects to
> the qmp (client B), while another one is already using it (client A), we
> get the following from stracing the second client (client B):
>
> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> select(4, [0 3], [], [], NULL
>
> So, the connect() syscall from client B succeeds, although client B
> connection has not yet been accepted by the qmp server (it's still in
> the backlog of the qmp listening socket).
>
> After killing client B and then client A, we see the following when
> stracing the qemu proc:
>
> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> O_RDWR|O_NONBLOCK)
> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> -1 EPIPE (Broken pipe)
> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>
> The qmp server / qemu accepts the connection from client B (who has now
> closed the connection) and tries to write the greeting message to the
> socket fd. This results in write returning an error (EPIPE).
>
> The monitor_flush() function doesn't seem to handle this case (write
> error). Instead, it adds a watch / handler to retry the write operation.
> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> greeting messages for the next client to connect.
>
> The following seems to do the trick.
>
> diff --git a/monitor.c b/monitor.c
> index 845f608..5622f20 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>
> if (len && !mon->mux_out) {
> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> - if (rc == len) {
> - /* all flushed */
> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> + /* all flushed or error */
> QDECREF(mon->outbuf);
> mon->outbuf = qstring_new();
> return;
>
> Comments?
I can reproduce the problem very easily and I can't think of a better
way to fix it.
The right thing to do, I guess, would be to move the error up and kill
the connection if there's any. But last time I checked the chardev
layer did not have an API to kill a connection...
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error
2014-01-24 23:48 ` Luiz Capitulino
@ 2014-01-27 10:30 ` Stratos Psomadakis
2014-01-29 10:46 ` Stratos Psomadakis
0 siblings, 1 reply; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-27 10:30 UTC (permalink / raw)
To: qemu-devel; +Cc: synnefo-devel, Stratos Psomdakis, ganeti-devel, dimara
In case monitor_flush() fails to write the contents of mon->outbuf to
the output device, mon->outbuf is not cleaned up properly. Check the
return code of the qemu_chr_fe_write() function and cleanup the outbuf
if it fails.
References: http://lists.nongnu.org/archive/html/qemu-devel/2014-01/msg02890.html
Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
---
monitor.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/monitor.c b/monitor.c
index 80456fb..cba56bc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
if (len && !mon->mux_out) {
rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
- if (rc == len) {
- /* all flushed */
+ if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
+ /* all flushed or error */
QDECREF(mon->outbuf);
mon->outbuf = qstring_new();
return;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error
2014-01-27 10:30 ` [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error Stratos Psomadakis
@ 2014-01-29 10:46 ` Stratos Psomadakis
2014-01-29 14:12 ` Luiz Capitulino
0 siblings, 1 reply; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-29 10:46 UTC (permalink / raw)
To: qemu-devel
Cc: synnefo-devel, Stratos Psomdakis, ganeti-devel, dimara,
Luiz Capitulino
[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]
On 01/27/2014 12:30 PM, Stratos Psomadakis wrote:
> In case monitor_flush() fails to write the contents of mon->outbuf to
> the output device, mon->outbuf is not cleaned up properly. Check the
> return code of the qemu_chr_fe_write() function and cleanup the outbuf
> if it fails.
>
> References: http://lists.nongnu.org/archive/html/qemu-devel/2014-01/msg02890.html
>
> Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
> Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
> ---
> monitor.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index 80456fb..cba56bc 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>
> if (len && !mon->mux_out) {
> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> - if (rc == len) {
> - /* all flushed */
> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> + /* all flushed or error */
> QDECREF(mon->outbuf);
> mon->outbuf = qstring_new();
> return;
Forgot to cc the maintainer.
--
Stratos Psomadakis
<psomas@grnet.gr>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error
2014-01-29 10:46 ` Stratos Psomadakis
@ 2014-01-29 14:12 ` Luiz Capitulino
0 siblings, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-29 14:12 UTC (permalink / raw)
To: Stratos Psomadakis; +Cc: synnefo-devel, ganeti-devel, qemu-devel, dimara
On Wed, 29 Jan 2014 12:46:04 +0200
Stratos Psomadakis <psomas@grnet.gr> wrote:
> On 01/27/2014 12:30 PM, Stratos Psomadakis wrote:
> > In case monitor_flush() fails to write the contents of mon->outbuf to
> > the output device, mon->outbuf is not cleaned up properly. Check the
> > return code of the qemu_chr_fe_write() function and cleanup the outbuf
> > if it fails.
> >
> > References: http://lists.nongnu.org/archive/html/qemu-devel/2014-01/msg02890.html
> >
> > Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
> > Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
> > ---
> > monitor.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 80456fb..cba56bc 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
> >
> > if (len && !mon->mux_out) {
> > rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> > - if (rc == len) {
> > - /* all flushed */
> > + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> > + /* all flushed or error */
> > QDECREF(mon->outbuf);
> > mon->outbuf = qstring_new();
> > return;
>
> Forgot to cc the maintainer.
I included it in my pull request:
http://lists.nongnu.org/archive/html/qemu-devel/2014-01/msg03859.html
But it seems that I forgot to reply to the patch too.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
@ 2014-01-23 11:23 Fam Zheng
2014-01-23 13:44 ` Luiz Capitulino
0 siblings, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2014-01-23 11:23 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Synnefo Development, Ganeti Development, qemu-devel,
Dimitris Aragiorgis, Luiz Capitulino
Bcc:
Subject: Re: [Qemu-devel] Possible bug in monitor code
Reply-To:
In-Reply-To: <52E0EC4B.7010603@grnet.gr>
On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> On 01/23/2014 05:07 AM, Fam Zheng wrote:
> > On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> >> Hi,
> >>
> >> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> >> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
> >>
> >> 1) Client A connects to qmp socket with socat
> >> 2) Client A gets greeting message {"QMP": {"version": ..}
> >> 3) Client A waits (select on the socket's fd)
> >> 4) Client B tries to connect to the *same* qmp socket with socat
> >> 5) Client B does *NOT* get any greating message
> >> 6) Client B waits (select on the socket's fd)
> >> 7) Client B closes connection (kill socat)
> >> 8) Client A quits too
> >> 9) Client C connects to qmp socket
> >> 10) Client C gets *two* greeting messages!!!
> > Hi Stratos, thank you for debugging and reporting this.
> >
> > I tested this sequence but can't fully reproduce this. What I see is 5) but no
> > 10). Client C acts normally. And your patch below doesn't solve it for me.
>
> Hm, which qemu version (or repo branch / tag) did you use? We did a
> quick scan of the master branch code / commits, but we didn't find
> anything that might fix the issue.
I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
5), B not getting any greeting message.
But I get only one greeting message in step 10), which is a bit different from
what you reported.
And no difference with your patch applied.
Cc'ing Luiz who maintains the monitor code and may have more ideas.
Thanks,
Fam
>
> > To submit a patch, please follow instructions as described in
> > http://wiki.qemu.org/Contribute/SubmitAPatch
> > so it could be picked up by maintainers. Specifically, you need to format your
> > patch email with "git format-patch" and add a "Signed-off-by:" line in your
> > patch email.
>
> Ok. If any dev can confirm that this is a bug (and that the patch below
> is the correct way to fix it) I'll resubmit it properly.
>
> Thanks,
> Stratos
>
> > Thanks,
> >
> > Fam
> >
> >> After some investigation, we traced it down to the monitor_flush()
> >> function in monitor.c. Specifically, when a second client connects to
> >> the qmp (client B), while another one is already using it (client A), we
> >> get the following from stracing the second client (client B):
> >>
> >> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> >> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> >> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> >> select(4, [0 3], [], [], NULL
> >>
> >> So, the connect() syscall from client B succeeds, although client B
> >> connection has not yet been accepted by the qmp server (it's still in
> >> the backlog of the qmp listening socket).
> >>
> >> After killing client B and then client A, we see the following when
> >> stracing the qemu proc:
> >>
> >> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> >> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> >> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> >> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> >> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> >> O_RDWR|O_NONBLOCK)
> >> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> >> -1 EPIPE (Broken pipe)
> >> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
> >>
> >> The qmp server / qemu accepts the connection from client B (who has now
> >> closed the connection) and tries to write the greeting message to the
> >> socket fd. This results in write returning an error (EPIPE).
> >>
> >> The monitor_flush() function doesn't seem to handle this case (write
> >> error). Instead, it adds a watch / handler to retry the write operation.
> >> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> >> greeting messages for the next client to connect.
> >>
> >> The following seems to do the trick.
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index 845f608..5622f20 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
> >>
> >> if (len && !mon->mux_out) {
> >> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> >> - if (rc == len) {
> >> - /* all flushed */
> >> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> >> + /* all flushed or error */
> >> QDECREF(mon->outbuf);
> >> mon->outbuf = qstring_new();
> >> return;
> >>
> >> Comments?
> >>
> >> Thanks,
> >> Stratos
> >>
> >> --
> >> Stratos Psomadakis
> >> <psomas@grnet.gr>
> >>
> >
>
>
> --
> Stratos Psomadakis
> <psomas@grnet.gr>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 11:23 [Qemu-devel] Possible bug in monitor code Fam Zheng
@ 2014-01-23 13:44 ` Luiz Capitulino
2014-01-23 13:54 ` Luiz Capitulino
0 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-23 13:44 UTC (permalink / raw)
To: Fam Zheng
Cc: Synnefo Development, Stratos Psomadakis, Ganeti Development,
qemu-devel, Dimitris Aragiorgis
On Thu, 23 Jan 2014 19:23:51 +0800
Fam Zheng <famz@redhat.com> wrote:
> Bcc:
> Subject: Re: [Qemu-devel] Possible bug in monitor code
> Reply-To:
> In-Reply-To: <52E0EC4B.7010603@grnet.gr>
>
> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> > On 01/23/2014 05:07 AM, Fam Zheng wrote:
> > > On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> > >> Hi,
> > >>
> > >> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> > >> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
> > >>
> > >> 1) Client A connects to qmp socket with socat
> > >> 2) Client A gets greeting message {"QMP": {"version": ..}
> > >> 3) Client A waits (select on the socket's fd)
> > >> 4) Client B tries to connect to the *same* qmp socket with socat
QMP/HMP can only handle a single client per connection, so this is
not supported. What you could do is to create multiple QMP/HMP instances
on the command-line, but this has never been fully tested so we don't
officially support this either (although it may work).
Now, not gracefully failing on step 4 is a real bug here. I think the
best thing to do would be to close client's B connection. Patches are
welcome :)
> > >> 5) Client B does *NOT* get any greating message
> > >> 6) Client B waits (select on the socket's fd)
> > >> 7) Client B closes connection (kill socat)
> > >> 8) Client A quits too
> > >> 9) Client C connects to qmp socket
> > >> 10) Client C gets *two* greeting messages!!!
> > > Hi Stratos, thank you for debugging and reporting this.
> > >
> > > I tested this sequence but can't fully reproduce this. What I see is 5) but no
> > > 10). Client C acts normally. And your patch below doesn't solve it for me.
> >
> > Hm, which qemu version (or repo branch / tag) did you use? We did a
> > quick scan of the master branch code / commits, but we didn't find
> > anything that might fix the issue.
>
> I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
> 5), B not getting any greeting message.
>
> But I get only one greeting message in step 10), which is a bit different from
> what you reported.
>
> And no difference with your patch applied.
>
> Cc'ing Luiz who maintains the monitor code and may have more ideas.
>
> Thanks,
>
> Fam
>
> >
> > > To submit a patch, please follow instructions as described in
> > > http://wiki.qemu.org/Contribute/SubmitAPatch
> > > so it could be picked up by maintainers. Specifically, you need to format your
> > > patch email with "git format-patch" and add a "Signed-off-by:" line in your
> > > patch email.
> >
> > Ok. If any dev can confirm that this is a bug (and that the patch below
> > is the correct way to fix it) I'll resubmit it properly.
> >
> > Thanks,
> > Stratos
> >
> > > Thanks,
> > >
> > > Fam
> > >
> > >> After some investigation, we traced it down to the monitor_flush()
> > >> function in monitor.c. Specifically, when a second client connects to
> > >> the qmp (client B), while another one is already using it (client A), we
> > >> get the following from stracing the second client (client B):
> > >>
> > >> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> > >> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> > >> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> > >> select(4, [0 3], [], [], NULL
> > >>
> > >> So, the connect() syscall from client B succeeds, although client B
> > >> connection has not yet been accepted by the qmp server (it's still in
> > >> the backlog of the qmp listening socket).
> > >>
> > >> After killing client B and then client A, we see the following when
> > >> stracing the qemu proc:
> > >>
> > >> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> > >> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> > >> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> > >> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> > >> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> > >> O_RDWR|O_NONBLOCK)
> > >> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> > >> -1 EPIPE (Broken pipe)
> > >> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
> > >>
> > >> The qmp server / qemu accepts the connection from client B (who has now
> > >> closed the connection) and tries to write the greeting message to the
> > >> socket fd. This results in write returning an error (EPIPE).
> > >>
> > >> The monitor_flush() function doesn't seem to handle this case (write
> > >> error). Instead, it adds a watch / handler to retry the write operation.
> > >> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> > >> greeting messages for the next client to connect.
> > >>
> > >> The following seems to do the trick.
> > >>
> > >> diff --git a/monitor.c b/monitor.c
> > >> index 845f608..5622f20 100644
> > >> --- a/monitor.c
> > >> +++ b/monitor.c
> > >> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
> > >>
> > >> if (len && !mon->mux_out) {
> > >> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> > >> - if (rc == len) {
> > >> - /* all flushed */
> > >> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> > >> + /* all flushed or error */
> > >> QDECREF(mon->outbuf);
> > >> mon->outbuf = qstring_new();
> > >> return;
> > >>
> > >> Comments?
> > >>
> > >> Thanks,
> > >> Stratos
> > >>
> > >> --
> > >> Stratos Psomadakis
> > >> <psomas@grnet.gr>
> > >>
> > >
> >
> >
> > --
> > Stratos Psomadakis
> > <psomas@grnet.gr>
> >
> >
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 13:44 ` Luiz Capitulino
@ 2014-01-23 13:54 ` Luiz Capitulino
2014-01-23 15:33 ` Stratos Psomadakis
0 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-23 13:54 UTC (permalink / raw)
To: Fam Zheng
Cc: Synnefo Development, Stratos Psomadakis, Ganeti Development,
qemu-devel, Dimitris Aragiorgis
On Thu, 23 Jan 2014 08:44:02 -0500
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> On Thu, 23 Jan 2014 19:23:51 +0800
> Fam Zheng <famz@redhat.com> wrote:
>
> > Bcc:
> > Subject: Re: [Qemu-devel] Possible bug in monitor code
> > Reply-To:
> > In-Reply-To: <52E0EC4B.7010603@grnet.gr>
> >
> > On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> > > On 01/23/2014 05:07 AM, Fam Zheng wrote:
> > > > On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> > > >> Hi,
> > > >>
> > > >> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> > > >> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
> > > >>
> > > >> 1) Client A connects to qmp socket with socat
> > > >> 2) Client A gets greeting message {"QMP": {"version": ..}
> > > >> 3) Client A waits (select on the socket's fd)
> > > >> 4) Client B tries to connect to the *same* qmp socket with socat
>
> QMP/HMP can only handle a single client per connection, so this is
> not supported. What you could do is to create multiple QMP/HMP instances
> on the command-line, but this has never been fully tested so we don't
> officially support this either (although it may work).
>
> Now, not gracefully failing on step 4 is a real bug here. I think the
> best thing to do would be to close client's B connection. Patches are
> welcome :)
Thinking about this again, I think this should already be the case
(as I don't believe chardev code is sitting on accept()). So maybe
you triggered a race on chardev code or broke something there.
>
> > > >> 5) Client B does *NOT* get any greating message
> > > >> 6) Client B waits (select on the socket's fd)
> > > >> 7) Client B closes connection (kill socat)
> > > >> 8) Client A quits too
> > > >> 9) Client C connects to qmp socket
> > > >> 10) Client C gets *two* greeting messages!!!
> > > > Hi Stratos, thank you for debugging and reporting this.
> > > >
> > > > I tested this sequence but can't fully reproduce this. What I see is 5) but no
> > > > 10). Client C acts normally. And your patch below doesn't solve it for me.
> > >
> > > Hm, which qemu version (or repo branch / tag) did you use? We did a
> > > quick scan of the master branch code / commits, but we didn't find
> > > anything that might fix the issue.
> >
> > I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
> > 5), B not getting any greeting message.
> >
> > But I get only one greeting message in step 10), which is a bit different from
> > what you reported.
> >
> > And no difference with your patch applied.
> >
> > Cc'ing Luiz who maintains the monitor code and may have more ideas.
> >
> > Thanks,
> >
> > Fam
> >
> > >
> > > > To submit a patch, please follow instructions as described in
> > > > http://wiki.qemu.org/Contribute/SubmitAPatch
> > > > so it could be picked up by maintainers. Specifically, you need to format your
> > > > patch email with "git format-patch" and add a "Signed-off-by:" line in your
> > > > patch email.
> > >
> > > Ok. If any dev can confirm that this is a bug (and that the patch below
> > > is the correct way to fix it) I'll resubmit it properly.
> > >
> > > Thanks,
> > > Stratos
> > >
> > > > Thanks,
> > > >
> > > > Fam
> > > >
> > > >> After some investigation, we traced it down to the monitor_flush()
> > > >> function in monitor.c. Specifically, when a second client connects to
> > > >> the qmp (client B), while another one is already using it (client A), we
> > > >> get the following from stracing the second client (client B):
> > > >>
> > > >> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> > > >> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> > > >> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> > > >> select(4, [0 3], [], [], NULL
> > > >>
> > > >> So, the connect() syscall from client B succeeds, although client B
> > > >> connection has not yet been accepted by the qmp server (it's still in
> > > >> the backlog of the qmp listening socket).
> > > >>
> > > >> After killing client B and then client A, we see the following when
> > > >> stracing the qemu proc:
> > > >>
> > > >> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> > > >> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> > > >> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> > > >> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> > > >> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> > > >> O_RDWR|O_NONBLOCK)
> > > >> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> > > >> -1 EPIPE (Broken pipe)
> > > >> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
> > > >>
> > > >> The qmp server / qemu accepts the connection from client B (who has now
> > > >> closed the connection) and tries to write the greeting message to the
> > > >> socket fd. This results in write returning an error (EPIPE).
> > > >>
> > > >> The monitor_flush() function doesn't seem to handle this case (write
> > > >> error). Instead, it adds a watch / handler to retry the write operation.
> > > >> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> > > >> greeting messages for the next client to connect.
> > > >>
> > > >> The following seems to do the trick.
> > > >>
> > > >> diff --git a/monitor.c b/monitor.c
> > > >> index 845f608..5622f20 100644
> > > >> --- a/monitor.c
> > > >> +++ b/monitor.c
> > > >> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
> > > >>
> > > >> if (len && !mon->mux_out) {
> > > >> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> > > >> - if (rc == len) {
> > > >> - /* all flushed */
> > > >> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> > > >> + /* all flushed or error */
> > > >> QDECREF(mon->outbuf);
> > > >> mon->outbuf = qstring_new();
> > > >> return;
> > > >>
> > > >> Comments?
> > > >>
> > > >> Thanks,
> > > >> Stratos
> > > >>
> > > >> --
> > > >> Stratos Psomadakis
> > > >> <psomas@grnet.gr>
> > > >>
> > > >
> > >
> > >
> > > --
> > > Stratos Psomadakis
> > > <psomas@grnet.gr>
> > >
> > >
> >
> >
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 13:54 ` Luiz Capitulino
@ 2014-01-23 15:33 ` Stratos Psomadakis
2014-01-23 18:28 ` Luiz Capitulino
0 siblings, 1 reply; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-23 15:33 UTC (permalink / raw)
To: Luiz Capitulino
Cc: Synnefo Development, Ganeti Development, Fam Zheng, qemu-devel,
Dimitris Aragiorgis
[-- Attachment #1: Type: text/plain, Size: 7298 bytes --]
On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
> On Thu, 23 Jan 2014 08:44:02 -0500
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
>
>> On Thu, 23 Jan 2014 19:23:51 +0800
>> Fam Zheng <famz@redhat.com> wrote:
>>
>>> Bcc:
>>> Subject: Re: [Qemu-devel] Possible bug in monitor code
>>> Reply-To:
>>> In-Reply-To: <52E0EC4B.7010603@grnet.gr>
>>>
>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
>>>>>> Hi,
>>>>>>
>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>>>>>>
>>>>>> 1) Client A connects to qmp socket with socat
>>>>>> 2) Client A gets greeting message {"QMP": {"version": ..}
>>>>>> 3) Client A waits (select on the socket's fd)
>>>>>> 4) Client B tries to connect to the *same* qmp socket with socat
>> QMP/HMP can only handle a single client per connection, so this is
>> not supported. What you could do is to create multiple QMP/HMP instances
>> on the command-line, but this has never been fully tested so we don't
>> officially support this either (although it may work).
>>
>> Now, not gracefully failing on step 4 is a real bug here. I think the
>> best thing to do would be to close client's B connection. Patches are
>> welcome :)
> Thinking about this again, I think this should already be the case
> (as I don't believe chardev code is sitting on accept()). So maybe
> you triggered a race on chardev code or broke something there.
Yes, the chardev code doesn't accept any more connections until the
currently active connection is closed.
However, when a new client tries to connect (while there's another qmp /
hmp connection active) the kernel, as long as there's room in the socket
backlog, will return to the client that the connection has been
successfully established and will queue the connection to be accepted
later, when qmp actually finishes with its active connection and
re-executes accept().
The problem arises if the new client closes the connection before the
older one is finished. When qmp runs accept() again, it will get a
socket fd for a client who has now closed the connection. At this point,
the monitor control event handler will get triggered and try to write /
flush the greeting message to the client. The client however has closed
its socket so the write will fail with SIGPIPE / EPIPE. Neither the
qemu-char nor the monitor code seem to handle this situation.
Btw, have you tried to reproduce it?
Thanks,
Stratos
>
>>>>>> 5) Client B does *NOT* get any greating message
>>>>>> 6) Client B waits (select on the socket's fd)
>>>>>> 7) Client B closes connection (kill socat)
>>>>>> 8) Client A quits too
>>>>>> 9) Client C connects to qmp socket
>>>>>> 10) Client C gets *two* greeting messages!!!
>>>>> Hi Stratos, thank you for debugging and reporting this.
>>>>>
>>>>> I tested this sequence but can't fully reproduce this. What I see is 5) but no
>>>>> 10). Client C acts normally. And your patch below doesn't solve it for me.
>>>> Hm, which qemu version (or repo branch / tag) did you use? We did a
>>>> quick scan of the master branch code / commits, but we didn't find
>>>> anything that might fix the issue.
>>> I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
>>> 5), B not getting any greeting message.
>>>
>>> But I get only one greeting message in step 10), which is a bit different from
>>> what you reported.
>>>
>>> And no difference with your patch applied.
>>>
>>> Cc'ing Luiz who maintains the monitor code and may have more ideas.
>>>
>>> Thanks,
>>>
>>> Fam
>>>
>>>>> To submit a patch, please follow instructions as described in
>>>>> http://wiki.qemu.org/Contribute/SubmitAPatch
>>>>> so it could be picked up by maintainers. Specifically, you need to format your
>>>>> patch email with "git format-patch" and add a "Signed-off-by:" line in your
>>>>> patch email.
>>>> Ok. If any dev can confirm that this is a bug (and that the patch below
>>>> is the correct way to fix it) I'll resubmit it properly.
>>>>
>>>> Thanks,
>>>> Stratos
>>>>
>>>>> Thanks,
>>>>>
>>>>> Fam
>>>>>
>>>>>> After some investigation, we traced it down to the monitor_flush()
>>>>>> function in monitor.c. Specifically, when a second client connects to
>>>>>> the qmp (client B), while another one is already using it (client A), we
>>>>>> get the following from stracing the second client (client B):
>>>>>>
>>>>>> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
>>>>>> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
>>>>>> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
>>>>>> select(4, [0 3], [], [], NULL
>>>>>>
>>>>>> So, the connect() syscall from client B succeeds, although client B
>>>>>> connection has not yet been accepted by the qmp server (it's still in
>>>>>> the backlog of the qmp listening socket).
>>>>>>
>>>>>> After killing client B and then client A, we see the following when
>>>>>> stracing the qemu proc:
>>>>>>
>>>>>> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
>>>>>> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
>>>>>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>>>>>> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>>>> 22363 fcntl(9, F_GETFL) = 0x802 (flags
>>>>>> O_RDWR|O_NONBLOCK)
>>>>>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
>>>>>> -1 EPIPE (Broken pipe)
>>>>>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>>>>>>
>>>>>> The qmp server / qemu accepts the connection from client B (who has now
>>>>>> closed the connection) and tries to write the greeting message to the
>>>>>> socket fd. This results in write returning an error (EPIPE).
>>>>>>
>>>>>> The monitor_flush() function doesn't seem to handle this case (write
>>>>>> error). Instead, it adds a watch / handler to retry the write operation.
>>>>>> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
>>>>>> greeting messages for the next client to connect.
>>>>>>
>>>>>> The following seems to do the trick.
>>>>>>
>>>>>> diff --git a/monitor.c b/monitor.c
>>>>>> index 845f608..5622f20 100644
>>>>>> --- a/monitor.c
>>>>>> +++ b/monitor.c
>>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>>>>>>
>>>>>> if (len && !mon->mux_out) {
>>>>>> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
>>>>>> - if (rc == len) {
>>>>>> - /* all flushed */
>>>>>> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
>>>>>> + /* all flushed or error */
>>>>>> QDECREF(mon->outbuf);
>>>>>> mon->outbuf = qstring_new();
>>>>>> return;
>>>>>>
>>>>>> Comments?
>>>>>>
>>>>>> Thanks,
>>>>>> Stratos
>>>>>>
>>>>>> --
>>>>>> Stratos Psomadakis
>>>>>> <psomas@grnet.gr>
>>>>>>
>>>>
>>>> --
>>>> Stratos Psomadakis
>>>> <psomas@grnet.gr>
>>>>
>>>>
>>>
--
Stratos Psomadakis
<psomas@grnet.gr>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 15:33 ` Stratos Psomadakis
@ 2014-01-23 18:28 ` Luiz Capitulino
2014-01-24 10:14 ` Stratos Psomadakis
0 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-23 18:28 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Synnefo Development, Ganeti Development, Fam Zheng, qemu-devel,
Dimitris Aragiorgis
On Thu, 23 Jan 2014 17:33:33 +0200
Stratos Psomadakis <psomas@grnet.gr> wrote:
> On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
> > On Thu, 23 Jan 2014 08:44:02 -0500
> > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> >
> >> On Thu, 23 Jan 2014 19:23:51 +0800
> >> Fam Zheng <famz@redhat.com> wrote:
> >>
> >>> Bcc:
> >>> Subject: Re: [Qemu-devel] Possible bug in monitor code
> >>> Reply-To:
> >>> In-Reply-To: <52E0EC4B.7010603@grnet.gr>
> >>>
> >>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> >>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
> >>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> >>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
> >>>>>>
> >>>>>> 1) Client A connects to qmp socket with socat
> >>>>>> 2) Client A gets greeting message {"QMP": {"version": ..}
> >>>>>> 3) Client A waits (select on the socket's fd)
> >>>>>> 4) Client B tries to connect to the *same* qmp socket with socat
> >> QMP/HMP can only handle a single client per connection, so this is
> >> not supported. What you could do is to create multiple QMP/HMP instances
> >> on the command-line, but this has never been fully tested so we don't
> >> officially support this either (although it may work).
> >>
> >> Now, not gracefully failing on step 4 is a real bug here. I think the
> >> best thing to do would be to close client's B connection. Patches are
> >> welcome :)
> > Thinking about this again, I think this should already be the case
> > (as I don't believe chardev code is sitting on accept()). So maybe
> > you triggered a race on chardev code or broke something there.
>
> Yes, the chardev code doesn't accept any more connections until the
> currently active connection is closed.
>
> However, when a new client tries to connect (while there's another qmp /
> hmp connection active) the kernel, as long as there's room in the socket
> backlog, will return to the client that the connection has been
> successfully established and will queue the connection to be accepted
> later, when qmp actually finishes with its active connection and
> re-executes accept().
>
> The problem arises if the new client closes the connection before the
> older one is finished. When qmp runs accept() again, it will get a
> socket fd for a client who has now closed the connection. At this point,
> the monitor control event handler will get triggered and try to write /
> flush the greeting message to the client. The client however has closed
> its socket so the write will fail with SIGPIPE / EPIPE. Neither the
> qemu-char nor the monitor code seem to handle this situation.
>
> Btw, have you tried to reproduce it?
Not yet, I may have some time tomorrow. How reproducible is it for you?
Maybe you could try to reproduce with a different subsystem so that we
can rule out or confirm monitor's involvement? Like -serial?
Another question: have you tried to reproduce with an old qemu version
(say v1.0) to see if this bug always existed? If the bug was introduced
in some recent QEMU version you could try to bisect it.
>
> Thanks,
> Stratos
>
> >
> >>>>>> 5) Client B does *NOT* get any greating message
> >>>>>> 6) Client B waits (select on the socket's fd)
> >>>>>> 7) Client B closes connection (kill socat)
> >>>>>> 8) Client A quits too
> >>>>>> 9) Client C connects to qmp socket
> >>>>>> 10) Client C gets *two* greeting messages!!!
> >>>>> Hi Stratos, thank you for debugging and reporting this.
> >>>>>
> >>>>> I tested this sequence but can't fully reproduce this. What I see is 5) but no
> >>>>> 10). Client C acts normally. And your patch below doesn't solve it for me.
> >>>> Hm, which qemu version (or repo branch / tag) did you use? We did a
> >>>> quick scan of the master branch code / commits, but we didn't find
> >>>> anything that might fix the issue.
> >>> I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
> >>> 5), B not getting any greeting message.
> >>>
> >>> But I get only one greeting message in step 10), which is a bit different from
> >>> what you reported.
> >>>
> >>> And no difference with your patch applied.
> >>>
> >>> Cc'ing Luiz who maintains the monitor code and may have more ideas.
> >>>
> >>> Thanks,
> >>>
> >>> Fam
> >>>
> >>>>> To submit a patch, please follow instructions as described in
> >>>>> http://wiki.qemu.org/Contribute/SubmitAPatch
> >>>>> so it could be picked up by maintainers. Specifically, you need to format your
> >>>>> patch email with "git format-patch" and add a "Signed-off-by:" line in your
> >>>>> patch email.
> >>>> Ok. If any dev can confirm that this is a bug (and that the patch below
> >>>> is the correct way to fix it) I'll resubmit it properly.
> >>>>
> >>>> Thanks,
> >>>> Stratos
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Fam
> >>>>>
> >>>>>> After some investigation, we traced it down to the monitor_flush()
> >>>>>> function in monitor.c. Specifically, when a second client connects to
> >>>>>> the qmp (client B), while another one is already using it (client A), we
> >>>>>> get the following from stracing the second client (client B):
> >>>>>>
> >>>>>> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
> >>>>>> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
> >>>>>> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
> >>>>>> select(4, [0 3], [], [], NULL
> >>>>>>
> >>>>>> So, the connect() syscall from client B succeeds, although client B
> >>>>>> connection has not yet been accepted by the qmp server (it's still in
> >>>>>> the backlog of the qmp listening socket).
> >>>>>>
> >>>>>> After killing client B and then client A, we see the following when
> >>>>>> stracing the qemu proc:
> >>>>>>
> >>>>>> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
> >>>>>> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
> >>>>>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> >>>>>> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> >>>>>> 22363 fcntl(9, F_GETFL) = 0x802 (flags
> >>>>>> O_RDWR|O_NONBLOCK)
> >>>>>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
> >>>>>> -1 EPIPE (Broken pipe)
> >>>>>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
> >>>>>>
> >>>>>> The qmp server / qemu accepts the connection from client B (who has now
> >>>>>> closed the connection) and tries to write the greeting message to the
> >>>>>> socket fd. This results in write returning an error (EPIPE).
> >>>>>>
> >>>>>> The monitor_flush() function doesn't seem to handle this case (write
> >>>>>> error). Instead, it adds a watch / handler to retry the write operation.
> >>>>>> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
> >>>>>> greeting messages for the next client to connect.
> >>>>>>
> >>>>>> The following seems to do the trick.
> >>>>>>
> >>>>>> diff --git a/monitor.c b/monitor.c
> >>>>>> index 845f608..5622f20 100644
> >>>>>> --- a/monitor.c
> >>>>>> +++ b/monitor.c
> >>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
> >>>>>>
> >>>>>> if (len && !mon->mux_out) {
> >>>>>> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
> >>>>>> - if (rc == len) {
> >>>>>> - /* all flushed */
> >>>>>> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
> >>>>>> + /* all flushed or error */
> >>>>>> QDECREF(mon->outbuf);
> >>>>>> mon->outbuf = qstring_new();
> >>>>>> return;
> >>>>>>
> >>>>>> Comments?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Stratos
> >>>>>>
> >>>>>> --
> >>>>>> Stratos Psomadakis
> >>>>>> <psomas@grnet.gr>
> >>>>>>
> >>>>
> >>>> --
> >>>> Stratos Psomadakis
> >>>> <psomas@grnet.gr>
> >>>>
> >>>>
> >>>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-23 18:28 ` Luiz Capitulino
@ 2014-01-24 10:14 ` Stratos Psomadakis
2014-01-24 10:52 ` Apollon Oikonomopoulos
2014-01-24 14:14 ` Luiz Capitulino
0 siblings, 2 replies; 15+ messages in thread
From: Stratos Psomadakis @ 2014-01-24 10:14 UTC (permalink / raw)
To: Luiz Capitulino
Cc: Synnefo Development, Ganeti Development, Fam Zheng, qemu-devel,
Dimitris Aragiorgis
[-- Attachment #1: Type: text/plain, Size: 8992 bytes --]
On 01/23/2014 08:28 PM, Luiz Capitulino wrote:
> On Thu, 23 Jan 2014 17:33:33 +0200
> Stratos Psomadakis <psomas@grnet.gr> wrote:
>
>> On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
>>> On Thu, 23 Jan 2014 08:44:02 -0500
>>> Luiz Capitulino <lcapitulino@redhat.com> wrote:
>>>
>>>> On Thu, 23 Jan 2014 19:23:51 +0800
>>>> Fam Zheng <famz@redhat.com> wrote:
>>>>
>>>>> Bcc:
>>>>> Subject: Re: [Qemu-devel] Possible bug in monitor code
>>>>> Reply-To:
>>>>> In-Reply-To: <52E0EC4B.7010603@grnet.gr>
>>>>>
>>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
>>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
>>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
>>>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
>>>>>>>>
>>>>>>>> 1) Client A connects to qmp socket with socat
>>>>>>>> 2) Client A gets greeting message {"QMP": {"version": ..}
>>>>>>>> 3) Client A waits (select on the socket's fd)
>>>>>>>> 4) Client B tries to connect to the *same* qmp socket with socat
>>>> QMP/HMP can only handle a single client per connection, so this is
>>>> not supported. What you could do is to create multiple QMP/HMP instances
>>>> on the command-line, but this has never been fully tested so we don't
>>>> officially support this either (although it may work).
>>>>
>>>> Now, not gracefully failing on step 4 is a real bug here. I think the
>>>> best thing to do would be to close client's B connection. Patches are
>>>> welcome :)
>>> Thinking about this again, I think this should already be the case
>>> (as I don't believe chardev code is sitting on accept()). So maybe
>>> you triggered a race on chardev code or broke something there.
>> Yes, the chardev code doesn't accept any more connections until the
>> currently active connection is closed.
>>
>> However, when a new client tries to connect (while there's another qmp /
>> hmp connection active) the kernel, as long as there's room in the socket
>> backlog, will return to the client that the connection has been
>> successfully established and will queue the connection to be accepted
>> later, when qmp actually finishes with its active connection and
>> re-executes accept().
>>
>> The problem arises if the new client closes the connection before the
>> older one is finished. When qmp runs accept() again, it will get a
>> socket fd for a client who has now closed the connection. At this point,
>> the monitor control event handler will get triggered and try to write /
>> flush the greeting message to the client. The client however has closed
>> its socket so the write will fail with SIGPIPE / EPIPE. Neither the
>> qemu-char nor the monitor code seem to handle this situation.
>>
>> Btw, have you tried to reproduce it?
> Not yet, I may have some time tomorrow. How reproducible is it for you?
We can trigger it (by following the steps described in the first mail)
consistently.
> Another question: have you tried to reproduce with an old qemu version
> (say v1.0) to see if this bug always existed? If the bug was introduced
> in some recent QEMU version you could try to bisect it.
v1.1 is not affected. I checked the code and it seems the monitor code
has been refactored since v1.1.
> Maybe you could try to reproduce with a different subsystem so that we
> can rule out or confirm monitor's involvement? Like -serial?
It's actually a fault of the monitor_flush() function. As far as I can
understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle
all of the return codes / error cases properly (as I described in a
previous mail). If you check the function, you'll see that the final
case (where it set ups a watch / callback) always assumes an EAGAIN /
EWOULDBLOCK error.
If you can verify / confirm that this is the case and that the patch
sent resolves the issue in a sane / correct way, I'll resubmit it
properly (with git-format-patch, a git log msg etc).
Thanks,
Stratos
>
>> Thanks,
>> Stratos
>>
>>>>>>>> 5) Client B does *NOT* get any greating message
>>>>>>>> 6) Client B waits (select on the socket's fd)
>>>>>>>> 7) Client B closes connection (kill socat)
>>>>>>>> 8) Client A quits too
>>>>>>>> 9) Client C connects to qmp socket
>>>>>>>> 10) Client C gets *two* greeting messages!!!
>>>>>>> Hi Stratos, thank you for debugging and reporting this.
>>>>>>>
>>>>>>> I tested this sequence but can't fully reproduce this. What I see is 5) but no
>>>>>>> 10). Client C acts normally. And your patch below doesn't solve it for me.
>>>>>> Hm, which qemu version (or repo branch / tag) did you use? We did a
>>>>>> quick scan of the master branch code / commits, but we didn't find
>>>>>> anything that might fix the issue.
>>>>> I tried on qemu.git master, and also 1.7. I think it is a bug: in my test, step
>>>>> 5), B not getting any greeting message.
>>>>>
>>>>> But I get only one greeting message in step 10), which is a bit different from
>>>>> what you reported.
>>>>>
>>>>> And no difference with your patch applied.
>>>>>
>>>>> Cc'ing Luiz who maintains the monitor code and may have more ideas.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fam
>>>>>
>>>>>>> To submit a patch, please follow instructions as described in
>>>>>>> http://wiki.qemu.org/Contribute/SubmitAPatch
>>>>>>> so it could be picked up by maintainers. Specifically, you need to format your
>>>>>>> patch email with "git format-patch" and add a "Signed-off-by:" line in your
>>>>>>> patch email.
>>>>>> Ok. If any dev can confirm that this is a bug (and that the patch below
>>>>>> is the correct way to fix it) I'll resubmit it properly.
>>>>>>
>>>>>> Thanks,
>>>>>> Stratos
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fam
>>>>>>>
>>>>>>>> After some investigation, we traced it down to the monitor_flush()
>>>>>>>> function in monitor.c. Specifically, when a second client connects to
>>>>>>>> the qmp (client B), while another one is already using it (client A), we
>>>>>>>> get the following from stracing the second client (client B):
>>>>>>>>
>>>>>>>> connect(3, {sa_family=AF_FILE, path="foo.mon"}, 9) = 0
>>>>>>>> getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
>>>>>>>> select(4, [0 3], [1 3], [], NULL) = 2 (out [1 3])
>>>>>>>> select(4, [0 3], [], [], NULL
>>>>>>>>
>>>>>>>> So, the connect() syscall from client B succeeds, although client B
>>>>>>>> connection has not yet been accepted by the qmp server (it's still in
>>>>>>>> the backlog of the qmp listening socket).
>>>>>>>>
>>>>>>>> After killing client B and then client A, we see the following when
>>>>>>>> stracing the qemu proc:
>>>>>>>>
>>>>>>>> 22363 accept4(6, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 9
>>>>>>>> 22363 fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
>>>>>>>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>>>>>>>> 22363 fstat(9, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>>>>>> 22363 fcntl(9, F_GETFL) = 0x802 (flags
>>>>>>>> O_RDWR|O_NONBLOCK)
>>>>>>>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 127) =
>>>>>>>> -1 EPIPE (Broken pipe)
>>>>>>>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) ---
>>>>>>>>
>>>>>>>> The qmp server / qemu accepts the connection from client B (who has now
>>>>>>>> closed the connection) and tries to write the greeting message to the
>>>>>>>> socket fd. This results in write returning an error (EPIPE).
>>>>>>>>
>>>>>>>> The monitor_flush() function doesn't seem to handle this case (write
>>>>>>>> error). Instead, it adds a watch / handler to retry the write operation.
>>>>>>>> Thus, mon->outbuf is not cleaned up properly, which results in duplicate
>>>>>>>> greeting messages for the next client to connect.
>>>>>>>>
>>>>>>>> The following seems to do the trick.
>>>>>>>>
>>>>>>>> diff --git a/monitor.c b/monitor.c
>>>>>>>> index 845f608..5622f20 100644
>>>>>>>> --- a/monitor.c
>>>>>>>> +++ b/monitor.c
>>>>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon)
>>>>>>>>
>>>>>>>> if (len && !mon->mux_out) {
>>>>>>>> rc = qemu_chr_fe_write(mon->chr, (const uint8_t *) buf, len);
>>>>>>>> - if (rc == len) {
>>>>>>>> - /* all flushed */
>>>>>>>> + if ((rc < 0 && errno != EAGAIN) || (rc == len)) {
>>>>>>>> + /* all flushed or error */
>>>>>>>> QDECREF(mon->outbuf);
>>>>>>>> mon->outbuf = qstring_new();
>>>>>>>> return;
>>>>>>>>
>>>>>>>> Comments?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Stratos
>>>>>>>>
>>>>>>>> --
>>>>>>>> Stratos Psomadakis
>>>>>>>> <psomas@grnet.gr>
>>>>>>>>
>>>>>> --
>>>>>> Stratos Psomadakis
>>>>>> <psomas@grnet.gr>
>>>>>>
>>>>>>
>>
--
Stratos Psomadakis
<psomas@grnet.gr>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-24 10:14 ` Stratos Psomadakis
@ 2014-01-24 10:52 ` Apollon Oikonomopoulos
2014-01-24 14:14 ` Luiz Capitulino
1 sibling, 0 replies; 15+ messages in thread
From: Apollon Oikonomopoulos @ 2014-01-24 10:52 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Fam Zheng, qemu-devel, Luiz Capitulino, Synnefo Development,
Dimitris Aragiorgis, Ganeti Development
[-- Attachment #1: Type: text/plain, Size: 2255 bytes --]
Hi all,
On 12:14 Fri 24 Jan , Stratos Psomadakis wrote:
> On 01/23/2014 08:28 PM, Luiz Capitulino wrote:
> > Not yet, I may have some time tomorrow. How reproducible is it for
> > you?
>
> We can trigger it (by following the steps described in the first mail)
> consistently.
>
> > Another question: have you tried to reproduce with an old qemu version
> > (say v1.0) to see if this bug always existed? If the bug was introduced
> > in some recent QEMU version you could try to bisect it.
>
> v1.1 is not affected. I checked the code and it seems the monitor code
> has been refactored since v1.1.
>
> > Maybe you could try to reproduce with a different subsystem so that we
> > can rule out or confirm monitor's involvement? Like -serial?
>
> It's actually a fault of the monitor_flush() function. As far as I can
> understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle
> all of the return codes / error cases properly (as I described in a
> previous mail). If you check the function, you'll see that the final
> case (where it set ups a watch / callback) always assumes an EAGAIN /
> EWOULDBLOCK error.
>
> If you can verify / confirm that this is the case and that the patch
> sent resolves the issue in a sane / correct way, I'll resubmit it
> properly (with git-format-patch, a git log msg etc).
Please see the attached testcase (python script) that programmatically
reproduces this. Sample output with qemu 1.7.0:
------------------------------------------------------------------------
$ ./test-qmp.py
Spawning qemu
Connecting client 1
Monitor output:
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 1}, "package": " (Debian 1.7.0+dfsg-2)"}, "capabilities": []}}
Connecting client 2
Monitor output:
(timeout, disconnecting)
Disconnecting client 1
Connecting client 3
Monitor output
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 1}, "package": " (Debian 1.7.0+dfsg-2)"}, "capabilities": []}}
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 1}, "package": " (Debian 1.7.0+dfsg-2)"}, "capabilities": []}}
Terminating qemu
qemu: terminating on signal 15 from pid 11269
------------------------------------------------------------------------
Regards,
Apollon
[-- Attachment #2: test-qmp.py --]
[-- Type: text/x-python, Size: 1182 bytes --]
#!/usr/bin/python
import os
import socket
import tempfile
import subprocess
from time import sleep
sock_path = tempfile.mktemp()
print "Spawning qemu"
print
qemu = subprocess.Popen(["/usr/bin/qemu", "-chardev",
"socket,id=mon0,path=%s,server,nowait" % sock_path,
"-mon", "chardev=mon0,mode=control",
"-display", "none"])
# Wait for qemu to initialize
while not os.path.exists(sock_path):
sleep(0.1)
print "Connecting client 1\n"
cl1 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cl1.connect(sock_path)
print "Monitor output:"
print cl1.recv(1024)
print
print "Connecting client 2\n"
cl2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cl2.settimeout(1)
try:
cl2.connect(sock_path)
print "Monitor output:"
print cl2.recv(1024)
except socket.timeout:
print "(timeout, disconnecting)\n"
cl2.close()
print "Disconnecting client 1\n"
cl1.close()
print "Connecting client 3\n"
cl3 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cl3.connect(sock_path)
print "Monitor output"
print cl3.recv(1024)
cl3.close()
print "Terminating qemu"
qemu.terminate()
qemu.wait()
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Possible bug in monitor code
2014-01-24 10:14 ` Stratos Psomadakis
2014-01-24 10:52 ` Apollon Oikonomopoulos
@ 2014-01-24 14:14 ` Luiz Capitulino
1 sibling, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2014-01-24 14:14 UTC (permalink / raw)
To: Stratos Psomadakis
Cc: Synnefo Development, Ganeti Development, Fam Zheng, qemu-devel,
Dimitris Aragiorgis
On Fri, 24 Jan 2014 12:14:12 +0200
Stratos Psomadakis <psomas@grnet.gr> wrote:
> On 01/23/2014 08:28 PM, Luiz Capitulino wrote:
> > On Thu, 23 Jan 2014 17:33:33 +0200
> > Stratos Psomadakis <psomas@grnet.gr> wrote:
> >
> >> On 01/23/2014 03:54 PM, Luiz Capitulino wrote:
> >>> On Thu, 23 Jan 2014 08:44:02 -0500
> >>> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> >>>
> >>>> On Thu, 23 Jan 2014 19:23:51 +0800
> >>>> Fam Zheng <famz@redhat.com> wrote:
> >>>>
> >>>>> Bcc:
> >>>>> Subject: Re: [Qemu-devel] Possible bug in monitor code
> >>>>> Reply-To:
> >>>>> In-Reply-To: <52E0EC4B.7010603@grnet.gr>
> >>>>>
> >>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote:
> >>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote:
> >>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) behavior
> >>>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce the issue:
> >>>>>>>>
> >>>>>>>> 1) Client A connects to qmp socket with socat
> >>>>>>>> 2) Client A gets greeting message {"QMP": {"version": ..}
> >>>>>>>> 3) Client A waits (select on the socket's fd)
> >>>>>>>> 4) Client B tries to connect to the *same* qmp socket with socat
> >>>> QMP/HMP can only handle a single client per connection, so this is
> >>>> not supported. What you could do is to create multiple QMP/HMP instances
> >>>> on the command-line, but this has never been fully tested so we don't
> >>>> officially support this either (although it may work).
> >>>>
> >>>> Now, not gracefully failing on step 4 is a real bug here. I think the
> >>>> best thing to do would be to close client's B connection. Patches are
> >>>> welcome :)
> >>> Thinking about this again, I think this should already be the case
> >>> (as I don't believe chardev code is sitting on accept()). So maybe
> >>> you triggered a race on chardev code or broke something there.
> >> Yes, the chardev code doesn't accept any more connections until the
> >> currently active connection is closed.
> >>
> >> However, when a new client tries to connect (while there's another qmp /
> >> hmp connection active) the kernel, as long as there's room in the socket
> >> backlog, will return to the client that the connection has been
> >> successfully established and will queue the connection to be accepted
> >> later, when qmp actually finishes with its active connection and
> >> re-executes accept().
> >>
> >> The problem arises if the new client closes the connection before the
> >> older one is finished. When qmp runs accept() again, it will get a
> >> socket fd for a client who has now closed the connection. At this point,
> >> the monitor control event handler will get triggered and try to write /
> >> flush the greeting message to the client. The client however has closed
> >> its socket so the write will fail with SIGPIPE / EPIPE. Neither the
> >> qemu-char nor the monitor code seem to handle this situation.
> >>
> >> Btw, have you tried to reproduce it?
> > Not yet, I may have some time tomorrow. How reproducible is it for you?
>
> We can trigger it (by following the steps described in the first mail)
> consistently.
>
> > Another question: have you tried to reproduce with an old qemu version
> > (say v1.0) to see if this bug always existed? If the bug was introduced
> > in some recent QEMU version you could try to bisect it.
>
> v1.1 is not affected. I checked the code and it seems the monitor code
> has been refactored since v1.1.
>
> > Maybe you could try to reproduce with a different subsystem so that we
> > can rule out or confirm monitor's involvement? Like -serial?
>
> It's actually a fault of the monitor_flush() function. As far as I can
> understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle
> all of the return codes / error cases properly (as I described in a
> previous mail). If you check the function, you'll see that the final
> case (where it set ups a watch / callback) always assumes an EAGAIN /
> EWOULDBLOCK error.
>
> If you can verify / confirm that this is the case and that the patch
> sent resolves the issue in a sane / correct way, I'll resubmit it
> properly (with git-format-patch, a git log msg etc).
I'll check that later today.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-01-29 14:13 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-22 15:53 [Qemu-devel] Possible bug in monitor code Stratos Psomadakis
2014-01-23 3:07 ` Fam Zheng
2014-01-23 10:17 ` Stratos Psomadakis
2014-01-24 23:48 ` Luiz Capitulino
2014-01-27 10:30 ` [Qemu-devel] [PATCH] monitor: Cleanup mon->outbuf on write error Stratos Psomadakis
2014-01-29 10:46 ` Stratos Psomadakis
2014-01-29 14:12 ` Luiz Capitulino
-- strict thread matches above, loose matches on Subject: below --
2014-01-23 11:23 [Qemu-devel] Possible bug in monitor code Fam Zheng
2014-01-23 13:44 ` Luiz Capitulino
2014-01-23 13:54 ` Luiz Capitulino
2014-01-23 15:33 ` Stratos Psomadakis
2014-01-23 18:28 ` Luiz Capitulino
2014-01-24 10:14 ` Stratos Psomadakis
2014-01-24 10:52 ` Apollon Oikonomopoulos
2014-01-24 14:14 ` Luiz Capitulino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).