From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60706) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W6dmm-0000S1-Iu for qemu-devel@nongnu.org; Fri, 24 Jan 2014 05:14:58 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W6dmg-0002ts-Gy for qemu-devel@nongnu.org; Fri, 24 Jan 2014 05:14:52 -0500 Received: from averel.grnet-hq.admin.grnet.gr ([195.251.29.3]:28524) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W6dmg-0002qr-4w for qemu-devel@nongnu.org; Fri, 24 Jan 2014 05:14:46 -0500 Message-ID: <52E23CF4.7030302@grnet.gr> Date: Fri, 24 Jan 2014 12:14:12 +0200 From: Stratos Psomadakis MIME-Version: 1.0 References: <20140123112351.GA21477@T430.redhat.com> <20140123084402.3c82b2e9@redhat.com> <20140123085454.2c0fd29a@redhat.com> <52E1364D.2010807@grnet.gr> <20140123132858.75ce8507@redhat.com> In-Reply-To: <20140123132858.75ce8507@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ovHLPJvBhAkQVb2p4UJ8ckfJH7PBkGUUW" Subject: Re: [Qemu-devel] Possible bug in monitor code List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Luiz Capitulino Cc: Synnefo Development , Ganeti Development , Fam Zheng , qemu-devel@nongnu.org, Dimitris Aragiorgis This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --ovHLPJvBhAkQVb2p4UJ8ckfJH7PBkGUUW Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 01/23/2014 08:28 PM, Luiz Capitulino wrote: > On Thu, 23 Jan 2014 17:33:33 +0200 > Stratos Psomadakis wrote: > >> On 01/23/2014 03:54 PM, Luiz Capitulino wrote: >>> On Thu, 23 Jan 2014 08:44:02 -0500 >>> Luiz Capitulino wrote: >>> >>>> On Thu, 23 Jan 2014 19:23:51 +0800 >>>> Fam Zheng wrote: >>>> >>>>> Bcc:=20 >>>>> Subject: Re: [Qemu-devel] Possible bug in monitor code >>>>> Reply-To:=20 >>>>> In-Reply-To: <52E0EC4B.7010603@grnet.gr> >>>>> >>>>> On Thu, 01/23 12:17, Stratos Psomadakis wrote: >>>>>> On 01/23/2014 05:07 AM, Fam Zheng wrote: >>>>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) = behavior >>>>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce= the issue: >>>>>>>> >>>>>>>> 1) Client A connects to qmp socket with socat >>>>>>>> 2) Client A gets greeting message {"QMP": {"version": ..} >>>>>>>> 3) Client A waits (select on the socket's fd) >>>>>>>> 4) Client B tries to connect to the *same* qmp socket with s= ocat >>>> QMP/HMP can only handle a single client per connection, so this is >>>> not supported. What you could do is to create multiple QMP/HMP insta= nces >>>> on the command-line, but this has never been fully tested so we don'= t >>>> officially support this either (although it may work). >>>> >>>> Now, not gracefully failing on step 4 is a real bug here. I think th= e >>>> best thing to do would be to close client's B connection. Patches ar= e >>>> welcome :) >>> Thinking about this again, I think this should already be the case >>> (as I don't believe chardev code is sitting on accept()). So maybe >>> you triggered a race on chardev code or broke something there. >> Yes, the chardev code doesn't accept any more connections until the >> currently active connection is closed. >> >> However, when a new client tries to connect (while there's another qmp= / >> hmp connection active) the kernel, as long as there's room in the sock= et >> backlog, will return to the client that the connection has been >> successfully established and will queue the connection to be accepted >> later, when qmp actually finishes with its active connection and >> re-executes accept(). >> >> The problem arises if the new client closes the connection before the >> older one is finished. When qmp runs accept() again, it will get a >> socket fd for a client who has now closed the connection. At this poin= t, >> the monitor control event handler will get triggered and try to write = / >> flush the greeting message to the client. The client however has close= d >> its socket so the write will fail with SIGPIPE / EPIPE. Neither the >> qemu-char nor the monitor code seem to handle this situation. >> >> Btw, have you tried to reproduce it? > Not yet, I may have some time tomorrow. How reproducible is it for you?= We can trigger it (by following the steps described in the first mail) consistently. > Another question: have you tried to reproduce with an old qemu version > (say v1.0) to see if this bug always existed? If the bug was introduced= > in some recent QEMU version you could try to bisect it. v1.1 is not affected. I checked the code and it seems the monitor code has been refactored since v1.1. > Maybe you could try to reproduce with a different subsystem so that we > can rule out or confirm monitor's involvement? Like -serial? It's actually a fault of the monitor_flush() function. As far as I can understand, monitor_flush() calls qemu_chr_fe_write() and doesn't handle all of the return codes / error cases properly (as I described in a previous mail). If you check the function, you'll see that the final case (where it set ups a watch / callback) always assumes an EAGAIN / EWOULDBLOCK error. If you can verify / confirm that this is the case and that the patch sent resolves the issue in a sane / correct way, I'll resubmit it properly (with git-format-patch, a git log msg etc). Thanks, Stratos > >> Thanks, >> Stratos >> >>>>>>>> 5) Client B does *NOT* get any greating message >>>>>>>> 6) Client B waits (select on the socket's fd) >>>>>>>> 7) Client B closes connection (kill socat) >>>>>>>> 8) Client A quits too >>>>>>>> 9) Client C connects to qmp socket >>>>>>>> 10) Client C gets *two* greeting messages!!! >>>>>>> Hi Stratos, thank you for debugging and reporting this. >>>>>>> >>>>>>> I tested this sequence but can't fully reproduce this. What I see= is 5) but no >>>>>>> 10). Client C acts normally. And your patch below doesn't solve i= t for me. >>>>>> Hm, which qemu version (or repo branch / tag) did you use? We did = a >>>>>> quick scan of the master branch code / commits, but we didn't find= >>>>>> anything that might fix the issue. >>>>> I tried on qemu.git master, and also 1.7. I think it is a bug: in m= y test, step >>>>> 5), B not getting any greeting message. >>>>> >>>>> But I get only one greeting message in step 10), which is a bit dif= ferent from >>>>> what you reported. >>>>> >>>>> And no difference with your patch applied. >>>>> >>>>> Cc'ing Luiz who maintains the monitor code and may have more ideas.= >>>>> >>>>> Thanks, >>>>> >>>>> Fam >>>>> >>>>>>> To submit a patch, please follow instructions as described in >>>>>>> http://wiki.qemu.org/Contribute/SubmitAPatch >>>>>>> so it could be picked up by maintainers. Specifically, you need t= o format your >>>>>>> patch email with "git format-patch" and add a "Signed-off-by:" li= ne in your >>>>>>> patch email. >>>>>> Ok. If any dev can confirm that this is a bug (and that the patch = below >>>>>> is the correct way to fix it) I'll resubmit it properly. >>>>>> >>>>>> Thanks, >>>>>> Stratos >>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Fam >>>>>>> >>>>>>>> After some investigation, we traced it down to the monitor_flush= () >>>>>>>> function in monitor.c. Specifically, when a second client connec= ts to >>>>>>>> the qmp (client B), while another one is already using it (clien= t A), we >>>>>>>> get the following from stracing the second client (client B): >>>>>>>> >>>>>>>> connect(3, {sa_family=3DAF_FILE, path=3D"foo.mon"}, 9) =3D 0= >>>>>>>> getsockname(3, {sa_family=3DAF_FILE, NULL}, [2]) =3D 0 >>>>>>>> select(4, [0 3], [1 3], [], NULL) =3D 2 (out [1 3]) >>>>>>>> select(4, [0 3], [], [], NULL >>>>>>>> >>>>>>>> So, the connect() syscall from client B succeeds, although clien= t B >>>>>>>> connection has not yet been accepted by the qmp server (it's sti= ll in >>>>>>>> the backlog of the qmp listening socket). >>>>>>>> >>>>>>>> After killing client B and then client A, we see the following w= hen >>>>>>>> stracing the qemu proc: >>>>>>>> >>>>>>>> 22363 accept4(6, {sa_family=3DAF_FILE, NULL}, [2], SOCK_CLOE= XEC) =3D 9 >>>>>>>> 22363 fcntl(9, F_GETFL) =3D 0x2 (flags O_RDW= R) >>>>>>>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) =3D 0 >>>>>>>> 22363 fstat(9, {st_mode=3DS_IFSOCK|0777, st_size=3D0, ...}) = =3D 0 >>>>>>>> 22363 fcntl(9, F_GETFL) =3D 0x802 (flags >>>>>>>> O_RDWR|O_NONBLOCK) >>>>>>>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"...,= 127) =3D >>>>>>>> -1 EPIPE (Broken pipe) >>>>>>>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) --- >>>>>>>> >>>>>>>> The qmp server / qemu accepts the connection from client B (who = has now >>>>>>>> closed the connection) and tries to write the greeting message t= o the >>>>>>>> socket fd. This results in write returning an error (EPIPE). >>>>>>>> >>>>>>>> The monitor_flush() function doesn't seem to handle this case (w= rite >>>>>>>> error). Instead, it adds a watch / handler to retry the write op= eration. >>>>>>>> Thus, mon->outbuf is not cleaned up properly, which results in d= uplicate >>>>>>>> greeting messages for the next client to connect. >>>>>>>> >>>>>>>> The following seems to do the trick. >>>>>>>> >>>>>>>> diff --git a/monitor.c b/monitor.c >>>>>>>> index 845f608..5622f20 100644 >>>>>>>> --- a/monitor.c >>>>>>>> +++ b/monitor.c >>>>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon) >>>>>>>> =20 >>>>>>>> if (len && !mon->mux_out) { >>>>>>>> rc =3D qemu_chr_fe_write(mon->chr, (const uint8_t *) bu= f, len); >>>>>>>> - if (rc =3D=3D len) { >>>>>>>> - /* all flushed */ >>>>>>>> + if ((rc < 0 && errno !=3D EAGAIN) || (rc =3D=3D len)) {= >>>>>>>> + /* all flushed or error */ >>>>>>>> QDECREF(mon->outbuf); >>>>>>>> mon->outbuf =3D qstring_new(); >>>>>>>> return; >>>>>>>> >>>>>>>> Comments? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Stratos >>>>>>>> >>>>>>>> --=20 >>>>>>>> Stratos Psomadakis >>>>>>>> >>>>>>>> >>>>>> --=20 >>>>>> Stratos Psomadakis >>>>>> >>>>>> >>>>>> >> --=20 Stratos Psomadakis --ovHLPJvBhAkQVb2p4UJ8ckfJH7PBkGUUW Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS4jz5AAoJEO/NY8gC4r+pJ3YP/1c9BV7MUx+QQRw3q+QVXRxS HycFqM5q37XIFQw4BTVOgl4d1mq0gK17UtwbQWOcu/qLa8HdCaFqd2ArfRHEhHgD Efm9NDtNDljrJV1KF5dI7m74KyWuVMBANLT/CfOnLcdvaDgnEZElzKIWnZ92EPaL +1CWlQxX2hPM80zAMHUJZ6ngB22wXvNm95yoIjmPUYP4rC8Guwj6OqB5QnWEYb3o cRAi9yn0SrvhkbAkpxl9rc1OgYnHuQ5mJM+cPAdlAC81q88kEH0fi6XsgqenU/zg jVHL6KeaZ1KE6bb9ZXs/WjNN7rsX3CISfth2VhGLN5qLjnAhiMjVaUJWRYCv4t/W +1+tP77GLebV3KmcPZsTHJnuzbHUipI7Gx7lI/t9/n+tnSJZL7n49joApF+Ie/1Q 6I7HIrGQmsuqqDxuda6y6tm95dj0HFlpS7HCAJnA/PZ86cXlb6eBsx1u/DCkTBbc YT4epKLsw/FNI9754U7+YDDo5ps8hoHcgsc80xl1LBfvfvLYGfEWZKc///PkwpBb h7ILGy1x3wHkT5TRL/TsvtJrFnRZAoTLFIPPskweaUc+cx2gU+2dcexvr62uAIt0 CrdfQQ+vqUNXet2FCEbrQZlOBFe8LqXNOvBsiC119M9iPyiuk7/qjqPbgT/4ph19 pYr7flrzmPPX4jJvyq23 =Bs3G -----END PGP SIGNATURE----- --ovHLPJvBhAkQVb2p4UJ8ckfJH7PBkGUUW--