From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40244) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W6MHq-0005C6-Pp for qemu-devel@nongnu.org; Thu, 23 Jan 2014 10:33:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W6MHk-0001LG-HA for qemu-devel@nongnu.org; Thu, 23 Jan 2014 10:33:46 -0500 Received: from averel.grnet-hq.admin.grnet.gr ([195.251.29.3]:7724) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W6MHk-0001Ks-5e for qemu-devel@nongnu.org; Thu, 23 Jan 2014 10:33:40 -0500 Message-ID: <52E1364D.2010807@grnet.gr> Date: Thu, 23 Jan 2014 17:33:33 +0200 From: Stratos Psomadakis MIME-Version: 1.0 References: <20140123112351.GA21477@T430.redhat.com> <20140123084402.3c82b2e9@redhat.com> <20140123085454.2c0fd29a@redhat.com> In-Reply-To: <20140123085454.2c0fd29a@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TwWb23LD6cOVi3rBep8mK0KSTpARVLfeh" Subject: Re: [Qemu-devel] Possible bug in monitor code List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Luiz Capitulino Cc: Synnefo Development , Ganeti Development , Fam Zheng , qemu-devel@nongnu.org, Dimitris Aragiorgis This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --TwWb23LD6cOVi3rBep8mK0KSTpARVLfeh Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 01/23/2014 03:54 PM, Luiz Capitulino wrote: > On Thu, 23 Jan 2014 08:44:02 -0500 > Luiz Capitulino wrote: > >> On Thu, 23 Jan 2014 19:23:51 +0800 >> Fam Zheng wrote: >> >>> Bcc:=20 >>> Subject: Re: [Qemu-devel] Possible bug in monitor code >>> Reply-To:=20 >>> In-Reply-To: <52E0EC4B.7010603@grnet.gr> >>> >>> On Thu, 01/23 12:17, Stratos Psomadakis wrote: >>>> On 01/23/2014 05:07 AM, Fam Zheng wrote: >>>>> On Wed, 01/22 17:53, Stratos Psomadakis wrote: >>>>>> Hi, >>>>>> >>>>>> we've encountered a weird issue regarding monitor (qmp and hmp) be= havior >>>>>> with qemu-1.7 (and qemu-1.5). The following steps will reproduce t= he issue: >>>>>> >>>>>> 1) Client A connects to qmp socket with socat >>>>>> 2) Client A gets greeting message {"QMP": {"version": ..} >>>>>> 3) Client A waits (select on the socket's fd) >>>>>> 4) Client B tries to connect to the *same* qmp socket with soc= at >> QMP/HMP can only handle a single client per connection, so this is >> not supported. What you could do is to create multiple QMP/HMP instanc= es >> on the command-line, but this has never been fully tested so we don't >> officially support this either (although it may work). >> >> Now, not gracefully failing on step 4 is a real bug here. I think the >> best thing to do would be to close client's B connection. Patches are >> welcome :) > Thinking about this again, I think this should already be the case > (as I don't believe chardev code is sitting on accept()). So maybe > you triggered a race on chardev code or broke something there. Yes, the chardev code doesn't accept any more connections until the currently active connection is closed. However, when a new client tries to connect (while there's another qmp / hmp connection active) the kernel, as long as there's room in the socket backlog, will return to the client that the connection has been successfully established and will queue the connection to be accepted later, when qmp actually finishes with its active connection and re-executes accept(). The problem arises if the new client closes the connection before the older one is finished. When qmp runs accept() again, it will get a socket fd for a client who has now closed the connection. At this point, the monitor control event handler will get triggered and try to write / flush the greeting message to the client. The client however has closed its socket so the write will fail with SIGPIPE / EPIPE. Neither the qemu-char nor the monitor code seem to handle this situation. Btw, have you tried to reproduce it? Thanks, Stratos > >>>>>> 5) Client B does *NOT* get any greating message >>>>>> 6) Client B waits (select on the socket's fd) >>>>>> 7) Client B closes connection (kill socat) >>>>>> 8) Client A quits too >>>>>> 9) Client C connects to qmp socket >>>>>> 10) Client C gets *two* greeting messages!!! >>>>> Hi Stratos, thank you for debugging and reporting this. >>>>> >>>>> I tested this sequence but can't fully reproduce this. What I see i= s 5) but no >>>>> 10). Client C acts normally. And your patch below doesn't solve it = for me. >>>> Hm, which qemu version (or repo branch / tag) did you use? We did a >>>> quick scan of the master branch code / commits, but we didn't find >>>> anything that might fix the issue. >>> I tried on qemu.git master, and also 1.7. I think it is a bug: in my = test, step >>> 5), B not getting any greeting message. >>> >>> But I get only one greeting message in step 10), which is a bit diffe= rent from >>> what you reported. >>> >>> And no difference with your patch applied. >>> >>> Cc'ing Luiz who maintains the monitor code and may have more ideas. >>> >>> Thanks, >>> >>> Fam >>> >>>>> To submit a patch, please follow instructions as described in >>>>> http://wiki.qemu.org/Contribute/SubmitAPatch >>>>> so it could be picked up by maintainers. Specifically, you need to = format your >>>>> patch email with "git format-patch" and add a "Signed-off-by:" line= in your >>>>> patch email. >>>> Ok. If any dev can confirm that this is a bug (and that the patch be= low >>>> is the correct way to fix it) I'll resubmit it properly. >>>> >>>> Thanks, >>>> Stratos >>>> >>>>> Thanks, >>>>> >>>>> Fam >>>>> >>>>>> After some investigation, we traced it down to the monitor_flush()= >>>>>> function in monitor.c. Specifically, when a second client connects= to >>>>>> the qmp (client B), while another one is already using it (client = A), we >>>>>> get the following from stracing the second client (client B): >>>>>> >>>>>> connect(3, {sa_family=3DAF_FILE, path=3D"foo.mon"}, 9) =3D 0 >>>>>> getsockname(3, {sa_family=3DAF_FILE, NULL}, [2]) =3D 0 >>>>>> select(4, [0 3], [1 3], [], NULL) =3D 2 (out [1 3]) >>>>>> select(4, [0 3], [], [], NULL >>>>>> >>>>>> So, the connect() syscall from client B succeeds, although client = B >>>>>> connection has not yet been accepted by the qmp server (it's still= in >>>>>> the backlog of the qmp listening socket). >>>>>> >>>>>> After killing client B and then client A, we see the following whe= n >>>>>> stracing the qemu proc: >>>>>> >>>>>> 22363 accept4(6, {sa_family=3DAF_FILE, NULL}, [2], SOCK_CLOEXE= C) =3D 9 >>>>>> 22363 fcntl(9, F_GETFL) =3D 0x2 (flags O_RDWR)= >>>>>> 22363 fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) =3D 0 >>>>>> 22363 fstat(9, {st_mode=3DS_IFSOCK|0777, st_size=3D0, ...}) =3D= 0 >>>>>> 22363 fcntl(9, F_GETFL) =3D 0x802 (flags >>>>>> O_RDWR|O_NONBLOCK) >>>>>> 22363 write(9, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 1= 27) =3D >>>>>> -1 EPIPE (Broken pipe) >>>>>> 22363 --- SIGPIPE (Broken pipe) @ 0 (0) --- >>>>>> >>>>>> The qmp server / qemu accepts the connection from client B (who ha= s now >>>>>> closed the connection) and tries to write the greeting message to = the >>>>>> socket fd. This results in write returning an error (EPIPE). >>>>>> >>>>>> The monitor_flush() function doesn't seem to handle this case (wri= te >>>>>> error). Instead, it adds a watch / handler to retry the write oper= ation. >>>>>> Thus, mon->outbuf is not cleaned up properly, which results in dup= licate >>>>>> greeting messages for the next client to connect. >>>>>> >>>>>> The following seems to do the trick. >>>>>> >>>>>> diff --git a/monitor.c b/monitor.c >>>>>> index 845f608..5622f20 100644 >>>>>> --- a/monitor.c >>>>>> +++ b/monitor.c >>>>>> @@ -288,8 +288,8 @@ void monitor_flush(Monitor *mon) >>>>>> =20 >>>>>> if (len && !mon->mux_out) { >>>>>> rc =3D qemu_chr_fe_write(mon->chr, (const uint8_t *) buf,= len); >>>>>> - if (rc =3D=3D len) { >>>>>> - /* all flushed */ >>>>>> + if ((rc < 0 && errno !=3D EAGAIN) || (rc =3D=3D len)) { >>>>>> + /* all flushed or error */ >>>>>> QDECREF(mon->outbuf); >>>>>> mon->outbuf =3D qstring_new(); >>>>>> return; >>>>>> >>>>>> Comments? >>>>>> >>>>>> Thanks, >>>>>> Stratos >>>>>> >>>>>> --=20 >>>>>> Stratos Psomadakis >>>>>> >>>>>> >>>> >>>> --=20 >>>> Stratos Psomadakis >>>> >>>> >>>> >>> --=20 Stratos Psomadakis --TwWb23LD6cOVi3rBep8mK0KSTpARVLfeh Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS4TZRAAoJEO/NY8gC4r+pLLMP/0QlQ1cvVnFexZxqxbG78nBk eteXC8LoDx2vddt7ksgTpcLH/gzgAcVlriPH9PSKvD6vDddrjYi0NdiED29hXxCO 5B7TkQdOGQdlAw9VA42zymONGIMmUsoPCjgi8/grAv6L8SzBU4D8lModqe4YqBeV 4rNx9V/VDw9CGddpc0Lh7Y9Gvifc0gmCNlz6MEmI+ZdPxoVlUiRwOfG7b+NK74vW QRcqMnOfO6HCs+hFJXBWa0uk3TKS52XLs3gCZGTB71mbllNVTqLjNImNnir0X6U2 h0tMe1yBqoEpIy4zMtMOlUUvG260ivWVv1OvwBBoPYlbCF6gTy92ZWzQjNMJWiac eGf/HK4iZZ5F8B1fxByPiA9QvSNknkpnswbtxYrvAefgmGMuRHUXNGPHmKOh7ZZM IIEzrYw8qv9rZL5m/W0uvbRukFjDKkSirU0dMCesZp8T9F1Ry5+jsPExmq8eos+7 kmMoyW9/ToOCQ8BsJwxXX+1wBX9Ql9eeSsY+3DZtgqtS4NbiIcJ91GlNVGHNGkbx VMsjTc39VxY6aO/eoBnRGO0BgypVejY28T8bsq4v/cC+N4/sMzbBgdjk92kCtJiw 26v9A8Dj7Bt2hkgHl5ci7YSpje0HTNSC5c828OyRmm/4/wowPeT8k4M09RcSlv6z 4Lt57P/MgM5L6NRtVwvv =H1EE -----END PGP SIGNATURE----- --TwWb23LD6cOVi3rBep8mK0KSTpARVLfeh--