From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:54951) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UhkAU-0004BO-S4 for qemu-devel@nongnu.org; Wed, 29 May 2013 13:28:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UhkAO-0002L2-LM for qemu-devel@nongnu.org; Wed, 29 May 2013 13:28:10 -0400 Date: Wed, 29 May 2013 13:27:33 -0400 From: Luiz Capitulino Message-ID: <20130529132733.77918afb@redhat.com> In-Reply-To: <20130527175925.GE4599@vm> References: <1369582419-28205-1-git-send-email-mdroth@linux.vnet.ibm.com> <20130527112450.12eba408@redhat.com> <877gikflfi.fsf@codemonkey.ws> <20130527175925.GE4599@vm> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: mdroth Cc: qemu-devel@nongnu.org, qemu-stable@nongnu.org, Anthony Liguori , s.priebe@profihost.ag On Mon, 27 May 2013 12:59:25 -0500 mdroth wrote: > On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: > > Luiz Capitulino writes: > > > > > On Sun, 26 May 2013 10:33:39 -0500 > > > Michael Roth wrote: > > > > > >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired > > >> QEMUTimer. Due to timers being processing at the tail end of each main > > >> loop iteration, this generally meant that such events would be emitted > > >> within the same main loop iteration, prior any client data being read > > >> by tcp/unix socket server backends. > > >> > > >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to > > >> a "bottom-half" that is registered via g_idle_add(). This makes it > > >> likely that the event won't be sent until a subsequent iteration, and > > >> also adds the possibility that such events will race with the > > >> processing of client data. > > >> > > >> In cases where we rely on the CHR_EVENT_OPENED event being delivered > > >> prior to any client data being read, this may cause unexpected behavior. > > >> > > >> In the case of a QMP monitor session using a socket backend, the delayed > > >> processing of the CHR_EVENT_OPENED event can lead to a situation where > > >> a previous session, where 'qmp_capabilities' has already processed, can > > >> cause the rejection of 'qmp_capabilities' for a subsequent session, > > >> since we reset capabilities in response to CHR_EVENT_OPENED, which may > > >> not yet have been delivered. This can be reproduced with the following > > >> command, generally within 50 or so iterations: > > >> > > >> mdroth@loki:~$ cat cmds > > >> {'execute':'qmp_capabilities'} > > >> {'execute':'query-cpus'} > > >> mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock > > >> /dev/null; then echo failed; break; else > > >> echo ok; fi; done; > > >> ok > > >> ok > > >> failed > > >> mdroth@loki:~$ > > >> > > >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED > > >> hook, which gets called as part of the GIOChannel cb associated with the > > >> client rather than a bottom-half, and is thus guaranteed to be delivered > > >> prior to accepting any subsequent client sessions. > > >> > > >> This does not address the more general problem of whether or not there > > >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to > > >> client data, and whether or not we should consider moving CHR_EVENT_OPENED > > >> "in-band" with connection establishment as a general solution, but fixes > > >> QMP for the time being. > > >> > > >> Reported-by: Stefan Priebe > > >> Cc: qemu-stable@nongnu.org > > >> Signed-off-by: Michael Roth > > > > > > Thanks for debugging this Michael. I'm going to apply your patch to the qmp > > > branch because we can't let this broken (esp. in -stable) but this is papering > > > over the real bug... > > > > Do we really need OPENED to happen in a bottom half? Shouldn't the open > > event handlers register the callback instead if they need it? > > I think that's the more general fix, but wasn't sure if there were > historical reasons why we didn't do this in the first place. > > Looking at it more closely it does seem like we need a general fix > sooner rather than later though, and I don't see any reason why we can't > move BH registration into the actual OPENED hooks as you suggest: > > hw/usb/ccid-card-passthru.c > - currently affected? no > debug hook only > - needs OPENED BH? no > > hw/usb/redirect.c > - currently affected? yes > can_read handler checks for dev->parser != NULL, which may be > true if CLOSED BH has been executed yet. In the past, OPENED > quiesced outstanding CLOSED events prior to us reading client data. > - need OPENED BH? possibly > seems to only be needed by CLOSED events, which can be issued by > USBDevice, so we do teardown/usb_detach in BH. For OPENED, this > may not apply. if we do issue a BH, we'd need to modify can_read > handler to avoid race. > > hw/usb/dev-serial.c > - currently affected? no > can_read checks for dev.attached != NULL. set to NULL > synchronously in CLOSED, will not accept reads until OPENED gets > called and sets dev.attached > - need opened BH? no > > hw/char/sclpconsole.c > - currently affected? no > guarded by can_read handler > - need opened BH? no > > hw/char/ipoctal232.c > - currently affected? no > debug hook only > - need opened BH? no > > hw/char/virtio-console.c > - currently affected? no > OPENED/CLOSED map to vq events handled asynchronously. can_read > checks for guest_connected state rather than anything set by OPENED > - need opened BH? no > > qtest.c > - currently affected? yes > can_read handler doesn't check for qtest_opened == true, can read > data before OPENED event is processed > - need opened BH? no > > monitor.c > - current affected? yes > negotiated session state can temporarilly leak into a new session > - need opened BH? no > > gdbstub.c > - currently affected? yes > can fail to pause machine prior to starting gdb session > - need opened BH? no > > So we have a number of cases that can be fixed by avoiding the use of > the generic BH, and only 1 possible cause where we might need a > device-specific BH. > > At first glance anyway. So if this all seems reasonable I can send a > more general fix shortly. Michael, I've dropped your first patch and am taking it that you're going to cook this more general fix. > > > > > Regards, > > > > Anthony Liguori > > > > > > > >> --- > > >> v1->v2: > > >> * remove command_mode reset from CHR_EVENT_OPENED case, since this > > >> might still cause a race > > >> > > >> monitor.c | 2 +- > > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > >> > > >> diff --git a/monitor.c b/monitor.c > > >> index 6ce2a4e..f1953a0 100644 > > >> --- a/monitor.c > > >> +++ b/monitor.c > > >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event) > > >> > > >> switch (event) { > > >> case CHR_EVENT_OPENED: > > >> - mon->mc->command_mode = 0; > > >> data = get_qmp_greeting(); > > >> monitor_json_emitter(mon, data); > > >> qobject_decref(data); > > >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event) > > >> json_message_parser_init(&mon->mc->parser, handle_qmp_command); > > >> mon_refcount--; > > >> monitor_fdsets_cleanup(); > > >> + mon->mc->command_mode = 0; > > >> break; > > >> } > > >> } > > >