From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:54951)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lcapitulino@redhat.com>) id 1UhkAU-0004BO-S4
	for qemu-devel@nongnu.org; Wed, 29 May 2013 13:28:16 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <lcapitulino@redhat.com>) id 1UhkAO-0002L2-LM
	for qemu-devel@nongnu.org; Wed, 29 May 2013 13:28:10 -0400
Date: Wed, 29 May 2013 13:27:33 -0400
From: Luiz Capitulino <lcapitulino@redhat.com>
Message-ID: <20130529132733.77918afb@redhat.com>
In-Reply-To: <20130527175925.GE4599@vm>
References: <1369582419-28205-1-git-send-email-mdroth@linux.vnet.ibm.com>
	<20130527112450.12eba408@redhat.com> <877gikflfi.fsf@codemonkey.ws>
	<20130527175925.GE4599@vm>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2] monitor: work around delayed
 CHR_EVENT_OPENED events
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: mdroth <mdroth@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org, qemu-stable@nongnu.org, Anthony Liguori <anthony@codemonkey.ws>, s.priebe@profihost.ag

On Mon, 27 May 2013 12:59:25 -0500
mdroth <mdroth@linux.vnet.ibm.com> wrote:

> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > 
> > > On Sun, 26 May 2013 10:33:39 -0500
> > > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > >
> > >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> > >> QEMUTimer. Due to timers being processing at the tail end of each main
> > >> loop iteration, this generally meant that such events would be emitted
> > >> within the same main loop iteration, prior any client data being read
> > >> by tcp/unix socket server backends.
> > >> 
> > >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> > >> a "bottom-half" that is registered via g_idle_add(). This makes it
> > >> likely that the event won't be sent until a subsequent iteration, and
> > >> also adds the possibility that such events will race with the
> > >> processing of client data.
> > >> 
> > >> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> > >> prior to any client data being read, this may cause unexpected behavior.
> > >> 
> > >> In the case of a QMP monitor session using a socket backend, the delayed
> > >> processing of the CHR_EVENT_OPENED event can lead to a situation where
> > >> a previous session, where 'qmp_capabilities' has already processed, can
> > >> cause the rejection of 'qmp_capabilities' for a subsequent session,
> > >> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> > >> not yet have been delivered. This can be reproduced with the following
> > >> command, generally within 50 or so iterations:
> > >> 
> > >>   mdroth@loki:~$ cat cmds
> > >>   {'execute':'qmp_capabilities'}
> > >>   {'execute':'query-cpus'}
> > >>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
> > >>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
> > >>   echo ok; fi; done;
> > >>   ok
> > >>   ok
> > >>   failed
> > >>   mdroth@loki:~$
> > >> 
> > >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> > >> hook, which gets called as part of the GIOChannel cb associated with the
> > >> client rather than a bottom-half, and is thus guaranteed to be delivered
> > >> prior to accepting any subsequent client sessions.
> > >> 
> > >> This does not address the more general problem of whether or not there
> > >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> > >> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> > >> "in-band" with connection establishment as a general solution, but fixes
> > >> QMP for the time being.
> > >> 
> > >> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> > >> Cc: qemu-stable@nongnu.org
> > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > >
> > > Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> > > branch because we can't let this broken (esp. in -stable) but this is papering
> > > over the real bug...
> > 
> > Do we really need OPENED to happen in a bottom half?  Shouldn't the open
> > event handlers register the callback instead if they need it?
> 
> I think that's the more general fix, but wasn't sure if there were
> historical reasons why we didn't do this in the first place.
> 
> Looking at it more closely it does seem like we need a general fix
> sooner rather than later though, and I don't see any reason why we can't
> move BH registration into the actual OPENED hooks as you suggest:
> 
> hw/usb/ccid-card-passthru.c
>  - currently affected? no
>     debug hook only
>  - needs OPENED BH? no
> 
> hw/usb/redirect.c
>  - currently affected? yes
>     can_read handler checks for dev->parser != NULL, which may be
>     true if CLOSED BH has been executed yet. In the past, OPENED
>     quiesced outstanding CLOSED events prior to us reading client data.
>  - need OPENED BH? possibly
>     seems to only be needed by CLOSED events, which can be issued by
>     USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
>     may not apply. if we do issue a BH, we'd need to modify can_read
>     handler to avoid race.
> 
> hw/usb/dev-serial.c
>  - currently affected? no
>     can_read checks for dev.attached != NULL. set to NULL
>     synchronously in CLOSED, will not accept reads until OPENED gets
>     called and sets dev.attached
>  - need opened BH?  no
> 
> hw/char/sclpconsole.c
>  - currently affected? no
>     guarded by can_read handler
>  - need opened BH? no
> 
> hw/char/ipoctal232.c
>  - currently affected? no
>     debug hook only
>  - need opened BH? no
> 
> hw/char/virtio-console.c
>  - currently affected? no
>     OPENED/CLOSED map to vq events handled asynchronously. can_read
>     checks for guest_connected state rather than anything set by OPENED
>  - need opened BH? no
> 
> qtest.c
>  - currently affected? yes
>     can_read handler doesn't check for qtest_opened == true, can read
>     data before OPENED event is processed
>  - need opened BH? no
> 
> monitor.c
>  - current affected? yes
>     negotiated session state can temporarilly leak into a new session
>  - need opened BH? no
> 
> gdbstub.c
>  - currently affected? yes
>     can fail to pause machine prior to starting gdb session
>  - need opened BH? no
> 
> So we have a number of cases that can be fixed by avoiding the use of
> the generic BH, and only 1 possible cause where we might need a
> device-specific BH.
> 
> At first glance anyway. So if this all seems reasonable I can send a
> more general fix shortly.

Michael, I've dropped your first patch and am taking it that you're
going to cook this more general fix.

> 
> > 
> > Regards,
> > 
> > Anthony Liguori
> > 
> > >
> > >> ---
> > >> v1->v2:
> > >> * remove command_mode reset from CHR_EVENT_OPENED case, since this
> > >>   might still cause a race
> > >> 
> > >>  monitor.c |    2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >> 
> > >> diff --git a/monitor.c b/monitor.c
> > >> index 6ce2a4e..f1953a0 100644
> > >> --- a/monitor.c
> > >> +++ b/monitor.c
> > >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
> > >>  
> > >>      switch (event) {
> > >>      case CHR_EVENT_OPENED:
> > >> -        mon->mc->command_mode = 0;
> > >>          data = get_qmp_greeting();
> > >>          monitor_json_emitter(mon, data);
> > >>          qobject_decref(data);
> > >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
> > >>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
> > >>          mon_refcount--;
> > >>          monitor_fdsets_cleanup();
> > >> +        mon->mc->command_mode = 0;
> > >>          break;
> > >>      }
> > >>  }
> > 
>