Date: Thu, 19 Jul 2018 17:46:01 +0800
From: Peter Xu <peterx@redhat.com>
Message-ID: <20180719094601.GJ4071@xz-mi>
References: <20180620071040.28729-1-peterx@redhat.com>
	<87y3e8lfks.fsf@dusky.pond.sub.org> <20180719050145.GD4071@xz-mi>
	<87va9bitdp.fsf@dusky.pond.sub.org> <20180719080306.GF4071@xz-mi>
	<87fu0fh9y9.fsf@dusky.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <87fu0fh9y9.fsf@dusky.pond.sub.org>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v4] monitor: let cur_mon be per-thread
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On Thu, Jul 19, 2018 at 11:05:34AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Thu, Jul 19, 2018 at 09:20:34AM +0200, Markus Armbruster wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >>
> >> > On Wed, Jul 18, 2018 at 05:38:11PM +0200, Markus Armbruster wrote:
> >> >> Peter Xu <peterx@redhat.com> writes:
> >> >>
> >> >> > After the Out-Of-Band work, the monitor iothread may be accessing the
> >> >> > cur_mon as well (via monitor_qmp_dispatch_one()).
>
> Since renamed to monitor_qmp_dispatch().

True.

>
> Further down, we concluded that cur_mon isn't actually used from the I/O
> thread, didn't we?

I think so; if not we should either fix it or apply this patch. :)

>
> >> >> >                                                    Let's convert the
> >> >> > cur_mon variable to be a per-thread variable to make sure there won't be
> >> >> > a race between threads when accessing the variable.
> >> >>
> >> >> Hmm... why hasn't the OOB work created such a race already?
> >> >>
> >> >> A monitor reads, parses, dispatches and executes commands, formats and
> >> >> sends replies.
> >> >>
> >> >> Before OOB, all of that ran in the main thread.  Any access of cur_mon
> >> >> should therefore be from the main thread.  No races.
> >> >>
> >> >> OOB moves read, parse, format and send to an I/O thread.  Dispatch and
> >> >> execute remain in the main thread.  *Except* for commands executed OOB,
> >> >> dispatch and execute move to the I/O thread, too.
> >> >>
> >> >> Why is this not racy?  I guess it relies on careful non-use of cur_mon
> >> >> in any part that may now execute in the I/O thread.  Scary...
> >> >
> >> > I think it's because cur_mon is not really used in out-of-band command
> >> > executions - now we only have a few out-of-band enabled commands, and
> >> > IIUC none of them is using cur_mon (for example, in
> >> > qmp_migrate_recover() we don't even call error_report, and the code
> >> > path is quite straightforward to make sure of that).  So IIUC cur_mon
> >> > variable is still only touched by main thread for now hence we should
> >> > be safe.  However that condition might change in the future when we
> >> > add more out-of-band capable commands.
> >> >
> >> > (not to mention that I don't even know whether there are real users of
> >> >  out-of-band if we haven't yet started to support that for libvirt...)
> >>
> >> It's not just the actual OOB commands (there are just two), it's also
> >> the monitor code to read, parse, format and send.
> >
> > My understanding is that read, parse, format, send will not touch
> > cur_mon (it was touched before but some patches in the out-of-band
> > series should have removed the last users when parsing).  So IIUC only
> > the dispatcher would touch that now.  I didn't consider the callers
> > like net_init_socket() and I'm only considering the monitor code (and
> > those callers should be only in the main thread too after all).
>
> There *is* cur_mon use outside dispatch & execute, e.g.
>
>     void error_vprintf(const char *fmt, va_list ap)
>     {
>         if (cur_mon && !monitor_cur_is_qmp()) {
>             monitor_vprintf(cur_mon, fmt, ap);
>         } else {
>             vfprintf(stderr, fmt, ap);
>         }
>     }
>
> Obviously unsafe to use outside the main thread.  Consider:
>
>     bool monitor_cur_is_qmp(void)
>     {
>         return cur_mon && monitor_is_qmp(cur_mon);
>     }
>
>     static inline bool monitor_is_qmp(const Monitor *mon)
>     {
>         return (mon->flags & MONITOR_USE_CONTROL);
>     }
>
> If monitor_cur_is_qmp() reads cur_mon twice (which it is entitled to
> do), this crashes when the main thread sets cur_mon back to null in
> between.

Yes, but I thought we should not use error_vprintf() or its sister
functions outside the QMP handlers at all.  For example, in parsers we
should always use error_setg() or something similar, never
error_report().
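
To make that distinction concrete, here is a minimal standalone sketch
(not QEMU code).  The Error struct and sketch_error_setg() below are
simplified stand-ins I made up for illustration; the real error_setg()
has a different signature.  The point is only that a parser hands the
error back to its caller instead of printing it, so it never needs to
look at cur_mon:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object (assumption, not the real type). */
typedef struct Error { char msg[128]; } Error;

/* Stand-in for error_setg(): store the message for the caller, print nothing. */
static void sketch_error_setg(Error **errp, const char *msg)
{
    assert(msg != NULL);
    if (errp && !*errp) {
        Error *err = calloc(1, sizeof(*err));
        if (err) {
            snprintf(err->msg, sizeof(err->msg), "%s", msg);
            *errp = err;
        }
    }
}

/* Hypothetical parser step: reports failure through errp only. */
static int sketch_parse_digit(char c, Error **errp)
{
    if (c < '0' || c > '9') {
        sketch_error_setg(errp, "expected a digit");
        return -1;
    }
    return c - '0';
}
```

Whoever receives the Error decides where it goes (a QMP reply, stderr,
...), so the parser itself is safe to run in any thread.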

>
> Did the OOB work make things any worse?  Let's see.
>
> @cur_mon is null unless the main thread is running monitor code, either
> HMP within monitor_read():
>
>     cur_mon = opaque;
>
>     if (cur_mon->rs) {
>         for (i = 0; i < size; i++)
>             readline_handle_byte(cur_mon->rs, buf[i]);
>     } else {
>         if (size == 0 || buf[size - 1] != 0)
>             monitor_printf(cur_mon, "corrupted command\n");
>         else
>             handle_hmp_command(cur_mon, (char *)buf);
>     }
>
>     cur_mon = old_mon;
>
> or QMP within monitor_qmp_dispatch():
>
>     old_mon = cur_mon;
>     cur_mon = mon;
>
>     rsp = qmp_dispatch(mon->qmp.commands, req, qmp_oob_enabled(mon));
>
>     cur_mon = old_mon;
>
> In both cases, old_mon is always null.
>
> Fine print: before commit 227a07552f3 "monitor: move the cur_mon hack
> deeper for QMP", we ran more code for QMP with cur_mon set, namely the
> JSON parser, but that doesn't matter here.
>
> More fine print: there's also qmp_human_monitor_command(), which stacks
> an HMP monitor on top of the QMP monitor.  Also doesn't matter here.
>
> The OOB work doesn't add any new races as long as
>
> * it doesn't add assignments to @cur_mon, and
>
> * none of the code it moves out of the main thread accesses @cur_mon.
>
> The first condition obviously holds.  The second one isn't obvious, but
> I figure it holds, too.
>
> Okay, I think I've convinced myself the OOB work didn't add
> cur_mon-related races.

Hopefully, yes.  Thanks for the double check.
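
For reference, the save/restore discipline in the two snippets above can
be sketched standalone like this.  It is not QEMU code: Monitor is a
stand-in struct, and sketch_with_cur_mon(), sketch_record_id() and
sketch_nested() are hypothetical helpers.  The nested call is loosely
modelled on one monitor being stacked on top of another:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's Monitor (assumption); 'id' only tells instances apart. */
typedef struct Monitor { int id; } Monitor;

/* Per-thread current monitor, NULL unless monitor code is running. */
static __thread Monitor *cur_mon;

/* Hypothetical helper: run fn with cur_mon set to mon, then restore it. */
static void sketch_with_cur_mon(Monitor *mon, void (*fn)(void *), void *opaque)
{
    Monitor *old_mon = cur_mon;

    assert(fn != NULL);
    cur_mon = mon;
    fn(opaque);
    cur_mon = old_mon;
}

/* Records the id of whatever monitor is current, or -1 if there is none. */
static void sketch_record_id(void *opaque)
{
    *(int *)opaque = cur_mon ? cur_mon->id : -1;
}

/* Stacks a second monitor on top of the current one, then looks again. */
static void sketch_nested(void *opaque)
{
    static Monitor inner = { 2 };
    int *ids = opaque;  /* ids[0]: seen while stacked, ids[1]: seen after */

    sketch_with_cur_mon(&inner, sketch_record_id, &ids[0]);
    sketch_record_id(&ids[1]);
}
```

Because each level restores what it found, the outer monitor is current
again once the inner call returns, and cur_mon is back to NULL at the
end.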

>
> >> >> Should this go into 3.0 to reduce the risk of bugs?
> >> >
> >> > Yes I think it would be good to have that even for 3.0, since it still
> >> > can be seen as a bug fix of existing code.
> >>
> >> Agreed.
> >>
> >> > Regards,
> >> >
> >> >> > Note that thread variables are not initialized to a valid value when new
> >> >> > thread is created.
> >>
> >> Confusing.  It sounds like @cur_mon's initial value would be
> >> indeterminate, like an automatic variable's.  Not true.  Variables with
> >> thread storage duration are initialized when the thread is created.
> >> Since @cur_mon's declaration lacks an initializer, it'll be initialized
> >> to a null pointer.  Your sentence is correct when you consider that null
> >> pointer not a valid value.
> >
> > Yes that's what I meant.  So how about this?
> >
> >   Note that the per-thread @cur_mon variable is not initialized to
> >   point to a valid Monitor struct when a new thread is created (the
> >   default value will be NULL).
> >
> > Please feel free to tune it up.
>
> I think what the patch really changes is the value of @cur_mon outside
> the main thread: it remains null there now.  Before, it depended on what
> the main thread was doing, and therefore could not be used safely.
>
> In other words, the patch makes uses of @cur_mon like the one in
> error_vprintf() shown above safe.
>
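
To illustrate the value outside the main thread, here is a minimal
standalone sketch (not QEMU code).  Monitor is a stand-in struct and
cur_mon_seen_by_new_thread() is a hypothetical helper; __thread is the
GCC/Clang thread-local keyword, and the sketch assumes POSIX threads:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for QEMU's Monitor; only the pointer value matters here. */
typedef struct Monitor { int flags; } Monitor;

/* Per-thread pointer: every thread gets its own copy, initialized to NULL. */
static __thread Monitor *cur_mon;

/* Runs in the second thread and reports that thread's own cur_mon. */
static void *sketch_worker(void *opaque)
{
    Monitor **seen = opaque;

    *seen = cur_mon;
    return NULL;
}

/*
 * Hypothetical helper: set cur_mon in the calling thread, start a second
 * thread, and return the cur_mon value that thread observed.  If the thread
 * cannot be created, the non-null 'mon' is returned unchanged.
 */
static Monitor *cur_mon_seen_by_new_thread(Monitor *mon)
{
    Monitor *old_mon = cur_mon;
    Monitor *seen = mon;    /* start non-null so a NULL result means something */
    pthread_t tid;

    cur_mon = mon;
    if (pthread_create(&tid, NULL, sketch_worker, &seen) == 0) {
        pthread_join(tid, NULL);
    }
    assert(cur_mon == mon);     /* the caller's copy was not disturbed */
    cur_mon = old_mon;
    return seen;
}
```

With a non-null mon this should return NULL: the second thread never
sees the caller's assignment, so a check like the one in error_vprintf()
would simply fall through to stderr there.
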
> I think that's what we should explain in the commit message.  I can try
> rewriting it,

I'd appreciate that.

> but right now I got to run.

Must be lunch time! :)

Regards,

>
> >>
> >> >> >                     However for our case we don't need to set it up,
> >> >> > since the cur_mon variable is only used in such a pattern:
> >> >> >
> >> >> >   old_mon = cur_mon;
> >> >> >   cur_mon = xxx;
> >> >> >   (do something, read cur_mon if necessary in the stack)
> >
> > [1]
> >
> >> >> >   cur_mon = old_mon;
> >> >> >
> >> >> > It plays a role as stack variable, so no need to be initialized at all.
> >> >> > We only need to make sure the variable won't be changed unexpectedly by
> >> >> > other threads.
> >>
> >> Do we need this paragraph?  The commit doesn't mess with @cur_mon's
> >> initial value at all...
> >
> > I was trying to explain why we don't need to initialize that variable
> > for each thread.  A common idea (at least that's what I have had in
> > mind) is that when we create a new thread we should possibly inherit
> > that @cur_mon variable in a copy-on-write fashion for that new thread.
> > But that's not really necessary for the use case like above (as long
> > as we don't create a thread during [1], and that's what we do).
> >
> > If you think the patch explains itself better without these lines,
> > please feel free to drop it.
> >
> >>
> >> >> > Reviewed-by: Eric Blake <eblake@redhat.com>
> >> >> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> >> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> >> > [peterx: touch up commit message a bit]
> >> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >
> > Thanks,

-- 
Peter Xu