From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56669)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1fg5Vk-0000x5-BO for qemu-devel@nongnu.org; Thu, 19 Jul 2018 05:46:13 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1fg5Vh-0006au-7M
 for qemu-devel@nongnu.org; Thu, 19 Jul 2018 05:46:12 -0400
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:38488 helo=mx1.redhat.com)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71)
 (envelope-from ) id 1fg5Vh-0006a3-0j
 for qemu-devel@nongnu.org; Thu, 19 Jul 2018 05:46:09 -0400
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 5EC9A401EF09
 for ; Thu, 19 Jul 2018 09:46:08 +0000 (UTC)
Date: Thu, 19 Jul 2018 17:46:01 +0800
From: Peter Xu
Message-ID: <20180719094601.GJ4071@xz-mi>
References: <20180620071040.28729-1-peterx@redhat.com>
 <87y3e8lfks.fsf@dusky.pond.sub.org>
 <20180719050145.GD4071@xz-mi>
 <87va9bitdp.fsf@dusky.pond.sub.org>
 <20180719080306.GF4071@xz-mi>
 <87fu0fh9y9.fsf@dusky.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <87fu0fh9y9.fsf@dusky.pond.sub.org>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v4] monitor: let cur_mon be per-thread
To: Markus Armbruster
Cc: Marc-André Lureau, qemu-devel@nongnu.org, Stefan Hajnoczi,
 "Dr. David Alan Gilbert"

On Thu, Jul 19, 2018 at 11:05:34AM +0200, Markus Armbruster wrote:
> Peter Xu writes:
>
> > On Thu, Jul 19, 2018 at 09:20:34AM +0200, Markus Armbruster wrote:
> >> Peter Xu writes:
> >>
> >> > On Wed, Jul 18, 2018 at 05:38:11PM +0200, Markus Armbruster wrote:
> >> >> Peter Xu writes:
> >> >>
> >> >> > After the Out-Of-Band work, the monitor iothread may be accessing the
> >> >> > cur_mon as well (via monitor_qmp_dispatch_one()).
>
> Since renamed to monitor_qmp_dispatch().

True.

>
> Further down, we concluded that cur_mon isn't actually used from the I/O
> thread, didn't we?

I think so; if not we should either fix it or apply this patch. :)

>
> >> >> >                                                    Let's convert the
> >> >> > cur_mon variable to be a per-thread variable to make sure there won't be
> >> >> > a race between threads when accessing the variable.
> >> >>
> >> >> Hmm... why hasn't the OOB work created such a race already?
> >> >>
> >> >> A monitor reads, parses, dispatches and executes commands, formats and
> >> >> sends replies.
> >> >>
> >> >> Before OOB, all of that ran in the main thread.  Any access of cur_mon
> >> >> should therefore be from the main thread.  No races.
> >> >>
> >> >> OOB moves read, parse, format and send to an I/O thread.  Dispatch and
> >> >> execute remain in the main thread.  *Except* for commands executed OOB,
> >> >> dispatch and execute move to the I/O thread, too.
> >> >>
> >> >> Why is this not racy?  I guess it relies on careful non-use of cur_mon
> >> >> in any part that may now execute in the I/O thread.  Scary...
> >> >
> >> > I think it's because cur_mon is not really used in out-of-band command
> >> > executions - now we only have a few out-of-band enabled commands, and
> >> > IIUC none of them is using cur_mon (for example, in
> >> > qmp_migrate_recover() we don't even call error_report, and the code
> >> > path is quite straight forward to make sure of that).
> >> > So IIUC cur_mon variable is still only touched by main thread for
> >> > now hence we should be safe.  However that condition might change
> >> > in the future when we add more out-of-band capable commands.
> >> >
> >> > (not to mention that I don't even know whether there are real users of
> >> > out-of-band if we haven't yet started to support that for libvirt...)
> >>
> >> It's not just the actual OOB commands (there are just two), it's also
> >> the monitor code to read, parse, format and send.
> >
> > My understanding is that read, parse, format, send will not touch
> > cur_mon (it was touched before but some patches in the out-of-band
> > series should have removed the last users when parsing).  So IIUC only
> > the dispatcher would touch that now.  I didn't consider the callers
> > like net_init_socket() and I'm only considering the monitor code (and
> > those callers should be only in the main thread too after all).
>
> There *is* cur_mon use outside dispatch & execute, e.g.
>
>     void error_vprintf(const char *fmt, va_list ap)
>     {
>         if (cur_mon && !monitor_cur_is_qmp()) {
>             monitor_vprintf(cur_mon, fmt, ap);
>         } else {
>             vfprintf(stderr, fmt, ap);
>         }
>     }
>
> Obviously unsafe to use outside the main thread.  Consider:
>
>     bool monitor_cur_is_qmp(void)
>     {
>         return cur_mon && monitor_is_qmp(cur_mon);
>     }
>
>     static inline bool monitor_is_qmp(const Monitor *mon)
>     {
>         return (mon->flags & MONITOR_USE_CONTROL);
>     }
>
> If monitor_cur_is_qmp() reads cur_mon twice (which it is entitled to
> do), this crashes when the main thread sets cur_mon back to null in
> between.

Yes, but I thought we should not even use error_vprintf() or its
sister functions outside the QMP handlers.  For example, in parsers,
we should always use error_setg() or something similar but never
error_report().

>
> Did the OOB work make things any worse?  Let's see.
>
> @cur_mon is null unless the main thread is running monitor code, either
> HMP within monitor_read():
>
>     cur_mon = opaque;
>
>     if (cur_mon->rs) {
>         for (i = 0; i < size; i++)
>             readline_handle_byte(cur_mon->rs, buf[i]);
>     } else {
>         if (size == 0 || buf[size - 1] != 0)
>             monitor_printf(cur_mon, "corrupted command\n");
>         else
>             handle_hmp_command(cur_mon, (char *)buf);
>     }
>
>     cur_mon = old_mon;
>
> or QMP within monitor_qmp_dispatch():
>
>     old_mon = cur_mon;
>     cur_mon = mon;
>
>     rsp = qmp_dispatch(mon->qmp.commands, req, qmp_oob_enabled(mon));
>
>     cur_mon = old_mon;
>
> In both cases, old_mon is always null.
>
> Fine print: before commit 227a07552f3 "monitor: move the cur_mon hack
> deeper for QMP", we ran more code for QMP with cur_mon set, namely the
> JSON parser, but that doesn't matter here.
>
> More fine print: there's also qmp_human_monitor_command(), which stacks
> an HMP monitor on top of the QMP monitor.  Also doesn't matter here.
>
> The OOB work doesn't add any new races as long as
>
> * it doesn't add assignments to @cur_mon, and
>
> * none of the code it moves out of the main thread accesses @cur_mon.
>
> The first condition obviously holds.  The second one isn't obvious, but
> I figure it holds, too.
>
> Okay, I think I've convinced myself the OOB work didn't add
> cur_mon-related races.

Hopefully, yes.  Thanks for the double check.

>
> >> >> Should this go into 3.0 to reduce the risk of bugs?
> >> >
> >> > Yes I think it would be good to have that even for 3.0, since it still
> >> > can be seen as a bug fix of existing code.
> >>
> >> Agreed.
> >>
> >> > Regards,
> >> >
> >> >> > Note that thread variables are not initialized to a valid value when new
> >> >> > thread is created.
> >>
> >> Confusing.  It sounds like @cur_mon's initial value would be
> >> indeterminate, like an automatic variable's.  Not true.
> >> Variables with thread storage duration are initialized when the
> >> thread is created.  Since @cur_mon's declaration lacks an
> >> initializer, it'll be initialized to a null pointer.  Your sentence
> >> is correct when you consider that null pointer not a valid value.
> >
> > Yes that's what I meant.  So how about this?
> >
> >   Note that the per-thread @cur_mon variable is not initialized to
> >   point to a valid Monitor struct when a new thread is created (the
> >   default value will be NULL).
> >
> > Please feel free to tune it up.
>
> I think what the patch really changes is the value of @cur_mon outside
> the main thread: it remains null there now.  Before, it depended on what
> the main thread was doing, and therefore could not be used safely.
>
> In other words, the patch makes uses of @cur_mon like the one in
> error_vprintf() shown above safe.
>
> I think that's what we should explain in the commit message.  I can try
> rewriting it,

I'd appreciate that.

> but right now I got to run.

Must be lunch time! :)

Regards,

>
> >>
> >> >> >                     However for our case we don't need to set it up,
> >> >> > since the cur_mon variable is only used in such a pattern:
> >> >> >
> >> >> >   old_mon = cur_mon;
> >> >> >   cur_mon = xxx;
> >> >> >   (do something, read cur_mon if necessary in the stack)
> >
> > [1]
> >
> >> >> >   cur_mon = old_mon;
> >> >> >
> >> >> > It plays a role as stack variable, so no need to be initialized at all.
> >> >> > We only need to make sure the variable won't be changed unexpectedly by
> >> >> > other threads.
> >>
> >> Do we need this paragraph?  The commit doesn't mess with @cur_mon's
> >> initial value at all...
> >
> > I was trying to explain why we don't need to initialize that variable
> > for each thread.
> > A common idea (at least that's what I have had in mind) is that
> > when we create a new thread we should possibly inherit that
> > @cur_mon variable in a copy-on-write fashion for that new thread.
> > But that's not really necessary for the use case like above (as long
> > as we don't create thread during [1], and that's what we do).
> >
> > If you think the patch explains itself better without these lines,
> > please feel free to drop it.
> >
> >>
> >> >> > Reviewed-by: Eric Blake
> >> >> > Reviewed-by: Marc-André Lureau
> >> >> > Reviewed-by: Stefan Hajnoczi
> >> >> > [peterx: touch up commit message a bit]
> >> >> > Signed-off-by: Peter Xu
> >
> > Thanks,

--
Peter Xu
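
For reference, the per-thread behaviour discussed in this thread can be
sketched as a small standalone C fragment.  It is only an illustration
under assumptions, not QEMU code: the Monitor struct here is a
hypothetical stand-in, and dispatch_with_monitor(), observe_cur_mon()
and other_thread_sees_null() are made-up helper names.  It relies on the
GCC/Clang __thread keyword and on POSIX threads.  It shows two things: a
thread-local pointer declared without an initializer starts out as NULL
in every thread, and the save/set/restore pattern in one thread is
invisible to another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Monitor; not the real struct. */
typedef struct Monitor {
    int flags;
} Monitor;

/* Thread storage duration, no initializer: NULL in every new thread. */
static __thread Monitor *cur_mon;

/* Hypothetical dispatcher using the save/set/restore pattern. */
static Monitor *dispatch_with_monitor(Monitor *mon)
{
    Monitor *old_mon = cur_mon;
    Monitor *seen;

    cur_mon = mon;
    seen = cur_mon;      /* "do something": a handler reads cur_mon */
    cur_mon = old_mon;
    return seen;
}

/* Thread body: returns the value of this thread's own cur_mon. */
static void *observe_cur_mon(void *opaque)
{
    (void)opaque;
    return cur_mon;
}

/*
 * Sets cur_mon in the calling thread, then asks a second thread what it
 * sees.  Returns 1 if the second thread saw NULL, 0 otherwise.
 */
static int other_thread_sees_null(void)
{
    Monitor mon = { 0 };
    Monitor *old_mon = cur_mon;
    pthread_t tid;
    void *seen = &mon;   /* non-NULL until the thread reports back */

    cur_mon = &mon;
    if (pthread_create(&tid, NULL, observe_cur_mon, NULL) != 0) {
        cur_mon = old_mon;
        return 0;
    }
    pthread_join(tid, &seen);
    cur_mon = old_mon;
    return seen == NULL;
}
```

Calling other_thread_sees_null() from main() and checking that it
returns 1 exercises the case where the calling thread has cur_mon set
while another thread reads its own copy; build with -pthread where the
toolchain requires it.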