From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59707) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1d6Z6H-0002ib-Ub for qemu-devel@nongnu.org; Fri, 05 May 2017 05:00:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1d6Z6E-0004hy-Ss for qemu-devel@nongnu.org; Fri, 05 May 2017 05:00:34 -0400 Received: from mail-lf0-x22c.google.com ([2a00:1450:4010:c07::22c]:34866) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1d6Z6E-0004hT-GB for qemu-devel@nongnu.org; Fri, 05 May 2017 05:00:30 -0400 Received: by mail-lf0-x22c.google.com with SMTP id j1so21042976lfh.2 for ; Fri, 05 May 2017 02:00:30 -0700 (PDT) MIME-Version: 1.0 References: <20170118160332.13390-1-marcandre.lureau@redhat.com> <87mvaszbsu.fsf@dusky.pond.sub.org> In-Reply-To: <87mvaszbsu.fsf@dusky.pond.sub.org> From: =?UTF-8?B?TWFyYy1BbmRyw6kgTHVyZWF1?= Date: Fri, 05 May 2017 09:00:17 +0000 Message-ID: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v2 00/25] qmp: add async command type List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: Kevin Wolf , John Snow , qemu-devel@nongnu.org, Stefan Hajnoczi , kraxel@redhat.com Hi On Thu, May 4, 2017 at 11:02 PM Markus Armbruster wrote= : > I figure everyone interested has had his say. Let me try to summarize > and draw my conclusions. > > Commands + events can replace asynchronous commands (they did). > > Asynchronous commands cannot replace events, because "interesting" side > effects of commands need to be broadcast. Doesn't mean asynchronous > commands can't make sense for us. Does mean that we need to address any > serious flaws in the existing commands + event mechanism even if we > adopt asynchronous commands. The cover letter lists multiple flaws, > which I'll discuss inline. > > If an operation satisfies certain conditions, then implementing it as > asynchronous command provides certain benefits. If the operation > > (C1) does not need to broadcast state changes, including start and end > of operation, and > > (C2) even the initiating client needs no state change notifications > except for end of operation, and > > (C3) even that isn't needed when the client loses the connection and > reconnects, and > > (C4) the operation doesn't have to be examined or controlled while it > runs, > > then making it an asynchronous command > > (B1) saves you the trouble of defining an event together with its link > to the command, and > > (B2) saves you the trouble of sending the event, and > > (B3) saves a bit of QMP traffic, namely the event broadcast. > > Letting clients opt into the asynchronousness as this series does > provides an additional benefit: > > (B4) when you find one of your synchronous commands is at times taking > too long, you can compatibly evolve it. > > We can remove (C3) and (C4) by providing suitable ways to examine and > control asynchronous commands while they run. > > We could remove (C1) and (C2) by sending events, but that would also > lose pretty much all benefits. > > The operation chosen to showcase asynchronous commands is taking > screenshots. It obviously satisfies (C1). I guess for (C2) and (C4) we > have to assume that it completes "quickly". I'm willing to buy that. > (C3) seems pretty dubious to me, however. > > The tradeoff to judge is desirability of the benefits vs. cost to get > them. > > Desirability depends on how many commands satisfy these conditions, and > how much trouble exactly is saved. > > A list of commands that is more complete than just "screendump" would > help with the former. query- commands that need to gather information > in ways that could block are among the candidates. > > On the latter, I figure (B1) dwarves (B2), (B3) is a relatively minor > optimization unless proven otherwise, and (B4) is real, but asynchronous > commands are not the only way to get it; we can equally well evolve > synchronous commands compatibly into synchronous + event. > > The cost is interface and implementation complexity. > > Interface complexity increases because we add the concept "asynchronous > command" and the concept "synchronous or asynchronous command, client's > choice". This is my main concern. > Since we already have client dispatch of unrelated reply and hidden-async pattern, this is less of a concern in practice, most of the complexity has to be already handled. Note it is a bit simpler for a client to handle a generic async return rather than various events. > Implementation complexity isn't prohibitive according to diffstat, but > it's not neglible, either: with the last six patches peeled off, we get > some 500 additional lines plus some 300 in tests/. The last six patches > put the new infrastructure to work, which takes another 200. Ways to > examine and control asynchronous commands while they run would add some > more, but we don't know how much exactly. > > Now let me take a step back: we need to crack the "jobs" problem anyway. > Merging this series now would in a way add another special case of the > general case "jobs" before adding the general case. In other words, > create more stuff to unify. I'm afraid that's a bad idea, not least > since it's an external interface. I think we should do "jobs" first. > Once we got at least a prototype, we can > > * judge how onerous defining and implementing jobs really is, i.e. have > a clearer idea of the potential benefits of asynchronous commands, and > > * explore asynchronous commands as a special case of "jobs" optimized > for operations that satisfy the conditions above, i.e. get a clearer > idea of the *actual* additional implementation complexity. > That's what I have been told for over 1.5y :) I would like to know the list of requirements / features "jobs" would have. I am willing to help implementing it. > > Bottom line: > > 1. I still don't want to merge this. > > 2. I want us to tackle jobs sooner rather than later. > > 3. Once we got at least a jobs prototype, I'm willing to reconsider > asynchronous commands implemented as special case of jobs. > > Who is going to work on it? > One of the most important maintainer duties is saying "no". It's also > one of the least fun duties. > > It doesn't sound like a "no" :) > > Marc-Andr=C3=A9 Lureau writes: > > > Hi, > > > > One of initial design goals of QMP was to have "asynchronous command > > completion" (http://wiki.qemu.org/Features/QAPI). Unfortunately, that > > goal was not fully achieved, and some broken bits left were removed > > progressively until commit 65207c59d that removed async command > > support. > > > > Note that qmp events are asynchronous messages, and must be handled > > appropriately by the client: dispatch both reply and events after a > > command is sent for example. > > > > The benefits of async commands that can be trade-off depending on the > > requirements are: > > > > 1) allow the command handler to re-enter the main loop if the command > > cannot be handled synchronously, or if it is long-lasting. This is > > currently not possible and make some bugs such as rhbz#1230527 tricky > > (see below) to solve. Furthermore, many QMP commands do IO and could > > be considered 'slow' and blocking today. > > > > 2) allow concurrent commands and events. This mainly implies hanlding > > concurrency in qemu and out-of-order replies for the client. As noted > > earlier, a good qmp client already has to handle dispatching of > > received messages (reply and events). > > > > The traditional approach to solving the above in qemu is the following > > scheme: > > -> { "execute": "do-foo" } > > <- { "return": {} } > > <- { "event": "FOO_DONE" } > > > > It has several flaws: > > Since asynchronous commands can't replace events outright, all of these > flaws stay even if we adopt asynchronous commands. Any serious flaws > need fixing in other ways. > > > - FOO_DONE event has no semantic link with do-foo in the qapi > > schema. It is not simple to generalize that pattern when writing > > qmp clients. It makes documentation and usage harder. > > I consider this serious enough to be worth fixing. We can either adopt > suitable naming and documentation conventions, or have commands declare > the events they can trigger. I think the latter makes sense only if we > can flag violations somehow. > > > - the FOO_DONE event has no clear association with the command that > > triggered it: commands/events have to come up with additional > > specific association schemes (ids, path etc) > > Existing events that need to be linked certainly have one. But there is > no single documented convention for such links. Well worth fixing; I'd > love to have a single convention for linking events to commands. > > We could then assert that an event's linked command declares the event. > > > - FOO_DONE is broadcasted to all clients, but they may have no way to > > interprete it or interest in it, or worse it may conflict with their > > own commands. > > An event that can be "mislinked" by clients is a design flaw that needs > fixing. A convention for linking (fix for the previous item) needs to > avoid this flaw. > > > - the arguably useless empty reply return > > Useless only if the client doesn't need to know that the (possibly > long-running) job has been started successfully. > > > For some cases, it makes sense to use that scheme, or a more complete > > one: to have an "handler" associated with an on-going operation, that > > can be queried, modified, cancelled etc (block jobs etc). Also, some > > operations have a global side-effect, in which case that cmd+event > > scheme is right, as all clients are listening for global events. > > I'm leaning more and more towards the idea that *any* long-running > operation implemented as command + event should be made an instance of > the future "job" abstraction, to give us a *uniform* way to examine and > control them all. > > > However, for the simple case where a client want to perform a "local > > context" operation (dump, query etc), QAPI can easily do better > > without resorting to this pattern: it can send the reply when the > > operation is done. That would make QAPI schema, usage and > > documentation more obvious. > > > > The following series implements an async solution, by introducing a > > context associated with a command, it can: > > - defer the return > > - return only to the caller (no broadcasted event) > > - return with the 'id' associated with the call (as originally intended= ) > > - optionnally allow cancellation when the client is gone > > - track on-going qapi command(s) per client > > > > 1) existing qemu commands can be gradually replaced by async:true > > variants when needed, and by carefully reviewing the concurrency > > aspects. The async:true commands marshaller helpers are splitted in > > half, the calling and return functions. The command is called with a > > QmpReturn context, that can return immediately or later, using the > > generated return helper. > > > > 2) allow concurrent commands when 'async' is negotiated. If the client > > doesn't support 'async', then the monitor is suspended during command > > execution (events are still sent). Effectively, async commands behave > > like normal commands from client POV in this case, giving full > > backward compatibility. > > > > The screendump command is converted to an async:true version to solve > > rhbz#1230527. The command shows basic cancellation (this could be > > extended if needed). HMP remains sync, but it would be worth to make > > it possible to call async:true qapi commands. > > > > The last patch cleans up qmp_dispatch usage to have consistant checks > > between qga & qemu, and simplify QmpClient/parser_feed usage. > > -- Marc-Andr=C3=A9 Lureau