* QMP and the 'id' parameter @ 2020-11-10 1:47 John Snow 2020-11-10 6:22 ` Markus Armbruster 0 siblings, 1 reply; 11+ messages in thread From: John Snow @ 2020-11-10 1:47 UTC (permalink / raw) To: Markus Armbruster; +Cc: QEMU Developers The QMP specification states: > NOTE: Some errors can occur before the Server is able to read the "id" > member, in these cases the "id" member will not be part of the error > response, even if provided by the client. I am assuming this case ONLY occurs for Parse errors: {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} And I am assuming, in the context of a client that /always/ sets an 'id' for its execute statements, that this means that any error response we receive without an 'id' field *must* be associated with the most-recently-sent command. Correct? --js ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-10 1:47 QMP and the 'id' parameter John Snow @ 2020-11-10 6:22 ` Markus Armbruster 2020-11-10 9:15 ` Daniel P. Berrangé 2020-11-10 16:32 ` John Snow 0 siblings, 2 replies; 11+ messages in thread From: Markus Armbruster @ 2020-11-10 6:22 UTC (permalink / raw) To: John Snow; +Cc: QEMU Developers John Snow <jsnow@redhat.com> writes: > The QMP specification states: > >> NOTE: Some errors can occur before the Server is able to read the "id" >> member, in these cases the "id" member will not be part of the error >> response, even if provided by the client. > > I am assuming this case ONLY occurs for Parse errors: > > {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} There are more "desc" possible, actually. The JSON parser gets fed chunks of input, and calls a callback for every full JSON value, and on parse error. QMP's callback is handle_qmp_command(). Parameter @req is the parsed JSON value, parameter @err is the (parse) error object, and exactly one of them is non-null. 1. Parse error If @err, we send an error response for it. It never has "id". See qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible @err are: $ grep error_setg qobject/json-*[ch] qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); This is a syntax error. Search for parse_error() to see the possible @message patterns. qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); This is a lexical error. qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); These are (intentional) parser limits. 2. Successful parse If @req, it's a successful parse. If @req is not a JSON object, there is no "id". qmp_dispatch() reports error_setg(&err, "QMP input must be a JSON object"); If @req is a JSON object, it has "id" exactly when the client supplied one. The response mirrors @req's "id". See qmp_error_response() caller qmp_dispatch(). > And I am assuming, in the context of a client that /always/ sets an > 'id' for its execute statements, that this means that any error > response we receive without an 'id' field *must* be associated with > the most-recently-sent command. Only if the client keeps no more than one command in flight. Command responses get sent strictly in order (even parse errors), except for commands executed out-of-band. If the client sends N JSON values, and only then reads responses, and there are no parse errors, then it'll get N responses. The i-th response is for the i-th JSON value, except responses to OOB command may "jump the queue". With parse errors, it can get a different number of responses. Questions? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-10 6:22 ` Markus Armbruster @ 2020-11-10 9:15 ` Daniel P. Berrangé 2020-11-10 10:27 ` Markus Armbruster 2020-11-10 16:32 ` John Snow 1 sibling, 1 reply; 11+ messages in thread From: Daniel P. Berrangé @ 2020-11-10 9:15 UTC (permalink / raw) To: Markus Armbruster; +Cc: John Snow, QEMU Developers On Tue, Nov 10, 2020 at 07:22:26AM +0100, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: > > > The QMP specification states: > > > >> NOTE: Some errors can occur before the Server is able to read the "id" > >> member, in these cases the "id" member will not be part of the error > >> response, even if provided by the client. > > > > I am assuming this case ONLY occurs for Parse errors: > > > > {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} > > There are more "desc" possible, actually. > > The JSON parser gets fed chunks of input, and calls a callback for every > full JSON value, and on parse error. > > QMP's callback is handle_qmp_command(). Parameter @req is the parsed > JSON value, parameter @err is the (parse) error object, and exactly one > of them is non-null. > > 1. Parse error > > If @err, we send an error response for it. It never has "id". See > qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible > @err are: > > $ grep error_setg qobject/json-*[ch] > qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); > > This is a syntax error. > > Search for parse_error() to see the possible @message patterns. > > qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); > > This is a lexical error. > > qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); > qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); > qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); > > These are (intentional) parser limits. > > 2. Successful parse > > If @req, it's a successful parse. > > If @req is not a JSON object, there is no "id". qmp_dispatch() reports > > error_setg(&err, "QMP input must be a JSON object"); > > If @req is a JSON object, it has "id" exactly when the client supplied > one. The response mirrors @req's "id". See qmp_error_response() caller > qmp_dispatch(). > > > And I am assuming, in the context of a client that /always/ sets an > > 'id' for its execute statements, that this means that any error > > response we receive without an 'id' field *must* be associated with > > the most-recently-sent command. > > Only if the client keeps no more than one command in flight. > > Command responses get sent strictly in order (even parse errors), except > for commands executed out-of-band. With out of band commands, how much runs in the background ? Is the JSON parsing still in the foreground, such that we can expect that even for OOB commands, a error response without a "id" is still received strictly in order. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-10 9:15 ` Daniel P. Berrangé @ 2020-11-10 10:27 ` Markus Armbruster 0 siblings, 0 replies; 11+ messages in thread From: Markus Armbruster @ 2020-11-10 10:27 UTC (permalink / raw) To: Daniel P. Berrangé; +Cc: John Snow, QEMU Developers Daniel P. Berrangé <berrange@redhat.com> writes: > On Tue, Nov 10, 2020 at 07:22:26AM +0100, Markus Armbruster wrote: [...] >> Command responses get sent strictly in order (even parse errors), except >> for commands executed out-of-band. > > With out of band commands, how much runs in the background ? Is the > JSON parsing still in the foreground, such that we can expect that > even for OOB commands, a error response without a "id" is still > received strictly in order. Yes, you can. We made sure both errors and results flow through the same pipeline[*]. The only fork is for OOB commands, and to take it, the parser must have yielded a JSON object. Use of exec-oob without "id" is foolish. Pipelining commands without "id" is merely unadvisable. [*] We queue parse errors along with successfully parsed values. The QMP dispatcher dequeues, executes if it's a request, sends the response, loop. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-10 6:22 ` Markus Armbruster 2020-11-10 9:15 ` Daniel P. Berrangé @ 2020-11-10 16:32 ` John Snow 2020-11-11 8:27 ` Markus Armbruster 1 sibling, 1 reply; 11+ messages in thread From: John Snow @ 2020-11-10 16:32 UTC (permalink / raw) To: Markus Armbruster; +Cc: QEMU Developers On 11/10/20 1:22 AM, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: > >> The QMP specification states: >> >>> NOTE: Some errors can occur before the Server is able to read the "id" >>> member, in these cases the "id" member will not be part of the error >>> response, even if provided by the client. >> >> I am assuming this case ONLY occurs for Parse errors: >> >> {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} > > There are more "desc" possible, actually. > > The JSON parser gets fed chunks of input, and calls a callback for every > full JSON value, and on parse error. > > QMP's callback is handle_qmp_command(). Parameter @req is the parsed > JSON value, parameter @err is the (parse) error object, and exactly one > of them is non-null. > > 1. Parse error > > If @err, we send an error response for it. It never has "id". See > qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible > @err are: > > $ grep error_setg qobject/json-*[ch] > qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); > > This is a syntax error. > > Search for parse_error() to see the possible @message patterns. > > qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); > > This is a lexical error. > > qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); > qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); > qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); > > These are (intentional) parser limits. > > 2. Successful parse > > If @req, it's a successful parse. > > If @req is not a JSON object, there is no "id". qmp_dispatch() reports > > error_setg(&err, "QMP input must be a JSON object"); > > If @req is a JSON object, it has "id" exactly when the client supplied > one. The response mirrors @req's "id". See qmp_error_response() caller > qmp_dispatch(). > That's very helpful, thank you. So in summary, the possibilities are: 1. syntax error (with several descriptions) 2. lexical error (with several descriptions, templated from one message) 3. lexical limitation (with several descriptions) 4. grammatical error ("QMP input must be a JSON object") (I know we have declared the error_class passe and we now prefer "generic_error", but I lack the history lesson on why we do that. Wouldn't the error_class here be helpful for interpreting the response? It helps differentiate between 'The RPC call was received' vs 'The RPC call was received, but failed.' which may have semantic differences for who wants to consume the error: the QMP library itself, or the user making the RPC call. No?) >> And I am assuming, in the context of a client that /always/ sets an >> 'id' for its execute statements, that this means that any error >> response we receive without an 'id' field *must* be associated with >> the most-recently-sent command. > > Only if the client keeps no more than one command in flight. > OK, so without engaging the OOB capability, we receive responses strictly in a FIFO manner, always. So if we have: Send exec A, Send exec B, Send exec C and then we receive an error response with no ID attached, we know it must be "A" that errored. Correct? > Command responses get sent strictly in order (even parse errors), except > for commands executed out-of-band. > > If the client sends N JSON values, and only then reads responses, and > there are no parse errors, then it'll get N responses. The i-th > response is for the i-th JSON value, except responses to OOB command may > "jump the queue". > Let's assume now that A, B, and C are *all* OOB commands: - Send exec-oob A - Send exec-oob B - Send exec-oob C My client now reads an error message that has no ID attached. Which command execution was this associated with? Is it necessarily "A" if we have not yet received any other response? > With parse errors, it can get a different number of responses. > Oh, uhm -- if the parse error was bad enough that it didn't even notice we sent it three commands, right? ... So if we see a parse error, we might not know which, if any, of our queued commands were seen or can still expect a response. That seems difficult to code around, unless I misunderstand. (Every parser error I receive -- which I can "guess" that it's a parser error by the lack of an 'id' field, potentially invalidates the entire queue?) > Questions? > A few :) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-10 16:32 ` John Snow @ 2020-11-11 8:27 ` Markus Armbruster 2020-11-20 0:22 ` John Snow 0 siblings, 1 reply; 11+ messages in thread From: Markus Armbruster @ 2020-11-11 8:27 UTC (permalink / raw) To: John Snow; +Cc: QEMU Developers John Snow <jsnow@redhat.com> writes: > On 11/10/20 1:22 AM, Markus Armbruster wrote: >> John Snow <jsnow@redhat.com> writes: >> >>> The QMP specification states: >>> >>>> NOTE: Some errors can occur before the Server is able to read the "id" >>>> member, in these cases the "id" member will not be part of the error >>>> response, even if provided by the client. >>> >>> I am assuming this case ONLY occurs for Parse errors: >>> >>> {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} >> >> There are more "desc" possible, actually. >> >> The JSON parser gets fed chunks of input, and calls a callback for every >> full JSON value, and on parse error. >> >> QMP's callback is handle_qmp_command(). Parameter @req is the parsed >> JSON value, parameter @err is the (parse) error object, and exactly one >> of them is non-null. >> >> 1. Parse error >> >> If @err, we send an error response for it. It never has "id". See >> qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible >> @err are: >> >> $ grep error_setg qobject/json-*[ch] >> qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); >> >> This is a syntax error. >> >> Search for parse_error() to see the possible @message patterns. >> >> qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); >> >> This is a lexical error. >> >> qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); >> qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); >> qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); >> >> These are (intentional) parser limits. >> >> 2. Successful parse >> >> If @req, it's a successful parse. >> >> If @req is not a JSON object, there is no "id". qmp_dispatch() reports >> >> error_setg(&err, "QMP input must be a JSON object"); >> >> If @req is a JSON object, it has "id" exactly when the client supplied >> one. The response mirrors @req's "id". See qmp_error_response() caller >> qmp_dispatch(). >> > > That's very helpful, thank you. So in summary, the possibilities are: > > 1. syntax error (with several descriptions) > 2. lexical error (with several descriptions, templated from one message) > 3. lexical limitation (with several descriptions) Nitpick: parser limitation. Nesting depth is not lexical :) > 4. grammatical error ("QMP input must be a JSON object") We got two layers of syntax: QAPI schema above JSON. The JSON layer builds JSON from characters, the QAPI schema layer builds QAPI schema from JSON. Errors 1-3 are for the JSON layer. Error 4 is a QAPI schema syntax error. The qobject input visitor reports more of them. Error 4 is special only because it's the only one where the JSON cannot have an ID. All the other QAPI schema syntax errors can have an ID. > (I know we have declared the error_class passe and we now prefer > "generic_error", but I lack the history lesson on why we do that. The initial design had "rich" error objects, basically: * An error QDict with a string "class" member and arbitrary data in a QDict "data" member To build a QMP error response, add the command's ID (if any) to the error QDict, and serialize to JSON. * A pointer into a table of pairs (JSON template string, human-friendly template string). More on the use of JSON template strings below. To build a human-friendly error message, interpolate the members of the data QDict into the human-friendly template. Say you're writing a piece of code. Among many other things, there are a few errors to report. You get to drop what you're doing, and do a three step dance: 1. Define a QERR_YOUR_NEW_ERROR macro that expands into a JSON template for the error QDict. These are all in one place, which is included all over the place. Merge conflicts galore. 2. Add a table entry mapping QERR_YOUR_NEW_ERROR to the human-friendly template. 3. Call the error function, passing QERR_YOUR_NEW_ERROR template and the arguments. The arguments must match the template. This builds the error QDict from template and the arguments, and looks up the template in the table. I strenuously objected to this. In the end, all I could accomplish was the inclusion of the human-friendly message in the QMP error response. I (correctly) predicted two issues: 1. Error classes multiplied. Not quite like rabbits, but bad enough to be confusing. Most of them were undocumented and of absolutely no use to QMP clients. 2. The three step dance proved bothersome, and people frequently took the obvious shortcut: instead of defining a new error that fits the situation, reuse some existing error. Even when it doesn't quite fit. Error message quality went into the toilet. After a few years of this, I went in with an axe, and nobody objected. Among the stuff I axed were most error classes. I spared only the ones that were documented and plausibly useful to some QMP client. > Wouldn't the error_class here be helpful for interpreting the response? > It helps differentiate between 'The RPC call was received' vs 'The RPC > call was received, but failed.' which may have semantic differences for > who wants to consume the error: the QMP library itself, or the user > making the RPC call. No?) Can we have more error classes? Yes, if they are documented and definitely useful to some QMP client that matters. >>> And I am assuming, in the context of a client that /always/ sets an >>> 'id' for its execute statements, that this means that any error >>> response we receive without an 'id' field *must* be associated with >>> the most-recently-sent command. >> >> Only if the client keeps no more than one command in flight. >> > > OK, so without engaging the OOB capability, we receive responses > strictly in a FIFO manner, always. > > So if we have: > > Send exec A, Send exec B, Send exec C > > and then we receive an error response with no ID attached, we know it > must be "A" that errored. Correct? Correct. QGA has a special case, but you probably don't want to know. Even though counting responses is a workable way to map responses back to requests, I recommend to either have at most one request in flight (so no counting is necessary), or to add IDs to all requests (so counting isn't necessary as long as you get the syntax right enough to avoid the ID-less errors discussed above). >> Command responses get sent strictly in order (even parse errors), except >> for commands executed out-of-band. >> >> If the client sends N JSON values, and only then reads responses, and >> there are no parse errors, then it'll get N responses. The i-th >> response is for the i-th JSON value, except responses to OOB command may >> "jump the queue". >> > > Let's assume now that A, B, and C are *all* OOB commands: > > - Send exec-oob A > - Send exec-oob B > - Send exec-oob C > > My client now reads an error message that has no ID attached. Which > command execution was this associated with? > > Is it necessarily "A" if we have not yet received any other response? In this case, it can only be "A", because exec-oob is executed right away, which means multiple exec-oob are executed in order, one after the other. More interesting example: - Send execute A - Send exec-oob B First, consider the "no errors" case. The first respone may be for A or, if B managed to jump the queue, for B. How? The I/O thread receives A, sends it to the main loop, receives B, executes it right away, and sends its result. Meanwhile, the main loop receives A, executes it and sends the result. Everything is received in the order it is sent. The two threads race to send result. That's a feature: it enables B to jump the queue. But what about ID-less errors? Never fear, we actually thought this out. The I/O thread needs to parse QMP input some to be able to recognize exec-oob. Any parse errors are sent to the main loop, just like requests. The main loop receives and processes in order. Errors it simply sends, requests it executes, then sends the result. This ensures that ID-less errors can never jump the queue. >> With parse errors, it can get a different number of responses. >> > > Oh, uhm -- if the parse error was bad enough that it didn't even notice > we sent it three commands, right? Yes. What can a parser do after detecting an error? Error recovery is fundamentally guesswork. The art has a substantial body of lore on minimizing error cascades. The basic technique is to silently skip tokens until a some promising point, then resume parsing in a suitably altered parser state. Our error recovery is simplistic. Happy to explain it in detail if you're curious. I figure you're more interested in practical advice today. docs/interop/qmp-spec.txt section has some: 2.6 Forcing the JSON parser into known-good state ------------------------------------------------- Incomplete or invalid input can leave the server's JSON parser in a state where it can't parse additional commands. To get it back into known-good state, the client should provoke a lexical error. The cleanest way to do that is sending an ASCII control character other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new line). Sadly, older versions of QEMU can fail to flag this as an error. If a client needs to deal with them, it should send a 0xFF byte. > we sent it three commands, right? ... So if we see a parse error, we > might not know which, if any, of our queued commands were seen or can > still expect a response. > > That seems difficult to code around, unless I misunderstand. "Doctor, it hurts when I send QMP input that isn't even JSON!" > (Every parser error I receive -- which I can "guess" that it's a parser > error by the lack of an 'id' field, potentially invalidates the entire > queue?) Yes, with stress on "potentially". How much gets thrown away depends on the error. You know for sure only when you receive a reply with an ID. >> Questions? >> > > A few :) More? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-11 8:27 ` Markus Armbruster @ 2020-11-20 0:22 ` John Snow 2020-11-20 10:25 ` Markus Armbruster 0 siblings, 1 reply; 11+ messages in thread From: John Snow @ 2020-11-20 0:22 UTC (permalink / raw) To: Markus Armbruster; +Cc: QEMU Developers On 11/11/20 3:27 AM, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: > >> On 11/10/20 1:22 AM, Markus Armbruster wrote: >>> John Snow <jsnow@redhat.com> writes: >>> >>>> The QMP specification states: >>>> >>>>> NOTE: Some errors can occur before the Server is able to read the "id" >>>>> member, in these cases the "id" member will not be part of the error >>>>> response, even if provided by the client. >>>> >>>> I am assuming this case ONLY occurs for Parse errors: >>>> >>>> {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} >>> >>> There are more "desc" possible, actually. >>> >>> The JSON parser gets fed chunks of input, and calls a callback for every >>> full JSON value, and on parse error. >>> >>> QMP's callback is handle_qmp_command(). Parameter @req is the parsed >>> JSON value, parameter @err is the (parse) error object, and exactly one >>> of them is non-null. >>> >>> 1. Parse error >>> >>> If @err, we send an error response for it. It never has "id". See >>> qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible >>> @err are: >>> >>> $ grep error_setg qobject/json-*[ch] >>> qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); >>> >>> This is a syntax error. >>> >>> Search for parse_error() to see the possible @message patterns. >>> >>> qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); >>> >>> This is a lexical error. >>> >>> qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); >>> qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); >>> qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); >>> >>> These are (intentional) parser limits. >>> >>> 2. Successful parse >>> >>> If @req, it's a successful parse. >>> >>> If @req is not a JSON object, there is no "id". qmp_dispatch() reports >>> >>> error_setg(&err, "QMP input must be a JSON object"); >>> >>> If @req is a JSON object, it has "id" exactly when the client supplied >>> one. The response mirrors @req's "id". See qmp_error_response() caller >>> qmp_dispatch(). >>> >> >> That's very helpful, thank you. So in summary, the possibilities are: >> >> 1. syntax error (with several descriptions) >> 2. lexical error (with several descriptions, templated from one message) >> 3. lexical limitation (with several descriptions) > > Nitpick: parser limitation. Nesting depth is not lexical :) > >> 4. grammatical error ("QMP input must be a JSON object") > > We got two layers of syntax: QAPI schema above JSON. The JSON layer > builds JSON from characters, the QAPI schema layer builds QAPI schema > from JSON. > > Errors 1-3 are for the JSON layer. > > Error 4 is a QAPI schema syntax error. The qobject input visitor > reports more of them. Error 4 is special only because it's the only one > where the JSON cannot have an ID. All the other QAPI schema syntax > errors can have an ID. > >> (I know we have declared the error_class passe and we now prefer >> "generic_error", but I lack the history lesson on why we do that. > > The initial design had "rich" error objects, basically: > > * An error QDict with a string "class" member and arbitrary data in a > QDict "data" member > > To build a QMP error response, add the command's ID (if any) to the > error QDict, and serialize to JSON. > > * A pointer into a table of pairs (JSON template string, human-friendly > template string). More on the use of JSON template strings below. > > To build a human-friendly error message, interpolate the members of > the data QDict into the human-friendly template. > > Say you're writing a piece of code. Among many other things, there are > a few errors to report. You get to drop what you're doing, and do a > three step dance: > > 1. Define a QERR_YOUR_NEW_ERROR macro that expands into a JSON template > for the error QDict. These are all in one place, which is included all > over the place. Merge conflicts galore. > > 2. Add a table entry mapping QERR_YOUR_NEW_ERROR to the human-friendly > template. > > 3. Call the error function, passing QERR_YOUR_NEW_ERROR template and the > arguments. The arguments must match the template. This builds the > error QDict from template and the arguments, and looks up the template > in the table. > > I strenuously objected to this. In the end, all I could accomplish was > the inclusion of the human-friendly message in the QMP error response. > > I (correctly) predicted two issues: > > 1. Error classes multiplied. Not quite like rabbits, but bad enough to > be confusing. Most of them were undocumented and of absolutely no use > to QMP clients. > > 2. The three step dance proved bothersome, and people frequently took > the obvious shortcut: instead of defining a new error that fits the > situation, reuse some existing error. Even when it doesn't quite fit. > Error message quality went into the toilet. > > After a few years of this, I went in with an axe, and nobody objected. > > Among the stuff I axed were most error classes. I spared only the ones > that were documented and plausibly useful to some QMP client. > >> Wouldn't the error_class here be helpful for interpreting the response? >> It helps differentiate between 'The RPC call was received' vs 'The RPC >> call was received, but failed.' which may have semantic differences for >> who wants to consume the error: the QMP library itself, or the user >> making the RPC call. No?) > > Can we have more error classes? Yes, if they are documented and > definitely useful to some QMP client that matters. > OK, understood. It makes sense. I can see how semantic error classes would be a nuisance for a program as large as QEMU. However, making a distinction between errors at the JSON/QMP grammar level vs those at the QAPI level might be useful*. (*Maybe. A client library may wish to route errors differently based on if it's the "user's fault" or the "library's fault". However, an error with no ID is in the realm of "Not the user's fault". We can get by without it being explicit, but I still wonder if it'd be "Nice".) >>>> And I am assuming, in the context of a client that /always/ sets an >>>> 'id' for its execute statements, that this means that any error >>>> response we receive without an 'id' field *must* be associated with >>>> the most-recently-sent command. >>> >>> Only if the client keeps no more than one command in flight. >>> >> >> OK, so without engaging the OOB capability, we receive responses >> strictly in a FIFO manner, always. >> >> So if we have: >> >> Send exec A, Send exec B, Send exec C >> >> and then we receive an error response with no ID attached, we know it >> must be "A" that errored. Correct? > > Correct. > > QGA has a special case, but you probably don't want to know. > I do want to know, unfortunately. A good QMP client should likely support both targets. > Even though counting responses is a workable way to map responses back > to requests, I recommend to either have at most one request in flight > (so no counting is necessary), or to add IDs to all requests (so > counting isn't necessary as long as you get the syntax right enough to > avoid the ID-less errors discussed above). > I am taking this approach. I still need some error handling for the case that an incoming response cannot be routed to the correct pending queue. The QMP spec states that if I get an ID and I don't recognize it, I am free to drop the response on the floor (So I have done so), but I need to handle the case that I see a response with no ID at all. Perhaps the easiest, and most sensible thing to do, is to declare this a catastrophic failure and move the QMP connection into an error state, invalidate the whole queue, and disconnect. (And then maybe some of the commands you issued got processed, maybe they didn't. I guess it's up to the application driving the library to query around to find out what happened.) >>> Command responses get sent strictly in order (even parse errors), except >>> for commands executed out-of-band. >>> >>> If the client sends N JSON values, and only then reads responses, and >>> there are no parse errors, then it'll get N responses. The i-th >>> response is for the i-th JSON value, except responses to OOB command may >>> "jump the queue". >>> >> >> Let's assume now that A, B, and C are *all* OOB commands: >> >> - Send exec-oob A >> - Send exec-oob B >> - Send exec-oob C >> >> My client now reads an error message that has no ID attached. Which >> command execution was this associated with? >> >> Is it necessarily "A" if we have not yet received any other response? > > In this case, it can only be "A", because exec-oob is executed right > away, which means multiple exec-oob are executed in order, one after the > other. > I see. So OOB commands don't execute in parallel, and when they "jump the queue", further executions are not received. Right? > More interesting example: > > - Send execute A > - Send exec-oob B > > First, consider the "no errors" case. The first respone may be for A > or, if B managed to jump the queue, for B. > > How? The I/O thread receives A, sends it to the main loop, receives B, > executes it right away, and sends its result. > > Meanwhile, the main loop receives A, executes it and sends the result. > > Everything is received in the order it is sent. > Conceptually, then: There's one input queue and two pending execution slots. One slot is for OOB commands, and the other slot is for traditional commands. The input queue is FIFO, and when we pull a request from the queue, it occupies one of the conceptual slots (normal/oob). If the slot for this command type is already occupied, we block until it's free. Is that the correct mental model? (Even if it's not really even vaguely correct, details-wise.) > The two threads race to send result. That's a feature: it enables B to > jump the queue. > > But what about ID-less errors? Never fear, we actually thought this > out. > I trusted that you did! > The I/O thread needs to parse QMP input some to be able to recognize > exec-oob. Any parse errors are sent to the main loop, just like > requests. The main loop receives and processes in order. Errors it > simply sends, requests it executes, then sends the result. > > This ensures that ID-less errors can never jump the queue. > OK, understood. Essentially, parse errors for OOB commands lose their "queue-jumping" ability (Or, I suppose, they never *gain* it.) They're traditional commands until proven otherwise. If we can prove otherwise, we already know it isn't a parsing error. >>> With parse errors, it can get a different number of responses. >>> >> >> Oh, uhm -- if the parse error was bad enough that it didn't even notice >> we sent it three commands, right? > > Yes. > > What can a parser do after detecting an error? Error recovery is > fundamentally guesswork. The art has a substantial body of lore on > minimizing error cascades. The basic technique is to silently skip > tokens until a some promising point, then resume parsing in a suitably > altered parser state. > > Our error recovery is simplistic. Happy to explain it in detail if > you're curious. I figure you're more interested in practical advice > today. docs/interop/qmp-spec.txt section has some: > Little bit of both, really. I thank you for your time so far. > 2.6 Forcing the JSON parser into known-good state > ------------------------------------------------- > > Incomplete or invalid input can leave the server's JSON parser in a > state where it can't parse additional commands. To get it back into > known-good state, the client should provoke a lexical error. > > The cleanest way to do that is sending an ASCII control character > other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new > line). > > Sadly, older versions of QEMU can fail to flag this as an error. If a > client needs to deal with them, it should send a 0xFF byte. > Is there a reason to not always use \xFF ? >> we sent it three commands, right? ... So if we see a parse error, we >> might not know which, if any, of our queued commands were seen or can >> still expect a response. >> >> That seems difficult to code around, unless I misunderstand. > > "Doctor, it hurts when I send QMP input that isn't even JSON!" > Haha. I don't endeavor to keep causing parser errors. >> (Every parser error I receive -- which I can "guess" that it's a parser >> error by the lack of an 'id' field, potentially invalidates the entire >> queue?) > > Yes, with stress on "potentially". How much gets thrown away depends on > the error. You know for sure only when you receive a reply with an ID. > >>> Questions? >>> >> >> A few :) > > More? > None of tremendous import, I think, by now. Maybe the GQA stuff, but that can be later. --js ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-20 0:22 ` John Snow @ 2020-11-20 10:25 ` Markus Armbruster 2020-11-20 16:49 ` John Snow 0 siblings, 1 reply; 11+ messages in thread From: Markus Armbruster @ 2020-11-20 10:25 UTC (permalink / raw) To: John Snow; +Cc: QEMU Developers John Snow <jsnow@redhat.com> writes: > On 11/11/20 3:27 AM, Markus Armbruster wrote: >> John Snow <jsnow@redhat.com> writes: >> >>> On 11/10/20 1:22 AM, Markus Armbruster wrote: >>>> John Snow <jsnow@redhat.com> writes: >>>> >>>>> The QMP specification states: >>>>> >>>>>> NOTE: Some errors can occur before the Server is able to read the "id" >>>>>> member, in these cases the "id" member will not be part of the error >>>>>> response, even if provided by the client. >>>>> >>>>> I am assuming this case ONLY occurs for Parse errors: >>>>> >>>>> {'class': 'GenericError', 'desc': 'JSON parse error, expecting value'} >>>> >>>> There are more "desc" possible, actually. >>>> >>>> The JSON parser gets fed chunks of input, and calls a callback for every >>>> full JSON value, and on parse error. >>>> >>>> QMP's callback is handle_qmp_command(). Parameter @req is the parsed >>>> JSON value, parameter @err is the (parse) error object, and exactly one >>>> of them is non-null. >>>> >>>> 1. Parse error >>>> >>>> If @err, we send an error response for it. It never has "id". See >>>> qmp_error_response() caller monitor_qmp_dispatcher_co(). The possible >>>> @err are: >>>> >>>> $ grep error_setg qobject/json-*[ch] >>>> qobject/json-parser.c: error_setg(&ctxt->err, "JSON parse error, %s", message); >>>> >>>> This is a syntax error. >>>> >>>> Search for parse_error() to see the possible @message patterns. >>>> >>>> qobject/json-streamer.c: error_setg(&err, "JSON parse error, stray '%s'", input->str); >>>> >>>> This is a lexical error. >>>> >>>> qobject/json-streamer.c: error_setg(&err, "JSON token size limit exceeded"); >>>> qobject/json-streamer.c: error_setg(&err, "JSON token count limit exceeded"); >>>> qobject/json-streamer.c: error_setg(&err, "JSON nesting depth limit exceeded"); >>>> >>>> These are (intentional) parser limits. >>>> >>>> 2. Successful parse >>>> >>>> If @req, it's a successful parse. >>>> >>>> If @req is not a JSON object, there is no "id". qmp_dispatch() reports >>>> >>>> error_setg(&err, "QMP input must be a JSON object"); >>>> >>>> If @req is a JSON object, it has "id" exactly when the client supplied >>>> one. The response mirrors @req's "id". See qmp_error_response() caller >>>> qmp_dispatch(). >>>> >>> >>> That's very helpful, thank you. So in summary, the possibilities are: >>> >>> 1. syntax error (with several descriptions) >>> 2. lexical error (with several descriptions, templated from one message) >>> 3. lexical limitation (with several descriptions) >> >> Nitpick: parser limitation. Nesting depth is not lexical :) >> >>> 4. grammatical error ("QMP input must be a JSON object") >> >> We got two layers of syntax: QAPI schema above JSON. The JSON layer >> builds JSON from characters, the QAPI schema layer builds QAPI schema >> from JSON. >> >> Errors 1-3 are for the JSON layer. >> >> Error 4 is a QAPI schema syntax error. The qobject input visitor >> reports more of them. Error 4 is special only because it's the only one >> where the JSON cannot have an ID. All the other QAPI schema syntax >> errors can have an ID. >> >>> (I know we have declared the error_class passe and we now prefer >>> "generic_error", but I lack the history lesson on why we do that. >> >> The initial design had "rich" error objects, basically: >> >> * An error QDict with a string "class" member and arbitrary data in a >> QDict "data" member >> >> To build a QMP error response, add the command's ID (if any) to the >> error QDict, and serialize to JSON. >> >> * A pointer into a table of pairs (JSON template string, human-friendly >> template string). More on the use of JSON template strings below. >> >> To build a human-friendly error message, interpolate the members of >> the data QDict into the human-friendly template. >> >> Say you're writing a piece of code. Among many other things, there are >> a few errors to report. You get to drop what you're doing, and do a >> three step dance: >> >> 1. Define a QERR_YOUR_NEW_ERROR macro that expands into a JSON template >> for the error QDict. These are all in one place, which is included all >> over the place. Merge conflicts galore. >> >> 2. Add a table entry mapping QERR_YOUR_NEW_ERROR to the human-friendly >> template. >> >> 3. Call the error function, passing QERR_YOUR_NEW_ERROR template and the >> arguments. The arguments must match the template. This builds the >> error QDict from template and the arguments, and looks up the template >> in the table. >> >> I strenuously objected to this. In the end, all I could accomplish was >> the inclusion of the human-friendly message in the QMP error response. >> >> I (correctly) predicted two issues: >> >> 1. Error classes multiplied. Not quite like rabbits, but bad enough to >> be confusing. Most of them were undocumented and of absolutely no use >> to QMP clients. >> >> 2. The three step dance proved bothersome, and people frequently took >> the obvious shortcut: instead of defining a new error that fits the >> situation, reuse some existing error. Even when it doesn't quite fit. >> Error message quality went into the toilet. >> >> After a few years of this, I went in with an axe, and nobody objected. >> >> Among the stuff I axed were most error classes. I spared only the ones >> that were documented and plausibly useful to some QMP client. >> >>> Wouldn't the error_class here be helpful for interpreting the response? >>> It helps differentiate between 'The RPC call was received' vs 'The RPC >>> call was received, but failed.' which may have semantic differences for >>> who wants to consume the error: the QMP library itself, or the user >>> making the RPC call. No?) >> >> Can we have more error classes? Yes, if they are documented and >> definitely useful to some QMP client that matters. >> > > OK, understood. It makes sense. I can see how semantic error classes > would be a nuisance for a program as large as QEMU. However, making a > distinction between errors at the JSON/QMP grammar level vs those at > the QAPI level might be useful*. > > (*Maybe. A client library may wish to route errors differently based > on if it's the "user's fault" or the "library's fault". However, an > error with no ID is in the realm of "Not the user's fault". We can get > by without it being explicit, but I still wonder if it'd be "Nice".) If you find yourself in a situation where an error class would definitely be useful, we should talk about providing you one. >>>>> And I am assuming, in the context of a client that /always/ sets an >>>>> 'id' for its execute statements, that this means that any error >>>>> response we receive without an 'id' field *must* be associated with >>>>> the most-recently-sent command. >>>> >>>> Only if the client keeps no more than one command in flight. >>>> >>> >>> OK, so without engaging the OOB capability, we receive responses >>> strictly in a FIFO manner, always. >>> >>> So if we have: >>> >>> Send exec A, Send exec B, Send exec C >>> >>> and then we receive an error response with no ID attached, we know it >>> must be "A" that errored. Correct? >> >> Correct. >> >> QGA has a special case, but you probably don't want to know. > > I do want to know, unfortunately. A good QMP client should likely > support both targets. QGA has a few commands that send *nothing* on success. qapi-code-gen.txt explains: Normally, the QAPI schema is used to describe synchronous exchanges, where a response is expected. But in some cases, the action of a command is expected to change state in a way that a successful response is not possible (although the command will still return an error object on failure). When a successful reply is not possible, the command definition includes the optional member 'success-response' with boolean value false. So far, only QGA makes use of this member. qmp-spec.txt neglects to cover this special case. >> Even though counting responses is a workable way to map responses back >> to requests, I recommend to either have at most one request in flight >> (so no counting is necessary), or to add IDs to all requests (so >> counting isn't necessary as long as you get the syntax right enough to >> avoid the ID-less errors discussed above). > > I am taking this approach. I still need some error handling for the > case that an incoming response cannot be routed to the correct pending > queue. > > The QMP spec states that if I get an ID and I don't recognize it, I am > free to drop the response on the floor (So I have done so), Should not happen. When it does, something is wrong, possibly seriously wrong. One possible scenario: ID got corrupted on the way from client via server back to the client. The response with the uncorrupted ID will never come. Other responses and events may still arrive fine. Two recovery strategies: 1. Drop the unexpected response, carry on. If commands are in flight, be prepared for missing responses. 2. Give up, close connection. Pick the strategy that best fits your client's needs. > but I need > to handle the case that I see a response with no ID at all. Should not happen as long as the client sends syntactically sane input. When it does, something is wrong, possibly seriously wrong. > Perhaps the easiest, and most sensible thing to do, is to declare this > a catastrophic failure and move the QMP connection into an error > state, invalidate the whole queue, and disconnect. > > (And then maybe some of the commands you issued got processed, maybe > they didn't. I guess it's up to the application driving the library to > query around to find out what happened.) Again, two recovery strategies: 1. Drop the unexpected response, carry on. If commands are in flight, be prepared for missing responses. The server's JSON parser may now be in a state where it can't parse additional commands. You may want to provoke a lexical error to force it into known-good state, as explained below. 2. Give up, close connection. >>>> Command responses get sent strictly in order (even parse errors), except >>>> for commands executed out-of-band. >>>> >>>> If the client sends N JSON values, and only then reads responses, and >>>> there are no parse errors, then it'll get N responses. The i-th >>>> response is for the i-th JSON value, except responses to OOB command may >>>> "jump the queue". >>>> >>> >>> Let's assume now that A, B, and C are *all* OOB commands: >>> >>> - Send exec-oob A >>> - Send exec-oob B >>> - Send exec-oob C >>> >>> My client now reads an error message that has no ID attached. Which >>> command execution was this associated with? >>> >>> Is it necessarily "A" if we have not yet received any other response? >> >> In this case, it can only be "A", because exec-oob is executed right >> away, which means multiple exec-oob are executed in order, one after the >> other. > > I see. So OOB commands don't execute in parallel, and when they "jump > the queue", further executions are not received. Right? Right. It's "out of band", not "in parallel". >> More interesting example: >> >> - Send execute A >> - Send exec-oob B >> >> First, consider the "no errors" case. The first respone may be for A >> or, if B managed to jump the queue, for B. >> >> How? The I/O thread receives A, sends it to the main loop, receives B, >> executes it right away, and sends its result. >> >> Meanwhile, the main loop receives A, executes it and sends the result. >> >> Everything is received in the order it is sent. > > Conceptually, then: There's one input queue and two pending execution > slots. One slot is for OOB commands, and the other slot is for > traditional commands. > > The input queue is FIFO, and when we pull a request from the queue, it > occupies one of the conceptual slots (normal/oob). If the slot for > this command type is already occupied, we block until it's free. > > Is that the correct mental model? (Even if it's not really even > vaguely correct, details-wise.) Almost. Consider - Send execute A - Send execute B - Send exec-oob C - Send execute D What would happen in the server in this model? The I/O thread receives in-band A, and puts it into the empty in-band slot. Now the I/O thread doesn't want to receive anything, because if it receives an in-band command, it can't do anything with it. So it suspends reading. The main thread takes in-band A out of the slot (resuming reading), and starts executing it. A takes its own sweet time. The I/O thread receives in-band B, and puts it into the empty in-band slot, and suspends reading. Eventually, A terminates. The main thread takes in-band B out of the slot (resuming reading), and starts executing it. The I/O thread receives out-of-band C, and puts it into the out-of-band slot. Note that we don't read, let alone execute C until after A is done. Out-of-band commands can jump the queue, but not exactly far. More accurate model: in-band commands go into a length-limited queue, out-of-band commands get executed immediately. What happens in the server in this model? The I/O thread receives in-band A, and puts it into the empty in-band queue. The I/O thread receives in-band B, and puts it into the non-full in-band queue. The I/O thread receives out-of-band C, and exectutes it right away. The I/O thread receives in-band C, and puts it into the non-full in-band queue. Meanwhile, the main thread takes in-band A out of the queue, and starts executing it. A takes its own sweet time. Only if the client has more in-band commands in flight than the in-band queue can hold, the I/O thread encounters a full queue and suspends reading. qmp-spec.txt: If the client sends in-band commands faster than the server can execute them, the server will stop reading the requests from the QMP channel until the request queue length is reduced to an acceptable range. The queue length should probably be specified here. It's 8. The server limits the in-band request queue length for the same robustness reasons it limits JSON token length, token count and nesting depth: to not let a malfunctioning client make it consume unlimited amounts of memory. Hole (kind of): it doesn't limit output queue length. >> The two threads race to send result. That's a feature: it enables B to >> jump the queue. >> >> But what about ID-less errors? Never fear, we actually thought this >> out. > > I trusted that you did! > >> The I/O thread needs to parse QMP input some to be able to recognize >> exec-oob. Any parse errors are sent to the main loop, just like >> requests. The main loop receives and processes in order. Errors it >> simply sends, requests it executes, then sends the result. >> >> This ensures that ID-less errors can never jump the queue. > > OK, understood. Essentially, parse errors for OOB commands lose their > "queue-jumping" ability (Or, I suppose, they never *gain* it.) They're > traditional commands until proven otherwise. If we can prove > otherwise, we already know it isn't a parsing error. Correct! >>>> With parse errors, it can get a different number of responses. >>>> >>> >>> Oh, uhm -- if the parse error was bad enough that it didn't even notice >>> we sent it three commands, right? >> >> Yes. >> >> What can a parser do after detecting an error? Error recovery is >> fundamentally guesswork. The art has a substantial body of lore on >> minimizing error cascades. The basic technique is to silently skip >> tokens until a some promising point, then resume parsing in a suitably >> altered parser state. >> >> Our error recovery is simplistic. Happy to explain it in detail if >> you're curious. I figure you're more interested in practical advice >> today. docs/interop/qmp-spec.txt section has some: > > Little bit of both, really. I thank you for your time so far. > >> 2.6 Forcing the JSON parser into known-good state >> ------------------------------------------------- >> Incomplete or invalid input can leave the server's JSON parser in a >> state where it can't parse additional commands. To get it back into >> known-good state, the client should provoke a lexical error. >> >> The cleanest way to do that is sending an ASCII control character >> other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new >> line). >> >> Sadly, older versions of QEMU can fail to flag this as an error. If a >> client needs to deal with them, it should send a 0xFF byte. > > Is there a reason to not always use \xFF ? It's invalid UTF-8. "Older versions" should probably read "ancient versions" by now. I don't remember offhand which commit fixed the JSON parser to do the right thing in all states. >>> we sent it three commands, right? ... So if we see a parse error, we >>> might not know which, if any, of our queued commands were seen or can >>> still expect a response. >>> >>> That seems difficult to code around, unless I misunderstand. >> >> "Doctor, it hurts when I send QMP input that isn't even JSON!" > > Haha. I don't endeavor to keep causing parser errors. > >>> (Every parser error I receive -- which I can "guess" that it's a parser >>> error by the lack of an 'id' field, potentially invalidates the entire >>> queue?) >> >> Yes, with stress on "potentially". How much gets thrown away depends on >> the error. You know for sure only when you receive a reply with an ID. >> >>>> Questions? >>>> >>> >>> A few :) >> >> More? > > None of tremendous import, I think, by now. Maybe the GQA stuff, but > that can be later. Still more? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-20 10:25 ` Markus Armbruster @ 2020-11-20 16:49 ` John Snow 2020-11-23 6:57 ` Markus Armbruster 0 siblings, 1 reply; 11+ messages in thread From: John Snow @ 2020-11-20 16:49 UTC (permalink / raw) To: Markus Armbruster; +Cc: QEMU Developers On 11/20/20 5:25 AM, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: >> On 11/11/20 3:27 AM, Markus Armbruster wrote: >>> John Snow <jsnow@redhat.com> writes: >>>> On 11/10/20 1:22 AM, Markus Armbruster wrote: >>>>> John Snow <jsnow@redhat.com> writes: > > If you find yourself in a situation where an error class would > definitely be useful, we should talk about providing you one. > OK, thanks. I just wanted to check there wasn't a stronger opposition than I was aware of. I only vaguely knew they were "on the way out." >>> QGA has a special case, but you probably don't want to know. >> >> I do want to know, unfortunately. A good QMP client should likely >> support both targets. > > QGA has a few commands that send *nothing* on success. > > qapi-code-gen.txt explains: > > Normally, the QAPI schema is used to describe synchronous exchanges, > where a response is expected. But in some cases, the action of a > command is expected to change state in a way that a successful > response is not possible (although the command will still return an > error object on failure). When a successful reply is not possible, > the command definition includes the optional member 'success-response' > with boolean value false. So far, only QGA makes use of this member. > > qmp-spec.txt neglects to cover this special case. > Oh, I see. That's fun! (Was it not possible to have the client send an ACK response that doesn't indicate success, but just signals receipt? Semantically, it would be the difference between do-x and start-x as a command.) >>> Even though counting responses is a workable way to map responses back >>> to requests, I recommend to either have at most one request in flight >>> (so no counting is necessary), or to add IDs to all requests (so >>> counting isn't necessary as long as you get the syntax right enough to >>> avoid the ID-less errors discussed above). >> >> I am taking this approach. I still need some error handling for the >> case that an incoming response cannot be routed to the correct pending >> queue. >> >> The QMP spec states that if I get an ID and I don't recognize it, I am >> free to drop the response on the floor (So I have done so), > > Should not happen. When it does, something is wrong, possibly seriously > wrong. > Yep, I agree. (In my client library: I have decided to enforce string IDs and not allow the caller to specify their own. Though we allow arbitrary objects to be used as an ID, being able to compare more complex objects seems more prone to failure than a simple string.) > One possible scenario: ID got corrupted on the way from client via > server back to the client. The response with the uncorrupted ID will > never come. Other responses and events may still arrive fine. > > Two recovery strategies: > > 1. Drop the unexpected response, carry on. If commands are in flight, > be prepared for missing responses. > > 2. Give up, close connection. > > Pick the strategy that best fits your client's needs. > Dropping it on the floor seems fine for now, the spec says I can! :) The most realistic scenario for this outside of catastrophic error is that a client sends a command, but then times out waiting for a response. Later, the response arrives -- but that context has already come and gone, so there's nowhere to route the message to. Effectively, an orphaned reply. I suppose I could always stash them in an orphaned queue and a very, very rigorous client could check the orphaned queue if something winds up in there, but I suspect most clients would actually be just as happy to let it fall on the floor. >> but I need >> to handle the case that I see a response with no ID at all. > > Should not happen as long as the client sends syntactically sane input. > When it does, something is wrong, possibly seriously wrong. > >> Perhaps the easiest, and most sensible thing to do, is to declare this >> a catastrophic failure and move the QMP connection into an error >> state, invalidate the whole queue, and disconnect. >> >> (And then maybe some of the commands you issued got processed, maybe >> they didn't. I guess it's up to the application driving the library to >> query around to find out what happened.) > > Again, two recovery strategies: > > 1. Drop the unexpected response, carry on. If commands are in flight, > be prepared for missing responses. The server's JSON parser may now > be in a state where it can't parse additional commands. You may want > to provoke a lexical error to force it into known-good state, as > explained below. > > 2. Give up, close connection. > 2 is easier for now. 1 involves some more advanced recovery logic, but as you say: something is wrong, possibly seriously wrong. >> I see. So OOB commands don't execute in parallel, and when they "jump >> the queue", further executions are not received. Right? > > Right. It's "out of band", not "in parallel". > OK. Naively I thought that new OOB commands would continue to be processed "out of band", but it's fine if that capacity is finite. >> Conceptually, then: There's one input queue and two pending execution >> slots. One slot is for OOB commands, and the other slot is for >> traditional commands. >> >> The input queue is FIFO, and when we pull a request from the queue, it >> occupies one of the conceptual slots (normal/oob). If the slot for >> this command type is already occupied, we block until it's free. >> >> Is that the correct mental model? (Even if it's not really even >> vaguely correct, details-wise.) > > Almost. > > Consider > > - Send execute A > - Send execute B > - Send exec-oob C > - Send execute D > [...] > The I/O thread receives in-band A, and puts it into the empty in-band > queue. > > The I/O thread receives in-band B, and puts it into the non-full in-band > queue. > > The I/O thread receives out-of-band C, and exectutes it right away. > Ah, I see where I went wrong. OK, it's crystal clear now. > The I/O thread receives in-band C, and puts it into the non-full in-band > queue. > > Meanwhile, the main thread takes in-band A out of the queue, and starts > executing it. A takes its own sweet time. > > Only if the client has more in-band commands in flight than the in-band > queue can hold, the I/O thread encounters a full queue and suspends > reading. > > qmp-spec.txt: > > If the client sends in-band commands faster than the server can > execute them, the server will stop reading the requests from the QMP > channel until the request queue length is reduced to an acceptable > range. > > The queue length should probably be specified here. It's 8. > Oh! That's very helpful to know, actually. If you write a patch to amend the language of the doc, I will give you my R-B. > The server limits the in-band request queue length for the same > robustness reasons it limits JSON token length, token count and nesting > depth: to not let a malfunctioning client make it consume unlimited > amounts of memory. Hole (kind of): it doesn't limit output queue > length. > Understood! >> Is there a reason to not always use \xFF ? > > It's invalid UTF-8. > > "Older versions" should probably read "ancient versions" by now. I > don't remember offhand which commit fixed the JSON parser to do the > right thing in all states. > I suppose right now I don't have any code that tries to unjam the parser, so I suppose it doesn't matter. If it did, I guess I would just try to assume that I was allowed to use the newer method and too-bad to folks trying to talk to ancient stuff. >>>>> Questions? >>>> A few :) >>> More? >> None of tremendous import, I think, by now. Maybe the GQA stuff, but >> that can be later. > Still more? Where does matter come from? If energy cannot be created or destroyed, how did it come to be? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-20 16:49 ` John Snow @ 2020-11-23 6:57 ` Markus Armbruster 2020-11-30 18:32 ` John Snow 0 siblings, 1 reply; 11+ messages in thread From: Markus Armbruster @ 2020-11-23 6:57 UTC (permalink / raw) To: John Snow; +Cc: Michael Roth, QEMU Developers John Snow <jsnow@redhat.com> writes: > On 11/20/20 5:25 AM, Markus Armbruster wrote: >> John Snow <jsnow@redhat.com> writes: >>> On 11/11/20 3:27 AM, Markus Armbruster wrote: >>>> John Snow <jsnow@redhat.com> writes: >>>>> On 11/10/20 1:22 AM, Markus Armbruster wrote: >>>>>> John Snow <jsnow@redhat.com> writes: > >> >> If you find yourself in a situation where an error class would >> definitely be useful, we should talk about providing you one. >> > > OK, thanks. I just wanted to check there wasn't a stronger opposition > than I was aware of. I only vaguely knew they were "on the way out." > >>>> QGA has a special case, but you probably don't want to know. >>> >>> I do want to know, unfortunately. A good QMP client should likely >>> support both targets. >> >> QGA has a few commands that send *nothing* on success. >> >> qapi-code-gen.txt explains: >> >> Normally, the QAPI schema is used to describe synchronous exchanges, >> where a response is expected. But in some cases, the action of a >> command is expected to change state in a way that a successful >> response is not possible (although the command will still return an >> error object on failure). When a successful reply is not possible, >> the command definition includes the optional member 'success-response' >> with boolean value false. So far, only QGA makes use of this member. >> >> qmp-spec.txt neglects to cover this special case. >> > > Oh, I see. That's fun! > > (Was it not possible to have the client send an ACK response that > doesn't indicate success, but just signals receipt? Semantically, it > would be the difference between do-x and start-x as a command.) Feels quite possible to me. If I read git-log correctly, the commands' design ignored the race with shutdown / suspend, and 'success-response': false was a quick fix. I think changing the commands from 'do-x' to 'start-x' would have been better. A bit more work, though, and back then there weren't any 'start-x' examples to follow, I guess. I wish success-response didn't exist, but is getting rid of it worth changing the interface? I honestly don't know. >>>> Even though counting responses is a workable way to map responses back >>>> to requests, I recommend to either have at most one request in flight >>>> (so no counting is necessary), or to add IDs to all requests (so >>>> counting isn't necessary as long as you get the syntax right enough to >>>> avoid the ID-less errors discussed above). >>> >>> I am taking this approach. I still need some error handling for the >>> case that an incoming response cannot be routed to the correct pending >>> queue. >>> >>> The QMP spec states that if I get an ID and I don't recognize it, I am >>> free to drop the response on the floor (So I have done so), >> >> Should not happen. When it does, something is wrong, possibly seriously >> wrong. > > Yep, I agree. > > (In my client library: I have decided to enforce string IDs and not > allow the caller to specify their own. Though we allow arbitrary objects > to be used as an ID, being able to compare more complex objects seems > more prone to failure than a simple string.) Makes sense. >> One possible scenario: ID got corrupted on the way from client via >> server back to the client. The response with the uncorrupted ID will >> never come. Other responses and events may still arrive fine. >> >> Two recovery strategies: >> >> 1. Drop the unexpected response, carry on. If commands are in flight, >> be prepared for missing responses. >> >> 2. Give up, close connection. >> >> Pick the strategy that best fits your client's needs. > > Dropping it on the floor seems fine for now, the spec says I can! :) > > The most realistic scenario for this outside of catastrophic error is > that a client sends a command, but then times out waiting for a > response. Later, the response arrives -- but that context has already > come and gone, so there's nowhere to route the message to. > > Effectively, an orphaned reply. I suppose I could always stash them in > an orphaned queue and a very, very rigorous client could check the > orphaned queue if something winds up in there, but I suspect most > clients would actually be just as happy to let it fall on the floor. Sounds okay to me. >>> but I need >>> to handle the case that I see a response with no ID at all. >> >> Should not happen as long as the client sends syntactically sane input. >> When it does, something is wrong, possibly seriously wrong. >> >>> Perhaps the easiest, and most sensible thing to do, is to declare this >>> a catastrophic failure and move the QMP connection into an error >>> state, invalidate the whole queue, and disconnect. >>> >>> (And then maybe some of the commands you issued got processed, maybe >>> they didn't. I guess it's up to the application driving the library to >>> query around to find out what happened.) >> >> Again, two recovery strategies: >> >> 1. Drop the unexpected response, carry on. If commands are in flight, >> be prepared for missing responses. The server's JSON parser may now >> be in a state where it can't parse additional commands. You may want >> to provoke a lexical error to force it into known-good state, as >> explained below. >> >> 2. Give up, close connection. >> > > 2 is easier for now. 1 involves some more advanced recovery logic, but > as you say: something is wrong, possibly seriously wrong. Taking the easy way out sounds okay to me. Build more advanced recovery when you *know* you need it. >>> I see. So OOB commands don't execute in parallel, and when they "jump >>> the queue", further executions are not received. Right? >> >> Right. It's "out of band", not "in parallel". > > OK. Naively I thought that new OOB commands would continue to be > processed "out of band", but it's fine if that capacity is finite. The out-of-band feature is not meant to compete with jobs. It's a narrow solution to a specific problem: robust postcopy migration recovery needs to get in certain commands reliably and quickly, even previously sent commands hog the monitor. A second use has since come up: a 'yank' command to recover from hung network connections by shutting them down. >>> Conceptually, then: There's one input queue and two pending execution >>> slots. One slot is for OOB commands, and the other slot is for >>> traditional commands. >>> >>> The input queue is FIFO, and when we pull a request from the queue, it >>> occupies one of the conceptual slots (normal/oob). If the slot for >>> this command type is already occupied, we block until it's free. >>> >>> Is that the correct mental model? (Even if it's not really even >>> vaguely correct, details-wise.) >> >> Almost. >> >> Consider >> >> - Send execute A >> - Send execute B >> - Send exec-oob C >> - Send execute D >> > > [...] > >> The I/O thread receives in-band A, and puts it into the empty in-band >> queue. >> >> The I/O thread receives in-band B, and puts it into the non-full in-band >> queue. >> >> The I/O thread receives out-of-band C, and exectutes it right away. >> > > Ah, I see where I went wrong. OK, it's crystal clear now. > >> The I/O thread receives in-band C, and puts it into the non-full in-band >> queue. >> >> Meanwhile, the main thread takes in-band A out of the queue, and starts >> executing it. A takes its own sweet time. >> >> Only if the client has more in-band commands in flight than the in-band >> queue can hold, the I/O thread encounters a full queue and suspends >> reading. >> >> qmp-spec.txt: >> >> If the client sends in-band commands faster than the server can >> execute them, the server will stop reading the requests from the QMP >> channel until the request queue length is reduced to an acceptable >> range. >> >> The queue length should probably be specified here. It's 8. >> > > Oh! That's very helpful to know, actually. If you write a patch to amend > the language of the doc, I will give you my R-B. Added to my queue. >> The server limits the in-band request queue length for the same >> robustness reasons it limits JSON token length, token count and nesting >> depth: to not let a malfunctioning client make it consume unlimited >> amounts of memory. Hole (kind of): it doesn't limit output queue >> length. >> > > Understood! > >>> Is there a reason to not always use \xFF ? >> >> It's invalid UTF-8. >> >> "Older versions" should probably read "ancient versions" by now. I >> don't remember offhand which commit fixed the JSON parser to do the >> right thing in all states. >> > > I suppose right now I don't have any code that tries to unjam the > parser, so I suppose it doesn't matter. > > If it did, I guess I would just try to assume that I was allowed to use > the newer method and too-bad to folks trying to talk to ancient stuff. Right. >>>>>> Questions? >>>>> A few :) >>>> More? >>> None of tremendous import, I think, by now. Maybe the GQA stuff, but >>> that can be later. >> Still more? > > Where does matter come from? If energy cannot be created or destroyed, > how did it come to be? "D'où venons-nous? Que sommes-nous? Où allons-nous?" ;) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: QMP and the 'id' parameter 2020-11-23 6:57 ` Markus Armbruster @ 2020-11-30 18:32 ` John Snow 0 siblings, 0 replies; 11+ messages in thread From: John Snow @ 2020-11-30 18:32 UTC (permalink / raw) To: Markus Armbruster; +Cc: Michael Roth, QEMU Developers On 11/23/20 1:57 AM, Markus Armbruster wrote: >> (Was it not possible to have the client send an ACK response that >> doesn't indicate success, but just signals receipt? Semantically, it >> would be the difference between do-x and start-x as a command.) > Feels quite possible to me. > > If I read git-log correctly, the commands' design ignored the race with > shutdown / suspend, and 'success-response': false was a quick fix. > > I think changing the commands from 'do-x' to 'start-x' would have been > better. A bit more work, though, and back then there weren't any > 'start-x' examples to follow, I guess. > > I wish success-response didn't exist, but is getting rid of it worth > changing the interface? I honestly don't know. > OK, I'll jot it down in the ideas book and we can come back to it later, after we've fried the other 2,385 fish in queue. I think simplifying the interface is going to be helpful long-term, but I don't have a good grip on the actual work estimate. In terms of a simple library design, this edge case makes an async library quite a bit worse to support, because now it needs to understand that sometimes the target will never reply, and it's command dependent. Now the core "QMP" library, ostensibly the command-agnostic layer, needs to query to discover the semantics of each command it sends. (1) Identify 'success-response: false' commands: - guest-shutdown - guest-suspend-disk - guest-suspend-ram - guest-suspend-hybrid (2) Replace them with jobs, where a success response acknowledges receipt and start of the job. Errors encountered setting up the job can be returned synchronously; errors encountered past the point of no return can be queried via the job system. (I am assuming that it is untenable to have a system where we acknowledge receipt, go past the point of no return and then encounter an error and have no way to report it, since we've already responded to the shutdown/suspend request. Therefore, the job system seems appropriate.) ((The onus is now on the job layer to understand that on job success, the server goes away and cannot report success, but I think this is easier to implement in a client at the logical layer than at the protocol layer.)) (3) Deprecate the old interfaces. (4) Delete the old interfaces. Remove 'success-response: false' from the meta-schema. Remove 'success-response' from GuestAgentCommandInfo. ...but, not terribly important. Might be nice to help plumb some non-block jobs for sake of example for other areas where they might be useful. Other stuff to think about in the meantime, but thank you for the heads up. Maybe a good GSoC project, actually? It seems straightforward, just with a lot of plumbing. I suppose for now, what can happen is the client (using the AQMP library) can simply await the results of execute() with a timeout. If the server closes the connection, we know it worked -- or, if there's a timeout and we used --no-shutdown, we can interrogate the VM status. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-11-30 18:35 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-11-10 1:47 QMP and the 'id' parameter John Snow 2020-11-10 6:22 ` Markus Armbruster 2020-11-10 9:15 ` Daniel P. Berrangé 2020-11-10 10:27 ` Markus Armbruster 2020-11-10 16:32 ` John Snow 2020-11-11 8:27 ` Markus Armbruster 2020-11-20 0:22 ` John Snow 2020-11-20 10:25 ` Markus Armbruster 2020-11-20 16:49 ` John Snow 2020-11-23 6:57 ` Markus Armbruster 2020-11-30 18:32 ` John Snow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).