From: Markus Armbruster <armbru@redhat.com>
To: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
qemu-devel@nongnu.org, yc-core@yandex-team.ru,
michael.roth@amd.com, vsementsov@yandex-team.ru,
marcandre.lureau@gmail.com
Subject: Re: [PATCH v3] qapi/qmp: Add timestamps to qmp command responses
Date: Mon, 17 Oct 2022 19:23:50 +0200 [thread overview]
Message-ID: <871qr6qs7t.fsf@pond.sub.org> (raw)
In-Reply-To: <a2ff7f2d-1655-19fa-8cad-baf9fe65a4fe@yandex-team.ru> (Denis Plotnikov's message of "Sun, 16 Oct 2022 12:25:48 +0300")
Denis Plotnikov <den-plotnikov@yandex-team.ru> writes:
> On 14.10.2022 16:19, Daniel P. Berrangé wrote:
>> On Fri, Oct 14, 2022 at 02:57:06PM +0200, Markus Armbruster wrote:
>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>
>>>> On Fri, Oct 14, 2022 at 11:31:13AM +0200, Markus Armbruster wrote:
>>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>>>
>>>>>> On Thu, Oct 13, 2022 at 05:00:26PM +0200, Markus Armbruster wrote:
>>>>>>> Denis Plotnikov <den-plotnikov@yandex-team.ru> writes:
>>>>>>>
>>>>>>>> Add "start" & "end" time values to qmp command responses.
>>>>>>> Please spell it QMP. More of the same below.
>>>>>>>
>>>>>>>> These time values are added to let the qemu management layer get the exact
>>>>>>>> command execution time without any other time variance which might be brought by
>>>>>>>> other parts of management layer or qemu internals. This is particulary useful
>>>>>>>> for the management layer logging for later problems resolving.
>>>>>>> I'm still having difficulties seeing the value add over existing
>>>>>>> tracepoints and logging.
>>>>>>>
>>>>>>> Can you tell me about a problem you cracked (or could have cracked) with
>>>>>>> the help of this?
>>>>>> Consider your QMP client is logging all commands and replies in its
>>>>>> own logfile (libvirt can do this). Having this start/end timestamps
>>>>>> included means the QMP client log is self contained.
>>>>> A QMP client can include client-side timestamps in its log. What value
>>>>> is being added by server-side timestamps? According to the commit
>>>>> message, it's for getting "the exact command execution time without any
>>>>> other time variance which might be brought by other parts of management
>>>>> layer or qemu internals." Why is that useful? In particular, why is
>>>>> excluding network and QEMU queueing delays (inbound and outbound)
>>>>> useful?
>>>> Lets, say some commands normally runs in ~100ms, but occasionally
>>>> runs in 2secs, and you want to understand why.
>>>>
>>>> A first step is understanding whether a given command itself is
>>>> slow at executing, or whether its execution has merely been
>>>> delayed because some other aspect of QEMU has delayed its execution.
>>>> If the server timestamps show it was very fast, then that indicates
>>>> delayed processing. Thus instead of debugging the slow command, I
>>>> can think about what scenarios would be responsible for the delay.
>>>> Perhaps a previous QMP command was very slow, or maybe there is
>>>> simply a large volume of QMP commands backlogged, or some part of
>>>> QEMU got blocked.
>>>>
>>>> Another case would be a command that is normally fast, and sometimes
>>>> is slower, but still relatively fast. The network and queueing side
>>>> might be a significant enough proportion of the total time to obscure
>>>> the slowdown. If you can eliminate the non-execution time, you can
>>>> see the performance trends over time to spot the subtle slowdowns
>>>> and detect abnormal behaviour before it becomes too terrible.
>>> This is troubleshooting. Asking for better troubleshooting tools is
>>> fair.
>>>
>>> However, the proposed timestamps provide much more limited insight than
>>> existing tracepoints. For instance, enabling
>> tracepoints are absolutely great and let you get a hell of alot
>> more information, *provided* you are in a position to actually
>> use tracepoints. This is, unfortunately, frequently not the case
>> when supporting real world production deployments.
>
> Exactly!!! Thanks for the pointing out!
>>
>> Bug reports from customers typically include little more than a
>> log file they got from the mgmt client at time the problem happened.
>> The problem experianced may no longer exist, so asking them to run
>> a tracepoint script is not possible. They may also be reluctant to
>> actually run tracepoint scripts on a production system, or simply
>> lack the ability todo so at all, due to constraints of the deployment
>> environment. Logs from libvirt are something that are collected by
>> default for many mgmt apps, or can be turned on by the user with
>> minimal risk of disruption.
>>
>> Overall, there's a compelling desire to be proactive in collecting
>> information ahead of time, that might be useful in diagnosing
>> future bug reports.
>
> This is the main reason. When you encounter a problem one of the first
> questions is "Was there something similar in the past. Another question
> is how often does it happen.
>
> With the timestamps these questions answering becomes easier.
>
> Another thing is that with the qmp command timestamps you can build a
> monitoring system which will report about the cases when
> execution_time_from_mgmt_perspective - excution_time_qmp_command >
> some_threshold which in turn proactively tell you about the potential
> problems. And then you'll start using the qmp tracepoints (and other
> means) to figure out the real reason of the execution time variance.
>
> Thanks, Denis
>
>>
>> So it isn't an 'either / or' decision of QMP reply logs vs use of
>> tracepoints, both are beneficial, with their own pros/cons.
>>
>> With regards,
>> Daniel
Please give full rationale in the commit message. Make sure to address
the following points:
1. Why tracepoints can't do the job
2. Why client-side logging can't do the job
3. Why of all the possible points of interest in a QMP command's flow
through QEMU, you're picking these two.
So far, I've seen a full argument on 1., not much on 2., and basically
nothing on 3.
next prev parent reply other threads:[~2022-10-17 17:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-11 15:34 [PATCH v3] qapi/qmp: Add timestamps to qmp command responses Denis Plotnikov
2022-10-13 15:00 ` Markus Armbruster
2022-10-13 15:40 ` Daniel P. Berrangé
2022-10-14 9:31 ` Markus Armbruster
2022-10-14 11:31 ` Daniel P. Berrangé
2022-10-14 12:57 ` Markus Armbruster
2022-10-14 13:19 ` Daniel P. Berrangé
2022-10-16 9:25 ` Denis Plotnikov
2022-10-17 17:23 ` Markus Armbruster [this message]
2022-10-14 9:47 ` Denis Plotnikov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qr6qs7t.fsf@pond.sub.org \
--to=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=den-plotnikov@yandex-team.ru \
--cc=marcandre.lureau@gmail.com \
--cc=michael.roth@amd.com \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@yandex-team.ru \
--cc=yc-core@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.