qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Performance CI - real-world example
@ 2021-10-04  7:41 Lukáš Doktor
  0 siblings, 0 replies; only message in thread
From: Lukáš Doktor @ 2021-10-04  7:41 UTC (permalink / raw)
  To: qemu-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 6594 bytes --]

Hello guys,

on KVM forum I presented the intention to establish a regular performance CI and now I have the opportunity to demonstrate a clear hit. Long story short, the commit message that resulted from bisection explains the situation so no big surprise here and normally I would not spam you with clear issues like this one and simply re-train the model. Anyway let's pretend the commit message would not be that informative and I'd need a confirmation that the different performance results are expected and intentional. Some questions that come to my mind:

* Which form of a report would you prefer, an email or GL issue?
* Should I CC to the commit author?
* Shell I include the reviewer as well?
* What files/reports should I include?
* Is the report.html understandable?
* Should I reorganize the body of the email somehow? Cut some pieces or expand?
* Should I create issues for improvements or only regressions?
* Do you event want perf CI?
* Would you like the perf-ci status in gitlab? Example of integration (with always passing regular CI per each push and soft-failure-based perf-ci ran only on some builds: https://gitlab.com/ldoktor/qemu/-/pipelines )
* Would you like to use latest-released RHEL (at this point 8.4) or latest-released RHEL + latest-kernel (from fedora's koji as that's stable enough while close to upstream)

Example of a simple email:

---
Subject: [PERF-CI] fio-nbd write-* 1300% performance improvement
To: qemu-devel,Nir Soffer

A very significant improvement in fio-nbd write had been detected by my pipeline, could you please verify it is expected?

commit 09615257058a0ae87b837bb041f56f7312d9ead8
Author: Nir Soffer <nirsof@gmail.com>
Date:   Fri Aug 13 23:55:19 2021 +0300

    qemu-nbd: Change default cache mode to writeback
    
    Both qemu and qemu-img use writeback cache mode by default, which is
    already documented in qemu(1). qemu-nbd uses writethrough cache mode by
    default, and the default cache mode is not documented.
    
    According to the qemu-nbd(8):
    
       --cache=CACHE
              The  cache  mode  to be used with the file.  See the
              documentation of the emulator's -drive cache=... option for
              allowed values.
    
    qemu(1) says:
    
        The default mode is cache=writeback.
    
    So users have no reason to assume that qemu-nbd is using writethough
    cache mode. The only hint is the painfully slow writing when using the
    defaults.
    
    Looking in git history, it seems that qemu used writethrough in the past
    to support broken guests that did not flush data properly, or could not
    flush due to limitations in qemu. But qemu-nbd clients can use
    NBD_CMD_FLUSH to flush data, so using writethrough does not help anyone.
    
    Change the default cache mode to writback, and document the default and
    available values properly in the online help and manual.
    
    With this change converting image via qemu-nbd is 3.5 times faster.
    
        $ qemu-img create dst.img 50g
        $ qemu-nbd -t -f raw -k /tmp/nbd.sock dst.img
    
    Before this change:
    
        $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock"
        Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock
          Time (mean ± σ):     83.639 s ±  5.970 s    [User: 2.733 s, System: 6.112 s]
          Range (min … max):   76.749 s … 87.245 s    3 runs
    
    After this change:
    
        $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock"
        Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W fedora34.img nbd+unix:///?socket=/tmp/nbd.sock
          Time (mean ± σ):     23.522 s ±  0.433 s    [User: 2.083 s, System: 5.475 s]
          Range (min … max):   23.234 s … 24.019 s    3 runs
    
    Users can avoid the issue by using --cache=writeback[1] but the defaults
    should give good performance for the common use case.
    
    [1] https://bugzilla.redhat.com/1990656
    
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>
    Message-Id: <20210813205519.50518-1-nsoffer@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    CC: qemu-stable@nongnu.org
    Signed-off-by: Eric Blake <eblake@redhat.com>

 docs/tools/qemu-nbd.rst | 6 ++++--
 qemu-nbd.c              | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)


Bisection log:
git bisect start
# good: [6b54a31bf7b403672a798b6443b1930ae6c74dea] Merge remote-tracking branch 'remotes/jsnow-gitlab/tags/python-pull-request' into staging
git bisect good 6b54a31bf7b403672a798b6443b1930ae6c74dea
# bad: [bb4aa8f59e18412cff0d69f14aee7abba153161a] Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210930' into staging
git bisect bad bb4aa8f59e18412cff0d69f14aee7abba153161a
# bad: [fce8f7735fcea23056ff41be55e73eacbca31b5e] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.2-20210930' into staging
git bisect bad fce8f7735fcea23056ff41be55e73eacbca31b5e
# good: [ff8cdbbd7e4bb7a7422c080c004b899214a08b3a] MAINTAINERS: Add information for OpenPIC
git bisect good ff8cdbbd7e4bb7a7422c080c004b899214a08b3a
# good: [ba0fa56bc06e563de68d2a2bf3ddb0cfea1be4f9] Merge remote-tracking branch 'remotes/vivier/tags/q800-for-6.2-pull-request' into staging
git bisect good ba0fa56bc06e563de68d2a2bf3ddb0cfea1be4f9
# bad: [0c8022876f2183f93e23a7314862140c94ee62e7] block: use int64_t instead of int in driver discard handlers
git bisect bad 0c8022876f2183f93e23a7314862140c94ee62e7
# skip: [e75abedab7e10043ed81b1cbc3033409aff0159e] block: use int64_t instead of uint64_t in driver write handlers
git bisect skip e75abedab7e10043ed81b1cbc3033409aff0159e
# bad: [485350497b7a9be3477d453fe0fdf380b8535303] block: use int64_t instead of uint64_t in copy_range driver handlers
git bisect bad 485350497b7a9be3477d453fe0fdf380b8535303
# bad: [558902cc3dc61231930001b82dcd95d20d58b417] qcow2: check request on vmstate save/load path
git bisect bad 558902cc3dc61231930001b82dcd95d20d58b417
# bad: [09615257058a0ae87b837bb041f56f7312d9ead8] qemu-nbd: Change default cache mode to writeback
git bisect bad 09615257058a0ae87b837bb041f56f7312d9ead8
# first bad commit: [09615257058a0ae87b837bb041f56f7312d9ead8] qemu-nbd: Change default cache mode to writeback


Please find the bisection report with raw numbers attached.

Regards,
Lukáš Doktor

[-- Attachment #1.1.2: report.html --]
[-- Type: text/html, Size: 1096928 bytes --]

[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 12093 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-10-04  7:49 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-10-04  7:41 Performance CI - real-world example Lukáš Doktor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).