qemu-devel.nongnu.org archive mirror
* starting to look at qemu savevm performance, a first regression detected
@ 2022-03-05 13:20 Claudio Fontana
  2022-03-05 14:11 ` Claudio Fontana
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-05 13:20 UTC (permalink / raw)
  To: Juan Quintela, Dr. David Alan Gilbert, Daniel P. Berrangé; +Cc: qemu-devel


Hello all,

I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ GB),
when used in libvirt commands like:


virsh save domain /dev/null



I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20 GB of memory.

With any qemu version since around 2020, I am not seeing more than 580 MB/s even in the most ideal of situations.

This drops to around 122 MB/s after commit cbde7be900d2a2279cbc4becb91d1ddd6a014def.
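
As a rough sanity check on these figures, here is a small sketch of what the two rates mean in wall-clock time. It assumes that the rates are in MiB per second and that the full 20 GiB touched by the test is written out; both are assumptions, not measurements.

```python
# Rough save-time estimate for the test guest.
# Assumptions: 20 GiB of touched RAM, constant rate in MiB/s.
GUEST_RAM_MIB = 20 * 1024


def save_seconds(rate_mib_per_s: float) -> float:
    """Time needed to write GUEST_RAM_MIB at a constant rate."""
    return GUEST_RAM_MIB / rate_mib_per_s


for rate in (580, 122):
    print(f"{rate} MiB/s -> about {save_seconds(rate):.0f} s")
```

Under these assumptions the same save goes from roughly 35 seconds to roughly 2 minutes 48 seconds.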

Here is the bisection for this particular drop in throughput:

commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
Author: Daniel P. Berrangé <berrange@redhat.com>
Date:   Fri Feb 19 18:40:12 2021 +0000

    migrate: remove QMP/HMP commands for speed, downtime and cache size
    
    The generic 'migrate_set_parameters' command handle all types of param.
    
    Only the QMP commands were documented in the deprecations page, but the
    rationale for deprecating applies equally to HMP, and the replacements
    exist. Furthermore the HMP commands are just shims to the QMP commands,
    so removing the latter breaks the former unless they get re-implemented.
    
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>


git bisect start
# bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
# good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
# good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
# skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
# good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
# bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
# bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
# good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
# bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
# good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
# good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
# bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
# bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
# bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
# bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
# good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
# bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
# bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
# first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
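
A bisection like the one above could in principle be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The following is only a sketch of the timing part: the 60 second threshold and the helper names are hypothetical, and it is not how the log above was produced.

```python
import subprocess
import time

# Exit codes that `git bisect run` interprets: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125


def classify(elapsed_s, threshold_s=60.0):
    """Map a measured save time to a `git bisect run` exit code.

    elapsed_s is None when the build or the save itself failed, in which
    case the commit cannot be judged and should be skipped.
    threshold_s is a hypothetical cut-off, not a value from this thread.
    """
    if elapsed_s is None:
        return SKIP
    return GOOD if elapsed_s < threshold_s else BAD


def timed_save(domain):
    """Time one `virsh save <domain> /dev/null`; return None on failure."""
    start = time.monotonic()
    try:
        subprocess.run(["virsh", "save", domain, "/dev/null"], check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return time.monotonic() - start
```

A wrapper script would still have to rebuild QEMU, start the guest and run the memory test before each measurement, and then exit with `classify(timed_save("domain"))`.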


Are there some obvious settings / options I am missing to regain the savevm performance after this commit?

I have seen projects attempting to improve other aspects of performance (snapshot performance, etc.). Is there something going on to improve the transfer of RAM in savevm too?


Thanks,

Claudio



* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-05 13:20 starting to look at qemu savevm performance, a first regression detected Claudio Fontana
@ 2022-03-05 14:11 ` Claudio Fontana
  2022-03-07 10:32   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-05 14:11 UTC (permalink / raw)
  To: Juan Quintela, Dr. David Alan Gilbert, Daniel P. Berrangé; +Cc: qemu-devel

On 3/5/22 2:20 PM, Claudio Fontana wrote:
> 
> Hello all,
> 
> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> when used in libvirt commands like:
> 
> 
> virsh save domain /dev/null
> 
> 
> 
> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> 
> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> 
> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> 
> Here is the bisection for this particular drop in throughput:
> 
> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> Author: Daniel P. Berrangé <berrange@redhat.com>
> Date:   Fri Feb 19 18:40:12 2021 +0000
> 
>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>     
>     The generic 'migrate_set_parameters' command handle all types of param.
>     
>     Only the QMP commands were documented in the deprecations page, but the
>     rationale for deprecating applies equally to HMP, and the replacements
>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>     so removing the latter breaks the former unless they get re-implemented.
>     
>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> 
> git bisect start
> # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
> git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
> # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
> git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
> # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
> git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
> # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
> git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
> # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
> git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
> # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
> git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
> # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
> git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
> # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
> git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
> # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
> git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
> # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
> git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
> # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
> git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
> # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
> git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
> # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
> git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
> # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
> git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
> # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
> git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
> # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
> git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
> # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
> git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
> # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
> # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> 
> 
> Are there some obvious settings / options I am missing to regain the savevm performance after this commit?

Answering myself: 

This seems to be due to a different resulting default xbzrle cache size (probably an interaction between libvirt and qemu versions?).

When forcing the xbzrle cache size to a larger value, the performance is back.
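
For anyone wanting to try the same thing, the cache size is one of the parameters accepted by the QMP command `migrate-set-parameters`. The sketch below only builds the JSON; the 1 GiB value is illustrative and is not necessarily the value used in the test above.

```python
import json


def set_migration_parameters(**params):
    """Build a QMP 'migrate-set-parameters' command as a JSON string.

    Python keyword names use underscores; QMP parameter names use hyphens.
    """
    args = {name.replace("_", "-"): value for name, value in params.items()}
    return json.dumps({"execute": "migrate-set-parameters", "arguments": args})


# Illustrative value only: 1 GiB xbzrle cache.
print(set_migration_parameters(xbzrle_cache_size=1 << 30))
```

With libvirt, a string like this can be passed to `virsh qemu-monitor-command <domain> '<json>'`.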


> 
> I have seen projects attempting to improve other aspects of performance (snapshot performance, etc), is there something going on to improve the transfer of RAM in savevm too?


Still, I would think that we should be able to do better than 600ish MB/s. Any ideas or prior work on
improving savevm performance, especially the transfer speed of RAM regions?

Thanks,

Claudio



* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-05 14:11 ` Claudio Fontana
@ 2022-03-07 10:32   ` Dr. David Alan Gilbert
  2022-03-07 11:06     ` Claudio Fontana
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-07 10:32 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela

* Claudio Fontana (cfontana@suse.de) wrote:
> On 3/5/22 2:20 PM, Claudio Fontana wrote:
> > 
> > Hello all,
> > 
> > I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> > when used in libvirt commands like:
> > 
> > 
> > virsh save domain /dev/null
> > 
> > 
> > 
> > I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> > 
> > With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> > 
> > This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> > 
> > Here is the bisection for this particular drop in throughput:
> > 
> > commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> > Author: Daniel P. Berrangé <berrange@redhat.com>
> > Date:   Fri Feb 19 18:40:12 2021 +0000
> > 
> >     migrate: remove QMP/HMP commands for speed, downtime and cache size
> >     
> >     The generic 'migrate_set_parameters' command handle all types of param.
> >     
> >     Only the QMP commands were documented in the deprecations page, but the
> >     rationale for deprecating applies equally to HMP, and the replacements
> >     exist. Furthermore the HMP commands are just shims to the QMP commands,
> >     so removing the latter breaks the former unless they get re-implemented.
> >     
> >     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > 
> > git bisect start
> > # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
> > git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
> > # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
> > git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
> > # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
> > git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
> > # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
> > git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
> > # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
> > git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
> > # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
> > git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
> > # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
> > git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
> > # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
> > git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
> > # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
> > git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
> > # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
> > git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
> > # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
> > git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
> > # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
> > git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
> > # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
> > git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
> > # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
> > git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
> > # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
> > git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
> > # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
> > git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
> > # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
> > git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
> > # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> > git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
> > # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> > 
> > 
> > Are there some obvious settings / options I am missing to regain the savevm performance after this commit?
> 
> Answering myself: 

<oops we seem to have split this thread into two>

> this seems to be due to a resulting different default xbzrle cache size (probably interactions between libvirt/qemu versions?).
> 
> When forcing the xbzrle cache size to a larger value, the performance is back.

That's weird that 'virsh save' is ending up using xbzrle.

> > 
> > I have seen projects attempting to improve other aspects of performance (snapshot performance, etc), is there something going on to improve the transfer of RAM in savevm too?
> 
> 
> Still I would think that we should be able to do better than 600ish Mb/s , any ideas, prior work on this,
> to improve savevm performance, especially looking at RAM regions transfer speed?

My normal feeling is ~10Gbps for a live migrate over the wire; I rarely
try virsh save though.
If you're using xbzrle that might explain it; it's known to eat cpu -
but I'd never expect it to have been used with 'virsh save'.
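
One way to check whether xbzrle is actually in use would be the QMP command `query-migrate-capabilities`, which returns a list of capability/state pairs. A minimal sketch of reading such a reply follows; the sample reply is made up for illustration and was not captured from the guest in this thread.

```python
import json


def xbzrle_enabled(reply_text):
    """Return True if a 'query-migrate-capabilities' reply lists xbzrle as on."""
    reply = json.loads(reply_text)
    for cap in reply.get("return", []):
        if cap.get("capability") == "xbzrle":
            return bool(cap.get("state"))
    return False


# Illustrative reply, not taken from a real capture.
sample_reply = (
    '{"return": [{"state": false, "capability": "xbzrle"},'
    ' {"state": true, "capability": "events"}]}'
)
print(xbzrle_enabled(sample_reply))
```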

Dave


> Thanks,
> 
> Claudio
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 10:32   ` Dr. David Alan Gilbert
@ 2022-03-07 11:06     ` Claudio Fontana
  2022-03-07 11:31       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-07 11:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela

On 3/7/22 11:32 AM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/5/22 2:20 PM, Claudio Fontana wrote:
>>>
>>> Hello all,
>>>
>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
>>> when used in libvirt commands like:
>>>
>>>
>>> virsh save domain /dev/null
>>>
>>>
>>>
>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
>>>
>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
>>>
>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>
>>> Here is the bisection for this particular drop in throughput:
>>>
>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>> Date:   Fri Feb 19 18:40:12 2021 +0000
>>>
>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>     
>>>     The generic 'migrate_set_parameters' command handle all types of param.
>>>     
>>>     Only the QMP commands were documented in the deprecations page, but the
>>>     rationale for deprecating applies equally to HMP, and the replacements
>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>>>     so removing the latter breaks the former unless they get re-implemented.
>>>     
>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>
>>>
>>> git bisect start
>>> # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
>>> git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
>>> # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
>>> git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
>>> # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
>>> git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
>>> # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
>>> git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
>>> # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
>>> git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
>>> # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
>>> git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
>>> # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
>>> git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
>>> # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
>>> git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
>>> # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
>>> git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
>>> # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
>>> git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
>>> # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
>>> git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
>>> # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
>>> git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
>>> # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
>>> git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
>>> # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
>>> git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
>>> # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
>>> git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
>>> # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
>>> git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
>>> # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
>>> git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
>>> # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
>>> git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
>>> # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>
>>>
>>> Are there some obvious settings / options I am missing to regain the savevm performance after this commit?
>>
>> Answering myself: 
> 
> <oops we seem to have split this thread into two>
> 
>> this seems to be due to a resulting different default xbzrle cache size (probably interactions between libvirt/qemu versions?).
>>
>> When forcing the xbzrle cache size to a larger value, the performance is back.
> 
> That's weird that 'virsh save' is ending up using xbzrle.

It seems to me that virsh save (or qemu savevm) uses a subset of the migration code and migration parameters, but not all of them.

> 
>>>
>>> I have seen projects attempting to improve other aspects of performance (snapshot performance, etc), is there something going on to improve the transfer of RAM in savevm too?
>>
>>
>> Still I would think that we should be able to do better than 600ish Mb/s , any ideas, prior work on this,
>> to improve savevm performance, especially looking at RAM regions transfer speed?
> 
> My normal feeling is ~10Gbps for a live migrate over the wire; I rarely
> try virsh save though.
> If you're using xbzrle that might explain it; it's known to eat cpu -
> but I'd never expect it to have been used with 'virsh save'.

A valgrind run shows it among the top CPU eaters.

I wonder why we are able to do more than 2x better for actual live migration, compared with virsh save /dev/null ...

Thanks,

Claudio




* Re: starting to look at qemu savevm performance, a first regression detected
       [not found]     ` <YiXVh1P4oJNuEtFM@redhat.com>
@ 2022-03-07 11:19       ` Claudio Fontana
  2022-03-07 12:00         ` Daniel P. Berrangé
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-07 11:19 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Dr. David Alan Gilbert, Juan Quintela

On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
>> Hello Daniel,
>>
>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
>>>>
>>>> Hello all,
>>>>
>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
>>>> when used in libvirt commands like:
>>>>
>>>>
>>>> virsh save domain /dev/null
>>>>
>>>>
>>>>
>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
>>>>
>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
>>>>
>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>
>>>> Here is the bisection for this particular drop in throughput:
>>>>
>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
>>>>
>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>     
>>>>     The generic 'migrate_set_parameters' command handle all types of param.
>>>>     
>>>>     Only the QMP commands were documented in the deprecations page, but the
>>>>     rationale for deprecating applies equally to HMP, and the replacements
>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>>>>     so removing the latter breaks the former unless they get re-implemented.
>>>>     
>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>
>>> That doesn't make a whole lot of sense as a bisect result.
>>> How reliable is that bisect end point ? Have you bisected
>>> to that point more than once ?
>>
>> I did run through the bisect itself only once, so I'll double check that.
>> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
>>
>> For this test I am using libvirt v6.0.0.
>>
>> If it helps, these are the current_migration->parameters pre-commit (captured in qemu_savevm_state_iterate):
>>
>>
>> pre-commit: in qemu_savevm_state_iterate:
>>
>> (gdb) p current_migration->parameters
> 
>>   tls_authz = 0x0, has_max_bandwidth = true, max_bandwidth = 9223372036853727232, has_downtime_limit = true, downtime_limit = 300,
> 
> snip
> 
>> and post-commit: in qemu_savevm_state_iterate:
>>
>> (gdb) p current_migration->parameters
> 
> snip
> 
>>   tls_authz = 0x0, has_max_bandwidth = true, max_bandwidth = 134217728, has_downtime_limit = true, downtime_limit = 300,
> 
>> so there seems to be a difference in the max_bandwidth parameter,
>> do we have a limit suddenly having effect for max_bandwidth after the commit?
> 
> Yes, that's very strange. I think we'll need to capture the QMP commands that
> libvirt is sending to QEMU, to see if there's a difference in what it sends.
> This might indicate a latent bug in libvirt.
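
A libvirtd debug log can be reduced to the QMP commands that failed with a few lines of scripting. This is only a sketch: the regular expression is an assumption based on the line format of the captures in this message, and the two sample lines are copied from the post-commit capture.

```python
import json
import re

# Matches the JSON payload of messages sent to QEMU and of replies received.
QMP_RE = re.compile(
    r"QEMU_MONITOR_(SEND_MSG|RECV_REPLY): mon=\S+ (?:msg|reply)=(\{.*\})"
)


def failed_commands(lines):
    """Yield (command, error class) for each QMP command that got an error reply."""
    sent = {}  # QMP id -> command name
    for line in lines:
        match = QMP_RE.search(line)
        if not match:
            continue
        kind, payload = match.group(1), json.loads(match.group(2))
        if kind == "SEND_MSG":
            sent[payload.get("id")] = payload.get("execute")
        elif "error" in payload:
            yield sent.get(payload.get("id")), payload["error"]["class"]


sample = [
    '2022-03-07 10:47:07.322+0000: 134391: info : qemuMonitorSend:995 : '
    'QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 '
    'msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},'
    '"id":"libvirt-14"}\r',
    '2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : '
    'QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 '
    'reply={"id": "libvirt-14", "error": {"class": "CommandNotFound", '
    '"desc": "The command migrate_set_speed has not been found"}}',
]

for command, error_class in failed_commands(sample):
    print(command, error_class)
```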


In the pre-commit case I see:

2022-03-07 10:41:00.928+0000: 132544: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0fd00028a0 event={"timestamp": {"seconds": 1646649660, "microseconds": 927920}, "event": "STOP"}
2022-03-07 10:41:00.929+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-13"}
2022-03-07 10:41:00.934+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M
 fd=-1
2022-03-07 10:41:00.934+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M
 len=93 ret=93 errno=0
2022-03-07 10:41:00.935+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-14"}
2022-03-07 10:41:00.936+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M
 fd=32
2022-03-07 10:41:00.936+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M
 len=72 ret=72 errno=0
2022-03-07 10:41:00.936+0000: 132544: info : qemuMonitorIOWrite:457 : QEMU_MONITOR_IO_SEND_FD: mon=0x7f0fd00028a0 fd=32 ret=72 errno=0
2022-03-07 10:41:00.937+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-15"}
2022-03-07 10:41:00.937+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M
 fd=-1
2022-03-07 10:41:00.937+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M
 len=112 ret=112 errno=0
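
For reference, the value passed to migrate_set_speed above and the max_bandwidth that gdb shows in the post-commit case can be decoded as follows. This is a sketch that assumes both numbers are rates in bytes per second.

```python
MIB = 1024 * 1024
INT64_MAX = 2**63 - 1

pre_commit = 9223372036853727232  # value sent with migrate_set_speed
post_commit = 134217728           # max_bandwidth seen in gdb after the commit

# The pre-commit value is the largest whole number of MiB/s that fits in a
# signed 64-bit integer, i.e. effectively "unlimited".
print(pre_commit == (INT64_MAX // MIB) * MIB)

# The post-commit value is a much lower limit, expressed here in MiB/s.
post_commit_mib_s = post_commit // MIB
print(post_commit_mib_s)

# Lower bound on the time to save a 20 GiB guest at that limit, in seconds.
print(20 * 1024 / post_commit_mib_s)
```

A 128 MiB/s cap would by itself put a 20 GiB save at 160 seconds or more, which is close to the 2m 48s measured and would suggest that the slow case is rate-limited.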


In the post-commit case I see:


2022-03-07 10:47:07.316+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 316537}, "event": "STOP"}
2022-03-07 10:47:07.317+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-13"}
2022-03-07 10:47:07.322+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M
 fd=-1
2022-03-07 10:47:07.322+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M
 len=93 ret=93 errno=0
2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-14", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
2022-03-07 10:47:07.324+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
2022-03-07 10:47:07.324+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M
 fd=32
2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M
 len=72 ret=72 errno=0
2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorIOWrite:457 : QEMU_MONITOR_IO_SEND_FD: mon=0x7fa4380028a0 fd=32 ret=72 errno=0
2022-03-07 10:47:07.325+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-15"}
2022-03-07 10:47:07.326+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M
 fd=-1
2022-03-07 10:47:07.326+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M
 len=112 ret=112 errno=0
2022-03-07 10:47:07.328+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 327843}, "event": "MIGRATION", "data": {"status": "setup"}}
2022-03-07 10:47:07.328+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-16"}
2022-03-07 10:47:07.449+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 449199}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
2022-03-07 10:47:07.449+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 449363}, "event": "MIGRATION", "data": {"status": "active"}}
2022-03-07 10:47:07.807+0000: 134387: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"query-migrate","id":"libvirt-17"}^M
 fd=-1
2022-03-07 10:47:07.807+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"query-migrate","id":"libvirt-17"}^M
 len=47 ret=47 errno=0
2022-03-07 10:47:07.809+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {"blocked": false, "expected-downtime": 300, "status": "active", "setup-time": 121, "total-time": 481, "ram": {"total": 32213049344, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 971380, "page-size": 4096, "remaining": 31357165568, "mbps": 70.597440000000006, "transferred": 28723376, "duplicate": 202401, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 26849280, "normal": 6555}}, "id": "libvirt-17"}
2022-03-07 10:47:20.063+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 63299}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
2022-03-07 10:47:20.068+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 68660}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
2022-03-07 10:47:20.142+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 142735}, "event": "MIGRATION", "data": {"status": "completed"}}
2022-03-07 10:47:20.143+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"query-migrate","id":"libvirt-18"}^M
 fd=-1
2022-03-07 10:47:20.143+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"query-migrate","id":"libvirt-18"}^M
 len=47 ret=47 errno=0
2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {"blocked": false, "status": "completed", "setup-time": 121, "downtime": 79, "total-time": 12815, "ram": {"total": 32213049344, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 32710, "page-size": 4096, "remaining": 0, "mbps": 584.63040491570825, "transferred": 927267953, "duplicate": 7655360, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 856694784, "normal": 209154}}, "id": "libvirt-18"}
2022-03-07 10:47:20.145+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
 fd=-1
2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
 len=93 ret=93 errno=0
2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
2022-03-07 10:47:20.150+0000: 134391: info : qemuMonitorClose:917 : QEMU_MONITOR_CLOSE: mon=0x7fa4380028a0 refs=2
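
As a rough cross-check of the numbers in the final query-migrate reply above, here is a small Python sketch. It is not part of libvirt or QEMU; it assumes that "total-time" is in milliseconds, that "duplicate" and "normal" are page counts, and that "transferred" is in bytes. The "mbps" field reported by QEMU is sampled periodically, so the whole-run average computed here need not match it exactly.

```python
import json

# Fields copied from the final query-migrate reply in the log above.
reply = json.loads("""
{"return": {"status": "completed", "setup-time": 121, "downtime": 79,
            "total-time": 12815,
            "ram": {"total": 32213049344, "page-size": 4096,
                    "transferred": 927267953, "duplicate": 7655360,
                    "normal": 209154, "normal-bytes": 856694784}}}
""")


def average_mbps(info):
    """Average megabits per second over the whole run (total-time assumed ms)."""
    bits = info["ram"]["transferred"] * 8
    return bits / info["total-time"] / 1000.0


def pages_accounted(info):
    """Bytes covered by the duplicate and normal page counters together."""
    ram = info["ram"]
    return (ram["duplicate"] + ram["normal"]) * ram["page-size"]


info = reply["return"]
print(f"average over total-time: {average_mbps(info):.1f} Mbit/s")
print(f"pages accounted for: {pages_accounted(info)} of {info['ram']['total']} bytes")
```

The second figure only checks that the two page counters together cover the reported RAM total for this particular run.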


> 
> If you libvirt_log_filters=2:qemu_monitor   then it ought to capture the
> QMP commands.
> 
> Regards,
> Daniel
> 




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 11:06     ` Claudio Fontana
@ 2022-03-07 11:31       ` Dr. David Alan Gilbert
  2022-03-07 12:07         ` Claudio Fontana
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-07 11:31 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela

* Claudio Fontana (cfontana@suse.de) wrote:
> On 3/7/22 11:32 AM, Dr. David Alan Gilbert wrote:
> > * Claudio Fontana (cfontana@suse.de) wrote:
> >> On 3/5/22 2:20 PM, Claudio Fontana wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> >>> when used in libvirt commands like:
> >>>
> >>>
> >>> virsh save domain /dev/null
> >>>
> >>>
> >>>
> >>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> >>>
> >>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> >>>
> >>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> >>>
> >>> Here is the bisection for this particular drop in throughput:
> >>>
> >>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> >>> Author: Daniel P. Berrangé <berrange@redhat.com>
> >>> Date:   Fri Feb 19 18:40:12 2021 +0000
> >>>
> >>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>>     
> >>>     The generic 'migrate_set_parameters' command handle all types of param.
> >>>     
> >>>     Only the QMP commands were documented in the deprecations page, but the
> >>>     rationale for deprecating applies equally to HMP, and the replacements
> >>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
> >>>     so removing the latter breaks the former unless they get re-implemented.
> >>>     
> >>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >>>
> >>>
> >>> git bisect start
> >>> # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
> >>> git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
> >>> # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
> >>> git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
> >>> # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
> >>> git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
> >>> # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
> >>> git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
> >>> # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
> >>> git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
> >>> # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
> >>> git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
> >>> # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
> >>> git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
> >>> # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
> >>> git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
> >>> # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
> >>> git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
> >>> # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
> >>> git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
> >>> # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
> >>> git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
> >>> # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
> >>> git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
> >>> # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
> >>> git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
> >>> # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
> >>> git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
> >>> # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
> >>> git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
> >>> # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
> >>> git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
> >>> # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
> >>> git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
> >>> # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>> git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
> >>> # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>>
> >>>
> >>> Are there some obvious settings / options I am missing to regain the savevm performance after this commit?
> >>
> >> Answering myself: 
> > 
> > <oops we seem to have split this thread into two>
> > 
> >> this seems to be due to a resulting different default xbzrle cache size (probably interactions between libvirt/qemu versions?).
> >>
> >> When forcing the xbzrle cache size to a larger value, the performance is back.
> > 
> > That's weird that 'virsh save' is ending up using xbzrle.
> 
> virsh save (or qemu savevm..) seems to me like it uses a subset of the migration code and migration parameters but not all..
> 
> > 
> >>>
> >>> I have seen projects attempting to improve other aspects of performance (snapshot performance, etc), is there something going on to improve the transfer of RAM in savevm too?
> >>
> >>
> >> Still I would think that we should be able to do better than 600ish Mb/s , any ideas, prior work on this,
> >> to improve savevm performance, especially looking at RAM regions transfer speed?
> > 
> > My normal feeling is ~10Gbps for a live migrate over the wire; I rarely
> > try virsh save though.
> > If you're using xbzrle that might explain it; it's known to eat cpu -
> > but I'd never expect it to have been used with 'virsh save'.
> 
> some valgrind shows it among the top cpu eaters;
> 
> I wonder why we are able to do more than 2x better for actual live migration, compared with virsh save /dev/null ...

What speed do you get if you force xbzrle off?

Dave

> Thanks,
> 
> Claudio
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 11:19       ` Claudio Fontana
@ 2022-03-07 12:00         ` Daniel P. Berrangé
  2022-03-07 12:09           ` Claudio Fontana
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrangé @ 2022-03-07 12:00 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel, Dr. David Alan Gilbert, Juan Quintela

On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
> > On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
> >> Hello Daniel,
> >>
> >> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
> >>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> >>>> when used in libvirt commands like:
> >>>>
> >>>>
> >>>> virsh save domain /dev/null
> >>>>
> >>>>
> >>>>
> >>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> >>>>
> >>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> >>>>
> >>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> >>>>
> >>>> Here is the bisection for this particular drop in throughput:
> >>>>
> >>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> >>>> Author: Daniel P. Berrangé <berrange@redhat.com>
> >>>> Date:   Fri Feb 19 18:40:12 2021 +0000
> >>>>
> >>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>>>     
> >>>>     The generic 'migrate_set_parameters' command handle all types of param.
> >>>>     
> >>>>     Only the QMP commands were documented in the deprecations page, but the
> >>>>     rationale for deprecating applies equally to HMP, and the replacements
> >>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
> >>>>     so removing the latter breaks the former unless they get re-implemented.
> >>>>     
> >>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >>>
> >>> That doesn't make a whole lot of sense as a bisect result.
> >>> How reliable is that bisect end point ? Have you bisected
> >>> to that point more than once ?
> >>
> >> I did run through the bisect itself only once, so I'll double check that.
> >> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
> >>
> >> For this test I am using libvirt v6.0.0.

I've just noticed this.  That version of libvirt is 2 years old and
doesn't have full support for migrate_set_parameters.


> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
>  len=93 ret=93 errno=0
> 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
> 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found

We see the migrate_set_speed failing and libvirt obviously ignores that
failure.

In current libvirt migrate_set_speed is not used as it properly
handles migrate_set_parameters AFAICT.

I think you just need to upgrade libvirt if you want to use this
newer QEMU version
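
For illustration only, a minimal Python sketch of what the replacement request could look like on the wire, assuming the QMP command name "migrate-set-parameters" and its "max-bandwidth" argument (in bytes per second). The helper name and the "example-1" id are hypothetical, and this is not how libvirt itself builds the message; the bandwidth value is simply the one from the failing migrate_set_speed call in the log.

```python
import json

# Value libvirt v6.0.0 passed to migrate_set_speed in the log above
# (equal to 2**63 - 2**20); reused here unchanged.
BANDWIDTH_FROM_LOG = 9223372036853727232


def set_max_bandwidth_cmd(bytes_per_second, cmd_id):
    """Build a QMP migrate-set-parameters request that sets max-bandwidth.

    Hypothetical helper for illustration; it only serialises the JSON.
    """
    return json.dumps({
        "execute": "migrate-set-parameters",
        "arguments": {"max-bandwidth": bytes_per_second},
        "id": cmd_id,
    })


print(set_max_bandwidth_cmd(BANDWIDTH_FROM_LOG, "example-1"))
```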

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 11:31       ` Dr. David Alan Gilbert
@ 2022-03-07 12:07         ` Claudio Fontana
  0 siblings, 0 replies; 12+ messages in thread
From: Claudio Fontana @ 2022-03-07 12:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela

On 3/7/22 12:31 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/7/22 11:32 AM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/5/22 2:20 PM, Claudio Fontana wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
>>>>> when used in libvirt commands like:
>>>>>
>>>>>
>>>>> virsh save domain /dev/null
>>>>>
>>>>>
>>>>>
>>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
>>>>>
>>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
>>>>>
>>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>>
>>>>> Here is the bisection for this particular drop in throughput:
>>>>>
>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
>>>>>
>>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>     
>>>>>     The generic 'migrate_set_parameters' command handle all types of param.
>>>>>     
>>>>>     Only the QMP commands were documented in the deprecations page, but the
>>>>>     rationale for deprecating applies equally to HMP, and the replacements
>>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>>>>>     so removing the latter breaks the former unless they get re-implemented.
>>>>>     
>>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>
>>>>>
>>>>> git bisect start
>>>>> # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
>>>>> git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
>>>>> # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for 4.2.1 release
>>>>> git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
>>>>> # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for v4.2.0 release
>>>>> git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
>>>>> # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA vector registers on FPU scalar registers
>>>>> git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
>>>>> # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install static libc package in CentOS 7
>>>>> git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
>>>>> # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into staging
>>>>> git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
>>>>> # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
>>>>> git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
>>>>> # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a test-tcg for building then running check-tcg
>>>>> git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
>>>>> # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
>>>>> git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
>>>>> # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
>>>>> git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
>>>>> # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
>>>>> git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
>>>>> # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON to -object
>>>>> git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
>>>>> # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop deprecated 'props' from object-add
>>>>> git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
>>>>> # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
>>>>> git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
>>>>> # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of 'wait' flag for socket client chardevs
>>>>> git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
>>>>> # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove 'query-events' QMP command
>>>>> git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
>>>>> # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove 'query-cpus' QMP command
>>>>> git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
>>>>> # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>> git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
>>>>> # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>
>>>>>
>>>>> Are there some obvious settings / options I am missing to regain the savevm performance after this commit?
>>>>
>>>> Answering myself: 
>>>
>>> <oops we seem to have split this thread into two>
>>>
>>>> this seems to be due to a resulting different default xbzrle cache size (probably interactions between libvirt/qemu versions?).
>>>>
>>>> When forcing the xbzrle cache size to a larger value, the performance is back.
>>>
>>> That's weird that 'virsh save' is ending up using xbzrle.
>>
>> virsh save (or qemu savevm..) seems to me like it uses a subset of the migration code and migration parameters but not all..
>>
>>>
>>>>>
>>>>> I have seen projects attempting to improve other aspects of performance (snapshot performance, etc), is there something going on to improve the transfer of RAM in savevm too?
>>>>
>>>>
>>>> Still I would think that we should be able to do better than 600ish Mb/s , any ideas, prior work on this,
>>>> to improve savevm performance, especially looking at RAM regions transfer speed?
>>>
>>> My normal feeling is ~10Gbps for a live migrate over the wire; I rarely
>>> try virsh save though.
>>> If you're using xbzrle that might explain it; it's known to eat cpu -
>>> but I'd never expect it to have been used with 'virsh save'.
>>
>> some valgrind shows it among the top cpu eaters;

well.. I was confused.

The xbzrle usage consists only of constant calls to migrate_use_xbzrle(), XBZRLE_cache_lock() and XBZRLE_cache_unlock(), as well as some calls to xbzrle_cache_zero_page(),
which likely do nothing useful: ->ram_bulk_stage is not changed by anything, so it should be true.


>>
>> I wonder why we are able to do more than 2x better for actual live migration, compared with virsh save /dev/null ...
> 
> What speed do you get if you force xbzrle off?


no substantial difference.


> 
> Dave
> 
>> Thanks,
>>
>> Claudio
>>




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 12:00         ` Daniel P. Berrangé
@ 2022-03-07 12:09           ` Claudio Fontana
  2022-03-07 12:20             ` Daniel P. Berrangé
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-07 12:09 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Dr. David Alan Gilbert, Juan Quintela

On 3/7/22 1:00 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
>> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
>>> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
>>>> Hello Daniel,
>>>>
>>>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
>>>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
>>>>>> when used in libvirt commands like:
>>>>>>
>>>>>>
>>>>>> virsh save domain /dev/null
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
>>>>>>
>>>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
>>>>>>
>>>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>>>
>>>>>> Here is the bisection for this particular drop in throughput:
>>>>>>
>>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
>>>>>>
>>>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>>     
>>>>>>     The generic 'migrate_set_parameters' command handle all types of param.
>>>>>>     
>>>>>>     Only the QMP commands were documented in the deprecations page, but the
>>>>>>     rationale for deprecating applies equally to HMP, and the replacements
>>>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>>>>>>     so removing the latter breaks the former unless they get re-implemented.
>>>>>>     
>>>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>
>>>>> That doesn't make a whole lot of sense as a bisect result.
>>>>> How reliable is that bisect end point ? Have you bisected
>>>>> to that point more than once ?
>>>>
>>>> I did run through the bisect itself only once, so I'll double check that.
>>>> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
>>>>
>>>> For this test I am using libvirt v6.0.0.
> 
> I've just noticed this.  That version of libvirt is 2 years old and
> doesn't have full support for migrate_set_parameters.
> 
> 
>> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
>>  len=93 ret=93 errno=0
>> 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
>> 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
> 
> We see the migrate_set_speed failing and libvirt obviously ignores that
> failure.
> 
> In current libvirt migrate_set_speed is not used as it properly
> handles migrate_set_parameters AFAICT.
> 
> I think you just need to upgrade libvirt if you want to use this
> newer QEMU version
> 
> Regards,
> Daniel
> 

Got it, this explains it, sorry for the noise on this.

I'll continue to investigate the general issue of low throughput with virsh save / qemu savevm.

Thanks,

Claudio



* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 12:09           ` Claudio Fontana
@ 2022-03-07 12:20             ` Daniel P. Berrangé
  2022-03-07 12:26               ` Claudio Fontana
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrangé @ 2022-03-07 12:20 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel, Dr. David Alan Gilbert, Juan Quintela

On Mon, Mar 07, 2022 at 01:09:55PM +0100, Claudio Fontana wrote:
> On 3/7/22 1:00 PM, Daniel P. Berrangé wrote:
> > On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
> >> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
> >>> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
> >>>> Hello Daniel,
> >>>>
> >>>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
> >>>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
> >>>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> >>>>>> when used in libvirt commands like:
> >>>>>>
> >>>>>>
> >>>>>> virsh save domain /dev/null
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> >>>>>>
> >>>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> >>>>>>
> >>>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> >>>>>>
> >>>>>> Here is the bisection for this particular drop in throughput:
> >>>>>>
> >>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> >>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
> >>>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
> >>>>>>
> >>>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>>>>>     
> >>>>>>     The generic 'migrate_set_parameters' command handle all types of param.
> >>>>>>     
> >>>>>>     Only the QMP commands were documented in the deprecations page, but the
> >>>>>>     rationale for deprecating applies equally to HMP, and the replacements
> >>>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
> >>>>>>     so removing the latter breaks the former unless they get re-implemented.
> >>>>>>     
> >>>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >>>>>
> >>>>> That doesn't make a whole lot of sense as a bisect result.
> >>>>> How reliable is that bisect end point ? Have you bisected
> >>>>> to that point more than once ?
> >>>>
> >>>> I did run through the bisect itself only once, so I'll double check that.
> >>>> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
> >>>>
> >>>> For this test I am using libvirt v6.0.0.
> > 
> > I've just noticed this.  That version of libvirt is 2 years old and
> > doesn't have full support for migrate_set_parameters.
> > 
> > 
> >> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
> >>  len=93 ret=93 errno=0
> >> 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
> >> 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
> > 
> > We see the migrate_set_speed failing and libvirt obviously ignores that
> > failure.
> > 
> > In current libvirt migrate_set_speed is not used as it properly
> > handles migrate_set_parameters AFAICT.
> > 
> > I think you just need to upgrade libvirt if you want to use this
> > newer QEMU version
> > 
> > Regards,
> > Daniel
> > 
> 
> Got it, this explains it, sorry for the noise on this.
> 
> I'll continue to investigate the general issue of low throughput with virsh save / qemu savevm .

BTW, consider measuring with the --bypass-cache flag to virsh save.
This causes libvirt to use an I/O helper that uses O_DIRECT when
saving the image. This should give more predictable results by
avoiding the influence of the host I/O cache, which can be in a different
state of usage each time you measure.  It was also intended that
by avoiding hitting cache, saving the memory image of a large VM
will not push other useful stuff out of the host I/O cache, which can
negatively impact other running VMs.
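
A minimal sketch of how such a measurement could be scripted. The
domain name and output path are placeholders, and the helper names are
made up for illustration; only `virsh save` and its `--bypass-cache`
flag come from the text above. A real file is used as the target
because O_DIRECT may not behave usefully on /dev/null.

```python
import subprocess
import time


def build_save_command(domain, path, bypass_cache=True):
    """Return the argv list for 'virsh save', optionally with --bypass-cache."""
    argv = ["virsh", "save", domain, path]
    if bypass_cache:
        argv.append("--bypass-cache")
    return argv


def timed_save(domain, path, bypass_cache=True):
    """Run the save and return the elapsed wall-clock time in seconds.

    Hypothetical helper: needs a running libvirt domain and enough
    privileges to save it. The domain is stopped once the save completes.
    """
    start = time.monotonic()
    subprocess.run(build_save_command(domain, path, bypass_cache), check=True)
    return time.monotonic() - start


# Example (not executed here): timed_save("domain", "/var/tmp/domain.save")
```

Repeating the timed run several times with and without the flag should
show how much of the run-to-run variation comes from the host cache.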

Also, it is possible to configure compression on the libvirt side,
which may be useful if you have spare CPU cycles but your storage
is slow. See 'save_image_format' in /etc/libvirt/qemu.conf.
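
For illustration, a config fragment along these lines would select a
compressed save format. The choice of "lzop" is only an example, and
the set of accepted values depends on the libvirt version, so check the
comments in the shipped qemu.conf before relying on it.

```
# /etc/libvirt/qemu.conf
# Assumed example value; "raw" means no compression.
save_image_format = "lzop"
```

The libvirt daemon normally has to be restarted for a change in this
file to take effect.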

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 12:20             ` Daniel P. Berrangé
@ 2022-03-07 12:26               ` Claudio Fontana
  2022-03-07 12:28                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Fontana @ 2022-03-07 12:26 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Dr. David Alan Gilbert, Juan Quintela

On 3/7/22 1:20 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 07, 2022 at 01:09:55PM +0100, Claudio Fontana wrote:
>> On 3/7/22 1:00 PM, Daniel P. Berrangé wrote:
>>> On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
>>>> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
>>>>> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
>>>>>> Hello Daniel,
>>>>>>
>>>>>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
>>>>>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
>>>>>>>> when used in libvirt commands like:
>>>>>>>>
>>>>>>>>
>>>>>>>> virsh save domain /dev/null
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
>>>>>>>>
>>>>>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
>>>>>>>>
>>>>>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>>>>>
>>>>>>>> Here is the bisection for this particular drop in throughput:
>>>>>>>>
>>>>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
>>>>>>>>
>>>>>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>>>>     
>>>>>>>>     The generic 'migrate_set_parameters' command handle all types of param.
>>>>>>>>     
>>>>>>>>     Only the QMP commands were documented in the deprecations page, but the
>>>>>>>>     rationale for deprecating applies equally to HMP, and the replacements
>>>>>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
>>>>>>>>     so removing the latter breaks the former unless they get re-implemented.
>>>>>>>>     
>>>>>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>
>>>>>>> That doesn't make a whole lot of sense as a bisect result.
>>>>>>> How reliable is that bisect end point ? Have you bisected
>>>>>>> to that point more than once ?
>>>>>>
>>>>>> I did run through the bisect itself only once, so I'll double check that.
>>>>>> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
>>>>>>
>>>>>> For this test I am using libvirt v6.0.0.
>>>
>>> I've just noticed this.  That version of libvirt is 2 years old and
>>> doesn't have full support for migrate_set_parameters.
>>>
>>>
>>>> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
>>>>  len=93 ret=93 errno=0
>>>> 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
>>>> 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
>>>
>>> We see the migrate_set_speed failing and libvirt obviously ignores that
>>> failure.
>>>
>>> In current libvirt migrate_set_speed is not used as it properly
>>> handles migrate_set_parameters AFAICT.
>>>
>>> I think you just need to upgrade libvirt if you want to use this
>>> newer QEMU version
>>>
>>> Regards,
>>> Daniel
>>>
>>
>> Got it, this explains it, sorry for the noise on this.
>>
>> I'll continue to investigate the general issue of low throughput with virsh save / qemu savevm .
> 
> BTW, consider measuring with the --bypass-cache flag to virsh save.
> This causes libvirt to use an I/O helper that uses O_DIRECT when
> saving the image. This should give more predictable results by
> avoiding the influence of the host I/O cache, which can be in a different
> state of usage each time you measure.  It was also intended that
> by avoiding hitting cache, saving the memory image of a large VM
> will not push other useful stuff out of the host I/O cache, which can
> negatively impact other running VMs.
> 
> Also, it is possible to configure compression on the libvirt side,
> which may be useful if you have spare CPU cycles but your storage
> is slow. See 'save_image_format' in /etc/libvirt/qemu.conf.
> 
> With regards,
> Daniel
> 

Hi Daniel, thanks for this useful info.

Regarding slow storage: for these tests I am saving to /dev/null to avoid having to take storage into account
(and I am still getting low bandwidth, unfortunately), so I guess compression is out of the question.
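
As a sanity check on the figures quoted earlier in this thread, here is
a small sketch of the arithmetic. It assumes that the full 20G of
touched guest memory is what gets written, and that "20G" means 20 GiB;
neither is stated precisely in the thread.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def throughput_mib_s(bytes_saved, seconds):
    """Average throughput in MiB/s for a save of the given size and duration."""
    return bytes_saved / seconds / MIB


guest_bytes = 20 * GIB          # assumption: all touched memory is saved
before_s = 35                   # save time before the bisected commit
after_s = 2 * 60 + 48           # save time after it (2m 48s = 168 s)

before = throughput_mib_s(guest_bytes, before_s)
after = throughput_mib_s(guest_bytes, after_s)

print(f"before: {before:.0f} MiB/s")             # about 585 MiB/s
print(f"after:  {after:.0f} MiB/s")              # about 122 MiB/s
print(f"slowdown: {after_s / before_s:.1f}x")    # 4.8x
```

Under those assumptions the two timings line up with the roughly 580
and 122 MB/s numbers reported at the start of the thread.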

Thanks!

Claudio



* Re: starting to look at qemu savevm performance, a first regression detected
  2022-03-07 12:26               ` Claudio Fontana
@ 2022-03-07 12:28                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-07 12:28 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela

* Claudio Fontana (cfontana@suse.de) wrote:
> On 3/7/22 1:20 PM, Daniel P. Berrangé wrote:
> > On Mon, Mar 07, 2022 at 01:09:55PM +0100, Claudio Fontana wrote:
> >> On 3/7/22 1:00 PM, Daniel P. Berrangé wrote:
> >>> On Mon, Mar 07, 2022 at 12:19:22PM +0100, Claudio Fontana wrote:
> >>>> On 3/7/22 10:51 AM, Daniel P. Berrangé wrote:
> >>>>> On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote:
> >>>>>> Hello Daniel,
> >>>>>>
> >>>>>> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote:
> >>>>>>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote:
> >>>>>>>>
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> I have been looking at some reports of bad qemu savevm performance in large VMs (around 20+ Gb),
> >>>>>>>> when used in libvirt commands like:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> virsh save domain /dev/null
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, which allocates and touches 20G mem.
> >>>>>>>>
> >>>>>>>> With any qemu version since around 2020, I am not seeing more than 580 Mb/Sec even in the most ideal of situations.
> >>>>>>>>
> >>>>>>>> This drops to around 122 Mb/sec after commit: cbde7be900d2a2279cbc4becb91d1ddd6a014def .
> >>>>>>>>
> >>>>>>>> Here is the bisection for this particular drop in throughput:
> >>>>>>>>
> >>>>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
> >>>>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
> >>>>>>>> Date:   Fri Feb 19 18:40:12 2021 +0000
> >>>>>>>>
> >>>>>>>>     migrate: remove QMP/HMP commands for speed, downtime and cache size
> >>>>>>>>     
> >>>>>>>>     The generic 'migrate_set_parameters' command handle all types of param.
> >>>>>>>>     
> >>>>>>>>     Only the QMP commands were documented in the deprecations page, but the
> >>>>>>>>     rationale for deprecating applies equally to HMP, and the replacements
> >>>>>>>>     exist. Furthermore the HMP commands are just shims to the QMP commands,
> >>>>>>>>     so removing the latter breaks the former unless they get re-implemented.
> >>>>>>>>     
> >>>>>>>>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>>>>>>     Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >>>>>>>
> >>>>>>> That doesn't make a whole lot of sense as a bisect result.
> >>>>>>> How reliable is that bisect end point ? Have you bisected
> >>>>>>> to that point more than once ?
> >>>>>>
> >>>>>> I did run through the bisect itself only once, so I'll double check that.
> >>>>>> The results seem to be reproducible almost to the second though, a savevm that took 35 seconds before the commit takes 2m 48 seconds after.
> >>>>>>
> >>>>>> For this test I am using libvirt v6.0.0.
> >>>
> >>> I've just noticed this.  That version of libvirt is 2 years old and
> >>> doesn't have full support for migrate_set_parameters.
> >>>
> >>>
> >>>> 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M
> >>>>  len=93 ret=93 errno=0
> >>>> 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}}
> >>>> 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found
> >>>
> >>> We see the migrate_set_speed failing and libvirt obviously ignores that
> >>> failure.
> >>>
> >>> In current libvirt migrate_set_speed is not used as it properly
> >>> handles migrate_set_parameters AFAICT.
> >>>
> >>> I think you just need to upgrade libvirt if you want to use this
> >>> newer QEMU version
> >>>
> >>> Regards,
> >>> Daniel
> >>>
> >>
> >> Got it, this explains it, sorry for the noise on this.
> >>
> >> I'll continue to investigate the general issue of low throughput with virsh save / qemu savevm .
> > 
> > BTW, consider measuring with the --bypass-cache flag to virsh save.
> > This causes libvirt to use an I/O helper that uses O_DIRECT when
> > saving the image. This should give more predictable results by
> > avoiding the influence of the host I/O cache, which can be in a different
> > state of usage each time you measure.  It was also intended that
> > by avoiding hitting cache, saving the memory image of a large VM
> > will not push other useful stuff out of the host I/O cache, which can
> > negatively impact other running VMs.
> > 
> > Also, it is possible to configure compression on the libvirt side,
> > which may be useful if you have spare CPU cycles but your storage
> > is slow. See 'save_image_format' in /etc/libvirt/qemu.conf.
> > 
> > With regards,
> > Daniel
> > 
> 
> Hi Daniel, thanks for this useful info.
> 
> Regarding slow storage: for these tests I am saving to /dev/null to avoid having to take storage into account
> (and I am still getting low bandwidth, unfortunately), so I guess compression is out of the question.

What type of speeds do you get if you try a migrate to a netcat socket?
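
For reference, a rough sketch of the QMP messages such a test might
send, bypassing libvirt. The host, port and helper names are
placeholders chosen for illustration; the sketch assumes something like
`nc -l 4444 > /dev/null` is already listening (netcat syntax varies
between implementations) and that the QMP session has already been
negotiated with qmp_capabilities.

```python
import json


def qmp(command, **arguments):
    """Serialize one QMP command as a JSON string."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def migrate_to_socket(host, port, max_bandwidth=None):
    """Return the QMP messages for a migration to a listening TCP socket.

    max_bandwidth is in bytes per second; when given, it is set through
    migrate-set-parameters, the replacement for the removed
    migrate_set_speed command discussed in this thread.
    """
    messages = []
    if max_bandwidth is not None:
        messages.append(
            qmp("migrate-set-parameters", **{"max-bandwidth": max_bandwidth})
        )
    messages.append(qmp("migrate", uri=f"tcp:{host}:{port}"))
    return messages


# Placeholder endpoint and bandwidth cap, for illustration only.
for line in migrate_to_socket("127.0.0.1", 4444, max_bandwidth=2 ** 40):
    print(line)
```

Comparing the time this takes against the virsh save timing would help
separate QEMU's own migration throughput from any libvirt overhead.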

Dave

> Thanks!
> 
> Claudio
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




end of thread, other threads:[~2022-03-07 12:52 UTC | newest]

Thread overview: 12+ messages
2022-03-05 13:20 starting to look at qemu savevm performance, a first regression detected Claudio Fontana
2022-03-05 14:11 ` Claudio Fontana
2022-03-07 10:32   ` Dr. David Alan Gilbert
2022-03-07 11:06     ` Claudio Fontana
2022-03-07 11:31       ` Dr. David Alan Gilbert
2022-03-07 12:07         ` Claudio Fontana
     [not found] <8826b03d-e5e9-0e65-cab7-ea1829f48e6c@suse.de>
     [not found] ` <YiXQHIWtHx5BocxK@redhat.com>
     [not found]   ` <62ba8b1e-d641-5b10-c1b3-54b7d5a652e7@suse.de>
     [not found]     ` <YiXVh1P4oJNuEtFM@redhat.com>
2022-03-07 11:19       ` Claudio Fontana
2022-03-07 12:00         ` Daniel P. Berrangé
2022-03-07 12:09           ` Claudio Fontana
2022-03-07 12:20             ` Daniel P. Berrangé
2022-03-07 12:26               ` Claudio Fontana
2022-03-07 12:28                 ` Dr. David Alan Gilbert
