* [PATCH 0/2] tests/migration: Fix migration-test slowdown
@ 2023-04-12 14:19 Juan Quintela
2023-04-12 14:20 ` [PATCH 1/2] tests/migration: Make precopy fast Juan Quintela
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Juan Quintela @ 2023-04-12 14:19 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Laurent Vivier, Juan Quintela, Thomas Huth
Since commit:
commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date: Mon Mar 6 15:26:12 2023 +0000
tests/migration: Tweek auto converge limits check
Thomas found an autoconverge test failure where the
migration completed before the autoconverge had kicked in.
[...]
migration-test has become very slow.
On my laptop, before that commit migration-test took 2min10seconds.
After that commit, it takes around 11minutes.
We can't revert it because it fixes a real problem when the host
machine is overloaded. See the comment on test_migrate_auto_converge().
So I did two things here:
- Once we have set up precopy, we can move to full speed; there is no
need to continue at 3MB/s. That slowed everything down a lot.
- Only run auto_converge tests when we are asking for slow tests.
On my hardware, that is the only test that requires more than 1min to
run. We need to run it at 3MB/s, but we are asking it to do 15
iterations through 150MB of RAM. We can have a test that is
(reasonably) fast, or one that also works when the machine is very loaded.
To test that things still work under load, I used my desktop (ancient
core i9900), and ran migration-test in a loop in 20 terminals (load
was 40) and didn't see a single failure in a run of more than 1 hour.
Please, review.
PS. Yes, I am still looking at the dreaded multifd_cancel test, but
even on this setup, I am unable to get a failure. I have never seen a
failure on it when I am running it, but I am only running x86 kvm
linux. I am moving to arm or tcg to see how well that goes.
Juan Quintela (2):
tests/migration: Make precopy fast
tests/migration: Only run auto_converge in slow mode
tests/qtest/migration-test.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] tests/migration: Make precopy fast
2023-04-12 14:19 [PATCH 0/2] tests/migration: Fix migration-test slowdown Juan Quintela
@ 2023-04-12 14:20 ` Juan Quintela
2023-04-18 11:53 ` Daniel P. Berrangé
2023-04-12 14:20 ` [PATCH 2/2] tests/migration: Only run auto_converge in slow mode Juan Quintela
2023-04-18 10:59 ` [PATCH 0/2] tests/migration: Fix migration-test slowdown Thomas Huth
2 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2023-04-12 14:20 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Laurent Vivier, Juan Quintela, Thomas Huth

Otherwise we do the 1st migration iteration at a too slow speed.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/qtest/migration-test.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 3b615b0da9..7b05b0b7dd 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1348,6 +1348,7 @@ static void test_precopy_common(MigrateCommon *args)
         migrate_qmp(from, args->connect_uri, "{}");
     }
 
+    migrate_ensure_converge(from);
 
     if (args->result != MIG_TEST_SUCCEED) {
         bool allow_active = args->result == MIG_TEST_FAIL;
@@ -1365,8 +1366,6 @@ static void test_precopy_common(MigrateCommon *args)
         wait_for_migration_pass(from);
     }
 
-    migrate_ensure_converge(from);
-
     /* We do this first, as it has a timeout to stop us
      * hanging forever if migration didn't converge */
     wait_for_migration_complete(from);
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] tests/migration: Make precopy fast
2023-04-12 14:20 ` [PATCH 1/2] tests/migration: Make precopy fast Juan Quintela
@ 2023-04-18 11:53 ` Daniel P. Berrangé
2023-04-18 12:20 ` Juan Quintela
0 siblings, 1 reply; 13+ messages in thread
From: Daniel P. Berrangé @ 2023-04-18 11:53 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, Thomas Huth

On Wed, Apr 12, 2023 at 04:20:00PM +0200, Juan Quintela wrote:
> Otherwise we do the 1st migration iteration at a too slow speed.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  tests/qtest/migration-test.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 3b615b0da9..7b05b0b7dd 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1348,6 +1348,7 @@ static void test_precopy_common(MigrateCommon *args)
>          migrate_qmp(from, args->connect_uri, "{}");
>      }
>
> +    migrate_ensure_converge(from);

This isn't right - it defeats the point of having the call to
migrate_ensure_non_converge() a few lines earlier.

>      if (args->result != MIG_TEST_SUCCEED) {
>          bool allow_active = args->result == MIG_TEST_FAIL;
> @@ -1365,8 +1366,6 @@ static void test_precopy_common(MigrateCommon *args)
>          wait_for_migration_pass(from);
>      }
>
> -    migrate_ensure_converge(from);
> -

The reason why we had it here was to ensure that we test more than
1 iteration of migration. With this change, migrate will succeed
on the first pass IIUC, and so we won't be exercising the more
complex code path of repeated iterations.

I do agree with the overall idea though. We have many many migration
test scenarios and we don't need all of them to be testing multiple
iterations - a couple would be sufficient.

In fact we don't even need to be testing live migration for most
of the cases. All the TLS test cases could be run with guest CPUs
paused entirely removing any dirtying, since they're only interested
in the initial network handshake/setup process testing.

I had some patches I was finishing off just before I went on vacation
a few weeks ago which do this kind of optimization, which I can send
out shortly.

>      /* We do this first, as it has a timeout to stop us
>       * hanging forever if migration didn't converge */
>      wait_for_migration_complete(from);

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 13+ messages in thread
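Daniel's objection can be made concrete with a toy model. The sketch below is not QEMU code: the function name, the simplified convergence rule (migration completes once the pending data can be sent within the downtime budget) and the dirty rate are all assumptions made for illustration. Only the 150MB guest size and the 3MB/s limit come from the thread.

```c
/*
 * Toy model of precopy convergence (illustrative only, not QEMU code).
 *
 * Returns the number of iterative passes performed before the pending
 * data fits into the downtime budget, or -1 if that never happens
 * within max_passes.
 */
int passes_until_complete(double ram_bytes, double bandwidth_bps,
                          double downtime_s, double dirty_bps,
                          int max_passes)
{
    double pending = ram_bytes;

    for (int pass = 0; pass <= max_passes; pass++) {
        /* Small enough to send while the guest is stopped: done. */
        if (pending / bandwidth_bps <= downtime_s) {
            return pass;
        }
        /* Otherwise send it live; the guest dirties memory meanwhile. */
        double elapsed = pending / bandwidth_bps;
        double dirtied = dirty_bps * elapsed;
        pending = dirtied < ram_bytes ? dirtied : ram_bytes;
    }
    return -1;
}
```

With 150MB of RAM, a 3MB/s limit and a guest that dirties memory faster than that, the model never completes, which is what keeps a test iterating. With a fast link and a generous downtime budget it completes at pass 0, so a test that switches to such settings before waiting for a pass would not exercise repeated iterations.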
* Re: [PATCH 1/2] tests/migration: Make precopy fast
2023-04-18 11:53 ` Daniel P. Berrangé
@ 2023-04-18 12:20 ` Juan Quintela
2023-04-21 17:22 ` Daniel P. Berrangé
0 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2023-04-18 12:20 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, Thomas Huth

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Wed, Apr 12, 2023 at 04:20:00PM +0200, Juan Quintela wrote:
>> Otherwise we do the 1st migration iteration at a too slow speed.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  tests/qtest/migration-test.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 3b615b0da9..7b05b0b7dd 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -1348,6 +1348,7 @@ static void test_precopy_common(MigrateCommon *args)
>>          migrate_qmp(from, args->connect_uri, "{}");
>>      }
>>
>> +    migrate_ensure_converge(from);
>
> This isn't right - it defeats the point of having the call to
> migrate_ensure_non_converge() a few lines earlier.

Depends on what is the definition of "right" O:-)

>>      if (args->result != MIG_TEST_SUCCEED) {
>>          bool allow_active = args->result == MIG_TEST_FAIL;
>> @@ -1365,8 +1366,6 @@ static void test_precopy_common(MigrateCommon *args)
>>          wait_for_migration_pass(from);
>>      }
>>
>> -    migrate_ensure_converge(from);
>> -
>
> The reason why we had it here was to ensure that we test more than
> 1 iteration of migration. With this change, migrate will succeed
> on the first pass IIUC, and so we won't be exercising the more
> complex code path of repeated iterations.

Aha.

If that is the definition of "right", then I agree that my changes are
wrong.

But then I think we should change how we do the test. We should split
this function (then the same for postcopy, multifd, etc) to have two
versions, one that wants to have multiple rounds, and another that can
finish as fast as possible.

This way we need to set up the 3MB/s only for the tests that we want to
loop, and for the others put something faster.

>
> I do agree with the overall idea though. We have many many migration
> test scenarios and we don't need all of them to be testing multiple
> iterations - a couple would be sufficient.
>
> In fact we don't even need to be testing live migration for most
> of the cases. All the TLS test cases could be run with guest CPUs
> paused entirely removing any dirtying, since they're only interested
> in the initial network handshake/setup process testing.
>
> I had some patches I was finishing off just before I went on vacation
> a few weeks ago which do this kind of optimization, which I can send
> out shortly.

I will wait for your patches before I send anything different.

I have local patches for doing something different, changing

"-serial file:%s/src_serial "

and other friends to:

"-serial file:%s/src_serial%pid "

So we are sure that two tests never "reuse" the socket, as it can create
problems for example when doing the cancel and relaunching the
destination.

But as said, I will wait until you send your series before sending anything.

Later, Juan.
^ permalink raw reply [flat|nested] 13+ messages in thread
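The "%pid" idea in Juan's message can be sketched as follows. This is a minimal illustration, not the local patches he mentions: the helper name format_serial_arg and its parameters are hypothetical, and "%pid" is read here as "append the process ID", which a caller could obtain from getpid().

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from QEMU): build a "-serial file:..."
 * argument whose file name ends in a process ID, so that two test
 * processes sharing a directory do not write to the same serial file.
 *
 * Returns the number of characters that would have been written,
 * as snprintf() does.
 */
int format_serial_arg(char *buf, size_t len, const char *tmpfs,
                      const char *side, long pid)
{
    return snprintf(buf, len, "-serial file:%s/%s_serial%ld ",
                    tmpfs, side, pid);
}
```

Passing the PID in as a parameter, rather than calling getpid() inside the helper, keeps the function deterministic and easy to check.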
* Re: [PATCH 1/2] tests/migration: Make precopy fast 2023-04-18 12:20 ` Juan Quintela @ 2023-04-21 17:22 ` Daniel P. Berrangé 0 siblings, 0 replies; 13+ messages in thread From: Daniel P. Berrangé @ 2023-04-21 17:22 UTC (permalink / raw) To: Juan Quintela; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, Thomas Huth On Tue, Apr 18, 2023 at 02:20:27PM +0200, Juan Quintela wrote: > Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Wed, Apr 12, 2023 at 04:20:00PM +0200, Juan Quintela wrote: > >> Otherwise we do the 1st migration iteration at a too slow speed. > >> > >> Signed-off-by: Juan Quintela <quintela@redhat.com> > >> --- > >> tests/qtest/migration-test.c | 3 +-- > >> 1 file changed, 1 insertion(+), 2 deletions(-) > >> > >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > >> index 3b615b0da9..7b05b0b7dd 100644 > >> --- a/tests/qtest/migration-test.c > >> +++ b/tests/qtest/migration-test.c > >> @@ -1348,6 +1348,7 @@ static void test_precopy_common(MigrateCommon *args) > >> migrate_qmp(from, args->connect_uri, "{}"); > >> } > >> > >> + migrate_ensure_converge(from); > > > > This isn't right - it defeats the point of having the call to > > migrate_ensure_non_converge() a few lines earlier. > > Depends on what is the definiton or "right" O:-) > > >> if (args->result != MIG_TEST_SUCCEED) { > >> bool allow_active = args->result == MIG_TEST_FAIL; > >> @@ -1365,8 +1366,6 @@ static void test_precopy_common(MigrateCommon *args) > >> wait_for_migration_pass(from); > >> } > >> > >> - migrate_ensure_converge(from); > >> - > > > > The reason why we had it here was to ensure that we test more than > > 1 iteration of migration. With this change, migrate will succeed > > on the first pass IIUC, and so we won't be exercising the more > > complex code path of repeated iterations. > > Aha. > > If that is the definition of "right", then I agree that my changes are > wrong. > > But then I think we should change how we do the test. 
We should split > this function (then same for postcopy, multifd, etc) to have to > versions, one that want to have multiple rounds, and another that can > finish as fast as possible. > > This way we need to setup the 3MB/s only for the tests that we want to > loop, and for the others put something faster. > > > > > > I do agree with the overall idea though. We have many many migration > > test scenarios and we don't need all of them to be testing multiple > > iterations - a couple would be sufficient. > > > > In fact we don't even need to be testing live migration for most > > of the cases. All the TLS test cases could be run with guest CPUs > > paused entirely removing any dirtying, since they're only interested > > in the initial network handshake/setup process testnig. > > > > I had some patches I was finishing off just before I went on vacation > > a few weeks ago which do this kind of optimization, which I can send > > out shortly. > > I will wait for your patches before I sent anything different. > > I have local patches for doing something different, changing > > "-serial file:%s/src_serial " > > and other friends to: > > "-serial file:%s/src_serial%pid " > > So we are sure that two tests never "reuse" the socket, as it can create > problems for example when doing the cancel and relaunching the > destination. > > But as said, will wait until you send your series to send anything. I've just sent a new series which has some more differences and improvements https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg03688.html With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] tests/migration: Only run auto_converge in slow mode
2023-04-12 14:19 [PATCH 0/2] tests/migration: Fix migration-test slowdown Juan Quintela
2023-04-12 14:20 ` [PATCH 1/2] tests/migration: Make precopy fast Juan Quintela
@ 2023-04-12 14:20 ` Juan Quintela
2023-04-18 10:59 ` [PATCH 0/2] tests/migration: Fix migration-test slowdown Thomas Huth
2 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2023-04-12 14:20 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Laurent Vivier, Juan Quintela, Thomas Huth

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/qtest/migration-test.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b05b0b7dd..6317131b50 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1795,6 +1795,21 @@ static void test_validate_uuid_dst_not_set(void)
     do_test_validate_uuid(&args, false);
 }
 
+/*
+ * The way auto_converge works, we need to do too many passes to
+ * run this test. Auto_converge logic is only run once every
+ * three iterations, so:
+ *
+ * - 3 iterations without auto_converge enabled
+ * - 3 iterations with pct = 5
+ * - 3 iterations with pct = 30
+ * - 3 iterations with pct = 55
+ * - 3 iterations with pct = 80
+ * - 3 iterations with pct = 95 (min(95, 80 + 25))
+ *
+ * To make things even worse, we need to run the initial stage at
+ * 3MB/s so we enter autoconverge even when host is (over)loaded.
+ */
 static void test_migrate_auto_converge(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -2574,8 +2589,12 @@ int main(int argc, char **argv)
                    test_validate_uuid_src_not_set);
     qtest_add_func("/migration/validate_uuid_dst_not_set",
                    test_validate_uuid_dst_not_set);
-
-    qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+    /*
+     * See explanation why this test is slow on function definition
+     */
+    if (g_test_slow()) {
+        qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+    }
     qtest_add_func("/migration/multifd/tcp/plain/none",
                    test_multifd_tcp_none);
     /*
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
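The throttle schedule listed in the patch comment (5, 30, 55, 80, 95) follows from an initial value of 5, an increment of 25 and a cap of 95; the last step is the smaller of 95 and 80 + 25. A minimal sketch of that arithmetic, assuming those three values and using a hypothetical function name (this is not the QEMU throttling code):

```c
/*
 * Illustrative only: compute the next auto-converge throttle
 * percentage from the current one. A current value of 0 means
 * throttling has not started yet.
 */
int next_throttle_pct(int current, int initial, int increment, int max_pct)
{
    int next = (current == 0) ? initial : current + increment;

    /* Never exceed the configured maximum. */
    return next < max_pct ? next : max_pct;
}
```

Starting from 0 and calling it repeatedly with (5, 25, 95) walks through the percentages in the comment and then stays at 95. At three iterations per step plus three unthrottled ones, that is the 18 passes the comment enumerates.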
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
2023-04-12 14:19 [PATCH 0/2] tests/migration: Fix migration-test slowdown Juan Quintela
2023-04-12 14:20 ` [PATCH 1/2] tests/migration: Make precopy fast Juan Quintela
2023-04-12 14:20 ` [PATCH 2/2] tests/migration: Only run auto_converge in slow mode Juan Quintela
@ 2023-04-18 10:59 ` Thomas Huth
2023-04-18 11:42 ` Juan Quintela
2023-04-18 11:46 ` Juan Quintela
2 siblings, 2 replies; 13+ messages in thread
From: Thomas Huth @ 2023-04-18 10:59 UTC (permalink / raw)
To: Juan Quintela, qemu-devel; +Cc: Paolo Bonzini, Laurent Vivier

On 12/04/2023 16.19, Juan Quintela wrote:
> Since commit:
>
> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date: Mon Mar 6 15:26:12 2023 +0000
>
> tests/migration: Tweek auto converge limits check
>
> Thomas found an autoconverge test failure where the
> migration completed before the autoconverge had kicked in.
> [...]
>
> migration-test has become very slow.
> On my laptop, before that commit migration-test takes 2min10seconds
> After that commit, it takes around 11minutes
>
> We can't revert it because it fixes a real problem when the host
> machine is overloaded. See the comment on test_migrate_auto_converge().

Thanks, your patches decrease the time to run the migration-test from
16 minutes down to 5 minutes on my system, that's a great improvement,
indeed!

Tested-by: Thomas Huth <thuth@redhat.com>

(though 5 minutes are still quite a lot for qtests ... maybe some
other parts could be moved to only run with g_test_slow() ?)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
2023-04-18 10:59 ` [PATCH 0/2] tests/migration: Fix migration-test slowdown Thomas Huth
@ 2023-04-18 11:42 ` Juan Quintela
2023-04-18 12:44 ` Thomas Huth
2023-04-18 11:46 ` Juan Quintela
1 sibling, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2023-04-18 11:42 UTC (permalink / raw)
To: Thomas Huth; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier

Thomas Huth <thuth@redhat.com> wrote:
> On 12/04/2023 16.19, Juan Quintela wrote:
>> Since commit:
>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Date: Mon Mar 6 15:26:12 2023 +0000
>> tests/migration: Tweek auto converge limits check
>> Thomas found an autoconverge test failure where the
>> migration completed before the autoconverge had kicked in.
>> [...]
>> migration-test has become very slow.
>> On my laptop, before that commit migration-test takes 2min10seconds
>> After that commit, it takes around 11minutes
>> We can't revert it because it fixes a real problem when the host
>> machine is overloaded. See the comment on test_migrate_auto_converge().
>
> Thanks, your patches decrease the time to run the migration-test from
> 16 minutes down to 5 minutes on my system, that's a great improvement,
> indeed!
>
> Tested-by: Thomas Huth <thuth@redhat.com>

Thanks

> (though 5 minutes are still quite a lot for qtests ... maybe some
> other parts could be moved to only run with g_test_slow() ?)

Hi

Could you give me the output of:

time for i in $(./tests/qtest/migration-test -l | grep "^/"); do echo $i; time ./tests/qtest/migration-test -p $i; done

To see what tests are taking so long on your system?

On my system (i9900K processor, i.e. not the latest) and with auto_converge
moved to slow, the total of the tests takes a bit more than 1 minute.

qemu-system-x86_64 on x86_64 host:

real 0m54.295s
user 0m47.283s
sys 0m16.969s

qemu-system-aarch64 on x86_64 host:

real 0m42.466s
user 0m42.247s
sys 0m13.747s

s390x and ppc64 refuse to run non-natively.

Later, Juan.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown 2023-04-18 11:42 ` Juan Quintela @ 2023-04-18 12:44 ` Thomas Huth 2023-04-18 13:19 ` Juan Quintela 0 siblings, 1 reply; 13+ messages in thread From: Thomas Huth @ 2023-04-18 12:44 UTC (permalink / raw) To: quintela; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier On 18/04/2023 13.42, Juan Quintela wrote: > Thomas Huth <thuth@redhat.com> wrote: >> On 12/04/2023 16.19, Juan Quintela wrote: >>> Since commit: >>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f >>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com> >>> Date: Mon Mar 6 15:26:12 2023 +0000 >>> tests/migration: Tweek auto converge limits check >>> Thomas found an autoconverge test failure where the >>> migration completed before the autoconverge had kicked in. >>> [...] >>> migration-test has become very slow. >>> On my laptop, before that commit migration-test takes 2min10seconds >>> After that commit, it takes around 11minutes >>> We can't revert it because it fixes a real problem when the host >>> machine is overloaded. See the comment on test_migrate_auto_converge(). >> >> Thanks, your patches decrease the time to run the migration-test from >> 16 minutes down to 5 minutes on my system, that's a great improvement, >> indeed! >> >> Tested-by: Thomas Huth <thuth@redhat.com> > > Thanks > >> (though 5 minutes are still quite a lot for qtests ... maybe some >> other parts could be moved to only run with g_test_slow() ?) > > Hi > > Could you gime the output of: > > time for i in $(./tests/qtest/migration-test -l | grep "^/"); do echo $i; time ./tests/qtest/migration-test -p $i; done > > To see what tests are taking so long on your system? > > On my system (i9900K processor, i.e. not the latest) and auto_converge > moved to slow the total of the tests take a bit more than 1 minute. 
This is with both of your patches applied (every test reported OK):

real       user       sys        test
0m0,342s   0m0,123s   0m0,088s   /x86_64/migration/bad_dest
0m1,135s   0m0,730s   0m0,514s   /x86_64/migration/fd_proto
0m0,519s   0m0,379s   0m0,230s   /x86_64/migration/validate_uuid
0m0,275s   0m0,145s   0m0,114s   /x86_64/migration/validate_uuid_error
0m0,514s   0m0,377s   0m0,225s   /x86_64/migration/validate_uuid_src_not_set
0m0,519s   0m0,392s   0m0,220s   /x86_64/migration/validate_uuid_dst_not_set
0m1,079s   0m0,613s   0m0,532s   /x86_64/migration/dirty_ring
0m6,308s   0m4,025s   0m1,224s   /x86_64/migration/vcpu_dirty_limit
0m35,446s  0m47,208s  0m11,828s  /x86_64/migration/postcopy/plain
0m34,707s  0m46,357s  0m11,366s  /x86_64/migration/postcopy/recovery/plain
0m33,052s  0m46,539s  0m11,537s  /x86_64/migration/postcopy/recovery/tls/psk
0m35,107s  0m46,556s  0m11,755s  /x86_64/migration/postcopy/preempt/plain
0m35,329s  0m46,951s  0m11,529s  /x86_64/migration/postcopy/preempt/recovery/plain
0m36,237s  0m51,450s  0m12,419s  /x86_64/migration/postcopy/preempt/recovery/tls/psk
0m35,033s  0m49,244s  0m12,123s  /x86_64/migration/postcopy/preempt/tls/psk
0m36,097s  0m50,873s  0m12,569s  /x86_64/migration/postcopy/tls/psk
0m1,034s   0m0,654s   0m0,463s   /x86_64/migration/precopy/unix/plain
0m1,119s   0m0,740s   0m0,499s   /x86_64/migration/precopy/unix/xbzrle
0m3,555s   0m5,448s   0m0,655s   /x86_64/migration/precopy/unix/tls/psk
0m1,022s   0m1,664s   0m0,112s   /x86_64/migration/precopy/unix/tls/x509/default-host
0m1,841s   0m1,921s   0m0,739s   /x86_64/migration/precopy/unix/tls/x509/override-host
0m1,241s   0m0,859s   0m0,584s   /x86_64/migration/precopy/tcp/plain
0m2,114s   0m2,628s   0m0,613s   /x86_64/migration/precopy/tcp/tls/psk/match
0m0,575s   0m0,554s   0m0,116s   /x86_64/migration/precopy/tcp/tls/psk/mismatch
0m1,538s   0m1,460s   0m0,608s   /x86_64/migration/precopy/tcp/tls/x509/default-host
0m1,825s   0m1,915s   0m0,703s   /x86_64/migration/precopy/tcp/tls/x509/override-host
0m0,961s   0m1,430s   0m0,111s   /x86_64/migration/precopy/tcp/tls/x509/mismatch-host
0m1,806s   0m1,897s   0m0,679s   /x86_64/migration/precopy/tcp/tls/x509/friendly-client
0m0,645s   0m0,614s   0m0,136s   /x86_64/migration/precopy/tcp/tls/x509/hostile-client
0m2,204s   0m2,695s   0m0,667s   /x86_64/migration/precopy/tcp/tls/x509/allow-anon-client
0m1,530s   0m2,360s   0m0,156s   /x86_64/migration/precopy/tcp/tls/x509/reject-anon-client
0m1,055s   0m0,647s   0m0,592s   /x86_64/migration/multifd/tcp/plain/none
0m1,144s   0m1,763s   0m0,437s   /x86_64/migration/multifd/tcp/plain/zlib
0m1,073s   0m0,999s   0m0,537s   /x86_64/migration/multifd/tcp/plain/zstd
0m1,453s   0m2,475s   0m0,704s   /x86_64/migration/multifd/tcp/tls/psk/match
0m0,905s   0m1,256s   0m0,106s   /x86_64/migration/multifd/tcp/tls/psk/mismatch
0m3,761s   0m5,874s   0m0,985s   /x86_64/migration/multifd/tcp/tls/x509/default-host
0m3,238s   0m4,794s   0m0,998s   /x86_64/migration/multifd/tcp/tls/x509/override-host
0m0,851s   0m1,007s   0m0,120s   /x86_64/migration/multifd/tcp/tls/x509/mismatch-host
0m2,607s   0m3,530s   0m1,013s   /x86_64/migration/multifd/tcp/tls/x509/allow-anon-client
0m1,915s   0m3,223s   0m0,180s   /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client

Total for the whole loop:

real 5m32,733s
user 7m24,380s
sys 1m50,801s

Thomas
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
2023-04-18 12:44 ` Thomas Huth
@ 2023-04-18 13:19 ` Juan Quintela
2023-04-18 13:26 ` Thomas Huth
2023-04-18 14:53 ` Daniel P. Berrangé
0 siblings, 2 replies; 13+ messages in thread
From: Juan Quintela @ 2023-04-18 13:19 UTC (permalink / raw)
To: Thomas Huth; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier

Thomas Huth <thuth@redhat.com> wrote:
> On 18/04/2023 13.42, Juan Quintela wrote:
>> Thomas Huth <thuth@redhat.com> wrote:
>>> On 12/04/2023 16.19, Juan Quintela wrote:
>>>> Since commit:
>>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Date: Mon Mar 6 15:26:12 2023 +0000
>>>> tests/migration: Tweek auto converge limits check
>>>> Thomas found an autoconverge test failure where the
>>>> migration completed before the autoconverge had kicked in.
>>>> [...]
>>>> migration-test has become very slow.
>>>> On my laptop, before that commit migration-test takes 2min10seconds
>>>> After that commit, it takes around 11minutes
>>>> We can't revert it because it fixes a real problem when the host
>>>> machine is overloaded. See the comment on test_migrate_auto_converge().
>>>
>>> Thanks, your patches decrease the time to run the migration-test from
>>> 16 minutes down to 5 minutes on my system, that's a great improvement,
>>> indeed!
>>>
>>> Tested-by: Thomas Huth <thuth@redhat.com>
>> Thanks
>>
>>> (though 5 minutes are still quite a lot for qtests ... maybe some
>>> other parts could be moved to only run with g_test_slow() ?)
>> Hi
>> Could you give me the output of:
>> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
>> echo $i; time ./tests/qtest/migration-test -p $i; done
>> To see what tests are taking so long on your system?
>> On my system (i9900K processor, i.e. not the latest) and
>> auto_converge
>> moved to slow the total of the tests take a bit more than 1 minute.
>
> This is with both of your patches applied:

> /x86_64/migration/postcopy/plain
> /x86_64/migration/postcopy/plain: OK
>
> real 0m35,446s
> user 0m47,208s
> sys 0m11,828s

This is quite a bit slower than on mine, for basically almost all the
code that does migration.

$ time ./tests/qtest/migration-test -p /x86_64/migration/postcopy/plain
# random seed: R02S42809b71f513e8524bd24df5facd5768
# Start of x86_64 tests
# Start of migration tests
# Start of postcopy tests
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/src_serial -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/dest_serial -incoming unix:/tmp/migration-test-1MGL31/migsocket -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw -accel qtest
ok 1 /x86_64/migration/postcopy/plain
# End of postcopy tests
# End of migration tests
# End of x86_64 tests
1..1

real 0m1.104s
user 0m0.697s
sys 0m0.414s

> /x86_64/migration/postcopy/recovery/plain
> /x86_64/migration/postcopy/recovery/plain: OK
>
> real 0m34,707s
> user 0m46,357s
> sys 0m11,366s
> /x86_64/migration/postcopy/recovery/tls/psk
> /x86_64/migration/postcopy/recovery/tls/psk: OK
>
> real 0m33,052s
> user 0m46,539s
> sys 0m11,537s
> /x86_64/migration/postcopy/preempt/plain
> /x86_64/migration/postcopy/preempt/plain: OK
>
> real 0m35,107s
> user 0m46,556s
> sys 0m11,755s
> /x86_64/migration/postcopy/preempt/recovery/plain
> /x86_64/migration/postcopy/preempt/recovery/plain: OK
>
> real 0m35,329s
> user 0m46,951s
> sys 0m11,529s
> /x86_64/migration/postcopy/preempt/recovery/tls/psk
> /x86_64/migration/postcopy/preempt/recovery/tls/psk: OK
>
> real 0m36,237s
> user 0m51,450s
> sys 0m12,419s
> /x86_64/migration/postcopy/preempt/tls/psk
> /x86_64/migration/postcopy/preempt/tls/psk: OK
>
> real 0m35,033s
> user 0m49,244s
> sys 0m12,123s
> /x86_64/migration/postcopy/tls/psk
> /x86_64/migration/postcopy/tls/psk: OK
>
> real 0m36,097s
> user 0m50,873s
> sys 0m12,569s

> real 5m32,733s
> user 7m24,380s
> sys 1m50,801s

Ouch.

Can I ask:
- What is your machine? Is it especially slow?
Otherwise I want to know why this is happening.

- As what is slow for you is postcopy, can you tell me what this
setting is?

# we want postcopy to work for normal users
vm.unprivileged_userfaultfd = 1

And if it is not set, just change it and retest.

Thanks, Juan.
^ permalink raw reply [flat|nested] 13+ messages in thread
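The setting Juan asks about is exposed by the kernel as the file /proc/sys/vm/unprivileged_userfaultfd. A minimal sketch of reading it from C is shown below. The helper name read_sysctl_int is hypothetical and this is not how migration-test probes for postcopy support. A missing file is reported as its own case, because not every kernel provides this knob and its absence does not by itself say whether userfaultfd is usable.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from QEMU): read one integer from a
 * sysctl-style file such as /proc/sys/vm/unprivileged_userfaultfd.
 *
 * Returns 0 and stores the number in *value on success.
 * Returns -1 and leaves *value untouched if the file cannot be
 * opened or does not start with an integer.
 */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int tmp;
    int parsed;

    if (f == NULL) {
        return -1;
    }
    parsed = fscanf(f, "%d", &tmp);
    fclose(f);
    if (parsed != 1) {
        return -1;
    }
    *value = tmp;
    return 0;
}
```

A caller would pass the /proc path above and treat three outcomes separately: the knob is absent (-1), present and 0, or present and 1.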
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown 2023-04-18 13:19 ` Juan Quintela @ 2023-04-18 13:26 ` Thomas Huth 2023-04-18 14:53 ` Daniel P. Berrangé 1 sibling, 0 replies; 13+ messages in thread From: Thomas Huth @ 2023-04-18 13:26 UTC (permalink / raw) To: quintela; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier On 18/04/2023 15.19, Juan Quintela wrote: > Thomas Huth <thuth@redhat.com> wrote: >> On 18/04/2023 13.42, Juan Quintela wrote: >>> Thomas Huth <thuth@redhat.com> wrote: >>>> On 12/04/2023 16.19, Juan Quintela wrote: >>>>> Since commit: >>>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f >>>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com> >>>>> Date: Mon Mar 6 15:26:12 2023 +0000 >>>>> tests/migration: Tweek auto converge limits check >>>>> Thomas found an autoconverge test failure where the >>>>> migration completed before the autoconverge had kicked in. >>>>> [...] >>>>> migration-test has become very slow. >>>>> On my laptop, before that commit migration-test takes 2min10seconds >>>>> After that commit, it takes around 11minutes >>>>> We can't revert it because it fixes a real problem when the host >>>>> machine is overloaded. See the comment on test_migrate_auto_converge(). >>>> >>>> Thanks, your patches decrease the time to run the migration-test from >>>> 16 minutes down to 5 minutes on my system, that's a great improvement, >>>> indeed! >>>> >>>> Tested-by: Thomas Huth <thuth@redhat.com> >>> Thanks >>> >>>> (though 5 minutes are still quite a lot for qtests ... maybe some >>>> other parts could be moved to only run with g_test_slow() ?) >>> Hi >>> Could you gime the output of: >>> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do >>> echo $i; time ./tests/qtest/migration-test -p $i; done >>> To see what tests are taking so long on your system? >>> On my system (i9900K processor, i.e. not the latest) and >>> auto_converge >>> moved to slow the total of the tests take a bit more than 1 minute. 
>>
>> This is with both of your patches applied:
...
>> real 5m32,733s
>> user 7m24,380s
>> sys 1m50,801s
>
> Ouch.
>
> Can I ask:
> - what is your machine? Is it especially slow?

It's a 4-year-old T480s ThinkPad laptop.

> Otherwise I want to know why it is happening.
>
> - as what is going slow for you is postcopy, can you tell me what this
> setting is?
>
> # we want postcopy to work for normal users
> vm.unprivileged_userfaultfd = 1

$ sysctl vm.unprivileged_userfaultfd
sysctl: cannot stat /proc/sys/vm/unprivileged_userfaultfd: No such file or directory

> And if it is not set, just change it and retest.

Seems like it is not available on RHEL 8 yet :-(

Shall we maybe disable the postcopy tests if unprivileged_userfaultfd
is not available?

Thomas

^ permalink raw reply [flat|nested] 13+ messages in thread
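The check Thomas ran by hand can be wrapped in a small script, for example as a pre-flight step before running the postcopy tests. This is only a sketch: the function name `uffd_knob_state` and its optional path argument are made up for illustration and are not part of QEMU or migration-test. It reports whether the knob exists and what it is set to; what a missing knob should mean for the postcopy tests is left open, as in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): report the state of the
# vm.unprivileged_userfaultfd sysctl. The path argument exists only so
# the function can be exercised against an ordinary file.
uffd_knob_state() {
    knob="${1:-/proc/sys/vm/unprivileged_userfaultfd}"
    if [ ! -r "$knob" ]; then
        echo "missing"      # the kernel does not expose the knob
    elif [ "$(cat "$knob")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Print a short human-readable summary for the running kernel.
case "$(uffd_knob_state)" in
    enabled)  echo "vm.unprivileged_userfaultfd is set to 1" ;;
    disabled) echo "knob present but not 1; as root: sysctl -w vm.unprivileged_userfaultfd=1" ;;
    missing)  echo "knob not present on this kernel" ;;
esac
```

A test harness could call `uffd_knob_state` and decide whether to skip or run the postcopy group; that policy is an assumption here, not something the patches in this series implement.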
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
2023-04-18 13:19 ` Juan Quintela
2023-04-18 13:26 ` Thomas Huth
@ 2023-04-18 14:53 ` Daniel P. Berrangé
1 sibling, 0 replies; 13+ messages in thread
From: Daniel P. Berrangé @ 2023-04-18 14:53 UTC (permalink / raw)
To: Juan Quintela; +Cc: Thomas Huth, qemu-devel, Paolo Bonzini, Laurent Vivier

On Tue, Apr 18, 2023 at 03:19:33PM +0200, Juan Quintela wrote:
> Thomas Huth <thuth@redhat.com> wrote:
> > On 18/04/2023 13.42, Juan Quintela wrote:
> >> Thomas Huth <thuth@redhat.com> wrote:
> >>> On 12/04/2023 16.19, Juan Quintela wrote:
> >>>> Since commit:
> >>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
> >>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>> Date: Mon Mar 6 15:26:12 2023 +0000
> >>>> tests/migration: Tweek auto converge limits check
> >>>> Thomas found an autoconverge test failure where the
> >>>> migration completed before the autoconverge had kicked in.
> >>>> [...]
> >>>> migration-test has become very slow.
> >>>> On my laptop, before that commit migration-test takes 2min10seconds
> >>>> After that commit, it takes around 11minutes
> >>>> We can't revert it because it fixes a real problem when the host
> >>>> machine is overloaded. See the comment on test_migrate_auto_converge().
> >>>
> >>> Thanks, your patches decrease the time to run the migration-test from
> >>> 16 minutes down to 5 minutes on my system, that's a great improvement,
> >>> indeed!
> >>>
> >>> Tested-by: Thomas Huth <thuth@redhat.com>
> >> Thanks
> >>
> >>> (though 5 minutes are still quite a lot for qtests ... maybe some
> >>> other parts could be moved to only run with g_test_slow() ?)
> >> Hi
> >> Could you give me the output of:
> >> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
> >> echo $i; time ./tests/qtest/migration-test -p $i; done
> >> To see what tests are taking so long on your system?
> >> On my system (i9900K processor, i.e.
not the latest) and auto_converge
> >> moved to slow the total of the tests take a bit more than 1 minute.
> >
> > This is with both of your patches applied:
> >
> > /x86_64/migration/postcopy/plain
> > /x86_64/migration/postcopy/plain: OK
> >
> > real 0m35,446s
> > user 0m47,208s
> > sys 0m11,828s
>
> This is quite a bit slower than on mine, basically almost all the code that
> does migration.

This is expected AFAIK. The migrate_postcopy_prepare method waits
for 1 complete pre-copy pass to run at 3MB/s, before switching to
post-copy mode.

>
> $ time ./tests/qtest/migration-test -p /x86_64/migration/postcopy/plain
> # random seed: R02S42809b71f513e8524bd24df5facd5768
> # Start of x86_64 tests
> # Start of migration tests
> # Start of postcopy tests
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/src_serial -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw -accel qtest
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/dest_serial -incoming unix:/tmp/migration-test-1MGL31/migsocket -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw -accel qtest
> ok 1 /x86_64/migration/postcopy/plain
> # End of postcopy tests
> # End of migration tests
> # End of x86_64 tests
> 1..1
>
> real 0m1.104s
> user 0m0.697s
> sys 0m0.414s

That is surprisingly fast - it is like it is not doing the pre-copy
pass at all.

> > real 5m32,733s
> > user 7m24,380s
> > sys 1m50,801s
>
> Ouch.
>
> Can I ask:
> - what is your machine? Is it especially slow?
> Otherwise I want to know why it is happening.

This matches what I see on my laptop - any test which runs a full
pre-copy pass gets 30 seconds added for this phase.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
2023-04-18 10:59 ` [PATCH 0/2] tests/migration: Fix migration-test slowdown Thomas Huth
2023-04-18 11:42 ` Juan Quintela
@ 2023-04-18 11:46 ` Juan Quintela
1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2023-04-18 11:46 UTC (permalink / raw)
To: Thomas Huth; +Cc: qemu-devel, Paolo Bonzini, Laurent Vivier

Thomas Huth <thuth@redhat.com> wrote:
> On 12/04/2023 16.19, Juan Quintela wrote:
>> Since commit:
>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Date: Mon Mar 6 15:26:12 2023 +0000
>> tests/migration: Tweek auto converge limits check
>> Thomas found an autoconverge test failure where the
>> migration completed before the autoconverge had kicked in.
>> [...]
>> migration-test has become very slow.
>> On my laptop, before that commit migration-test takes 2min10seconds
>> After that commit, it takes around 11minutes
>> We can't revert it because it fixes a real problem when the host
>> machine is overloaded. See the comment on test_migrate_auto_converge().
>
> Thanks, your patches decrease the time to run the migration-test from
> 16 minutes down to 5 minutes on my system, that's a great improvement,
> indeed!
>
> Tested-by: Thomas Huth <thuth@redhat.com>
>
> (though 5 minutes are still quite a lot for qtests ... maybe some
> other parts could be moved to only run with g_test_slow() ?)

And while we are on this topic:

Is there a way to launch several tests from the same binary in parallel?

i.e. every migration thread uses a maximum of 2 cores, so on a server I
can run several at the same time (I know that the migration-test.c tests
need to be modified so they don't interfere, but I have those changes in
my tree), but I don't know of a way to launch them.

Thanks, Juan.

PS: I don't know why launching a qemu is so slow; the minimal time
that I am able to get for launching the two qemus is around 0.5 seconds.

Later, Juan.
^ permalink raw reply [flat|nested] 13+ messages in thread
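One generic way to approach the question above, outside of whatever the QEMU build system offers, is to drive the binary from the shell: list the test paths with `-l`, as in the timing loop quoted earlier in the thread, and hand each path to its own `-p` invocation through `xargs -P`. This is only a sketch. The function name `run_parallel` and the `BIN` and `JOBS` variables are made up for illustration, `xargs -P` is a GNU/BSD extension rather than POSIX, and it assumes the individual tests have already been changed so they do not interfere with each other, which Juan notes is not the case upstream.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of QEMU): run every test path listed by
# "-l" as a separate process, with at most $JOBS running at a time.
# Assumes the tests do not share sockets or temporary files.
BIN="${BIN:-./tests/qtest/migration-test}"
JOBS="${JOBS:-4}"

run_parallel() {
    # Keep only the lines that are test paths, then start one "-p <path>"
    # invocation per path; xargs exits non-zero if any invocation fails.
    "$BIN" -l | grep '^/' | xargs -P "$JOBS" -I{} "$BIN" -p {}
}

# Usage (from the build directory):
#   JOBS=8 run_parallel
```

Since each migration test is said to use at most 2 cores, a `JOBS` value of roughly half the core count would be a reasonable starting point, though that figure is a guess and not something measured in this thread.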
end of thread, other threads:[~2023-04-21 17:22 UTC | newest]

Thread overview: 13+ messages
2023-04-12 14:19 [PATCH 0/2] tests/migration: Fix migration-test slowdown Juan Quintela
2023-04-12 14:20 ` [PATCH 1/2] tests/migration: Make precopy fast Juan Quintela
2023-04-18 11:53 ` Daniel P. Berrangé
2023-04-18 12:20 ` Juan Quintela
2023-04-21 17:22 ` Daniel P. Berrangé
2023-04-12 14:20 ` [PATCH 2/2] tests/migration: Only run auto_converge in slow mode Juan Quintela
2023-04-18 10:59 ` [PATCH 0/2] tests/migration: Fix migration-test slowdown Thomas Huth
2023-04-18 11:42 ` Juan Quintela
2023-04-18 12:44 ` Thomas Huth
2023-04-18 13:19 ` Juan Quintela
2023-04-18 13:26 ` Thomas Huth
2023-04-18 14:53 ` Daniel P. Berrangé
2023-04-18 11:46 ` Juan Quintela