* [Qemu-devel] Testing migration under stress
From: David Gibson @ 2012-11-02  3:10 UTC
To: qemu-devel; +Cc: aik, quintela

Asking for some advice on the list.

I have prototype savevm and migration support ready for the pseries
machine.  They seem to work under simple circumstances (idle guest).
To test them more extensively I've been attempting to perform live
migrations (just over tcp->localhost) while the guest is active with
something.  In particular I've tried while using octave to do a matrix
multiply (so exercising the FP unit), and my colleague Alexey has tried
during some video encoding.

However, in each of these cases, we've found that the migration only
completes, and the source instance only stops, after the intensive
workload has (just) completed.  What I surmise is happening is that
the workload is touching memory pages fast enough that the ram
migration code never gets below the threshold to complete the
migration until the guest is idle again.

Does anyone have some ideas for testing this better: workloads that
are less likely to trigger this behaviour, or settings to tweak in the
migration itself to make it more likely to complete the migration
while the workload is still active?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
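[For anyone reproducing this, a minimal sketch of the kind of
tcp->localhost live migration being described.  The binary name,
machine options, image path, and port below are illustrative
assumptions, not details taken from the thread:

  # destination: start an identically-configured instance, paused,
  # waiting for the incoming migration stream
  qemu-system-ppc64 -M pseries -m 1024 -hda guest.img -incoming tcp:localhost:4444

  # source: from the QEMU monitor, start the migration
  # (-d detaches, so the monitor stays usable for polling)
  (qemu) migrate -d tcp:localhost:4444
  (qemu) info migrate    # poll until status is "completed" (or stuck "active")
]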
* Re: [Qemu-devel] Testing migration under stress
From: Orit Wasserman @ 2012-11-02 12:12 UTC
To: David Gibson; +Cc: aik, qemu-devel, quintela

On 11/02/2012 05:10 AM, David Gibson wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine.  They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp->localhost) while the guest is active with
> something.  In particular I've tried while using octave to do a matrix
> multiply (so exercising the FP unit), and my colleague Alexey has tried
> during some video encoding.
>
As you are doing local migration, one option is to set the migration
speed higher than the line speed, since we don't actually send the data
over a real link; another is to set a high allowed downtime.

> However, in each of these cases, we've found that the migration only
> completes, and the source instance only stops, after the intensive
> workload has (just) completed.  What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code never gets below the threshold to complete the
> migration until the guest is idle again.
>
The workload you chose is really bad for live migration, as all the
guest does is dirty its memory.  I recommend looking for a workload
that does some networking or disk IO.  Vinod succeeded in running the
SwingBench and SLOB benchmarks, which converged OK; I don't know if
they run on pseries, but a similar workload (small database/warehouse)
should be fine.  We found that SpecJbb, on the other hand, is hard to
converge.  Web workloads or video streaming also do the trick.

Cheers,
Orit

> Does anyone have some ideas for testing this better: workloads that
> are less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete the migration
> while the workload is still active?
>
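[A guest workload along the lines Orit suggests - IO-bound rather than
purely memory-bound - might be sketched like this; the paths, sizes,
and host address are arbitrary assumptions:

  # disk IO: rewrite and flush a modest file in a loop
  while true; do dd if=/dev/zero of=/tmp/io.dat bs=1M count=256 oflag=direct; done

  # network IO: stream data out of the guest to a listener on the host
  dd if=/dev/zero bs=1M count=1024 | nc 192.168.1.1 5555

Either way the guest spends most of its time blocked on IO rather than
dirtying pages, which is what lets the migration converge.]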
* Re: [Qemu-devel] Testing migration under stress
From: David Gibson @ 2012-11-05  0:30 UTC
To: Orit Wasserman; +Cc: aik, qemu-devel, quintela

On Fri, Nov 02, 2012 at 02:12:25PM +0200, Orit Wasserman wrote:
> On 11/02/2012 05:10 AM, David Gibson wrote:
> > Asking for some advice on the list.
> >
> > I have prototype savevm and migration support ready for the pseries
> > machine.  They seem to work under simple circumstances (idle guest).
> > To test them more extensively I've been attempting to perform live
> > migrations (just over tcp->localhost) while the guest is active with
> > something.  In particular I've tried while using octave to do a matrix
> > multiply (so exercising the FP unit), and my colleague Alexey has tried
> > during some video encoding.

> As you are doing local migration, one option is to set the migration
> speed higher than the line speed, since we don't actually send the data
> over a real link; another is to set a high allowed downtime.

I'm not entirely sure what you mean by that.  But I do have suspicions,
based on this and other factors, that the default bandwidth it is
limiting to is horribly, horribly low.

> > However, in each of these cases, we've found that the migration only
> > completes, and the source instance only stops, after the intensive
> > workload has (just) completed.  What I surmise is happening is that
> > the workload is touching memory pages fast enough that the ram
> > migration code never gets below the threshold to complete the
> > migration until the guest is idle again.
> >
> The workload you chose is really bad for live migration, as all the
> guest does is dirty its memory.

Well, I realised that was true of the matrix multiply.  For the video
encode, though, the output data should be much, much smaller than the
input, so I wouldn't expect it to be dirtying memory that fast.

> I recommend looking for a workload
> that does some networking or disk IO.  Vinod succeeded in running the
> SwingBench and SLOB benchmarks, which converged OK; I don't know if
> they run on pseries, but a similar workload (small database/warehouse)
> should be fine.  We found that SpecJbb, on the other hand, is hard to
> converge.  Web workloads or video streaming also do the trick.

Hrm.  As something really simple and stupid, I did try migrating during
an "ls -lR /", but even that didn't converge :/.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] Testing migration under stress
From: Orit Wasserman @ 2012-11-05 12:21 UTC
To: David Gibson; +Cc: aik, qemu-devel, quintela

On 11/05/2012 02:30 AM, David Gibson wrote:
> On Fri, Nov 02, 2012 at 02:12:25PM +0200, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine.  They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something.  In particular I've tried while using octave to do a matrix
>>> multiply (so exercising the FP unit), and my colleague Alexey has
>>> tried during some video encoding.
>
>> As you are doing local migration, one option is to set the migration
>> speed higher than the line speed, since we don't actually send the
>> data over a real link; another is to set a high allowed downtime.
>
> I'm not entirely sure what you mean by that.  But I do have suspicions,
> based on this and other factors, that the default bandwidth it is
> limiting to is horribly, horribly low.
>
>>> However, in each of these cases, we've found that the migration only
>>> completes, and the source instance only stops, after the intensive
>>> workload has (just) completed.  What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the ram
>>> migration code never gets below the threshold to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the
>> guest does is dirty its memory.
>
> Well, I realised that was true of the matrix multiply.  For the video
> encode, though, the output data should be much, much smaller than the
> input, so I wouldn't expect it to be dirtying memory that fast.
>
>> I recommend looking for a workload
>> that does some networking or disk IO.  Vinod succeeded in running the
>> SwingBench and SLOB benchmarks, which converged OK; I don't know if
>> they run on pseries, but a similar workload (small database/warehouse)
>> should be fine.  We found that SpecJbb, on the other hand, is hard to
>> converge.  Web workloads or video streaming also do the trick.
>
> Hrm.  As something really simple and stupid, I did try migrating during
> an "ls -lR /", but even that didn't converge :/.
That is strange; it should converge even with the defaults.  Is there
anything special about your storage setup?
>
* Re: [Qemu-devel] Testing migration under stress
From: David Gibson @ 2012-11-06  1:14 UTC
To: Orit Wasserman; +Cc: aik, qemu-devel, quintela

On Mon, Nov 05, 2012 at 02:21:37PM +0200, Orit Wasserman wrote:
> On 11/05/2012 02:30 AM, David Gibson wrote:
> > On Fri, Nov 02, 2012 at 02:12:25PM +0200, Orit Wasserman wrote:
> >> On 11/02/2012 05:10 AM, David Gibson wrote:
> >>> Asking for some advice on the list.
> >>>
> >>> I have prototype savevm and migration support ready for the pseries
> >>> machine.  They seem to work under simple circumstances (idle guest).
> >>> To test them more extensively I've been attempting to perform live
> >>> migrations (just over tcp->localhost) while the guest is active with
> >>> something.  In particular I've tried while using octave to do a
> >>> matrix multiply (so exercising the FP unit), and my colleague Alexey
> >>> has tried during some video encoding.
> >
> >> As you are doing local migration, one option is to set the migration
> >> speed higher than the line speed, since we don't actually send the
> >> data over a real link; another is to set a high allowed downtime.
> >
> > I'm not entirely sure what you mean by that.  But I do have
> > suspicions, based on this and other factors, that the default
> > bandwidth it is limiting to is horribly, horribly low.
> >
> >>> However, in each of these cases, we've found that the migration only
> >>> completes, and the source instance only stops, after the intensive
> >>> workload has (just) completed.  What I surmise is happening is that
> >>> the workload is touching memory pages fast enough that the ram
> >>> migration code never gets below the threshold to complete the
> >>> migration until the guest is idle again.
> >>>
> >> The workload you chose is really bad for live migration, as all the
> >> guest does is dirty its memory.
> >
> > Well, I realised that was true of the matrix multiply.  For the video
> > encode, though, the output data should be much, much smaller than the
> > input, so I wouldn't expect it to be dirtying memory that fast.
> >
> >> I recommend looking for a workload
> >> that does some networking or disk IO.  Vinod succeeded in running the
> >> SwingBench and SLOB benchmarks, which converged OK; I don't know if
> >> they run on pseries, but a similar workload (small
> >> database/warehouse) should be fine.  We found that SpecJbb, on the
> >> other hand, is hard to converge.  Web workloads or video streaming
> >> also do the trick.
> >
> > Hrm.  As something really simple and stupid, I did try migrating
> > during an "ls -lR /", but even that didn't converge :/.
> That is strange; it should converge even with the defaults.  Is there
> anything special about your storage setup?

I didn't think so.  Do you mean the host or the guest storage setup?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] Testing migration under stress
From: Alexey Kardashevskiy @ 2012-11-06  5:22 UTC
To: Orit Wasserman; +Cc: quintela, qemu-devel, David Gibson

On 02/11/12 23:12, Orit Wasserman wrote:
> On 11/02/2012 05:10 AM, David Gibson wrote:
>> Asking for some advice on the list.
>>
>> I have prototype savevm and migration support ready for the pseries
>> machine.  They seem to work under simple circumstances (idle guest).
>> To test them more extensively I've been attempting to perform live
>> migrations (just over tcp->localhost) while the guest is active with
>> something.  In particular I've tried while using octave to do a matrix
>> multiply (so exercising the FP unit), and my colleague Alexey has
>> tried during some video encoding.
>>
> As you are doing local migration, one option is to set the migration
> speed higher than the line speed, since we don't actually send the data
> over a real link; another is to set a high allowed downtime.
>
>> However, in each of these cases, we've found that the migration only
>> completes, and the source instance only stops, after the intensive
>> workload has (just) completed.  What I surmise is happening is that
>> the workload is touching memory pages fast enough that the ram
>> migration code never gets below the threshold to complete the
>> migration until the guest is idle again.
>>
> The workload you chose is really bad for live migration, as all the
> guest does is dirty its memory.  I recommend looking for a workload
> that does some networking or disk IO.  Vinod succeeded in running the
> SwingBench and SLOB benchmarks, which converged OK; I don't know if
> they run on pseries, but a similar workload (small database/warehouse)
> should be fine.  We found that SpecJbb, on the other hand, is hard to
> converge.  Web workloads or video streaming also do the trick.

My ffmpeg workload is a simple encode from h263+ac3 to h263+ac3, at
64*36 pixels, so it should not be dirtying memory too much.  Or is it?

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 14538 milliseconds
downtime: 1273 milliseconds
transferred ram: 389961 kbytes
remaining ram: 0 kbytes
total ram: 1065024 kbytes
duplicate: 181949 pages
normal: 97446 pages
normal bytes: 389784 kbytes

How many bytes were actually transferred?  "duplicate" * 4K = 745MB?

Is there any tool in QEMU to see how many pages are used/dirty/etc.?
"info" does not seem to have any such statistics.

By the way, the new guest did not resume (qemu still responds to
commands), but this is probably our problem within the "pseries"
platform.  What is strange is that "info migrate" on the new guest
shows nothing:

(qemu) info migrate
(qemu)

> Cheers,
> Orit
>
>> Does anyone have some ideas for testing this better: workloads that
>> are less likely to trigger this behaviour, or settings to tweak in the
>> migration itself to make it more likely to complete the migration
>> while the workload is still active?
>>

-- 
Alexey
* Re: [Qemu-devel] Testing migration under stress
From: David Gibson @ 2012-11-06  6:55 UTC
To: Alexey Kardashevskiy; +Cc: Orit Wasserman, qemu-devel, quintela

On Tue, Nov 06, 2012 at 04:22:11PM +1100, Alexey Kardashevskiy wrote:
> On 02/11/12 23:12, Orit Wasserman wrote:
> > On 11/02/2012 05:10 AM, David Gibson wrote:
> >> Asking for some advice on the list.
> >>
> >> I have prototype savevm and migration support ready for the pseries
> >> machine.  They seem to work under simple circumstances (idle guest).
> >> To test them more extensively I've been attempting to perform live
> >> migrations (just over tcp->localhost) while the guest is active with
> >> something.  In particular I've tried while using octave to do a
> >> matrix multiply (so exercising the FP unit), and my colleague Alexey
> >> has tried during some video encoding.
> >>
> > As you are doing local migration, one option is to set the migration
> > speed higher than the line speed, since we don't actually send the
> > data over a real link; another is to set a high allowed downtime.
> >
> >> However, in each of these cases, we've found that the migration only
> >> completes, and the source instance only stops, after the intensive
> >> workload has (just) completed.  What I surmise is happening is that
> >> the workload is touching memory pages fast enough that the ram
> >> migration code never gets below the threshold to complete the
> >> migration until the guest is idle again.
> >>
> > The workload you chose is really bad for live migration, as all the
> > guest does is dirty its memory.  I recommend looking for a workload
> > that does some networking or disk IO.  Vinod succeeded in running the
> > SwingBench and SLOB benchmarks, which converged OK; I don't know if
> > they run on pseries, but a similar workload (small
> > database/warehouse) should be fine.  We found that SpecJbb, on the
> > other hand, is hard to converge.  Web workloads or video streaming
> > also do the trick.
>
> My ffmpeg workload is a simple encode from h263+ac3 to h263+ac3, at
> 64*36 pixels, so it should not be dirtying memory too much.  Or is it?

Oh... if you're encoding the same format to the same format, it may
well be optimized and therefore memory limited.  I was envisaging
encoding an uncompressed format to a highly compressed format, which
should be compute limited rather than memory bandwidth limited.  The
size and resolution of the input doesn't really matter as long as:

 * the output size is much smaller than the input size, and
 * the full encode takes several minutes, to give a reasonable amount
   of time for the migration to converge.

> (qemu) info migrate
> capabilities: xbzrle: off
> Migration status: completed
> total time: 14538 milliseconds
> downtime: 1273 milliseconds
> transferred ram: 389961 kbytes
> remaining ram: 0 kbytes
> total ram: 1065024 kbytes
> duplicate: 181949 pages
> normal: 97446 pages
> normal bytes: 389784 kbytes
>
> How many bytes were actually transferred?  "duplicate" * 4K = 745MB?
>
> Is there any tool in QEMU to see how many pages are used/dirty/etc.?
> "info" does not seem to have any such statistics.
>
> By the way, the new guest did not resume (qemu still responds to
> commands), but this is probably our problem within the "pseries"
> platform.  What is

Uh, that's a bug, and I'm not sure when it broke.
If the migration isn't even working, we're premature in trying to work
out why it isn't completing when we expect.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
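[An encode matching David's two criteria above - compute-bound, with
output much smaller than the input, and running for minutes - might
look like the line below.  The filenames and codec choices are
assumptions (it presumes an ffmpeg built with libx264), not Alexey's
actual command:

  # raw/uncompressed input, highly compressed output; slow preset
  # keeps the encoder compute-bound for a long stretch
  ffmpeg -i input-uncompressed.avi -c:v libx264 -preset veryslow -c:a ac3 output.mkv
]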
* Re: [Qemu-devel] Testing migration under stress
From: Alexey Kardashevskiy @ 2012-11-06  7:55 UTC
To: David Gibson; +Cc: Orit Wasserman, qemu-devel, quintela

On 06/11/12 17:55, David Gibson wrote:
> On Tue, Nov 06, 2012 at 04:22:11PM +1100, Alexey Kardashevskiy wrote:
>> On 02/11/12 23:12, Orit Wasserman wrote:
>>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>>> Asking for some advice on the list.
>>>>
>>>> I have prototype savevm and migration support ready for the pseries
>>>> machine.  They seem to work under simple circumstances (idle guest).
>>>> To test them more extensively I've been attempting to perform live
>>>> migrations (just over tcp->localhost) while the guest is active with
>>>> something.  In particular I've tried while using octave to do a
>>>> matrix multiply (so exercising the FP unit), and my colleague Alexey
>>>> has tried during some video encoding.
>>>>
>>> As you are doing local migration, one option is to set the migration
>>> speed higher than the line speed, since we don't actually send the
>>> data over a real link; another is to set a high allowed downtime.
>>>
>>>> However, in each of these cases, we've found that the migration only
>>>> completes, and the source instance only stops, after the intensive
>>>> workload has (just) completed.  What I surmise is happening is that
>>>> the workload is touching memory pages fast enough that the ram
>>>> migration code never gets below the threshold to complete the
>>>> migration until the guest is idle again.
>>>>
>>> The workload you chose is really bad for live migration, as all the
>>> guest does is dirty its memory.  I recommend looking for a workload
>>> that does some networking or disk IO.  Vinod succeeded in running the
>>> SwingBench and SLOB benchmarks, which converged OK; I don't know if
>>> they run on pseries, but a similar workload (small
>>> database/warehouse) should be fine.  We found that SpecJbb, on the
>>> other hand, is hard to converge.  Web workloads or video streaming
>>> also do the trick.
>>
>> My ffmpeg workload is a simple encode from h263+ac3 to h263+ac3, at
>> 64*36 pixels, so it should not be dirtying memory too much.  Or is it?
>
> Oh... if you're encoding the same format to the same format, it may
> well be optimized and therefore memory limited.

No, it is not optimized; it still decodes and encodes, as I inserted a
filter in the chain.

> I was envisaging encoding
> an uncompressed format to a highly compressed format, which should be
> compute limited rather than memory bandwidth limited.  The size and
> resolution of the input doesn't really matter as long as:
>
>  * the output size is much smaller than the input size, and

This is another scenario; I run both.  I just tried to reduce memory
consumption, as was recommended here, to see if anything changes.
Originally it was 1280*720 to 64*36, but I am not sure it uses little
memory, as (I suspect, at least sometimes) ffmpeg may decode a series
of full-size frames to do motion detection or something.

>  * the full encode takes several minutes, to give a reasonable amount
>    of time for the migration to converge.

90 seconds per file.  If I run a script which does the encoding in a
loop, the pause between encodings is not big enough to finish the
migration anyway when I encode big video to small video.  However, if
it is 64*36, the migration finishes (the first qemu succeeds and stops)
but the new guest does not resume.
>> >> (qemu) info migrate >> capabilities: xbzrle: off >> Migration status: completed >> total time: 14538 milliseconds >> downtime: 1273 milliseconds >> transferred ram: 389961 kbytes >> remaining ram: 0 kbytes >> total ram: 1065024 kbytes >> duplicate: 181949 pages >> normal: 97446 pages >> normal bytes: 389784 kbytes >> >> How many bytes were actually transferred? "duplicate" * 4K = 745MB? >> >> Is there any tool in QEMU to see how many pages are used/dirty/etc? >> "info" does not seem to have any kind of such statistic. >> >> btw the new guest did not resume (qemu still responds on commands) >> but this is probably our problem within "pseries" platform. What is > > Uh, that's a bug, and I'm not sure when it broke. If the migrate > isn't even working we're premature in attempting to work out why it > isn't happening when we expect. Here I wanted to emphasize that I would like to find some way to get information about how migration is doing (or done) in the new guest - there is no statistic about it. -- Alexey ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] Testing migration under stress
From: Orit Wasserman @ 2012-11-06 10:54 UTC
To: Alexey Kardashevskiy; +Cc: quintela, qemu-devel, David Gibson

On 11/06/2012 07:22 AM, Alexey Kardashevskiy wrote:
> On 02/11/12 23:12, Orit Wasserman wrote:
>> On 11/02/2012 05:10 AM, David Gibson wrote:
>>> Asking for some advice on the list.
>>>
>>> I have prototype savevm and migration support ready for the pseries
>>> machine.  They seem to work under simple circumstances (idle guest).
>>> To test them more extensively I've been attempting to perform live
>>> migrations (just over tcp->localhost) while the guest is active with
>>> something.  In particular I've tried while using octave to do a
>>> matrix multiply (so exercising the FP unit), and my colleague Alexey
>>> has tried during some video encoding.
>>>
>> As you are doing local migration, one option is to set the migration
>> speed higher than the line speed, since we don't actually send the
>> data over a real link; another is to set a high allowed downtime.
>>
>>> However, in each of these cases, we've found that the migration only
>>> completes, and the source instance only stops, after the intensive
>>> workload has (just) completed.  What I surmise is happening is that
>>> the workload is touching memory pages fast enough that the ram
>>> migration code never gets below the threshold to complete the
>>> migration until the guest is idle again.
>>>
>> The workload you chose is really bad for live migration, as all the
>> guest does is dirty its memory.  I recommend looking for a workload
>> that does some networking or disk IO.  Vinod succeeded in running the
>> SwingBench and SLOB benchmarks, which converged OK; I don't know if
>> they run on pseries, but a similar workload (small
>> database/warehouse) should be fine.  We found that SpecJbb, on the
>> other hand, is hard to converge.  Web workloads or video streaming
>> also do the trick.
>
> My ffmpeg workload is a simple encode from h263+ac3 to h263+ac3, at
> 64*36 pixels, so it should not be dirtying memory too much.  Or is it?
>
> (qemu) info migrate
> capabilities: xbzrle: off
> Migration status: completed
> total time: 14538 milliseconds
> downtime: 1273 milliseconds
> transferred ram: 389961 kbytes
> remaining ram: 0 kbytes
> total ram: 1065024 kbytes
> duplicate: 181949 pages
> normal: 97446 pages
> normal bytes: 389784 kbytes
>
> How many bytes were actually transferred?  "duplicate" * 4K = 745MB?
For duplicate pages we send only a single byte (plus the page header);
those are usually zero pages.  "transferred ram" is the actual number
of bytes sent, so here around 389 MB was sent.
>
> Is there any tool in QEMU to see how many pages are used/dirty/etc.?
Sadly, no.
> "info" does not seem to have any such statistics.
>
> By the way, the new guest did not resume (qemu still responds to
> commands), but this is probably our problem within the "pseries"
> platform.  What is strange is that "info migrate" on the new guest
> shows nothing:
>
> (qemu) info migrate
> (qemu)
>
The "info migrate" command displays outgoing migration information,
not incoming.
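[Spelling out the arithmetic behind that answer, using the counters
from Alexey's "info migrate" output quoted above:

  normal bytes      = 97446 pages * 4 KiB     = 389784 KiB  (matches "normal bytes")
  duplicate payload = 181949 pages * ~1 byte ~=    178 KiB
  transferred ram  ~= 389784 + 178            = 389962 KiB  (reported: 389961 KiB)

So "duplicate" * 4K (~745 MB) is the amount of guest memory the
duplicate pages represent, not bytes on the wire; the actual wire
traffic was the reported ~390 MB.]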
* Re: [Qemu-devel] Testing migration under stress
From: Paolo Bonzini @ 2012-11-02 13:04 UTC
To: qemu-devel

On 02/11/2012 04:10, David Gibson wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine.  They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp->localhost) while the guest is active with
> something.  In particular I've tried while using octave to do a matrix
> multiply (so exercising the FP unit), and my colleague Alexey has
> tried during some video encoding.
>
> However, in each of these cases, we've found that the migration only
> completes, and the source instance only stops, after the intensive
> workload has (just) completed.  What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code never gets below the threshold to complete the
> migration until the guest is idle again.
>
> Does anyone have some ideas for testing this better: workloads that
> are less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete the migration
> while the workload is still active?

Have you set the migration speed (migrate_set_speed) to something
higher than the default 32 MB/sec?

Paolo
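[In QEMU monitor (HMP) terms that would be something like the
following; the "1g" value is only an example, chosen to be well above
any plausible localhost bottleneck:

  (qemu) migrate_set_speed 1g    # raise the bandwidth cap from the 32 MB/s default
  (qemu) info migrate            # re-check how quickly "remaining ram" now drops
]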
* Re: [Qemu-devel] Testing migration under stress
From: Juan Quintela @ 2012-11-02 13:07 UTC
To: David Gibson; +Cc: aik, qemu-devel

David Gibson <david@gibson.dropbear.id.au> wrote:
> Asking for some advice on the list.
>
> I have prototype savevm and migration support ready for the pseries
> machine.  They seem to work under simple circumstances (idle guest).
> To test them more extensively I've been attempting to perform live
> migrations (just over tcp->localhost) while the guest is active with
> something.  In particular I've tried while using octave to do a matrix
> multiply (so exercising the FP unit), and my colleague Alexey has
> tried during some video encoding.
>
> However, in each of these cases, we've found that the migration only
> completes, and the source instance only stops, after the intensive
> workload has (just) completed.  What I surmise is happening is that
> the workload is touching memory pages fast enough that the ram
> migration code never gets below the threshold to complete the
> migration until the guest is idle again.
>
> Does anyone have some ideas for testing this better: workloads that
> are less likely to trigger this behaviour, or settings to tweak in the
> migration itself to make it more likely to complete the migration
> while the workload is still active?

You can:

  migrate_set_downtime 2s (or so)

I normally run stress, and you move the amount of memory that it
dirties until the migration converges (it depends a lot on your
networking).

Doing anything that is really memory intensive is basically never going
to converge.

Later, Juan.
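[A concrete version of Juan's recipe; the stress parameters are
arbitrary assumptions, to be tuned until the dirty rate sits just
within what the link can move:

  # source QEMU monitor: allow up to 2 seconds of downtime at switchover
  (qemu) migrate_set_downtime 2

  # inside the guest: dirty memory at a controlled, bounded rate
  stress --vm 2 --vm-bytes 128M --timeout 600s

Raising --vm-bytes (or the worker count) increases the dirty rate;
beyond some point the migration stops converging, which is exactly the
threshold Juan describes probing for.]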
* Re: [Qemu-devel] Testing migration under stress
From: David Gibson @ 2012-11-05  0:31 UTC
To: Juan Quintela; +Cc: aik, qemu-devel

On Fri, Nov 02, 2012 at 02:07:45PM +0100, Juan Quintela wrote:
> David Gibson <david@gibson.dropbear.id.au> wrote:
> > Asking for some advice on the list.
> >
> > I have prototype savevm and migration support ready for the pseries
> > machine.  They seem to work under simple circumstances (idle guest).
> > To test them more extensively I've been attempting to perform live
> > migrations (just over tcp->localhost) while the guest is active with
> > something.  In particular I've tried while using octave to do a
> > matrix multiply (so exercising the FP unit), and my colleague Alexey
> > has tried during some video encoding.
> >
> > However, in each of these cases, we've found that the migration only
> > completes, and the source instance only stops, after the intensive
> > workload has (just) completed.  What I surmise is happening is that
> > the workload is touching memory pages fast enough that the ram
> > migration code never gets below the threshold to complete the
> > migration until the guest is idle again.
> >
> > Does anyone have some ideas for testing this better: workloads that
> > are less likely to trigger this behaviour, or settings to tweak in
> > the migration itself to make it more likely to complete the
> > migration while the workload is still active?
>
> You can:
>
>   migrate_set_downtime 2s (or so)
>
> I normally run stress, and you move the amount of memory that it
> dirties until the migration converges (it depends a lot on your
> networking).

So, I'm using tcp to localhost, so it should be really fast, but it
doesn't seem to be :/.  I suspect there are some other bugs here.

> Doing anything that is really memory intensive is basically never
> going to converge.

Well, I didn't think the loads I chose would be memory limited
(especially the video encode), but...

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson